Archive for March, 2008

Why do you do it that way? (or, design rationale)

Thursday, March 20th, 2008

Yesterday I was explaining to some librarians how Evergreen relates Items (the actual physical barcoded material that circulates) to Volumes (where the Call Number lives) to Bib Records (which contain the MARC), and one person was curious and asked, “why do you do it that way?” The short answer is because it’s good design, but the question momentarily threw me for a loop because the implication is that they’re used to systems which do not do it that way.

Imagine that the structure used for items in the database for your ILS is laid out something like this:

Item Table
———-
Internal ID
Creation Date
Barcode
Call Number
…other fields

In some automation systems, that Call Number field will be a free-text label. If you want to change the Call Number for an Item, then that field will necessarily get changed. If you want to change the Call Number for a large grouping of Items, then you will have to change the field for all those items (sometimes one at a time).

In Evergreen, that Call Number field will actually be a foreign key to another table in the database, one that looks something like this:

Volume
——-
Internal ID
Owning Library
Call Number Label
…other fields

This allows a Call Number to be shared amongst a group of items, and be modified with a single edit.

This is also an example of database normalization.
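The two layouts above can be sketched with Python's built-in sqlite3 module. The table and column names here are illustrative assumptions made up for this example, not Evergreen's actual schema:

```python
import sqlite3

# Illustrative schema only: names are assumptions, not Evergreen's real tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE volume (
        id INTEGER PRIMARY KEY,
        owning_library TEXT NOT NULL,
        call_number_label TEXT NOT NULL
    );
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        barcode TEXT NOT NULL UNIQUE,
        -- The item stores a reference to a volume, not a free-text label.
        volume_id INTEGER NOT NULL REFERENCES volume(id)
    );
""")

conn.execute(
    "INSERT INTO volume (id, owning_library, call_number_label) "
    "VALUES (1, 'BR1', 'FIC SMI')"
)
conn.executemany(
    "INSERT INTO item (barcode, volume_id) VALUES (?, 1)",
    [("30001",), ("30002",), ("30003",)],
)

# One edit to the volume row relabels every item attached to it.
conn.execute("UPDATE volume SET call_number_label = 'FIC SMITH' WHERE id = 1")

labels = [
    row[0]
    for row in conn.execute(
        "SELECT v.call_number_label "
        "FROM item i JOIN volume v ON v.id = i.volume_id "
        "ORDER BY i.barcode"
    )
]
print(labels)
```

With a free-text Call Number column on the item table instead, the same change would need one update per item, and a missed row would leave two spellings of what was meant to be one call number.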

Why is this important? It has to do with redundancy of information (and I’m not talking about backup storage, RAID, etc.).

Human beings love redundancy when it comes to communication; our meanings get emphasized by our body language, and the syntax and structure of our languages, both spoken and written, encode information in multiple ways. But redundancy opens the door to discrepancy, and discrepancy leads to ambiguity. Human beings can deal with ambiguity, but computers (and most software developers, and maybe catalogers) can’t abide it. “Time flies like an arrow, but fruit flies like a banana”.

A computer might resolve ambiguity in a non-obvious way, or worse, it may resolve ambiguity in a manner that coincidentally matches a human’s expectation, only to cause trouble later when you start getting data anomalies.

Normalization is a best-practice technique for designing relational database schemas, though there are cases when you want to make a trade-off and de-normalize certain data structures. But those cases are usually optimizations that should occur after you already have a good design as your starting point.

Relational databases are powerful enough that they have sunk into the consciousness of librarians, to the extent that they put down “Must be built on a relational database” in their RFPs. But most legacy ILS products actually use hierarchical databases, which are good for some things, but not for others. The design decisions made when you go with a hierarchical database are very different from the ones you make when you start with a relational database, and I worry about the legacy products that have “tacked on” relational databases for buzzword and RFP compliance. Legacy automation may now have relational databases, but are they actually using them as relational databases? Some are, but I know of at least one that isn’t.

Let’s return to Evergreen for a moment. Because Call Number Volumes are represented as their own entities, you have the option of thinking of them differently. For example, you can move a set of items to a different “call number”, one which may be associated with a different Bib Record altogether (or you could move a volume itself to a new bib record). You could even change which library “owns” a volume, and suddenly all the items attached to that volume have a new owner. And you’re able to place a hold on a specific “volume”, in addition to title-level and copy-level holds (and in Evergreen, meta-record holds across editions and formats).
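As a rough sketch of those operations, assuming a hypothetical three-table layout (bib record, volume, item) with invented names rather than Evergreen's real schema, changing the owner or moving a volume to another bib record is a single-row update that every attached item inherits:

```python
import sqlite3

# Hypothetical layout for illustration; not Evergreen's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bib_record (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE volume (
        id INTEGER PRIMARY KEY,
        bib_record_id INTEGER NOT NULL REFERENCES bib_record(id),
        owning_library TEXT NOT NULL,
        call_number_label TEXT NOT NULL
    );
    CREATE TABLE item (
        barcode TEXT PRIMARY KEY,
        volume_id INTEGER NOT NULL REFERENCES volume(id)
    );
    INSERT INTO bib_record VALUES (1, 'First edition'), (2, 'Second edition');
    INSERT INTO volume VALUES (10, 1, 'BR1', 'FIC SMI');
    INSERT INTO item VALUES ('30001', 10), ('30002', 10);
""")


def describe_items(conn):
    """Return (barcode, owning_library, title) for every item, via its volume."""
    return conn.execute("""
        SELECT i.barcode, v.owning_library, b.title
        FROM item i
        JOIN volume v ON v.id = i.volume_id
        JOIN bib_record b ON b.id = v.bib_record_id
        ORDER BY i.barcode
    """).fetchall()


# Change which library owns the volume: both items follow.
conn.execute("UPDATE volume SET owning_library = 'BR2' WHERE id = 10")
# Move the volume itself to a different bib record.
conn.execute("UPDATE volume SET bib_record_id = 2 WHERE id = 10")

after = describe_items(conn)
print(after)
```

Neither update touches the item table, which is the point: the items never stored the owner or the bib record themselves.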

But you can also do what you may already be used to, and change the Call Number for an item (or a batch of items) from an Item Editing interface, and not have to know about the more flexible structures that are being manipulated underneath.

Because we already have Volumes, we’re also in a better place for adding Serials.

Changing an end-user interface has fewer widespread repercussions than changing your database schema, which is why it’s important to have a good database design from the beginning: it gives your interfaces more options. To steal a sentiment from Mr. Miyagi, Evergreen has strong root.

– Jason

Congrats to not one, but two other Open Source ILSs

Tuesday, March 18th, 2008

First I’d like to welcome NewGenLib to the virtual family of FOSS ILSs. In truth, we’ve known about them for a while and have been looking at their serials interfaces during our ACQ/SER design, but now that eIFL is covering them, well… ;) It’s great to see another entrant, and one that has already found an itch to scratch. I’m sure cross-pollination is in the stars as they seem to have an interesting system.

Next up, a pair of kudos to Koha.

Over the past weekend they added, at a mailing list member’s request, a call number browser inspired by Evergreen’s, which we call Shelf Browse. In Evergreen, because it supports a hierarchical organization of libraries, you can actually browse an entire system or even consortium as one huge virtual shelf! It’s a very nifty feature, and one that we know the PINES patrons have been making good use of (to the tune of 66,965 and counting so far this year, and about 300,000 times in 2007) since Evergreen launched in September of 2006. Now Koha will have a similar feature at the request of a small church Library! This, my friends, is Open Source at work.

By way of evidence from our users, I’ll mention that Evergreen provides call number / shelf browse as a “Quick Search” from the advanced search interface, which is useful to Evergreen users and may be useful for Koha patrons as well. In any case, good work.

I also noticed that Koha has incorporated, as of November of last year according to their source repository’s timestamps, the SIP2 code that David Fiander and Bill Erickson wrote for Evergreen. We’re glad to see the code that GPLS funded being put to good use in, and inspiring, other projects!

Three and a half years ago, when I first joined PINES and the Evergreen team, there was a dream and a small test server. Now we’ve written more than a quarter of a million lines of code, and that code runs the day-to-day operations of one state-wide library consortium (biggest in the world, he bragged ;) ) with at least two more in the works, and is helping to build a province-wide consortium in Canada — and let’s not forget the Laurentian/McMaster/Windsor “Unholy Trinity.” These are amazing, and they fill my heart with a satisfaction that is difficult to describe, but none of those things, even as possibilities, are why I signed up. I joined this effort because I believe in Open Source software. I believe whole-heartedly that it is a force for positive change in an industry I love, and fits perfectly with the mission of libraries.

Again, congrats to both NewGenLib and Koha, and let’s keep the cross-pollination going.

–miker

Laundry List

Saturday, March 15th, 2008

It’s been a while, but here’s just some of what I’ve personally been doing during this latest stretch of radio silence:

  • In-database circulation permit calculation — ~5ms instead of ~100ms for arbitrarily complex circ rules
  • In-database hold permit calculation — ~5ms, down from about 50ms
  • In-database relevance rank adjustments — configure relative weights and have them take effect immediately
  • Native SRU interface (full CQL bibliographic searching context set) with Z39.50 based on Simple2ZOOM
  • Rewritten search code — no more stalls on overly common search terms, and much faster in general
  • Per-object permissions
  • I18N infrastructure for in-database strings (Library names, etc)
  • Infrastructure required for in-database record ingest (coming this summer)
  • Acquisitions and serials data modeling
  • Exposing many preexisting back-end features to the OPAC (advanced query syntax, resort and limit to available after search, etc)
  • Work on some other Evergreen-based ideas

And yes, like everybody else, I’ve added Google Book Search to the Evergreen catalog.

This doesn’t even scratch the surface of the development that’s been going on recently. ACQ is coming along, the OPAC is getting friendlier, patron self-service options are expanding, full I18N is now just a matter of string removal (and we have both French-Canadian and simplified Chinese translations) … the list is far too long for me to remember on a cloudy Saturday morning.

–miker

“We want to emulate the PINES experience”

Friday, March 7th, 2008

One of the interesting aspects of working with Evergreen is the phone calls from large consortia or state libraries wishing to start large resource sharing networks like PINES. I believe there was a latent demand by the library community–particularly from library users–for ILS software capable of managing large networks that has now been met by Evergreen.

While Evergreen is happy running on small libraries, it does have a unique niche in large consortia because of its distributed database architecture, the OpenSRF backend, and its robustness. This structure allows small libraries to run Evergreen with one server and large consortia to run Evergreen by adding more servers.

A few days ago I reported on the highest circulations so far in PINES for a day (96,000), an hour (11,300) and a minute (548). What was not mentioned is that these transactions occurred while there were one thousand terminals logged into the database being used by library staff who were checking out those materials, cataloging books, and doing other activities that changed the database. There are 275 library outlets spread across a state using Evergreen and this network circulates about 19 million items a year.

The underlying breakthrough is not only the ability of the database to scale but also the ability to handle a high level of changes to the database from many sources–the heart of so much in the consortium. No other ILS software, proprietary or open source, can currently handle this kind of large consortium so gracefully.

In database speak, data silos are relatively small and separate databases that talk to each other with difficulty. Silos are the bane of analysts who have to pierce the spread-out databases to get a coherent picture of, say, the state of a company when each department has its own data in formats that do not integrate with those of other departments. It is similar in the library world: many small libraries that communicate only with difficulty. Any citizen of Georgia can get a PINES card and we know that library patrons are bypassing non-PINES libraries in order to get access to PINES at member libraries. By those actions, they are able to use the materials in a large, virtual library. When library users have a choice, they will break down library silos. Welcome to a long tail world.

Now, the politics is a bit awkward because there are librarians who either do not see the handwriting on the wall or prefer business as usual. In the age of Google, library users have been educated to expect better than that from their libraries.

It is curious that it took leadership from a state library and an open source community to create software with Evergreen’s capabilities and that it never came from legacy vendors.

Bob Molyneux

Code4lib 2008 (Mini) Roundup

Monday, March 3rd, 2008

I’ve been to my share of library conferences over the last couple of years. They all have something to offer in their own way: networking, schmoozing, wacky vendor displays, swag, etc. It’s been my experience, though, that as a geek/developer, I’m often a little disappointed at the content — it’s just not geared toward someone like me. This is why I love Code4lib. The conference is densely packed with interesting technical presentations. They are the kind of talks that not only inspire you, they give you something concrete to take back home. In addition to the presentations, you are surrounded by an interesting gang of library technologists, who, I get the sense, all have tricks up their sleeves. It makes for a fulfilling week, to say the least.

Some highlights

My week started with the Evergreen pre-conference, skillfully led by Dan Scott. Dan blogged about it here. Apart from some technical woes, a lot was seen and (hopefully) learned by all. We performed the install, imported bib and holdings data, implemented new functionality (new OpenSRF method and OPAC UI) to email a user their password if they have forgotten it, and took a quick look at some of the staff client interfaces.

Conference day 1 began with Brewster Kahle’s keynote address on his work with archive.org and openlibrary.org. Brewster’s vision and chutzpah are refreshing and inspiring. The goal of “One wiki page per book” sounds so simple (and obvious) when inserted into a slide show, but the implementation will require tremendous work and resources. I applaud their efforts, not only in principle, but because I think it will lead to better information and resource sharing in the long run.

After lunch, Winona Salesky and Michael Park did a joint presentation called “XForms for Metadata Creation”, which included two different MODS editors developed with XForms. I liked this talk in particular for two reasons. For starters, the demos I saw were that of a simple (yet powerful) bibliographic data creation interface, which I see as potentially useful for any ILS. Additionally, the use of XForms, which I only have a basic understanding of, gave me a chance to see some new (for me) technology in action. My interest is piqued.

I skipped the breakout sessions that day to put together a (5 minute) lightning talk on Pylons. One of the interesting challenges of lightning talks is the use of a shared PC and what amounts to a moratorium on web access for the sake of efficiency. My presentation was a set of images and screenshots, the first a crane on a sunny Portland day, which seemed apropos of the topic. I discussed one particular aspect of Pylons, which allows you to easily plug in custom pre and post-processing middleware applications for your web apps. I demonstrated a simple XML validator and a highlighting plugin which bolded pre-defined terms in the HTML on its way to the client. These gave me the chance to dig a little deeper into Pylons, an architecture which I think has a lot of interesting potential.
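Pylons applications are WSGI applications, so the post-processing idea can be sketched with plain WSGI and no Pylons imports. This is not the code from the talk; the class name, the demo application, and the term list are all invented for illustration, and the regular expression is a deliberate simplification that would also match terms inside tag attributes:

```python
import re


class BoldTerms:
    """WSGI middleware that wraps pre-defined terms in <b> tags.

    Hypothetical example: assumes the wrapped app returns UTF-8 HTML.
    """

    def __init__(self, app, terms):
        self.app = app
        self.pattern = re.compile(
            r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the wrapped app's status and headers until the body is rewritten.
            captured["status"] = status
            captured["headers"] = list(headers)

        # Post-processing step: buffer the whole body, then bold the terms.
        body = b"".join(self.app(environ, capture)).decode("utf-8")
        body = self.pattern.sub(r"<b>\1</b>", body).encode("utf-8")

        # The body length changed, so replace any Content-Length header.
        headers = [
            (k, v) for k, v in captured["headers"] if k.lower() != "content-length"
        ]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]


def demo_app(environ, start_response):
    """Stand-in for a real web application."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>Evergreen uses OpenSRF.</p>"]


app = BoldTerms(demo_app, ["Evergreen", "OpenSRF"])

# Call the wrapped app directly, without a web server.
responses = []
result = b"".join(app({}, lambda status, headers: responses.append((status, headers))))
print(result.decode("utf-8"))
```

Because the middleware only sees the WSGI interface, the same wrapper could sit in front of any WSGI application, which is what makes this style of plug-in attractive.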

Wednesday morning I learned a lot about RDA. Later, I thought the discussion of the ILS Discovery Interface Task Force segued well into Ross Singer’s lightning talk on Jangle, which, as Ross pointed out, could be used to implement the proposed standardized ILS interface.

I was not able to attend the last day, so I missed a chunk of the conference, including Dan’s CouchDB talk. Glad it went well, Dan!

Wow, now I realize why most folks blog about conferences as they are happening ;) It’s a lot to digest, especially for a conference as dense as code4lib. There were a lot of great presentations, breakout sessions, and lightning talks this year and this post only comments on a few of them. Next year’s conference will be held in Providence, RI, and if you can, I would recommend checking it out.

-bill