Why do you do it that way? (or, design rationale)


Yesterday I was explaining to some librarians how Evergreen relates Items (the actual physical barcoded material that circulates) to Volumes (where the Call Number lives) to Bib Records (which contain the MARC), and one person was curious and asked, “why do you do it that way?” The short answer is because it’s good design, but the question momentarily threw me for a loop because the implication is that they’re used to systems which do not do it that way.

Imagine that the structure used for items in the database for your ILS is laid out something like this:

Item Table
----------
Internal ID
Creation Date
Barcode
Call Number
…other fields

In some automation systems, that Call Number field will be a free-text label. If you want to change the Call Number for an Item, then that field will necessarily get changed. If you want to change the Call Number for a large grouping of Items, then you will have to change the field for all those items (sometimes one at a time).

In Evergreen, that Call Number field will actually be a foreign key to another table in the database, one that looks something like this:

Volume
------
Internal ID
Owning Library
Call Number Label
…other fields

This allows a Call Number to be shared amongst a group of items, and be modified with a single edit.

This is also an example of database normalization.
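As a rough illustration, the two tables above can be sketched with Python's built-in sqlite3 module. The table and column names here (`volume`, `item`, `volume_id`, and so on) are simplified stand-ins invented for this example, not Evergreen's actual schema, and the barcodes and call numbers are made up.

```python
import sqlite3

# In-memory database; all names below are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE volume (
        id                INTEGER PRIMARY KEY,
        owning_library    TEXT NOT NULL,
        call_number_label TEXT NOT NULL
    );
    CREATE TABLE item (
        id        INTEGER PRIMARY KEY,
        barcode   TEXT NOT NULL UNIQUE,
        -- The "Call Number" on an item is a foreign key, not free text.
        volume_id INTEGER NOT NULL REFERENCES volume(id)
    );
""")

# One volume shared by three items.
conn.execute(
    "INSERT INTO volume (id, owning_library, call_number_label) "
    "VALUES (1, 'Main Branch', 'QA76.9 .D3')"
)
conn.executemany(
    "INSERT INTO item (barcode, volume_id) VALUES (?, 1)",
    [("30001",), ("30002",), ("30003",)],
)

# A single edit to the volume row changes the label seen by every item.
conn.execute("UPDATE volume SET call_number_label = 'QA76.9 .D26' WHERE id = 1")

labels = [
    row[0]
    for row in conn.execute(
        "SELECT v.call_number_label FROM item i "
        "JOIN volume v ON v.id = i.volume_id ORDER BY i.barcode"
    )
]
print(labels)
```

The label is stored exactly once, so there is no way for the three items to disagree about it.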

Why is this important? It has to do with redundancy of information (and I’m not talking about backup storage, RAID arrays, etc.).

Human beings love redundancy when it comes to communication; our meanings get emphasized by our body language, and the syntax and structure of our languages, both spoken and written, encode information in multiple ways. But redundancy opens the door to discrepancy, and discrepancy leads to ambiguity. Human beings can deal with ambiguity, but computers (and most software developers, and maybe catalogers) can’t abide it. “Time flies like an arrow, but fruit flies like a banana”.

A computer might resolve ambiguity in a non-obvious way, or worse, it may resolve ambiguity in a manner that coincidentally matches a human’s expectation, only to cause trouble later when you start getting data anomalies.
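To make the discrepancy problem concrete, here is a small sketch of the free-text layout, again using Python's sqlite3 module with made-up table names and data. Three items are meant to share one call number, but one label was keyed with an extra space.

```python
import sqlite3

# Hypothetical denormalized layout: the call number is repeated on every item.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (barcode TEXT PRIMARY KEY, call_number TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [
        ("30001", "QA76.9 .D3"),
        ("30002", "QA76.9 .D3"),
        ("30003", "QA 76.9 .D3"),  # same call number, typed with a stray space
    ],
)

# The database now sees two different call numbers where a human sees one.
distinct_labels = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT call_number FROM item ORDER BY call_number"
    )
]

# A lookup by the "correct" label silently misses the third item.
shelf = [
    row[0]
    for row in conn.execute(
        "SELECT barcode FROM item WHERE call_number = 'QA76.9 .D3' ORDER BY barcode"
    )
]
print(distinct_labels, shelf)
```

Nothing here raises an error. The query simply returns two items instead of three, which is the kind of anomaly that tends to surface much later.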

Normalization is a best-practice technique for designing relational database schemas, though there are cases when you want to make a trade-off and de-normalize certain data structures. But those cases are usually optimizations that should occur after you already have a good design as your starting point.

Relational databases are powerful enough that they have sunk into the consciousness of librarians, to the extent that they put down “Must be built on a relational database” in their RFPs. But most legacy ILS products actually use hierarchical databases, which are good for some things, but not for others. The design decisions made when you go with a hierarchical database are very different than the ones you make when you start with a relational database, and I worry about the legacy products that have “tacked on” relational databases for buzzword and RFP-compliance. Legacy automation may now have relational databases, but are they actually using them as relational databases? Some are, but I know of at least one that isn’t.

Let’s return to Evergreen for a moment. Because Call Number Volumes are represented as their own entities, you have the option of thinking of them differently. For example, you can move a set of items to a different “call number”, one which may be associated with a different Bib Record altogether (or you could move a volume itself to a new bib record). You could even change which library “owns” a volume, and suddenly all the items attached to that volume have a new owner. And you’re able to place a hold on a specific “volume”, in addition to title-level and copy-level holds (and in Evergreen, meta-record holds across editions and formats).
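The volume-level moves described above might look something like the following sketch. It assumes a simplified three-table layout (`bib_record`, `volume`, `item`) with invented names and sample data. It is not Evergreen's real schema or code, only an illustration of why the items follow the volume.

```python
import sqlite3

# All table names, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bib_record (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE volume (
        id                INTEGER PRIMARY KEY,
        record_id         INTEGER NOT NULL REFERENCES bib_record(id),
        owning_library    TEXT NOT NULL,
        call_number_label TEXT NOT NULL
    );
    CREATE TABLE item (
        barcode   TEXT PRIMARY KEY,
        volume_id INTEGER NOT NULL REFERENCES volume(id)
    );
    INSERT INTO bib_record VALUES (1, 'Duplicate record'), (2, 'Preferred record');
    INSERT INTO volume VALUES (10, 1, 'Main Branch', 'FIC SMITH');
    INSERT INTO item VALUES ('30001', 10), ('30002', 10);
""")

def items_view(conn):
    """Return (barcode, bib title, owning library) for every item."""
    return conn.execute("""
        SELECT i.barcode, b.title, v.owning_library
        FROM item i
        JOIN volume v ON v.id = i.volume_id
        JOIN bib_record b ON b.id = v.record_id
        ORDER BY i.barcode
    """).fetchall()

before = items_view(conn)

# Move the volume to another bib record and give it a new owner in one edit.
# No item rows are touched; they follow the volume through the foreign key.
conn.execute(
    "UPDATE volume SET record_id = 2, owning_library = 'East Branch' WHERE id = 10"
)

after = items_view(conn)
print(before)
print(after)
```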

But you can also do what you may already be used to, and change the Call Number for an item (or a batch of items) from an Item Editing interface, and not have to know about the more flexible structures that are being manipulated underneath.
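One way an item-level editor could sit on top of volumes is to find or create a matching volume and repoint the item at it. The sketch below assumes that approach. The helper `set_item_call_number` and the simplified tables are hypothetical, and this is not a description of how Evergreen's own interface is implemented.

```python
import sqlite3

# Hypothetical, simplified tables (no bib record, for brevity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE volume (
        id                INTEGER PRIMARY KEY,
        owning_library    TEXT NOT NULL,
        call_number_label TEXT NOT NULL,
        UNIQUE (owning_library, call_number_label)
    );
    CREATE TABLE item (
        barcode   TEXT PRIMARY KEY,
        volume_id INTEGER NOT NULL REFERENCES volume(id)
    );
    INSERT INTO volume VALUES (1, 'Main Branch', 'FIC SMITH');
    INSERT INTO item VALUES ('30001', 1), ('30002', 1);
""")

def set_item_call_number(conn, barcode, new_label):
    """Hypothetical helper: change one item's call number by repointing it
    at a volume with that label, creating the volume if none exists yet."""
    (library,) = conn.execute(
        "SELECT v.owning_library FROM item i "
        "JOIN volume v ON v.id = i.volume_id WHERE i.barcode = ?",
        (barcode,),
    ).fetchone()
    # Reuse an existing volume when the library already has this label.
    conn.execute(
        "INSERT OR IGNORE INTO volume (owning_library, call_number_label) "
        "VALUES (?, ?)",
        (library, new_label),
    )
    (volume_id,) = conn.execute(
        "SELECT id FROM volume WHERE owning_library = ? AND call_number_label = ?",
        (library, new_label),
    ).fetchone()
    conn.execute(
        "UPDATE item SET volume_id = ? WHERE barcode = ?", (volume_id, barcode)
    )
    return volume_id

# The person editing the item only ever types a barcode and a label.
new_volume = set_item_call_number(conn, "30002", "FIC SMITH v.2")
same_volume = set_item_call_number(conn, "30001", "FIC SMITH v.2")
volume_count = conn.execute("SELECT COUNT(*) FROM volume").fetchone()[0]
print(new_volume, same_volume, volume_count)
```

The second call reuses the volume created by the first, so the shared label still lives in exactly one row.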

Because we already have Volumes, we’re also in a better place for adding Serials.

Changing an end-user interface has fewer widespread repercussions than changing your database schema. That’s why it’s important to have a good database design from the beginning: it gives your interfaces more options. To steal a sentiment from Mr. Miyagi, Evergreen has strong root.

— Jason