Throwing down the gauntlet


  • Alternative OPACs
  • Personal Card Catalogs
  • Bookbags
  • Record Tagging
  • Reading Lists
  • User-built RSS feeds of reading lists and searches
  • Interactive searches
  • Saved searches
  • Search history
  • … tons of other things we’ve collectively imagined …

What do they conceptually have in common? Ideally, they could all be built on top of a platform that publishes a rich and easily understood API that exposes our catalogs. An API that also allows us to hook into circulation information and the business logic that drives that circulation. A secure API with a rich and flexible permission system that can restrict access to sensitive data. And, perhaps most importantly, an extensible API that allows administrators and developers to extend the depth, breadth and quality of services provided by the entire system.

What do the prototypes of these things (that some very smart and creative people have put together) actually have in common today? Well, not that. What they have in common today is blood, sweat and tears. Chewing gum and baling wire. They are mostly, and unfortunately, research projects and personal experiments built on the shifting sands of first-generation XML layers on top of third-generation opaque catalogs. I say this not to denigrate the value of these creations, or indeed the need for them in order to keep libraries relevant in a “post-Google” world, but simply to state the truth. All of these enhancements are subject to the inevitable API “enhancement” of the ILS by the vendor, which will break these hard-won features, or to absorption and generification into the vendor’s product, which will require costly and time-consuming “upgrades” and will ultimately remove locally developed features.

Now, I know vendors are made of people too, and we need to work with them to get things done. And we can’t all build everything we need on our own, or even hire programmers to customize what we have. But I’ll bet it struck a chord with quite a few of you out there. So, apologies to any vendors whose feelings I may have hurt, and to any pioneers bleeding, sweating and crying to improve their patrons’ experience any way they can.

So what is this really about? It’s about “ILS as a Platform” being more than a Library 2.0 dream. It’s about a simple, scalable framework that allows developers (and coder-librarians) to extend services quickly, deeply, and in the way they feel most comfortable. It’s about, well, OpenSRF.

We’ve mentioned OpenSRF before, and we’re pretty proud of it. It has made creating complex, layered services very easy, and its built-in load balancing and failover mean that scalability is achieved by tossing inexpensive commodity servers at a cluster instead of replacing one huge server with an even beefier one. And best of all, because the SRF in OpenSRF stands for Service Request Framework, OpenILS/Evergreen is, by definition, an ILS Platform. At its core, OpenILS/Evergreen is just a set of cooperating OpenSRF Applications that happen to implement the functions needed by a public library consortium the size and structure of PINES.

I could go on gushing for hours about how cool OpenSRF is, and how, because of its support of (and in fact insistence on) decoupled, simple, layered services, it has the power to make the world a better place. And I will, if you see me in person and ask, “So, what’s with this OpenSRF thing?” But I won’t — for now. Here’s what I am going to do: to prove how simple it is to create new OpenSRF apps to expose or create functionality for OpenILS/Evergreen, I’m going to build one here.

A little OpenILS/Evergreen background
OpenILS is, well, an ILS. It deals with MARC records and transactions against the items those records represent, and it manages who can be involved in those transactions, and how. The catalog portion of the database is composed mainly of three tables: biblio.record_entry, which contains the MARC records in MARCXML format; asset.call_number, which manages (you guessed it) call numbers and tracks the owner of items attached to each call number; and asset.copy, which holds item barcodes and item-specific information. There are many surrounding tables that supply statuses, locations, notes, locally defined categorical data, searchable metadata extracted from the MARC, and other useful bits, but those three are the core of the catalog.
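
To make that structure concrete, here’s a rough sketch of walking those three tables for a single record. To be clear, this is illustration rather than code from the project: the connection settings and the columns I’m joining on (record, call_number, label, barcode) are my assumptions about the schema, so check them against an actual installation before trusting them.

    import psycopg2

    # Connection settings are hypothetical; adjust for your installation.
    conn = psycopg2.connect("dbname=evergreen user=evergreen")

    def copies_for_record(record_id):
        """Walk record -> call numbers -> copies for one bib record.

        The join columns and the label/barcode fields are assumed for
        illustration; verify them against the real schema.
        """
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT bre.id, acn.label, acp.barcode
                  FROM biblio.record_entry bre
                  JOIN asset.call_number acn ON acn.record = bre.id
                  JOIN asset.copy acp ON acp.call_number = acn.id
                 WHERE bre.id = %s
                """,
                (record_id,),
            )
            return cur.fetchall()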

We store records in MARC because that’s what libraries use, but it also happens to be the most fine-grained metadata format around. That means that we should be able to, in principle, crosswalk nearly any metadata format to MARC and back and get out the original, making MARC an essentially lossless intermediate format. Which is good, because that’s what the project is going to be about.

All of our top-level search results for anything that is not unique to a specific bibliographic record, such as title, author and subject, come back as metarecords. These are similar to, but not quite the same as, FRBR work sets. We go as far as including movie adaptations of works and the like in a metarecord. Our grouping fingerprint is also different from traditional MARC-to-FRBR mapping fingerprints. We use a MARC ‘Type of Record’-specific algorithm to choose an appropriate title, extract the first word of the first defined author name field, and then normalize and combine those to create a fingerprint. We have found this to be much more reliable given the state and “completeness” of a great deal of our data.
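
Just to make the idea concrete, here’s a deliberately naive sketch of that fingerprinting step. The real algorithm chooses the title field based on the MARC ‘Type of Record’ and handles plenty of edge cases this toy version ignores; the normalization below is my own stand-in, there only to show the general shape.

    import re

    def normalize(text):
        """Lowercase and strip punctuation and extra whitespace
        (a naive stand-in for the real normalization)."""
        text = re.sub(r"[^\w\s]", "", text.lower())
        return re.sub(r"\s+", " ", text).strip()

    def fingerprint(title, author):
        """Combine the first word of the first author name field with a
        normalized title, roughly as described above."""
        author_words = normalize(author).split()
        first_author_word = author_words[0] if author_words else ""
        return first_author_word + "\t" + normalize(title)

    # fingerprint("The Lord of the Rings", "Tolkien, J. R. R.")
    #   -> "tolkien\tthe lord of the rings"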

What’s the plan, Stan?
The purpose of this server is to expose our bibliographic catalog in multiple admin-definable, user-chosen formats for remote consumption or local extension. This will also end up serving as the basis for the second generation of our WoRM server, which (in addition to needing a new, less scary name) is the service that maps individual records into metarecords. It will also be able to return fully constructed metarecords, in MODS format, for use by the open-ils.search OpenSRF application, which is the primary service used by our OPAC and is responsible for turning the basic records from the catalog into something both useful and efficient for end-user display.

Here’s the plan:

  1. Create a new OpenSRF Application to implement our new API: open-ils.supercat
  2. Create a basic retrieval service. The current bare-metal open-ils.storage server provides much more information than is needed for most record display purposes. It will still exist and be accessible to backend applications, but this new service will allow clients to do what they want with the bibliographic data without having to know anything more than is absolutely necessary to retrieve the records. The purpose of this new retrieval service is to return just the requested (meta)records, in XML and in the requested format, with nothing else to sift through. This should consist of two OpenSRF methods:
    • open-ils.supercat.record.marcxml.retrieve
    • open-ils.supercat.metarecord.mods.retrieve

    These methods will take an OpenILS/Evergreen record or metarecord ID (respectively) as their only parameter, and will be built on top of the current open-ils.storage application.

  3. Create a crosswalking service. This will use the retrieval methods described in (2), along with XSLT crosswalk entries in the OpenSRF Settings Server, to return records and metarecords in any format it understands, including the native MARCXML format described above. There will also be methods for listing the available record and metarecord formats. This layer’s API will initially consist of:
    • open-ils.supercat.record.[format].retrieve
    • open-ils.supercat.metarecord.[format].retrieve
    • open-ils.supercat.formats.record.list
    • open-ils.supercat.formats.metarecord.list

    The first alternate formats I will add support for are records in Dublin Core and MODS, and both records and metarecords as RSS 2.0 items. (See the sketch just after this list for the general shape of the crosswalk step.)
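
For the curious, here’s a back-of-the-napkin sketch of what the crosswalk step behind open-ils.supercat.record.[format].retrieve might look like once the MARCXML is in hand: an XSLT transform keyed by format name. The stylesheet file names and the hard-coded format registry are stand-ins I’ve made up for illustration; in the real service those entries would come from the OpenSRF Settings Server, and the OpenSRF method registration itself isn’t shown.

    from lxml import etree

    # Hypothetical format registry; in the real service these entries
    # would be read from the OpenSRF Settings Server, not hard-coded.
    CROSSWALKS = {
        "mods": etree.XSLT(etree.parse("MARC21slim2MODS.xsl")),
        "dc": etree.XSLT(etree.parse("MARC21slim2OAIDC.xsl")),
    }

    def record_as(marcxml, fmt):
        """Return one record in the requested format.

        'marcxml' is the MARCXML string fetched by the retrieval layer
        (open-ils.supercat.record.marcxml.retrieve); 'fmt' is one of the
        formats that open-ils.supercat.formats.record.list would report.
        """
        if fmt == "marcxml":
            return marcxml  # native format, no transform needed
        doc = etree.fromstring(marcxml.encode("utf-8"))
        return str(CROSSWALKS[fmt](doc))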

Does doing something like this require some investment of time? Sure. Can you create new core services for any other existing ILS without learning all of its internals or a proprietary record format? Nope. Can you be assured that the underlying API you’ve developed against won’t simply go away without warning in a future version? With OpenILS/Evergreen, because it is open source and you will always have access to both the current code and the developers (of which you can be one), it’s a much safer bet.

Down the road, I plan to create a set of methods for importing, updating and deleting records through this OpenSRF application, though that will require integration with the existing OpenILS Auth application as well as permission management. However, I don’t believe that those services will be any more difficult or heavy than the services I’ve described here.

None of the code for this exists yet, and I will do my best to keep our blog updated with my progress. The next month is going to be pretty hectic around here while we clean up for the Beta release sometime around April, but the amount of code that the above API represents is fairly minimal, bordering on trivial, so I expect to finish this up by the beginning of March. If I’m feeling saucy (and home-time permits), it will only take a weekend.

Please feel free to comment here if you have any questions. I do take a lot of what I’ve explained for granted, so just ask if there’s something you would like clarified — or just join our development mailing list and talk to me (or any of us) directly.