If you had asked me three weeks ago how one might go about creating a federated search interface that could intelligently combine the results from academic and public libraries, I would have shrugged. If you had then asked how one might go about integrating search results from other sources, such as online journal indexes, weblogs, Amazon.com, and Google, I might have laughed. But if you had waited another week and asked, I would have pointed you to this.
Joshua Ferraro (of LibLime and Koha) and I have been talking on and off in the #code4lib IRC channel over the last few months about ways that our respective Open Source ILSs could interoperate. One of the first and most obvious areas we wanted to target was searching. Specifically, we wanted to know how two different ILSs could provide a common search interface so that the results from both could be used at the same time. We saw this as a way not only to give patrons of each library access to more resources, but also to strengthen both of our projects by learning from each other’s design decisions.
The discussion had come up perhaps four or five times and we had tossed some general ideas around, but we hadn’t really started down the road to creating any working prototypes. Then, about two weeks ago, Ross Singer from Georgia Tech jumped in. One of us said something about Amazon.com’s A9 search engine and the underlying technology, OpenSearch. Ross mentioned that he had created an OpenSearch interface for the GIL Universal Catalog and said that it only took him about 30 minutes of work once he got his head around OpenSearch. Well, Joshua and I, as public library developers, couldn’t let ourselves get shown up by some academic library developer! (Just kidding, Ross ;-)) Long story short, by the next morning Joshua and I both had our respective catalogs set up as search targets at A9.
But I wanted more. Not satisfied with simply allowing anyone to find resources in the PINES catalog using a standard interface, I set out to define a way for any OpenSearch source to integrate its results with those of any other OpenSearch source.
A9’s implementation of OpenSearch is great for displaying multiple sources side by side, but you can’t integrate the results. The problem is that there isn’t a good way to decide the relative order of the results from multiple sources. To intermingle the result items, each result list needs to carry some normalized rank. So I decided to do to OpenSearch what OpenSearch did to RSS 2.0: extend it. I added an extension that allows a source to assign each item in the result set a rank representing its relevance to the search terms, on a scale of 0 to 100, with 100 being the best. Because this percent relevance is measured against the user’s search terms, and every OpenSearch source receives the same search terms, the ranks from all sources that support this extension become roughly comparable with one another. The portal then sorts the results as they come in, and the items naturally fall into the correct order.
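The extension’s actual element name and namespace aren’t given here, so the sketch below assumes a placeholder `rank` element in a made-up namespace (`RANK_NS`). It shows the portal side only: pull the rank out of each RSS 2.0 item, then insert every result into a single list that stays sorted by rank as each source’s response arrives. `Result`, `parse_ranked_rss` and `MergedResults` are hypothetical names, not part of OpenSearch or of the PINES code.

```python
import bisect
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Placeholder namespace for the rank extension (hypothetical; the real
# extension's namespace URI and element name are not given in the post).
RANK_NS = "http://example.org/hypothetical-opensearch-rank"


@dataclass
class Result:
    source: str
    title: str
    rank: int  # 0-100, where 100 means most relevant


def parse_ranked_rss(source: str, xml_text: str) -> list[Result]:
    """Extract the title and the (hypothetical) rank element from each RSS 2.0 item."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        rank_text = item.findtext(f"{{{RANK_NS}}}rank")
        if rank_text is None:
            continue  # unranked items cannot be interleaved with ranked ones
        rank = max(0, min(100, int(rank_text)))  # clamp to the 0-100 scale
        results.append(Result(source, title, rank))
    return results


class MergedResults:
    """Holds results from every source in one list ordered best-first by rank."""

    def __init__(self) -> None:
        self._entries = []
        # Tie-breaker: equal ranks keep arrival order, and Result objects
        # are never compared to each other.
        self._counter = itertools.count()

    def add(self, results: list[Result]) -> None:
        # Insert as each source responds; the list stays sorted throughout.
        for r in results:
            bisect.insort(self._entries, (-r.rank, next(self._counter), r))

    def items(self) -> list[Result]:
        return [r for _, _, r in self._entries]


# Usage with two made-up responses standing in for two catalogs.
pines_xml = """<rss version="2.0" xmlns:r="http://example.org/hypothetical-opensearch-rank"><channel>
<item><title>A</title><r:rank>80</r:rank></item>
<item><title>B</title><r:rank>40</r:rank></item>
</channel></rss>"""
gil_xml = """<rss version="2.0" xmlns:r="http://example.org/hypothetical-opensearch-rank"><channel>
<item><title>C</title><r:rank>95</r:rank></item>
<item><title>D</title><r:rank>60</r:rank></item>
</channel></rss>"""

merged = MergedResults()
merged.add(parse_ranked_rss("PINES", pines_xml))
merged.add(parse_ranked_rss("GIL", gil_xml))
print([(r.source, r.title, r.rank) for r in merged.items()])
```

Negating the rank lets an ordinary ascending sorted list put the most relevant items first, and because insertion happens per response, a slow source simply slots its items into place when it finally answers.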
I got this extension working for the PINES Evergreen demo and then asked Ross and Joshua to add support for ranking to their OpenSearch interfaces. Ross’s source was a particularly natural target for this proof-of-concept portal, since there has been talk of creating a federated search interface that covers both academic and public libraries in Georgia. He had no problem getting ranking working, though his catalog doesn’t support true result ranking at the moment. We decided that simply assigning each result a rank between 0 and 100 would be a good start, and it turns out that this is not a bad way to fake ranking on sources that cannot naturally provide it. Joshua is working on getting ranking information working for his catalog as well.
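Exactly how the fake ranks were assigned isn’t spelled out above, so this is only one plausible approach: assuming the catalog already returns results in a sensible order, derive each rank from its position, spacing the values evenly from 100 down to 0. `positional_ranks` is a hypothetical helper.

```python
def positional_ranks(n: int) -> list[int]:
    """Hypothetical fallback for sources with no real relevance score:
    rank 100 for the first result down to 0 for the last, evenly spaced."""
    if n <= 0:
        return []
    if n == 1:
        return [100]
    return [round(100 * (n - 1 - i) / (n - 1)) for i in range(n)]


# Usage: attach faked ranks to an already-ordered result list.
titles = ["first hit", "second hit", "third hit", "fourth hit", "fifth hit"]
for title, rank in zip(titles, positional_ranks(len(titles))):
    print(rank, title)
```

The trade-off is that the last item from every faked source lands at 0 no matter how relevant it really is, so faked ranks interleave plausibly, but not precisely, with true ones.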
So, that’s where we are right now. Joshua had no trouble getting a version of the portal going as well, and all of this took about three days. To me, the amazing thing about all of this is that the demo portal at gapines.org shows just the very tip of the iceberg.
We’ve started talking about integrating automatic ILL by putting a link in each of the result items that starts an NCIP transaction.
Another issue we’ve been working on is what it would mean to integrate Web search results into the ILS search results. Or, for that matter, what it would mean to integrate online journal search results, which are generally unavailable unless one is logged into a service with a subscription to the target journals. For now the best we can do is defer to established methods and simply add an extra results column for each unranked source, the way that A9 does. In the future we’ll be working on a way to integrate results from different types of sources intelligently. One idea I’ve been mulling over is to have each source define a service level and allow each portal to group results from similar service levels together.
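Since the service-level idea is still only being mulled over, here is a rough sketch under assumed names: each source carries a hypothetical `service_level` label (the values below are invented), ranked results are merged within each level, and unranked sources fall back to a column of their own, as A9 displays them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SourceResults:
    source: str
    service_level: str  # hypothetical label, e.g. "library", "subscription", "web"
    ranked: bool        # does this source support the rank extension?
    items: list         # list of (title, rank) pairs; rank is None if unranked


def layout(all_results: list[SourceResults]):
    """Return (groups, columns): ranked items merged best-first within each
    service level, plus one separate column per unranked source."""
    groups = defaultdict(list)
    columns = {}
    for sr in all_results:
        if sr.ranked:
            for title, rank in sr.items:
                groups[sr.service_level].append((rank, sr.source, title))
        else:
            # No rank to compare on, so keep the source's own ordering.
            columns[sr.source] = [title for title, _ in sr.items]
    for level in groups:
        # Stable sort: equal ranks keep the order the sources arrived in.
        groups[level].sort(key=lambda entry: entry[0], reverse=True)
    return dict(groups), columns


results = [
    SourceResults("PINES", "library", True, [("A", 80), ("B", 40)]),
    SourceResults("GIL", "library", True, [("C", 95)]),
    SourceResults("Journals", "subscription", True, [("J1", 70)]),
    SourceResults("Web", "web", False, [("W1", None), ("W2", None)]),
]
groups, columns = layout(results)
print(groups)
print(columns)
```

Grouping first means a highly ranked subscription-only article never pushes a book the patron can actually borrow further down the same list; the portal decides how to present each group.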
The other major issue that isn’t handled in the current demo portal is targeted searching: searching by title, author or subject instead of simply by keyword. Ross gives a good overview of the problem, and the beginnings of a solution, in a recent blog entry. Building on his initial ideas, each source that supports targeted searches could provide information about how to ask questions in the right format. For example, Ross’s OpenSearch source supports CQL, and it should tell the portal as much in its OpenSearch Description Document. The portal would then be responsible for providing an advanced or targeted search interface to the user, and for sending the search request to each source in the correct format. Ross’s source would get a CQL query, and Amazon.com, which does not support CQL, would get a standard keyword search string.
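A sketch of how a portal might turn one targeted search into per-source requests. The OpenSearch 1.1 URL template and its `{searchTerms}` parameter are real; how a source advertises CQL support in its description document is still an open question, so `supports_cql` is a hypothetical flag. The field-to-index mapping (using the CQL Dublin Core indexes `dc.title`, `dc.creator` and `dc.subject`) and the example.com/example.edu templates are placeholders.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class Source:
    name: str
    template: str       # OpenSearch URL template containing {searchTerms}
    supports_cql: bool  # hypothetical: would be read from the description document


# Hypothetical mapping from the portal's search fields to CQL indexes.
FIELD_TO_CQL_INDEX = {"title": "dc.title", "author": "dc.creator", "subject": "dc.subject"}


def to_cql(fields: dict) -> str:
    """Build a CQL query such as: dc.title = "moby dick" and dc.creator = "melville"."""
    clauses = []
    for field, value in fields.items():
        # CQL quoted terms escape backslashes and double quotes with a backslash.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{FIELD_TO_CQL_INDEX[field]} = "{escaped}"')
    return " and ".join(clauses)


def to_keywords(fields: dict) -> str:
    """Fallback for sources without CQL: a plain keyword string."""
    return " ".join(fields.values())


def build_url(source: Source, fields: dict) -> str:
    query = to_cql(fields) if source.supports_cql else to_keywords(fields)
    # Percent-encode everything, then substitute into the template.
    return source.template.replace("{searchTerms}", quote(query, safe=""))


gil = Source("GIL", "http://catalog.example.edu/opensearch?q={searchTerms}", True)
amazon = Source("Amazon", "http://search.example.com/?k={searchTerms}", False)
query = {"title": "moby dick", "author": "melville"}
print(build_url(gil, query))
print(build_url(amazon, query))
```

Optional template parameters such as `{count?}` or `{startIndex?}` aren’t handled here; a fuller portal would fill in or strip those as well.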
Of course, all of this is something for the future, not to mention a bit outside the scope of the Evergreen project proper. Even so, it’s only taken us about a week to get this far. With some luck we’ll be able to find time for some of these new questions in the near future. Because of this experiment I am certain of one thing, though: there is no good reason that libraries, and especially ILS vendors, can’t play nicely with each other and with the wider Web. It is in our patrons’ best interest, and more to the point, it’s what they expect from us. I’m just glad to have found my way to a community that feels the same.