This is the first of what I hope will be informative and helpful progress reports for the Evergreen software development project. In this report I plan on giving a broad, project-wide overview of events, decisions, issues, and milestones that have occurred recently within the project. I also hope this report elicits responses and comments from you, the reader. My goal is to compile this sort of report at least monthly. The official target audience for this report is the PINES Executive Committee, but, of course, I welcome anyone to read and comment. I should also mention that since this is my first report, and I have to cover multiple months, this will most likely be one of my larger reports.
Assembling The Development Team
The first task we focused on was assembling the people who would form the core of the software development team. We currently have four staff members working on the development project as their primary duty:
- Brad LaJeunesse, PINES System Administrator, will be providing system administration and project management. Brad is a librarian, and has extensive knowledge of library data, practices, and policies.
- Jason Etheridge, PINES System Support Specialist, will primarily develop the staff client and online public access catalog (OPAC). Jason will also assist in the modeling and the development of the business logic for the application. Jason has a wealth of experience with library data.
- Bill Erickson, PINES Systems Developer, will lead the development effort of the application business logic and middleware, including data modeling, data transfer, and general application logic.
- Mike Rylander, PINES Database Developer, will be primarily responsible for the development of the storage and persistence model, and will assist in overall application and data model design.
(Note: The above was copied from this FAQ entry.)
From a personal perspective, I have to mention that the development team members started working together like a well-oiled machine almost from the very beginning.
Teaching MARC to a Programmer
One of our first tasks was to acclimate the new programmers to the PINES project and to library data and practices. We went to a PINES library, and the programmers learned about library operations and needs. We also talked in depth about library-specific data issues: namely, MARC.
Mike spent a significant amount of time in his Bibliographic Data Storage and Access blog entry talking about MARC and the issues associated with storing that data in a modern relational database. I don't want to repeat everything he's already said, but there are a couple of important points I'd like to touch on.
MARC is a really old standard. It's really, really old when we look at it in the context of how quickly technology changes. The MARC format was written in a different time, in a different environment, with different issues, needs, and available technology. MARC also has a long history and has changed somewhat over time, which can make it difficult to work with (and to store in a modern relational database).
As a database person, Mike's first instinct was to break a MARC record up into small pieces and store each piece separately in the database. If a user needed to get that MARC record back out of the database, the software would retrieve each piece and reconstruct the record. This approach makes perfect sense to any programmer who works with relational databases, and it's how any database guru worth their salt would attack this type of problem.
The problem is that MARC has lots of nasty little gotchas. For example, any cataloger will tell you that there are fields that depend on other fields: the data in the 008 depends on what's in the leader, and the leader itself contains a field recording the total length of the record. In other words, a good amount of logic and rules would have to be built into storing a MARC record in this fashion, and we weren't sure the benefits would justify the effort.
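To make that leader gotcha concrete, here's a tiny Python sketch (our own illustration, not code from the project). In the raw MARC transmission format, the first 24 bytes are the leader, and positions 00-04 of the leader hold the record's own total length, so any edit to the record means the leader has to be recalculated:

```python
# One MARC "gotcha" illustrated: the leader (the first 24 bytes of a raw
# MARC record) stores the record's total length in positions 00-04, so
# any change to the record forces the leader to be rewritten. (Real MARC
# tools also recompute the base address in positions 12-16; we skip that
# here to keep the sketch short.)

def fix_leader_length(raw: bytes) -> bytes:
    """Recompute the leader's length field after the record has changed."""
    length = str(len(raw)).zfill(5).encode("ascii")  # zero-padded, e.g. b'00714'
    return length + raw[5:]

# Toy record: a standard-looking 24-byte leader plus a fake record body.
leader = b"00000nam a2200000 a 4500"   # length field not yet filled in
body = b"...directory and fields would go here..."
record = fix_leader_length(leader + body)

print(record[:5])   # b'00064' -- the leader now reflects the true length
```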
After thinking about this, we decided on a different path: store each MARC record whole, in a single database table. We would then extract and copy the important information from that MARC record (the fields we want to search on) into additional tables in the database. For example, we would extract and store the title, author, ISBN, title control number, subject headings, and any other important data in their own separate tables, while leaving all of that data intact in the MARC record itself.
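Here's a minimal sketch of that storage scheme using Python's built-in sqlite3. The table and column names are invented for illustration, and the title and subjects are passed in by hand rather than parsed out of the record:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The whole MARC record, stored intact in one table.
    CREATE TABLE biblio_record (
        id   INTEGER PRIMARY KEY,
        marc BLOB NOT NULL
    );
    -- Copies of the searchable fields, extracted from the record.
    CREATE TABLE title_index (
        record_id INTEGER REFERENCES biblio_record(id),
        title     TEXT NOT NULL
    );
    CREATE TABLE subject_index (
        record_id INTEGER REFERENCES biblio_record(id),
        heading   TEXT NOT NULL
    );
""")

def ingest(raw_marc: bytes, title: str, subjects: list[str]) -> int:
    """Store the record whole, then copy the indexed fields out of it."""
    with conn:  # one transaction: the record and its index rows stay in sync
        cur = conn.execute("INSERT INTO biblio_record (marc) VALUES (?)",
                           (raw_marc,))
        rid = cur.lastrowid
        conn.execute("INSERT INTO title_index VALUES (?, ?)", (rid, title))
        conn.executemany("INSERT INTO subject_index VALUES (?, ?)",
                         [(rid, s) for s in subjects])
    return rid

# In real life the title and subjects would be parsed from the MARC record
# itself (e.g. the 245 and 6xx tags); here they're simply passed in.
rid = ingest(b"...raw MARC bytes...", "Example Title",
             ["Libraries", "Cataloging"])
```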
As you can see, the negative side to this method is that we store some data twice: once in the MARC record itself, and a second time if it is an important (indexed) field. We also have to do two database writes for certain single changes. For example, if you changed a title's subject heading, the software would have to first update the MARC record itself, then update the separately stored subject heading field.
However, the positive side to this method is that we don't have to code all of the MARC logic and rules into our database; we just use existing MARC tools to edit the single stored MARC record, then make any appropriate changes to the separately stored data.
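Continuing the sketch above (it reuses `conn` and `rid`), the double write for a subject-heading change looks roughly like this. The `rewrite_marc_subject` function is a hypothetical stand-in for whatever real MARC tool performs the edit on the record itself:

```python
def rewrite_marc_subject(raw: bytes, old: bytes, new: bytes) -> bytes:
    """Placeholder for a real MARC tool that edits the record itself."""
    return raw.replace(old, new)

def change_subject(record_id: int, old: str, new: str) -> None:
    with conn:  # one transaction: both writes succeed or neither does
        raw = conn.execute("SELECT marc FROM biblio_record WHERE id = ?",
                           (record_id,)).fetchone()[0]
        # Write 1: update the MARC record itself.
        conn.execute("UPDATE biblio_record SET marc = ? WHERE id = ?",
                     (rewrite_marc_subject(raw, old.encode(), new.encode()),
                      record_id))
        # Write 2: update the separately stored copy of the heading.
        conn.execute("UPDATE subject_index SET heading = ? "
                     "WHERE record_id = ? AND heading = ?",
                     (new, record_id, old))

change_subject(rid, "Cataloging", "Cataloguing")
```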
Communication
The robustness of the communication layer of the software is of great importance to the success of the project. If the staff client can’t effectively tell the server-side software what to do, then the software is pretty worthless. We also had some rather specific needs in the communication layer.
- First, as any PINES person knows, communication from the libraries to the central PINES server occurs via an internet connection. This connection isn’t always stable, so the communications layer of the software needs to be robust enough to gracefully handle networking problems.
- Second, the central PINES staff needed a way to broadcast important messages to the PINES libraries in an efficient manner. We’ve found in PINES that email is not always the best means to get an important message to many people at once. We certainly can’t phone or fax 250 locations. So, the software needs to have an integrated “public announcement system” of sorts.
- Third, the communications software needed to be able to handle a high rate of traffic back and forth between the staff client and the server-side software.
Taking all of these needs into consideration, we selected a software package called Jabber to fill the communications slot. Bill wrote an excellent blog entry, Messaging, on Jabber and how we’re working with it.
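For the curious, a Jabber (XMPP) message on the wire is just a small chunk of XML. Here's a rough Python sketch of what a broadcast-style stanza might look like; the addresses are made up, and it's worth noting that XMPP even defines a "headline" message type set aside for one-way announcements and alerts, which maps nicely onto our "public announcement system" need:

```python
import xml.etree.ElementTree as ET

# Build a minimal XMPP message stanza by hand. The JIDs (Jabber IDs)
# below are invented for illustration.
msg = ET.Element("message", {
    "from": "announcements@pines.example.org",
    "to": "circulation@branch.example.org",
    "type": "headline",  # XMPP's stanza type for announcements/alerts
})
ET.SubElement(msg, "subject").text = "Scheduled maintenance"
ET.SubElement(msg, "body").text = "The catalog will be down tonight from 1-2 AM."

print(ET.tostring(msg, encoding="unicode"))
# <message from="announcements@pines.example.org" ...><subject>...</subject>...
```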
There is also a diagram of the flow of a message through the communication systems… but the explanation of that will probably require a whole other blog entry.
Staff Client
The last issue I'm going to cover in this report is the staff client. As with the other components, we had specific needs. We wanted a technology that was platform independent and could run on a PC or a Mac. We liked the idea of being able to roll out (or roll back!) incremental updates and small bug fixes immediately to all connected clients. On another level, we wanted a client that was aesthetically pleasing.
With those needs in mind, we selected XUL. XUL, which stands for XML User Interface Language, is part of the Mozilla project. You may have heard of Firefox, an up-and-coming web browser whose interface is written in XUL.
Hidden important announcement: We will have screenshots of a staff client posted on this website before we break for the Thanksgiving holiday.
If you are interested in seeing what a client written in XUL may look like, first download the Firefox web browser. After you have installed Firefox, run it and take a look at MAB (the Mozilla Amazon Browser). This interface connects to Amazon in much the same way that we'd connect to a library database. Note that it's a fully functional client, but it's also a web page.
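And if you're curious what XUL markup itself looks like, here's a tiny hand-rolled example; it's not from our client, just a taste of the language:

```xml
<?xml version="1.0"?>
<!-- A trivial XUL window: a label, a text box, and a button. -->
<window title="Hello, PINES"
        xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
  <hbox align="center">
    <label value="Patron barcode:"/>
    <textbox id="barcode"/>
    <button label="Look up"
            oncommand="alert(document.getElementById('barcode').value);"/>
  </hbox>
</window>
```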
Looking Into the Future
We're coming to the end of the major development work on the communication layer. We are also getting comfortable with our data modeling and storage schemes, so there will be less and less flux on the database side of things. In other words, the basic underlying infrastructure is largely in place at this point.
From here, we will start implementing what most people would consider actual useful functionality. For example, we may create an "edit MARC" screen on the client side, and then work through each function and each piece of data that screen needs from the other layers: making sure it can get to the correct MARC record, that it can save the MARC record, that MARC standards are enforced, and so on. This means that more visible and more accessible evidence of progress will start appearing more often.
If you have any questions or comments, please let us know.
–Brad