For the Good of the Many


Last time I wrote about our UTF-8 escaping machinery. It’s not that involved, but it’s critical to get Right(tm) for proper Unicode support. It also took me far too much time to complete, considering that nearly every modern data-handling program needs to do it. Let’s all hope that someone can benefit from my travails, my dark journey into the Mordor that is the Unicode and UTF-8 RFCs.

With that behind me, and nothing but green pastures of holiday vacation time ahead, I ambled over to the #code4lib IRC channel to see what was up. Well, lo and behold, someone was heaving mightily at a pile of MARC that was claiming to be about books but was most assuredly describing some conference proceedings. This got me thinking about the code we have buried in Evergreen for investigating these sorts of arcana encoded in library data. And it is with that introduction, and a hope for some extra positive karma, that I present to you, dear reader, a pair of files that I hope will be of use in one endeavor or another, whether you need to know the percent cloud cover recorded for your remote sensing image or you just want to know if you have any Earth moon globes made of skins (yes, skins).

Both of these files are JavaScript, since they are used inside the Evergreen staff client and on the server side for implementing the circulation matrix, but they are (nearly) JSON/YAML, so they should be very easy to port to just about any programming language.

  • fixed_fields.js: This file encodes the positions of, and sane defaults for, the MARC21 fixed field elements found in the leader, 008 and 006 fields. It is derived from the OCLC MARC documentation, and is ready for use in your program today. (~12h of straight data entry)
  • phys_char.js: Herein lies a complete decoding of the 007 field. This field describes the physical characteristics of standard media, though some of the standards seem to date back to approximately just before the dawn of recorded history. It too is based on some docs (well, one big HTML table) from OCLC. (~2d of mind-numbing cut-and-paste data entry)
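
To give a feel for how position tables like these get used, here is a minimal sketch. Everything in it is made up for illustration: the `FIXED_FIELDS` and `PHYS_CHAR` shapes, the labels, the fallback values, and the helper names `extractFixedField` and `describe007` are my assumptions, not the actual layout of fixed_fields.js or phys_char.js. The positions and code values follow MARC 21 as I understand it (leader/06 for type of record, 008/29 for the conference flag in the books layout, 007/04 for a globe's physical medium), so check them against the real files before trusting any of it.

```javascript
// Hypothetical table shapes for illustration only; the real fixed_fields.js
// and phys_char.js may be organized quite differently.
const FIXED_FIELDS = {
  Type:  { field: 'ldr', start: 6,  length: 1, fallback: 'a' },
  BLvl:  { field: 'ldr', start: 7,  length: 1, fallback: 'm' },
  Date1: { field: '008', start: 7,  length: 4, fallback: '    ' },
  Conf:  { field: '008', start: 29, length: 1, fallback: '0' }, // books layout of the 008
  Lang:  { field: '008', start: 35, length: 3, fallback: 'eng' },
};

// Slice one named element out of a record's fixed fields, using the table's
// fallback when the field is missing or too short.
function extractFixedField(record, label) {
  const spec = FIXED_FIELDS[label];
  if (!spec) throw new Error('Unknown fixed field: ' + label);
  const data = record[spec.field] || '';
  const value = data.slice(spec.start, spec.start + spec.length);
  return value.length === spec.length ? value : spec.fallback;
}

// The 007 layout depends on the category of material in position 00.
// Only a few globe codes are listed here (assumed values, partial on purpose).
const PHYS_CHAR = {
  d: {
    label: 'Globe',
    subfields: {
      smd:    { start: 1, values: { c: 'Terrestrial globe', e: 'Earth moon globe' } },
      medium: { start: 4, values: { a: 'Paper', b: 'Wood', f: 'Skin' } },
    },
  },
};

// Decode an 007 value into labels; unknown codes are passed through as-is,
// and unknown categories yield null.
function describe007(value) {
  const layout = PHYS_CHAR[value.charAt(0)];
  if (!layout) return null;
  const out = { category: layout.label };
  for (const [name, spec] of Object.entries(layout.subfields)) {
    const code = value.charAt(spec.start);
    out[name] = spec.values[code] || code;
  }
  return out;
}

// A made-up record: the 008 is assembled piece by piece so the offsets are
// easy to verify (40 characters in total).
const sampleRecord = {
  ldr: '00000nam a2200000 a 4500',
  '008': [
    '051220', // 00-05 date entered on file
    's',      // 06    type of date
    '2005',   // 07-10 date 1
    '    ',   // 11-14 date 2
    'xxu',    // 15-17 place of publication
    '    ',   // 18-21 illustrations
    '  ',     // 22-23 target audience, form of item
    '    ',   // 24-27 nature of contents
    ' ',      // 28    government publication
    '1',      // 29    conference publication
    '00',     // 30-31 festschrift, index
    ' ',      // 32    undefined
    '0',      // 33    literary form
    ' ',      // 34    biography
    'eng',    // 35-37 language
    ' d',     // 38-39 modified record, cataloging source
  ].join(''),
};

console.log(extractFixedField(sampleRecord, 'Conf'));
console.log(describe007('de cfn'));
```

With a table like this, spotting a "book" that is really a set of conference proceedings comes down to checking whether `Conf` comes back as `'1'`, and the skin globes fall out of the `medium` lookup.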

So there you have it. Let’s hope, again, that some poor soul out there can be spared the loss of many hours and IQ points transcribing this information again. Don’t say I never did nothin’ for ya.