I have alluded in several posts to the disparities one finds in looking at distributions of library data. By “distributions” I am talking here about making observations and generalizations when one looks at all the data from a set of libraries.

I am going to discuss a fact of library life we all know about, give you a few numbers, and discuss a few implications. I take circulation figures (TOTCIR as NCES calls it) for all public libraries in the United States for fiscal year 2005. These are the latest national-level data we have from NCES. I use the data as I recompiled them in a dataset that began when I was at the U.S. National Commission on Libraries and Information Science. I have continued to update that series since then. Exhaustive documentation exists on that site.

In that year in the dataset, there were 8,957 libraries and they circulated 2 billion items. Of those total circulations, 1.8 billion were reported for the highest quartile. That is, the 2,240 library systems that had the highest number of circulations accounted for 86% of all circulations. Those libraries in the first quartile, that is, those 2,240 with the lowest circulations had 13.9 million circulations or fewer than 1% of the total. This relationship—the big are awfully big and the small are really small—is an observable fact in most variables (staff, income, expenditures, holdings, and so on) and in every universal library dataset I have worked on. It is a characteristic of our national library system. The term in Statistics to describe this kind of distribution is “skew” and library distributions typically are skewed.
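
For readers who want to run this kind of quartile tally on their own data, here is a minimal Python sketch. The function name `quartile_shares` and the eight circulation figures are invented for illustration; they are not NCES data and this is not necessarily how the figures above were produced.

```python
def quartile_shares(values):
    """Return each quartile's share of the grand total, lowest quartile first."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    shares = []
    for q in range(4):
        # Integer boundaries split the sorted list into four near-equal groups.
        lo = (q * n) // 4
        hi = ((q + 1) * n) // 4
        shares.append(sum(ordered[lo:hi]) / total)
    return shares


# Hypothetical annual circulation counts for eight libraries (illustrative only).
circulation = [1_000, 2_000, 5_000, 8_000, 20_000, 40_000, 300_000, 900_000]

shares = quartile_shares(circulation)
for number, share in enumerate(shares, start=1):
    print(f"Quartile {number}: {share:.1%} of total circulation")
```

With a skewed list like this one, the highest quartile takes nearly all of the total and the lowest takes almost none, which is the same shape as the national figures.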

There are two kinds of implications, I think, to this characteristic. One deals with information policy and the second with the design of integrated library systems. Considering the information policy implications first, there are, it seems to me, three aspects that arise when we consider that library resources in this country are so disparate.

First is the effect of differential resources in a consensual democracy where an informed citizenry is a foundational element.

Second, what effects do differential resources have on the post-Enlightenment notion that was particularly important in the history of U.S. libraries: the library as the university of the common man? This question still is important because of the necessity of continuing education in an era with so many dynamic changes in our economy and where people need to retool for new kinds of jobs.

Third, given that the small libraries are so small, their staffs are also small. Andrea Neiman, of the Kent County Public Library, Chestertown, Maryland, reminded me of something important the other day when we were chatting about the implications of skewness: how little we know about the small libraries. Consider: the top quartile of public libraries employs 85% of all public library full-time-equivalent (FTE) staff while the bottom quartile employs fewer than 1%, numbers similar to what we saw with circulations, of course. The upper boundary of this bottom quartile is less than 1.2 FTEs; the rest are smaller. Thus, these libraries are unlikely to be adequately represented at conferences or in decision-making bodies.

The policy implications are for another place and time, of course, and we all know of the halting attempts to address this problem, that is, to break down the information silos faced by users of libraries.

A second implication of the skewness has to do with the design of library automation systems. They have handled the fact of the distribution of library resources awkwardly and that, in turn, follows from the unsystematic way these resources have been designed traditionally. I will leave it to my colleague Mike Rylander to discuss what he has explained to me about how ILSs were designed, but there was market segmentation: big ILSs and small ILSs, a result of early design decisions and the limited capabilities of the ILSs. The influence these design limitations had on information policy would be a fascinating subject to explore.

In any case, Evergreen is, currently, unique in that its design encompasses very big to pretty small libraries and its ability to handle diverse consortia is also unique—and valuable as I have discussed in several previous posts. The PINES experience indicates that good design can address the information policy aspect of skewed library distributions.

    So, not knowing if this makes sense from a statistical perspective – but don’t those larger libraries also have a much larger proportion of the population that they serve, so it follows that they would have a much larger percentage of the circulation? (???)

    Does this skew still exist when you look at circulation per capita? I am assuming that it still does, but I wonder if it is to the same degree of “skewness”.

    It seems that this would be a more – accurate is not the term that I am looking for, but I am stumped – precise(?) way to measure the impact of size (collection, staffing, budget, etc.) on circulation patterns and pick out those libraries that are anomalies to highlight possible libraries that are leaders in delivering materials to their patrons.

    Circs per capita is a fascinating statistic, I think. I have never done an analysis on its distribution but that is an easy enough thing to do so I will do it. In any case, I think this number may provide some practical value when combined with the kind of data we are seeing from PINES and will see from other consortia.

    But, the number is a ratio so if circs per capita is, say, 10, a library with a population served of 5,000 would have an annual circulation of 50,000 while one with a population served of 500,000 would have annual circulations of 5,000,000. Both circs and population served would go up proportionally.

    Circs per capita vary rather considerably across the country. There are several factors that seem to “explain” (in a statistical sense) circs per capita. One is that circs per capita go up with socioeconomic status (SES).

    There is also a regional factor. State public library figures show that states in the South, generally, have low circs per capita. They are highest in Ohio. I have speculated that the explanatory factor might be found in applying some of the observations in Professor David Hackett Fischer’s stunning Albion’s Seed about the persistence of culture and the migration patterns of settlers from England in the US to the study of libraries.

    And it would be more than circs per capita that indicate an interest in libraries. If you stand in the libraries at the Universities of Ohio, Michigan, Illinois, Minnesota, and Indiana University, it is striking. These same states have high circs per capita in their public libraries and these states have also invested in a series of great libraries. Think about Illinois. They built a big building and they knew it would grow so they put a parking lot around the building to provide room for expansion in the future. To quote Spengler from another context: “That is greatness, that is what it means to be a thoroughbred.”

    The Midwest was settled by different folks from those that settled the South but even with regional differences, there are local factors, as mentioned with the SES. For what it is worth, I did some analysis of Canadian public libraries and it indicated that their circs per capita are higher than in the U.S. I tell folks that and the common response is that it is the long winters in Canada. The winters are long in Maine, too.

    To do what you suggest in the third paragraph, an interesting bit of analysis is to do a regression using population served to predict circulations.

    If you take all the libraries in the US, there is some average circs per capita. Let’s say it’s 10, just to make it easy. What you would do next is plot each library’s difference from this average, and you get an interesting graph.

    Briefly, there is only one library with a population served greater than 500,000 that is above the national average (Queens Borough Public Library). There are very interesting clusters, though, and one I find fascinating is the ‘edge cities,’ a group of which have circs per capita very much higher than the average. This group includes Multnomah County Library in Oregon, King County Library System in Washington…among many others.

    So what? What makes the libraries with 2 different from those with 20? Can we use what we know to help the libraries with 2 increase their number? What if we develop a good idea of what predicts circs per capita and we have a library that “should” (by the model) have 5 but it has 10 circs per capita? What are they doing that all the other libraries with 5 might learn from them? What do their collections look like, for instance? How do they differ? What circulates the most at the library that is used more?
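
The regression idea discussed above, using population served to predict circulation and then looking at which libraries land above or below the prediction, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five libraries and their figures are hypothetical, the helper `fit_line` is my own name for an ordinary least-squares fit, and a real analysis would likely add SES and regional variables.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical libraries: (population served, annual circulation). Not real data.
libraries = {
    "A": (5_000, 50_000),
    "B": (20_000, 160_000),
    "C": (50_000, 700_000),
    "D": (200_000, 1_800_000),
    "E": (500_000, 4_000_000),
}

population = [pop for pop, _ in libraries.values()]
circulation = [circ for _, circ in libraries.values()]
slope, intercept = fit_line(population, circulation)

# Residual = actual circulation minus what the model says it "should" be.
residuals = {
    name: circ - (intercept + slope * pop)
    for name, (pop, circ) in libraries.items()
}

# Libraries with the largest positive residuals are the ones worth a closer look.
for name, resid in sorted(residuals.items(), key=lambda item: item[1], reverse=True):
    pop, circ = libraries[name]
    print(f"{name}: circs per capita {circ / pop:.1f}, residual {resid:,.0f}")
```

A library with a large positive residual circulates more than its population served alone would predict; those are the candidates to ask what their collections and practices look like.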

