I asked a library system to furnish their whole catalog of books, music, and movies in an open format (JSON, XML, or CSV). They refused, saying that the database is extremely large, composed of several hundred thousand bibliographic records that reference over 2 million documents. They say the database is highly dynamic and it would be obsolete by the time they export the data and likely not useful to more than one person.
So they have opted to limit everyone to using their web-based search. Is my request unreasonable? Or their response?
I’m trying to get a basic idea of the size we are talking about. I’m guessing 100,000 bibliographic records would consume roughly 100 MB uncompressed (guessing an average record would not exceed 1 KB). And since text compresses very well, a zipped JSON would be what, ~10 MB per 100k records? I believe a zip file of 900,000 bibliographies would be ~65 MB.
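To make the guesswork explicit, here is a rough back-of-envelope sketch in Python. The 1 KB average record size and the 10:1 compression ratio are my assumptions, not figures from the library, and `estimate_mb` is just an illustrative helper name.

```python
# Back-of-envelope size estimate.
# Both constants are assumptions from this post, not figures from the library.
BYTES_PER_RECORD = 1_000   # assumed average record size (~1 KB)
COMPRESSION_RATIO = 10     # assumed ~10:1 for zipped JSON text


def estimate_mb(records: int) -> tuple[float, float]:
    """Return (uncompressed_mb, compressed_mb) for a given record count."""
    raw_mb = records * BYTES_PER_RECORD / 1_000_000
    return raw_mb, raw_mb / COMPRESSION_RATIO


for n in (100_000, 900_000):
    raw, zipped = estimate_mb(n)
    print(f"{n:>9,} records: ~{raw:.0f} MB raw, ~{zipped:.0f} MB zipped")
```

Under these assumptions, 900,000 records come out at roughly 90 MB zipped, which is in the same ballpark as my guess above. If real records are smaller or compress better, the number drops accordingly.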
The library did not give precise figures but I would like to work out what level of crazy my request is. Do any libraries in the world export a dataset of 100s of 1000s of book and media titles? Because if it’s done /somewhere/, it would give a clue about the reasonableness of my request.
I’ll give a couple use cases in case anyone is wondering how direct DB access would be useful.
Use case 1:
- fetch a list of titles of interest, e.g. award-winners (books, scripts, actors, musicians, directors, etc), or a list of banned books, because if it’s banned somewhere maybe it piques your curiosity
- search the library’s DB for matches against a list
If the list is more than ~15 or so items, you’re fucked because library query forms rarely accept a list as input. And as soon as you need to specify other criteria, like works in English within a date range, the chance of a web form doing the job becomes increasingly slim.
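For anyone wondering what that looks like with a local copy of the data, here is a minimal sketch using Python’s built-in `sqlite3`. The `catalog` and `wishlist` tables, their columns, and the sample rows are all hypothetical; I have no idea what the library’s real schema looks like.

```python
import sqlite3

# Hypothetical schema and made-up sample rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog (title TEXT, creator TEXT, language TEXT, year INTEGER);
    CREATE TABLE wishlist (title TEXT);
""")
conn.executemany("INSERT INTO catalog VALUES (?, ?, ?, ?)", [
    ("Banned Book A", "Author One", "eng", 1987),
    ("Banned Book A", "Author One", "fra", 1989),
    ("Award Winner B", "Author Two", "eng", 1932),
    ("Unrelated Title", "Author Three", "eng", 2015),
])
conn.executemany("INSERT INTO wishlist VALUES (?)", [
    ("banned book a",), ("Award Winner B",), ("Not In Catalog",),
])


def matches(conn, language, year_from, year_to):
    """Catalog rows whose title is on the wishlist, filtered by language and year."""
    return conn.execute("""
        SELECT c.title, c.creator, c.year
        FROM catalog AS c
        JOIN wishlist AS w ON lower(c.title) = lower(w.title)
        WHERE c.language = ? AND c.year BETWEEN ? AND ?
        ORDER BY c.year
    """, (language, year_from, year_to)).fetchall()


for row in matches(conn, "eng", 1900, 2000):
    print(row)
```

Matching on lowercased titles is a crude choice; real data would probably need matching on ISBN or some other identifier, since titles vary in spelling and punctuation.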
Use case 2: Suppose you are boycotting something or want to avoid something or someone (e.g. you want to avoid Tom Hanks because he is a sell-out with no sense of brand protection, who will act in any garbage film if it pays enough)
- fetch a list of titles you want to avoid (e.g. if you boycott Disney, get a list of Disney titles; or get a list of movies Tom Hanks was in)
- search the library’s DB for whatever you are looking for, but exclude matches against a list
Or you have a looooonng list of movies you have already seen or books you have read. Obviously you might want to exclude them from your queries.
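The exclusion case is the mirror image: an anti-join instead of a join. Again a sketch with Python’s `sqlite3`; the `catalog` and `exclude` tables and every row in them are invented for illustration.

```python
import sqlite3

# Hypothetical tables and made-up rows; not the library's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog (title TEXT, media TEXT);
    CREATE TABLE exclude (title TEXT);
""")
conn.executemany("INSERT INTO catalog VALUES (?, ?)", [
    ("Film A", "dvd"), ("Film B", "dvd"), ("Film C", "dvd"), ("Album D", "cd"),
])
# Titles to skip: boycotted, already seen, already read, etc.
conn.executemany("INSERT INTO exclude VALUES (?)", [("film b",), ("Film Z",)])


def search_excluding(conn, media):
    """Titles of the given media type that are NOT on the exclude list."""
    rows = conn.execute("""
        SELECT c.title
        FROM catalog AS c
        WHERE c.media = ?
          AND NOT EXISTS (
              SELECT 1 FROM exclude AS e
              WHERE lower(e.title) = lower(c.title)
          )
        ORDER BY c.title
    """, (media,))
    return [title for (title,) in rows]


print(search_excluding(conn, "dvd"))
```

I used `NOT EXISTS` rather than `NOT IN` because `NOT IN` returns nothing at all if the exclude list happens to contain a NULL.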
Use case 3: The library has an extremely limited sense of genres. A conversation went like this:
Me: “Where is the EDM section? Where is the ambient and trip-hop section?”
Librarian: “What’s that?”
Me: “Electronic music.”
Librarian: “Those would be under ‘rock’.”
Me: “What about world music, like Ravi Shankar (classical Indian)?”
Librarian: “Check jazz.”
Fuck me. No wonder the rock and jazz sections are so huge and there’s little else. Picking through it would be an insurmountable task, and the web DB likely has the same sloppy genre problem. I suspect what has happened is young ppl just don’t do libraries much and they probably use Spotify or a similar online surveillance system for music. In fact I rarely even see people browsing the music these days. So the library organisation just did not keep up with genres and no one noticed because they are online. So again, like use case 1, it would be useful to find the intersection between a list of titles of interest and the library DB.
I have to wonder if the /real/ problem is that the library thinks I would be the sole user of the exported DB. I can understand resistance to doing a significant amount of work for just one person. But I would expect many people to have search needs that these GUI webforms cannot handle, no? And from there it would be the subset of those people who know SQL.
I’m friends with a few librarians. They will absolutely move mountains to connect you with what you need. But as I read it, it doesn’t really seem like this list would ultimately be useful even to you?
So, to me it seems like the main reason you would want this list in the first place is not to waste time deciding on something only for it to not be at the library in the end.
Otherwise you could just use any other resource (e.g. a list of banned books from elsewhere) for discovery, right? Then just go check out that content directly, which it seems like their system is dialed in well for?
But libraries have what’s called an “interlibrary loan” system; usually you can place a request online. I’ve actually never yet encountered a situation where I couldn’t access something I wanted, even pretty obscure textbooks and reference manuals.
Why not compile your own lists from other sources that are more suitable for your needs, then meet your library where they’re at when it’s time to go access the content?
You seem to be saying: find out what you want and ask the library for it. I tried that once. I asked for a particular book. The librarian basically said “nope, we don’t have it… but English bookstore X might have it”.
Note that I am generally interested in English content in a non-English region. There are some English books and media for whatever reason (certainly with DVDs it’s because the original film is often in English) but asking them to procure something in English is probably a long shot… a bigger ask than asking them for a DB of what they have.
“So, to me it seems like the main reason you would want this list in the first place is not to waste time deciding on something only for it to not be at the library in the end.”
The point in doing an SQL intersection between a long list of some sort and their DB is to find what they already have that may be interesting. It’s not to discover what they don’t have.
Maybe don’t try asking for a specific book. You might have more luck expressing you’re after English-language items and asking how you might explore your options.
You’re going about this all wrong and quite selfishly.
Look at it from the library’s perspective. You’re asking them to do a lot of extra work that they can’t afford and might not even have the capacity to do.
What you should do is volunteer to work there. Do what they ask of you, learn about their system and why things are the way they are. Then, start to make suggestions and volunteer to make changes, including securing the funding to get done what you’d like to see done.
Libraries are community nonprofits. They need collaboration, not to be told what they’re doing wrong.
I’ve worked in several libraries, public and academic. From my experience, your request is unusual, but not inherently unreasonable.
From a purely technical standpoint, exporting several hundred thousand bibliographic records is not some impossible task. Library catalog metadata is overwhelmingly text-based and compresses extremely well. Even a very large MARC/XML/JSON export would likely land somewhere in the low single-digit GB range uncompressed, and substantially smaller compressed. Modern library systems already perform routine backups and data migrations of this scale.
The more plausible reasons are institutional and operational; for example, many libraries are locked into proprietary ILS/LSP ecosystems (Alma, Sierra, Polaris, etc.) that complicate direct export workflows; and metadata may include licensed enrichments or vendor-supplied records with contractual restrictions.
That said, your broader complaint about search capability is legitimate. But some of your expectations may reflect a mismatch between what modern library catalogs are designed to do and what discovery ecosystems now do elsewhere. Historically, catalogs were much more central discovery tools. Increasingly, though, discovery happens upstream through Google Scholar, WorldCat, etc.
Libraries increasingly function less as primary discovery environments and more as fulfillment and access layers, while librarians continue to gatekeep the role of expert resource-finders between information systems and (honestly) an increasingly information-illiterate public. In practice, many patrons (and all librarians that I’ve worked with in Canadian libraries) now discover works elsewhere and use the catalog merely to check availability or place ILL requests.
As for whether libraries elsewhere expose bulk catalog data: yes, in some contexts. Bulk bibliographic exports are common internally for: backups, migrations, consortium synchronization, analytics, archival purposes, third-party integrations, etc.
Some academic/open-data projects and national libraries also expose large metadata datasets publicly. So the concept itself is not exotic.
So for your needs, you might want to look into using APIs or bulk datasets that already exist, like:
- WorldCat/OCLC APIs
- Open Library
- Library of Congress datasets
- Wikidata/SPARQL
- MusicBrainz
These are often much better suited for computational querying than a local OPAC.
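As a rough illustration of what computational querying can look like, here is a small Python sketch against Open Library’s public search endpoint. The helper names (`build_search_url`, `summarize`, `search`) are my own, and the response fields used (`docs`, `title`, `author_name`, `first_publish_year`) reflect my understanding of that API, so check the current documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Open Library's search endpoint; parameters and fields are as I understand
# them, so treat them as assumptions and verify against the API docs.
SEARCH_URL = "https://openlibrary.org/search.json"


def build_search_url(title, limit=5):
    """Build a title-search URL (no network access happens here)."""
    return SEARCH_URL + "?" + urlencode({"title": title, "limit": limit})


def summarize(payload):
    """Reduce a search response to (title, first author, first publish year)."""
    results = []
    for doc in payload.get("docs", []):
        authors = doc.get("author_name") or [None]
        results.append((doc.get("title"), authors[0], doc.get("first_publish_year")))
    return results


def search(title, limit=5):
    """Run a live search; requires network access."""
    with urlopen(build_search_url(title, limit), timeout=10) as resp:
        return summarize(json.load(resp))
```

Calling `search("Brave New World")` performs a live request, so loop over a long list of titles politely (with pauses between requests). The results tell you what exists, not what your local library holds, so you would still have to check availability in the local catalog.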

