I asked a library system to furnish their whole catalog of books, music, and movies in an open format (JSON, XML, or CSV). They refused, saying that the database is extremely large, composed of several hundred thousand bibliographic records that reference over 2 million documents. They say the database is highly dynamic and it would be obsolete by the time they export the data and likely not useful to more than one person.
So they have opted to limit everyone to using their web-based search. Is my request unreasonable? Or their response?
I’m trying to get a basic idea of the size we are talking about. I’m guessing 100,000 bibliographic records would consume roughly 100 MB uncompressed (guessing an avg. record would not exceed 1 KB). And since text compresses very well, a zipped JSON would be what, ~10 MB per 100k records? I believe a zip file of 900,000 bibliographies would be ~65 MB.
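To make that guess easy to rerun with other numbers, here is a rough back-of-envelope sketch. Both constants are my assumptions (about 1 KB per record, roughly 10:1 compression for zipped JSON), not figures from the library. With them, 900,000 records come out nearer 90 MB zipped; getting down to ~65 MB would need a ratio of about 14:1 or smaller records.

```python
# Back-of-envelope size estimate. Every constant here is an assumption,
# not a figure supplied by the library.
AVG_RECORD_BYTES = 1_000   # guessed upper bound per bibliographic record (~1 KB)
COMPRESSION_RATIO = 10     # assumed ~10:1 for zipped JSON text


def estimate_mb(records: int) -> tuple[float, float]:
    """Return (uncompressed_mb, zipped_mb) for a given record count."""
    raw_mb = records * AVG_RECORD_BYTES / 1_000_000
    return raw_mb, raw_mb / COMPRESSION_RATIO


for n in (100_000, 900_000):
    raw, zipped = estimate_mb(n)
    print(f"{n:>9,} records: ~{raw:.0f} MB raw, ~{zipped:.0f} MB zipped")
```

Either way the result is tens of MB, which fits on any USB stick.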
The library did not give precise figures but I would like to work out what level of crazy my request is. Do any libraries in the world export a dataset of 100s of 1000s of book and media titles? Because if it’s done /somewhere/, it would give a clue about the reasonableness of my request.
I’ll give a couple use cases in case anyone is wondering how direct DB access would be useful.
Use case 1:
- fetch a list of titles of interest, e.g. award-winners (books, scripts, actors, musicians, directors, etc), or a list of banned books, because if it’s banned somewhere maybe it piques your curiosity
- search the library’s DB for matches against a list
If the list is more than ~15 or so items, you’re fucked because library query forms rarely accept a list as input. And as soon as you need to specify other criteria, like works in English within a date range, a web form doing the job becomes increasingly unlikely.
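For anyone wondering what use case 1 looks like with direct access, here is a minimal sketch using Python's built-in sqlite3. The `catalog` and `wanted` tables, their columns, the language codes, and the sample rows are all hypothetical; a real export would have its own schema.

```python
import sqlite3

# In-memory database standing in for an exported catalog (hypothetical schema).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE catalog (title TEXT, language TEXT, year INTEGER);
CREATE TABLE wanted  (title TEXT);
""")

# Made-up holdings, only to make the example runnable.
con.executemany("INSERT INTO catalog VALUES (?, ?, ?)", [
    ("Beloved", "eng", 1987),
    ("Lolita", "fre", 1955),
    ("Brave New World", "eng", 1932),
    ("Some Other Book", "eng", 1990),
])

# The list of interest, e.g. award winners or banned books.
con.executemany("INSERT INTO wanted VALUES (?)", [
    ("beloved",), ("Lolita",), ("Brave New World",),
])

# Intersection of the list and the catalog, with extra criteria:
# English only, published 1950-2000. Titles are matched case-insensitively.
matches = con.execute("""
    SELECT c.title, c.year
    FROM catalog c
    JOIN wanted w ON lower(c.title) = lower(w.title)
    WHERE c.language = 'eng' AND c.year BETWEEN 1950 AND 2000
    ORDER BY c.title
""").fetchall()

print(matches)
```

Exact title matching is crude (subtitles, articles, and translations would all break it), so a real attempt would probably match on ISBN or another identifier if the export had one.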
Use case 2: Suppose you are boycotting something or want to avoid something or someone (e.g. you want to avoid Tom Hanks because he is a sell-out with no sense of brand protection, who will act in any garbage film if it pays enough)
- fetch a list of titles you want to avoid (e.g. if you boycott Disney, get a list of Disney titles; or get a list of movies Tom Hanks was in)
- search the library’s DB for whatever you are looking for, but exclude matches against a list
Or you have a looooonng list of movies you have already seen or books you have read. Obviously you might want to exclude them from your queries.
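The exclusion case is the same idea inverted: an anti-join. Again a sketch only, with a hypothetical `catalog` table, a hypothetical `avoid` list, and made-up rows.

```python
import sqlite3

# In-memory stand-in for the exported catalog (hypothetical schema and rows).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE catalog (title TEXT, medium TEXT);
CREATE TABLE avoid   (title TEXT);
""")

con.executemany("INSERT INTO catalog VALUES (?, ?)", [
    ("Forrest Gump", "dvd"),
    ("Stalker", "dvd"),
    ("Solaris", "dvd"),
    ("Kind of Blue", "cd"),
])

# Titles to exclude: things being boycotted plus things already seen.
con.executemany("INSERT INTO avoid VALUES (?)", [
    ("Forrest Gump",), ("solaris",),
])

# Anti-join: keep DVDs whose title has no match in the avoid list.
remaining = [row[0] for row in con.execute("""
    SELECT c.title
    FROM catalog c
    WHERE c.medium = 'dvd'
      AND NOT EXISTS (
          SELECT 1 FROM avoid a WHERE lower(a.title) = lower(c.title)
      )
    ORDER BY c.title
""")]

print(remaining)
```

The avoid list can be thousands of rows long without the query getting any harder to write, which is exactly what a web form cannot offer.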
Use case 3: The library has an extremely limited sense of genres. A conversation went like this:
Me: “Where is the EDM section? Where is the ambient and trip-hop section?” Librarian: “What’s that?” Me: “Electronic music.” Librarian: “Those would be under ‘rock’.” Me: “What about world music, like Ravi Shankar (classical Indian)?” Librarian: “Check jazz.”
Fuck me. No wonder the rock and jazz sections are so huge and there’s little else. Picking through it would be an insurmountable task and the web DB likely has the same sloppy genre problem. I suspect what has happened is young ppl just don’t do libraries much and they probably use Spotify or a similar online surveillance system for music. In fact I rarely even see people browsing the music these days. So the library organisation just did not keep up with genres and no one noticed because they are online. So again, like use case 1, it would be useful to find the intersection between a list of titles of interest and the library DB.
I have to wonder if the /real/ problem is that the library thinks I would be the sole user of the exported DB. I can understand resistance to doing a significant amount of work for just one person. But I would expect many people to have search needs that these GUI webforms cannot handle, no? And from there it would be the subset of those people who know SQL.

You seem to be saying: find out what you want and ask the library for it. I tried that once. I asked for a particular book. The librarian basically said “nope, we don’t have it… but English bookstore X might have it”.
Note that I am generally interested in English content in a non-English region. There are some English books and media for whatever reason (certainly with DVDs it’s because the original film is often in English) but asking them to procure something in English is probably a long shot… a bigger ask than asking them for a DB of what they have.
The point in doing an SQL intersection between a long list of some sort and their DB is to find what they already have that may be interesting. It’s not to discover what they don’t have.
Maybe don’t try asking for a specific book. You might have more luck explaining that you’re after English-language items and asking how you might explore your options.