cross-posted from: https://lemmy.sdf.org/post/53671520
I asked a library system to furnish their whole catalog of books, music, and movies in an open format (JSON, XML, or CSV). They refused, saying that the database is extremely large, composed of several hundred thousand bibliographic records that reference over 2 million documents. They say the database is highly dynamic and it would be obsolete by the time they export the data and likely not useful to more than one person.
So they have opted to limit everyone to using their web-based search. Is my request unreasonable? Or their response?
I’m trying to get a basic idea of the size we are talking about. I’m guessing 100,000 bibliographic records would consume roughly 100 MB uncompressed (assuming an average record would not exceed 1 KB). And since text compresses very well, a zipped JSON would be what, ~10 MB per 100k records? I believe a zip file of 900,000 bibliographic records would be ~65 MB.
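A rough sketch of that back-of-envelope arithmetic. The 1 KB average record and the 10:1 compression ratio are the guesses from the paragraph above, not measurements, and `estimate_sizes_mb` is a made-up helper name.

```python
def estimate_sizes_mb(records, avg_record_bytes=1_000, compression_ratio=10.0):
    """Return (uncompressed_mb, compressed_mb) for a catalog export.

    avg_record_bytes and compression_ratio are guesses, not measurements.
    """
    raw_mb = records * avg_record_bytes / 1_000_000
    return raw_mb, raw_mb / compression_ratio


for n in (100_000, 900_000):
    raw, packed = estimate_sizes_mb(n)
    print(f"{n:>7} records: ~{raw:.0f} MB raw, ~{packed:.0f} MB zipped")

# Compression ratio that a 65 MB archive of 900,000 records would imply.
implied_ratio = estimate_sizes_mb(900_000)[0] / 65
print(f"implied ratio for 65 MB: {implied_ratio:.1f}:1")
```

Under these assumptions 900,000 records come to roughly 90 MB zipped at 10:1, and a 65 MB archive would need closer to 14:1. Either way the export would be in the tens of megabytes, if the 1 KB guess holds.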
The library did not give precise figures but I would like to work out what level of crazy my request is. Do any libraries in the world export a dataset of 100s of 1000s of book and media titles? Because if it’s done /somewhere/, it would give a clue about the reasonableness of my request.
I’ll give a couple use cases in case anyone is wondering how direct DB access would be useful.
Use case 1:
- fetch a list of titles of interest, e.g. award-winners (books, scripts, actors, musicians, directors, etc), or a list of banned books, because if it’s banned somewhere maybe it piques your curiosity
- search the library’s DB for matches against a list
If the list is more than ~15 or so items, you’re fucked because library query forms rarely accept a list as input. And as soon as you need to specify other criteria, like works in English within a date range, the chance of a web form doing the job becomes increasingly unlikely.
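To make use case 1 concrete, here is a minimal sketch of what a local copy would allow, assuming the export were loaded into sqlite. The `catalog` table, its columns, and the helper names are all hypothetical; the library's real schema is unknown.

```python
import sqlite3


def build_demo_catalog():
    """Create a tiny in-memory stand-in for a library export (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalog (title TEXT, creator TEXT, language TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO catalog VALUES (?, ?, ?, ?)",
        [
            ("Beloved", "Toni Morrison", "en", 1987),
            ("Maus", "Art Spiegelman", "en", 1991),
            ("Persepolis", "Marjane Satrapi", "fr", 2000),
            ("Dune", "Frank Herbert", "en", 1965),
        ],
    )
    return conn


def match_wanted(conn, wanted_titles, language=None, years=None):
    """Return catalog rows whose title appears in wanted_titles, with optional filters."""
    # Load the list of interest into a temp table so it can be joined against.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (title TEXT)")
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted VALUES (?)", [(t,) for t in wanted_titles])

    sql = (
        "SELECT c.title, c.creator, c.year FROM catalog AS c "
        "JOIN wanted AS w ON lower(c.title) = lower(w.title) WHERE 1 = 1"
    )
    params = []
    if language is not None:
        sql += " AND c.language = ?"
        params.append(language)
    if years is not None:
        sql += " AND c.year BETWEEN ? AND ?"
        params.extend(years)
    return conn.execute(sql + " ORDER BY c.title", params).fetchall()


conn = build_demo_catalog()
wanted = ["maus", "Persepolis", "Beloved", "The Bluest Eye"]
print(match_wanted(conn, wanted, language="en", years=(1980, 1995)))
```

The list can be any length, and the language and date filters stack on top of it, which is the combination a typical web search form does not offer. Exact title matching is a simplification; real catalog data would likely need fuzzier matching or identifiers such as ISBNs.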
Use case 2: Suppose you are boycotting something or want to avoid something or someone (e.g. you want to avoid Tom Hanks because he is a sell-out with no sense of brand protection, who will act in any garbage film if it pays enough)
- fetch a list of titles you want to avoid (e.g. if you boycott Disney, get a list of Disney titles; or get a list of movies Tom Hanks was in)
- search the library’s DB for whatever you are looking for, but exclude matches against a list
Or you have a looooonng list of movies you have already seen or books you have read. Obviously you might want to exclude them from your queries.
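The exclusion case is the same idea inverted. A minimal sketch, again assuming a hypothetical local `catalog` table in sqlite; `search_excluding` and the sample rows are invented for illustration.

```python
import sqlite3


def search_excluding(conn, keyword, excluded_titles):
    """Return catalog titles containing keyword, minus anything on the exclusion list."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS excluded (title TEXT)")
    conn.execute("DELETE FROM excluded")
    conn.executemany("INSERT INTO excluded VALUES (?)", [(t,) for t in excluded_titles])
    rows = conn.execute(
        "SELECT c.title FROM catalog AS c "
        "WHERE c.title LIKE '%' || ? || '%' "
        "AND NOT EXISTS ("
        "  SELECT 1 FROM excluded AS e WHERE lower(e.title) = lower(c.title)"
        ") ORDER BY c.title",
        (keyword,),
    ).fetchall()
    return [title for (title,) in rows]


# Hypothetical single-column catalog, just enough to run the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (title TEXT)")
conn.executemany(
    "INSERT INTO catalog VALUES (?)",
    [("The Green Mile",), ("Green Book",), ("Green Room",), ("Cast Away",)],
)
already_seen_or_avoided = ["the green mile", "Cast Away"]
print(search_excluding(conn, "green", already_seen_or_avoided))
```

`NOT EXISTS` is used rather than `NOT IN` so that a stray NULL in the exclusion list cannot silently empty the result.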
Use case 3: The library has an extremely limited sense of genres. A conversation went like this:
Me: “Where is the EDM section? Where is the ambient and trip-hop section?”
Librarian: “What’s that?”
Me: “Electronic music.”
Librarian: “Those would be under ‘rock’.”
Me: “What about world music, like Ravi Shankar (classical Indian)?”
Librarian: “Check jazz.”
Fuck me. No wonder the rock and jazz sections are so huge and there’s little else. Picking through it would be insurmountable, and the web DB likely has the same sloppy genre problem. I suspect what has happened is that young people just don’t use libraries much and probably use Spotify or a similar online surveillance system for music. In fact, I rarely even see people browsing the music these days. So the library organisation just did not keep its genres up to date, and no one noticed because they are online. So again, as in use case 1, it would be useful to find the intersection between a list of titles of interest and the library DB.
I have to wonder if the /real/ problem is that the library thinks I would be the sole user of the exported DB. I can understand resistance to doing a significant amount of work for just one person. But I would expect many people to have search needs that these GUI webforms cannot handle, no? And from there it would be the subset of those people who know SQL.

I use sqlite. They are probably using some heavier-duty DB, but for sqlite, exporting JSON is trivial, so I would be surprised if other DBs did not have a similar mechanism. And to be clear, I told the library that I prefer JSON but would handle whatever open format they prefer, be it XML or CSV.
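For reference, a minimal sketch of a JSON export on the sqlite side, using Python's standard `sqlite3` and `json` modules. The `catalog` table and the `export_table_as_json` helper are made up for illustration, and this says nothing about what the library's own system can or cannot do.

```python
import json
import sqlite3


def export_table_as_json(conn, table):
    """Serialize every row of `table` as a JSON array of objects."""
    conn.row_factory = sqlite3.Row
    # Table names cannot be bound as parameters, so only pass trusted names here.
    rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
    return json.dumps([dict(row) for row in rows], ensure_ascii=False, indent=2)


# Hypothetical one-row catalog, just to show the output shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (title TEXT, creator TEXT, year INTEGER)")
conn.execute("INSERT INTO catalog VALUES ('Dune', 'Frank Herbert', 1965)")
print(export_table_as_json(conn, "catalog"))
```

This loads the whole table into memory, which is fine at the scale guessed above but would want streaming for much larger tables.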
This does not sound like a realistic problem. I might imagine if they had a DB of all ISBNs, they would obviously have to use a query that limits to their catalog. Apart from that, I don’t see what would be inappropriate. If it’s in their catalog, why hide it? Not sure what you have in mind but I should say it’s not the US where there would be some right wing concern to prevent children from getting sex education type of material, or the Christian right trying to make Darwin’s theories hard to reach.
If you are thinking in terms of sensitive info, like accounts of people and what they borrow, it would be irresponsible if that kind of info were not in a separate table.
I was expecting my request to be ignored, as open data requests often are – and rarely fulfilled in my experience even when they answer. But in the case at hand, they first responded favorably, saying essentially: we can give you some data but your request is vague… what exactly do you want? I basically replied with “everything”. So they were not opposed to exporting some data, but the volume involved (100s of 1000s of records) seems to be a show-stopper.
I might agree that it’s a bit much to serve one person. They also said it would take disproportionate resources when they have a whole public to serve. But I was figuring “build it, and they will come”. Someone has to be the first person to make a request. I am a bit disappointed that, even if it were made available, we could not expect many people to use the option to be free from the UI’s limitations.
It is simple in sqlite (which is purpose-built to be simple and small), so you assume all other databases are equally simple. You then expect library staff to be standing by ready to help with your demands.
Well, prepare to be shocked: that expectation is absurdly naive and self-centered. YTA
It’s the other way around. I expect a simple DB to be more basic. A more complex DB should have even more features. If, for example, an Oracle DB cannot easily handle a job that a small and simple home kit can, Oracle should be embarrassed.
Yikes! What leads you to think this is about me? It’s about databases. I did not invent sqlite. It was an example. You can fuck off with your vitriol.
Yes, it is unreasonable. As you have already been told. You asked, but didn’t like the answer. Again, you have only a rudimentary understanding of the problem but base everything off your experience and your needs.
I was looking for good answers. Convincing answers. Which I expected to correspond with data volume.
In any case, good answers are defensible, should the occasion arise. When you cannot defend your answers, it indicates a lack of justified confidence despite an expectation that others adopt some kind of blind confidence in your answers.
The thread has those aplenty; it is just that you are confidently incorrect, so you don’t like the answers.