I’m looking for a service I could install to archive a huge pile of letters, preferably in PDF form, to a database. I live in a country (Germany) where paper is still king and digital services are either non-existent or loathed. My current situation is that I have a mailbox with lots of PDFs all over the place, plus many folders of paper letters going back to 2007 or so that I have to keep, and that I need to dig out again every five years or so.

So what I’d like to have is a service for my homelab where I could scan the paper letters and copy over the existing PDFs, and that would index them, clean them, OCR them and all that good stuff. It should have really good metadata abilities, because my files are usually named in a very random way, so if I could copy them over and quickly categorize them, that would be really awesome.

There is one service called Papermerge that kind of fits my use case. I spent one afternoon with it, and there were a few issues:

  • crashes quite often
  • when sending a large folder of PDFs, uses all the CPU and crashes again
  • the categorizing functions are not very good; it takes a long time to get everything sorted and tidy when organizing files

This might not be very interesting if your country has digital services for everything, but for those of us who have to suffer through this paper madness, a service like this would be great.

  • @TCB13@lemmy.world
    1 year ago

    If you have a Docker environment I suggest just pulling a container up, throwing all your documents in it and seeing if it would save you time or cost you time. Would be an hour well spent! Personally, the OCR alone is worth it for me - my country still loves paper letters and being able to copy text out of that is awesome (IBAN, account numbers, etc. - all the stuff that’s susceptible to typos).

    Yes, I understand the pain, and I usually go with Acrobat to OCR scanned documents. Now tell me something: are you sure Docker and paperless will be around in 10 or 20 years? How are you planning to deal with that long term? I have documents from the 90s copied over from floppy disks and whatnot, and a simple flash drive or hard drive plugged into my computer works as a quick backup for everything. Extra layers of protection can be added, but generally speaking files are easier to copy and checksum across time and media than some software with hundreds of dependencies, a webserver and whatnot.
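To make the "copy and checksum" point concrete, here is a minimal sketch in Python of how a folder of plain files could be checksummed and re-verified after being copied to new media. The function names (`sha256_of`, `build_manifest`, `verify_manifest`) are made up for illustration and are not part of any tool mentioned in this thread; it only assumes the documents live as ordinary files under one directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose content changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

The idea would be to run `build_manifest` on the original folder, keep the result next to the files (for example dumped as JSON), and run `verify_manifest` against each copy; an empty list means every file is present and unchanged.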

    • SciPiTie
      1 year ago

      Worst case, I have all my OCRed documents as raw files which I can migrate to wherever.

      Files still exist. For my case encrypted as well. My backups roll on the data, not the container.

      But I’m not trying to convince you, I tried answering the questions :)

      And to answer your last question clearly: I survived before paperless, and I’d get along without it. I’d find a new tool to reduce my manual labor as much as possible - and if that’s not possible, then no harm done. I know I’m flexible, I can learn new tools, and I’m never locked in to a vendor or tool. I have a high level of self-confidence when it comes to my tool chain and how I’d adapt any part of it - from password manager to cloud storage and my mail flow.

      To be honest, I couldn’t self-host anything if I were afraid of being lost whenever a tool is discontinued.