Forgive my ignorance, but I have a question about OCR tools. Until now, I have used a paid service to scan and upload my handwritten Uni notes, convert them to searchable documents, and store them. I write by hand because, frankly, my brain seems to engage with the content “better” than it does with digital note-taking.

It worked fine for what I needed, so I have never investigated open-source alternatives or had actual ownership or control over my uploaded notes. As my work expands and the database of notes grows, data privacy has become a huge concern, and I do not want to use the same system for interviews and the like. My Uni has, sadly, been unhelpful.

Are there any recommendations for a similar system that puts more control and privacy in my hands?

  • Thank you for the great responses so far. I’ve run into some limitations due to my university-provided laptop (its power and its OS, Windows 11) and my own coding inexperience. However, I am exploring a setup that uses Docker and Paperless-ngx. I’ve yet to upload handwritten notes in PDF format, but for notes captured with a phone camera the OCR is abysmal. For typed PDFs, the OCR is perfect: it parsed a 100-page contract document with no errors and provided the text for import into an analytical program.