Forgive my ignorance, but I’ve got a question about OCR tools. Until now, I have used a paid service to scan and upload my handwritten Uni notes, convert them to searchable documents, and store them. Handwritten because, frankly, my brain seems to engage with the content “better” than with digital note-taking.

It worked fine for what I needed, so I have never looked into open-source options or had actual ownership of, or control over, my uploaded notes. As my work expands and the database of notes grows, data privacy is becoming a huge concern, and I do not want to use the same system for interviews and such. My Uni has been, well, unhelpful sadly.

Are there any recommendations for having a similar system that puts more control and privacy in my hands?

  • I work in a digitisation environment. We use OCR in different ways, sometimes with Tesseract and sometimes with Adobe. The two differ in effectiveness: Tesseract needs training, while Adobe's proprietary recognition is mostly better. Handwriting is a special case that mostly needs manual checking.

    In my private setup I use a mix built around paperless-ngx (which only runs Tesseract OCR if the document hasn’t already been OCR-recognised). Paperless can modify the output of the PDFs and export it to a JSON database, which I partly convert to Trilium (a database-based notebook).

    I haven’t found a better solution yet, and my material is mostly not handwritten.
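
For anyone who wants to try the "only OCR what isn't already recognised" idea locally, here is a minimal sketch. It is not how paperless-ngx is configured; it simply calls the OCRmyPDF command-line tool (which uses Tesseract) with its `--skip-text` option, assuming both are installed. The function names are made up for illustration, and, as noted above, results on handwriting will likely still need manual checking.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build an OCRmyPDF call that leaves pages with existing text untouched."""
    return [
        "ocrmypdf",
        "--skip-text",            # do not re-OCR pages that already have a text layer
        "--language", language,   # Tesseract language pack, e.g. "eng" or "deu"
        str(src),
        str(dst),
    ]


def ocr_folder(inbox: Path, outbox: Path, language: str = "eng") -> list[Path]:
    """OCR every PDF in `inbox` on this machine and return the searchable copies."""
    # Assumption: the ocrmypdf executable is installed and on PATH.
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    outbox.mkdir(parents=True, exist_ok=True)
    results = []
    for src in sorted(inbox.glob("*.pdf")):
        dst = outbox / src.name
        # check=True raises CalledProcessError if OCRmyPDF reports a failure.
        subprocess.run(build_ocr_command(src, dst, language), check=True)
        results.append(dst)
    return results
```

Calling `ocr_folder(Path("scans"), Path("searchable"))` (both folder names are placeholders) would process everything on your own machine, so nothing gets uploaded to a third party.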