Screenshot to Text

Production OCR that handles the messy stuff — tables, handwriting, complex layouts — with a high enough accuracy floor that businesses can put it in front of their customers.

Category  AI · OCR SaaS
Stack  Python · Flask · Postgres · Celery
LLM  GPT-4 Vision
Status  In development

The problem

Most OCR is good enough for printed paragraphs and useless for everything else. The moment a document has a table, a handwritten signature, a stamp, or a column break, accuracy collapses and someone has to re-key it. For businesses processing thousands of scans a week, that’s a real cost.

What we’re building

Screenshot to Text is a Flask-based SaaS that accepts images and PDFs and runs them through a layered pipeline: Tesseract for the easy text, GPT-4 Vision for the hard cases (tables, handwriting, mixed layouts). It returns structured output with a confidence score for each block. Stripe handles billing on a tiered per-page plan. Cloudflare R2 stores the originals.
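To make "structured output with per-block confidence" concrete, here is a minimal sketch of what one page of results might look like. The class names, field names, and block kinds are all assumptions made for illustration; the actual response schema is not described on this page.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Block:
    # One extracted region of a page. Every field name here is hypothetical.
    kind: str          # e.g. "paragraph", "table", "handwriting"
    text: str
    confidence: float  # 0.0 to 1.0
    source: str        # which pass produced the text: "tesseract" or "vision"


@dataclass
class PageResult:
    page_number: int
    blocks: list[Block] = field(default_factory=list)

    def min_confidence(self) -> float:
        # The weakest block is the one a human reviewer would check first.
        return min((b.confidence for b in self.blocks), default=0.0)

    def to_json(self) -> str:
        # asdict() recurses into the nested Block dataclasses.
        return json.dumps(asdict(self), indent=2)


page = PageResult(
    page_number=1,
    blocks=[
        Block("paragraph", "Invoice 42", 0.97, "tesseract"),
        Block("table", "Item | Qty | Price", 0.81, "vision"),
    ],
)
```

Calling `page.to_json()` yields a nested JSON document a client could store or index, and `page.min_confidence()` gives a single number that could be used to flag a page for manual review.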

The AI angle

Two-pass extraction. The first pass is deterministic OCR with Tesseract; the second pass sends only the regions that the first pass scored below a confidence threshold to GPT-4 Vision. This keeps cost down (Vision calls are expensive) while raising the accuracy floor where it matters.
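The routing logic behind the two passes can be sketched in a few lines. This is an illustration under stated assumptions, not the production code: `Region`, `two_pass`, the `second_pass` callable, and the 0.85 threshold are all hypothetical, and the expensive model call is injected as a function so the sketch runs without any OCR or API dependency.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Region:
    region_id: str
    text: str
    confidence: float          # score from whichever pass produced the text
    source: str = "first_pass"


def two_pass(
    regions: Iterable[Region],
    second_pass: Callable[[Region], tuple[str, float]],
    threshold: float = 0.85,   # hypothetical cut-off, not a value from the project
) -> list[Region]:
    """Keep confident first-pass regions; re-extract only the weak ones."""
    results = []
    for region in regions:
        if region.confidence >= threshold:
            # Good enough: no expensive call is made for this region.
            results.append(region)
            continue
        text, confidence = second_pass(region)
        results.append(
            Region(region.region_id, text, confidence, source="second_pass")
        )
    return results


def fake_vision(region: Region) -> tuple[str, float]:
    # Stand-in for a vision-model call, used only to exercise the routing.
    return (region.text.replace("3", "e"), 0.99)


first_pass_output = [Region("r1", "Total due", 0.96), Region("r2", "Rec3ipt", 0.41)]
merged = two_pass(first_pass_output, fake_vision)
```

Only `r2` falls below the threshold, so `fake_vision` is called once. The cost of the second pass therefore scales with the number of low-confidence regions rather than with the number of pages.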

How it’ll be used

  • Bookkeeping firms processing client receipts at scale.
  • Healthcare admin teams digitising patient intake forms.
  • Legal teams with handwritten case notes that need to be searchable.

Where we are

Core extraction pipeline works. Stripe billing and per-tenant rate limiting are in. Public launch is gated on an SLA we can confidently sell — we’re currently soak-testing against a target of 99.5% per-page accuracy on the standard test set.
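One way a per-page accuracy check against such a target could be scripted is shown below. The metric is an assumption: this page does not define "per-page accuracy", so the sketch reads it as character-level accuracy per page (one minus edit distance divided by the length of the ground-truth text). The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    # Classic Levenshtein distance, keeping only one previous row in memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def page_accuracy(predicted: str, truth: str) -> float:
    # Assumed metric: share of ground-truth characters recovered correctly.
    if not truth:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - edit_distance(predicted, truth) / len(truth))


def pages_below_target(pairs, target: float = 0.995) -> list[int]:
    # Indices of (predicted, truth) pairs whose accuracy misses the target.
    return [
        index
        for index, (predicted, truth) in enumerate(pairs)
        if page_accuracy(predicted, truth) < target
    ]
```

Run over a labelled test set, `pages_below_target` returns the pages that would need investigation before the target could be claimed. A different definition, such as word-level or field-level accuracy, would change only `page_accuracy`.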