The problem

Most OCR is good enough for printed paragraphs and useless for everything else. The moment a document has a table, a handwritten signature, a stamp, or a column break, accuracy collapses and someone has to re-key it. For businesses processing thousands of scans a week, that’s a real cost.

What we’re building

Screenshot to Text is a Flask-based SaaS that accepts images and PDFs, runs them through a layered pipeline — Tesseract for the easy text, GPT-4 Vision for the hard stuff (tables, handwriting, mixed layouts) — and returns structured output with per-block confidence. Stripe handles billing on a per-page tiered plan. Cloudflare R2 stores the originals.

The AI angle

Two-pass extraction. The first pass is deterministic OCR; the second pass uses GPT-4 Vision only on regions the first pass scored below a confidence threshold. This keeps cost down (Vision calls are expensive) while raising the floor on accuracy where it matters.
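The routing step can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: the `Block` type, the `two_pass` function, the 0.80 threshold, and the `fake_vision_pass` stand-in are all hypothetical names and values chosen for the example, and the real first and second passes would call Tesseract and GPT-4 Vision respectively.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

# Hypothetical cut-off; the real threshold is not stated in this write-up.
CONFIDENCE_THRESHOLD = 0.80


@dataclass(frozen=True)
class Block:
    region_id: str
    text: str
    confidence: float  # normalised to the range 0.0 to 1.0
    engine: str        # which pass produced the text


def two_pass(
    first_pass: List[Block],
    second_pass: Callable[[Block], Block],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[Block]:
    """Keep confident first-pass blocks; re-extract only the rest."""
    result = []
    for block in first_pass:
        if block.confidence >= threshold:
            result.append(block)               # cheap path: no second call
        else:
            result.append(second_pass(block))  # expensive path, low-confidence only
    return result


def fake_vision_pass(block: Block) -> Block:
    # Stand-in for the vision-model call; returns a canned result.
    return replace(block, text="Total: 42.10", confidence=0.95, engine="vision")


blocks = [
    Block("r1", "Invoice 1001", 0.97, "ocr"),
    Block("r2", "T0ta1: 4Z.lO", 0.41, "ocr"),
]
merged = two_pass(blocks, fake_vision_pass)
calls_saved = sum(1 for b in merged if b.engine == "ocr")
```

In this toy run only the second region is sent to the second pass, so one of the two possible Vision calls is avoided. Each returned block still carries its own confidence, which is what would feed the per-block scores in the structured output.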

How it’ll be used

Bookkeeping firms processing client receipts at scale.
Healthcare admin teams digitising patient intake forms.
Legal teams with handwritten case notes that need to be searchable.

Where we are

Core extraction pipeline works. Stripe billing and per-tenant rate limiting are in. Public launch is gated on an SLA we can confidently sell — we’re currently soak-testing against a target of 99.5% per-page accuracy on the standard test set.
