The problem
Most OCR is good enough for printed paragraphs and useless for everything else. The moment a document has a table, a handwritten signature, a stamp, or a column break, accuracy collapses and someone has to re-key the page by hand. For businesses processing thousands of scans a week, that manual rework is a real, recurring cost.
What we’re building
Screenshot to Text is a Flask-based SaaS that accepts images and PDFs and runs them through a layered pipeline: Tesseract for the easy text, GPT-4 Vision for the hard cases (tables, handwriting, mixed layouts). It returns structured output with a confidence score for each block. Stripe handles billing on a tiered per-page plan. Cloudflare R2 stores the originals.
The AI angle
Two-pass extraction. The first pass is deterministic OCR; the second pass uses GPT-4 Vision only on regions the first pass scored below a confidence threshold. This keeps cost down (Vision calls are expensive) while raising the floor on accuracy where it matters.
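The routing logic can be sketched roughly as follows. This is a minimal illustration, not the production code: the Block structure, the two_pass_extract function, the 0.80 threshold, and the stubbed fake_vision_pass are all hypothetical names and values chosen for the example. The second pass is passed in as a plain callable so the sketch runs without Tesseract or any API key.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Block:
    """One extracted region of a page (hypothetical structure)."""
    region: Tuple[int, int, int, int]  # x, y, width, height in pixels
    text: str
    confidence: float                  # 0.0 to 1.0
    source: str = "ocr"                # which pass produced the text


def two_pass_extract(
    first_pass_blocks: List[Block],
    vision_pass: Callable[[Block], Block],
    threshold: float = 0.80,           # assumed value, for illustration only
) -> List[Block]:
    """Keep confident first-pass blocks; re-run only the weak ones."""
    results = []
    for block in first_pass_blocks:
        if block.confidence >= threshold:
            results.append(block)               # cheap path: trust the OCR
        else:
            results.append(vision_pass(block))  # expensive path: second pass
    return results


def fake_vision_pass(block: Block) -> Block:
    """Stand-in for the second pass; a real one would call a vision model."""
    return replace(block, text="Total: 42.00", confidence=0.97, source="vision")


page = [
    Block((0, 0, 600, 40), "Invoice 1001", 0.98),
    Block((0, 50, 600, 200), "T0tal: 4Z.OO", 0.41),
]
extracted = two_pass_extract(page, fake_vision_pass)
vision_calls = sum(1 for b in extracted if b.source == "vision")
```

In this example only the second block falls below the threshold, so one of the two blocks triggers a second-pass call. Recording the source on each block also makes it possible to count, per page, how many regions needed the expensive path.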
How it’ll be used
- Bookkeeping firms processing client receipts at scale.
- Healthcare admin teams digitising patient intake forms.
- Legal teams with handwritten case notes that need to be searchable.
Where we are
Core extraction pipeline works. Stripe billing and per-tenant rate limiting are in. Public launch is gated on an SLA we can confidently sell — we’re currently soak-testing against a target of 99.5% per-page accuracy on the standard test set.