How to Add AI to an Existing SaaS Without Rewriting It

The rewrite trap

The first thing most founders want to do when they decide to add AI is rewrite their product. New architecture, new stack, AI-native from the ground up. This is almost always the wrong call. You have a working product, paying customers, and production data. A rewrite puts all three at risk for the sake of catching up to a trend.

The better question is: which one workflow in your product would be meaningfully better if an LLM touched it? Start there. Build a clean boundary around it. Swap out the old logic for an AI-powered endpoint. Ship it. Then pick the next one.

The three workflows worth retrofitting first

After building Email Triage, Everyring.ai, and several client retrofits, we keep landing on the same three categories as the highest-value first moves:

1. Draft generation

Anywhere your users are writing from scratch — replies, reports, summaries, proposals — an LLM can produce a first draft that your user edits down to 10 seconds of work. The AI doesn’t need to be right, it needs to be 80% right and fast. Draft generation is the single easiest retrofit because the blast radius of a bad output is low: the user sees it before it goes anywhere.

2. Classification and triage

Anywhere you have incoming items that need to be routed, prioritised, or labelled — support tickets, leads, emails, documents — a classifier prompt outperforms rules-based systems by a wide margin on edge cases. The output is structured (a label, a score, a category), which means it slots directly into your existing database schema. No UI changes required.

3. Structured extraction

Anywhere unstructured text needs to become structured data — receipts, contracts, forms, call transcripts — an LLM with a well-designed output schema replaces a fragile regex pipeline. Tools like Claude’s tool use or structured outputs make this reliable enough for production.

The integration pattern

The implementation is simpler than it looks. You add one new API endpoint to your existing backend. That endpoint receives the relevant context (the email, the ticket, the document), calls an LLM with a prompt you control, and returns a structured response. Your existing frontend and database don’t change. The LLM is a function call, not an architecture.

In pseudo-code, the endpoint looks like this:

POST /api/ai/triage
Input: { item_id, content, context }
→ fetch full context from your DB
→ build prompt
→ call LLM API
→ parse structured response
→ write result back to your DB
Output: { priority, category, suggested_action }

Your existing UI reads from the same database it always has. The AI result is just another field.

What actually takes time

The API call is not the work. The work is:

Prompt engineering. Getting the model to produce consistently structured output on your specific data distribution takes iteration. Budget a week of prompt work before you call it done.
Handling model failures. LLMs occasionally return malformed JSON, refuse to answer, or produce outputs outside your expected range. Your retry logic and fallback behaviour matters.
Evaluation. You need a way to measure whether the AI output is good. Eyeballing a sample isn’t enough at scale. Even a simple spreadsheet-based eval set of 50 real examples beats nothing.
Cost monitoring. Token costs at low volume feel trivial. At 10,000 calls/day they are not. Wire up cost tracking before you launch.

Use a gateway, not direct API calls

Don’t hard-code calls to api.anthropic.com directly. Route through a gateway like LiteLLM or OpenRouter. You get model switching without code changes, request logging, rate-limit handling, and cost tracking for free. When Anthropic releases a better model next quarter, you change one config value, not ten API call sites.

We self-host LiteLLM across our entire portfolio. Every product routes through the same gateway. This is part of our AI infrastructure work for clients too — it’s the first thing we stand up.

A realistic timeline

A single well-scoped AI workflow — from discovery to production — takes four weeks in our experience. Week one is discovery and architecture. Weeks two and three are the build. Week four is integration testing, evaluation, and deploy. If it’s taking longer than that, the scope is too broad.

If you want to talk through what that looks like for your product, drop us a note. We’ll tell you honestly whether your workflow is a good first candidate.

We build production AI, not prototypes. If you’re looking to ship something like what’s described here — see how we work or start a project brief →