The retrofit approach
The most common mistake with AI integration is treating it as an architecture decision. It isn’t. The LLM is a function call. You add one new endpoint, route the relevant context to it, get a structured response back, and write it to your existing database. Your frontend doesn’t change. Your schema doesn’t change. Your users get a new capability.
We’ve run this pattern across email triage, missed-call follow-up, content automation, CV scoring, and subscription cancellation — all as retrofits onto existing backends. The approach is proven and the timelines are predictable.
The three retrofits worth doing first
1. Draft generation
Anywhere your users write from scratch — replies, reports, summaries, proposals — an LLM can produce a first draft that reduces their work to a ten-second edit. Draft generation is the lowest-risk first move: the user sees the output before it goes anywhere, so a bad draft costs nothing but a click.
2. Classification and triage
Anywhere you have incoming items that need routing, prioritising, or labelling — support tickets, leads, emails, documents — a classifier prompt outperforms rules-based logic on edge cases by a significant margin. The output is structured (a label, a score, a category), which slots directly into your existing schema without UI changes.
3. Structured extraction
Anywhere unstructured text needs to become structured data — receipts, contracts, forms, call transcripts — an LLM with a well-designed output schema replaces a fragile regex pipeline. Claude’s tool use and structured outputs make this reliable enough for production.
How the engagement works
Week one: discovery and scope
We map the workflow end-to-end, identify the single highest-ROI insertion point for AI, and agree on a success metric before we write a line of code. We also build the eval set — 30–50 real examples from your data that we’ll use to measure output quality throughout the build.
Weeks two and three: build
We build behind a feature flag. Your existing workflow doesn’t change; the AI version runs in parallel until we’ve validated it against the eval set. Daily written updates. No surprises.
Week four onward: rollout and measure
We enable the feature flag for a subset of users, monitor output quality and cost, and either roll out fully or iterate. You get a handover document with the prompt history, eval results, and cost projections before we close the engagement.
How we measure success
We agree on a primary success metric before we start — success rate against historical data, time saved per action, deflection rate, or model cost per successful outcome. We run the eval before full rollout. If the metric isn’t met, we don’t charge for the rollout sprint.
Frequently asked questions
Do you need access to our codebase?
We need to understand your data model and your API surface. Whether we work in your repo or build a standalone service that integrates with your API depends on your preference and your security requirements.
What if the first workflow doesn’t perform well?
We agree on the eval criteria upfront. If performance doesn’t hit the target, we iterate on the prompt, the context strategy, or the model choice before rollout. We don’t ship something we can’t measure.
How do you handle data privacy?
We route all LLM calls through a self-hosted LiteLLM gateway by default, which means your data stays within a controlled boundary. For sensitive data, we can use on-premise models or Anthropic’s enterprise API with data processing agreements in place.
Can you work with our existing backend language/framework?
We work primarily in Python (FastAPI, Django) and Node.js (Fastify, Express). For integrations with other stacks, we can build a standalone AI microservice that your existing backend calls — language-agnostic from your side.