The test before you build
Before scoping any AI feature, we ask one question: can you define what success looks like as a number you already track? Not a new metric created to justify the feature — an existing one. Time per task, support tickets opened, churn rate, content output per headcount, conversion rate.
If you can’t name the number before you build, you almost certainly won’t be able to attribute a change in it to the AI feature afterwards. The feature gets labelled “hard to measure”, which is usually true, and usually means it never shows up in your P&L.
Features that reliably move the needle
Support deflection
An AI that handles the most common 40% of support queries before they reach a human agent is directly measurable: tickets per day, resolution time, cost per ticket. The constraint is that it has to work reliably on the common cases: a user who gets a bad answer from the AI and then has to escalate costs you more than the original ticket would have. Build the deflection layer for the easy, repetitive queries only. Let humans handle the rest.
This is the AI feature with the clearest and fastest ROI calculation in B2B software. You know your support cost per ticket. You can measure deflection rate. The maths writes itself.
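To make that arithmetic concrete, here is a minimal sketch in Python. Every name and number in it is an assumption chosen for illustration, not a figure from a real deployment: `escalation_penalty` stands for the extra cost of a ticket the AI answered badly before a human picked it up, and `cost_per_ai_answer` for the running cost of each attempted answer.

```python
def monthly_deflection_savings(
    tickets_per_month: float,
    attempt_rate: float,           # share of tickets the AI tries to answer (e.g. 0.4)
    ai_success_rate: float,        # share of attempts resolved without a human
    cost_per_human_ticket: float,
    cost_per_ai_answer: float,     # hypothetical running cost per attempt
    escalation_penalty: float,     # hypothetical extra cost when a bad answer escalates
) -> float:
    """Net monthly saving, in the same currency as the cost inputs."""
    attempted = tickets_per_month * attempt_rate
    resolved = attempted * ai_success_rate
    escalated = attempted - resolved
    # Resolved tickets save a full human ticket. Escalated ones still cost
    # the human ticket they always would have, plus the penalty on top.
    return (
        resolved * cost_per_human_ticket
        - attempted * cost_per_ai_answer
        - escalated * escalation_penalty
    )


def break_even_success_rate(
    cost_per_human_ticket: float,
    cost_per_ai_answer: float,
    escalation_penalty: float,
) -> float:
    """Success rate at which the deflection layer neither saves nor loses money."""
    return (cost_per_ai_answer + escalation_penalty) / (
        cost_per_human_ticket + escalation_penalty
    )


if __name__ == "__main__":
    # Illustrative inputs only.
    net = monthly_deflection_savings(10_000, 0.4, 0.85, 6.0, 0.10, 3.0)
    print(round(net, 2))
    print(round(break_even_success_rate(6.0, 0.10, 3.0), 3))
```

With these made-up inputs the net saving comes out at 18,200 per month, and the break-even success rate sits at roughly 34%. The second number is the useful one: the higher the escalation penalty, the more reliable the AI has to be on the queries it attempts, which is the argument for restricting it to the easy, repetitive ones.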
Draft generation for high-volume writing tasks
Anywhere a human writes the same type of thing repeatedly (support replies, status reports, outreach emails, job postings, product descriptions), an AI draft that they edit into a final version cuts the time per item from minutes to seconds. Output per headcount goes up measurably.
This is what our Ghost Writer does for content pipelines: detect the topic, generate the draft, run it through quality checks, publish. A task that took a human two hours now takes two minutes of review.
Lead and ticket triage
Routing and prioritisation are invisible work that adds up fast. A classifier that correctly sorts incoming leads by intent, urgency, or segment means your sales or support team is always working the highest-value item first. The downstream metric — conversion rate, response time to high-priority leads — is something you almost certainly already track.
This is the core of Email Triage: not replacing the human, but making sure the human is always looking at the right thing next.
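As a rough illustration of "the right thing next", the ordering step can be as small as a weighted sort. This is a sketch under stated assumptions, not how Email Triage is implemented: the `urgency` and `value` scores are hypothetical classifier outputs between 0 and 1, and the 0.6 weighting is an arbitrary starting point you would tune against your own conversion or response-time numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    urgency: float  # hypothetical classifier score, 0..1
    value: float    # hypothetical estimate of lead or account value, 0..1


def priority(item: Item, urgency_weight: float = 0.6) -> float:
    """Blend urgency and value into a single score; higher means work it sooner."""
    return urgency_weight * item.urgency + (1 - urgency_weight) * item.value


def work_queue(items: list[Item]) -> list[Item]:
    """Return items ordered so the highest-priority one comes first."""
    return sorted(items, key=priority, reverse=True)


if __name__ == "__main__":
    inbox = [
        Item("a", urgency=0.2, value=0.9),
        Item("b", urgency=0.9, value=0.3),
        Item("c", urgency=0.5, value=0.5),
    ]
    print([item.item_id for item in work_queue(inbox)])
```

The human still does the work. The only thing that changes is the order in which items reach them, which is why the effect shows up in metrics you already have.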
Extraction that eliminates manual data entry
Structured data extracted from unstructured sources — receipts, invoices, call transcripts, contracts — directly replaces human hours. The metric is straightforward: how many documents does a person process per hour before and after. We built Receipt Bridge specifically to eliminate manual expense entry for field teams. The before/after is a concrete headcount-hours number.
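The before/after comparison reduces to one line of arithmetic. A minimal sketch, with hypothetical function and parameter names and made-up example throughput:

```python
def hours_saved_per_month(
    documents_per_month: float,
    docs_per_hour_before: float,
    docs_per_hour_after: float,
) -> float:
    """Human hours freed up each month by faster document processing."""
    if docs_per_hour_before <= 0 or docs_per_hour_after <= 0:
        raise ValueError("throughput must be positive")
    hours_before = documents_per_month / docs_per_hour_before
    hours_after = documents_per_month / docs_per_hour_after
    return hours_before - hours_after


if __name__ == "__main__":
    # Illustrative inputs: 2,000 documents, 20 per hour by hand,
    # 100 per hour when a person only reviews extracted fields.
    print(hours_saved_per_month(2_000, 20, 100))  # prints 80.0
```

Measure the "after" throughput with review time included. If a person still checks every extracted field, that checking is part of the cost.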
Features that rarely show up in the P&L
AI chatbots on marketing sites
Almost universally these look good in demos and produce negligible measurable impact. Visitors who want to contact you use your contact form. The chatbot adds complexity and costs money to run, and the conversion lift is rarely attributable and rarely significant. We’ve seen a few exceptions, when the chat replaces a human doing live sales qualification, but as a default this is low-ROI work.
Summaries nobody asked for
Adding an AI summary to the top of every dashboard widget, every email thread, every document feels valuable. In practice, if users weren’t reading the underlying content before, they’re not going to act on a summary of it either. Summaries earn their keep only when the underlying content is genuinely too long to skim and decisions depend on it.
AI features that require users to change their workflow
If the value of your AI feature depends on users doing something differently — switching to a new interface, adding a step, changing a habit — the adoption curve will kill your ROI measurement before you can make a fair assessment. The best AI features are invisible: they improve an output the user was already getting without changing how they work.
The rule
Build AI features that replace work the system was already doing (classification, triage, first drafts, extraction) before you build AI features that add new work to the system (chat interfaces, proactive recommendations, summaries). The former improves the economics of what you already have. The latter bets on adoption that may not materialise.
If you want a second opinion on which AI feature to build first, send us a note. We have a consistent view on this after 35 products.