The Problem
Orum's parallel dialer connects sales reps to live prospects at high volume — the core product worked. But after every call, SDRs were expected to log notes that would inform the next touch, help sales leaders coach, and give RevOps the account-level data they needed.
The data told a clear story: SDRs were only taking notes 40% of the time, and spending 2–6 minutes on post-call disposition when they did. That's 15–20% of a rep's working day on a task they kept cutting when quota pressure hit.
Without reliable notes, organizations couldn't build the account intelligence to improve their sales motion. The data gap compounded — and Orum's competitors were moving faster on this exact problem.
The Insight
Most AI product failures aren't model failures — they're context failures. The model gets thin inputs and produces thin outputs, and the team blames the technology. The fix isn't a better model. It's better context architecture.
Orum's first attempt at AI meeting summaries generated them directly from raw transcripts, and the results were predictably poor: speaker order was jumbled, filler was included, and there was no signal about what actually happened. Users opened the summaries once and stopped using them.
The problem wasn't the AI. It was that we were asking the model to summarize everything rather than extract specific things. The gap between "summarize this call" and "extract the talk time ratio, key objections, committed next steps, and a coaching signal" is the difference between a feature that fails and one that sticks.
The Approach
I ran discovery with AEs and SDRs across market segments, plus Orum's internal sales team — 25 reps total. Three themes were consistent:
- Tools that increased call volume, not just note quality
- Automation of routine tasks so they could focus on higher-value conversation work
- Better pre-call context to be more prepared when a prospect picked up
The research reframed the product question. The goal wasn't "better notes" — it was saving 2–5 minutes per call so reps could dial again faster. Notes were the mechanism, not the outcome.
This led to the core design: AI-generated summaries structured around specific outputs that reps and managers actually used — not a generic summary of everything that was said.
Building It
Prompt Architecture
I served as the prompt engineer throughout. The structured extraction approach replaced the open-ended summarization with a specific schema: talk time ratio, key objections raised, committed next steps, and a coaching signal based on how the rep handled pivots.
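As a minimal sketch of what such a schema might look like, the four fields named above can be expressed as a typed structure plus a prompt that asks for exactly those fields. The field types, the JSON reply format, and the names `CallSummary`, `build_extraction_prompt`, and `parse_summary` are assumptions for illustration, not Orum's actual implementation.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CallSummary:
    """The four outputs named in the text; the types are assumptions."""
    talk_time_ratio: float  # assumed: rep's share of talk time, 0.0 to 1.0
    key_objections: list[str] = field(default_factory=list)
    committed_next_steps: list[str] = field(default_factory=list)
    coaching_signal: str = ""


def build_extraction_prompt(transcript: str) -> str:
    """Ask for specific fields rather than an open-ended summary."""
    return (
        "Extract the following from the sales call transcript and reply "
        "with JSON only, using exactly these keys:\n"
        "- talk_time_ratio: fraction of talk time used by the rep (0 to 1)\n"
        "- key_objections: list of objections the prospect raised\n"
        "- committed_next_steps: list of next steps both sides agreed to\n"
        "- coaching_signal: one sentence on how the rep handled pivots\n\n"
        f"Transcript:\n{transcript}"
    )


def parse_summary(raw_json: str) -> CallSummary:
    """Validate the model's reply against the schema before storing it."""
    data = json.loads(raw_json)
    ratio = float(data["talk_time_ratio"])
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"talk_time_ratio out of range: {ratio}")
    return CallSummary(
        talk_time_ratio=ratio,
        key_objections=list(data.get("key_objections", [])),
        committed_next_steps=list(data.get("committed_next_steps", [])),
        coaching_signal=str(data.get("coaching_signal", "")),
    )
```

Validating the reply on the way in means a malformed or out-of-range extraction fails loudly instead of reaching a rep as a plausible-looking summary.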
Getting this right required iterating against real call transcripts: running them through prompts and comparing the output against notes the SDRs had actually saved. The saved notes served as the reference, so accuracy was measured against what reps themselves chose to record.
The Determinism Trade-off
One unexpected constraint: pushing for higher accuracy (increasing determinism) made summaries more rigid and less useful for the long tail of unusual call types. We landed on an 80% similarity threshold — accurate enough to be relied on, flexible enough to handle edge cases. Attempting to push higher degraded output quality in ways that were harder to explain to users than "sometimes it misses something."
Shadow Testing
Before shipping any UI, we released the feature to production without a visible interface. The AI generated summaries in the background; we compared them against what reps actually saved and evaluated similarity nightly. This gave us a real signal on accuracy before any user ever saw the output.
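The background comparison could be sketched roughly as follows. The source does not say how similarity was measured, so this example assumes a token-level `difflib.SequenceMatcher` ratio as a stand-in metric. `normalize`, `similarity`, and `shadow_compare` are hypothetical names.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase and split into word tokens so punctuation and case don't count."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(generated: str, saved: str) -> float:
    """Return a 0.0 to 1.0 score for token overlap (assumed metric, not Orum's)."""
    return SequenceMatcher(None, normalize(generated), normalize(saved)).ratio()


def shadow_compare(pairs: list[tuple[str, str]]) -> list[float]:
    """Score each (AI summary, rep-saved note) pair collected in the background."""
    return [similarity(generated, saved) for generated, saved in pairs]
```

A lexical ratio like this penalizes paraphrase, so a production evaluation might prefer a semantic measure. The point of the sketch is only the shape of the loop: generate silently, compare against what reps saved, and score.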
Internal Beta
We launched with Orum's internal SDR team, evaluated similarity scores each night, and shipped prompt refinements in response. Similarity improved from 70% to 85% over two weeks, which cleared the bar for external release.
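A nightly check of this kind might be sketched as below. It assumes the 80% similarity threshold doubled as the release bar and that per-call scores were averaged. The text implies the first and states neither. `RELEASE_THRESHOLD` and `nightly_report` are hypothetical names.

```python
from statistics import mean

RELEASE_THRESHOLD = 0.80  # assumed to be the 80% similarity threshold from the text


def nightly_report(scores: list[float], threshold: float = RELEASE_THRESHOLD) -> dict:
    """Summarize one night's per-call similarity scores against the bar."""
    if not scores:
        raise ValueError("no scores to evaluate")
    avg = mean(scores)
    return {
        "mean_similarity": avg,
        # Calls under the bar are the ones worth reading before the next prompt revision.
        "below_threshold": sum(score < threshold for score in scores),
        "meets_bar": avg >= threshold,
    }
```

Tracking the count of low-scoring calls alongside the mean helps separate "the prompt is broadly off" from "a few unusual call types are dragging the average down."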
External Launch
We shipped the feature as a beta for top-tier package clients. Initial interest was strong, but two early challenges required rapid iteration: UI placement wasn't immediately obvious (slowing adoption in the first week), and enterprise clients raised data privacy questions about call recording. Both were addressed before GA.
Post-launch, we incorporated RLHF and human-in-the-loop processes to keep improving accuracy without requiring a full model retrain.
Results
Within the first three months post-launch, the feature became the top-cited reason customers renewed. The efficiency gain wasn't just time saved per call. It was also the downstream effect: sales leaders finally had reliable account-level data to work with, and RevOps could actually build on it.