AI Note-Taking for Sales Reps

Context

Orum's core product accelerates cold calling for outbound sales teams — connecting SDRs to live prospects faster so they can make more calls per day. The primary user is an outbound SDR running high-volume call blocks, often 100+ dials in a session.

SDRs are expected to both make calls and maintain detailed prospect and account-level notes after every conversation. The problem: they only did it 40% of the time. Not because they didn't care — because the post-call workflow ate into time they needed to dial again.

40% of calls resulted in notes being taken

15–20% of an SDR's day spent on data entry

2–6 min spent dispositioning each call

Without reliable notes, organizations couldn't build the account-level intelligence needed to improve their sales motion. The data gap compounded over time — and it was getting worse as call volumes increased.

Hypothesis: By automating post-call note-taking, we can increase SDR note coverage and efficiency — saving time on post-call tasks and allowing SDRs to make more calls during a call block.

Research

The research process combined qualitative interviews, quantitative session analysis, and competitive intelligence — each layer stress-testing the hypothesis before we committed to a solution.

Qualitative: SDR Interviews

I reached out to 25 SDRs across SMB, mid-market, and enterprise segments — including Orum's internal sales team. Three consistent themes emerged:

Tools to increase efficiency and call volume per block
Automation of routine post-call tasks to free time for higher-value work
Better pre-call context to be more prepared when a prospect picks up

"I need a tool that can automatically capture and summarize my calls. Taking notes manually is time-consuming and often distracts me from the conversation."

"I wish there was a way to reduce the administrative tasks after each call. If I could save even a few minutes per call, I could make significantly more calls during my workday."

Quantitative: Session Data

I overlaid the qualitative findings with data from thousands of user sessions in Heap and session replay analysis. The numbers confirmed the pattern: post-call disposition was the single biggest time sink in the SDR workflow — and the first thing cut when quota pressure hit.

Data entry tasks collectively consumed 15–20% of an SDR's working day. That's not a minor inefficiency — that's a structural drag on capacity.

Competitive Analysis

Win/loss reports showed competitors were innovating faster in this space. Orum was largely at parity with the field and hadn't moved meaningfully beyond its initial product-market fit. The competitive read: build something that adds strategic user value, not just feature parity.

Research Conclusion

Post-call tasks are the largest time sink in an SDR's day — but also the first to be dropped when quotas need to be met. SDRs value preparation. Better capture before, during, and after a call correlates directly with better performance. The problem was real, validated, and worth solving.

Solution

I mapped the problem space into an Opportunity Solutions Tree. The outcome: improve the efficiency and accuracy of SDR note-taking. Three opportunity branches emerged, each with candidate solutions.

Opportunity

Prompts & Suggestions

Note template

Questionnaire

Opportunity

Automated Entry

Post-call AI note-taking ✓

Opportunity

Real-Time Assistance

Streaming conversational analysis

The prioritization decision was clear on an impact/effort matrix: AI post-call notes sat in the high-impact / low-effort quadrant. Real-time conversational analysis was technically compelling but high-effort with lower immediate user impact — deferred to the roadmap.

The chosen solution: use generative AI to analyze the call transcript post-call and surface a structured summary including talk time ratio, pain points, next steps, and follow-up items. The SDR could accept, edit, or discard before logging. No manual note-taking required.

The key design principle: the summary should appear as the call is ending, with minimal latency, so the SDR can move immediately to the next dial.

Building It

I worked in tight collaboration with the Lead Product Designer and engineering team from the start. The build moved through three distinct phases.

Proof of Concept

Engineering · Product

Created a local POC using training data from internal SDRs. I manually fed transcripts into an OpenAI prompt to generate summaries and compared the output against notes the SDRs had actually saved. This confirmed feasibility before any engineering resources were committed.

Prompt Engineering

Product · Engineering

I served as the prompt engineer throughout the project. The goal: structured extraction with minimal hallucinations and artifacts. We reverse-engineered the process — running real transcripts through prompts and comparing against actual SDR notes to iterate toward accuracy.

One key tension: increasing determinism (to reduce hallucinations) decreased the creativity of summaries, which produced its own quality issues. We landed on an 80% similarity threshold as the right balance — accurate enough to be useful, flexible enough to handle the variety of real conversations.

Pipeline Integration

Engineering

Connected pipelines to support streaming transcription, allowing the AI summary to be generated in near-real time as the call ended. Leveraged the existing design system component library to minimize frontend build time and hit the 4–6 week MVP target.

Design and Validation

Design · Product

Built four low-fidelity mockups with the Lead Product Designer, ran internal testing to refine design and UX placement, then developed medium-fidelity prototypes for structured user testing before finalizing the spec.

Launch

Pre-Release: Shadow Testing

Before shipping any UI, we released the feature to production without a visible interface. The AI was generating summaries in the background, and we compared them against the notes SDRs were actually saving. We evaluated similarity for every conversation and used those scores to drive nightly prompt refinements.

Target: 80% similarity. Attempts to push higher led to increased determinism and worse output quality — a real tradeoff that shaped the final model configuration.

Internal Beta

Launched with Orum's internal SDR team. Evaluated similarity scores each night and shipped prompt updates in response. Initial struggles with hallucinations and artifact-heavy outputs required rapid iteration on the creativity/determinism balance. Over two weeks, similarity ratings improved from 70% to 85%. That signal cleared the bar for external release.

External Beta

Released as a beta feature to clients on the top-tier package. Initial interest was strong. Two early challenges required rapid response:

UI placement: The placement of the AI summary panel wasn't immediately obvious to users — slowing initial adoption. Refined quickly based on session data.
Data concerns: Enterprise clients raised questions about whether their call data would be used to train the AI model. Addressed directly in the GTM messaging and onboarding flow.
Infrastructure: Network congestion caused API latency spikes. Implemented defensive coding to ensure the summarization process wouldn't be delayed or lost.

GTM

Built the GTM motion in parallel with engineering: LinkedIn posts to drive feature awareness, an email drip campaign to existing clients, and an Appcues onboarding flow to guide first-time users through the experience. Closed the loop with a post-use survey and an option to book time to discuss usability directly.

Post-Launch Refinement

Incorporated RLHF (human feedback as a reward signal to adjust model behavior) and HITL (human-in-the-loop intervention for edge cases) to continue improving accuracy after GA. Both mechanisms helped the model handle the long tail of unusual call scenarios without requiring a full model retrain.

Results

Within the first three months post-launch:

30% increase in SDR efficiency

90% note coverage (up from 40% baseline)

The feature became one of the top-cited reasons customers renewed. The gain wasn't just time saved per call — it was the downstream effect on account intelligence. Sales leaders and RevOps teams finally had reliable data to work with.

Lessons

01 Enterprise users need opt-out. Power users have established note-taking systems they trust. Forcing AI-generated notes on them created friction. Opt-out — especially at the account admin level — should have been in the MVP.

02 UI placement matters as much as the feature. We underestimated how much the placement of the summary panel affected adoption. Users who missed it in the first session were less likely to come back to it. First-time UX placement deserved more testing before GA.

03 AI sentiment inference is hard. The model could summarize facts reliably, but inferring sentiment from a sales conversation — whether a prospect was genuinely interested or just being polite — was inconsistent. A problem worth revisiting with RAG.

04 Determinism and creativity are in tension. Pushing the model toward higher accuracy made summaries more rigid and less useful in edge cases. The 80% similarity threshold was the right call — but it took real data to confirm it, not intuition.

05 Next: RAG. Accuracy is good without a custom language model, but retrieval-augmented generation would allow summaries to incorporate account-level context from the CRM — making them significantly more useful for multi-touch prospects.