DLC-M05-001 · Analytics Break-Even Calculator.xlsx
Formula bar, cell C8: =IF(AnalyticsSpend < LinearCost - AnalyticsCost, "YES", "NO")
DLC-M05-001 · Live Calculator
Analytics Break-Even Calculator
Matter:
Legend: Input (yellow), Calculated (green). Adjust yellow cells; green cells recompute automatically.
Corpus & Baseline

| Parameter | Value | Unit | Notes |
| --- | --- | --- | --- |
| Corpus size | | docs | Total documents eligible for review after processing. |
| Expected richness | | decimal | Best guess at responsive rate. |
| Linear review cost per doc | | USD | Contract reviewer, all-in. Include QC and platform. |
| Baseline linear-review cost | | USD | N × cost/doc. |
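
The baseline row can be sketched in a few lines of Python. This is a minimal illustration rather than the workbook's actual implementation: it assumes N is the corpus size from the first row, and the function names and sample figures (100,000 docs, $1.25 per doc, 10% richness) are hypothetical.

```python
def baseline_linear_cost(corpus_size: int, cost_per_doc: float) -> float:
    """Baseline linear-review cost, read as N x cost/doc (N assumed = corpus size)."""
    if corpus_size < 0 or cost_per_doc < 0:
        raise ValueError("corpus_size and cost_per_doc must be non-negative")
    return corpus_size * cost_per_doc


def expected_responsive_docs(corpus_size: int, richness: float) -> int:
    """Expected number of responsive documents, given richness as a decimal (0..1)."""
    if not 0.0 <= richness <= 1.0:
        raise ValueError("richness must be a decimal between 0 and 1")
    return round(corpus_size * richness)


# Illustrative inputs only; these are not real matter figures.
corpus_size = 100_000
cost_per_doc = 1.25
richness = 0.10

print(f"Baseline linear-review cost: ${baseline_linear_cost(corpus_size, cost_per_doc):,.2f}")
print(f"Expected responsive docs: {expected_responsive_docs(corpus_size, richness):,}")
```

Keeping richness as a decimal rather than a percentage matches the Unit column and avoids a stray factor of 100.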
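Analytics Options: Compare

| Analytic | Fixed Cost | Per-Doc | Docs to Process | Total Cost | Docs Eliminated | Break-Even? |
| --- | --- | --- | --- | --- | --- | --- |
| Near-duplicate detection | | | | | % | |
| Email threading (inclusive) | | | | | % | |
| Concept clustering | | | | | % | |
| Communication analytics | | | | | % | |
| Entity extraction / PII | | | | | % | |
| TAR 2 · Continuous Active Learning | | | | | % | |
| GenAI first-pass summarization | | | | | % | |
| GenAI Q&A over corpus | | | | | % | |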
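
One way to read a row of this comparison is sketched below in Python. The column relationships are assumptions, not taken from the workbook: Total Cost is treated as Fixed Cost plus Per-Doc times Docs to Process, Docs Eliminated as Docs to Process times an elimination rate, and an analytic "breaks even" when its total cost is less than the linear-review cost of the documents it removes. The class name, field names, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AnalyticOption:
    """One row of the comparison table. All field names are hypothetical."""
    name: str
    fixed_cost: float        # one-time cost, USD
    per_doc_cost: float      # USD per document processed
    docs_to_process: int
    elimination_rate: float  # fraction of processed docs removed from review (0..1)

    def total_cost(self) -> float:
        # Assumed: Total Cost = Fixed Cost + Per-Doc x Docs to Process
        return self.fixed_cost + self.per_doc_cost * self.docs_to_process

    def docs_eliminated(self) -> int:
        # Assumed: Docs Eliminated = Docs to Process x elimination rate
        return round(self.docs_to_process * self.elimination_rate)

    def breaks_even(self, linear_cost_per_doc: float) -> bool:
        # The analytic pays for itself only if the review cost it removes
        # exceeds what it costs to run.
        review_cost_avoided = self.docs_eliminated() * linear_cost_per_doc
        return self.total_cost() < review_cost_avoided


# Illustrative rows; prices and rates are made up for the example.
near_dupe = AnalyticOption("Near-duplicate detection", 500.0, 0.02, 100_000, 0.25)
qa = AnalyticOption("GenAI Q&A over corpus", 5_000.0, 0.10, 100_000, 0.0)

for option in (near_dupe, qa):
    verdict = "YES" if option.breaks_even(linear_cost_per_doc=1.25) else "NO"
    print(f"{option.name}: total ${option.total_cost():,.2f}, "
          f"eliminated {option.docs_eliminated():,}, break-even? {verdict}")
```

An analytic that eliminates no documents can never pass this test, which is why a task-driven tool such as Q&A should be justified on other grounds.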
Recommended Stack & Total

| Metric | Value |
| --- | --- |
| Analytics that break even | |
| Total analytics spend (checked) | |
| Estimated docs eliminated (combined) | |
| Estimated total cost with analytics | |
| Savings vs. linear review | |
Guidance

  1. Elimination rates overlap, so they are not additive. Applying near-duplicate detection AND email threading does not eliminate 55% of documents, because the two overlap. This calculator uses the maximum single elimination rate as the conservative floor; use in-tool testing to confirm actual overlap.
  2. When analytics fail to break even: the corpus is likely too small, richness is too low, or the analytic is priced for a larger matter than yours. Ask the vendor if a fixed-fee model exists for corpora your size.
  3. Do not stack GenAI Q&A with GenAI summarization. They solve different problems, and combined they inflate cost without meaningfully reducing linear review.
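
The conservative floor described in the guidance can be sketched as follows. This is an illustration under stated assumptions, not the workbook's formulas: the combined elimination is taken as the maximum single rate among the selected analytics (as the guidance says), and the total cost with analytics is assumed to be analytics spend plus linear review of the remaining documents. The function name, tuple layout, and sample rates (25% and 30%, chosen so that a naive sum gives 55%) are hypothetical.

```python
from typing import Dict, List, Tuple

# Each selected analytic: (name, total cost in USD, elimination rate as a fraction).
Stack = List[Tuple[str, float, float]]


def stack_summary(corpus_size: int, cost_per_doc: float, stack: Stack) -> Dict[str, float]:
    """Summarize a selected stack using the max-single-rate floor."""
    spend = sum(cost for _, cost, _ in stack)
    # Conservative floor: rates overlap, so take the largest one, never the sum.
    floor_rate = max((rate for _, _, rate in stack), default=0.0)
    eliminated = round(corpus_size * floor_rate)
    baseline = corpus_size * cost_per_doc
    # Assumed: remaining documents still go through linear review.
    total_with_analytics = spend + (corpus_size - eliminated) * cost_per_doc
    return {
        "analytics_spend": spend,
        "docs_eliminated": eliminated,
        "total_cost_with_analytics": total_with_analytics,
        "savings_vs_linear": baseline - total_with_analytics,
    }


# Illustrative stack; costs and rates are made up.
stack = [
    ("Near-duplicate detection", 2_500.0, 0.25),
    ("Email threading (inclusive)", 4_000.0, 0.30),
]
summary = stack_summary(corpus_size=100_000, cost_per_doc=1.25, stack=stack)
for metric, value in summary.items():
    print(f"{metric}: {value:,.2f}")
```

With these inputs the floor eliminates 30,000 documents rather than the 55,000 a naive sum would suggest, so the savings estimate stays on the cautious side until in-tool testing measures the real overlap.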
Numbers are illustrative. Real vendor pricing varies. Use DLC-M05-002 for a snapshot of published 2026-Q1 rates.
DLC-M05-001 · Printed Reference
Analytics Break-Even — Rules of Thumb
Rev.

The core question

Every analytic — from near-dupe detection to GenAI Q&A — has a cost. That cost is worth paying only when it eliminates more review time than it costs to run. This reference gives you the shortcuts to answer that question at the SOW stage, before you commit.

Break-even matrix

| Analytic | Min corpus | Sweet spot | Best when |
| --- | --- | --- | --- |
| Near-duplicate | Any | Any | Always. It's cheap and it works. |
| Email threading (inclusive) | > 5k emails | 50k–2M emails | Volume is email-heavy; threading is standard practice. |
| Concept clustering | ~30k | 75k–500k | Corpus is text-rich and you don't know the vocabulary yet. |
| Communication analytics | ~10k emails/msgs | 50k+ comms | Custodian identification matters; social-graph analysis needed. |
| Entity extraction / PII | Any | Any regulated-data matter | You need to find named entities or PII at scale. |
| TAR 2 · CAL | ~30k | 75k–2M | Corpus is large, richness is moderate, reviewer hours are the constraint. |
| GenAI summarization | Any | 10k–200k | Documents are long-form (contracts, reports); reviewer time on first-read is the bottleneck. |
| GenAI Q&A | Any | Not size-driven; task-driven | Investigation phase, deposition prep, or issue-focused fact-finding. |
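
For quick triage at the SOW stage, the size columns of the matrix can be encoded as a lookup. The sketch below is only an illustration: it treats approximate thresholds such as "~30k" and "> 5k" as hard cutoffs, ignores the difference between counting emails and counting documents, and uses hypothetical names (`Rule`, `RULES`, `classify`). The "Best when" column still needs human judgment.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    min_corpus: int            # 0 stands for "Any"
    sweet_low: Optional[int]   # None when the sweet spot is not a size range
    sweet_high: Optional[int]  # None for an open-ended range such as "50k+"


# Thresholds copied from the matrix; "~" and ">" are simplified to hard cutoffs.
RULES = {
    "Near-duplicate": Rule(0, None, None),
    "Email threading (inclusive)": Rule(5_000, 50_000, 2_000_000),
    "Concept clustering": Rule(30_000, 75_000, 500_000),
    "Communication analytics": Rule(10_000, 50_000, None),
    "Entity extraction / PII": Rule(0, None, None),
    "TAR 2 · CAL": Rule(30_000, 75_000, 2_000_000),
    "GenAI summarization": Rule(0, 10_000, 200_000),
    "GenAI Q&A": Rule(0, None, None),
}


def classify(analytic: str, count: int) -> str:
    """Place a corpus size relative to the matrix's minimum and sweet spot."""
    rule = RULES[analytic]
    if count < rule.min_corpus:
        return "below minimum"
    if rule.sweet_low is None:
        return "size is not the deciding factor"
    if count < rule.sweet_low:
        return "above minimum, below sweet spot"
    if rule.sweet_high is not None and count > rule.sweet_high:
        return "above sweet spot"
    return "sweet spot"


# Example: triage a hypothetical 20,000-item corpus across every analytic.
for name in RULES:
    print(f"{name}: {classify(name, 20_000)}")
```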

What vendors tend to charge for

  1. Fixed setup / index build — often one-time per matter.
  2. Per-document processed — the recurring cost that scales with corpus.
  3. Per-query, per-search — some AI-enabled search charges apply here.
  4. Per-model-call — GenAI tools price per LLM API call, sometimes with token pass-through.
  5. Regex vs. AI-search premiums — many platforms charge more for AI-enabled searches than for regular boolean; ask.
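
When comparing quotes, it can help to roll these charge types into one number. The Python sketch below assumes a simple additive model (setup fee plus each unit price times its volume); real contracts may include minimums, tiers, or bundles that this ignores. `VendorQuote`, `quote_total`, and every price shown are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VendorQuote:
    """Unit prices from a quote, in USD. Field names are hypothetical."""
    setup_fee: float = 0.0               # fixed setup / index build
    per_doc: float = 0.0                 # per document processed
    per_query: float = 0.0               # per query or search
    per_model_call: float = 0.0          # per LLM API call
    token_passthrough_per_1k: float = 0.0  # pass-through per 1,000 tokens, if any


def quote_total(quote: VendorQuote, docs: int, queries: int = 0,
                model_calls: int = 0, tokens: int = 0) -> float:
    """Estimated total for a matter under a purely additive pricing model."""
    return (
        quote.setup_fee
        + quote.per_doc * docs
        + quote.per_query * queries
        + quote.per_model_call * model_calls
        + quote.token_passthrough_per_1k * tokens / 1000
    )


# Illustrative quote and volumes only.
quote = VendorQuote(setup_fee=1_000.0, per_doc=0.05, per_query=0.50)
total = quote_total(quote, docs=50_000, queries=200)
print(f"Estimated total: ${total:,.2f}")
```

Running the same volumes through each vendor's quote makes it easier to see which charge type dominates and therefore where to negotiate.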
The pricing trap

Vendors quote list price. On matters of any size, negotiate. On matters of significant size, do not accept the first quote; competitive pricing exists. Fair is fair.