Analytics Break-Even Calculator

C8  fx  =IF(AnalyticsSpend<LinearCost-AnalyticsCost, "YES", "NO")

DLC-M05-001 · Live Calculator

Matter:

Legend: Input (yellow), Calculated (green). Adjust yellow cells; green cells recompute automatically.

	A	B	C	D	E	F	G
1	Corpus & Baseline
2	Parameter	Value	Unit	Notes
3	Corpus size		docs	Total documents eligible for review after processing.
4	Expected richness		decimal	Best guess at responsive rate.
5	Linear review cost per doc		USD	Contract reviewer, all-in. Include QC and platform.
6	Baseline linear-review cost	—	USD	N × cost/doc.
7	Analytics Options — Compare
8	Analytic	Fixed Cost	Per-Doc	Docs to Process	Total Cost	Docs Eliminated	Break-Even?
9	Near-duplicate detection			—	—	%	—
10	Email threading (inclusive)			—	—	%	—
11	Concept clustering			—	—	%	—
12	Communication analytics			—	—	%	—
13	Entity extraction / PII			—	—	%	—
14	TAR 2 · Continuous Active Learning			—	—	%	—
15	GenAI first-pass summarization			—	—	%	—
16	GenAI Q&A over corpus			—	—	%	—
17	Recommended Stack & Total
18	Metric	Value
19	Analytics that break even	—
20	Total analytics spend (checked)	—
21	Estimated docs eliminated (combined)	—
22	Estimated total cost with analytics	—
23	Savings vs. linear review	—
24	Guidance
25	Elimination rates overlap, so they are not additive. Applying near-duplicate detection AND email threading does not eliminate 55% of documents (the sum of their individual rates), because the two overlap. This calculator uses the maximum single elimination rate as the conservative floor; use in-tool testing to confirm actual overlap.
26	When analytics fail to break even: the corpus is likely too small, richness is too low, or the analytic is priced for a larger matter than yours. Ask the vendor whether a fixed-fee model exists for corpora of your size.
27	Do not stack GenAI Q&A with GenAI summarization — they solve different problems and combined they inflate cost without meaningfully reducing linear review.
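
The overlap guidance above can be sketched in Python. This is a minimal illustration, not the workbook's actual formula: the function name, the example rates (0.30 and 0.25), and the corpus size are hypothetical, chosen only so that the naive sum comes to 55%.

```python
def combined_elimination_floor(rates):
    """Conservative floor for a stack of analytics.

    `rates` holds one elimination rate per analytic, as decimals in [0, 1].
    Because the analytics overlap, the floor is the largest single rate,
    not the sum of the rates.
    """
    if not rates:
        return 0.0
    for rate in rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"elimination rate out of range: {rate}")
    return max(rates)


# Hypothetical rates, for illustration only.
near_dupe_rate = 0.30
threading_rate = 0.25

# Naive sum: overstates the result because it ignores overlap.
naive_sum = near_dupe_rate + threading_rate

# Conservative floor: the maximum single rate.
floor = combined_elimination_floor([near_dupe_rate, threading_rate])

corpus_size = 100_000  # hypothetical corpus
docs_eliminated = round(corpus_size * floor)
print(f"naive sum: {naive_sum:.0%}, floor: {floor:.0%}, docs: {docs_eliminated}")
```

The true combined rate lies somewhere between the floor and the naive sum; only in-tool testing on the actual corpus shows where.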

Numbers are illustrative. Real vendor pricing varies. Use DLC-M05-002 for a snapshot of published 2026-Q1 rates.

DLC-M05-001 · Printed Reference

Analytics Break-Even — Rules of Thumb

Rev.

The core question

Every analytic, from near-duplicate detection to GenAI Q&A, has a cost. That cost is worth paying only when the review cost it eliminates exceeds what it costs to run. This reference gives you shortcuts for answering that question at the statement-of-work (SOW) stage, before you commit.
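
As a rough sketch of that test, assume an analytic's total cost is a fixed fee plus a per-document charge, and its saving is the number of documents it removes from linear review multiplied by the linear review cost per document. Those formulas, the function names, and every figure below are assumptions for illustration, not values from any vendor or from the calculator itself.

```python
def analytic_total_cost(fixed_cost, per_doc_cost, docs_processed):
    """Total spend on one analytic: fixed fee plus per-document charges."""
    return fixed_cost + per_doc_cost * docs_processed


def review_savings(docs_processed, elimination_rate, linear_cost_per_doc):
    """Linear-review cost avoided for the documents the analytic removes."""
    return docs_processed * elimination_rate * linear_cost_per_doc


def breaks_even(fixed_cost, per_doc_cost, docs_processed,
                elimination_rate, linear_cost_per_doc):
    """True when the review cost avoided exceeds the analytic's total cost."""
    cost = analytic_total_cost(fixed_cost, per_doc_cost, docs_processed)
    saving = review_savings(docs_processed, elimination_rate, linear_cost_per_doc)
    return saving > cost


def break_even_corpus_size(fixed_cost, per_doc_cost,
                           elimination_rate, linear_cost_per_doc):
    """Corpus size (in docs) above which the analytic pays for itself.

    Returns None when the saving per document never exceeds the
    per-document charge, so no corpus size breaks even.
    """
    margin_per_doc = elimination_rate * linear_cost_per_doc - per_doc_cost
    if margin_per_doc <= 0:
        return None
    return fixed_cost / margin_per_doc


# Hypothetical inputs: USD 5,000 fixed, USD 0.02 per doc, 20% eliminated,
# USD 1.50 per doc for linear review.
large_matter = breaks_even(5_000, 0.02, 200_000, 0.20, 1.50)   # True
small_matter = breaks_even(5_000, 0.02, 10_000, 0.20, 1.50)    # False
threshold = break_even_corpus_size(5_000, 0.02, 0.20, 1.50)
print(large_matter, small_matter, round(threshold))
```

With these made-up numbers the same analytic pays off on the 200,000-document matter and loses money on the 10,000-document one, which is the "corpus too small" case in practice.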

Break-even matrix

Analytic	Min corpus	Sweet spot	Best when
Near-duplicate	Any	Any	Always. It's cheap and it works.
Email threading (inclusive)	> 5k emails	50k–2M emails	Volume is email-heavy; threading is standard practice.
Concept clustering	~30k	75k–500k	Corpus is text-rich and you don't know the vocabulary yet.
Communication analytics	~10k emails/msgs	50k+ comms	Custodian identification matters; social-graph analysis needed.
Entity extraction / PII	Any	Any regulated-data matter	You need to find named entities or PII at scale.
TAR 2 · CAL	~30k	75k–2M	Corpus is large, richness is moderate, reviewer hours are the constraint.
GenAI summarization	Any	10k–200k	Documents are long-form (contracts, reports); reviewer time on first-read is the bottleneck.
GenAI Q&A	Any	Not size-driven — task-driven	Investigation phase, deposition prep, or issue-focused fact-finding.
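
For quick screening, the matrix can be encoded as data. The sketch below is one possible encoding, not part of the reference sheet: the names `MATRIX` and `screen` are hypothetical, "Any" is stored as a minimum of 0, approximate thresholds such as "~30k" are treated as hard cut-offs, and rows whose sweet spot is not size-driven carry no range. The caller is assumed to pass the count in the unit the row uses (emails, messages, or documents).

```python
# analytic -> (minimum corpus, sweet-spot range or None if not size-driven)
# An upper bound of None means the sweet spot is open-ended.
MATRIX = {
    "Near-duplicate": (0, None),
    "Email threading (inclusive)": (5_000, (50_000, 2_000_000)),
    "Concept clustering": (30_000, (75_000, 500_000)),
    "Communication analytics": (10_000, (50_000, None)),
    "Entity extraction / PII": (0, None),
    "TAR 2 CAL": (30_000, (75_000, 2_000_000)),
    "GenAI summarization": (0, (10_000, 200_000)),
    "GenAI Q&A": (0, None),
}


def screen(analytic, count):
    """Classify a corpus count against one row of the matrix."""
    min_corpus, sweet_spot = MATRIX[analytic]
    if count < min_corpus:
        return "below minimum"
    if sweet_spot is None:
        return "meets minimum"
    low, high = sweet_spot
    if count >= low and (high is None or count <= high):
        return "sweet spot"
    return "meets minimum"


# Example: screen every analytic for a hypothetical 40,000-item corpus.
for name in MATRIX:
    print(f"{name}: {screen(name, 40_000)}")
```

A size screen like this only narrows the list; the "Best when" column still decides whether the analytic fits the matter.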

What vendors tend to charge for

Fixed setup / index build: often a one-time charge per matter.
Per-document processed: the recurring cost that scales with corpus size.
Per-query or per-search: some AI-enabled search tools charge this way.
Per-model-call: GenAI tools price per LLM API call, sometimes with token pass-through.
Regex vs. AI-search premiums: many platforms charge more for AI-enabled searches than for regular Boolean or regex searches; ask.
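
To compare quotes on a like-for-like basis, the charge types above can be folded into one total. The following is a minimal sketch under stated assumptions: the `VendorQuote` class, its field names, and all rates and volumes are hypothetical, and token pass-through is left out because its pricing varies by vendor.

```python
from dataclasses import dataclass


@dataclass
class VendorQuote:
    """One vendor's pricing, in USD. Unused charge types stay at zero."""
    fixed_setup: float = 0.0      # one-time setup / index build
    per_doc: float = 0.0          # per document processed
    per_query: float = 0.0        # per query or search
    per_model_call: float = 0.0   # per LLM API call

    def total(self, docs=0, queries=0, model_calls=0):
        """Projected spend for the given usage volumes."""
        return (self.fixed_setup
                + self.per_doc * docs
                + self.per_query * queries
                + self.per_model_call * model_calls)


# Hypothetical quote and usage estimate, for illustration only.
quote = VendorQuote(fixed_setup=2_500, per_doc=0.01,
                    per_query=0.05, per_model_call=0.002)
projected = quote.total(docs=100_000, queries=400, model_calls=50_000)
print(f"Projected spend: USD {projected:,.2f}")
```

Running the same usage estimate through each vendor's quote makes it easier to see which charge type drives the difference before negotiating.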

The pricing trap: Vendors quote list price. On matters of any size, negotiate. On matters of significant size, do not accept the first quote; competitive pricing exists. Fair is fair.