
Not every document belongs in a generative-AI workflow. Some make the model hallucinate; others make the cost curve absurd; a few will get you sanctioned. This one-pager lists the categorical exclusions we recommend before you route documents to LLM-based summarization, Q&A, or first-pass triage.
Do not route the following into a generative-AI workflow. If any exception is warranted, document it and get client sign-off.
| Class | Why | Detail |
|---|---|---|
| Bad OCR / low text yield | Hallucination risk | Documents whose extracted text is < 100 characters, or whose text is majority-non-alphanumeric, will pull the model toward inventing content. Route through linear review or re-OCR first. |
| Oversized documents (> 200 pages) | Truncation risk | Many LLMs process on the order of 100k tokens at a time, and the limit varies by model. Tooling often truncates longer documents silently, so the model summarizes what it saw, not what you sent. Chunk manually with overlap, or exclude. |
| Spreadsheets with formulas | Semantic collapse | Excel exported to plain text loses structure. A GenAI model reading a flattened spreadsheet will invent narrative about "trends" and "totals" that don't exist. Convert to well-labeled tables or keep in linear review. |
| Encoded / obfuscated content | Prompt injection | Base64 blobs, JSON payloads, source code, escaped strings — the model may attempt to "execute" them as instructions. Route out. |
| Documents with adversarial content | Prompt injection | Text containing instruction-shaped strings ("ignore prior instructions", "system:", "you are now"). Assume the model will comply. Filter aggressively. |
| Images without meaningful OCR | Nothing to summarize | The model summarizes text, not pixels — even multimodal models lose fidelity on screenshots and photos of documents. Route to human vision review. |
| Multi-language documents (mixed script) | Case-by-case | Model quality varies dramatically by language. Confirm the model was benchmarked on the languages present; if not, route to bilingual review. |
| Documents flagged as potentially privileged | Client policy | Some firms categorically exclude anything that has hit a privilege search. Others don't. Set the rule at matter start, in writing. |
| Deposition transcripts, contracts, court filings | Depends on task | Summarization is generally fine. Q&A ("what does the contract require on delivery?") is high-risk without human verification. Never rely on the model's citations without checking them against the source. |
| Anything used for training the model | Contamination risk | If your vendor's LLM was trained on public case law, do not use it to summarize the same case law and treat the output as independent. |
Use these classes as boolean filters when constructing the "eligible for GenAI" population in your review tool. Apply them as an EXCLUDE set on top of the responsive-eligible set.
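
As a rough illustration of how the text-yield, page-count, and instruction-shaped-string rows could become boolean filters, here is a minimal Python sketch. The `Doc` record, its field names, the reason labels, and the regex patterns are all hypothetical; only the 100-character and 200-page thresholds come from the table. How whitespace counts toward "majority-non-alphanumeric" is also an assumption (it is ignored here).

```python
import re
from dataclasses import dataclass
from typing import List

# Thresholds taken from the exclusion table.
MIN_TEXT_CHARS = 100
MAX_PAGES = 200

# Illustrative patterns only (assumption); a production list would be broader.
INJECTION_PATTERNS = re.compile(
    r"ignore (all )?prior instructions|^\s*system:|you are now",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass
class Doc:
    """Hypothetical record; field names are not a real review-tool schema."""
    doc_id: str
    text: str
    page_count: int


def exclusion_reasons(doc: Doc) -> List[str]:
    """Return the exclusion classes a document hits; an empty list means eligible."""
    reasons = []
    text = doc.text.strip()
    if len(text) < MIN_TEXT_CHARS:
        reasons.append("low_text_yield")
    else:
        # Assumption: whitespace is ignored when judging the alphanumeric share.
        non_space = [c for c in text if not c.isspace()]
        alnum = sum(c.isalnum() for c in non_space)
        if alnum * 2 < len(non_space):
            reasons.append("majority_non_alphanumeric")
    if doc.page_count > MAX_PAGES:
        reasons.append("oversized")
    if INJECTION_PATTERNS.search(text):
        reasons.append("instruction_shaped_text")
    return reasons
```

A document with any reason would go into the EXCLUDE set; keeping the reasons rather than a single boolean makes it easier to document exceptions per class.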
| Class | Task | Why it works |
|---|---|---|
| Clean-text emails (native-extracted) | Summarization, threading enrichment | Well-structured, short, model-friendly. |
| Word docs with extracted text ≥ 500 chars | Summarization, first-pass triage | The typical "letter, memo, brief" — well within model capabilities. |
| PDF-with-text (not scanned) | Summarization, Q&A with citations | Cite pages back to the doc for verification. |
| Deposition summaries (already digest form) | Q&A over the summary, not the transcript | Reduces token cost; humans still verify. |
| Foreign-language, benchmarked | Translation-then-summary | Modern models are strong on French, Spanish, German, and Portuguese, and weaker on low-resource languages. |
Every generative-AI pass requires validation. This is not optional — it is what makes the workflow defensible.
| Validation | How to size | What "pass" looks like |
|---|---|---|
| Hallucination check | Sample 100 model outputs | Each claim in the sampled outputs is checked against the source. The share of claims that can be verified should be > 98%. |
| Citation accuracy | Every document where the model cited a source | The cited page/paragraph exists and supports what the model attributed to it. |
| False-negative sampling | Random sample of excluded set (n from DLC-M04-001) | The model did not misclassify responsive documents as non-responsive at a rate above the agreed threshold. |
| Prompt-injection test | Manual review of top-token-count docs | Model output isn't following instructions found in the document text. |
| Consistency check | Same 20 documents run twice | Outputs are substantively identical. Divergence usually points to a nonzero temperature or configuration drift. |
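
The hallucination and consistency rows can be scored with very little code. The sketch below is one possible way to do it, not a prescribed method: the function names and the input shapes are hypothetical, and the `min_ratio` cutoff of 0.9 is an assumed value with no basis in the table. The string-similarity ratio is only a crude proxy for "substantively identical"; it can help prioritize which pairs a reviewer reads first, but it does not replace that reading.

```python
from difflib import SequenceMatcher
from typing import Dict, List

PASS_RATE = 0.98  # threshold from the hallucination-check row


def verified_claim_rate(claim_checks: List[bool]) -> float:
    """Share of reviewer-checked claims that were verifiable in the source."""
    if not claim_checks:
        raise ValueError("no claims were checked")
    return sum(claim_checks) / len(claim_checks)


def passes_hallucination_check(claim_checks: List[bool]) -> bool:
    """Pass only if the verified share is strictly above the 98% threshold."""
    return verified_claim_rate(claim_checks) > PASS_RATE


def divergent_docs(
    run_a: Dict[str, str], run_b: Dict[str, str], min_ratio: float = 0.9
) -> List[str]:
    """Return doc ids whose two outputs look different enough to need a human look.

    min_ratio is an assumed cutoff; a doc missing from run_b counts as divergent.
    """
    flagged = []
    for doc_id, out_a in run_a.items():
        out_b = run_b.get(doc_id, "")
        if SequenceMatcher(None, out_a, out_b).ratio() < min_ratio:
            flagged.append(doc_id)
    return sorted(flagged)
```

For example, a reviewer who marks 99 of 100 sampled claims as verified gets a rate of 0.99, which passes; 98 of 100 does not, because the threshold is strict.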
For each matter using GenAI in a workflow that touches the production, the following must be documented and retained: