DLC-M04-002 · Module 04 Deliverable
GenAI Exclusion Rules
Matter

Not every document belongs in a generative-AI workflow. Some make the model hallucinate; others make the cost curve absurd; a few will get you sanctioned. This one-pager lists the categorical exclusions we recommend before you route documents to LLM-based summarization, Q&A, or first-pass triage.

Categorical Exclusions

Do not route the following into a generative-AI workflow. If any exception is warranted, document it and get client sign-off.

Class | Why | Detail
Bad OCR / low text yield | Hallucination risk | Documents whose extracted text is < 100 characters, or whose text is majority-non-alphanumeric, will pull the model toward inventing content. Route through linear review or re-OCR first.
Oversized documents (> 200 pages) | Truncation risk | Many LLM tools accept on the order of 100k tokens per request. Longer documents get truncated silently by tooling, and the model summarizes what it saw, not what you sent. Chunk manually with overlap, or exclude.
Spreadsheets with formulas | Semantic collapse | Excel exported to plain text loses structure. A GenAI model reading a flattened spreadsheet will invent narrative about "trends" and "totals" that don't exist. Convert to well-labeled tables or keep in linear review.
Encoded / obfuscated content | Prompt injection | Base64 blobs, JSON payloads, source code, escaped strings: the model may attempt to "execute" them as instructions. Route out.
Documents with adversarial content | Prompt injection | Text containing instruction-shaped strings ("ignore prior instructions", "system:", "you are now"). Assume the model will comply. Filter aggressively.
Images without meaningful OCR | Nothing to summarize | The model summarizes text, not pixels; even multimodal models lose fidelity on screenshots and photos of documents. Route to human vision review.
Multi-language documents (mixed script) | Case-by-case | Model quality varies dramatically by language. Confirm the model was benchmarked on the languages present; if not, route to bilingual review.
Documents flagged as potentially privileged | Client policy | Some firms categorically exclude anything that has hit a privilege search. Others don't. Set the rule at matter start, in writing.
Deposition transcripts, contracts, court filings | Depends on task | Summarization is generally fine. Q&A ("what does the contract require on delivery?") is high-risk without human verification. Never rely on the model's citations.
Anything used for training the model | Contamination risk | If your vendor's LLM was trained on public case law, do not use it to summarize the same case law and treat the output as independent.
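
For oversized documents, the "chunk manually with overlap" option can be sketched roughly as below. This is a minimal illustration, not a prescribed method: the function name, the `chunk_size` and `overlap` defaults, and the page-list input are all assumptions chosen for the example, and the right values depend on the model's actual input limit.

```python
def chunk_with_overlap(pages, chunk_size=50, overlap=5):
    """Split a list of page texts into overlapping chunks.

    chunk_size and overlap are illustrative values (assumptions),
    not thresholds taken from this document.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(pages), step):
        end = min(start + chunk_size, len(pages))
        # Keep 1-based page bounds so each summary can be traced to its pages.
        chunks.append({
            "first_page": start + 1,
            "last_page": end,
            "pages": pages[start:end],
        })
        if end == len(pages):
            break  # the last chunk already reaches the final page
    return chunks
```

Recording the page bounds with each chunk keeps the output verifiable: a reviewer can see which pages a given summary was drawn from, and that no page range was silently dropped.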

Filter Logic — Copy-Paste Ready

Use these as boolean filters when constructing the "eligible for GenAI" population in your review tool. Apply as an EXCLUDE set on top of the responsive-eligible set.

EXCLUDE FROM GENAI IF ANY OF:
  ExtractedTextLength < 100
  OR PageCount > 200
  OR (FileExtension IN ('xlsx','xls','csv') AND FormulaCount > 0)
  OR FileType IN ('source_code','json','xml','base64')
  OR PrivilegeFlag = 'Potentially Privileged'
  OR TextContains(['ignore prior instructions','system:','you are now','disregard'])
  OR OCRConfidence < 0.85
  OR LanguageDetection NOT IN ApprovedLanguages
  OR (FileType IN ('audio','video') AND TranscriptExists = FALSE)
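
Where the review tool cannot express the filter directly, the same logic could be applied to exported metadata. The sketch below mirrors the filter above under several assumptions: the dictionary keys (`extracted_text_length`, `ocr_confidence`, and so on) are hypothetical stand-ins for whatever field names your platform exports, and a missing OCR confidence is treated as "no OCR pass" rather than as a failure.

```python
INJECTION_MARKERS = ("ignore prior instructions", "system:", "you are now", "disregard")
SPREADSHEET_EXTENSIONS = {"xlsx", "xls", "csv"}
ENCODED_FILE_TYPES = {"source_code", "json", "xml", "base64"}
MEDIA_FILE_TYPES = {"audio", "video"}


def exclusion_reasons(doc, approved_languages):
    """Return the names of the exclusion rules a document trips.

    An empty list means the document is eligible for GenAI.
    Field names are hypothetical; map them to your own export.
    """
    text = doc.get("text", "").lower()
    ext = doc.get("file_extension", "").lower().lstrip(".")
    file_type = doc.get("file_type", "")
    ocr = doc.get("ocr_confidence")  # assumed None for native files

    checks = {
        "low_text_yield": doc.get("extracted_text_length", 0) < 100,
        "oversized": doc.get("page_count", 0) > 200,
        "spreadsheet_with_formulas": (
            ext in SPREADSHEET_EXTENSIONS and doc.get("formula_count", 0) > 0
        ),
        "encoded_content": file_type in ENCODED_FILE_TYPES,
        "potentially_privileged": doc.get("privilege_flag") == "Potentially Privileged",
        "instruction_shaped_text": any(m in text for m in INJECTION_MARKERS),
        "low_ocr_confidence": ocr is not None and ocr < 0.85,
        "unapproved_language": doc.get("language") not in approved_languages,
        "media_without_transcript": (
            file_type in MEDIA_FILE_TYPES and not doc.get("transcript_exists", False)
        ),
    }
    return [name for name, tripped in checks.items() if tripped]
```

Returning the tripped rule names, rather than a bare yes/no, gives you the per-document reason needed when documenting the exclusion rules applied. Note that plain substring matching on strings like "system:" or "disregard" will over-exclude ordinary business text; that is consistent with filtering aggressively, but expect to review the excluded set.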

Documents We Recommend For GenAI

Class | Task | Why it works
Clean-text emails (native-extracted) | Summarization, threading enrichment | Well-structured, short, model-friendly.
Word docs with extracted text ≥ 500 chars | Summarization, first-pass triage | The typical "letter, memo, brief": well within model capabilities.
PDF-with-text (not scanned) | Summarization, Q&A with citations | Cite pages back to the doc for verification.
Deposition summaries (already digest form) | Q&A over the summary, not the transcript | Reduces token cost; humans still verify.
Foreign-language, benchmarked | Translation-then-summary | Modern models are strong on French, Spanish, German, Portuguese. Weaker on low-resource languages.

The one universal rule: Every GenAI output that a lawyer will rely on gets a spot-check by a human. Recall Mata v. Avianca: the sanction wasn't for using AI. It was for filing what the AI produced without checking. Context always matters.

Post-GenAI Validation

Every generative-AI pass requires validation. This is not optional — it is what makes the workflow defensible.

Validation | How to size | What "pass" looks like
Hallucination check | Sample 100 model outputs | Every claim in the output is verifiable in the source. The share of sampled outputs that meet this bar should be > 98%.
Citation accuracy | Every doc where model cited a source | Cited page/paragraph exists and says what the model said it said.
False-negative sampling | Random sample of excluded set (n from DLC-M04-001) | Model didn't wrongly classify responsive as non-responsive at higher than agreed rate.
Prompt-injection test | Manual review of top-token-count docs | Model output isn't following instructions found in the document text.
Consistency check | Same 20 documents run twice | Outputs are substantively identical. Divergence indicates temperature/config drift.
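
Three of the checks above lend themselves to simple tooling, sketched below under stated assumptions. The function names, the fixed `seed`, and the `min_similarity` threshold of 0.9 are illustrative choices, not values from this document. Text similarity is only a crude proxy for "substantively identical", so flagged pairs still need a human read.

```python
import random
from difflib import SequenceMatcher


def draw_sample(doc_ids, n, seed):
    """Draw a reproducible random sample of document IDs for human review."""
    rng = random.Random(seed)  # fixed seed so the sample can be re-created later
    return rng.sample(sorted(doc_ids), min(n, len(doc_ids)))


def hallucination_pass_rate(reviews):
    """reviews: one bool per sampled output, True if every claim was verified."""
    if not reviews:
        raise ValueError("no reviewed outputs")
    return sum(reviews) / len(reviews)


def divergent_pairs(first_run, second_run, min_similarity=0.9):
    """Return doc IDs whose two outputs fall below a text-similarity threshold.

    min_similarity is an assumed cutoff; tune it on real output pairs.
    """
    flagged = []
    for doc_id, first in first_run.items():
        ratio = SequenceMatcher(None, first, second_run[doc_id]).ratio()
        if ratio < min_similarity:
            flagged.append(doc_id)
    return flagged
```

With a sample of 100, a pass rate above 98% allows at most one output with an unverifiable claim. Recording the seed alongside the results lets the same sample be re-drawn if the validation is later questioned.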

Disclosure & Documentation

For each matter using GenAI in a workflow that touches the production, the following must be documented and retained:

  1. The vendor, model version, and any temperature / top-p settings used.
  2. The exclusion rules applied (this document, plus any matter-specific additions).
  3. The validation protocol run, and its results.
  4. Any prompt templates used, including system prompts.
  5. The name of the human reviewer who signed off on the GenAI output before it influenced a coding call.
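
One way to keep these five items together is a single structured record per matter, retained with the matter file. The sketch below is an assumption about how such a record might look: the class name, field names, and `matter_id` key are hypothetical, and any values you see in examples are placeholders rather than real vendors or models.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class GenAIUsageRecord:
    matter_id: str             # hypothetical identifier for the matter
    vendor: str
    model_version: str
    sampling_settings: dict    # e.g. {"temperature": 0.0, "top_p": 1.0}
    exclusion_rules: list      # this document plus matter-specific additions
    validation_protocol: str
    validation_results: dict
    prompt_templates: list     # include system prompts
    reviewer_name: str         # human who signed off on the GenAI output

    def missing_items(self):
        """Names of fields left empty, so gaps surface before sign-off."""
        empty = ("", [], {}, None)
        return [name for name, value in asdict(self).items() if value in empty]

    def to_json(self):
        """Serialize the record for retention with the matter file."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Checking `missing_items()` before the output influences a coding call makes an incomplete record visible at the point where it can still be fixed.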
Standing orders: A growing number of federal courts (N.D. Tex., D.D.C., E.D. Pa. among them) require disclosure of AI use in filings. Check the local rules and the assigned judge's standing orders before every filing. When in doubt, disclose.