
How to Write AI-Friendly Documents That Don’t Confuse Your LLM
If you’ve ever watched a large language model squint at your PDF and produce confident nonsense, this post is for you. Writing for humans is one thing. Writing for humans and machines? That’s an art. The good news: a few practical tweaks can make your documents dramatically easier for an AI to understand, retrieve, and summarize without inventing a plot twist.
Below is a friendly, slightly cheeky guide to structuring documents so generative AI tools can do their best work.
I. Why structure matters for AI
Before an AI can use your document, it has to:
- Extract the text (for scanned PDFs, that means OCR).
- Split the text into chunks small enough for the model’s context window.
- Retrieve the most relevant chunks using semantic similarity.
If your document makes those steps easy, you’ll get accurate, grounded answers. If not, you get hallucinations with a side of frustration.
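Those three steps can be sketched in a few lines of Python. Real pipelines use embedding models for semantic similarity; plain word overlap stands in here, purely to show why a coherent, single-topic paragraph is easier to match. The sample document and query are invented for illustration:

```python
def chunk(text: str) -> list[str]:
    """Split a document into chunks at blank lines (one per paragraph)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def score(chunk_text: str, query: str) -> float:
    """Crude relevance: fraction of query words that appear in the chunk."""
    chunk_words = set(chunk_text.lower().split())
    query_words = set(query.lower().split())
    return len(chunk_words & query_words) / len(query_words)

def retrieve(text: str, query: str) -> str:
    """Return the chunk that best matches the query."""
    return max(chunk(text), key=lambda c: score(c, query))

doc = (
    "Unit price is $0.05 per email for GPT.\n\n"
    "Llama charges $0.006 per email, a tenth of that unit price."
)
print(retrieve(doc, "unit price of GPT per email"))
# → Unit price is $0.05 per email for GPT.
```

Notice that a paragraph mixing GPT pricing with, say, Tokyo weather would dilute its word overlap with any single query — the same dilution happens with real embeddings.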
II. The DOs: Make your document a delight for AI
- Use clear sections and consistent headings:
- Mark sections and sub-sections in a predictable way. Consistency helps both OCR and later chunking.
- Pro tip: Don’t get fancy with “creative” heading formats. Pick a style and stick to it.
🤖 Oh, a title in ALL CAPS, the next one in italics, and the third is a quote block? I love scavenger hunts.
- Keep each paragraph coherent and single-topic:
- Think “one idea per paragraph.” Your future self will thank you, too.
- Retrieval works by matching meaning. If a paragraph mixes unrelated facts, the model won’t know what it’s about.
🤖 Wait… this paragraph is about model accuracy, Tokyo weather, and your cat? My confidence just dropped 38%.
- Group related information together:
- It’s better to have one thorough paragraph on a topic than three scattered mini-paragraphs saying similar things.
- This reduces the risk that chunking splits a topic in the wrong place and lowers retrieval quality.
- Prefer narrative text over tables for key facts:
- LLMs are trained primarily on text, not tabular data.
- If your insight lives in a table, restate the key points in prose right below it.
- Add short, precise examples:
- Examples help humans and models lock onto the topic. Keep them focused and on-theme.
- Use consistent terminology:
- If you call a thing “unit price” in one section and “per-item cost” in another, retrieval gets fuzzier. Choose one term.
III. The DON’Ts: Things that trip AI up
- Long bullet lists without context:
- Chunking may split the list across sections, leaving orphaned bullets with no meaning.
- Alternative: convert to a short paragraph with commas or group bullets under subheadings.
- Images and diagrams without descriptions:
- Unless the system is explicitly multimodal, images are invisible. Add a brief text caption explaining what matters.
- Cross-references like “see Annex X” without in-line substance:
- Retrieval struggles when context is scattered. Summarize the essential content where it’s referenced.
- If you must reference, add a one or two-sentence recap in place.
🤖 See Annex X? Great. I’ll just teleport there with my non-existent document navigation powers.
- Contradictions:
- Two conflicting statements = a coin flip at retrieval time, or worse, a stuck model.
- Resolve contradictions or mark one as deprecated.
- Duplicate info at different detail levels without context:
- If both a general and a detailed version exist, specify the context for each. For example:
- “Ingredients overview for shoppers”
- “Ingredients for 8 servings for the cooking step”
- Without context, retrieval may pick the wrong one.
- Tables as the only source of truth:
🤖 I see your beautiful 12-column spreadsheet. Sadly, I was trained on text. I’ll just make something up, cool?
- Instead, keep the table, but mirror the key takeaways in prose. For instance:
- “GPT costs $0.05 per email. For 10 emails, that’s $0.50. For 100, $5.00.”
- “Llama costs $0.006 per email. For 10 emails, $0.06. For 100, $0.60.”
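The orphaned-bullet problem from the first DON’T is easy to reproduce. Here is a toy fixed-budget chunker (the 120-character budget and the region list are invented for illustration) that cuts a long list off from its heading:

```python
def chunk_by_budget(text: str, budget: int = 120) -> list[str]:
    """Naive fixed-size chunker: pack whole lines until the budget is hit."""
    chunks, current = [], ""
    for line in text.splitlines():
        if current and len(current) + len(line) > budget:
            chunks.append(current)
            current = ""
        current += line + "\n"
    if current:
        chunks.append(current)
    return chunks

doc = "Supported regions:\n" + "\n".join(f"- region-{i}" for i in range(20))
chunks = chunk_by_budget(doc)
# Only the first chunk carries the heading; later chunks are bare
# bullets with no hint of what the list is about.
print(chunks[1])
```

A retriever that pulls `chunks[1]` sees bullets with no context at all, which is exactly why short paragraphs or subheadings beat one giant list.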
IV. Practical examples
- Good paragraph:
DeepSeek-R1 matches or exceeds ChatGPT on several benchmarks while costing much less per token.
For example, it scores 71.3 on GPQA Diamond and offers pricing around $1.10 per million tokens.
The main caveat is limited transparency around data handling policies.
- Not-so-good paragraph:
DeepSeek-R1 is strong, also the London Underground has 272 stations, and South Africa’s weather on Feb 12 was sunny.
- Good structure for a section:
## Model Performance and Cost
### Accuracy Benchmarks
### Token Pricing
### Data Governance Considerations
- Better than a big table:
GPT costs $0.05 per email.
Llama costs $0.006 per email.
At 100 emails, that’s $5.00 vs $0.60 respectively.
V. A simple checklist you can reuse
Before publishing or handing a doc to your AI assistant, check:
- Headings and subheadings are clear and consistent
- One idea per paragraph
- Related information is grouped together
- Key facts are stated in text, even if you use tables or images
- No contradictions
- Similar topics use consistent terminology
- If duplicate info exists, each version has explicit context
If you nod yes to these, your LLM will nod back with better answers.
VI. Final thoughts
Writing for AI isn’t about turning your document into a robot manual. It’s about being kind to the future reader who may be an AI system trying to help a human quickly. Clear structure, coherent paragraphs, and text-first facts let your tools shine. And when your AI stops hallucinating, you can finally stop squinting.