
How to Write AI-Friendly Documents That Don’t Confuse Your LLM
If you’ve ever watched a large language model squint at your PDF and produce confident nonsense, this post is for you. Writing for humans is one thing. Writing for humans and machines? That’s an art. The good news: a few practical tweaks can make your documents dramatically easier for an AI to understand, retrieve, and summarize without inventing a plot twist.
Below is a friendly, slightly cheeky guide to structuring documents so generative AI tools can do their best work.
I. Why structure matters for AI
Before an AI can use your document, it has to:
- Extract the text (for scanned PDFs, that means OCR).
- Split the text into chunks small enough for the model’s context window.
- Retrieve the most relevant chunks using semantic similarity.
If your document makes those steps easy, you’ll get accurate, grounded answers. If not, you get hallucinations with a side of frustration.
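Those three steps can be sketched in a few lines of Python. Real pipelines use embedding models for semantic similarity; plain word overlap stands in here, purely to show why a coherent, single-topic paragraph is easier to match. The sample document and query are invented for illustration:

```python
def chunk(text: str) -> list[str]:
    """Split a document into chunks at blank lines (one per paragraph)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def score(chunk_text: str, query: str) -> float:
    """Crude relevance: fraction of query words that appear in the chunk."""
    chunk_words = set(chunk_text.lower().split())
    query_words = set(query.lower().split())
    return len(chunk_words & query_words) / len(query_words)

def retrieve(text: str, query: str) -> str:
    """Return the chunk that best matches the query."""
    return max(chunk(text), key=lambda c: score(c, query))

doc = (
    "Unit price is $0.05 per email for GPT.\n\n"
    "Llama charges $0.006 per email, a tenth of that unit price."
)
print(retrieve(doc, "unit price of GPT per email"))
# → Unit price is $0.05 per email for GPT.
```

Notice that a paragraph mixing GPT pricing with, say, Tokyo weather would dilute its word overlap with any single query — the same dilution happens with real embeddings.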
II. The DOs: Make your document a delight for AI
- Use clear sections and consistent headings:
- Mark sections and sub-sections in a predictable way. Consistency helps both OCR and later chunking.
- Pro tip: Don’t get fancy with “creative” heading formats. Pick a style and stick to it.
🤖 Oh, a title in ALL CAPS, the next one in italics, and the third is a quote block? I love scavenger hunts.
- Keep each paragraph coherent and single-topic:
- Think “one idea per paragraph.” Your future self will thank you, too.
- Retrieval works by matching meaning. If a paragraph mixes unrelated facts, the model won’t know what it’s about.
🤖 Wait… this paragraph is about model accuracy, Tokyo weather, and your cat? My confidence just dropped 38%.
- Group related information together:
- It’s better to have one thorough paragraph on a topic than three scattered mini-paragraphs saying similar things.
- This reduces the risk that chunking splits a topic in the wrong place and lowers retrieval quality.
- Prefer narrative text over tables for key facts:
- LLMs are trained primarily on text, not tabular data.
- If your insight lives in a table, restate the key points in prose right below it.
- Add short, precise examples:
- Examples help humans and models lock onto the topic. Keep them focused and on-theme.
- Use consistent terminology:
- If you call a thing “unit price” in one section and “per-item cost” in another, retrieval gets fuzzier. Choose one term.
III. The DON’Ts: Things that trip AI up
- Long bullet lists without context:
- Chunking may split the list across sections, leaving orphaned bullets with no meaning.
- Alternative: convert to a short paragraph with commas or group bullets under subheadings.
- Images and diagrams without descriptions:
- Unless the system is explicitly multimodal, images are invisible. Add a brief text caption explaining what matters.
- Cross-references like “see Annex X” without in-line substance:
- Retrieval struggles when context is scattered. Summarize the essential content where it’s referenced.
- If you must reference, add a one or two-sentence recap in place.
🤖 See Annex X? Great. I’ll just teleport there with my non-existent document navigation powers.
- Contradictions:
- Two conflicting statements = a coin flip at retrieval time, or worse, a stuck model.
- Resolve contradictions or mark one as deprecated.
- Duplicate info at different detail levels without context:
- If both a general and a detailed version exist, specify the context for each. For example:
- “Ingredients overview for shoppers”
- “Ingredients for 8 servings for the cooking step”
- Without context, retrieval may pick the wrong one.
- Tables as the only source of truth:
🤖 I see your beautiful 12-column spreadsheet. Sadly, I was trained on text. I’ll just make something up, cool?
- Instead, keep the table, but mirror the key takeaways in prose. For instance:
- “GPT costs $0.05 per email. For 10 emails, that’s $0.50. For 100, $5.00.”
- “Llama costs $0.006 per email. For 10 emails, $0.06. For 100, $0.60.”
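The orphaned-bullet problem from the first DON’T is easy to reproduce. Here is a toy fixed-budget chunker (the 120-character budget and the region list are invented for illustration) that cuts a long list off from its heading:

```python
def chunk_by_budget(text: str, budget: int = 120) -> list[str]:
    """Naive fixed-size chunker: pack whole lines until the budget is hit."""
    chunks, current = [], ""
    for line in text.splitlines():
        if current and len(current) + len(line) > budget:
            chunks.append(current)
            current = ""
        current += line + "\n"
    if current:
        chunks.append(current)
    return chunks

doc = "Supported regions:\n" + "\n".join(f"- region-{i}" for i in range(20))
chunks = chunk_by_budget(doc)
# Only the first chunk carries the heading; later chunks are bare
# bullets with no hint of what the list is about.
print(chunks[1])
```

A retriever that pulls `chunks[1]` sees bullets with no context at all, which is exactly why short paragraphs or subheadings beat one giant list.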
IV. Practical examples
- Good paragraph:
DeepSeek-R1 matches or exceeds ChatGPT on several benchmarks while costing much less per token.
For example, it scores 71.3 on GPQA Diamond and offers pricing around $1.10 per million tokens.
The main caveat is limited transparency around data handling policies.
- Not-so-good paragraph:
DeepSeek-R1 is strong, also the London Underground has 272 stations, and South Africa’s weather on Feb 12 was sunny.
- Good structure for a section:
## Model Performance and Cost
### Accuracy Benchmarks
### Token Pricing
### Data Governance Considerations
- Better than a big table:
GPT costs $0.05 per email.
Llama costs $0.006 per email.
At 100 emails, that’s $5.00 vs $0.60 respectively.
V. A simple checklist you can reuse
Before publishing or handing a doc to your AI assistant, check:
- Headings and subheadings are clear and consistent
- One idea per paragraph
- Related information is grouped together
- Key facts are stated in text, even if you use tables or images
- No contradictions
- Similar topics use consistent terminology
- If duplicate info exists, each version has explicit context
If you nod yes to these, your LLM will nod back with better answers.
VI. Final thoughts
Writing for AI isn’t about turning your document into a robot manual. It’s about being kind to the future reader who may be an AI system trying to help a human quickly. Clear structure, coherent paragraphs, and text-first facts let your tools shine. And when your AI stops hallucinating, you can finally stop squinting.