What Metadata Tells You That Documents Won't

Every digital document carries a hidden autobiography.

Who created it. When. With what software. On whose computer. Whether it was modified, how many times, and by whom. This information exists whether the author wants it to or not. It's baked into the file like a fingerprint pressed into wet concrete — invisible to anyone looking at the surface, permanent to anyone who knows where to dig.

It's called metadata. And once you learn to read it, you will never look at a government document the same way again.

I didn't start reading metadata because I'm a forensic analyst or because I have some kind of tech fetish. I started because I was looking at public records that didn't feel right. The dates seemed off. The formatting was inconsistent. Something about the documents felt assembled — like someone had built them to look official rather than creating them through the normal workflow that generates actual official documents.

So I looked underneath. And what I found was that documents lie, but their metadata doesn't.


The Autobiography You Didn't Know You Were Writing

When you create a Word doc, an Excel sheet, or a PDF, your software embeds information about the file that doesn't appear anywhere in the visible content. You can't see it by reading the document. You can see it by reading the file.

The author field. The name or username of whoever created it, pulled from the software's user profile rather than typed in manually. If a document says it was created by "jsmith" but John Smith left the organization three years ago, that's not a typo. That's a question mark with a salary attached.

The creation date. When the file was first saved. Not when it was printed. Not when it was signed. Not when it claims to be from. When the actual bits were written to disk. A contract that says "Effective Date: January 2019" but has a creation timestamp of March 2025 is a document with an identity crisis that deserves your attention.

The modification history. When the file was last changed and, in many Office formats, how many times it has been revised. A "finalized" agreement that shows 47 revisions across 18 months isn't finalized. It's a living document wearing a finished costume.

The producing software. What application generated the file. This is where it gets fun. Government agencies use standardized systems: PeopleSoft for financials, SharePoint for document management, Microsoft Office through enterprise deployments with specific versioning. When a supposedly official agency document was generated by Adobe Acrobat.com CombinePDF Service — a free public web tool your nephew uses to merge homework PDFs — that's not a standard agency workflow. That's someone building a document outside the system and hoping you don't notice.

Embedded libraries. PDFs and Office documents sometimes reference the software libraries used to create them. If you find Apache POI (a Java library for reading and writing Microsoft Office files programmatically) or PDFBox (a Java library for creating and manipulating PDFs programmatically) in a document that should have come out of an agency's normal business systems, you're looking at a file that was built by code. Not by a person clicking through menus. By a script. Someone wrote a program to generate that document. Think about why someone would do that and then tell me your blood pressure didn't change.


Why This Should Piss You Off

Public agencies produce documents through established workflows. Purchase orders come out of financial systems. Contracts go through legal review. HR records live in HR platforms. Each system leaves a consistent, verifiable fingerprint in the metadata — like a postmark on an envelope that tells you where it was actually mailed from, regardless of what the return address says.

When you request documents under the Public Records Act or FOIA, you're supposed to receive records as they exist in the agency's systems. The metadata should reflect the normal lifecycle of institutional documents — created during business operations, modified through standard review processes, stored on agency infrastructure.

When the metadata doesn't match that pattern, you have one of three situations:

Innocent error. Someone creates a template on their personal laptop before importing it to the agency system. The author field shows their home username. The creation date is a weekend. Happens. Doesn't mean anything sinister.

System migration artifact. Agency switches platforms, documents get re-stamped. A 2018 document shows a 2022 creation date because that's when the SharePoint migration ran. Annoying but explicable.

Fabrication. Multiple documents with impossible authorship. Creation dates that postdate the events they describe by years. Production tools that aren't part of the agency's standard software suite. Files named with one date but metadata-stamped with another. Formatting that doesn't match the agency's document templates.

When you see one anomaly, it's noise. When you see a pattern of anomalies — documents that all share the same wrong author, the same non-agency production tool, the same suspiciously recent creation dates for allegedly historical records — you're not looking at errors. You're looking at a construction project. Someone built these documents after the fact and hoped you'd accept the surface without checking the foundation.
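
If you have already written down the metadata for a batch of records, the "one anomaly versus a pattern" test can be sketched in a few lines of Python. This is a rough illustration, not a forensic method: the Record fields, the function names, the one-year lag threshold, and the sample values are all assumptions made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    """One produced document. Every field is filled in by hand (hypothetical layout)."""
    filename: str
    claimed_date: date  # the date the document's content says it is from
    created: date       # the creation date found in the embedded metadata
    author: str
    producer: str


def flag_anomalies(records, expected_producers, max_lag_days=365):
    """Return {filename: [reasons]} for records that break the expected pattern."""
    flags = {}
    for r in records:
        reasons = []
        lag = (r.created - r.claimed_date).days
        if lag > max_lag_days:
            reasons.append(f"created {lag} days after claimed date")
        if r.producer not in expected_producers:
            reasons.append(f"unexpected producer: {r.producer}")
        if reasons:
            flags[r.filename] = reasons
    return flags


def shared_traits(records, flags):
    """Count authors and producers among the flagged records only."""
    flagged = [r for r in records if r.filename in flags]
    return (Counter(r.author for r in flagged),
            Counter(r.producer for r in flagged))


# Invented sample data, purely for illustration.
sample = [
    Record("contract.pdf", date(2019, 1, 15), date(2025, 3, 3),
           "jsmith", "Acrobat.com CombinePDF Service"),
    Record("invoice.pdf", date(2019, 2, 1), date(2025, 3, 3),
           "jsmith", "Acrobat.com CombinePDF Service"),
    Record("memo.docx", date(2019, 2, 10), date(2019, 2, 10),
           "mlee", "Microsoft Word"),
]
sample_flags = flag_anomalies(sample, expected_producers={"Microsoft Word"})
sample_authors, sample_producers = shared_traits(sample, sample_flags)
```

A single flagged file proves little. The counters are the interesting part: when most flagged files share one author or one producer, that is the pattern worth asking about.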


How to Check It Yourself

You don't need a computer science degree. You don't even need software you'd have to pay for.

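On a Mac: Right-click any file. Get Info. Creation date, modification date, sometimes the producing application. Right there. Free. Three seconds. One caution: the dates at the top of that window come from the filesystem, so they can reflect when your copy was downloaded or saved rather than when the document was authored.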

On Windows: Right-click. Properties. Details tab. Author, creation date, modification date, application. Same deal.
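
The dates the operating system keeps belong to the file sitting on your disk, so it helps to record them separately from anything embedded inside the document. A minimal Python sketch, using only the standard library; the function name is just an illustration:

```python
import os
from datetime import datetime, timezone


def filesystem_times(path) -> dict:
    """Return the timestamps the operating system keeps for this copy of the file."""
    st = os.stat(path)

    def to_dt(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc)

    times = {"modified": to_dt(st.st_mtime)}
    # Not every platform exposes a birth (creation) time through os.stat.
    birth = getattr(st, "st_birthtime", None)
    if birth is not None:
        times["created"] = to_dt(birth)
    return times
```

If these dates disagree with the ones embedded in the document, that alone is not suspicious. Downloading or copying a file commonly resets them.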

For PDFs: A free command-line tool called exiftool will show you the full metadata profile: producer, creator application, creation and modification dates, and any XMP properties the file carries. There are also online PDF metadata viewers if command line isn't your thing. Drag. Drop. Read.
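
If you would rather see where those PDF fields live, here is a rough Python sketch that scans the raw bytes for the document information entries. It assumes the entries are stored as plain, unencrypted literal strings, which is common but not guaranteed. Many PDFs compress or encode them, and a dedicated tool is the better choice there. The function names are my own.

```python
import re
from datetime import datetime

# Matches entries such as /Producer (Some Tool 1.0) stored as literal strings.
# Unescaped nested parentheses inside a value are not handled.
_INFO_FIELD = re.compile(
    rb"/(Producer|Creator|Author|CreationDate|ModDate)\s*\(((?:\\.|[^\\()])*)\)"
)

# PDF dates look like D:20250314092653-07'00'; everything after the year is optional.
_PDF_DATE = re.compile(r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?")


def scan_pdf_info(data: bytes) -> dict:
    """Collect information-dictionary fields found as literal strings in raw PDF bytes."""
    found = {}
    for key, value in _INFO_FIELD.findall(data):
        # Later matches overwrite earlier ones, so the last occurrence in the file wins.
        found[key.decode("ascii")] = value.decode("latin-1")
    return found


def parse_pdf_date(text: str):
    """Parse the date part of a PDF date string. The timezone suffix is ignored."""
    m = _PDF_DATE.match(text)
    if not m:
        return None
    year, *rest = m.groups()
    defaults = [1, 1, 0, 0, 0]  # month, day, hour, minute, second when omitted
    parts = [int(v) if v else d for v, d in zip(rest, defaults)]
    return datetime(int(year), *parts)
```

Typical use would be something like scan_pdf_info(open("record.pdf", "rb").read()), then passing the CreationDate value to parse_pdf_date so you can compare it against the date the document claims.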

For Excel files: An .xlsx file is a ZIP archive, so you can unzip it and read the XML files in its docProps folder, or use Python's openpyxl library to read the core properties. Author, creation date, who last modified it, the application that generated the file, and a revision count where the software records one. All there, all accessible, all things the person who created the file probably assumed you'd never look at.
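
Here is a minimal sketch of the same idea using only Python's standard library, assuming a modern .xlsx file (a ZIP archive that keeps its properties in docProps/core.xml and docProps/app.xml). The function names are mine, and which properties you get back depends on what the generating software chose to write.

```python
import zipfile
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip the XML namespace: '{uri}creator' -> 'creator'."""
    return tag.rsplit("}", 1)[-1]


def read_office_properties(path) -> dict:
    """Return the text of each top-level element in the two docProps parts."""
    props = {}
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        for part in ("docProps/core.xml", "docProps/app.xml"):
            if part not in names:
                continue  # some generators omit one or both parts
            root = ET.fromstring(archive.read(part))
            for child in root:
                if child.text and child.text.strip():
                    props[_local(child.tag)] = child.text.strip()
    return props
```

Calling read_office_properties on a spreadsheet returns a plain dictionary. Keys such as creator, created, lastModifiedBy, and Application are the ones to compare against what the document claims about itself. A missing Application entry, or one that names a code library, is itself worth noting.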

The point isn't to become a forensic expert. The point is to know the information exists. That documents aren't just what's printed on the page — they're also what's written into the file. And the people who build documents outside normal systems are banking on one thing: that you'll read the content and skip the container.

Don't skip the container.

What This Actually Means

Metadata analysis isn't about catching people on technicalities or playing gotcha with file timestamps. It's about a basic question that most people never ask: are these documents what they claim to be?

When a government agency produces records in response to a public records request, those records carry the weight of official documentation. Court decisions get made based on them. Legal proceedings rely on them. Public trust is built on top of them. If the metadata tells a different story than the content — if the hidden autobiography contradicts the visible one — that's not a footnote. That's a credibility problem that taints everything stacked on top of those records.

I didn't get into metadata because I enjoy reading hex dumps at midnight. I got into it because something felt wrong and I needed to understand why. What I learned was that the most honest part of any document is the part the author didn't know was there.

The content tells you what someone wants you to believe. The metadata tells you what actually happened.

Pay attention to both.

Love that for transparency.