
What is LLMs.txt and Should You Add It to Your Website?


Alright, so I’m not too confident about this one. Usually, when I write something, I like having an opinion about it before I write. But unfortunately, we live in a world where a lot of AI “experts” are really just hype marketers.

To be honest, it shouldn’t be this difficult. Or confusing. It should be straightforward, but unfortunately, like most things out of Silicon Valley, the so-called AI revolution is starting to look less disrupt-y and more tech bubble-y.

And if you’re a long-term reader, you know how much I loathe bubbles. Especially since 2022.

In fact, this blog’s happening because a report was submitted at work. And it includes a suggestion I made months ago which was ignored back then. But now it’s oh so critical.

And the suggestion’s related to the title of this blog: Llms.txt. Or LLMs.txt. I’ve seen it spelled both ways, and I’m not sure anyone knows which one is correct. 

But that’s not the point. 

Before I get carried away again, let me present to you the newest file to add to your site for AI visibility: LLMs.txt (we’re sticking to this spelling for now).

What exactly is LLMs.txt?

Well, if you look at the website, llmstxt.org, it looks pretty simple. And it kind of is, which is why I don’t get why there’s a debate here.

But we’ll get to that in a bit. What you should know is that, at its core, LLMs.txt is just a text file you can put on your website. The idea is that it tells AI tools—like ChatGPT or Google Gemini—what parts of your site are worth paying attention to. 

So, instead of crawling through every sidebar, footer, or random line of code, an AI can read this file and see the “highlight reel” of your content.

The concept was introduced in late 2024 by Jeremy Howard, co-founder of Answer.AI, who argued that language models waste too much time trying to make sense of messy webpages. LLMs.txt was his shortcut.

Where Does LLMs.txt Go?

Well, I don’t know about you, but my website doesn’t have LLMs.txt at all right now. And I checked a few sites (Semrush.com, SurferSEO.com, etc.) to see if everyone else was using it, and all I found were 404 errors.

But for the sake of this blog, let’s talk about where it should be if you’re going to use it.

Like sitemap.xml and robots.txt, the LLMs.txt file should sit in the root of your site. In plain English, this just means that if someone types yoursite.com/llms.txt into a browser, the file shows up. It’s kind of like the yoursite.com/sitemap.xml thing.
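
If you’d rather check from a script than a browser, here’s a minimal Python sketch. The function names and the User-Agent string are made up for illustration, and it assumes a plain HTTP 200 means the file exists (some sites return 200 for a “not found” page, so treat the result as a hint, not proof).

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def llms_txt_url(site: str) -> str:
    """Return the root-level llms.txt URL, whatever page of the site was passed in."""
    if "://" not in site:
        site = "https://" + site  # assumption: default to HTTPS when no scheme is given
    return urljoin(site, "/llms.txt")


def has_llms_txt(site: str, timeout: float = 10.0) -> bool:
    """True if the site answers a request for /llms.txt with HTTP 200."""
    request = Request(
        llms_txt_url(site),
        headers={"User-Agent": "llms-txt-check/0.1"},  # hypothetical UA string
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers HTTPError (e.g. 404), URLError, and timeouts
        return False
```

Calling has_llms_txt("yoursite.com") makes a real network request, so run it against a site you’re actually curious about.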

Why was LLMs.txt Created?

So, as I mentioned briefly before, Jeremy Howard is the one who came up with this. The LLMs.txt website does include his name and even the publication date: September 3, 2024. And it’s presented like a paper/proposal hybrid for AI bots. And the logic there is pretty sound. 

Howard’s argument was simple: language models scraping the web were drowning in junk code, sidebars, and ads. Meanwhile, publishers were frustrated because their content was being used without credit or consent.

This is an actual issue. So, LLMs.txt is meant to be a middle ground. It’s a way for site owners to say what can and can’t be used. Of course, whether that’s actual control or just the illusion of it is still up for debate.

Here’s the backdrop behind this:

  • AI scraping exploded: Bots like OpenAI’s GPTBot started crawling websites to feed training data.
  • Publishers pushed back: News outlets, SEO experts, and content creators complained about their work being used to train these models or to answer users’ questions without credit.
  • Guardrails needed: LLMs.txt promises a way to set boundaries, kind of like robots.txt for search engines.

The issue is that, just like robots.txt, it only works if the bots respect it. “Good actors” might follow your rules. Everyone else? Not so much. And this is where it gets a bit murky.


How Does LLMs.txt Work?

Okay, now that we know what it is and why it’s here, let’s look at how LLMs.txt works.

First of all, you need to understand that it’s not a robots.txt clone. LLMs.txt is a Markdown file LLMs read to get a clean “map” of your site at inference time (i.e., when they’re trying to answer a question using your content), so they don’t have to wade through HTML noise. The spec defines a simple structure LLMs can parse reliably:

  • H1 title (Required): Name of the site/project.
  • Short blockquote summary: One-paragraph context that helps the model understand what the site is about.
  • Optional free-text notes: Any crucial “read me first” guidance.
  • H2 “file list” sections: Each section is a bullet list of canonical links with optional one-line descriptions (e.g., “Docs,” “API,” “Guides”). This is the important part; it points models directly to the good stuff.
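
To make that structure concrete, here’s a rough Python sketch that assembles a file in this shape. The function name, the site, and every URL in it are placeholders I made up; only the layout (H1, blockquote, optional notes, H2 link lists) comes from the spec described above.

```python
def build_llms_txt(title, summary, notes, sections):
    """Assemble llms.txt text: H1 title, blockquote summary, notes, H2 link lists."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    if notes:
        lines += [notes, ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, description in links:
            entry = f"- [{name}]({url})"
            if description:  # the one-line description is optional
                entry += f": {description}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site and URLs, purely for illustration.
example = build_llms_txt(
    title="Example Site",
    summary="A hypothetical site used for illustration.",
    notes="Start with the docs; the blog is mostly opinion.",
    sections={
        "Docs": [
            ("Getting started", "https://yoursite.com/docs/start", "Install and first steps"),
            ("FAQ", "https://yoursite.com/faq", ""),
        ],
    },
)
print(example)
```

The printed text is what you’d save as llms.txt in your site root.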

What Happens Behind the Scenes:

  1. An AI agent checks for /llms.txt and loads it.
  2. It parses the Markdown sections and link lists, treating them as a curated index of authoritative resources.
  3. Some tooling can convert that into a structured context (e.g., an XML bundle) to feed models that prefer strict schemas.
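
Here’s roughly what steps 2 and 3 could look like in Python. This is my own simplified sketch, not the official tooling: the regex only handles the basic “- [name](url): description” link format, and the XML layout (site, section, link elements) is an assumption I picked for the example.

```python
import re
import xml.etree.ElementTree as ET

# Matches "- [name](url)" with an optional ": description" tail.
LINK = re.compile(r"^[-*]\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$")

# Hypothetical file contents, purely for illustration.
SAMPLE = """# Example Site

> A hypothetical site used for illustration.

## Docs

- [Getting started](https://yoursite.com/docs/start): Install and first steps
- [FAQ](https://yoursite.com/faq)
"""


def parse_llms_txt(text):
    """Split llms.txt into title, summary, and {section heading: [links]}."""
    parsed = {"title": None, "summary": None, "sections": {}}
    current = None  # link list of the H2 section we are currently inside
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and parsed["title"] is None:
            parsed["title"] = line[2:].strip()
        elif line.startswith("> ") and parsed["summary"] is None:
            parsed["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = parsed["sections"].setdefault(line[3:].strip(), [])
        elif current is not None:
            match = LINK.match(line)
            if match:
                current.append(match.groupdict())
    return parsed


def to_xml(parsed):
    """Turn the parsed index into a small XML bundle (layout is an assumption)."""
    root = ET.Element("site", title=parsed["title"] or "")
    ET.SubElement(root, "summary").text = parsed["summary"] or ""
    for heading, links in parsed["sections"].items():
        section = ET.SubElement(root, "section", name=heading)
        for link in links:
            item = ET.SubElement(section, "link", url=link["url"])
            item.text = link["name"]
            if link["desc"]:
                item.set("description", link["desc"])
    return ET.tostring(root, encoding="unicode")


print(to_xml(parse_llms_txt(SAMPLE)))
```

A real agent would presumably also fetch the linked pages; this only covers the index itself.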

What This Does:

  • AI models can skip nav, ads, and JS cruft and jump straight to docs/primary pages you’ve listed.
  • You can signal priority and give lightweight guidance without reinventing your site architecture.

Here are the Limitations:

  • It’s advisory. There’s no enforcement; it only helps if the agent actually looks for and respects llms.txt.
  • It aims at inference-time use. It’s not a legal control over training data. Manage your expectations.

Which LLMs Currently Respect It?

And this is where things get a bit troublesome. Even though it’s been proposed, and Semrush flags it as missing when you run an audit, no LLM is currently respecting it.

In fact, Google’s John Mueller even weighed in on Bluesky:

“FWIW no AI system currently uses LLMs.txt.” 

He added:

“I mean, I’m not the police, nor would they have anything to say about what you add to your site (well, mostly :-)). Here’s an idea – add something wild and unmentioned anywhere else into the file, and see if you can trigger it over time.”

And this is where it gets weird for me. Because if you ask the chatbots themselves, they’ll say they use or respect LLMs.txt. But we have an actual person working on Search at Google, who says that’s not the case. 

What Do We Do with this Information?

We’re having debates about it in the SEO space. In fact, the third-party SEO agency we use at work also mentioned that they don’t know if they should recommend this.

You have AI shills loving it, but in practice no one’s really doing anything about it. Which is strange because this isn’t how it went when sitemaps and robots.txt files were introduced. To be fair, it’s possible that the historical retelling I’ve found about their adoption is sanitised. I was a literal kid when those things happened, so I don’t know how the industry was doing.

And social media wasn’t that big of a deal back then. But from talking to older SEOs, I’ve learned that it was a gradual shift, and the industry eventually adopted them across the board.

It wasn’t nearly as chaotic as the debate around LLMs.txt has been so far. OpenAI’s GPTBot, Anthropic’s ClaudeBot, and PerplexityBot aren’t even requesting the file. Google sometimes pings it, but that’s more out of curiosity than compliance.
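
If you want to check that against your own site instead of taking my word for it, your server’s access log will tell you who is asking for the file. Below is a small Python sketch. It assumes your log is in the common “combined” format (Apache and Nginx defaults look like this), and the bot names are simple substring matches on the user agent, so anything unrecognised lands in “other”.

```python
import re
from collections import Counter

# Combined log format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot")


def llms_txt_hits(log_lines):
    """Count requests for /llms.txt, grouped by known bot name (or 'other')."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match["path"].split("?")[0] != "/llms.txt":
            continue  # not a parseable request, or not for llms.txt
        agent = match["agent"].lower()
        bot = next((b for b in BOTS if b.lower() in agent), "other")
        hits[bot] += 1
    return hits
```

To use it, open your log file (the path depends on your host) and pass the file object straight in, e.g. llms_txt_hits(open("access.log")). An empty result after a few weeks of traffic would line up with what Mueller said.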

So, What Does This Mean Right Now?

  • OpenAI (GPTBot): not using it.
  • Anthropic (ClaudeBot): not using it.
  • Google (Gemini): not using it (yet).
  • Meta: no comment, no evidence.
  • Other scrapers: lol, no. 

LLMs.txt isn’t a standard yet. It’s an idea that hasn’t caught on. The bots that matter aren’t looking for it, and the ones you don’t want around aren’t going to respect it anyway.

And as someone who has to make the decision on adding it to my own site, my freelance clients’ sites, and my company’s site, I don’t know what to do. Maybe I’m too chronically online and need to touch grass? Possibly.

But I’ve been ranting about AI (a bit here, and more so at work) for over a year now, and it’s only now that SEO agencies are freaking out and offering services related to AI overviews or GEO or whatever they’re calling it this week.

Maybe this is how new tech and methodologies have always come around and now thanks to social media, we can finally see the chaos.

I’ll stop getting too contemplative here. But let me know what you think!

Should You Add an LLMs.txt File?

This is the big question. And the answer is…maybe.

There are a few decent reasons why adding one could make sense:

Pros:

  • Tells the “good bots” to back off: If OpenAI or Google ever decide to respect it, your boundaries will already be in place.
  • Protects premium or gated content: At least in theory, you can flag what you don’t want scraped.
  • Future-proofs you: If LLMs.txt does become a standard, you won’t be scrambling later.

But there are just as many downsides too.

Cons:

  • Doesn’t stop rogue crawlers: Bad actors will ignore it completely.
  • Could hurt visibility in AI results: If chatbots ever use it, blocking too much might keep your site out of AI-powered answers.
  • Not enforceable: It’s a request, not a law. If someone ignores it, your only recourse is frustration.

So should you bother? Right now, it really just depends. If you’re the type who likes drawing lines early, setting one up doesn’t hurt. But if you’re expecting it to actually stop scrapers today… it won’t. 

That’s also why I haven’t added this file yet. None of the big guys are doing it yet. I’m a tiny fish in a big pond, and I’m also a bit worried that this file might attract the wrong kind of AI scrapers.

The truth is that we don’t know what’s up here. Yet. 

But obviously, if this becomes the industry standard in future, I’ll be among the first to let you guys know here. And provide a tutorial on how to add it!
