Skip to main content

Internet Archive will ignore robots.txt files to keep historical record accurate

internet archive robots txt server
Internet Archive
The Internet Archive has announced that going forward, it will no longer conform to directives given by robots.txt files. These files are predominantly used to advise search engines on which portions of the page should be crawled and indexed to help facilitate search queries.

In the past, the Internet Archive has complied with instructions laid out by robots.txt files, according to a report from Boing Boing. However, it has been decided that the way that these files are calibrated is often at odds with the service that the site sets out to provide.

“Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes,” stated a blog post that the organization published last week. “Internet Archive’s goal is to create complete ‘snapshots’ of web pages, including the duplicate content and the large versions of files.”

Robots.txt files are increasingly being used to remove entire domains from search engines following their transition from a live, accessible site to a parked domain. If a site goes out of business, and is rendered inaccessible in this way, it also becomes unavailable for viewing via the Internet Archive’s Wayback Machine. The organization apparently receives queries about these sites on a daily basis.

The Internet Archive hopes that disregarding robots.txt files will help contribute to an accurate representation of prior points in the web’s history, removing their capacity to muddy the waters with instructions intended for search engines.

The organization has already ceased referring to robots.txt files on sites and pages related to the U.S. government and the U.S. military, to account for the enormous changes that can be made to domains between one administration and the next. This decision has caused no major problems, so there are high hopes that discontinuing the use of the files more broadly will be helpful.

Brad Jones
Former Digital Trends Contributor
Brad is an English-born writer currently splitting his time between Edinburgh and Pennsylvania. You can find him on Twitter…
4 CPUs you should buy instead of the Ryzen 7 7800X3D
AMD Ryzen 7 7800X3D sitting on a motherboard.

The Ryzen 7 7800X3D is one of the best gaming processors you can buy, and it's easy to see why. It's easily the fastest gaming CPU on the market, it's reasonably priced, and it's available on a platform that AMD says it will support for several years. But it's not the right chip for everyone.

Although the Ryzen 7 7800X3D ticks all the right boxes, there are several alternatives available. Some are cheaper while still offering great performance, while others are more powerful in applications outside of gaming. The Ryzen 7 7800X3D is a great CPU, but if you want to do a little more shopping, these are the other processors you should consider.
AMD Ryzen 7 5800X3D

Read more
Even the new mid-tier Snapdragon X Plus beats Apple’s M3
A photo of the Snapdragon X Plus CPU in the die

You might have already heard of the Snapdragon X Elite, the upcoming chips from Qualcomm that everyone's excited about. They're not out yet, but Qualcomm is already announcing another configuration to live alongside it: the Snapdragon X Plus.

The Snapdragon X Plus is pretty similar to the flagship Snapdragon X Elite in terms of everyday performance but, as a new chip tier, aims to bring AI capabilities to a wider portfolio of ARM-powered laptops. To be clear, though, this one is a step down from the flagship Snapdragon X Elite, in the same way that an Intel Core Ultra 7 is a step down from Core Ultra 9.

Read more
Gigabyte just confirmed AMD’s Ryzen 9000 CPUs
Pads on the AMD Ryzen 7 7800X3D.

Gigabyte spoiled AMD's surprise a bit by confirming the company's next-gen CPUs. In a press release announcing a new BIOS for X670, B650, and A620 motherboards, Gigabyte not only confirmed that support has been added for next-gen AMD CPUs, but specifically referred to them as "AMD Ryzen 9000 series processors."

We've already seen MSI and Asus add support for next-gen AMD CPUs through BIOS updates, but neither of them called the CPUs Ryzen 9000. They didn't put out a dedicated press release for the updates, either. It should go without saying, but we don't often see a press release for new BIOS versions, suggesting Gigabyte wanted to make a splash with its support.

Read more