Comparison of web archiving services
The table below compares a number of web page archiving/snapshotting services I've used over the years. They are great for making an on-demand mirror of a web page that you can use later for a variety of purposes:
- as a copy, in case the original page is no longer accessible for whatever reason
- in particular, to cite scholarly articles and rest assured that the citation will work for years to come (this is the primary application of WebCite)
- as proof of the state of a particular page at a particular point in time
- since the copy of the page is stored by a third-party service, you could conceivably use it as legal evidence. For instance, if a web page infringes on your intellectual property or impersonates you, you can take a web snapshot of it, which would carry far more weight than a local screenshot
- to search within the history of the web pages you've thus bookmarked - Pinboard (not compared below) offers this as a premium service
Candidates
Feature | Archive.is | WebCite | iterasi | freezePAGE | backupURL | Citebite | Rooh.it |
---|---|---|---|---|---|---|---|
Established | May 2012 | Feb 1999 | ? | ? | ? | ? | ? |
Cost | free | free, considering premium features | ? | ? | ? | ? | ? |
Expiration | none mentioned | No new material accepted after end of 2013 unless funding goal is met | ? | 31 days (!), except for premium accounts | Down itself, as of 2012-Sep-19 | ? | ? |
Archive 404 pages | yes | no | yes | no | no | ? | ? |
Archive pages requiring login | no | no | yes, using the browser extension | ? | no | ? | ? |
Archive embedded elements | yes | yes | yes | Expired - yes, within limitations | yes | ? | ? |
Archive hashtag pages | yes | no | ? | ? | ? | ? | ? |
Archive scripts (e.g. Google Maps) | yes, but scripting disabled, and page is 1024x768 | broken | broken, and scripting disabled | allegedly yes, but fails | no | no | no |
Archive Twitter status pages | yes | text disappears | yes | Expired - yes, with limitations | ? | ? | ? |
Bookmarklet | yes | yes | yes | browser button | no | yes | yes, but didn't work |
Browser extension | no | no | yes, IE7 and Firefox | ? | no | ? | ? |
DOI support | no | assign/retrieve | no | no | no | no | no |
Limitations | 1024-pixel width | none by design | 750 KB for page + embedded elements; 5 MB storage space for unregistered users, 10 MB for registered users; accounts closed after 30 days of inactivity (unregistered) or 60 days (registered), and archived URLs are deleted along with the account; no SSL pages | ? | ? | Must enter some text found on the page, to highlight it | ? |
Override robots tags | yes | no¹ (by design, to avoid copyright violations) | yes | yes | ? | ? | ? |
Override no-cache tag | yes | no (by design, to avoid copyright violations) | yes, even when not using the browser extension | ? | ? | ? | ? |
Organize pages | no | can enter keywords, but apparently can't search by them | tags, folders, by date, search, publish collection, RSS | folders | no | no | ? |
Pop-up support (closing stays in archive) | no | yes | ? | ? | ? | ? | ? |
Private archiving | no | always | optional | ? | ? | ? | ? |
Signup required | no | no | required | optional, with benefits (10 MB storage space; must log in once every 31 days to keep the account active); otherwise identified by a browser cookie, with half the signed-up allowances | optional, no benefits | no | ? |
Short URL | 15 characters | 32 characters | 21 characters | 42 characters | 27 characters | yes | ? |
Transparent URLs² | all snapshots, not just yours | yes | no | no | no | no | ? |
Usability | great; progress indicator | must enter an e-mail address to receive the archiving confirmation | must navigate to "Completed archives" after archiving a page and wait for queuing | ? | must enter an email to archive; no PHP error page | good | never worked for me |
View other snapshots of a URL | yes | yes | no | ? | ? | ? | ? |
Will be around for years | ? | probably, if it manages to raise $50k | ? | ? | ? | ? | ? |
Extra features | 1024x768 image screenshot for extra accuracy, live feed, all domains archived | tags, description | ? | ? | ? | ? | ? |
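Most of the services above offer a bookmarklet: a one-line piece of JavaScript saved as a browser bookmark, which forwards the address of the page you're viewing to the service's submission endpoint. A minimal sketch of the idea (the endpoint below is a placeholder, not any service's documented API):

```javascript
// Build the URL an archiving bookmarklet would navigate to.
// NOTE: "https://example.org/archive" is a hypothetical endpoint used for
// illustration; check each service's documentation for the real one.
function buildArchiveLink(endpoint, pageUrl) {
  return endpoint + "?url=" + encodeURIComponent(pageUrl);
}

// In a browser, the bookmarklet itself is just this expression wrapped in a
// javascript: URL, e.g.:
//   javascript:location.href='https://example.org/archive?url='+encodeURIComponent(location.href)
console.log(buildArchiveLink("https://example.org/archive", "http://example.com/a page"));
```

The `encodeURIComponent` call matters: without it, the target page's own query string and special characters would be misinterpreted by the archiving service.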
Footnotes
1. When attempting to archive certain pages, the notification email lists "The site in question refuses connections by crawling robots" among the possible reasons for failure. Overriding the robots.txt file is ethically right here because there is an actual user behind the request. ↩
2. A transparent URL like
http://www.webcitation.org/query?url=http%3A%2F%2Funcyclopedia.wikia.com%2Fwiki%2FNihilism&date=2009-08-23
preserves the original URL in case the archiving service is down. If a service with opaque URLs like https://iterasi.net/Viewer.aspx?RootAssetID=2961840
is down, there is no way of retrieving the original URL. One can bookmark transparent URLs in confidence that if the original page outlasts the archiving service, its URL will still be available. ↩
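Because a transparent archive URL carries the target as a percent-encoded query parameter, the original address can be recovered mechanically even if the archiving service itself is gone. A small sketch using the standard `URL` API (the `url` parameter name is taken from the WebCite example above; other services may name it differently):

```javascript
// Recover the original URL from a "transparent" archive URL, where the
// target page is kept as a percent-encoded query parameter named "url".
function originalUrlFromArchive(archiveUrl) {
  const parsed = new URL(archiveUrl);
  // searchParams.get() decodes the percent-encoding for us;
  // it returns null when the service uses opaque IDs instead.
  return parsed.searchParams.get("url");
}

const transparent =
  "http://www.webcitation.org/query?url=http%3A%2F%2Funcyclopedia.wikia.com%2Fwiki%2FNihilism&date=2009-08-23";
console.log(originalUrlFromArchive(transparent));
// → "http://uncyclopedia.wikia.com/wiki/Nihilism"

const opaque = "https://iterasi.net/Viewer.aspx?RootAssetID=2961840";
console.log(originalUrlFromArchive(opaque));
// → null: the original URL is simply not present in the link
```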