Wayback Machine APIs
Akash Mahanty edited this page Feb 18, 2022 · 8 revisions
The Wayback Machine has three public APIs: Save, Availability, and CDX. This page contains basic information about each of them.
- Endpoint (Save API): https://web.archive.org/save/
- Purpose: Saving/creating new archives of web pages.
- Return Type: An HTML response; the result is read from the response headers and the response URL.
The saved URL has to be parsed from the headers. For example, these are the response headers of a save request for https://en.wikipedia.org/wiki/Social_media:
{'Server': 'nginx/1.19.10', 'Date': 'Sun, 02 Jan 2022 10:54:09 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'x-archive-orig-date': 'Sun, 02 Jan 2022 10:46:06 GMT', 'x-archive-orig-server': 'mw1385.eqiad.wmnet', 'x-archive-orig-x-content-type-options': 'nosniff', 'x-archive-orig-p3p': 'CP="See https://en.wikipedia.org/wiki/Special:CentralAutoLogin/P3P for more info."', 'x-archive-orig-content-language': 'en', 'x-archive-orig-vary': 'Accept-Encoding,Cookie,Authorization', 'x-archive-orig-last-modified': 'Sun, 02 Jan 2022 09:30:45 GMT', 'x-archive-orig-content-encoding': 'gzip', 'x-archive-orig-age': '2', 'x-archive-orig-x-cache': 'cp4030 miss, cp4027 hit/1', 'x-archive-orig-x-cache-status': 'hit-front', 'x-archive-orig-server-timing': 'cache;desc="hit-front", host;desc="cp4027"', 'x-archive-orig-strict-transport-security': 'max-age=106384710; includeSubDomains; preload', 'x-archive-orig-report-to': '{ "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }', 'x-archive-orig-nel': '{ "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}', 'x-archive-orig-permissions-policy': 'interest-cohort=()', 'x-archive-orig-x-client-ip': '207.241.232.35', 'x-archive-orig-cache-control': 'private, s-maxage=0, max-age=0, must-revalidate', 'x-archive-orig-accept-ranges': 'bytes', 'x-archive-orig-content-length': '164995', 'x-archive-orig-connection': 'keep-alive', 'x-archive-guessed-content-type': 'text/html', 'x-archive-guessed-charset': 'utf-8', 'memento-datetime': 'Sun, 02 Jan 2022 10:46:08 GMT', 'link': '<https://en.wikipedia.org/wiki/Social_media>; rel="original", <https://web.archive.org/web/timemap/link/https://en.wikipedia.org/wiki/Social_media>; rel="timemap"; type="application/link-format", 
<https://web.archive.org/web/https://en.wikipedia.org/wiki/Social_media>; rel="timegate", <https://web.archive.org/web/20051215000000/http://en.wikipedia.org/wiki/Social_media>; rel="first memento"; datetime="Thu, 15 Dec 2005 00:00:00 GMT", <https://web.archive.org/web/20220101114012/https://en.wikipedia.org/wiki/Social_media>; rel="prev memento"; datetime="Sat, 01 Jan 2022 11:40:12 GMT", <https://web.archive.org/web/20220102104608/https://en.wikipedia.org/wiki/Social_media>; rel="memento"; datetime="Sun, 02 Jan 2022 10:46:08 GMT", <https://web.archive.org/web/20220102104608/https://en.wikipedia.org/wiki/Social_media>; rel="last memento"; datetime="Sun, 02 Jan 2022 10:46:08 GMT"', 'content-security-policy': "default-src 'self' 'unsafe-eval' 'unsafe-inline' data: blob: archive.org web.archive.org analytics.archive.org pragma.archivelab.org", 'x-archive-src': 'spn2-20220102093111-wwwb-spn10.us.archive.org-8000.warc.gz', 'server-timing': 'captures_list;dur=275.334598, exclusion.robots;dur=0.096415, exclusion.robots.policy;dur=0.088356, RedisCDXSource;dur=1.634125, esindex;dur=0.008082, LoadShardBlock;dur=81.607259, PetaboxLoader3.datanode;dur=51.631773, CDXLines.iter;dur=18.885269, load_resource;dur=19.971806', 'x-app-server': 'wwwb-app204', 'x-ts': '200', 'x-tr': '910', 'X-location': 'All', 'X-Cache-Key': 'httpsweb.archive.org/web/20220102104608/https://en.wikipedia.org/wiki/Social_mediaIN', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()', 'Content-Encoding': 'gzip'}
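One way to recover the saved URL from headers like the ones above is to read the `link` header and pick the entry tagged `rel="memento"`. The following is a minimal sketch, not the method used by any particular client: the function name `memento_url_from_headers` and the regular expression are illustrative assumptions, and it assumes the `link` header always carries a plain `rel="memento"` entry as it does in the example.

```python
import re

# One entry of the "link" header looks like: <URL>; rel="..."; datetime="..."
# Only the URL and the rel value are captured here.
LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def memento_url_from_headers(headers):
    """Return the URL tagged rel="memento" in the link header, or None."""
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    link = lowered.get("link", "")
    for url, rel in LINK_ENTRY.findall(link):
        # Skip combined values such as "first memento" or "prev memento";
        # only the entry whose rel is exactly "memento" is returned.
        if rel.split() == ["memento"]:
            return url
    return None
```

With an HTTP client such as `requests`, the dictionary passed in would typically be `response.headers`. Applied to the example headers above, the entry tagged `rel="memento"` is the one with timestamp 20220102104608.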
- Endpoint (CDX Server API): https://web.archive.org/cdx/search/cdx
- Purpose: Complex querying, filtering, and analysis of Wayback capture data.
- Read more @ https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server
- Return Type: text/plain or JSON
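As a rough illustration of querying the CDX endpoint, the sketch below builds a request URL and converts the JSON output into dictionaries. It assumes the `url`, `output=json`, and `limit` query parameters described in the CDX server README linked above, and that the JSON output is a list of rows whose first row holds the field names. The helper names `build_cdx_url` and `rows_to_dicts` are hypothetical.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, **params):
    """Build a CDX query URL; extra keyword arguments become query parameters."""
    query = {"url": url, "output": "json"}
    query.update(params)
    return CDX_ENDPOINT + "?" + urlencode(query)


def rows_to_dicts(body):
    """Turn the JSON output (list of rows, first row = field names) into dicts."""
    if not body.strip():
        return []  # nothing to parse
    rows = json.loads(body)
    if not rows:
        return []  # no captures matched the query
    header, *records = rows
    return [dict(zip(header, record)) for record in records]
```

For example, `build_cdx_url("example.com", limit=3)` produces a URL that asks for at most three captures of example.com. The response body can then be fetched with any HTTP client and passed to `rows_to_dicts`.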
- Endpoint (Availability API): https://archive.org/wayback/available
- Purpose: Checking whether archives exist and looking up archives close to a specific date and time.
- Read more @ https://archive.org/help/wayback_api.php
- Return Type: JSON
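A lookup against the Availability endpoint could be sketched as follows. The sketch assumes the `url` and `timestamp` query parameters and the `archived_snapshots` / `closest` response layout described on the help page linked above. The helper names `build_availability_url` and `closest_snapshot` are hypothetical.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(url, timestamp=None):
    """Build a lookup URL; timestamp is an optional YYYYMMDDhhmmss prefix."""
    query = {"url": url}
    if timestamp is not None:
        query["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(query)


def closest_snapshot(body):
    """Return (archive_url, timestamp) of the closest snapshot, or None."""
    data = json.loads(body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None
```

When no archive exists, the `archived_snapshots` object is expected to be empty, in which case `closest_snapshot` returns `None`.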