AMP cache modifications best practices

These are guidelines for what AMP cache implementations should look like. Some items are required for overall security of the platform while others are suggestions for performance improvements. All modifications are made to both AMP and AMP4ADS documents except where noted.

For example, given a recent version of everything.amp.html, the output after modifications will be this version.

HTML Sanitization

The AMP Cache parses and re-serializes all documents to remove any ambiguities in parsing the document which might result in subtly different parses in different browsers.

All HTML comments are stripped

example
before after
<foo><!-- comment --></foo> <foo></foo>

Tag and attribute names are lowercased

example
before after
<P DATA-FOO=BAR> <p data-foo=BAR>

Attribute values are consistently quoted and escaped

example
before after
<p data-foo='< >'> <p data-foo="&lt; &gt;">

All tags are closed, except for HTML5 void elements

Void elements are tags that have no end tag and also no contents. All other tags are closed.

example
before after
<foo><bar></foo>
<br/>
<foo><bar></bar></foo>
<br>

Whitespace inside tags is stripped

example
before after
<p data-foo=bar > <p data-foo=bar>

Text is escaped

example
before after
3 < 4 3 &lt; 4

Encoded text characters are simplified, using UTF-8 equivalent characters

example
before after
&nbsp; "\u00A0"
&#x61; a
&#00000000000039; &#39;

Elements that appear after </body> but are only allowed inside <body> are moved into the <body>. This includes text.

example
before after
<body></body><div>foo</div>text <body><div>foo</div>text</body>
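
Several of the rules above (comment stripping, lowercased names, consistent quoting and escaping, void elements, text escaping, and entity simplification) can be illustrated with a small re-serializer. The following is only a sketch built on Python's standard `html.parser`, not the parser an AMP Cache actually uses. The names `Reserializer` and `reserialize` are hypothetical, and the sketch does not close unclosed tags or move stray elements into `<body>`.

```python
from html import escape
from html.parser import HTMLParser

# HTML5 void elements: they have no end tag and no contents.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class Reserializer(HTMLParser):
    """Hypothetical helper that re-emits markup in a normalized form."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; in text.
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)  # attribute without a value
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        # Void elements never get an end tag.
        if tag not in VOID_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    # handle_comment is not overridden, so comments are dropped.


def reserialize(html_text: str) -> str:
    parser = Reserializer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(reserialize("<P DATA-FOO=BAR><!-- comment -->3 < 4</P><br/>"))
```

Because the parser decodes character references before the serializer re-escapes them, encoded characters such as `&#x61;` come out as their plain equivalents.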

URL Rewrites

The AMP Cache rewrites URLs found in the AMP HTML for two purposes. One is to rebase relative URLs found in the document so that the URL remains the same when loaded from the AMP Cache. The other reason is to improve performance by selecting a different equivalent resource. This includes rewriting image and font URLs to use a cached copy and rewriting AMP javascript URLs to use a copy with longer cache lifetimes.

All relative href, src, data-iframe-src and data-no-service-worker-fallback-shell-url URLs are rewritten as absolute URLs

data-iframe-src and data-no-service-worker-fallback-shell-url are part of <amp-install-serviceworker> spec.

example
before after
<a href=foo.html target=_top>Lorem ipsum</a> <a href=https://example.com/foo.html target=_top>Lorem ipsum</a>
<amp-list src="list.json" ...>...</amp-list> <amp-list src="https://example.com/list.json" ...>...</amp-list>
<amp-install-serviceworker data-iframe-src="sw.html"...></amp-install-serviceworker> <amp-install-serviceworker data-iframe-src="https://example.com/sw.html"...></amp-install-serviceworker>

All relative form[action] and form[action-xhr] URLs are rewritten as absolute URLs

example
before after
<form action=/subscribe>...</form> <form action=https://example.com/subscribe>...</form>
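
The rebasing of relative attribute URLs described above can be sketched with Python's `urllib.parse.urljoin`. The helper name `rebase_attrs` and the choice to represent a tag's attributes as a dictionary are assumptions made for illustration, and the sketch ignores any `<base href>` the document might declare.

```python
from urllib.parse import urljoin

# Attributes named in the two rules above.
REBASED_ATTRS = (
    "href",
    "src",
    "data-iframe-src",
    "data-no-service-worker-fallback-shell-url",
    "action",
    "action-xhr",
)


def rebase_attrs(document_url: str, attrs: dict) -> dict:
    """Return a copy of attrs with relative URLs resolved against document_url."""
    rebased = dict(attrs)
    for name in REBASED_ATTRS:
        if name in rebased:
            # urljoin leaves URLs that are already absolute unchanged.
            rebased[name] = urljoin(document_url, rebased[name])
    return rebased


doc = "https://example.com/index.amp.html"
print(rebase_attrs(doc, {"href": "foo.html", "target": "_top"}))
print(rebase_attrs(doc, {"action": "/subscribe"}))
```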

All relative @font-face CSS URLs are rewritten as absolute URLs

example
before after
@font-face {
font-family: Foo;
src: url(font.woff);
}
@font-face {
font-family: Foo;
src: url(https://example.com/font.woff);
}

All image URLs are rewritten as AMP cache URLs except those in amp-mustache templates

example
before after
<amp-img src=https://example.com/foo.png></amp-img> <amp-img src=/i/s/example.com/foo.png></amp-img>
<amp-img srcset="https://example.com/bar.png 1080w, https://example.com/bar-400.png 400w"> <amp-img srcset="/i/s/example.com/bar.png 1080w, /i/s/example.com/bar-400.png 400w">
<amp-anim src=foo.gif></amp-anim> <amp-anim src=/i/s/example.com/foo.gif></amp-anim>
<amp-video poster=bar.png> <amp-video poster=/i/s/example.com/bar.png>
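
A minimal sketch of the image rewrite shown in the examples above. The function names are hypothetical. The `/i/s/` prefix for HTTPS images comes from the examples, while the `/i/` prefix used here for plain HTTP images is an assumption that the examples do not confirm. The `srcset` handling splits naively on commas, so it would mishandle image URLs that themselves contain commas.

```python
from urllib.parse import urljoin, urlsplit


def cache_image_path(document_url: str, image_url: str) -> str:
    """Map an image URL to a cache-relative path such as /i/s/example.com/foo.png."""
    parts = urlsplit(urljoin(document_url, image_url))
    # Assumption: "/i/s/" marks HTTPS origins and "/i/" marks HTTP origins.
    prefix = "/i/s/" if parts.scheme == "https" else "/i/"
    path = prefix + parts.netloc + parts.path
    if parts.query:
        path += "?" + parts.query
    return path


def cache_srcset(document_url: str, srcset: str) -> str:
    """Rewrite every URL in a srcset value, keeping width/density descriptors."""
    rewritten = []
    for candidate in srcset.split(","):
        fields = candidate.split()
        if not fields:
            continue
        fields[0] = cache_image_path(document_url, fields[0])
        rewritten.append(" ".join(fields))
    return ", ".join(rewritten)


doc = "https://example.com/page.amp.html"
print(cache_image_path(doc, "foo.gif"))
print(cache_srcset(doc, "https://example.com/bar.png 1080w, https://example.com/bar-400.png 400w"))
```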

Anchor tags must have a target of _blank or _top

Condition: If an <a> tag does not have target=_blank or target=_top, the cache sets one. If the document has a <base target=...> of either _top or _blank, that value is used. Otherwise the target, including any other existing target value, is rewritten to _top.

example
before after
<a href=https://example.com/foo.html>Lorem ipsum</a> <a href=https://example.com/foo.html target=_top>Lorem ipsum</a>
<a href=https://example.com/bar.html target=_blank>Lorem ipsum</a> <a href=https://example.com/bar.html target=_blank>Lorem ipsum</a>
<a href=https://example.com/baz.html target=window>Lorem ipsum</a> <a href=https://example.com/baz.html target=_top>Lorem ipsum</a>
<head>
...
<base target=_blank>
...
</head>
<body>
...
<a href=https://example.com/foo.html>Lorem ipsum</a>
...
</body>
<head>
...
<base target=_blank>
...
</head>
<body>
...
<a href=https://example.com/foo.html target=_blank>Lorem ipsum</a>
...
</body>
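
The target selection can be written as a small decision function. This is a sketch of one reading of the condition above, in which an anchor whose target is neither `_blank` nor `_top` takes the `<base target>` value when that value is valid. The name `rewrite_target` is hypothetical.

```python
from typing import Optional

ALLOWED_TARGETS = ("_blank", "_top")


def rewrite_target(target: Optional[str], base_target: Optional[str] = None) -> str:
    """Return the target value the anchor should carry after rewriting."""
    if target in ALLOWED_TARGETS:
        return target  # already acceptable, leave as-is
    if base_target in ALLOWED_TARGETS:
        return base_target  # inherit from <base target=...>
    return "_top"  # missing or any other value


print(rewrite_target(None))
print(rewrite_target("window"))
print(rewrite_target(None, base_target="_blank"))
```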

Insert and Rewrite Tags

Insert <link as=script href=https://cdn.ampproject.org/v0.js rel=preload>

Before the AMP Runtime script tag, insert a link tag that tells the browser the AMP Runtime script tag is high priority despite being an async script tag.

example
before after
<head>
...
<script async src=https://cdn.ampproject.org/v0.js></script>
...
</head>
<head>
...
<link as=script href=https://cdn.ampproject.org/v0.js rel=preload>
<script async src=https://cdn.ampproject.org/v0.js></script>
...
</head>

Insert <link rel=icon>

When a given AMP document does not have a favicon present, insert one. Inserted tag is of the form <link href={document_protocol}://{document_domain}/favicon.ico rel=icon>.

Condition: No <link> tag present with attribute rel equal to any of the following: icon, icon shortcut, shortcut icon.

example
before after
<head>
...
</head>
<head>
...
<link href=https://example.com/favicon.ico rel=icon>
</head>
<head>
...
<link href=https://example.com/favicon.ico rel="icon shortcut">
...
</head>
<head>
...
<link href=https://example.com/favicon.ico rel="icon shortcut">
...
</head>
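
The favicon condition can be sketched as follows. The function name `favicon_to_insert` is hypothetical, and comparing `rel` values case-insensitively is an assumption that the condition above does not state.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit

FAVICON_RELS = {"icon", "icon shortcut", "shortcut icon"}


def favicon_to_insert(document_url: str, link_rels: Iterable[str]) -> Optional[str]:
    """Return the <link> tag to insert, or None if a favicon is already declared."""
    if any(rel.strip().lower() in FAVICON_RELS for rel in link_rels):
        return None
    parts = urlsplit(document_url)
    return f"<link href={parts.scheme}://{parts.netloc}/favicon.ico rel=icon>"


print(favicon_to_insert("https://example.com/article.amp.html", ["stylesheet"]))
print(favicon_to_insert("https://example.com/article.amp.html", ["icon shortcut"]))
```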

Rewrite <link rel=manifest> to <link rel=origin-manifest>

Condition: <link rel=manifest> tag present in the document.

example
before after
<head>
...
<link rel=manifest>
...
</head>
<head>
...
<link rel=origin-manifest>
...
</head>

Insert <meta content=always name=referrer> [required]

If the document was fetched from HTTP origins and does not have a meta referrer tag then insert one.

Condition: No <meta name=referrer ...> tag present and document was fetched from HTTP and not HTTPS.

example
before after
<head>
...
</head>
<head>
...
<meta content=always name=referrer>
</head>

Insert <meta content=noindex name=robots> [required]

AMP Cache pages should not show up in search result pages. The cache also uses robots.txt to enforce this.

example
before after
<head>
...
</head>
<head>
...
<meta content=noindex name=robots>
</head>

Rewrite all <meta> tags to be the first children of <head>.

Some <meta> tags are used by AMP Components and providing them before the AMP Component's script has loaded is important. As a result we move all <meta> tags to be the first children of <head>. An example of one of these tags is <meta name="amp-experiments-opt-in" ...>.

example
before after
<head>
<meta charset=utf-8>
...
<script async src=https://cdn.ampproject.org/v0.js></script>
<meta name="amp-experiments-opt-in" content="experiment-a,experiment-b">
</head>
<head>
<meta charset=utf-8>
<meta name="amp-experiments-opt-in" content="experiment-a,experiment-b">
<script async src=https://cdn.ampproject.org/v0.js></script>
...
</head>

Remove Tags and Attributes

Remove <link> resource hints

The AMP Cache removes any resource hints in the original document.

Condition: Any <link> tag present with attribute rel equal to any of the following:

  • dns-prefetch
  • preconnect
  • prefetch
  • preload
  • prerender
example
before after
<head>
...
<link rel=dns-prefetch href=https://example.com>
...
</head>
<head>
...
</head>

Remove non-whitelisted <meta> tags

Condition: Remove any <meta> tags except for those that:

  • have attribute charset
  • do not have attributes content, itemprop, name and property
  • have attribute http-equiv
  • have attribute name with case-insensitive prefix amp-
  • have attribute name with case-insensitive prefix amp4ads-
  • have attribute name with case-insensitive prefix dc.
  • have attribute name with case-insensitive prefix i-amphtml-
  • have attribute name with case-insensitive prefix twitter:
  • have attribute name=apple-itunes-app
  • have attribute name=copyright
  • have attribute name=referrer [note: this may be inserted by AMP Cache]
  • have attribute name=robots [note: this is inserted by AMP Cache]
  • have attribute name=viewport
  • have attribute property with case-insensitive prefix "al:"
  • have attribute property with case-insensitive prefix "fb:"
  • have attribute property with case-insensitive prefix "og:"
example
before after
<meta charset=utf-8>
<meta http-equiv=content-language content=en>
<meta name=description content="An example AMP page">
<meta name=twitter:title content="AMP Example">
<meta charset=utf-8>
<meta http-equiv=content-language content=en>
<meta name=twitter:title content="AMP Example">
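
The list of exceptions above translates into a predicate over a `<meta>` tag's attributes. In this sketch the attributes are assumed to arrive as a dictionary with lowercased names, and `keep_meta` is a hypothetical helper name. Prefix checks are case-insensitive as stated in the list, while exact `name` values are compared as written.

```python
NAME_PREFIXES = ("amp-", "amp4ads-", "dc.", "i-amphtml-", "twitter:")
EXACT_NAMES = {"apple-itunes-app", "copyright", "referrer", "robots", "viewport"}
PROPERTY_PREFIXES = ("al:", "fb:", "og:")


def keep_meta(attrs: dict) -> bool:
    """Return True if the <meta> tag with these attributes survives the cache."""
    if "charset" in attrs or "http-equiv" in attrs:
        return True
    # Tags with none of content, itemprop, name and property are kept.
    if not any(k in attrs for k in ("content", "itemprop", "name", "property")):
        return True
    name = attrs.get("name") or ""
    if name in EXACT_NAMES or name.lower().startswith(NAME_PREFIXES):
        return True
    prop = attrs.get("property") or ""
    return prop.lower().startswith(PROPERTY_PREFIXES)


print(keep_meta({"name": "description", "content": "An example AMP page"}))
print(keep_meta({"name": "twitter:title", "content": "AMP Example"}))
```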

Remove amp-live-list children based on amp_latest_update_time parameter

This is discussed in detail at Server side filtering for amp-live-list

Remove attribute nonce

Condition: Remove nonce attribute from every tag.

example
before after
<script async custom-element=amp-youtube nonce=cryptohash src=https://cdn.ampproject.org/v0/amp-youtube-0.1.js></script> <script async custom-element=amp-youtube src=https://cdn.ampproject.org/v0/amp-youtube-0.1.js></script>

Optimizations

These are modifications that either reduce the byte size of the document or decrease the time to render. An AMP cache is not required to implement these.

The AMP engine javascript URL is rewritten to the most recent stable version

If possible, rewrite to use the stable version. Otherwise use the unversioned path. The stable version takes the form <script async src=https://cdn.ampproject.org/rtv/{version}/v0.js></script>.

example
before after
<script async src=https://cdn.ampproject.org/v0.js></script> <script async src=https://cdn.ampproject.org/rtv/031485231782273/v0.js></script>
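
As a sketch, the rewrite is a substitution on the engine URL. The function name `versioned_engine_src` is hypothetical, and how a cache learns the current stable version is outside the scope of this example, so the version is simply passed in.

```python
from typing import Optional

ENGINE_SRC = "https://cdn.ampproject.org/v0.js"


def versioned_engine_src(src: str, rtv_version: Optional[str]) -> str:
    """Rewrite the unversioned engine URL to the rtv/{version} form when possible."""
    if src == ENGINE_SRC and rtv_version:
        return f"https://cdn.ampproject.org/rtv/{rtv_version}/v0.js"
    # No known stable version, or not the engine script: keep the original URL.
    return src


print(versioned_engine_src(ENGINE_SRC, "031485231782273"))
```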

Insert <link href=https://fonts.gstatic.com rel="dns-prefetch preconnect" crossorigin>

The AMP Cache adds resource hint tags so that browsers can begin loading resources earlier, which speeds up page loads.

Condition: Has a stylesheet of the form: <link href=https://fonts.googleapis.com/... rel=stylesheet>.

example
before after
<head>
...
<link href=https://fonts.googleapis.com/css?family=Lato rel=stylesheet>
...
</head>
<head>
...
<link href=https://fonts.googleapis.com/css?family=Lato rel=stylesheet>
<link href=https://fonts.gstatic.com rel="dns-prefetch preconnect" crossorigin>
...
</head>

Prioritize AMP engine javascript and other render blocking scripts in <head>

The AMP Cache places the AMP engine javascript as the second child of <head> right after <meta charset=utf-8>. It then emits any other render blocking custom-element script tags followed by the remaining custom-element <script> tags in the document. Render blocking custom-element <script> tags are listed in SERVICES at render-delaying-services.js.

example
before after
<head>
...
<script async custom-element=amp-instagram src=https://cdn.ampproject.org/v0/amp-instagram-0.1.js></script>
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async src=https://cdn.ampproject.org/v0.js></script>
<meta charset=utf-8>
...
</head>
<head>
<meta charset=utf-8>
<script async src=https://cdn.ampproject.org/v0.js></script>
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async custom-element=amp-instagram src=https://cdn.ampproject.org/v0/amp-instagram-0.1.js></script>
...
</head>
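
The ordering of custom-element scripts can be sketched as a two-group sort. The function name `order_extensions` is hypothetical, and the render-blocking set is a parameter because the actual list lives in render-delaying-services.js. Sorting alphabetically within each group is an assumption inferred from the example above, where amp-accordion is emitted before amp-instagram. The extension name `amp-example-blocking` in the usage lines is a placeholder, not a real extension.

```python
from typing import Iterable, List, Set


def order_extensions(names: Iterable[str], render_blocking: Set[str]) -> List[str]:
    """Return extension names with render-blocking ones first."""
    names = list(names)
    blocking = sorted(n for n in names if n in render_blocking)
    others = sorted(n for n in names if n not in render_blocking)
    return blocking + others


# "amp-example-blocking" is a placeholder used only for illustration.
print(order_extensions(
    ["amp-instagram", "amp-accordion", "amp-example-blocking"],
    render_blocking={"amp-example-blocking"},
))
```

The engine script itself would be emitted before this list, directly after `<meta charset=utf-8>`.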

Remove duplicate custom-element extensions in <head>

If a custom-element <script> tag is included more than once, the AMP Cache removes all but one.

example
before after
<head>
...
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async custom-element=amp-instagram src=https://cdn.ampproject.org/v0/amp-instagram-0.1.js></script>
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
...
</head>
<head>
...
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async custom-element=amp-instagram src=https://cdn.ampproject.org/v0/amp-instagram-0.1.js></script>
...
</head>
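
Deduplication is straightforward to sketch. Keeping the first occurrence matches the example above, although the rule itself only says that all but one are removed. The name `dedupe_extensions` is hypothetical.

```python
from typing import Iterable, List


def dedupe_extensions(names: Iterable[str]) -> List[str]:
    """Drop repeated custom-element names, keeping the first occurrence."""
    seen = set()
    kept = []
    for name in names:
        if name not in seen:
            seen.add(name)
            kept.append(name)
    return kept


print(dedupe_extensions(["amp-accordion", "amp-instagram", "amp-accordion"]))
```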

Remove unused custom-element extensions in <head> [WIP]

This is currently a work in progress.

If a custom-element <script> tag is included in <head> but not used in <body> then remove it. There are several exceptions to this listed under Condition.

Condition: Remove unused custom-element extensions with the following exceptions:

  • Do not remove any custom-element extensions if <amp-live-list> is present within the document
  • Do not remove any of the following custom-element extensions:
      ◦ amp-access
      ◦ amp-access-laterpay
      ◦ amp-analytics
      ◦ amp-auto-ads
      ◦ amp-dynamic-css-classes
      ◦ amp-form
      ◦ amp-share-tracking
example
before after
<head>
...
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async custom-element=amp-analytics src=https://cdn.ampproject.org/v0/amp-analytics-0.1.js></script>
<script async custom-element=amp-youtube src=https://cdn.ampproject.org/v0/amp-youtube-0.1.js></script>
...
</head>
<body>
...
<amp-youtube ...></amp-youtube>
...
</body>
<head>
...
<script async custom-element=amp-analytics src=https://cdn.ampproject.org/v0/amp-analytics-0.1.js></script>
<script async custom-element=amp-youtube src=https://cdn.ampproject.org/v0/amp-youtube-0.1.js></script>
...
</head>
<body>
...
<amp-youtube ...></amp-youtube>
...
</body>
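
The removal rule and its exceptions can be sketched as a filter. The helper name `prune_unused` and the idea of passing the set of tag names found in the document are assumptions made for illustration. Since the feature is marked as a work in progress, the real behavior may differ.

```python
from typing import Iterable, List, Set

# Extensions that are never removed, per the list above.
ALWAYS_KEEP = {
    "amp-access",
    "amp-access-laterpay",
    "amp-analytics",
    "amp-auto-ads",
    "amp-dynamic-css-classes",
    "amp-form",
    "amp-share-tracking",
}


def prune_unused(extensions: Iterable[str], tags_in_document: Set[str]) -> List[str]:
    """Return the extensions to keep, given the tag names used in the document."""
    extensions = list(extensions)
    if "amp-live-list" in tags_in_document:
        return extensions  # nothing is removed when <amp-live-list> is present
    return [e for e in extensions if e in ALWAYS_KEEP or e in tags_in_document]


print(prune_unused(
    ["amp-accordion", "amp-analytics", "amp-youtube"],
    tags_in_document={"amp-youtube"},
))
```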

Remove <script type=application/ld+json>...</script>

Remove JSON-based linked data from the document.

example
before after
<head>
...
<script async src=https://cdn.ampproject.org/v0.js></script>
<script type=application/ld+json>
{
"@context": "http://schema.org",
"@type": "Person",
"name": "Lorem Ipsum"
}
</script>
...
</head>
<head>
...
<script async src=https://cdn.ampproject.org/v0.js></script>
...
</head>

Remove insignificant whitespace in <head>

Remove whitespace in <head> except for tags that should preserve whitespace.

Condition: Remove whitespace except for within these tags:

  • <script>
  • <style>
example
before after
<head>
...
<meta charset=utf-8>
<style amp-custom>
body {
background-color: white;
}
</style>
...
</head>
<head>...<meta charset=utf-8><style amp-custom>
body {
background-color: white;
}
</style>...</head>

Remove unnecessary attribute value quotes in entire document [WIP]

This is currently a work in progress for AMP documents and implemented for AMP4ADS documents.

Remove quotes from around an attribute’s value unless the attribute’s value has an ASCII character in the set { 0x20(space), 0x22("), 0x27('), 0x3E(>), 0x60(`) }.

example
before after
<head>
...
<meta charset="utf-8">
<script async src="https://cdn.ampproject.org/v0.js"></script>
<link rel="icon shortcut" href="https://example.com/favicon.ico">
...
</head>
<head>
...
<meta charset=utf-8>
<script async src=https://cdn.ampproject.org/v0.js></script>
<link rel="icon shortcut" href=https://example.com/favicon.ico>
...
</head>
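
The quoting rule can be sketched as a check on the attribute value. The name `can_unquote` is hypothetical. Treating an empty value as one that must stay quoted is an added assumption, since the rule above only lists characters.

```python
# Characters that force quoting: space, ", ', > and `.
MUST_QUOTE = frozenset({"\x20", "\x22", "\x27", "\x3e", "\x60"})


def can_unquote(value: str) -> bool:
    """Return True if the attribute value may be written without quotes."""
    if value == "":
        return False  # assumption: keep quotes around empty values
    return not any(ch in MUST_QUOTE for ch in value)


print(can_unquote("utf-8"))
print(can_unquote("icon shortcut"))
```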

Additional Modifications for AMP for Ads (AMP4ADS) Documents

These modifications are specific to AMP4ADS and are not implemented for AMP documents.

Prioritize AMP4ADS engine javascript in <head>

The AMP Cache places the AMP4ADS engine javascript as the second child of <head> right after <meta charset=utf-8>.

example
before after
<head>
...
<script async custom-element=amp-instagram src=https://cdn.ampproject.org/v0/amp-instagram-0.1.js></script>
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async src=https://cdn.ampproject.org/amp4ads-v0.js></script>
<meta charset=utf-8>
...
</head>
<head>
<meta charset=utf-8>
<script async src=https://cdn.ampproject.org/amp4ads-v0.js></script>
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async custom-element=amp-instagram src=https://cdn.ampproject.org/v0/amp-instagram-0.1.js></script>
...
</head>

Record UTF-16 offsets for AMP4ADS engine and custom-element extensions in added JSON

AMP4ADS requires providing the start and end position of the block of <script> tags that represent the AMP4ADS engine and custom-element extensions. These offsets are measured in UTF-16 code units. This data is provided in the amp-ad-metadata JSON as ampRuntimeUtf16CharOffsets. The amp-ad-metadata JSON is appended to the end of the <body>.

example
before after
<head>
<meta charset=utf-8>
<script async src=https://cdn.ampproject.org/amp4ads-v0.js></script>
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async custom-element=amp-instagram src=https://cdn.ampproject.org/v0/amp-instagram-0.1.js></script>
...
</head>
<body>
...
</body>
<head>
<meta charset=utf-8>
<script async src=https://cdn.ampproject.org/amp4ads-v0.js></script>
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async custom-element=amp-instagram src=https://cdn.ampproject.org/v0/amp-instagram-0.1.js></script>
...
</head>
<body>
...
<script type=application/json amp-ad-metadata>
{
"ampRuntimeUtf16CharOffsets" : [ 55, 337 ]
}
</script>
</body>
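
Python strings are indexed by code point, so a cache written in Python would need to convert indices before recording them. The sketch below shows one way to do that conversion. The helper names are hypothetical, and treating the end offset as exclusive is an assumption that the text above does not settle.

```python
from typing import List


def utf16_offset(text: str, index: int) -> int:
    """Number of UTF-16 code units that precede text[index]."""
    # Each code unit is two bytes in UTF-16-LE (no byte order mark is emitted).
    return len(text[:index].encode("utf-16-le")) // 2


def utf16_range(text: str, fragment: str) -> List[int]:
    """Start and (exclusive) end of the first occurrence of fragment, in UTF-16 units."""
    start = text.index(fragment)
    end = start + len(fragment)
    return [utf16_offset(text, start), utf16_offset(text, end)]


# The emoji is one code point but two UTF-16 code units.
document = "<p>\U0001F600</p><script></script>"
print(utf16_range(document, "<script></script>"))
```

For documents that contain only characters in the Basic Multilingual Plane, these offsets equal the ordinary Python string indices.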

Record UTF-16 offsets for amp-access and amp-analytics JSON in added JSON

AMP4ADS requires providing the start and end position of amp-access and amp-analytics JSON. These offsets are measured in UTF-16 code units. This data is provided in the amp-ad-metadata JSON as jsonUtf16CharOffsets. The amp-ad-metadata JSON is appended to the end of the <body>.

example
before after
<head>
...
<script id=amp-access type=application/json>
{
...
}
</script>
...
</head>
<body>
...
<amp-analytics>
<script type=application/json>
{
...
}
</script>
</amp-analytics>
...
</body>
<head>
...
<script id=amp-access type=application/json>
{
...
}
</script>
...
</head>
<body>
...
<amp-analytics>
<script type=application/json>
{
...
}
</script>
</amp-analytics>
...
<script type=application/json amp-ad-metadata>
{
"jsonUtf16CharOffsets" : {
"amp-access": [ 12, 92 ],
"amp-analytics": [ 105, 175 ]
}
}
</script>
</body>

Record UTF-16 offsets for CSS body selectors in added JSON

AMP4ADS requires providing the start and end position of CSS body selectors in <style amp-custom>. These offsets are measured in UTF-16 code units. This data is provided in the amp-ad-metadata JSON as cssReplacementRanges. The amp-ad-metadata JSON is appended to the end of the <body>.

example
before after
<head>
...
<style amp-custom>
body { background-color: #eee; }
...
@media screen and (min-width:480px) {
body { color: #ccc; }
}
</style>
...
</head>
<body>
...
</body>
<head>
...
<style amp-custom>
body { background-color: #eee; }
...
@media screen and (min-width:480px) {
body { color: #ccc; }
}
</style>
...
</head>
<body>
...
<script type=application/json amp-ad-metadata>
{
"cssReplacementRanges" : [
[ 59, 63 ],
[ 139, 143 ]
]
}
</script>
</body>

Record custom-element extension in an added JSON <script>

AMP4ADS requires providing additional metadata of what custom-element extensions are included in the AMP4ADS document. This data is provided in the amp-ad-metadata JSON as customElementExtensions and extensions. The amp-ad-metadata JSON is appended to the end of the <body>.

customElementExtensions is just a list of the custom-element extension names in the document. This is now deprecated but still used.

extensions is a list of objects containing custom-element and src attributes, which are the name of the custom-element and the location of the extension respectively.

example
before after
<head>
...
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async custom-element=amp-instagram src=https://cdn.ampproject.org/v0/amp-instagram-0.1.js></script>
...
</head>
<body>
...
</body>
<head>
...
<script async custom-element=amp-accordion src=https://cdn.ampproject.org/v0/amp-accordion-0.1.js></script>
<script async custom-element=amp-instagram src=https://cdn.ampproject.org/v0/amp-instagram-0.1.js></script>
...
</head>
<body>
...
<script type=application/json amp-ad-metadata>
{
"customElementExtensions" :
[ "amp-accordion", "amp-instagram" ],
"extensions" : [
{
"custom-element" : "amp-accordion",
"src" : "https://cdn.ampproject.org/v0/amp-accordion-0.1.js"
},
{
"custom-element" : "amp-instagram",
"src" : "https://cdn.ampproject.org/v0/amp-instagram-0.1.js"
}]
}
</script>
</body>
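
Building the two metadata fields from the extension scripts found in `<head>` can be sketched with the standard `json` module. The function names and the tuple-based input are assumptions made for illustration. A real implementation would also need to make sure the serialized JSON cannot terminate the surrounding `<script>` element early.

```python
import json
from typing import Dict, List, Tuple


def extension_metadata(scripts: List[Tuple[str, str]]) -> Dict[str, list]:
    """scripts holds (custom-element name, src) pairs in document order."""
    return {
        "customElementExtensions": [name for name, _ in scripts],
        "extensions": [
            {"custom-element": name, "src": src} for name, src in scripts
        ],
    }


def metadata_script(metadata: dict) -> str:
    """Wrap the metadata in the amp-ad-metadata script tag."""
    body = json.dumps(metadata)
    return f"<script type=application/json amp-ad-metadata>{body}</script>"


scripts = [
    ("amp-accordion", "https://cdn.ampproject.org/v0/amp-accordion-0.1.js"),
    ("amp-instagram", "https://cdn.ampproject.org/v0/amp-instagram-0.1.js"),
]
print(metadata_script(extension_metadata(scripts)))
```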

Record custom fonts in an added JSON <script>

AMP4ADS requires providing additional metadata of what custom fonts (href and if present media) were included in the AMP4ADS document. This data is provided in the amp-ad-metadata JSON as customStylesheets. The amp-ad-metadata JSON is appended to the end of the <body>.

example
before after
<head>
...
<link href=https://fonts.googleapis.com/css?family=Foo rel=stylesheet>
<link href=https://fonts.googleapis.com/css?family=Bar rel=stylesheet media=print>
...
</head>
<body>
...
</body>
<head>
...
<link href=https://fonts.googleapis.com/css?family=Foo rel=stylesheet>
<link href=https://fonts.googleapis.com/css?family=Bar rel=stylesheet media=print>
...
</head>
<body>
...
<script type=application/json amp-ad-metadata>
{
"customStylesheets" : [
{
"href" : "https://fonts.googleapis.com/css?family=Foo"
},
{
"href" : "https://fonts.googleapis.com/css?family=Bar",
"media" : "print"
}]
}
</script>
</body>