<!DOCTYPE html>
<meta charset=utf-8>
<title>Semantics - Dive Into HTML5</title>
<!--[if lt IE 9]><script src=j/html5.js></script><![endif]-->
<link rel=stylesheet href=screen.css>
<style>
body{counter-reset:h1 3}
dl.col dt{float:left;clear:left}
dl.col dd{margin-left:7em}
dl.col dt,dl.col dd{padding-bottom:1.75em}
</style>
<link rel=stylesheet media='only screen and (max-device-width: 480px)' href=mobile.css>
<link rel=prefetch href=index.html>
<p>You are here: <a href=index.html>Home</a> <span class=u>‣</span> <a href=table-of-contents.html#semantics>Dive Into <abbr>HTML5</abbr></a> <span class=u>‣</span>
<h1><br>What Does It All Mean?</h1>
<p id=toc>
<p class=a>❧
<h2 id=divingin>Diving In</h2>
<p class=f><img src=i/aoc-t.png alt=T width=107 height=105>his chapter will take an <abbr>HTML</abbr> page that has absolutely nothing wrong with it, and improve it. Parts of it will become shorter. Parts will become longer. All of it will become more semantic. It’ll be awesome.
<p style="clear:both"><a href=examples/blog-original.html>Here is the page in question</a>. Learn it. Live it. Love it. Open it in a new tab and don’t come back until you’ve hit “View Source” at least once.
<p class=a>❧
<h2 id=the-doctype>The Doctype</h2>
<p>From the top:
<pre><code>&lt;!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;</code></pre>
<p>This is called the “doctype.” There’s a long history — and a black art — behind the doctype. While working on Internet Explorer 5 for Mac, the developers at Microsoft found themselves with a surprising problem. The upcoming version of their browser had improved its standards support so much, older pages no longer rendered properly. Or rather, they rendered properly (according to specifications), but people expected them to render <em>improperly</em>. The pages themselves had been authored based on the quirks of the dominant browsers of the day, primarily Netscape 4 and Internet Explorer 4. IE5/Mac was so advanced, it actually broke the web.
<p>Microsoft came up with a novel solution. Before rendering a page, IE5/Mac looked at the “doctype,” which is typically the first line of the <abbr>HTML</abbr> source (even before the <code>&lt;html&gt;</code> element). Older pages (that relied on the rendering quirks of older browsers) generally didn’t have a doctype at all. IE5/Mac rendered these pages like older browsers did. In order to “activate” the new standards support, web page authors had to opt in, by supplying the right doctype before the <code>&lt;html&gt;</code> element.
<p>This idea spread like wildfire, and soon all major browsers had two modes: “quirks mode” and “standards mode.” Of course, this being the web, things quickly got out of hand. When Mozilla tried to ship version 1.1 of their browser, they discovered that there were pages being rendered in “standards mode” that were actually relying on one specific quirk. Mozilla had just fixed its rendering engine to eliminate this quirk, and thousands of pages broke all at once. Thus was created — and I am not making this up — “<a href="https://developer.mozilla.org/en/Gecko's_%22Almost_Standards%22_Mode">almost standards mode</a>.”
<p>In his seminal work, <a href=http://hsivonen.iki.fi/doctype/>Activating Browser Modes with Doctype</a>, Henri Sivonen summarizes the different modes:
<blockquote>
<dl>
<dt>Quirks Mode
<dd>In the Quirks mode, browsers violate contemporary Web format specifications in order to avoid “breaking” pages authored according to practices that were prevalent in the late 1990s.
<dt>Standards Mode
<dd>In the Standards mode, browsers try to give conforming documents the specification-wise correct treatment to the extent implemented in a particular browser. <abbr>HTML5</abbr> calls this mode the “no quirks mode.”
<dt>Almost Standards Mode
<dd>Firefox, Safari, Chrome, Opera (since 7.5) and IE8 also have a mode known as “Almost Standards mode,” that implements the vertical sizing of table cells traditionally and not rigorously according to the CSS2 specification. <abbr>HTML5</abbr> calls this mode the “limited quirks mode.”
</dl>
</blockquote>
<p>(You should read the rest of Henri’s article, because I’m simplifying immensely here. Even in IE5/Mac, there were a few older doctypes that didn’t count as far as opting into standards support. Over time, the list of quirks grew, and so did the list of doctypes that triggered “quirks mode.” The last time I tried to count, there were 5 doctypes that triggered “almost standards mode,” and 73 that triggered “quirks mode.” But I probably missed some, and I’m not even going to talk about the crazy shit that Internet Explorer 8 does to switch between its four — four! — different rendering modes. <a href=http://hsivonen.iki.fi/doctype/ie8-mode.png>Here’s a flowchart</a>. Kill it. Kill it with fire.)
<p>Now then. Where were we? Ah yes, the doctype:
<pre><code>&lt;!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;</code></pre>
<p>That happens to be one of the 15 doctypes that trigger “standards mode” in all modern browsers. There is nothing wrong with it. If you like it, you can keep it. Or you can change it to the <abbr>HTML5</abbr> doctype, which is shorter and sweeter and also triggers “standards mode” in all modern browsers.
<p>This is the <abbr>HTML5</abbr> doctype:
<pre><code>&lt;!DOCTYPE html&gt;</code></pre>
<p>That’s it. Just 15 characters. It’s so easy, you can type it by hand and not screw it up.
<p class=a>❧
<h2 id=html-element>The Root Element</h2>
<p class=ss style="float:left;margin:0 1.75em 1.75em 0;width:280px"><img src=i/openclipart.org_johnny_automatic_tree_on_top_of_hill.png alt="tree on top of hill" width=280 height=394>
<p>An <abbr>HTML</abbr> page is a series of nested elements. The entire structure of the page is like a tree. Some elements are “siblings,” like two branches that extend from the same tree trunk. Some elements can be “children” of other elements, like a smaller branch that extends from a larger branch. (It works the other way too; an element that contains other elements is called the “parent” node of its immediate child elements, and the “ancestor” of its grandchildren.) Elements that have no children are called “leaf” nodes. The outer-most element, which is the ancestor of all other elements on the page, is called the “root element.” The root element of an <abbr>HTML</abbr> page is always <code>&lt;html&gt;</code>.
<p>In <a href=examples/blog-original.html>this example page</a>, the root element looks like this:
<table role=presentation><tr><td><pre><code>&lt;html xmlns="http://www.w3.org/1999/xhtml"
lang="en"
xml:lang="en"&gt;</code></pre></td></table>
<p>There is nothing wrong with this markup. Again, if you like it, you can keep it. It is valid <abbr>HTML5</abbr>. But parts of it are no longer necessary in <abbr>HTML5</abbr>, so you can save a few bytes by removing them.
<p>The first thing to discuss is the <code>xmlns</code> attribute. This is a vestige of <a href=http://www.w3.org/TR/xhtml1/><abbr>XHTML</abbr> 1.0</a>. It says that elements in this page are in the <abbr>XHTML</abbr> namespace, <code>http://www.w3.org/1999/xhtml</code>. But elements in <abbr>HTML5</abbr> <a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#xml>are always in this namespace</a>, so you no longer need to declare it explicitly. Your <abbr>HTML5</abbr> page will work exactly the same in all browsers, whether this attribute is present or not.
<p>Dropping the <code>xmlns</code> attribute leaves us with this root element:
<pre><code>&lt;html lang="en" xml:lang="en"&gt;</code></pre>
<p>The two attributes here, <code>lang</code> and <code>xml:lang</code>, both define the language of this <abbr>HTML</abbr> page. (<code>en</code> stands for “English.” Not writing in English? <a href=http://www.w3.org/International/questions/qa-choosing-language-tags>Find your language code</a>.) Why two attributes for the same thing? Again, this is a vestige of <abbr>XHTML</abbr>. Only the <code>lang</code> attribute has any effect in <abbr>HTML5</abbr>. You can keep the <code>xml:lang</code> attribute if you like, but if you do, you need to ensure that it <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-lang-and-xml:lang-attributes">contains the same value as the <code>lang</code> attribute</a>.
<blockquote cite="http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-lang-and-xml:lang-attributes">
<p>To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on <abbr>HTML</abbr> elements in <abbr>HTML</abbr> documents, but such attributes must only be specified if a <code>lang</code> attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner. The attribute in no namespace with no prefix and with the literal localname "xml:lang" has no effect on language processing.
</blockquote>
<p>Are you ready to drop it? It’s OK, just let it go. Going, going… gone! That leaves us with this root element:
<pre><code>&lt;html lang="en"&gt;</code></pre>
<p>And that’s all I have to say about that.
<p class=a>❧
<h2 id=head-element>The &lt;head&gt; Element</h2>
<p class=c><img src=i/openclipart.org_johnny_automatic_8_from_behind.png alt="8 men from behind" width=554 height=164>
<p>The first child of the root element is usually the <code>&lt;head&gt;</code> element. The <code>&lt;head&gt;</code> element contains metadata — information <em>about</em> the page, rather than the body of the page itself. (The body of the page is, unsurprisingly, contained in the <code>&lt;body&gt;</code> element.) The <code>&lt;head&gt;</code> element itself is rather boring, and it hasn’t changed in any interesting way in <abbr>HTML5</abbr>. The good stuff is what’s <em>inside</em> the <code>&lt;head&gt;</code> element. And for that, we turn once again to <a href=examples/blog-original.html>our example page</a>:
<pre><code>&lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8" /&gt;
&lt;title&gt;My Weblog&lt;/title&gt;
&lt;link rel="stylesheet" type="text/css" href="style-original.css" /&gt;
&lt;link rel="alternate" type="application/atom+xml"
title="My Weblog feed"
href="/feed/" /&gt;
&lt;link rel="search" type="application/opensearchdescription+xml"
title="My Weblog search"
href="opensearch.xml" /&gt;
&lt;link rel="shortcut icon" href="/favicon.ico" /&gt;
&lt;/head&gt;</code></pre>
<p>First up: the <code>&lt;meta&gt;</code> element.
<p class=a>❧
<h2 id=encoding>Character Encoding</h2>
<p>When you think of “text,” you probably think of “characters and symbols I see on my computer screen.” But computers don’t deal in characters and symbols; they deal in bits and bytes. Every piece of text you’ve ever seen on a computer screen is actually stored in a particular <em>character encoding</em>. There are <a href=http://www.iana.org/assignments/character-sets>hundreds of different character encodings</a>, some optimized for particular languages like Russian or Chinese or English, and others that can be used for multiple languages. Roughly speaking, the character encoding provides a mapping between the stuff you see on your screen and the stuff your computer actually stores in memory and on disk.
<p>In reality, it’s more complicated than that. The same character might appear in more than one encoding, but each encoding might use a different sequence of bytes to actually store the character in memory or on disk. So, you can think of the character encoding as a kind of decryption key for the text. Whenever someone gives you a sequence of bytes and claims it’s “text,” you need to know what character encoding they used so you can decode the bytes into characters and display them (or process them, or whatever).
<p>So, how does your browser actually determine the character encoding of the stream of bytes that a web server sends? I’m glad you asked. If you’re familiar with <abbr>HTTP</abbr> headers, you may have seen a header like this:
<blockquote><p><code>Content-Type: text/html; charset="utf-8"</code></blockquote>
<p>Briefly, this says that the web server thinks it’s sending you an <abbr>HTML</abbr> document, and that it thinks the document uses the <code>UTF-8</code> character encoding. Unfortunately, in the whole magnificent soup of the World Wide Web, few authors actually have control over their HTTP server. Think <a href="http://www.blogger.com/">Blogger</a>: the content is provided by individuals, but the servers are run by Google. So <abbr>HTML</abbr> 4 provided a way to specify the character encoding in the <abbr>HTML</abbr> document itself. You’ve probably seen this too:
<blockquote><p><code>&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;</code></blockquote>
<p>Briefly, this says that the web author thinks they have authored an <abbr>HTML</abbr> document using the <code>UTF-8</code> character encoding.
<p>Both of these techniques still work in <abbr>HTML5</abbr>. The <abbr>HTTP</abbr> header is the preferred method, and it overrides the <code>&lt;meta&gt;</code> tag if present. But not everyone can set <abbr>HTTP</abbr> headers, so the <code>&lt;meta&gt;</code> tag is still around. In fact, it got a little easier in <abbr>HTML5</abbr>. Now it looks like this:
<blockquote><p><code>&lt;meta charset="utf-8" /&gt;</code></blockquote>
<p>This works in all browsers. How did this shortened syntax come about? Here is <a href="http://lists.w3.org/Archives/Public/public-html/2007Jul/0550.html">the best explanation I could find</a>:
<blockquote cite="http://lists.w3.org/Archives/Public/public-html/2007Jul/0550.html">
<p>The rationale for the <code>&lt;meta charset=""&gt;</code> attribute combination is that UAs already implement it, because people tend to leave things unquoted, like:
<p><code>&lt;META HTTP-EQUIV=Content-Type CONTENT=text/html; charset=ISO-8859-1&gt;</code>
</blockquote>
<p>There are even a few <a href="http://simon.html5.org/test/html/parsing/encoding/"><code>&lt;meta charset&gt;</code> test cases</a> if you don’t believe that browsers already do this.
<div class=pf>
<h4>Ask Professor Markup</h4>
<div class=inner>
<blockquote class=note>
<p><span>☞</span>Q: I never use funny characters. Do I still need to declare my character encoding?<br>
<p>A: Yes! You should <em>always</em> specify a character encoding on every <abbr>HTML</abbr> page you serve. Not specifying an encoding <a href=http://openmya.hacker.jp/hasegawa/security/utf7cs.html>can lead to security vulnerabilities</a>.
</blockquote>
</div>
</div>
<p>To sum up: character encoding is complicated, and it has not been made any easier by decades of poorly written software used by copy-and-paste–educated authors. You should <strong>always</strong> specify a character encoding on <strong>every</strong> <abbr>HTML</abbr> document, or <a href=http://openmya.hacker.jp/hasegawa/security/utf7cs.html>bad things will happen</a>. You can do it with the <abbr>HTTP</abbr> <code>Content-Type</code> header, the <code>&lt;meta http-equiv&gt;</code> declaration, or the shorter <code>&lt;meta charset&gt;</code> declaration, but please do it. The web thanks you.
<p class=a>❧
<h2 id=link>Friends &amp; (Link) Relations</h2>
<p>Regular links (<code>&lt;a href&gt;</code>) simply point to another page. Link relations are a way to explain <em>why</em> you’re pointing to another page. They finish the sentence “I’m pointing to this other page because...”
<ul>
<li>...it’s a stylesheet containing CSS rules that your browser should apply to this document.
<li>...it’s a feed that contains the same content as this page, but in a standard subscribable format.
<li>...it’s a translation of this page into another language.
<li>...it’s the same content as this page, but in <abbr>PDF</abbr> format.
<li>...it’s the next chapter of an online book of which this page is also a part.
</ul>
<p>And so on. <abbr>HTML5</abbr> breaks link relations <a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#attr-link-rel>into two categories</a>:
<blockquote cite=http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#attr-link-rel>
<p>Two categories of links can be created using the link element. <b>Links to external resources</b> are links to resources that are to be used to augment the current document, and <b>hyperlink links</b> are links to other documents. ...
<p>The exact behavior for links to external resources depends on the exact relationship, as defined for the relevant link type.
</blockquote>
<p>Of the examples I just gave, only the first (<code>rel="stylesheet"</code>) is a link to an external resource. The rest are hyperlinks to other documents. You may wish to follow those links, or you may not, but they’re not required in order to view the current page.
<p>Most often, link relations are seen on <code>&lt;link&gt;</code> elements within the <code>&lt;head&gt;</code> of a page. Some link relations can also be used on <code>&lt;a&gt;</code> elements, but this is uncommon even when allowed. <abbr>HTML5</abbr> also allows some relations on <code>&lt;area&gt;</code> elements, but this is even <em>less</em> common. (<abbr>HTML</abbr> 4 did not allow a <code>rel</code> attribute on <code>&lt;area&gt;</code> elements.) See <a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#linkTypes>the full chart of link relations</a> to check where you can use specific <code>rel</code> values.
<div class=pf>
<h4>Ask Professor Markup</h4>
<div class=inner>
<blockquote class=note>
<p><span>☞</span>Q: Can I make up my own link relations?<br>
<p>A: There seems to be an infinite supply of ideas for new link relations. In an attempt to prevent people from <a href=http://developer.apple.com/safari/library/documentation/AppleApplications/Reference/SafariWebContent/ConfiguringWebApplications/ConfiguringWebApplications.html#//apple_ref/doc/uid/TP40002051-CH3-SW4>just making shit up</a>, the WHATWG <a href=http://wiki.whatwg.org/wiki/RelExtensions>maintains a registry of proposed <code>rel</code> values</a> and <a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#other-link-types>defines the process for getting them accepted</a>.
</blockquote>
</div>
</div>
<h3 id=rel-stylesheet>rel = stylesheet</h3>
<p>Let’s look at the first link relation in <a href=examples/blog-original.html>our example page</a>:
<pre><code>&lt;link rel="stylesheet" href="style-original.css" type="text/css" /&gt;</code></pre>
<p>This is the most frequently used link relation in the world (literally). <code>&lt;link rel="stylesheet"&gt;</code> is for pointing to <abbr>CSS</abbr> rules that are stored in a separate file. One small optimization you can make in <abbr>HTML5</abbr> is to drop the <code>type</code> attribute. There’s only one stylesheet language for the web, <abbr>CSS</abbr>, so that’s the default value for the <code>type</code> attribute. This works in all browsers. (I suppose someone could invent a new stylesheet language someday, but if that happens, just add the <code>type</code> attribute back.)
<pre><code>&lt;link rel="stylesheet" href="style-original.css" /&gt;</code></pre>
<h3 id=rel-alternate>rel = alternate</h3>
<p>Continuing with <a href=examples/blog-original.html>our example page</a>:
<pre><code>&lt;link rel="alternate"
type="application/atom+xml"
title="My Weblog feed"
href="/feed/" /&gt;</code></pre>
<p>This link relation is also quite common. <code>&lt;link rel="alternate"&gt;</code>, combined with either the <abbr>RSS</abbr> or Atom media type in the <code>type</code> attribute, enables something called “feed autodiscovery.” It allows syndicated feed readers (like <a href="http://www.google.com/reader/">Google Reader</a>) to discover that a site has a news feed of the latest articles. Most browsers also support feed autodiscovery by displaying a special icon next to the <abbr>URL</abbr>. (Unlike with <code>rel="stylesheet"</code>, the <code>type</code> attribute matters here. Don’t drop it!)
<p>The <code>rel="alternate"</code> link relation has always been a strange hybrid of use cases, <a href=http://www.w3.org/TR/html401/types.html#type-links>even in <abbr>HTML</abbr> 4</a>. In <abbr>HTML5</abbr>, its definition has been clarified and extended to more accurately describe existing web content. As you just saw, using <code>rel="alternate"</code> in conjunction with <code>type="application/atom+xml"</code> indicates an Atom feed for the current page. But you can also use <code>rel="alternate"</code> in conjunction with other <code>type</code> attributes to indicate the same content in another format, like <abbr>PDF</abbr>.
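<p>As a minimal sketch of that second use, a page could point to a <abbr>PDF</abbr> copy of itself along these lines. The <code>href</code> and <code>title</code> values are placeholders for illustration; they are not part of the example page:

```html
<!-- Hypothetical URL: the same content as this page, in PDF format -->
<link rel="alternate"
      type="application/pdf"
      title="My Weblog (PDF)"
      href="/archive/weblog.pdf">
```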
<p><abbr>HTML5</abbr> also puts to rest a long-standing confusion about how to link to translations of documents. <abbr>HTML</abbr> 4 says to use the <code>lang</code> attribute in conjunction with <code>rel="alternate"</code> to specify the language of the linked document, but this is incorrect. The <a href=http://www.w3.org/MarkUp/html4-updates/errata><abbr>HTML</abbr> 4 Errata</a> document lists four outright errors in the <abbr>HTML</abbr> 4 specification. One of these outright errors is how to specify the language of a document linked with <code>rel="alternate"</code>. The correct way, described in the <abbr>HTML</abbr> 4 Errata and now in <abbr>HTML5</abbr>, is to use the <code>hreflang</code> attribute. Unfortunately, these errata were never re-integrated into the <abbr>HTML</abbr> 4 spec, because no one in the W3C <abbr>HTML</abbr> Working Group was working on <abbr>HTML</abbr> anymore.
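<p>For instance, links to French and German translations might look something like the following sketch. The <abbr>URL</abbr>s and titles are made up for illustration. The point is that <code>hreflang</code> describes the language of the linked document, while <code>lang</code> on the root element describes the current one:

```html
<!-- Hypothetical URLs; hreflang names the language of the *linked* document -->
<link rel="alternate" hreflang="fr" href="/fr/" title="Version française">
<link rel="alternate" hreflang="de" href="/de/" title="Deutsche Version">
```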
<h3 id=new-relations>Other Link Relations in HTML5</h3>
<p><code>rel="author"</code> is used to link to information about the author of the page. This can be a <code>mailto:</code> address, though it doesn’t have to be. It could simply link to a contact form or “about the author” page.
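<p>Either form might look like the sketch below. Both addresses are placeholders, and whether you use a <code>&lt;link&gt;</code> in the head or a visible <code>&lt;a&gt;</code> in the body is up to you:

```html
<!-- Hypothetical addresses; either form fits the description above -->
<link rel="author" href="mailto:author@example.com">
<a rel="author" href="/about/">About the author</a>
```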
<p><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-external>rel="external"</a> “indicates that the link is leading to a document that is not part of the site that the current document forms a part of.” I believe it was first popularized by <a href=http://www.wordpress.org/>WordPress</a>, which uses it on links left by commenters.
<p class=ss style="width:313px"><img src=i/openclipart.org_johnny_automatic_girl_feeding_birds.png width=313 height=384 alt="girl feeding birds">
<p>HTML 4 defined <a href=http://www.w3.org/TR/html401/types.html#type-links><code>rel="start"</code>, <code>rel="prev"</code>, and <code>rel="next"</code></a> to define relations between pages that are part of a series (like chapters of a book, or even posts on a blog). The only one that was ever used correctly was <code>rel="next"</code>. People used <code>rel="previous"</code> instead of <code>rel="prev"</code>; they used <code>rel="begin"</code> and <code>rel="first"</code> instead of <code>rel="start"</code>; they used <code>rel="end"</code> instead of <code>rel="last"</code>. Oh, and — all by themselves — they made up <code>rel="up"</code> to point to a “parent” page.
<p><abbr>HTML5</abbr> includes <code>rel="first"</code>, which was the most common variation of the different ways to say “first page in a series.” (<code>rel="start"</code> is a non-conforming synonym, provided for backward compatibility.) It also includes <code>rel="prev"</code> and <code>rel="next"</code>, just like <abbr>HTML</abbr> 4, and supports <code>rel="previous"</code> for backward compatibility, as well as <code>rel="last"</code> (the last in a series, mirroring <code>rel="first"</code>) and <code>rel="up"</code>.
<p>The best way to think of <code>rel="up"</code> is to look at your breadcrumb navigation (or at least imagine it). Your home page is probably the first page in your breadcrumbs, and the current page is at the tail end. <code>rel="up"</code> points to the next-to-last page in the breadcrumbs.
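<p>To make the sequence relations concrete, here is a rough sketch for the third chapter of an imaginary five-chapter online book. Every <abbr>URL</abbr> is hypothetical, and the breadcrumb trail is assumed to be Home, then Book, then Chapter 3:

```html
<!-- Hypothetical URLs for chapter 3 of a five-chapter online book -->
<link rel="first" href="/book/chapter-1.html">
<link rel="prev"  href="/book/chapter-2.html">
<link rel="next"  href="/book/chapter-4.html">
<link rel="last"  href="/book/chapter-5.html">
<!-- Breadcrumbs: Home, Book, Chapter 3; "up" is the next-to-last entry -->
<link rel="up"    href="/book/">
```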
<p><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#rel-icon>rel="icon"</a> is the <a href=http://code.google.com/webstats/2005-12/linkrels.html>second most popular link relation</a>, after <code>rel="stylesheet"</code>. It is usually found together with <code>shortcut</code>, like so:
<pre><code>&lt;link rel="shortcut icon" href="/favicon.ico"&gt;</code></pre>
<p>All major browsers support this usage to associate a small icon with the page. Usually it’s displayed in the browser’s location bar next to the URL, or in the browser tab, or both.
<p>Also new in <abbr>HTML5</abbr>: the <code>sizes</code> attribute can be used in conjunction with the <code>icon</code> relationship to <a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#rel-icon>indicate the size of the referenced icon</a>.
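<p>A sketch of what that could look like, assuming you have the same icon in two pixel sizes. The file names are invented for this example:

```html
<!-- Hypothetical file names; each sizes value is WIDTHxHEIGHT in pixels -->
<link rel="icon" sizes="16x16" href="/favicon-16.png">
<link rel="icon" sizes="32x32" href="/favicon-32.png">
```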
<p><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-license>rel="license"</a> was <a href=http://microformats.org/wiki/rel-license>invented by the microformats community</a>. It “indicates that the referenced document provides the copyright license terms under which the current document is provided.”
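<p>In practice this is typically a visible link in a page footer. A minimal sketch, where the choice of license is only an example:

```html
<!-- The license chosen here is only an example; link to whichever one applies -->
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Some rights reserved</a>
```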
<p><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-nofollow>rel="nofollow"</a> “indicates that the link is not endorsed by the original author or publisher of the page, or that the link to the referenced document was included primarily because of a commercial relationship between people affiliated with the two pages.” It was <a href=http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html>invented by Google</a> and <a href=http://microformats.org/wiki/rel-nofollow>standardized within the microformats community</a>. <a href=http://www.wordpress.org>WordPress</a> adds <code>rel="nofollow"</code> to links added by commenters. The thinking was that if “nofollow” links did not pass on PageRank, spammers would give up trying to post spam comments on weblogs. That didn’t happen, but <code>rel="nofollow"</code> persists.
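<p>Because <code>rel</code> takes a space-separated list of values, a commenter’s link can carry both <code>external</code> and <code>nofollow</code> at once. The markup below is a simplified sketch with a placeholder name and <abbr>URL</abbr>, not the exact output of any particular blogging system:

```html
<!-- Hypothetical comment byline; the commenter's URL is a placeholder -->
<p>Comment by <a rel="external nofollow" href="http://example.com/">A. Commenter</a>
```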
<p><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-noreferrer>rel="noreferrer"</a> “indicates that no referrer information is to be leaked when following the link.” No shipping browser currently supports this, but support <a href=http://webkit.org/blog/907/webkit-nightlies-support-html5-noreferrer-link-relation/>was recently added to WebKit nightlies</a>, so it will eventually be showing up in Safari, Google Chrome, and other WebKit-based browsers. [<a href=http://wearehugh.com/public/2009/04/rel-noreferrer.html>rel="noreferrer" test case</a>]
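<p>Usage is just another <code>rel</code> value on an ordinary link. A sketch with a placeholder destination:

```html
<!-- Hypothetical destination; a supporting browser omits the Referer header -->
<a rel="noreferrer" href="http://example.com/">a link that does not reveal where you came from</a>
```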
<p><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-pingback>rel="pingback"</a> specifies the address of a “pingback” server. As explained in <a href=http://hixie.ch/specs/pingback/pingback-1.0>the Pingback specification</a>, “The pingback system is a way for a blog to be automatically notified when other Web sites link to it. ... It enables reverse linking — a way of going back up a chain of links rather than merely drilling down.” Blogging systems, notably WordPress, implement the pingback mechanism to notify authors that you have linked to them when creating a new blog post.
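<p>In markup, advertising a pingback server is a one-liner in the head of each post. The address below is an assumption for illustration; your blogging system decides the real endpoint:

```html
<!-- Hypothetical endpoint; use whatever address your blogging system exposes -->
<link rel="pingback" href="http://example.com/xmlrpc.php">
```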
<p class=ss style="float:left;margin:0 1.75em 1.75em 0;width:271px"><img src=i/openclipart.org_johnny_automatic_dog_on_chair.png width=271 height=309 alt="dog on chair">
<p><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-prefetch>rel="prefetch"</a> “indicates that preemptively fetching and caching the specified resource is likely to be beneficial, as it is highly likely that the user will require this resource.” Search engines sometimes add <code>&lt;link rel="prefetch" href="<i>URL of top search result</i>"&gt;</code> to the search results page if they feel that the top result is wildly more popular than any other. For example: using Firefox, <a href="http://www.google.com/search?q=cnn">search Google for CNN</a>, view the page source, and search for the keyword <code>prefetch</code>. Mozilla Firefox is the only current browser that supports <code>rel="prefetch"</code>.
<p><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-search>rel="search"</a> “indicates that the referenced document provides an interface specifically for searching the document and its related resources.” Specifically, if you want <code>rel="search"</code> to do anything useful, it should point to an <a href=http://www.opensearch.org/>OpenSearch</a> document that describes how a browser could construct a URL to search the current site for a given keyword. OpenSearch (and <code>rel="search"</code> links that point to OpenSearch description documents) has been supported in Microsoft Internet Explorer since version 7 and Mozilla Firefox since version 2.
<p><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-sidebar>rel="sidebar"</a> “indicates that the referenced document, if retrieved, is intended to be shown in a secondary browsing context (if possible), instead of in the current browsing context.” What does that mean? In Opera and Mozilla Firefox, it means “when I click this link, prompt the user to create a bookmark that, when selected from the Bookmarks menu, opens the linked document in a browser sidebar.” (Opera actually calls it the “panel” instead of the “sidebar.”) Internet Explorer, Safari, and Chrome ignore <code>rel="sidebar"</code> and just treat it as a regular link. [<a href=http://wearehugh.com/public/2009/04/rel-sidebar.html>rel="sidebar" test case</a>]
<p><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-tag>rel="tag"</a> “indicates that the tag that the referenced document represents applies to the current document.” Marking up “tags” (category keywords) with the <code>rel</code> attribute was <a href=http://www.powazek.com/2005/07/000532.html>invented by Technorati</a> to help them categorize blog posts. Early blogs and tutorials thus referred to them as “Technorati tags.” (You read that right: a commercial company convinced the entire world to add metadata that made the company’s job easier. Nice work if you can get it!) The syntax was later <a href=http://microformats.org/wiki/rel-tag>standardized within the microformats community</a>, where it was simply called <code>rel="tag"</code>. Most blogging systems that allow associating categories, keywords, or tags with individual posts will mark them up with <code>rel="tag"</code> links. Browsers do not do anything special with them; they’re really designed for search engines to use as a signal of what the page is about.
<p class=a>❧
<h2 id=new-elements>New Semantic Elements in HTML5</h2>
<p><abbr>HTML5</abbr> is not just about making existing markup shorter (although it does a fair amount of that). It also defines new semantic elements.
<dl class=col>
<dt><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-section-element><code><section></code></a>
<dd>The <code>section</code> element represents a generic document or application section. A section, in this context, is a thematic grouping of content, typically with a heading. Examples of sections would be chapters, the tabbed pages in a tabbed dialog box, or the numbered sections of a thesis. A Web site's home page could be split into sections for an introduction, news items, and contact information.
<dt><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-nav-element><code><nav></code></a>
<dd>The <code>nav</code> element represents a section of a page that links to other pages or to parts within the page: a section with navigation links. Not all groups of links on a page need to be in a <code>nav</code> element — only sections that consist of major navigation blocks are appropriate for the <code>nav</code> element. In particular, it is common for footers to have a short list of links to common pages of a site, such as the terms of service, the home page, and a copyright page. The <code>footer</code> element alone is sufficient for such cases, without a <code>nav</code> element.
<dt><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-article-element><code><article></code></a>
<dd>The <code>article</code> element represents a component of a page that consists of a self-contained composition in a document, page, application, or site and that is intended to be independently distributable or reusable, e.g. in syndication. This could be a forum post, a magazine or newspaper article, a Web log entry, a user-submitted comment, an interactive widget or gadget, or any other independent item of content.
<dt><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-aside-element><code><aside></code></a>
<dd>The <code>aside</code> element represents a section of a page that consists of content that is tangentially related to the content around the <code>aside</code> element, and which could be considered separate from that content. Such sections are often represented as sidebars in printed typography. The element can be used for typographical effects like pull quotes or sidebars, for advertising, for groups of <code>nav</code> elements, and for other content that is considered separate from the main content of the page.
<dt><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-hgroup-element><code><hgroup></code></a>
<dd>The <code>hgroup</code> element represents the heading of a section. The element is used to group a set of <code>h1</code>–<code>h6</code> elements when the heading has multiple levels, such as subheadings, alternative titles, or taglines.
<dt><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-header-element><code><header></code></a>
<dd>The <code>header</code> element represents a group of introductory or navigational aids. A <code>header</code> element is intended to usually contain the section’s heading (an <code>h1</code>–<code>h6</code> element or an <code>hgroup</code> element), but this is not required. The <code>header</code> element can also be used to wrap a section’s table of contents, a search form, or any relevant logos.
<dt><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-footer-element><code><footer></code></a>
<dd>The <code>footer</code> element represents a footer for its nearest ancestor sectioning content or sectioning root element. A footer typically contains information about its section such as who wrote it, links to related documents, copyright data, and the like. Footers don’t necessarily have to appear at the end of a section, though they usually do. When the <code>footer</code> element contains entire sections, they represent appendices, indexes, long colophons, verbose license agreements, and other such content.
<dt><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/text-level-semantics.html#the-time-element><code><time></code></a>
<dd>The <code>time</code> element represents either a time on a 24 hour clock, or a precise date in the proleptic Gregorian calendar, optionally with a time and a time-zone offset.
<dt><a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/text-level-semantics.html#the-mark-element><code><mark></code></a>
<dd>The <code>mark</code> element represents a run of text in one document marked or highlighted for reference purposes.
</dl>
<p>I know you’re anxious to start using these new elements, otherwise you wouldn’t be reading this chapter. But first we need to take a little detour.
<p class=a>❧
<h2 id=unknown-elements>A long digression into how browsers handle unknown elements</h2>
<p>Every browser has a master list of <abbr>HTML</abbr> elements that it supports. For example, Mozilla Firefox’s list is stored in <a href="http://mxr.mozilla.org/seamonkey/source/parser/htmlparser/src/nsElementTable.cpp">nsElementTable.cpp</a>. Elements not in this list are treated as “unknown elements.” There are two fundamental problems with unknown elements:
<ol>
<li><b>How should the element be styled?</b> By default, <code><p></code> has spacing on the top and bottom, <code><blockquote></code> is indented with a left margin, and <code><h1></code> is displayed in a larger font. But what default styles should be applied to unknown elements?</li>
<li><b>What should the element’s DOM look like?</b> Mozilla’s <code>nsElementTable.cpp</code> includes information about what kinds of other elements each element can contain. If you include markup like <code><p><p></code>, the second paragraph element implicitly closes the first one, so the elements end up as siblings, not parent-and-child. But if you write <code><p><span></code>, the <code>span</code> does not close the paragraph, because Firefox knows that <code><p></code> is a block element that can contain the inline element <code><span></code>. So, the <code><span></code> ends up as a child of the <code><p></code> in the DOM.</li>
</ol>
<p>Different browsers answer these questions in different ways. (Shocking, I know.) Of the major browsers, Microsoft Internet Explorer’s answer to both questions is the most problematic, but every browser needs a little bit of help here.
<p>The first question should be relatively simple to answer: don’t give any special styling to unknown elements. Just let them inherit whatever CSS properties are in effect wherever they appear on the page, and let the page author specify all styling with CSS. And that works, mostly, but there’s one little gotcha you need to be aware of.
<div class=pf>
<h4>Professor Markup Says</h4>
<div class=inner>
<blockquote>
<p>All browsers render unknown elements inline, <i>i.e.</i> as if they had a <code>display:inline</code> <abbr>CSS</abbr> rule.
</blockquote>
</div>
</div>
<p>There are several new elements defined in <abbr>HTML5</abbr> which are block-level elements. That is, they can contain other block-level elements, and <abbr>HTML5</abbr>-compliant browsers will style them as <code>display:block</code> by default. If you want to use these elements in older browsers, you will need to define the display style manually:
<pre><code>article,aside,details,figcaption,figure,
footer,header,hgroup,menu,nav,section {
display:block;
}</code></pre>
<p>(This code is lifted from Rich Clark’s <a href=http://html5doctor.com/html-5-reset-stylesheet/><abbr>HTML5</abbr> Reset Stylesheet</a>, which does many other things that are beyond the scope of this chapter.)
<p>But wait, it gets worse! Prior to version 9, Internet Explorer did not apply <em>any</em> styling to unknown elements. For example, if you had this markup:
<pre><code><style type="text/css">
article { display: block; border: 1px solid red }
</style>
...
<article>
<h1>Welcome to Initech</h1>
<p>This is your <span>first day</span>.</p>
</article></code></pre>
<p>Internet Explorer (up to and including IE 8) will not treat the <code><article></code> element as a block-level element, nor will it put a red border around the article. All the style rules are simply ignored. <a href=http://msdn.microsoft.com/en-us/ie/ff468705.aspx#_HTML_Parsing>Internet Explorer 9 fixes this problem</a>.
<p>The second problem is the DOM that browsers create when they encounter unknown elements. Again, the most problematic browsers are older versions of Internet Explorer (before version 9, <a href=http://msdn.microsoft.com/en-us/ie/ff468705.aspx#_HTML_Parsing>which fixes this problem too</a>). If IE 8 doesn’t explicitly recognize the element name, it will insert the element into the DOM <em>as an empty node with no children</em>. All the elements that you would expect to be direct children of the unknown element will actually be inserted as siblings instead.
<p>Here is some righteous <abbr>ASCII</abbr> art to illustrate the difference. This is the DOM that <abbr>HTML5</abbr> dictates:
<pre>article
|
+--h1 (child of article)
| |
| +--text node "Welcome to Initech"
|
+--p (child of article, sibling of h1)
|
+--text node "This is your "
|
+--span
| |
| +--text node "first day"
|
+--text node "."</pre>
<p>But this is the DOM that Internet Explorer actually creates:
<pre>article (no children)
h1 (sibling of article)
|
+--text node "Welcome to Initech"
p (sibling of h1)
|
+--text node "This is your "
|
+--span
| |
| +--text node "first day"
|
+--text node "."</pre>
<p>There is a wondrous workaround for this problem. If you <a href=http://xopus.com/devblog/2008/style-unknown-elements.html>create a dummy <code><article></code> element</a> with JavaScript before you use it in your page, Internet Explorer will magically recognize the <code><article></code> element and let you style it with CSS. There is no need to ever insert the dummy element into the <abbr>DOM</abbr>. Simply creating the element once (per page) is enough to teach IE to style the element it doesn’t recognize.
<pre><code><html>
<head>
<style>
article { display: block; border: 1px solid red }
</style>
<mark><script>document.createElement("article");</script></mark>
</head>
<body>
<article>
<h1>Welcome to Initech</h1>
<p>This is your <span>first day</span>.</p>
</article>
</body>
</html></code></pre>
<p>This works in all versions of Internet Explorer, all the way back to IE 6! We can extend this technique to create dummy copies of all the new <abbr>HTML5</abbr> elements at once — again, they’re never inserted into the <abbr>DOM</abbr>, so you’ll never see these dummy elements — and then just start using them without having to worry too much about non-HTML5-capable browsers.
<p>Remy Sharp has done just that, with his aptly named <a href=http://remysharp.com/2009/01/07/html5-enabling-script/><abbr>HTML5</abbr> enabling script</a>. The script has gone through more than a dozen revisions since I started writing this book, but this is the basic idea:
<pre><code><!--[if lt IE 9]>
<script>
var e = ("abbr,article,aside,audio,canvas,datalist,details," +
"figure,footer,header,hgroup,mark,menu,meter,nav,output," +
"progress,section,time,video").split(',');
for (var i = 0; i < e.length; i++) {
document.createElement(e[i]);
}
</script>
<![endif]--></code></pre>
<p>The <code><!--[if lt IE 9]></code> and <code><![endif]--></code> bits are <a href="http://msdn.microsoft.com/en-us/library/ms537512(VS.85).aspx">conditional comments</a>. Internet Explorer interprets them like an <code>if</code> statement: “if the current browser is a version of Internet Explorer less than version 9, then execute this block.” Every other browser will treat the entire block as an <abbr>HTML</abbr> comment. The net result is that Internet Explorer (up to and including version 8) will execute this script, but other browsers will ignore the script altogether. This makes your page load faster in browsers that don’t need this hack.
<p>The JavaScript code itself is relatively straightforward. The variable <var>e</var> ends up as an array of strings like <code>"abbr"</code>, <code>"article"</code>, <code>"aside"</code>, and so on. Then we loop through this array and create each of the named elements by calling <code>document.createElement()</code>. Since we ignore the return value, the elements are never inserted into the <abbr>DOM</abbr>. Still, merely creating them is enough to get Internet Explorer to treat these elements the way we want them to be treated, once we actually use them later in the page.
<p>That “later” bit is important. This script needs to be at the top of your page, preferably in your <code><head></code> element, not at the bottom. That way, Internet Explorer will execute the script <em>before</em> it parses your tags and attributes. If you put this script at the bottom of your page, it will be too late. Internet Explorer will have already misinterpreted your markup and constructed the wrong <abbr>DOM</abbr>, and it won’t go back and adjust it just because of this script.
<p>Remy Sharp has “minified” this script and <a href=http://code.google.com/p/html5shiv/>hosted it on Google Project Hosting</a>. (In case you were wondering, the script itself is open source and MIT-licensed, so you can use it in any project.) If you like, you can even “hotlink” the script by pointing directly to the hosted version, like this:
<pre><code><head>
<meta charset="utf-8" />
<title>My Weblog</title>
<!--[if lt IE 9]>
<script <mark>src="<a href=http://html5shiv.googlecode.com/svn/trunk/html5.js>http://html5shiv.googlecode.com/svn/trunk/html5.js</a>"</mark>></script>
<![endif]-->
</head></code></pre>
<p>Now we’re ready to start using the new semantic elements in <abbr>HTML5</abbr>.
<p class=a>❧
<h2 id=header-element>Headers</h2>
<p class=ss style="width:205px"><img src=i/openclipart.org_johnny_automatic_newsboy.png alt="newsboy hawking newspaper" width=205 height=335>
<p>Let’s go back to <a href=examples/blog-original.html>our example page</a>. Specifically, let’s look at just the headers:
<pre><code><div id="header">
<h1>My Weblog</h1>
<p class="tagline">A lot of effort went into making this effortless.</p>
</div>
…
<div class="entry">
<h2>Travel day</h2>
</div>
…
<div class="entry">
<h2>I'm going to Prague!</h2>
</div></code></pre>
<p>There is nothing wrong with this markup. If you like it, you can keep it. It is valid <abbr>HTML5</abbr>. But <abbr>HTML5</abbr> provides some additional semantic elements for headers and sections.
<p>First off, let’s get rid of that <code><div id="header"></code>. This is a common pattern, but it doesn’t mean anything. The <code>div</code> element has no defined semantics, and the <code>id</code> attribute has no defined semantics. (User agents are not allowed to infer any meaning from the value of the <code>id</code> attribute.) You could change this to <code><div id="shazbot"></code> and it would have the same semantic value, <i>i.e.</i>, nothing.
<p><abbr>HTML5</abbr> defines a <code><header></code> element for this purpose. The <abbr>HTML5</abbr> specification has <a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-header-element>real-world examples of using the <code><header></code> element</a>. Here is what it would look like on <a href=examples/blog-original.html>our example page</a>:
<pre><code><header>
<h1>My Weblog</h1>
<p class="tagline">A lot of effort went into making this effortless.</p>
…
</header></code></pre>
<p>That’s good. It tells anyone who wants to know that this is a header. But what about that tagline? Another common pattern, which up until now had no standard markup. It’s a difficult thing to mark up. A tagline is like a subheading, but it’s “attached” to the primary heading. That is, it’s a subheading that doesn’t create its own section.
<p>Heading elements like <code><h1></code> and <code><h2></code> give your page structure. Taken together, they create an outline that you can use to visualize (or navigate) your page. Screenreaders use document outlines to help blind users navigate through your page. There are <a href="http://gsnedders.html5.org/outliner/">online tools</a> and <a href=http://chrispederick.com/work/web-developer/>browser extensions</a> that can help you visualize your document’s outline.
<p>In <abbr>HTML</abbr> 4, <code><h1></code>–<code><h6></code> elements were the <em>only</em> way to create a document outline. The outline on the example page looks like this:
<pre>My Weblog (h1)
|
+--Travel day (h2)
|
+--I'm going to Prague! (h2)</pre>
<p>That’s fine, but it means that there’s no way to mark up the tagline “A lot of effort went into making this effortless.” If we tried to mark it up as an <code><h2></code>, it would add a phantom node to the document outline:
<pre>My Weblog (h1)
|
+--A lot of effort went into making this effortless. (h2)
|
+--Travel day (h2)
|
+--I'm going to Prague! (h2)</pre>
<p>But that’s not the structure of the document. The tagline does not represent a section; it’s just a subheading.
<p>Perhaps we could mark up the tagline as an <code><h2></code> and mark up each article title as an <code><h3></code>? No, that’s even worse:
<pre>My Weblog (h1)
|
+--A lot of effort went into making this effortless. (h2)
|
+--Travel day (h3)
|
+--I'm going to Prague! (h3)</pre>
<p>Now we still have a phantom node in our document outline, but it has “stolen” the children that rightfully belong to the root node. And herein lies the problem: <abbr>HTML</abbr> 4 does not provide a way to mark up a subheading without adding it to the document outline. No matter how we try to shift things around, “A lot of effort went into making this effortless” is going to end up in that graph. And that’s why we ended up with semantically meaningless markup like <code><p class="tagline"></code>.
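<p>If it helps to see the rule spelled out, here is a rough JavaScript sketch of how an <abbr>HTML</abbr> 4-style outline falls out of heading levels alone. The function names and the data format are my own invention for illustration. This is not code from any browser or outliner tool, just the “nest each heading under the nearest earlier heading of a higher rank” idea.

```javascript
// Build a nested outline from a flat list of headings, HTML 4 style:
// every heading becomes a node, nested under the nearest earlier
// heading whose level number is lower.
function buildOutline(headings) {
  const root = { level: 0, text: "(document)", children: [] };
  const stack = [root]; // chain of currently "open" headings
  for (const { level, text } of headings) {
    // Close any open headings of the same or a deeper level.
    while (stack[stack.length - 1].level >= level) {
      stack.pop();
    }
    const node = { level, text, children: [] };
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

// Render the outline as indented text, one heading per line.
function renderOutline(nodes, depth = 0) {
  return nodes
    .map((n) => "  ".repeat(depth) + n.text + " (h" + n.level + ")\n" +
                renderOutline(n.children, depth + 1))
    .join("");
}

// The "tagline as h2, post titles as h3" attempt from above.
const attempt = buildOutline([
  { level: 1, text: "My Weblog" },
  { level: 2, text: "A lot of effort went into making this effortless." },
  { level: 3, text: "Travel day" },
  { level: 3, text: "I'm going to Prague!" },
]);
console.log(renderOutline(attempt));
```

<p>Run against that input, the tagline comes out as the only child of “My Weblog,” and both posts are nested under the tagline. Nothing in the input lets the function know the tagline is not a real section, which is exactly the limitation.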
<p><abbr>HTML5</abbr> provides a solution for this: the <code><hgroup></code> element. The <code><hgroup></code> element acts as a wrapper for two or more <em>related</em> heading elements. What does “related” mean? It means that, taken together, they create only a single node in the document outline.
<p>Given this markup:
<pre><code><header>
<mark><hgroup></mark>
<h1>My Weblog</h1>
<mark><h2></mark>A lot of effort went into making this effortless.<mark></h2></mark>
<mark></hgroup></mark>
…
</header>
…
<div class="entry">
<h2>Travel day</h2>
</div>
…
<div class="entry">
<h2>I'm going to Prague!</h2>
</div></code></pre>
<p>This is the document outline that is created:
<pre>My Weblog (h1 of its hgroup)
|
+--Travel day (h2)
|
+--I'm going to Prague! (h2)</pre>
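<p>Here is a tiny JavaScript sketch of that “single node” behavior. It assumes, as I read the specification, that an <code><hgroup></code> is represented in the outline by its highest-ranked heading. The <code>hgroupHeading</code> name and the data format are hypothetical.

```javascript
// Pick the heading that stands for a whole <hgroup> in the outline:
// the highest-ranked one (lowest level number); the first wins on a tie.
// Assumes the list contains at least one heading.
function hgroupHeading(headings) {
  return headings.reduce((best, h) => (h.level < best.level ? h : best));
}

const siteTitle = hgroupHeading([
  { level: 1, text: "My Weblog" },
  { level: 2, text: "A lot of effort went into making this effortless." },
]);
console.log(siteTitle.text); // → My Weblog
```

<p>The tagline is still in the markup, and still a heading, but only “My Weblog” is handed to the outline.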
<p>You can test your own pages in the <a href="http://gsnedders.html5.org/outliner/"><abbr>HTML5</abbr> Outliner</a> to ensure that you’re using the heading elements properly.
<p class=a>❧
<h2 id=article-element>Articles</h2>
<p>Continuing with <a href=examples/blog-original.html>our example page</a>, let’s see what we can do about this markup:
<pre><code><div class="entry">
<p class="post-date">October 22, 2009</p>
<h2>
<a href="#"
rel="bookmark"
title="link to this post">
Travel day
</a>
</h2>
…
</div></code></pre>
<p>Again, this is valid <abbr>HTML5</abbr>. But <abbr>HTML5</abbr> provides a more specific element for the common case of marking up an article on a page — the aptly named <code><article></code> element.
<pre><code><mark><article></mark>
<p class="post-date">October 22, 2009</p>
<h2>
<a href="#"
rel="bookmark"
title="link to this post">
Travel day
</a>
</h2>
…
<mark></article></mark></code></pre>
<p>Ah, but it’s not quite that simple. There is one more change you should make. I’ll show it to you first, then explain it:
<pre><code><article>
<header>
<p class="post-date">October 22, 2009</p>
<mark><h1></mark>
<a href="#"
rel="bookmark"
title="link to this post">
Travel day
</a>
<mark></h1></mark>
</header>
…
</article></code></pre>
<p>Did you catch that? I changed the <code><h2></code> element to an <code><h1></code>, and wrapped it inside a <code><header></code> element. You’ve already seen the <code><header></code> element in action. Its purpose is to wrap all the elements that form the article’s header (in this case, the article’s publication date and title). But…but…but… shouldn’t you only have one <code><h1></code> per document? Won’t this screw up the document outline? No, but to understand why not, we need to back up a step.
<p>In <abbr>HTML</abbr> 4, the <em>only</em> way to create a document outline was with the <code><h1></code>–<code><h6></code> elements. If you only wanted one root node in your outline, you had to limit yourself to one <code><h1></code> in your markup. But the <abbr>HTML5</abbr> specification <a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#headings-and-sections>defines an algorithm for generating a document outline</a> that incorporates the new semantic elements in <abbr>HTML5</abbr>. The <abbr>HTML5</abbr> algorithm says that an <code><article></code> element creates a new section, that is, a new node in the document outline. And in <abbr>HTML5</abbr>, each section can have its own <code><h1></code> element.
<p>This is a drastic change from <abbr>HTML</abbr> 4, and here’s why it’s a good thing. Many web pages are really generated by templates. A bit of content is taken from one source and inserted into the page up here; a bit of content is taken from another source and inserted into the page down there. Many tutorials are structured the same way. “Here’s some <abbr>HTML</abbr> markup. Just copy it and paste it into your page.” That’s fine for small bits of content, but what if the markup you’re pasting is an entire section? In that case, the tutorial will read something like this: “Here’s some <abbr>HTML</abbr> markup. Just copy it, paste it into a text editor, and fix the heading tags so they match the nesting level of the corresponding heading tags in the page you’re pasting it into.”
<p>Let me put it another way. <abbr>HTML</abbr> 4 has no <em>generic</em> heading element. It has six strictly numbered heading elements, <code><h1></code>–<code><h6></code>, which must be nested in exactly that order. That kind of sucks, especially if your page is “assembled” instead of “authored.” And this is the problem that <abbr>HTML5</abbr> solves with the new sectioning elements and the new rules for the existing heading elements. If you’re using the new sectioning elements, I can give you this markup:
<pre><code><article>
<header>
<h1>A syndicated post</h1>
</header>
<p>Lorem ipsum blah blah…</p>
</article></code></pre>
<p>and you can copy it and paste it <em>anywhere in your page</em> without modification. The fact that it contains an <code><h1></code> element is not a problem, because the entire thing is contained within an <code><article></code>. The <code><article></code> element defines a self-contained node in the document outline, the <code><h1></code> element provides the title for that outline node, and all the other sectioning elements on the page will remain at whatever nesting level they were at before.
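<p>To see why the pasted <code><h1></code> does no harm, here is a deliberately simplified JavaScript sketch of section-based outlining. It is <em>not</em> the algorithm from the specification: it ignores <code><hgroup></code>, sectioning roots, and the implicit sections that heading ranks create, and the function name and data format are my own assumptions. It only shows that each sectioning element gets its own outline node, titled by the first heading found inside it.

```javascript
const SECTIONING = new Set(["article", "aside", "nav", "section"]);
const HEADING = /^h[1-6]$/;

// Returns the outline node for one sectioning element (or the body).
function outlineSection(element) {
  const node = { title: null, sections: [] };
  function walk(children) {
    for (const child of children) {
      if (SECTIONING.has(child.tag)) {
        // A nested sectioning element becomes its own outline node.
        node.sections.push(outlineSection(child));
      } else if (HEADING.test(child.tag)) {
        // The first heading found titles this section, whatever its number.
        if (node.title === null) node.title = child.text;
      } else {
        // Look inside wrappers such as <header> or <div>.
        walk(child.children || []);
      }
    }
  }
  walk(element.children || []);
  return node;
}

const page = {
  tag: "body",
  children: [
    { tag: "h1", text: "My Weblog" },
    { tag: "article", children: [
      { tag: "header", children: [{ tag: "h1", text: "A syndicated post" }] },
      { tag: "p", text: "Lorem ipsum blah blah…" },
    ] },
  ],
};
console.log(JSON.stringify(outlineSection(page)));
// → {"title":"My Weblog","sections":[{"title":"A syndicated post","sections":[]}]}
```

<p>Both headings are <code><h1></code> elements, yet the syndicated post ends up one level below the page title, because the nesting comes from the <code><article></code> and not from the heading number.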
<div class=pf>
<h4>Professor Markup Says</h4>
<div class=inner>
<blockquote><p>As with all things on the web, reality is a little more complicated than I’m letting on. The new “explicit” sectioning elements (like <code><h1></code> wrapped in <code><article></code>) may interact in unexpected ways with the old “implicit” sectioning elements (<code><h1></code>–<code><h6></code> by themselves). Your life will be simpler if you use one or the other, but not both. If you must use both on the same page, be sure to check the result in <a href=http://gsnedders.html5.org/outliner/>the <abbr>HTML5</abbr> Outliner</a> and verify that your document outline makes sense.
</blockquote>
</div>
</div>
<p class=a>❧
<h2 id=time-element>Dates and Times</h2>
<p class=ss style="float:left;margin:0 30px 0 0"><img src=i/openclipart.org_johnny_automatic_clock_tower.png width=205 height=378 alt="clock tower">
<p>This is exciting, right? I mean, it’s not “skiing down Mount Everest naked while reciting the Star Spangled Banner backwards” exciting, but it’s pretty exciting as far as semantic markup goes. Let’s continue with <a href=examples/blog-original.html>our example page</a>. The next line I want to highlight is this one:
<pre><code><div class="entry">
<mark><p class="post-date">October 22, 2009</p></mark>
<h2>Travel day</h2>
</div></code></pre>
<p>Same old story, right? A common pattern — designating the publication date of an article — that has no semantic markup to back it up, so authors resort to generic markup with custom <code>class</code> attributes. Again, this is valid <abbr>HTML5</abbr>. You’re not <em>required</em> to change it. But <abbr>HTML5</abbr> does provide a specific solution for this case: the <code><time></code> element.
<pre><code><time datetime="2009-10-22" pubdate>October 22, 2009</time></code></pre>
<p>There are three parts to a <code><time></code> element:
<ol>
<li>A machine-readable timestamp
<li>Human-readable text content
<li>An optional <code>pubdate</code> flag
</ol>
<p>In this example, the <code>datetime</code> attribute only specifies a date, not a time. The format is a four-digit year, two-digit month, and two-digit day, separated by dashes:
<pre><code><time <mark>datetime="2009-10-22"</mark> pubdate>October 22, 2009</time></code></pre>
<p>If you want to include a time too, add the letter <code>T</code> after the date, then the time in 24-hour format, then a timezone offset.
<pre><code><time datetime="<mark>2009-10-22T13:59:47-04:00</mark>" pubdate>
October 22, 2009 1:59pm EDT
</time></code></pre>
<p>(The date/time format is pretty flexible. The <abbr>HTML5</abbr> specification <a href=http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#valid-global-date-and-time-string>contains examples of valid date/time strings</a>.)
<p>Notice I changed the text content — the stuff between <code><time></code> and <code></time></code> — to match the machine-readable timestamp. This is not actually required. The text content can be anything you like, as long as you provide a machine-readable date/timestamp in the <code>datetime</code> attribute. So this is valid <abbr>HTML5</abbr>:
<pre><code><time datetime="2009-10-22"><mark>last Thursday</mark></time></code></pre>
<p>And this is also valid <abbr>HTML5</abbr>:
<pre><code><time datetime="2009-10-22"></time></code></pre>
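<p>If you generate your pages from a script, the machine-readable half is easy to build. Here is a small JavaScript sketch that assembles a full date, time, and timezone offset in the format described above. The <code>toDatetimeAttr</code> function and its parameter order are hypothetical, and it does no validation of its inputs.

```javascript
// Left-pad a number with zeros to two digits.
function pad(n) {
  return String(n).padStart(2, "0");
}

// Build a datetime attribute value from date and time parts plus a
// timezone offset given in minutes (negative means west of UTC).
function toDatetimeAttr(year, month, day, hour, minute, second, offsetMinutes) {
  const sign = offsetMinutes < 0 ? "-" : "+";
  const abs = Math.abs(offsetMinutes);
  return (
    String(year).padStart(4, "0") + "-" + pad(month) + "-" + pad(day) +
    "T" + pad(hour) + ":" + pad(minute) + ":" + pad(second) +
    sign + pad(Math.floor(abs / 60)) + ":" + pad(abs % 60)
  );
}

console.log(toDatetimeAttr(2009, 10, 22, 13, 59, 47, -240));
// → 2009-10-22T13:59:47-04:00
```

<p>The human-readable text between the tags is still yours to write however you like.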
<p>The final piece of the puzzle here is the <code>pubdate</code> attribute. It’s a Boolean attribute, so just add it if you need it, like this:
<pre><code><time datetime="2009-10-22" <mark>pubdate</mark>>October 22, 2009</time></code></pre>
<p>If you dislike “naked” attributes, this is also equivalent:
<pre><code><time datetime="2009-10-22" <mark>pubdate="pubdate"</mark>>October 22, 2009</time></code></pre>
<p>What does the <code>pubdate</code> attribute mean? It means one of two things. If the <code><time></code> element is in an <code><article></code> element, it means that this timestamp is the publication date of the article. If the <code><time></code> element is not in an <code><article></code> element, it means that this timestamp is the publication date of the entire document.
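<p>That rule amounts to “walk up the tree until you hit an <code><article></code>.” Here is a minimal JavaScript sketch using plain objects with a <code>parent</code> property in place of real <abbr>DOM</abbr> nodes. The <code>pubdateScope</code> name and the object shape are assumptions for illustration only.

```javascript
// Find what a <time pubdate> element is the publication date of:
// the nearest <article> ancestor, or null to mean "the whole document".
function pubdateScope(timeElement) {
  for (let node = timeElement.parent; node; node = node.parent) {
    if (node.tag === "article") return node;
  }
  return null;
}

const body = { tag: "body", parent: null };
const article = { tag: "article", parent: body };
const header = { tag: "header", parent: article };
const postDate = { tag: "time", parent: header }; // inside the article
const siteDate = { tag: "time", parent: body };   // outside any article

console.log(pubdateScope(postDate) === article); // → true
console.log(pubdateScope(siteDate));             // → null
```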
<p>Here’s the entire article, reformulated to take full advantage of <abbr>HTML5</abbr>:
<pre><code><article>
<header>
<time datetime="2009-10-22" pubdate>
October 22, 2009
</time>
<h1>
<a href="#"
rel="bookmark"
title="link to this post">
Travel day
</a>
</h1>
</header>
<p>Lorem ipsum dolor sit amet…</p>
</article></code></pre>
<p class=a>❧
<h2 id=nav-element>Navigation</h2>
<p class=ss><img src=i/openclipart.org_johnny_automatic_a_pink.png alt="a man navigating a sailboat on the water" width=345 height=377>
<p>One of the most important parts of any web site is the navigation bar. CNN.com has “tabs” along the top of each page that link to the different news sections — “Tech,” “Health,” “Sports,” <i class=baa>&</i>c. Google search results pages have a similar strip at the top of the page to try your search in different Google services — “Images,” “Video,” “Maps,” <i class=baa>&</i>c. And <a href=examples/blog-original.html>our example page</a> has a navigation bar in the header that includes links to different sections of our hypothetical site — “home,” “blog,” “gallery,” and “about.”
<p>This is how the navigation bar was originally marked up:
<pre><code><div id="nav">
<ul>
<li><a href="#">home</a></li>
<li><a href="#">blog</a></li>
<li><a href="#">gallery</a></li>
<li><a href="#">about</a></li>
</ul>
</div></code></pre>
<p>Again, this is valid <abbr>HTML5</abbr>. But while it’s marked up as a list of four items, there is nothing about the list that tells you that it’s part of the site navigation. Visually, you could guess that by the fact that it’s part of the page header, and by reading the text of the links. But semantically, there is nothing to distinguish this list of links from any other.
<p>Who cares about the semantics of site navigation? For one, <a href=http://diveintoaccessibility.org/>people with disabilities</a>. Why is that? Consider this scenario: your motion is limited, and using a mouse is difficult or impossible. To compensate, you might use a browser add-on that allows you to jump to (or jump past) major navigation links. Or consider this: if your sight is limited, you might use a dedicated program called a “screenreader” that uses text-to-speech to speak and summarize web pages. Once you get past the page title, the next important pieces of information about a page are the major navigation links. If you want to navigate quickly, you’ll tell your screenreader to jump to the navigation bar and start reading. If you want to browse quickly, you might tell your screenreader to jump <em>over</em> the navigation bar and start reading the main content. Either way, being able to determine navigation links programmatically is important.
<p>So, while there’s nothing wrong with using <code><div id="nav"></code> to mark up your site navigation, there’s nothing particularly right about it either. It’s suboptimal in ways that affect real people. <abbr>HTML5</abbr> provides a semantic way to mark up navigation sections: the <code><nav></code> element.
<pre><code><mark><nav></mark>
<ul>
<li><a href="#">home</a></li>
<li><a href="#">blog</a></li>
<li><a href="#">gallery</a></li>
<li><a href="#">about</a></li>
</ul>
<mark></nav></mark></code></pre>
<div class=pf>
<h4>Ask Professor Markup</h4>
<div class=inner>
<blockquote class=note>
<p><span>☞</span>Q: Are <a href=http://www.webaim.org/techniques/skipnav/>skip links</a> compatible with the <code><nav></code> element? Do I still need skip links in <abbr>HTML5</abbr>?<br>
<p>A: Skip links allow readers to skip over navigation sections. They are helpful for disabled users who use third-party software to read a web page aloud and navigate it without a mouse. (<a href=http://www.webaim.org/techniques/skipnav/>Learn how and why to provide skip links</a>.)
<p>Once screenreaders are updated to recognize the <code><nav></code> element, skip links will become largely redundant for their users, since the screenreader software will be able to automatically offer to skip over a navigation section marked up with the <code><nav></code> element. However, it will be a while before everyone who relies on a screenreader upgrades to <abbr>HTML5</abbr>-savvy software, so you should continue to provide your own skip links to jump over <code><nav></code> sections.
</blockquote>
</div>
</div>
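<p>As a rough sketch of how a skip link and the <code><nav></code> element can coexist, the markup below puts a link ahead of the navigation bar that jumps straight to the main content. The <code>id="content"</code> target, the link text, and the wrapping <code><div></code> are assumptions made for illustration; they are not taken from the example page.
<pre><code><!-- hypothetical skip link: jumps past the navigation bar -->
<a href="#content">Skip to main content</a>
<nav>
  <ul>
    <li><a href="#">home</a></li>
    <li><a href="#">blog</a></li>
    <li><a href="#">gallery</a></li>
    <li><a href="#">about</a></li>
  </ul>
</nav>
<!-- hypothetical target of the skip link -->
<div id="content">
  <p>Main content starts here.</p>
</div></code></pre>
<p>Software that understands <code><nav></code> can offer to skip the section on its own; the explicit link covers everyone else.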
<p class=a>❧
<h2 id=footer-element>Footers</h2>
<p>At long last, we have arrived at the end of <a href=examples/blog-original.html>our example page</a>. The last thing I want to talk about is the last thing on the page: the footer. The footer was originally marked up like this:
<pre><code><div id="footer">
<p>&#167;</p>
<p>&#169; 2001&#8211;9 <a href="#">Mark Pilgrim</a></p>
</div></code></pre>
<p>This is valid <abbr>HTML5</abbr>. If you like it, you can keep it. But <abbr>HTML5</abbr> provides a more specific element for this: the <code><footer></code> element.
<pre><code><mark><footer></mark>
<p>&#167;</p>
<p>&#169; 2001&#8211;9 <a href="#">Mark Pilgrim</a></p>
<mark></footer></mark></code></pre>
<p>What’s appropriate to put in a <code><footer></code> element? Probably whatever you’re putting in a <code><div id="footer"></code> now. OK, that’s a circular answer. But really, that’s it. The <abbr>HTML5</abbr> specification says, “A footer typically contains information about its section such as who wrote it, links to related documents, copyright data, and the like.” That’s what’s in this example page: a short copyright statement and a link to an about-the-author page. Looking around at some popular sites, I see lots of footer potential.
<ul>
<li><a href=http://www.cnn.com/>CNN</a> has a footer that contains a copyright statement, links to translations, and links to terms of service, privacy, “about us,” “contact us,” and “help” pages. All totally appropriate <code><footer></code> material.
<li><a href=http://www.google.com/>Google</a> has a famously sparse home page, but at the bottom of it are links to “Advertising Programs,” “Business Solutions,” and “About Google”; a copyright statement; and a link to Google’s privacy policy. All of that could be wrapped in a <code><footer></code>.
<li>My weblog has a footer with links to my other sites, plus a copyright statement. Definitely appropriate for a <code><footer></code> element. (Note that the links themselves should <em>not</em> be wrapped in a <code><nav></code> element, because they are not site navigation links; they are just a collection of links to my other projects on other sites.)
</ul>
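<p>The third case might be sketched like this. The link targets and project names are placeholders, not the weblog’s actual markup; the point is only that the list of off-site links sits directly inside the <code><footer></code>, without a <code><nav></code> wrapper.
<pre><code><footer>
  <!-- placeholder links to projects on other sites: no nav wrapper -->
  <ul>
    <li><a href="#">another project</a></li>
    <li><a href="#">yet another project</a></li>
  </ul>
  <p>&#169; 2001&#8211;9 <a href="#">Mark Pilgrim</a></p>
</footer></code></pre>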
<p>“<a href=http://ui-patterns.com/pattern/FatFooter>Fat footers</a>” are all the rage these days. Take a look at the footer on <a href=http://www.w3.org/>the <abbr>W3C</abbr> site</a>. It contains three columns, labeled “Navigation,” “Contact W3C,” and “W3C Updates.” The markup looks like this, more or less:
<pre><code><div id="w3c_footer">
<div class="w3c_footer-nav">
<h3>Navigation</h3>
<ul>
<li><a href="/">Home</a></li>
<li><a href="/standards/">Standards</a></li>
<li><a href="/participate/">Participate</a></li>
<li><a href="/Consortium/membership">Membership</a></li>
<li><a href="/Consortium/">About W3C</a></li>
</ul>
</div>
<div class="w3c_footer-nav">
<h3>Contact W3C</h3>
<ul>
<li><a href="/Consortium/contact">Contact</a></li>
<li><a href="/Help/">Help and FAQ</a></li>
<li><a href="/Consortium/sup">Donate</a></li>
<li><a href="/Consortium/siteindex">Site Map</a></li>
</ul>
</div>
<div class="w3c_footer-nav">
<h3>W3C Updates</h3>
<ul>
<li><a href="http://twitter.com/W3C">Twitter</a></li>
<li><a href="http://identi.ca/w3c">Identi.ca</a></li>
</ul>
</div>
<p class="copyright">Copyright © 2009 W3C</p>
</div></code></pre>
<p>To convert this to semantic <abbr>HTML5</abbr>, I would make the following changes:
<ul>
<li>Convert the outer <code><div id="w3c_footer"></code> to a <code><footer></code> element.
<li>Convert the first two instances of <code><div class="w3c_footer-nav"></code> to <code><nav></code> elements, and the third instance to a <code><section></code> element.
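<li>Convert the <code><h3></code> headers to <code><h1></code>, since they’ll now each be inside a sectioning element. The <code><nav></code> element creates a section in the document outline, just like the <a href=#article-element><code><article></code> element</a>. (In practice, browsers and screenreaders did not adopt this outline algorithm, and it was later removed from the <abbr>HTML</abbr> specification, so heading levels that reflect the real hierarchy, such as keeping <code><h3></code> here, are the safer choice.)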
</ul>
<p>The final markup might look something like this:
<pre><code><mark><footer></mark>
<mark><nav></mark>
<mark><h1></mark>Navigation<mark></h1></mark>
<ul>
<li><a href="/">Home</a></li>
<li><a href="/standards/">Standards</a></li>
<li><a href="/participate/">Participate</a></li>
<li><a href="/Consortium/membership">Membership</a></li>
<li><a href="/Consortium/">About W3C</a></li>
</ul>
<mark></nav></mark>
<mark><nav></mark>
<mark><h1></mark>Contact W3C<mark></h1></mark>
<ul>
<li><a href="/Consortium/contact">Contact</a></li>
<li><a href="/Help/">Help and FAQ</a></li>
<li><a href="/Consortium/sup">Donate</a></li>
<li><a href="/Consortium/siteindex">Site Map</a></li>
</ul>
<mark></nav></mark>
<mark><section></mark>
<mark><h1></mark>W3C Updates<mark></h1></mark>
<ul>
<li><a href="http://twitter.com/W3C">Twitter</a></li>
<li><a href="http://identi.ca/w3c">Identi.ca</a></li>
</ul>
<mark></section></mark>
<p class="copyright">Copyright © 2009 W3C</p>
<mark></footer></mark></code></pre>
<p class=a>❧
<h2 id=further-reading>Further Reading</h2>
<p>Example pages used throughout this chapter:
<ul>
<li><a href=examples/blog-original.html>Original (<abbr>HTML</abbr> 4)</a>
<li><a href=examples/blog-html5.html>Modified (<abbr>HTML5</abbr>)</a>
</ul>
<p>On character encoding:
<ul>
<li><a href="http://www.joelonsoftware.com/articles/Unicode.html">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)</a> by <cite>Joel Spolsky</cite>
<li><a href="http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode">On the Goodness of Unicode</a>, <a href="http://www.tbray.org/ongoing/When/200x/2003/04/13/Strings">On Character Strings</a>, and <a href="http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF">Characters vs. Bytes</a> by <cite>Tim Bray</cite>
</ul>
<p>On enabling new <abbr>HTML5</abbr> elements in Internet Explorer:
<ul>
<li><a href=http://xopus.com/devblog/2008/style-unknown-elements.html>How to style unknown elements in IE</a> by <cite>Sjoerd Visscher</cite>
<li><a href=http://ejohn.org/blog/html5-shiv/><abbr>HTML5</abbr> shiv</a> by <cite>John Resig</cite>
<li><a href=http://remysharp.com/2009/01/07/html5-enabling-script/><abbr>HTML5</abbr> enabling script</a> by <cite>Remy Sharp</cite>
</ul>
<p>On standards modes and doctype sniffing:
<ul>
<li><a href=http://hsivonen.iki.fi/doctype/>Activating Browser Modes with Doctype</a> by <cite>Henri Sivonen</cite>. This is the only article you should read on the subject. Any article on doctypes that doesn’t reference Henri’s work is guaranteed to be out of date, incomplete, or wrong.
</ul>
<p><abbr>HTML5</abbr>-aware validator:
<ul>
<li><a href=http://html5.validator.nu/>html5.validator.nu</a>
</ul>
<p class=a>❧
<p>This has been “What Does It All Mean?” The <a href=table-of-contents.html>full table of contents</a> has more if you’d like to keep reading.
<div class=pf>
<h4>Did You Know?</h4>
<div class=moneybags>
<blockquote><p>In association with Google Press, O’Reilly is distributing this book in a variety of formats, including paper, ePub, Mobi, and <abbr>DRM</abbr>-free <abbr>PDF</abbr>. The paid edition is called “HTML5: Up & Running,” and it is available now. This chapter is included in the paid edition.
<p>If you liked this chapter and want to show your appreciation, you can <a href="http://www.amazon.com/HTML5-Up-Running-Mark-Pilgrim/dp/0596806027?ie=UTF8&tag=diveintomark-20&creativeASIN=0596806027">buy “HTML5: Up & Running” with this affiliate link</a> or <a href=http://oreilly.com/catalog/9780596806033>buy an electronic edition directly from O’Reilly</a>. You’ll get a book, and I’ll get a buck. I do not currently accept direct donations.
</blockquote>
</div>
</div>
<p class=c>Copyright MMIX–MMXI <a href=about.html>Mark Pilgrim</a>
<script src=j/jquery.js></script>
<script src=j/dih5.js></script>