Preamble

This document specifies the format of an XML-based sitemap file, as used by the Standard-Sitemap Protocol (SSP). The level of detail offered is aimed at developers of sitemap-aware software, so that implementations can produce consistent behaviour. Consequently, authors experimenting with sitemaps can understand why their sitemap file did or did not produce the desired effects, and can be sure that it will be interpreted correctly in all contexts.

Some (partial) example sitemaps are available.

Semantics

An SSP sitemap expresses several aspects of a site:

a hierarchy of nodes (including a root node) which could help a visitor to navigate the site, or determine the relative importance of nodes,
a set of nodes with special meanings (e.g. a home page, a search page, a page of contact information, etc.), known as roles,
how HTML/XML documents within the site should be locally modified by clients when the other information is presented to a visitor.

Nodes in the hierarchy are referred to as hierarchy nodes. Hierarchy nodes may be either the root node, item nodes or group nodes. Nodes with special meanings are known as role nodes. A node may both have a role and exist in the hierarchy.

A sitemap is represented by one or more SSP sitemap XML files. A URI identifies one of them, the root file, and this may reference other files to complete the representation of the sitemap.

Nodes and their qualities

As part of the hierarchy, each node may contain zero or more other nodes, and each node has at least one parent (except for the anonymous root node, which has no parent).

A node has zero or more roles, a relation and a priority. A node has one or more variants.

Reachability and exclusion of elements

In the node hierarchy, the root node is represented by a <sitemap> element, item nodes are represented by <item> elements, and group nodes are represented by <group> elements. Role nodes are similarly represented by <item> elements. Note that the elements that represent hierarchy nodes themselves form a hierarchy due to the nature of XML, and this hierarchy is not necessarily congruent with the node hierarchy.

With respect to a given root file, elements may be reached. Some elements may be excluded. Only elements that are reached may represent role nodes, and only elements that are reached and not excluded may represent hierarchy nodes.

If the root file’s document element is a <sitemap> element, it is reached, and represents the root node.
If a <sitemap>, <item> or <group> element is reached and not excluded, its <item> and <group> children are also reached. Hierarchy nodes represented by these reached elements can be children of the node represented by their parent element.
If a <sitemap>, <item> or <group> element is reached and not excluded, and contains <external> elements that reference <item> and <group> elements, the referenced elements are also reached. Hierarchy nodes represented by these reached elements can be children of the node represented by the parent of the referencing <external> element.

Despite being reached, an element may be excluded from the node hierarchy—i.e. it represents no node in the hierarchy—under any of the following conditions:

The element has a tree attribute with the value exclude.
The element has a tree value of auto, and its role set includes neither tree nor any unrecognised role.
The element has a tree value of user, and its role set includes neither tree nor any unrecognised role, and the user has specified that such elements should be excluded (usually by a configuration option).
The element is an <item> whose represented variants include at least one with no ‘name’ quality.
The element is an <item> that has no non-excluded <item> or <group> children, has no <external> children referencing non-excluded <item> or <group> elements, and whose represented variants include at least one with no ‘location’ quality.

Note that it is possible to compute an element’s exclusion and set of represented variants without knowing whether it has been reached.

Although such excluded elements have been reached, and indeed can represent role nodes, they do not allow their own descendant elements to be reached. (This does not prevent those descendants from being reached by other means.)

Given that the default value of role is tree, and that the default value of tree is auto, an ordinary <item> or <group> element whose parent is reached will also be reached and not excluded, because the role set includes tree. It therefore will appear as a hierarchy node.
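The exclusion conditions above can be sketched as a single predicate. This is only an illustrative reading of the rules, not a reference implementation: the function name, its parameters (kind, variants, has_children, user_excludes) and the representation of a variant as a dictionary of qualities are all assumptions made for the example.

```python
# Roles defined by this specification; anything else counts as unrecognised.
KNOWN_ROLES = {"home", "contentinfo", "contact", "search", "searchpage", "tree"}


def keeps_tree_place(roles):
    """True if the role set includes tree or any unrecognised role."""
    return any(r == "tree" or r not in KNOWN_ROLES for r in roles)


def is_excluded(kind, variants, tree="auto", role="tree",
                has_children=False, user_excludes=False):
    """Decide whether a reached element is excluded from the node hierarchy.

    kind          -- "item" or "group" (hypothetical parameter)
    variants      -- list of dicts mapping quality names to values
    has_children  -- True if the element has non-excluded <item>/<group>
                     children, or <external> children referencing such
    user_excludes -- the user's preference, consulted only for tree="user"
    """
    roles = role.split()
    if tree == "exclude":
        return True
    if tree == "auto" and not keeps_tree_place(roles):
        return True
    if tree == "user" and not keeps_tree_place(roles) and user_excludes:
        return True
    if kind == "item":
        # Every variant needs a name.
        if any("name" not in v for v in variants):
            return True
        # A childless item also needs a location on every variant.
        if not has_children and any("location" not in v for v in variants):
            return True
    return False


faq = [{"name": "FAQ", "location": "faq.html"}]
print(is_excluded("item", faq))                  # ordinary item
print(is_excluded("item", faq, role="search"))   # role set lacks tree
```

Because the predicate depends only on the element's own attributes, variants and children, it can be evaluated without knowing whether the element has been reached.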

Node variants and their qualities

Each node variant can have some of the following qualities:

name: a short text acting as the name or title of the node, as specified by the name attribute
description: a longer text describing the node, possibly distinguishing it from other nodes with similar or identical names, as specified by the description attribute
location: the address (URI) of the variant’s content, as specified by the url attribute
language: the language in which the content at the variant’s location is written, as specified by the lang attribute
character encoding: the octet-to-character translation by which the content at the variant’s location is encoded, as specified by the charset attribute
content type: the format of the content at the variant’s location, as specified by the type attribute
search template: the format of a query that can be issued to the node’s location for searching the site, as specified by the data attribute
search method: the HTTP method of a query that can be issued to the node’s location for searching the site, as specified by the method attribute
referrer: the choice of HTTP referrer URI when fetching a node’s ‘location’, as specified by the refer attribute

An <item> or <group> element (a node element) serves as the root of a local hierarchy, with all other elements being <variant>s. Altogether, these specify all of the corresponding node’s variants.

The node element may specify qualities through the XML attributes listed above. Each contained <variant> inherits the qualities of its parent, and may introduce qualities through its own attributes. Finally, each leaf of the local hierarchy specifies a single variant of its corresponding node, with the qualities accumulated by its ancestry.

<item name="FAQ" url="faq.html">
  <variant description="Frequently asked questions" lang="en" />
  <variant description="Oftaj demandoj" lang="eo" />
</item>

In the example above, an <item> defines the ‘name’ and ‘location’ qualities common to its two variants. Their ‘description’ qualities, however, are language-dependent.
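As a rough sketch of how the qualities accumulate, the following Python walks a node element's local hierarchy and returns one dictionary per leaf. It assumes un-namespaced element names (as in the example above) and ignores both URI resolution and inheritance from elements outside the node element; the names QUALITY_ATTRS and node_variants are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Attribute name -> quality name, following the list of qualities above.
QUALITY_ATTRS = {
    "name": "name",
    "description": "description",
    "url": "location",
    "lang": "language",
    "charset": "character encoding",
    "type": "content type",
    "data": "search template",
    "method": "search method",
    "refer": "referrer",
}


def node_variants(node_elem):
    """Return the variants of a node as a list of quality dictionaries."""

    def walk(elem, inherited):
        # Start from the qualities accumulated by the ancestry,
        # then let this element's own attributes add or override.
        qualities = dict(inherited)
        for attr, quality in QUALITY_ATTRS.items():
            if attr in elem.attrib:
                qualities[quality] = elem.attrib[attr]
        children = [c for c in elem if c.tag == "variant"]
        if not children:
            # A leaf of the local hierarchy specifies a single variant.
            return [qualities]
        result = []
        for child in children:
            result.extend(walk(child, qualities))
        return result

    return walk(node_elem, {})


item = ET.fromstring(
    '<item name="FAQ" url="faq.html">'
    '<variant description="Frequently asked questions" lang="en" />'
    '<variant description="Oftaj demandoj" lang="eo" />'
    '</item>'
)
for variant in node_variants(item):
    print(variant)
```

A node element with no <variant> children is itself the only leaf, so it yields a single variant built from its own attributes.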

All variants of an item node must have the ‘name’ quality, or the node and its children need not appear in the node hierarchy. All variants of a childless item node must have the ‘location’ quality, or the node need not appear in the node hierarchy.

All variants of an item node with search in its role set must have the ‘search template’ quality, or the node cannot fulfil the search role.

A group node cannot have a ‘location’ quality.

Roles

An item node may have several roles, as determined by its representing element’s role attribute, and partly by that element’s position in the root sitemap file.

The following roles are defined:

home: The node refers to a home page or start page from which the user may restart navigation.
contentinfo: The node refers to a page describing how the site was made or published, and who is responsible for its upkeep and accuracy.
contact: The node refers to a page giving contact information for users of the site.
search: The node refers to a family of pages permitting a site-limited search.
searchpage: The node refers to a conventional search page.
tree: The node takes part in the node hierarchy.

Implementations are free to define other roles, but should do so by agreement with future versions of this specification, or by placing those roles in a private namespace (a mechanism for which is yet to be defined).

In its sidebar, the Firefox extension uses the search role to configure the search field, but ignores searchpage. Meanwhile, its pop-up menu for the customizable toolbars ignores search, but presents searchpage under its Search item.

An <item>’s role attribute specifies a space-separated list of role names. This is an initial set of roles that any node represented by this element can fulfil. The default list is tree, so setting the attribute to another single value implies that the represented node should not appear in the tree, i.e. it is excluded.

The first <item> in the <sitemap> element of the root file additionally takes on the role home, if no other <item> is reached with that explicit role. Note that this does not exclude that element from the node hierarchy, as exclusion is defined in terms of the actual value of the role attribute, not the set of roles that a node ultimately fulfils.
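The interaction between the default role list and the implicit home role might be sketched as follows. The function name and its input convention are assumptions: role_attrs is taken to be the role attribute values (None where the attribute is absent) of the reached <item> elements, with the first <item> of the root file's <sitemap> at index 0.

```python
def fulfilled_roles(role_attrs):
    """Return one role set per reached <item>, in the same order.

    role_attrs[0] is assumed to belong to the first <item> in the
    <sitemap> element of the root file.
    """
    # An absent role attribute defaults to the single role "tree".
    sets = [set((attr if attr is not None else "tree").split())
            for attr in role_attrs]
    # The first item becomes home only if no other item claims it explicitly.
    if sets and not any("home" in roles for roles in sets[1:]):
        sets[0].add("home")
    return sets


print(fulfilled_roles([None, "contact contentinfo"]))
print(fulfilled_roles([None, "home tree"]))
```

Exclusion is still decided from the role attribute as written, so gaining home implicitly does not remove the first item from the hierarchy.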

Site-limited search

The search role makes use of a node’s ‘search method’ and ‘search template’ qualities. An implementation may use it to provide the user with a ‘site-limited search’. After accepting a search term, the user agent may visit a URI formed from the node’s ‘location’, using the template resolved against the search term and the root file’s URI as defined under the data attribute.

If the ‘search method’ is get, a query ? and the resolved template are resolved against the location, and the user agent visits that address with an HTTP GET request. Otherwise, the ‘search method’ is post, and the resolved template is POSTed to the node’s location as application/x-www-form-urlencoded. The ‘search method’ is specified by the method attribute.

For example:

<item   role="search"
        name="Search"
 description="Search"
         url="/cgi-bin/search"
    xml:base="http://www.example.foo/juice/"
        data="q=%s" />

This item is only activated by filling in the search field, and does not appear in the source tree (no tree role). A search query of foo invokes a GET http://www.example.foo/cgi-bin/search?q=foo.

<item   role="search searchpage"
      method="post"
        name="Search"
 description="Search"
         url="/cgi-bin/search"
    xml:base="http://www.example.foo/juice/"
        data="q=%s" />

In the variation above, a POST http://www.example.foo/cgi-bin/search is issued, with q=foo as the content. Also, the address http://www.example.foo/cgi-bin/search may be listed as the search page.

<item   role="search tree"
        name="Search"
 description="Search"
         url="search"
    xml:base="http://www.example.foo/juice/"
        data="q=%s" />

Finally, in this variation, GET http://www.example.foo/juice/search?q=foo is issued when the term foo is sought, and the item search also appears in the tree.
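The three examples can be reproduced with a short Python sketch built on urllib.parse. It is a simplification under stated assumptions: only the %s expression is expanded, and the function name search_request and its tuple return value are made up for the illustration.

```python
from urllib.parse import quote, urljoin


def search_request(base, url, data, term, method="get"):
    """Return (HTTP method, address, body) for a site-limited search.

    base -- the base URI in effect on the <item> (e.g. from xml:base)
    url, data, method -- the corresponding attribute values
    term -- the search term entered by the user
    """
    # The 'location' quality is the url attribute resolved against the base.
    location = urljoin(base, url)
    # Only %s is handled here; the term is escaped as a query value.
    resolved = data.replace("%s", quote(term, safe=""))
    if method == "get":
        # A query-only reference replaces the query part of the location.
        return ("GET", urljoin(location, "?" + resolved), None)
    # Otherwise the resolved template is the body of a POST,
    # sent as application/x-www-form-urlencoded.
    return ("POST", location, resolved)


base = "http://www.example.foo/juice/"
print(search_request(base, "/cgi-bin/search", "q=%s", "foo"))
print(search_request(base, "/cgi-bin/search", "q=%s", "foo", method="post"))
print(search_request(base, "search", "q=%s", "foo"))
```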

Data types

Language code

A language code is a case-insensitive string identifying a natural language, possibly a specific regional dialect. For example:

en identifies English.
de identifies German.
en-gb identifies English as spoken in Great Britain.
de-at identifies German as spoken in Austria.

SSP language codes follow the same format as HTML language codes. The first component is an ISO 639:1988 two-letter code. The second, if present, is an ISO 3166:1993 country code.

Character encoding

A character encoding (or ‘charset’, informally) specifies translation between octets and characters. For example:

US-ASCII
UTF-8
ISO-8859-1

These names are registered under IANA character sets.

Content type

A content type specifies the nature of a resource. For example:

image/png identifies the PNG format for images.
text/html identifies HTML.
application/pdf identifies PDF.

These names are registered under IANA Media Types.

URI reference

This is a string as defined by RFC 3986: Uniform Resource Identifier (URI): Generic Syntax.

Element ID

This is an element identifier as defined in xml:id Version 1.0. It is a case-sensitive string consisting of letters, digits, underscores, dashes, and dots, and it may not begin with a digit, dash, or dot.

Elements

Element `<sitemap>`

Namespace: http://standard-sitemap.org/2007/ns
Attributes:
- charset
- lang
Content:
1. Zero or more of <class-change>
2. One or more of:
  - <item>
  - <group>
  - <external>

The <sitemap> element is the root of an SSP sitemap document.

Element `<item>`

Namespace: http://standard-sitemap.org/2007/ns
Attributes:
- charset
- data
- description
- lang
- method
- name
- order
- priority
- refer
- relation
- role
- tree
- type
- url
Content:
1. Zero or more of <variant>
2. Zero or more of:
  - <item>
  - <group>
  - <external>
Child of:
- <item>
- <group>
- <sitemap>

The <item> element represents an ordinary node. Many of its attributes set the default qualities for the node’s variants, if it has any <variant> children, or set the qualities of the node’s sole variant.

Element `<group>`

Namespace: http://standard-sitemap.org/2007/ns
Attributes:
- charset
- description
- lang
- name
- order
- relation
Content:
1. Zero or more of <variant>
2. One or more of:
  - <item>
  - <group>
  - <external>
Child of:
- <item>
- <group>
- <sitemap>

The <group> element expresses a node of lesser prominence in the hierarchy. It need not have a name, and has no location. Its children may be rendered as if they were children of the <group>’s parent, and cannot be folded away separately from that parent’s other children. Many of its attributes set the default qualities for its node’s variants, if it has any <variant> children, or set the qualities of the node’s sole variant.

Element `<external>`

Namespace: http://standard-sitemap.org/2007/ns
Attributes:
- url (required)
Content: empty
Child of:
- <item>
- <group>
- <sitemap>

The <external> element identifies an <item>/<group> element in the same or another document. The value of the url attribute is resolved against the element’s base, and identifies the <item>/<group> by its xml:id attribute.

The node represented by the referenced <item>/<group>, including its variants and their qualities, its child nodes, and its role, relation and priority, becomes a child of the node represented by the element containing the referencing <external> element.

Element `<variant>`

Namespace: http://standard-sitemap.org/2007/ns
Attributes:
- charset
- data
- description
- lang
- name
- refer
- type
- url
Content:
1. Zero or more of <variant>
Child of:
- <item>
- <group>
- <variant>

The <variant> element allows variants of a node to be specified. Each <variant>’s nearest <item> or <group> ancestor represents that node as a whole, and is the <variant>’s node element. Each <variant> may therefore appear in <item>, <group>, or other <variant>s.

Each <variant> that has no child elements specifies a variant of its node. Attributes of such a <variant> specify the variant’s qualities. For attributes that are not set on the <variant> itself, the qualities are derived from the corresponding attributes of the nearest ancestors that set them (i.e. they are inherited), with the following restrictions. These attributes may be inherited from any ancestor:

charset
lang

Other attributes may only be inherited from the node element or its children.

Element `<class-change>`

Namespace: http://standard-sitemap.org/2007/ns
Attributes:
- attr
- elem (required)
- prefix (required)
Content: empty
Child of:
- <sitemap>

The <class-change> element specifies how a document served by the sitemap should be modified to indicate that it is being so served. The XPath expression specified by elem identifies an element in the served document to be modified. attr identifies an attribute on that element to be modified. The attribute prefix identifies the prefix of a family of class names to be updated and maintained in the attribute value, according to a display level in the range [0,100].

Whenever the display level is set to N, the attribute is modified so that its set of classes of the form prefix-over-integer and prefix-under-integer consists of exactly 100 items:

prefix-over-0 up to prefix-over-L, where L=N−1
prefix-under-M up to prefix-under-100, where M=N+1

Authors are expected to use these changes to dynamically alter the styling of their site in the distinct cases of being visited by a sitemap-aware user agent and a sitemap-unaware user agent.
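The class set for a given display level can be computed with a few lines of Python. This sketch assumes the two families run from prefix-over-0 to prefix-over-(N−1) and from prefix-under-(N+1) to prefix-under-100, which is the reading that yields exactly 100 classes; the function names are illustrative only.

```python
import re


def level_classes(prefix, level):
    """Return the 100 managed class names for display level N."""
    if not 0 <= level <= 100:
        raise ValueError("display level must be in the range [0,100]")
    over = [f"{prefix}-over-{i}" for i in range(0, level)]
    under = [f"{prefix}-under-{i}" for i in range(level + 1, 101)]
    return over + under


def update_attribute(value, prefix, level):
    """Rewrite an attribute value for a new display level.

    Classes outside the managed family are kept untouched; every
    prefix-over-integer and prefix-under-integer class is replaced.
    """
    managed = re.compile(rf"^{re.escape(prefix)}-(over|under)-\d+$")
    kept = [c for c in value.split() if not managed.match(c)]
    return " ".join(kept + level_classes(prefix, level))


classes = level_classes("ssp", 40)
print(len(classes), classes[0], classes[39], classes[40], classes[-1])
```

A stylesheet could then, for instance, hide a page's own navigation bar under a selector such as .ssp-over-0, which is present at every display level except 0 (the prefix ssp is hypothetical).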

Attributes

Attribute `attr`

Value: attribute name
Default: class
Appears on:
- <class-change>

This specifies the name of an attribute of the element specified by elem, whose value should be managed as the served page’s display level is changed.

The attribute name may include a namespace prefix, for any prefix in effect on the <class-change> element.

Attribute `charset`

Value: character encoding
Variant quality: character encoding
Appears on:
- <group>
- <item>
- <sitemap>
- <variant>

This attribute specifies the ‘character encoding’ quality of the node variant it applies to.

Attribute `data`

Value: URI query string template
Variant quality: search template
Appears on:
- <item> (required on every variant when role includes search)

This attribute specifies the ‘search template’ quality of the node variant it applies to. This string specifies the template for the query data used in a site-limited search. Various % expressions are replaced by strings according to the following table, which shows the result of applying them to an example sitemap address of:

http://www.example.com/a/b/c/standard-sitemap.xml

…and an example search query fish.

Expression	Meaning	Example
`%s`	Search term	`fish`
`%w`	Website home (`./` resolved against sitemap address)	`http://www.example.com/a/b/c/`
`%(1w)`	Parent (`../` resolved against sitemap address)	`http://www.example.com/a/b/`
`%(2w)`	Grandparent (`../../` resolved against sitemap address)	`http://www.example.com/a/`
`%r`	Root (`/` resolved against sitemap address)	`http://www.example.com/`
`%h`	Host of sitemap address	`www.example.com`

All expanded values are escaped as if they are URI query values.
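The table can be turned into a small expansion routine. The sketch below is one possible interpretation: it assumes that ‘escaped as URI query values’ means percent-encoding every reserved character, that unrecognised % sequences are left untouched, and that %h excludes any port number. The function name expand_template is an invention for this example.

```python
import re
from urllib.parse import quote, urljoin, urlsplit

# Matches %s, %w, %r, %h and the %(Nw) form with a decimal N.
_EXPR = re.compile(r"%(?:\((\d+)w\)|([swrh]))")


def expand_template(template, term, sitemap_uri):
    """Expand the % expressions of a data attribute value."""

    def value_of(match):
        levels, code = match.group(1), match.group(2)
        if levels is not None:
            # %(Nw): climb N directory levels from the sitemap address.
            raw = urljoin(sitemap_uri, "../" * int(levels) or "./")
        elif code == "s":
            raw = term
        elif code == "w":
            raw = urljoin(sitemap_uri, "./")
        elif code == "r":
            raw = urljoin(sitemap_uri, "/")
        else:  # code == "h"
            raw = urlsplit(sitemap_uri).hostname or ""
        # Escape the expanded value, never the surrounding template.
        return quote(raw, safe="")

    return _EXPR.sub(value_of, template)


sitemap = "http://www.example.com/a/b/c/standard-sitemap.xml"
print(expand_template("q=%s&site=%h", "fish", sitemap))
print(expand_template("q=%s&under=%(1w)", "fish", sitemap))
```

Escaping is applied after expansion, so the unescaped values in the table appear percent-encoded in the resulting query string.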

Attribute `description`

Value: text
Variant quality: description
Appears on:
- <group>
- <item>
- <variant>

This attribute specifies the ‘description’ quality of the node variant it applies to. This should be a one- or two-line description, and will likely appear as a tooltip of a menu item or navigation tree.

Attribute `elem`

Value: XPath expression
Appears on:
- <class-change> (required)

This identifies an element whose attribute should be managed as the served page’s display level is changed. Only the first element that matches the expression is modified.

The XPath expression may be written in terms of any namespace prefixes in effect on the <class-change> element.

Attribute `lang`

Value: language code
Variant quality: language
Appears on:
- <group>
- <item>
- <sitemap>
- <variant>

This attribute specifies the ‘language’ quality of the node variant it applies to.

Attribute `method`

Value: post or get
Default: get
Variant quality: search method
Appears on:
- <item>

This attribute specifies the ‘search method’ quality of the node variant it applies to. It specifies the HTTP request method, GET or POST, to be used when performing a site-limited search.

Attribute `name`

Value: text
Variant quality: name
Appears on:
- <group>
- <item>
- <variant>

This attribute specifies the ‘name’ quality of the node variant it applies to. This should be a relatively short name, as it will likely appear as the text of a menu item or navigation tree.

Attribute `order`

Value: none, lexical, base10, base16, version
Default: none
Appears on:
- <group>
- <item>

This attribute allows an implementation to make certain assumptions about the ordering-by-name of child nodes of the node represented by this element.

none: The implementation can make no assumptions about the ordering of child nodes.
lexical: The child nodes are lexically ordered by name.
base10: The child nodes are numerically ordered by name, in radix 10.
base16: The child nodes are numerically ordered by name, in radix 16.
version: The child nodes are ordered by name, and each name is a hierarchical version number (e.g. 1.3.1).

If an ordering exists, and there are many nodes for the implementation to render, it may automatically group and subgroup them, and use their names to work out the names of the synthetic groups. This allows the implementation to choose the size of such groups according to local requirements, e.g. the number that fit comfortably into a screen.

For example, a large number of programming symbols could be divided into several synthetic groups whose names are formed by taking the name of the first node in the group, appending an ellipsis, and then appending the name of the last node in the group. This node:

<group name="Defined symbols" order="lexical">
  <item name="abort" .../>
  <item name="abs" .../>
  <item name="acos" .../>
  <!-- 900 or so other items, in alphabetic order -->
  <item name="wscanf" .../>
  <item name="xor" .../>
  <item name="xor_eq" .../>
</group>

…could be divided like this:

abort…atoi
atol…catanf
catanh…compl
…30 groups of about 30 items each…
uint_fast8_t…vprintf
vscanf…wcsncat
wcsncmp…xor_eq
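A minimal sketch of this grouping, assuming the child names are already in the declared order and that the implementation simply cuts them into runs of a fixed size (the function name and the size parameter are illustrative, not part of the format):

```python
def synthetic_groups(names, size):
    """Split ordered child names into labelled synthetic groups.

    Returns a list of (label, members) pairs, where each label is the
    first name, an ellipsis, and the last name of the group.
    """
    groups = []
    for start in range(0, len(names), size):
        chunk = names[start:start + size]
        # A group of one is labelled with that single name.
        label = chunk[0] if len(chunk) == 1 else f"{chunk[0]}…{chunk[-1]}"
        groups.append((label, chunk))
    return groups


symbols = ["abort", "abs", "acos", "atoi", "atol"]
for label, members in synthetic_groups(symbols, 2):
    print(label, members)
```

An implementation would pick size from local constraints, such as how many entries fit on screen, and could apply the same function again to the labels to form subgroups.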

Attribute `prefix`

Value: class-name prefix
Appears on:
- <class-change> (required)

This identifies a family of class names of the form prefix-over-integer and prefix-under-integer which should be managed on an attribute of an element in the served page as its display level is changed.

Attribute `priority`

Value: decimal [0.0, 1.0]
Default: decimal 0.5
Appears on:
- <item>

This attribute allows the author to set the relative priorities or levels of importance of nodes across the site. Implementations may represent these differences by, for example, emboldening names of more important nodes, or changing font size as appropriate.

Attribute `refer`

Value: space-separated list of user, root, map, page, parent, none
Default: user
Appears on:
- <item>
- <variant>

This attribute specifies the ‘referrer’ quality of the node variant it applies to. The following values are permitted:

root: The referrer shall be the URI of the root file.
map: The referrer shall be the URI of the sitemap file containing the <item> element that represents this node.
page: The referrer shall be the URI of the current page.
parent: The referrer shall be the URI of the page identified by this node’s parent. Implementations are not required to support this, as it might make the UA appear to be a referrer spammer; such implementations should interpret this as user instead. Furthermore, this value only has meaning if a node is accessed as part of the tree of nodes presented to the user. If the node is instead accessed through one of its roles, the implementation should treat it as user.
none: There shall be no referrer URI.
user: The UA shall determine the referrer URI.

If an implementation does not recognize this attribute, it should behave as if user was specified. When multiple values are specified, the implementation should behave according to the first it can honour.
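The selection rule can be sketched as follows. This is one interpretation rather than a normative algorithm: it treats an unsupported value, or parent outside the tree, as something the implementation cannot honour, and falls back to user when nothing in the list can be honoured. The function name and the supported and in_tree parameters are assumptions.

```python
ALL_VALUES = ("user", "root", "map", "page", "parent", "none")


def choose_referrer(refer="user",
                    supported=("user", "root", "map", "page", "none"),
                    in_tree=True):
    """Pick the referrer policy from a space-separated refer value.

    supported -- the values this implementation can honour
    in_tree   -- True if the node is being accessed as part of the tree
    """
    for value in refer.split():
        if value not in supported:
            continue
        # parent only has meaning for nodes accessed through the tree.
        if value == "parent" and not in_tree:
            continue
        return value
    # Nothing could be honoured: let the UA decide.
    return "user"


print(choose_referrer("parent root"))
print(choose_referrer("parent none", supported=ALL_VALUES, in_tree=False))
```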

Attribute `relation`

Value: none, sequence
Default: none
Appears on:
- <group>
- <item>

This attribute specifies whether the children of the node it specifies form some sort of sequence, and should be navigable as such. For example, an implementation may provide ‘previous’ and ‘next’ buttons to traverse them rapidly.

none: The child nodes do not form a navigable sequence.
sequence: The child nodes form a navigable sequence.

Attribute `role`

Value: space-separated token list
Default: tree
Appears on:
- <item>

This specifies a set of potential roles for the node represented by the containing element.

For example:

<item name="Contact"
       url="contact.html"
      role="contact contentinfo" />

Attribute `tree`

Value: include, exclude, auto, user
Default: auto
Appears on:
- <item>

This attribute specifies an element’s participation in the sitemap’s node hierarchy.

include: If the <item> is reached from the root file, it will represent a node in the hierarchy.
exclude: The <item> will not represent a node in the hierarchy.
auto: If the <item> is reached from the root file, and has a role set including tree or an unknown role, it will represent a node in the hierarchy.
user: If the <item> is reached from the root file, and has a role set including tree or an unknown role, or the user prefers it, it will represent a node in the hierarchy.

exclude allows an author to prevent a node from appearing in the hierarchy, while allowing it to fulfil a role (e.g. search). For example, suppose you have separate pages for showing a blank search form (search.html) and displaying results (search-results.cgi). The results page should never be accessed without a query string, so it must never appear in the node hierarchy:

<!-- The search form is just a normal page. -->
<item name="Search this site"
       url="search.html" />

<!-- The search results without a query are
     always excluded from the hierarchy. -->
<item name="Search this site"
       url="search-results.cgi"
      role="search"
      data="q=%s"
      tree="exclude" />

Using the default setting auto would not guarantee that the <item> would be excluded.

include allows an author to override the automatic removal of nodes from the hierarchy by virtue of them also fulfilling roles. user allows the author to defer that overriding according to the user’s preference.

Attribute `type`

Value: content type
Variant quality: content type
Appears on:
- <item>
- <variant>

This attribute specifies the ‘content type’ quality of the node variant it applies to.

<!-- The same document is offered in two formats. -->
<item name="Specification">
  <variant type="text/html" url="spec.html" />
  <variant type="application/pdf" url="spec.pdf" />
</item>

Attribute `url`

Value: URI reference
Variant quality: location
Appears on:
- <item>
- <variant>

This attribute specifies the ‘location’ quality of the node variant it applies to. As a URI reference, it is resolved against the base URI of its element.

Attribute `xml:base`

Value: URI reference
Appears on: any element

This attribute overrides the base URI of an element. The base URI of the root element of a document is the document URI, and the base URI of any other element is the base URI of its parent. However, if an element specifies xml:base, its value is first resolved against the element’s base as it would be if xml:base were not specified, and that resolved value becomes the element’s base URI (and, therefore, the base URI of descendant elements that don’t specify xml:base).

This is in accordance with XML Base.

Authors should be aware that xml:base influences the url attribute of <external> elements.
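The inheritance of base URIs can be sketched with ElementTree and urljoin. The helper name base_uris is hypothetical, and the sample document omits the SSP namespace for brevity.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(root, document_uri):
    """Map every element under root to its base URI."""
    bases = {}

    def walk(elem, inherited):
        # An xml:base value is itself resolved against the inherited base;
        # without the attribute, the inherited base is used unchanged.
        base = urljoin(inherited, elem.get(XML_BASE, ""))
        bases[elem] = base
        for child in elem:
            walk(child, base)

    walk(root, document_uri)
    return bases


doc = ET.fromstring(
    '<sitemap>'
    '<item xml:base="http://www.example.foo/juice/" url="search">'
    '<item url="a.html" />'
    '</item>'
    '<item url="b.html" />'
    '</sitemap>'
)
bases = base_uris(doc, "http://www.example.foo/map.xml")
for elem in doc.iter("item"):
    # Resolve each url attribute against its element's base URI.
    print(urljoin(bases[elem], elem.get("url")))
```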

Attribute `xml:id`

Value: Element ID
Appears on: any element

This attribute gives its element an identity unique within the document, in accordance with xml:id Version 1.0. In a sitemap file, this primarily exists to support the <external> element. If an element is given an identifier as follows:

<item xml:id="fred" ... />

…then it can be referenced from within the same document like this:

<external url="#fred" />

It can also be referenced from another document:

<external url="freds-document.xml#fred" />
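An implementation has to split such a reference into the document to fetch and the identifier to look up. A minimal sketch, assuming the base URI of the <external> element is already known (the function name and the example addresses are hypothetical):

```python
from urllib.parse import urldefrag, urljoin


def external_target(base_uri, url):
    """Resolve an <external> url attribute.

    Returns (document URI, element ID), where the element ID is the
    fragment to be matched against xml:id values in that document.
    """
    document, element_id = urldefrag(urljoin(base_uri, url))
    return document, element_id


base = "http://www.example.com/maps/root.xml"
print(external_target(base, "#fred"))
print(external_target(base, "freds-document.xml#fred"))
```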

Element index

<class-change>
<external>
<item>
<group>
<sitemap>
<variant>

Attribute index

attr
charset
data
description
elem
lang
method
name
order
prefix
priority
refer
relation
role
tree
type
url
xml:base
xml:id
