This document specifies the format of an XML-based sitemap file, as used by the Standard-Sitemap Protocol (SSP). The level of detail offered is aimed at developers of sitemap-aware software, so that implementations can produce consistent behaviour. Consequently, authors experimenting with sitemaps can understand why their sitemap file did or did not produce the desired effects, and can be sure that it will be interpreted correctly in all contexts.
Some (partial) example sitemaps are available.
An SSP sitemap expresses several aspects of a site:
Nodes in the hierarchy are referred to as hierarchy nodes. Hierarchy nodes may be either the root node, item nodes or group nodes. Nodes with special meanings are known as role nodes. A node may both have a role and exist in the hierarchy.
A sitemap is represented by one or more SSP sitemap XML files. A URI identifies one of them, the root file, and this may reference other files to complete the representation of the sitemap.
As part of the hierarchy, each node may contain zero or more other nodes, and each node has at least one parent (except for the anonymous root node, which has no parent).
A node has zero or more roles, a relation and a priority. A node has one or more variants.
In the node hierarchy, the root node is represented by a <sitemap> element, item nodes are represented by <item> elements, and group nodes are represented by <group> elements.
Role nodes are similarly represented by
Note that the elements that represent hierarchy nodes themselves form a hierarchy due to the nature of XML, and this hierarchy is not necessarily congruent with the node hierarchy.
With respect to a given root file, elements may be reached. Some elements may be excluded. Only elements that are reached may represent role nodes, and only elements that are reached and not excluded may represent hierarchy nodes.
If the root file’s document element is a <sitemap> element, it is reached, and represents the root node.
If a <group> element is reached and not excluded, its <group> children are also reached.
Hierarchy nodes represented by these reached elements can be children of the node represented by their parent element.
If a <group> element is reached and not excluded, and contains <external> elements that reference <group> elements, the referenced elements are also reached.
Hierarchy nodes represented by these reached elements can be children of the node represented by the parent of the referencing <external> element.
Despite being reached, an element may be excluded from the node hierarchy—i.e. it represents no node in the hierarchy—under any of the following conditions:
The element has a tree attribute with the value exclude.
The element has a tree value of user, and its role set includes neither tree nor any unrecognised role, and the user has specified that such elements should be excluded (usually by a configuration option).
The element is an <item>, has no non-excluded <group> children, has no <external> children referencing non-excluded <group> elements, and its represented variants include at least one with no ‘location’ quality.
Note that it is possible to compute an element’s exclusion and set of represented variants without knowing whether it has been reached.
Although such excluded elements have been reached, and indeed can represent role nodes, they do not allow their own descendant elements to be reached. (This does not prevent those descendants from being reached by other means.)
Each node variant can have some of the following qualities:
The node element may specify qualities through the XML attributes listed above.
A <variant> inherits the qualities of its parent, and may introduce qualities through its own attributes.
Finally, each leaf of the local hierarchy specifies a single variant of its corresponding node, with the qualities accumulated by its ancestry.
<item name="FAQ" url="faq.html">
  <variant description="Frequently asked questions" lang="en" />
  <variant description="Oftaj demandoj" lang="eo" />
</item>
In the example above, an <item> defines the ‘name’ and ‘location’ qualities common to its two variants. Their ‘description’ qualities, however, are language-dependent.
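The accumulation of qualities along the local hierarchy can be sketched in Python. This is only an illustration under stated assumptions: the helper name leaf_variants is hypothetical, qualities are keyed by raw attribute name rather than by quality name, elements are assumed to carry no XML namespace, and only inheritance from the node element downwards is modelled.

```python
import xml.etree.ElementTree as ET


def leaf_variants(node_element):
    """Return one dict of accumulated attributes per leaf of the local hierarchy.

    Hypothetical helper: keys are raw attribute names (name, url, lang, ...).
    """
    results = []

    def walk(element, inherited):
        # Attributes set on this element override those inherited from above.
        qualities = {**inherited, **element.attrib}
        variants = [child for child in element if child.tag == "variant"]
        if not variants:
            # A leaf of the local hierarchy specifies a single variant.
            results.append(qualities)
        for child in variants:
            walk(child, qualities)

    walk(node_element, {})
    return results


FAQ = ET.fromstring(
    '<item name="FAQ" url="faq.html">'
    '<variant description="Frequently asked questions" lang="en" />'
    '<variant description="Oftaj demandoj" lang="eo" />'
    "</item>"
)

for variant in leaf_variants(FAQ):
    print(variant)
```

Both variants end up with the same name and url, and differ only in description and lang. An <item> without <variant> children yields a single variant built from its own attributes.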
All variants of an item node must have the ‘name’ quality, or the node and its children need not appear in the node hierarchy. All variants of a childless item node must have the ‘location’ quality, or the node need not appear in the node hierarchy.
All variants of an item node with search in its role set must have the ‘search template’ quality, or the node cannot fulfil the search role.
A group node cannot have a ‘location’ quality.
An item node may have several roles, as determined by its representing element’s role attribute, and partly by that element’s position in the root sitemap file.
The following roles are defined:
Implementations are free to define other roles, but should do so by agreement with future versions of this specification, or by placing those roles in a private namespace (a mechanism for which is yet to be defined).
In its sidebar, the Firefox extension uses the search role to configure the search field, but ignores searchpage. Meanwhile, its pop-up menu for the customizable toolbars ignores search, but presents searchpage under its Search item.
The role attribute specifies a space-separated list of role names. This is an initial set of roles that any node represented by this element can fulfil. The default list is tree, so setting the attribute to another single value implies that the represented node should not appear in the tree, i.e. it is excluded.
<item> in the <sitemap> element of the root file additionally takes on the role home, if no other <item> is reached with that explicit role. Note that this does not exclude that element from the node hierarchy, as exclusion is defined in terms of the actual value of the role attribute, not the set of roles that a node ultimately fulfils.
The search role makes use of a node’s ‘search method’ and ‘search template’ qualities. An implementation may use it to provide the user with a ‘site-limited search’. After accepting a search term, the user agent may visit a URI formed from the node’s ‘location’, using the template resolved against the search term and the root file’s URI as defined under the data attribute.
If the ‘search method’ is get, a query ? and the resolved template are resolved against the location, and the user agent visits that address with an HTTP GET request. Otherwise, the ‘search method’ is post, and the resolved template is POSTed to the node’s location as application/x-www-form-urlencoded. The ‘search method’ is specified by the method attribute.
<item role="search" name="Search" description="Search" url="/cgi-bin/search" xml:base="http://www.example.foo/juice/" data="q=%s" />
This item is only activated by filling in the search field, and does not appear in the source tree (no tree role). A search query of foo invokes a GET http://www.example.foo/cgi-bin/search?q=foo.
<item role="search searchpage" method="post" name="Search" description="Search" url="/cgi-bin/search" xml:base="http://www.example.foo/juice/" data="q=%s" />
In the variation above, a POST http://www.example.foo/cgi-bin/search is issued, with q=foo as the content. Also, the address http://www.example.foo/cgi-bin/search may be listed as the search page.
<item role="search tree" name="Search" description="Search" url="search" xml:base="http://www.example.foo/juice/" data="q=%s" />
Finally, in this variation, GET http://www.example.foo/juice/search?q=foo is issued when the term foo is sought, and the item search also appears in the tree.
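The three variations above can be reproduced with a short Python sketch. It is an illustration rather than a reference implementation: the function name search_request is hypothetical, only the %s expression is expanded (other % expressions are left untouched), and quote_plus is assumed to be an acceptable way to escape the term as a URI query value.

```python
from urllib.parse import quote_plus, urljoin


def search_request(base, url, template, term, method="get"):
    """Return (HTTP method, address, body) for a site-limited search.

    Hypothetical helper: expands only %s in the template.
    """
    # The 'location' quality: url resolved against the element's base URI.
    location = urljoin(base, url)
    # Resolve the template against the (escaped) search term.
    resolved = template.replace("%s", quote_plus(term))
    if method.lower() == "get":
        # A query '?' plus the resolved template, resolved against the location.
        return ("GET", urljoin(location, "?" + resolved), None)
    # Otherwise the resolved template is the POST body.
    return ("POST", location, resolved)


BASE = "http://www.example.foo/juice/"
print(search_request(BASE, "/cgi-bin/search", "q=%s", "foo"))
print(search_request(BASE, "/cgi-bin/search", "q=%s", "foo", method="post"))
print(search_request(BASE, "search", "q=%s", "foo"))
```

Note how the leading slash in url="/cgi-bin/search" discards the /juice/ path of the base, whereas the relative url="search" keeps it.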
A language code is a case-insensitive string identifying a natural language, possibly a specific regional dialect. For example:
SSP language codes follow the same format as HTML language codes. The first component is an ISO 639:1988 two-letter code. The second, if present, is an ISO 3166:1993 country code.
A character encoding (or ‘charset’, informally) specifies translation between octets and characters. For example:
These names are registered under IANA character sets.
A content type specifies the nature of a resource. For example:
These names are registered under IANA Media Types.
This is a string as defined by RFC 3986: Uniform Resource Identifier (URI): Generic Syntax.
This is an element identifier as defined in xml:id Version 1.0. It is a case-insensitive string consisting of letters, digits, underscores, dashes, and dots.
The <sitemap> element is the root of an SSP sitemap document.
The <group> element expresses a node of lesser prominence in the hierarchy. It need not have a name, and has no location. Its children may be rendered as if they were children of the <group>’s parent, and cannot be folded away separately from that parent’s other children. Many of its attributes set the default qualities for its node’s variants, if it has any <variant> children, or set the qualities of the node’s sole variant.
The <external> element identifies an <item> or <group> element in the same or another document. The value of the url attribute is resolved against the element’s base, and identifies that element by its xml:id.
The node represented by the referenced <group>, including its variants and their qualities, its child nodes, and its role, relation and priority, becomes a child of the node represented by the element containing the referencing <external> element.
The <variant> element allows variants of a node to be specified. Each <group> ancestor represents that node as a whole, and is the <variant>’s node element. Each <variant> may therefore appear in <group>, or other <variant> elements.
A <variant> that has no child elements specifies a variant of its node. Attributes of such a <variant> specify the variant’s qualities. For attributes that are not set on the <variant> itself, the qualities are derived from the corresponding attributes of the nearest ancestors that set them (i.e. they are inherited), with the following restrictions. These attributes may be inherited from any ancestor:
Other attributes may only be inherited from the node element or its children.
The <class-change> element specifies how a document served by the sitemap should be modified to indicate that it is being so served. The XPath expression specified by elem identifies an element in the served document to be modified. attr identifies an attribute on that element to be modified. The attribute prefix identifies the prefix of a family of class names to be updated and maintained in the attribute value, according to a display level in the range [0,100].
Whenever the display level is set to N, the attribute is modified so that its set of classes of the form prefix-over-integer and prefix-under-integer consists of exactly 100 items:
Authors are expected to use these changes to dynamically alter the styling of their site in the distinct cases of being visited by a sitemap-aware user agent and a sitemap-unaware user agent.
This attribute specifies the ‘character encoding’ quality of the node variant it applies to.
<item> (required on every variant when the role set includes search)
This attribute specifies the ‘search template’ quality of the node variant it applies to. This string specifies the template for the query data used in a site-limited search. Various % expressions are replaced by strings according to the following table, which shows the result of applying them to an example sitemap address of:
…and an example search query fish.
||Website home (./ resolved against sitemap address)||http://www.example.com/a/b/c/|
||Parent (../ resolved against sitemap address)||http://www.example.com/a/b/|
||Parent (../../ resolved against sitemap address)||http://www.example.com/a/|
||Root (/ resolved against sitemap address)||http://www.example.com/|
||Host of sitemap address||www.example.com|
All expanded values are escaped as if they are URI query values.
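The values in the table follow from ordinary relative-reference resolution, as the following Python sketch shows. The sitemap address used here is hypothetical, chosen only so that it yields the tabulated results; the dictionary keys are illustrative labels, not the % expressions themselves, and the values are shown before any query escaping is applied.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical sitemap address consistent with the tabulated results.
SITEMAP_ADDRESS = "http://www.example.com/a/b/c/sitemap.xml"


def derived_values(sitemap_address):
    """Compute the values that the table derives from the sitemap address."""
    return {
        "home": urljoin(sitemap_address, "./"),
        "parent": urljoin(sitemap_address, "../"),
        "grandparent": urljoin(sitemap_address, "../../"),
        "root": urljoin(sitemap_address, "/"),
        "host": urlsplit(sitemap_address).hostname,
    }


for label, value in derived_values(SITEMAP_ADDRESS).items():
    print(label, value)
```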
This attribute specifies the ‘description’ quality of the node variant it applies to. This should be a one- or two-line description, and will likely appear as a tooltip of a menu item or navigation tree.
This identifies an element whose attribute should be managed as the served page’s display level is changed. Only the first element that matches the expression is modified.
The XPath expression may be written in terms of any namespace prefixes in effect on the <class-change> element.
This attribute specifies the ‘language’ quality of the node variant it applies to.
This attribute specifies the ‘search method’ quality of the node variant it applies to. It specifies the HTTP request method, GET or POST, to be used when performing a site-limited search.
This attribute specifies the ‘name’ quality of the node variant it applies to. This should be a relatively short name, as it will likely appear as the text of a menu item or navigation tree.
This attribute allows an implementation to make certain assumptions about the ordering-by-name of child nodes of the node represented by this element.
If an ordering exists, and there are many nodes for the implementation to render, it may automatically group and subgroup them, and use their names to work out the names of the synthetic groups. This allows the implementation to choose the size of such groups according to local requirements, e.g. the number that fit comfortably into a screen.
For example, a large number of programming symbols could be divided into several synthetic groups whose names are formed by taking the name of the first node in the group, appending an ellipsis, and then appending the name of the last node in the group. This node:
<group name="Defined symbols" order="lexical">
  <item name="abort" .../>
  <item name="abs" .../>
  <item name="acos" .../>
  <!-- 900 or so other items, in alphabetic order -->
  <item name="wscanf" .../>
  <item name="xor" .../>
  <item name="xor_eq" .../>
</group>
…could be divided like this:
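A minimal Python sketch of this naming rule follows. The function name synthetic_groups and the fixed group size are assumptions made for illustration; an implementation is free to choose group sizes by any local criterion, and to subgroup recursively.

```python
def synthetic_groups(names, size):
    """Split ordered names into chunks and label each 'first…last'.

    Hypothetical helper: 'size' stands in for whatever local limit applies.
    """
    groups = []
    for start in range(0, len(names), size):
        chunk = names[start:start + size]
        # A single-item chunk is labelled with that item's name alone.
        label = chunk[0] if len(chunk) == 1 else f"{chunk[0]}…{chunk[-1]}"
        groups.append((label, chunk))
    return groups


SYMBOLS = ["abort", "abs", "acos", "wscanf", "xor", "xor_eq"]
for label, members in synthetic_groups(SYMBOLS, 3):
    print(label, members)
```

Because the author has declared the children to be ordered by name, the first and last names of each chunk are enough to tell the user which names fall inside it.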
This identifies a family of class names of the form prefix-over-integer and prefix-under-integer which should be managed on an attribute of an element in the served page as its display level is changed.
This attribute allows the author to set the relative priorities or levels of importance of nodes across the site. Implementations may represent these differences by, for example, emboldening names of more important nodes, or changing font size as appropriate.
This attribute specifies the ‘referrer’ quality of the node variant it applies to. The following values are permitted:
<item> element that represents this node.
If an implementation does not recognize this attribute, it should behave as if user was specified. When multiple values are specified, the implementation should behave according to the first it can honour.
This attribute specifies whether the children of the node it specifies form some sort of sequence, and should be navigable as such. For example, an implementation may provide ‘previous’ and ‘next’ buttons to traverse them rapidly.
This specifies a set of potential roles for the node represented by the containing element.
<item name="Contact" url="contact.html" role="contact contentinfo" />
This attribute specifies an element’s participation in the sitemap’s node hierarchy.
include: If the <item> is reached from the root file, it will represent a node in the hierarchy.
exclude: The <item> will not represent a node in the hierarchy.
auto: If the <item> is reached from the root file, and has a role set including tree or an unknown role, it will represent a node in the hierarchy.
user: If the <item> is reached from the root file, and has a role set including tree or an unknown role, or the user prefers it, it will represent a node in the hierarchy.
exclude allows an author to prevent a node from appearing in the hierarchy, while allowing it to fulfil a role (e.g. search). For example, suppose you have separate pages for showing a blank search form (search.html) and displaying results (search-results.cgi). The results page should never be accessed without a query string, so it must never appear in the node hierarchy:
<!-- The search form is just a normal page. -->
<item name="Search this site" url="search.html" />
<!-- The search results without a query are always excluded from the hierarchy. -->
<item name="Search this site" url="search-results.cgi" role="search" data="q=%s" tree="exclude" />
Using the default setting auto would not guarantee that the <item> would be excluded.
include allows an author to override the automatic removal of nodes from the hierarchy by virtue of them also fulfilling roles. user allows the author to defer that overriding according to the user’s preference.
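The effect of the four values can be sketched as a small Python predicate. This is an assumption-laden illustration: the function name in_hierarchy is hypothetical, the set of recognised roles is passed in because it is implementation-specific, the element is assumed to have been reached, and the separate exclusion rule for childless items lacking a ‘location’ is not modelled.

```python
def in_hierarchy(tree, roles, known_roles, user_prefers=False):
    """Decide, from the tree attribute alone, whether a reached element
    represents a node in the hierarchy (hypothetical helper)."""
    # True if the role set includes tree or any role the implementation
    # does not recognise.
    role_allows = "tree" in roles or any(r not in known_roles for r in roles)
    if tree == "include":
        return True
    if tree == "exclude":
        return False
    if tree == "user":
        return role_allows or user_prefers
    return role_allows  # "auto", the default


KNOWN = {"tree", "search", "searchpage", "home"}
print(in_hierarchy("exclude", ["search"], KNOWN))   # never in the hierarchy
print(in_hierarchy("auto", ["search"], KNOWN))      # search is recognised here
print(in_hierarchy("auto", ["search"], {"tree"}))   # search is unknown here
```

The last two calls show why auto gives no guarantee: an implementation that does not recognise search treats it as an unknown role, and so keeps the element in the hierarchy.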
This attribute specifies the ‘content type’ quality of the node variant it applies to.
<!-- The specification is offered as both HTML and PDF. -->
<item name="Specification">
  <variant type="text/html" url="spec.html" />
  <variant type="application/pdf" url="spec.pdf" />
</item>
This attribute specifies the ‘location’ quality of the node variant it applies to. As a URI reference, it is resolved against the base URI of its element.
This attribute overrides the base URI of an element. The base URI of the root element of a document is the document URI, and the base URI of any other element is the base URI of its parent. However, if an element specifies xml:base, its value is first resolved against the element’s base as it would be if xml:base were not specified, and that resolved value becomes the element’s base URI (and, therefore, the base URI of descendant elements that don’t specify xml:base).
This is in accordance with XML Base.
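The rule amounts to folding each xml:base value, from the document element down to the element of interest, into a running base URI. A minimal Python sketch, in which the function name base_uri, the list representation of the ancestry and the document URI are all assumptions made for illustration:

```python
from urllib.parse import urljoin


def base_uri(document_uri, xml_base_chain):
    """Compute an element's base URI (hypothetical helper).

    xml_base_chain lists the xml:base value of each element from the
    document element down to the element itself, with None where the
    attribute is absent.
    """
    base = document_uri
    for value in xml_base_chain:
        if value is not None:
            # Resolve against the base the element would otherwise have had.
            base = urljoin(base, value)
    return base


DOCUMENT = "http://www.example.foo/sitemap.xml"  # hypothetical document URI
print(base_uri(DOCUMENT, [None, "juice/", None]))
print(urljoin(base_uri(DOCUMENT, [None, "juice/", None]), "search"))
```

An element whose ancestors set no xml:base simply inherits the document URI as its base.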
This attribute gives its element an identity unique within the document, in accordance with xml:id Version 1.0. In a sitemap file, this primarily exists to support the <external> element. If an element is given an identifier as follows:
<item xml:id="fred" ... />
…then it can be referenced from within the same document like this:
<external url="#fred" />
It can also be referenced from another document:
<external url="freds-document.xml#fred" />
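Resolving such a reference can be sketched in Python as follows. The function name external_target and the document URI are hypothetical; the sketch only splits the resolved reference into the document to fetch and the identifier to look up, and does not fetch or parse anything.

```python
from urllib.parse import urldefrag, urljoin


def external_target(base, url):
    """Split an <external> url into (document URI, xml:id) after resolving
    it against the element's base URI (hypothetical helper)."""
    document, identifier = urldefrag(urljoin(base, url))
    return document, identifier


BASE = "http://www.example.foo/sitemap.xml"  # hypothetical base URI
print(external_target(BASE, "#fred"))
print(external_target(BASE, "freds-document.xml#fred"))
```

A reference consisting only of a fragment resolves to the current document, so both forms can be handled by the same code path.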