An implementation can process or render a sitemap in any way necessary to meet the goals of that implementation. However, certain rules must be followed to properly locate and obtain sitemaps.

Namespaces

This specification uses a namespace in several places to avoid clashes with identical names in other projects. This namespace applies implicitly to every XML element name defined here, unless stated otherwise. The canonical namespace for this project is identified by the following URI:

http://standard-sitemap.org/2007/ns

Sitemap discovery

A page may be served by a sitemap that consists of a root sitemap document and several ancillary documents that the root document references. The following rules specify how an implementation must find the root document of the sitemap that serves a page, or determine that no such sitemap exists.

An implementation must locate root sitemap documents only where specified by the pages that are served by them. It must check the following resources in this order:

  1. If the page is delivered by HTTP, check the HTTP response header for a field called NN-Location (case-insensitive), where NN is a two-digit namespace prefix defined by a field Opt: "http://standard-sitemap.org/2007/ns"; ns=NN. Treat the value as the URI of a sitemap that serves the delivered page. Resolve it against the URI of the page to yield an absolute URI.

    This mechanism is in accordance with the experimental RFC2774: An HTTP Extension Framework.

  2. If the page is delivered by HTTP, check the HTTP response header for a field called X-Standard-Sitemap-Location (case-insensitive). Treat the value as the URI of a sitemap that serves the delivered page. Resolve it against the URI of the page to yield an absolute URI. This rule exists to support legacy sitemaps, and the requirement will be removed at some future point.

  3. If the page’s content type is text/html or application/xhtml+xml, look for a <link> element (in the namespace http://www.w3.org/1999/xhtml) with a rel="prefix.location" attribute, for any prefix defined by another <link> element with a rel="schema.prefix" attribute and href="http://standard-sitemap.org/2007/ns". Treat the href value of the rel="prefix.location" element as the URI of a sitemap that serves the page. Resolve it against the base URI in effect on the enclosing element.

    This mechanism is in accordance with A Proposed Convention for Embedding Metadata in HTML.

  4. If the page’s content type is text/html or application/xhtml+xml, look for a <link> element with a rel="standard-sitemap" attribute. Treat its href value as the URI of a sitemap that serves the page. Resolve it against the base URI in effect on the enclosing element. This rule exists to support legacy sitemaps, and the requirement will be removed at some future point.

  5. The implementation may locate the root document using site-specific user preferences.

Having found a root sitemap URI using one of the above methods, an implementation must not seek another one for the same page (unless it is reloaded), even if no meaningful sitemap document can be found at the location.
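The discovery order above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function and class names are hypothetical, rel values are compared case-insensitively (an assumption), the base URI is taken from the first <base href> if present and otherwise from the page URI (xml:base in XHTML is ignored), and rule 5 (user preferences) is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

NS = "http://standard-sitemap.org/2007/ns"


def from_headers(headers, page_uri):
    """Rules 1 and 2: look in a dict of HTTP response headers."""
    h = {k.lower(): v for k, v in headers.items()}
    # Rule 1: Opt: "<NS>"; ns=NN declares the prefix for NN-Location.
    m = re.search(r'"%s"\s*;\s*ns\s*=\s*(\d\d)' % re.escape(NS), h.get("opt", ""))
    if m and (m.group(1) + "-location") in h:
        return urljoin(page_uri, h[m.group(1) + "-location"].strip())
    # Rule 2: legacy header.
    if "x-standard-sitemap-location" in h:
        return urljoin(page_uri, h["x-standard-sitemap-location"].strip())
    return None


class LinkCollector(HTMLParser):
    """Collects (rel, href) pairs from <link> and the first <base href>."""

    def __init__(self):
        super().__init__()
        self.base = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "base" and a.get("href") and self.base is None:
            self.base = a["href"]
        elif tag == "link" and a.get("rel") and a.get("href"):
            self.links.append((a["rel"].strip().lower(), a["href"]))


def from_html(html_text, page_uri):
    """Rules 3 and 4: look at <link> elements in the page."""
    p = LinkCollector()
    p.feed(html_text)
    base = urljoin(page_uri, p.base) if p.base else page_uri
    # Rule 3: schema.PREFIX binds the namespace, PREFIX.location gives the URI.
    prefixes = {rel[len("schema."):] for rel, href in p.links
                if rel.startswith("schema.") and href == NS}
    for rel, href in p.links:
        if rel.endswith(".location") and rel[:-len(".location")] in prefixes:
            return urljoin(base, href)
    # Rule 4: legacy rel value.
    for rel, href in p.links:
        if rel == "standard-sitemap":
            return urljoin(base, href)
    return None


def discover(headers, content_type, body, page_uri):
    """Return the first root sitemap URI found, or None; never guesses."""
    uri = from_headers(headers, page_uri)
    media_type = content_type.split(";")[0].strip().lower()
    if uri is None and media_type in ("text/html", "application/xhtml+xml"):
        uri = from_html(body, page_uri)
    return uri
```

Because discover returns as soon as one rule yields a URI, a caller that stores the result per page load would not seek a second location, whatever is later found at the first.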

An implementation must not attempt to guess the location of a sitemap from a URI. For example, given a page at:

http://www.example.com/foo/bar

…the implementation must not suppose that a sitemap exists for it at:

http://www.example.com/sitemap.xml

…nor at:

http://www.example.com/foo/sitemap.xml

An implementation that fetches URIs such as these, when they have not been specified by the proper HTTP headers or <link> elements, will confuse the administrators of the sites that the user visits, because it generates failure entries in their access logs for documents that do not exist. See "Amazon A9's siteinfo.xml: almost a repeat of favicon.ico" for reasons why this might be undesirable. Such requests also waste bandwidth and CPU power, which matters especially on mobile devices.

Fetching sitemaps

A sitemap should be fetched under the same conditions as the page it serves, with regard to content negotiation, compression capabilities, user-agent string, and other user-agent settings. However, this recommendation does not apply to content negotiation on MIME type: an implementation may state that it accepts only text/xml and application/xml documents in response to the sitemap request.

In other words, from the point of view of servers and proxies involved in delivering the sitemap, the request should come from the same kind of user agent, with the same capabilities. The relationship between the fetching of a sitemap and the page that instigated it should be similar to that between the fetching of an image and the page that embeds it.
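As a rough sketch of this recommendation, the request headers for the sitemap could be derived from those used for the page. The function name and the particular set of carried headers are assumptions made for illustration; the text above names categories of settings, not a header list.

```python
def sitemap_request_headers(page_request_headers):
    """Build sitemap request headers from the page's request headers."""
    # Assumed set of headers that carry the user agent's identity and
    # negotiation settings; a real implementation may carry more.
    carried = {"user-agent", "accept-language", "accept-encoding"}
    headers = {k: v for k, v in page_request_headers.items()
               if k.lower() in carried}
    # The one permitted difference: negotiate on MIME type for XML only.
    headers["Accept"] = "application/xml, text/xml"
    return headers
```

The resulting dictionary could then be passed to, for example, urllib.request.Request(sitemap_uri, headers=...) from the Python standard library.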

Parsing sitemaps

Sitemaps are XML, i.e. the MIME type is either text/xml or application/xml. An implementation is not required to regard documents of other types as sitemaps.

An implementation must follow the normal XML conventions for determining the character encoding of the document. For example, a charset parameter in the HTTP Content-Type header overrides any encoding declared in the document itself, e.g. by <?xml version="1.0" encoding="UTF-8"?>.
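The precedence described above might look like this in Python. It is a simplified sketch: the function name is hypothetical, byte-order marks and UTF-16 documents are not handled, and falling back to UTF-8 when nothing is declared follows the XML default rather than anything stated in this section.

```python
import re
from email.message import Message


def sitemap_encoding(content_type, body):
    """Pick the encoding for a sitemap given its Content-Type and raw bytes."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased, or None
        if charset:
            return charset  # the HTTP header wins over the document
    # Otherwise look at the XML declaration, if there is one.
    m = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']', body)
    if m:
        return m.group(1).decode("ascii").lower()
    return "utf-8"
```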

An implementation must process a root sitemap document if its root element is <sitemap>.

In the course of finding roles, an implementation must process every <item> and <group> that can be reached.

An implementation must not present roles derived from unreached <item>s.

An implementation must resolve relative URIs in accordance with any applicable xml:base attributes.
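A minimal sketch of xml:base resolution using Python's standard library follows. The href attribute is a hypothetical stand-in for whichever attributes carry URIs in the sitemap file format, which this section does not define; the function name is likewise illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolved_uris(elem, base, attr="href"):
    """Yield absolute URIs for `attr`, honouring nested xml:base values."""
    # An xml:base on this element is itself resolved against the inherited base.
    base = urljoin(base, elem.get(XML_BASE, ""))
    if attr in elem.attrib:
        yield urljoin(base, elem.attrib[attr])
    for child in elem:
        yield from resolved_uris(child, base, attr)


doc = ET.fromstring(
    '<sitemap xmlns="http://standard-sitemap.org/2007/ns"'
    ' xml:base="http://www.example.com/docs/">'
    '<group xml:base="guide/"><item href="intro.html"/></group>'
    '<item href="/about"/>'
    '</sitemap>'
)
uris = list(resolved_uris(doc, "http://www.example.com/sitemap.xml"))
```

The second argument is the URI from which the document was retrieved, which serves as the base where no xml:base applies.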

The interpretation of sitemap elements is largely implementation-defined, but should take into account the meanings given by the sitemap file format and (less formally) by the authoring advice, in order to accurately present the information that the author intends to convey.

Sitemaps with loops

An implementation is not required to fully render portions of a sitemap that form infinite loops. Where a loop is found, i.e. when an <item> appears as a descendant of itself, the item must appear again as its own descendant, but the descendants of that lower occurrence need not appear.
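One way to satisfy this rule is to track the ancestors of the item being rendered and stop descending when an item reappears. In the sketch below, items are modelled as plain strings and the children mapping is a hypothetical in-memory structure, not part of the sitemap format.

```python
def render(item, children, ancestors=(), depth=0):
    """Return indented lines for `item` and its descendants, cutting loops."""
    lines = ["  " * depth + item]
    # A repeated item is still shown, but its own descendants are omitted.
    if item not in ancestors:
        for child in children.get(item, []):
            lines += render(child, children, ancestors + (item,), depth + 1)
    return lines


children = {"home": ["news", "about"], "news": ["home"]}
outline = render("home", children)
# outline == ["home", "  news", "    home", "  about"]
```

Here "home" appears a second time beneath "news", as required, but the rendering does not recurse into it again.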

Adjusting styles for display levels

An implementation should adjust the content of a page served by the sitemap it is displaying by applying all the instructions given by processed <class-change> elements.

An implementation should allow the user to adjust the precise display levels it offers when revealing or hiding the sitemap information.

Tolerance of obsolete features

During the initial development of this project, experimental and alternative features were considered, and some test sites might contain these. Although authors will be discouraged from using them in new sitemaps, an implementation should recognise the following: