For the sitemap to be usable by your visitors, you have to do two things:
You must place your sitemap in the site’s webspace, and configure the server to send it correctly.
Serve your sitemap as application/xml or text/xml. In the Apache HTTPd webserver, if your sitemap has the suffix .xml, you need:
AddType application/xml .xml
Sitemaps could contain a fair amount of metadata about every page in the site. But, as XML, they will also be quite repetitive (element/attribute names, whitespace), and should compress well. Feel free to use HTTP compression if your sitemap turns out big. Note, however, that compression is not required to serve a sitemap file.
Most modern browsers can receive compressed files over HTTP, and automatically decompress them when they arrive. HTTP provides two mechanisms for this:
The possibilities for using content encoding are:
It’s possible to get a server to compress files automatically. Apache does this through a 3rd-party module mod_gzip for gzip, and through a standard module mod_deflate for deflate. IIS has a built-in feature.
You may prefer to compress the files yourself, and store them on the server in compressed form, saving the server from the overhead of compression and the extra storage to cache the compressed version. You just need to tell the server to serve them with an HTTP header field indicating that they are encoded for the purpose of end-to-end compression. In Apache:
AddEncoding gzip .gz
(This assumes that your pre-compressed files are identified by a conventional .gz suffix, but any appropriate suffix will do.)
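As a rough illustration of the pre-compression step, the following Python sketch compresses sitemap bytes with the standard gzip module. The function name precompress and the sample XML are made up for this example and do not follow the real sitemap schema; the sample is only meant to be as repetitive as XML tends to be.

```python
import gzip


def precompress(xml_bytes: bytes) -> bytes:
    """Return a gzip-compressed copy of the given sitemap bytes."""
    # mtime=0 keeps the output identical between runs (Python 3.8+).
    return gzip.compress(xml_bytes, compresslevel=9, mtime=0)


# Hypothetical, deliberately repetitive XML (not the real sitemap schema).
sample = (
    b"<sitemap>\n"
    + b'<page href="/a.html" title="A page"/>\n' * 200
    + b"</sitemap>\n"
)
packed = precompress(sample)

# Compare the sizes to see how well repetitive markup compresses.
print(len(sample), "bytes before,", len(packed), "bytes after")
```

The compressed bytes would then be stored under a name such as sitemap.xml.gz, so that the AddEncoding directive above applies to it.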
Mozilla (Firefox), Opera and Safari will handle this. IE will also if it’s using HTTP/1.1 — check its settings both for direct and proxy connections. GoogleBot will also cope with it, as can careful Java applications.
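For a program that fetches the sitemap itself, coping with a compressed response mostly means looking at the Content-Encoding header field before reading the body. The sketch below assumes the body has already been read into memory; decode_body is a hypothetical helper name, and it deliberately handles only the gzip and deflate codings.

```python
import gzip
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the content coding named in a Content-Encoding header value."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP "deflate" is defined as a zlib stream. Some servers send a
        # raw deflate stream instead, which this sketch does not handle.
        return zlib.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError("unsupported content coding: " + encoding)


# Usage: a gzip-coded body is restored to the original XML bytes.
original = b"<sitemap/>"
assert decode_body(gzip.compress(original), "gzip") == original
```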
If you’re doing precompression, but want greater compatibility with older browsers, and if you can have the server upgraded, the 3rd-party mod_gunzip module claims to be able to decompress files on-the-fly, if it detects that the client won’t be able to cope with it. This ought to be better than automatic compression, as decompression should have a smaller overhead than compression.
You need to hook the module into the processing of your pre-compressed files:
AddHandler send-gunzipped .gz
(Note that this is not particularly relevant for sitemaps at the moment, as this is meant to support IE, but we don’t have anything for IE to process sitemaps anyway.)
For an Apache2 solution, have a look at “Trying to emulate mod_gunzip with Apache 2 Filters”, especially the comments.
While the sitemap format already has multilingual features, an alternative feature - provided by HTTP - can be exploited instead, namely “Content Negotiation”.
In this scheme, you write several versions of each page in your site in different languages, e.g. index.en.html for English, index.de.html for German. You then tell your server what languages these pages are in:
# Apache HTTPd .htaccess
AddLanguage en .en
AddLanguage de .de
…and that they should be automatically selected according to the visitor's preference:
Options +MultiViews
Now, if the URL index.html is accessed, the browser, the server, and the intervening proxies will co-operate to select a version (index.en.html or index.de.html) most suitable for the visitor (according to how he has configured his browser’s language preferences).
This mechanism also works fine for sitemaps, so you can write sitemap.en.xml for the English sitemap, and sitemap.de.xml for the German one.
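To give a feel for what the server does with the visitor's preferences, here is a much simplified Python sketch of choosing between the two sitemap files from an Accept-Language header value. It is not Apache's actual negotiation algorithm, which weighs more factors; the function names and the fallback rule are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split an Accept-Language value into (language tag, quality) pairs."""
    prefs = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        if not tag.strip():
            continue
        quality = 1.0  # a tag without a q parameter counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    return prefs


def choose_variant(header: str, available: Dict[str, str], default: str) -> str:
    """Pick the file whose language has the highest quality value."""
    best, best_quality = default, 0.0
    for tag, quality in parse_accept_language(header):
        primary = tag.split("-")[0]  # treat "de-CH" as "de"
        if primary in available and quality > best_quality:
            best, best_quality = available[primary], quality
    return best


variants = {"en": "sitemap.en.xml", "de": "sitemap.de.xml"}
print(choose_variant("de-CH, en;q=0.8", variants, "sitemap.en.xml"))
```

A visitor whose browser sends "de-CH, en;q=0.8" would get the German sitemap here, because German carries the higher quality value.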
When someone visits your site, his browser normally just fetches a page from it and displays it. For him to see your sitemap, you must also tell the browser where to fetch it from, as it is fetching the page. There are two ways to do this:
The first is a link in the <head> section of each page. This can be tedious if you edit your pages manually, but should be relatively easy for sites built from some sort of template.
To link to the sitemap via HTML, you should add a <meta> tag and a <link> tag:
<html>
  <head>
    <meta name="schema.stdmap" content="http://standard-sitemap.org/2007/ns">
    <link rel="stdmap.location" href="/sitemap.xml">
  </head>
  <body>
    ...
  </body>
</html>
The <meta> tag binds the prefix stdmap to our namespace http://standard-sitemap.org/2007/ns. The <link> tag states the location of the sitemap, and can be relative to the HTML document.
You don’t have to use the string stdmap as the prefix; make it anything you choose that doesn’t clash with any other prefix defined in the same page. Just make sure that the same string appears in both the <meta> and the <link>.
This mechanism is in accordance with A Proposed Convention for Embedding Metadata in HTML.
Instead of the <meta> tag, you can use an equivalent <link> tag:
<link rel="schema.stdmap" href="http://standard-sitemap.org/2007/ns">
<link rel="stdmap.location" href="/sitemap.xml">
However, the <meta> tag is slightly preferred, as its content is to be interpreted as an opaque identifier, rather than as a (potentially relative) URL.
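Seen from the consuming side, the prefix binding means a program first has to learn which prefix stands for the sitemap namespace, and only then look for the matching location link. The Python sketch below shows one way this could be done with the standard html.parser module; the class and function names are invented for this example, and it accepts both the <meta> and the <link> form of the binding.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

SITEMAP_NS = "http://standard-sitemap.org/2007/ns"


class SitemapLinkFinder(HTMLParser):
    """Collect namespace prefixes and candidate links from a page."""

    def __init__(self):
        super().__init__()
        self.prefixes = set()  # prefixes bound to SITEMAP_NS
        self.links = []        # (rel, href) pairs of all other links

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            name = (attributes.get("name") or "").lower()
            if name.startswith("schema.") and attributes.get("content") == SITEMAP_NS:
                self.prefixes.add(name[len("schema."):])
        elif tag == "link":
            rel = (attributes.get("rel") or "").lower()
            href = attributes.get("href")
            if rel.startswith("schema.") and href == SITEMAP_NS:
                self.prefixes.add(rel[len("schema."):])
            elif href is not None:
                self.links.append((rel, href))


def find_sitemap(html, base_url):
    """Return the absolute sitemap URL declared in the page, or None."""
    finder = SitemapLinkFinder()
    finder.feed(html)
    for rel, href in finder.links:
        prefix, dot, name = rel.partition(".")
        if dot and name == "location" and prefix in finder.prefixes:
            # The link may be relative to the HTML document.
            return urljoin(base_url, href)
    return None


page = """<html><head>
<meta name="schema.stdmap" content="http://standard-sitemap.org/2007/ns">
<link rel="stdmap.location" href="/sitemap.xml">
</head><body>...</body></html>"""
print(find_sitemap(page, "http://example.org/docs/index.html"))
```

Matching is done after the whole page has been read, so the order of the binding and the location link within the page does not matter in this sketch.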
If your server supports it, you can make it send out a couple of HTTP header fields with every document to tell browsers where the sitemap is. The fields should look like this:
Opt: "http://standard-sitemap.org/2007/ns"; ns=15
15-Location: /sitemap.xml
This works in accordance with RFC 2774: An HTTP Extension Framework. The Opt field binds any 2-digit prefix you want (15 in this example) to the sitemap namespace identifier http://standard-sitemap.org/2007/ns. The other header field gives the relative URI of the sitemap file.
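A client reading these fields has to go through the same two steps as with the HTML form: find the prefix bound to the namespace, then read the field named with that prefix. The following Python sketch handles only the simple form of the Opt field shown above, with a single quoted URI followed by an ns parameter; the extension framework's grammar allows more than this, and the function name is invented for the example.

```python
import re
from urllib.parse import urljoin

SITEMAP_NS = "http://standard-sitemap.org/2007/ns"

# Matches declarations such as: "http://example.org/ns"; ns=15
_OPT_DECLARATION = re.compile(r'"([^"]*)"\s*;\s*ns\s*=\s*(\d{2})')


def sitemap_from_headers(headers, base_url):
    """Return the absolute sitemap URL announced in the headers, or None."""
    # HTTP header field names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    for uri, prefix in _OPT_DECLARATION.findall(lowered.get("opt", "")):
        if uri == SITEMAP_NS:
            location = lowered.get(prefix + "-location")
            if location is not None:
                # The location may be relative to the requested document.
                return urljoin(base_url, location.strip())
    return None


response_headers = {
    "Opt": '"http://standard-sitemap.org/2007/ns"; ns=15',
    "15-Location": "/sitemap.xml",
}
print(sitemap_from_headers(response_headers, "http://example.org/a/b.html"))
```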
In Apache, if the right modules are enabled, you should use a configuration directive such as:
LoadModule headers_module modules/mod_headers.so
Header set Opt "\"http://standard-sitemap.org/2007/ns\"; ns=15"
Header set 15-Location "/sitemap.xml"
You can also set the header field from PHP, though it will only have an effect on PHP-processed pages, of course:
<?php
header('Opt: "http://standard-sitemap.org/2007/ns"; ns=15');
header('15-Location: standard-sitemap.xml');
?>