At Swiftype we’re always working on new ways to improve the quality of the crawl of your website, and today we’re announcing Swiftype crawler support for the Sitemap.xml protocol.
The Sitemap.xml protocol is a well-documented and widely implemented standard for specifying exactly which URLs you would like web crawlers to index on your website. If your website supplies a sitemap.xml file, our crawler will dutifully follow your specifications as it builds a search index for your site.
To get started, create a simple sitemap.xml file. An example sitemap.xml that specifies 3 URLs might look as follows:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yourdomain.com/</loc>
  </url>
  <url>
    <loc>http://www.yourdomain.com/faq/</loc>
  </url>
  <url>
    <loc>http://www.yourdomain.com/about/</loc>
  </url>
</urlset>
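If your list of URLs changes often, you may prefer to generate the file rather than maintain it by hand. Here is a minimal sketch in Python (the domain and paths are placeholders, not part of the protocol) that writes out the same structure shown above:

# Minimal sketch: generate a sitemap.xml from a list of paths.
# DOMAIN and PATHS are placeholders -- substitute your own values.
from xml.etree import ElementTree as ET

DOMAIN = "http://www.yourdomain.com"
PATHS = ["/", "/faq/", "/about/"]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for path in PATHS:
    url = ET.SubElement(urlset, "url")
    loc = ET.SubElement(url, "loc")
    loc.text = DOMAIN + path

# Write the file with an XML declaration, matching the example above.
ET.ElementTree(urlset).write("sitemap.xml", encoding="UTF-8", xml_declaration=True)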
Next, you’ll put the sitemap.xml file on your web server at a location that is accessible by our crawler. Many sites place the sitemap at the root of the domain (i.e. http://www.yourdomain.com/sitemap.xml), but any location is fine. Whatever location you choose, you should specify it in your robots.txt file as follows:
User-agent: *
Sitemap: http://www.yourdomain.com/sitemap.xml
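Before waiting on the next crawl, it can help to confirm that both files are publicly reachable. A quick sanity check in Python might look like the sketch below (the domain is a placeholder):

# Quick check: confirm robots.txt and sitemap.xml return HTTP 200.
# Replace the placeholder domain with your own.
from urllib.request import urlopen

for path in ("/robots.txt", "/sitemap.xml"):
    with urlopen("http://www.yourdomain.com" + path) as resp:
        print(path, resp.status)  # expect 200 for both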
If you’re unfamiliar with the robots.txt file, you can find more information at the official Web Robots page.
Once your robots.txt file is updated and your sitemap.xml file has been uploaded, you’re finished. The next time the Swiftype crawler visits your website, we’ll recognize your sitemap.xml file and follow the links you specify.
As always, if you’re having trouble or want more information, feel free to get in touch. Also, don’t forget to follow the blog so you don’t miss out on great content from our friends like Bob Hiler from Mixergy.