To Crawl or Not to Crawl: How to Index Data for Site Search

If you’re considering a new site search solution like Swiftype, you’re probably already
aware of the benefits of upgrading your website’s search experience—things like
greater control over search results, a better user experience, and the ability to gather
analytics from user searches. You also know that taking your site search to the next level
will increase conversions and positively impact your company’s bottom line.

But before you can start enjoying the benefits of enhanced site search, there’s one
important decision to make: how to index the content on your site. Indexing lays the
foundation for your search engine by taking inventory of all your site data, then
organizing it in a structured format that makes it easy for the search algorithm to find
what it needs later on. Essentially, if your website is a stack of thousands of papers, the
search index is the mother of all filing cabinets.

There are a few different ways to go about indexing site content, but the two main
options are using a web crawler or a search API. Both choices have pros and cons, so it’s
helpful to understand which one is the best fit for your situation. Here’s the lowdown on
each.

Web Crawler

You may be familiar with Google’s web crawler, Googlebot, which perpetually “crawls” the internet, visiting each available web page and indexing content for potential Google searches. Swiftype’s crawler, Swiftbot, does the same thing for individual websites like yours.

Using a web crawler to index site data has a couple of key advantages. For one thing,
it’s extremely plug-and-play. Rather than pay a team of developers to build the index,
simply select the crawler option and let it do its thing, with no coding required.

A crawler also allows you to get your new site search up and running very quickly. For example, Swiftbot creates a search index within minutes by simply crawling your website URL or sitemap. And it stays on top of changes to your site, immediately indexing any new information so that search results always reflect the latest and greatest your business has to offer.
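
To make the sitemap idea concrete, here is a minimal sketch of the first step a crawler might take: reading a sitemap and listing the pages it would visit. This is an illustration only, not Swiftbot's actual implementation; the sample sitemap, the `urls_from_sitemap` helper, and the example.com URLs are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap for illustration; a real crawler would fetch this over HTTP.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-10</lastmod></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""


def urls_from_sitemap(xml_text):
    """Return (url, lastmod) pairs for every <url> entry in a sitemap."""
    root = ET.fromstring(xml_text)
    pages = []
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", namespaces=SITEMAP_NS)
        if not loc:
            continue  # skip malformed entries that have no location
        # lastmod is optional in the protocol, so it may be None.
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        pages.append((loc.strip(), lastmod))
    return pages


pages_to_crawl = urls_from_sitemap(SAMPLE_SITEMAP)
for url, lastmod in pages_to_crawl:
    print(url, lastmod)
```

From here, a crawler would typically fetch each page, extract its text, and add it to the index. The optional `lastmod` value is one signal it could use to decide which pages to revisit.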

In our experience, the web crawler option works best for the vast majority of our customers. It’s fast and easy to use, yet also creates a powerful, comprehensive search experience that’s a huge improvement over a fragile plugin or other antiquated site search solution. However, there are some situations where the customer needs a greater amount of customization, and in those cases, an API integration might be the way to go.

Developer API

The main advantage of using an API for search indexing is that it gives you full programmatic control over the content in your search engine. There are infinite ways to build a search experience, and an API (like the Swiftype Search API) lets you choose your own adventure and make changes as often as you like.

For example, if you want to index sensitive data that cannot be exposed on your website, such as product margins or page views for a particular article, you may want a more custom indexing setup than the one that comes with the web crawler. The developer API gives you granular, real-time control over every aspect of your search engine.
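
As a rough sketch of what that looks like in practice, the snippet below builds a document that carries fields a crawler could never see on the page (margin and page views) and prepares a request to push it to a search index. The endpoint URL, the API key placeholder, the payload shape, and the field names are hypothetical and chosen only for illustration; they are not the real Swiftype API, so check the API documentation for the actual format.

```python
import json
import urllib.request

# Hypothetical endpoint and key: placeholders, not a real search API.
INDEX_URL = "https://search.example.com/api/documents"
API_KEY = "YOUR_API_KEY"


def build_document(external_id, title, url, margin_pct, page_views):
    """Combine public page data with private fields that never appear on the site."""
    return {
        "external_id": external_id,
        "fields": {
            "title": title,
            "url": url,
            # Private values that only an API integration can supply.
            "margin_pct": margin_pct,
            "page_views": page_views,
        },
    }


def build_index_request(document):
    """Prepare (but do not send) a POST request carrying the document as JSON."""
    body = json.dumps({"document": document}).encode("utf-8")
    return urllib.request.Request(
        INDEX_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


doc = build_document(
    "sku-123",
    "Trail Running Shoe",
    "https://example.com/products/sku-123",
    margin_pct=42.5,
    page_views=1830,
)
request = build_index_request(doc)
# Sending is left out on purpose because the endpoint above is made up:
# urllib.request.urlopen(request)
```

Because your own code decides what goes into each document, it can re-send a document whenever the underlying data changes instead of waiting for the next crawl.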

Unlike the web crawler option, using an API requires a fair amount of coding, so
we usually see this option used by large businesses with bigger budgets and/or a developer team on staff. Also, since an API integration is custom, the initial indexing process can take time to set up, so it’s less attractive to customers who are anxious to get started.

Which one is best?

The choice between the web crawler and the developer API will come down to your specific situation. Most Swiftype customers are extremely happy with the crawler, but some do require the flexibility and control inherent in the API. We offer both options so that you can choose the best one for your site and business.

No matter which option you choose for indexing data, the ultimate outcome will be an enhanced site search experience that’s more relevant—and more profitable—than your current solution.