The Swiftype Blog / Month: September 2015

Teaching Swiftbot to Intelligently Index Images

When creating a search engine, the first and arguably most important step is indexing website information in a structured format that is optimized for a specific search algorithm. The information you index and the structure by which you organize it (also known as the schema) dictate how your search engine will determine relevance, what your users can search by, and what information you can display in search results.

How does indexing work?
While there are numerous ways to customize and control the information you index in your Swiftype search engine (for example, via our API or one of our platform integrations), we aim to make this process as simple as possible for non-technical users by automatically indexing website information with Swiftbot, our high-performance web crawler designed to index information from a specific URL.

Swiftbot allows non-technical users to get up and running with a working search engine in minutes by simply entering their website URL and letting Swiftbot index their website for them. A major component of Swiftbot’s technology is the logic that our engineering team has built in to parse website HTML and index it in a structured format that works with Swiftype’s advanced search algorithm and information retrieval method. (To learn more about the technical challenge of building a search engine, read our white paper on the subject, written for a non-technical audience.)

Building an intelligent web crawler
Because almost every website is built and structured in a different way, teaching Swiftbot how to effectively read, sort, and organize information from a website’s HTML is an ongoing challenge. While we do allow site owners to completely customize the default information Swiftbot indexes from their website with custom <meta> tags, not all users have the technical resources or knowledge to do this themselves, so Swiftbot is also built to make many of these indexing decisions on its own.

With every website structured differently, how do we teach Swiftbot to intelligently index this information?

Still, with websites differing so dramatically from one another, indexing the right information in the right format from each page is no easy task. In particular, identifying the most important image on a web page and associating that image with a search result is a multifaceted problem, since most pages contain many images, and these images often have different filename structures and occupy different locations on the page.

Adding images to search results pages and autocomplete menus can create a much more engaging search experience.

Nevertheless, indexing images allows site owners to create a much more engaging search experience, adding thumbnails of varying sizes to their autocomplete and search results that let users see a preview of the page content before selecting a result. So, in a recent update to Swiftbot, we’ve built in conditional logic that automatically indexes images from your website pages (provided there are no Swiftype-specific image tags already in place).

How does Swiftbot decide which image is “best”?
To teach Swiftbot how to index the “best” image from web pages, we had to build in logic that would overcome a series of challenges that result from the varying nature of website pages.

  1. As a starting point, we decided to leverage existing social <meta> tags (such as Facebook’s Open Graph tags and Twitter Card tags) that many site owners use to prepare their content for sharing on social media platforms and other content distribution networks. By teaching Swiftbot to obey these <meta> tags if no Swiftype-specific <meta> tags exist, we created hierarchical indexing logic that more intelligently sources images from existing website metadata.
  2. Secondly, we know that many websites have a large number of images that repeat across many, if not all, of their pages (for example: a company logo, images in the header, footer, and sidebar, author headshots, ads, etc.). To ensure these images are not considered the “best” image for a specific document, we built in logic that identifies and rules out these repeating elements as candidates. Similarly, we do not want to index advertisements, so we check every image on the page against an ad server blacklist to ensure ads remain out of consideration.
  3. Thirdly, we compare the alt attribute of each <img> with the URL and <title> of the page, assigning each image a relevance score based on how closely its alt description matches this page information.
  4. Lastly, Swiftbot looks for common CSS classes and IDs to locate the main content area of each page, another step that helps rule out extraneous information such as the header, footer, and sidebar.
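
The tag hierarchy described in step 1 could be sketched roughly as follows. This is a minimal illustration, not Swiftbot’s actual implementation: the precedence order, the `image_from_meta` helper, and the exact shape of the Swiftype-specific tag (`class="swiftype"` with `name="image"`) are assumptions made for the example.

```python
from html.parser import HTMLParser

# Assumed precedence for this sketch: a Swiftype-specific tag wins,
# then Open Graph, then Twitter Card. The real order is not stated here.
PRIORITY = ["swiftype", "og:image", "twitter:image"]
SOCIAL_KEYS = {"og:image", "twitter:image"}


class MetaImageParser(HTMLParser):
    """Collects image URLs declared in <meta> tags."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        content = attr.get("content")
        if not content:
            return
        # Assumed Swiftype-style tag: <meta class="swiftype" name="image" ...>
        if attr.get("class") == "swiftype" and attr.get("name") == "image":
            self.found.setdefault("swiftype", content)
            return
        # Open Graph tags use property=, Twitter Card tags use name=
        key = attr.get("property") or attr.get("name")
        if key in SOCIAL_KEYS:
            self.found.setdefault(key, content)


def image_from_meta(html):
    """Return the highest-priority image URL found in <meta> tags, or None."""
    parser = MetaImageParser()
    parser.feed(html)
    for key in PRIORITY:
        if key in parser.found:
            return parser.found[key]
    return None
```

If `image_from_meta` returns `None`, a crawler built this way would fall back to inspecting the `<img>` elements on the page, as in steps 2 through 4.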

Taking all these pieces of information together, Swiftbot assigns the images on the page a relevance score and indexes the image it judges to be the “best” image for that document. As this new indexing process gains wider use and we gather feedback from customers, we will keep working to improve our image extraction technology.
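
To make the combined scoring more concrete, here is a small sketch of how the filtering and scoring steps might fit together. Everything in it is hypothetical: the `Image` type, the `pick_best_image` function, the `ads.example.com` blocklist entry, the 50% repetition threshold, and the 0.5 bonus for main-content images are illustrative choices, not values Swiftbot is known to use.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

AD_HOSTS = {"ads.example.com"}  # hypothetical ad server blocklist
REPEAT_THRESHOLD = 0.5          # assumed: skip images seen on over half of all pages


@dataclass
class Image:
    src: str
    alt: str = ""
    in_main_content: bool = False  # assumed to be set by a main-content detector


def tokens(text):
    """Lowercase alphanumeric words in a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_best_image(images, page_url, page_title, pages_seen_on, total_pages):
    """Return the highest-scoring candidate image, or None if none survive."""
    page_terms = tokens(page_title) | tokens(urlparse(page_url).path)
    best, best_score = None, -1.0
    for img in images:
        # Rule out images served from known ad hosts.
        host = urlparse(img.src).hostname or ""
        if host in AD_HOSTS:
            continue
        # Rule out images that repeat across the site (logos, headers, ...).
        seen = pages_seen_on.get(img.src, 1)
        if total_pages > 1 and seen / total_pages > REPEAT_THRESHOLD:
            continue
        # Score: fraction of alt-text words that also appear in the title or URL.
        alt_terms = tokens(img.alt)
        overlap = len(alt_terms & page_terms) / len(alt_terms) if alt_terms else 0.0
        # Small bonus for images inside the main content area.
        score = overlap + (0.5 if img.in_main_content else 0.0)
        if score > best_score:  # ties keep the earliest image on the page
            best, best_score = img, score
    return best
```

Here `pages_seen_on` maps an image URL to the number of crawled pages it appeared on. An additive score is only one possible design; a weighted or learned combination would fit the same description.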

Adding these images to search
Once these images are indexed from your website and in your search engine, the question becomes: how do I display these image thumbnails in my search results and autocomplete dropdown? While there are many ways to style your autocomplete and search results (including using Swiftype’s web components or jQuery library), the best choice for users with very little technical experience is the Result Designer, which allows users to style their search results entirely from the Swiftype dashboard without writing any additional code. To learn more about the Result Designer, watch our dedicated webinar explaining this tool and offering best-practice advice from the Swiftype customer success team.

11 Ideas to Pin at the Top of Search Results

Result ranking allows you to drag and drop to rearrange results for a specific search term.

One of the coolest features of Swiftype’s site search software is the ability to drag and drop to rearrange the results that users see for any search query. Our marketing team has been having a lot of fun with the Result Ranking tool, coming up with different ways this feature can help us generate more leads and close more business. So we decided to share our top 11 use cases and how they could be useful for our customers.

  1. White paper – If you’re a publisher who offers guarantees for lead gen packages or a demand generation team at a corporation, consider pinning your white papers and ebooks at the top of relevant search queries.
  2. Webinar – Keeping upcoming and on-demand webinars at the top of key search results can increase registrations and raise the percentage of registrants who attend.
  3. Video – Search results with thumbnails get strong engagement from users. Pinning video testimonials or demos can help your prospects move down the marketing funnel much faster.
  4. Unused inventory (so that you can get rid of it) – E-commerce companies always struggle to find ways to get rid of last year’s collection. Need a new idea? Just pin those SKUs to the top of some converting search queries and watch your inventory fly off the shelves.
  5. Top-selling product(s) – Already have a product that’s selling like hotcakes? Then leverage your site search analytics to find other opportunities to sell that product.
  6. Viral article – This takes a similar approach to top-selling products. If you know that an article is going viral, pin it to the top of more search queries to generate even more engagement.
  7. The day’s top story – For publishers, the day’s top story can sometimes be buried in search results. Make sure that your top search queries show the newest and most relevant top stories.
  8. FAQ/Support pages – If you are seeing that a piece of support or knowledge base content is helping lower call volumes, then find other queries that this content can help support.
  9. Highest-priority job posting – Recruiters should take advantage of site search analytics to see what kinds of jobs prospective candidates are looking for. These insights will help you pin your highest-priority jobs to appropriate searches and generate more applications for those jobs.
  10. Most recent op-ed – Have an editorial that delivers your company’s fresh new message? Make sure to pin it to the top of relevant search queries for the highest visibility.
  11. Sponsored content – If you are a publisher who offers your advertisers sponsored content, you can work with your advertisers to make sure that their content is pinned to the top of the search results that they’re trying to target. This is a great money-making opportunity and an easy way to build deeper trust with your advertisers.

Have any pinning use cases that we haven’t already mentioned? Send them our way at [email protected]. We’d love to hear how you’re using Result Ranking to improve your on-site search experience.
