The Swiftype Blog / Category: Developer

How Swiftype Uses Swiftype:
Part 1 – Developers

I’m Brian, a Software Engineer at Swiftype. I’ve been working a lot on Swiftype Enterprise Search, and I use it every day.

I’m wearing our rotating “Support Wizard” hat this week, which means I’m responsible for addressing customer inquiries and support cases. Enterprise Search helped me close a customer case in 15 seconds. The customer needed to whitelist our crawler’s IP addresses so we could crawl their site. I went to search.swiftype.com in my browser and searched for “crawler ip ranges.” I clicked the first result, from Help Scout, and it took me to a recent ticket requesting the same information from a different customer. Bam! That’s exactly what I was looking for! Case closed.

Brian Stevenson, Engineering Wizard

 

When dealing with code, I use Enterprise Search for a number of different things. The browser extension is super handy when reviewing pull requests (PRs) in GitHub. For example, I was looking at a PR that was pulling in a newer version of nokogiri, but it didn’t have a lot of context. All it had was the version bump, the new version of the gem, and a small commit message. I opened the Enterprise Search Chrome extension and was immediately presented with other PRs and Jira tickets related to the same body of work. I was able to click through to those results to get a much better idea of where and why those changes were taking place. At that point, I had much more context and was able to effectively review the changes in front of me. The browser extension is perfect for that – I can open it up on a pull request in GitHub and see a plethora of additional, relevant PRs and Jira tickets for that area of code.

Using the browser extension with Jira is also super helpful. If I’m looking at a ticket in Jira, it shows me all open pull requests and any other related Jira tickets that may not have been linked. Furthermore, it shows me all of our sprint planning docs in Google Drive and Dropbox, thanks to our full-text extraction capabilities and fine-tuned search algorithms.

One of my favorite uses of Enterprise Search is when I’m working with our Design team. They create a lot of visual content, like mockups and templates, but where that content is stored in Dropbox isn’t exactly self-evident. So when I’m working on a project that requires implementing their designs, rather than trying to wade through the ocean of digital assets in Dropbox, or bugging them to send me an exported version of the new design, I just search for the content in the Enterprise Search app. I use really simple but extremely powerful queries like “new dashboard design in dropbox” or “sidebar icons in dropbox.” The search results all have image previews of the visual content they’ve been designing, so I can quickly scan them and find exactly what I’m looking for.

Enterprise Design Results

I also use Enterprise Search to show me all of the open pull requests assigned to me, across all of our repositories. It’s extremely useful because I don’t have to go to each repository individually to check for those PRs I need to take action on. I also sometimes use it to see PRs assigned to other people, in case they’re out sick, for example.

Speaking of people, the “Person View” is pretty awesome. One of the developers on my team just went on vacation, and I needed to see what he was working on so I could get the work done before the end of the sprint. I just searched for “Chris,” and because he was automatically created as a person in our organization (just by signing up for an account), I was able to see all of his recent changes across all our repositories in GitHub and other sources. I was able to jump on the highest-priority task he was working on and finish it off. Success! I was also able to get more context on the other issues he was working on, because I found some conversations he had with other engineers in Slack and comments he made on tickets in Help Scout.

We also just hired a new engineer (who is coincidentally also named Brian)! I was helping him get up to speed and needed to find this mythical “onboarding” document. I did a quick search for “welcome guide,” and sure enough, the document showed up as the first result. And with a few more quick searches, I was able to find all the other onboarding documents that were scattered around our various cloud services. It’s so handy and easy to be able to search for and find documents like this. It saves me so much time!

Last but not least, I use the mobile app to receive notifications for upcoming meetings. We have a sprint planning meeting every two weeks, so I get a notification on my phone that says hey, there’s this sprint planning meeting coming up, do you want to review these documents first? And I’m like yeah! I do want to review those docs so I can remember what we’re talking about at sprint planning! Thanks, Swiftype!

Swiftype proposed, and I said yes! A True Love Story in the Making.

I joined Swiftype shortly after graduating from Georgia Tech with a B.S. in Computer Engineering last May. I spent four years in Atlanta, which has an amazing startup ecosystem that I was able to invest a significant amount of time in while still being a college student. I was able to work on a great team at Springbot and build out and launch an MVP at Stackfolio. I also got to venture out a bit and intern at MongoDB last summer.

The startups I got to work with varied quite a bit in size, and I decided I wanted to join a startup with a small, somewhat established development team. I think that sweet spot is the best type of environment for growing as a software developer. Swiftype definitely fit that description, and much more.

Reasons I ended up signing my life over to Swiftype:

  • I can see value in the product.
    • I always default to using the “site:www.website.com” syntax on Google instead of using a website’s dedicated search tool. I think it’s silly that the majority of websites get beaten by a generic web crawler at finding their own content.
  • Swiftype gave me the time of day.
    • Quin [the CTO of Swiftype] personally reached out to me the day before my interview with a phone call and a follow-up email to make sure I was doing fine and had made my way to San Francisco without issue. He was also very active throughout my entire interview process to make sure everything went smoothly. I got the feeling that he actually (even if just a little bit) cared about me.
    • I got the opportunity to ask in-depth questions about the company and its technology, which caused my interview to run way longer than scheduled. Swiftype was one of the few companies that was happy to take the time to give me in-depth answers.
    • Initial contact to offer was less than three weeks. (Not the quickest of all time, but considering I was on the other side of the world or on a plane for 9 days of that time, I’d say it’s pretty good.)
  • I got a clear idea of what I would be doing.
    • More often than not, I think new software developers go into jobs pretty blind about what they’re actually going to be doing. I learned this the hard way through my first internship! It’s perfectly understandable given many circumstances, and perfectly reasonable for people to put themselves into that situation, but it still makes me very uncomfortable.
  • I knew who I’d be working with.
    • I got to interview with the entire engineering team. I left with the feeling that if I could be where they are by the time I’m their age, I’d be pretty happy with my career. We’ll see how that turns out.
  • Swiftype aligned with my interests.
    • The vast majority of my abandoned personal projects revolved around scraping data and doing something with it. I only found a select few startups whose business was built around this concept and actually did meaningful things with it.
  • Super soft hoodies that actually look normal.
    • At least at the time, this was a priority. Unfortunately, not many people or companies actually took me seriously, which is understandable. Regardless, this is my public request for the long-awaited Swiftype hoodie V2.
  • Positive culture inclinations.
    • It’s tough to evaluate culture through interviews that span a short amount of time. But I got the same baseline vibes from the Swiftype engineering team as I did from the friendliest, most heartwarming development team I interviewed with back in Tennessee. This absolutely wasn’t a priority while I was in the job search, but looking back on it, this definitely helped me make a quick decision to say yes to Swiftype.

*****

Note from the Swiftype Team:
Looking for a new opportunity? Jonesing to work with a talented, up-and-coming software development team? Really into soft hoodies and free lunch? You might be a great fit for the Swiftype team! We’re not on Tinder, but you can check out our careers page for our current openings and apply. 

To Crawl or Not to Crawl: How to Index Data for Site Search


If you’re considering a new site search solution like Swiftype, you’re probably already aware of the benefits of upgrading your website’s search experience—things like greater control over search results, a better user experience, and the ability to gather analytics from user searches. You also know that taking your site search to the next level will increase conversions and positively impact your company’s bottom line.

But before you can start enjoying the benefits of enhanced site search, there’s one important decision to make: how to index the content on your site. Indexing lays the foundation for your search engine by taking inventory of all your site data, then organizing it in a structured format that makes it easy for the search algorithm to find what it needs later on. Essentially, if your website is a stack of thousands of papers, the search index is the mother of all filing cabinets.

There are a few different ways to go about indexing site content, but the two main options are using a web crawler or a search API. Both choices have pros and cons, so it’s helpful to understand which one is the best fit for your situation. Here’s the lowdown on each.

Web Crawler

You may be familiar with Google’s web crawler, Googlebot, which perpetually “crawls” the internet, visiting each available web page and indexing content for potential Google searches. Swiftype’s crawler, Swiftbot, does the same thing for individual websites like yours.

Using a web crawler to index site data has a couple of key advantages. For one thing, it’s extremely plug-and-play. Rather than pay a team of developers to build the index, simply select the crawler option and let it do its thing—no coding required.

A crawler also allows you to get your new site search up and running very quickly. For example, Swiftbot creates a search index within minutes by simply crawling your website URL or sitemap. And it stays on top of changes to your site, immediately indexing any new information so that search results always reflect the latest and greatest your business has to offer.

In our experience, the web crawler option works best for the vast majority of our customers. It’s fast and easy to use, yet also creates a powerful, comprehensive search experience that’s a huge improvement over a fragile plugin or other antiquated site search solution. However, there are some situations where the customer needs a greater amount of customization, and in those cases, an API integration might be the way to go.

Developer API

The main advantage of using an API for search indexing is that it gives you full programmatic control over the content in your search engine. There are infinite ways to build a search experience, and an API (like the Swiftype Search API) lets you choose your own adventure and make changes as often as you like.

For example, if you want to index sensitive data that cannot be exposed on your website, such as product margins or page views for a particular article, you may want a more custom indexing setup than the one that comes with the web crawler. The developer API gives you granular, real-time control over every aspect of your search engine.
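
To make that concrete, here’s a minimal sketch of what indexing a document with a private ranking field might look like from Ruby. The engine slug, document type, field names, and payload shape below are illustrative; check the Swiftype API documentation for the exact endpoint and format.

require 'net/http'
require 'json'
require 'uri'

# Illustrative only: substitute your own engine slug, document type, and API key,
# and confirm the endpoint path and payload format against the Swiftype API docs.
uri = URI('https://api.swiftype.com/api/v1/engines/my-engine/document_types/articles/documents.json')

payload = {
  auth_token: ENV['SWIFTYPE_API_KEY'],
  document: {
    external_id: 'article-42',
    fields: [
      { name: 'title',      type: 'string',  value: 'Understanding Site Search' },
      { name: 'body',       type: 'text',    value: 'Full article text goes here...' },
      # Data you would never expose in your page markup, but still want to rank on:
      { name: 'page_views', type: 'integer', value: 18_250 },
      { name: 'margin',     type: 'float',   value: 0.42 }
    ]
  }
}

response = Net::HTTP.post(uri, payload.to_json, 'Content-Type' => 'application/json')
puts response.code # success prints a 2xx status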

Unlike the web crawler option, using an API usually requires a fair amount of coding, so we usually see this option used by large businesses with bigger budgets and/or a developer team on staff. Also, since an API integration is custom, the initial indexing process can take time to set up, so it’s less attractive to customers who are anxious to get started.

Which one is best?

The choice between the web crawler and the developer API will come down to your specific situation. Most Swiftype customers are extremely happy with the crawler, but some do require the flexibility and control inherent in the API. We offer both options so that you can choose the best one for your site and business.

No matter which option you choose for indexing data, the ultimate outcome will be an enhanced site search experience that’s more relevant—and more profitable—than your current solution.

How to Index Thumbnails for Crawler Based Engines

As you’re getting started with Swiftype, you may be wondering how to index thumbnails from your website and serve them to users in your search results. The answer to this question lies in using Swiftype’s custom <meta> tags, which allow site owners to pass detailed web page information directly to Swiftbot, our web crawler, as it moves across your site. As Swiftbot encounters these custom Swiftype <meta> tags, it indexes their content and incorporates that information in your search engine index schema.

To index thumbnails from your website, all you need to do is add a Swiftype image <meta> tag to the <head> section of your website template that indicates where images are located on your various page types. For illustration purposes, the Swiftype image <meta> tag is formatted like this:

<meta class="swiftype" name="image" data-type="enum" content="http://yourdomain.com/images/thumbnail.jpg" />
Swiftype recommends placing these <meta> tags at the template level of your website to ensure that image files are dynamically populated within the tags, rather than being added manually for every page on your site.

NOTE: the value of the “content” attribute must be HTML-encoded. For more information, see this guide.

Alternatively, you can wrap images with a body-embedded Swiftype image <meta> tag to avoid changing your website <head>. For example, Swiftbot will index example.jpg into the image field from the HTML below:

<body>
  <h1>Hello world</h1>

  <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed ut risus sed ante dignissim pharetra aliquet a orci. Maecenas varius.</p>

  <p>In in augue molestie, bibendum velit vel, luctus erat. Curabitur cursus, tellus at feugiat lacinia, tellus est suscipit lectus, non commodo diam elit sit amet justo.</p>

  <meta class="swiftype" name="image" data-type="enum" content="http://fullurl.com/example.jpg" />
</body>

It is important to note that in both the <head>- and <body>-embedded <meta> tags, you need to specify the data-type attribute as enum. For images, this will always be the case. For any other custom meta tags you choose to define, the data-type must be a valid, Swiftype-supported field type, which you may read about here.

Once you index thumbnails from your website, you can easily customize your search results and autocomplete to feature thumbnails in a range of shapes and sizes with the Swiftype Result Designer.

To learn more about using custom Swiftype <meta> tags to refine your search engine index, check out our tutorial. As always, if you need help or have any questions, feel free to reach out to us at [email protected].

Building an Asynchronous API to Improve Performance

One of the challenges we’ve had to deal with at Swiftype is that we have had customers pushing a lot of search and indexing traffic from very early on. When a customer is pushing hundreds of index updates per second, it’s important to respond quickly so we don’t start dropping requests.

In order to do that, we’ve built bulk create/update API endpoints to reduce the number of HTTP requests required to index a batch of documents and moved most processing out of the web request. We’ve also invested in front-end routing technology to limit the impact customers have on each other.

However, we were not satisfied. Sometimes when a large customer was indexing a huge number of documents, our front-end queues would still back up.  In the pursuit of even better response times for our customers, we’ve built an asynchronous indexing API. Our goals in creating the new API were high throughput, supporting bulk requests for all interactions, and excellent developer ergonomics. We wanted an API that was fast and easy to use.

Here’s how it works.

Asynchronous bulk indexing API flow

First, customers submit a batch of documents to create or update. The request for this looks just like our pre-existing bulk create or update API, but goes to a new endpoint.

When our server receives the request, it performs a quick sanity check on the input, without hitting the database. If all the input parameters are present and validly formatted, we create two records in our database for each document that was submitted: a document creation journal and a document receipt.

For performance, we insert these rows using activerecord-import. This is a great library that builds a single INSERT statement with multiple rows, which results in a massive speed improvement over standard ActiveRecord when saving a large number of records. We also generate the record IDs ahead of time as BSON ObjectIds. That way we don’t need to read the IDs back from the database after inserting, and BSON lets us encode a timestamp in the ID at the cost of a slightly larger ID column.
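
As a rough sketch (the model and column names here are hypothetical, not our actual schema), the insert step looks something like this:

require 'bson'
require 'activerecord-import'

journals = []
receipts = []

# `documents` is the parsed batch from the API request.
documents.each do |doc|
  # Generate IDs up front so we never have to read them back after the INSERT;
  # a BSON ObjectId also encodes its creation timestamp.
  receipt_id = BSON::ObjectId.new.to_s

  receipts << DocumentReceipt.new(id: receipt_id, status: 'pending')
  journals << DocumentCreationJournal.new(
    id: BSON::ObjectId.new.to_s,
    document_receipt_id: receipt_id,
    payload: doc.to_json
  )
end

# activerecord-import turns each call into a single multi-row INSERT,
# which is dramatically faster than saving records one at a time.
DocumentReceipt.import(receipts, validate: false)
DocumentCreationJournal.import(journals, validate: false)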

Once created, we enqueue a message for each document creation journal onto a queue that is read by a pool of loops workers. Loops is a dead-simple background processing library written by our Technical Operations Lead, Oleksiy Kovyrin. It makes it easy to write code that does one thing forever, in this case, reading messages off the queue and creating the associated document in the database.

The response to the API request includes a way to check the status of all the document receipts. To make the API easy to use, we include URLs to the created resources. Though we’re not following all of its precepts, this approach is inspired by the idea of a hypermedia API. These URLs make it easy for both humans and computers to find the resource.

Since the API is asynchronous, users must poll the document receipts API to check for the status of the document creation. We’ve built an abstraction in our Ruby client library that allows developers to simulate a synchronous request, although we recommend that only for development.
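
If you’re not using the client library, polling by hand is straightforward. Here’s a minimal sketch; the receipt URLs come from the create response, and the JSON field names are illustrative rather than the exact API shape:

require 'net/http'
require 'json'
require 'uri'

# Poll a set of document receipt URLs until every receipt reports a final
# status ("complete" or "failed"), or until the timeout expires.
# Returns true if all receipts finished in time.
def wait_for_receipts(receipt_urls, poll_interval: 0.5, timeout: 30)
  deadline = Time.now + timeout
  pending  = receipt_urls.dup

  until pending.empty? || Time.now > deadline
    pending.reject! do |url|
      receipt = JSON.parse(Net::HTTP.get(URI(url)))
      %w[complete failed].include?(receipt['status'])
    end
    sleep poll_interval unless pending.empty?
  end

  pending.empty?
end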

By pushing all work except for JSON parsing and the most minimal input validation to the backend, we’re able to respond to these API requests very quickly. On the backend, the loops workers read messages off the queue and create documents. When a loops worker attempts to create a document, it updates the document receipt (either with the status of “complete” and a link to the created/updated document, or with the status “failed” and a list of error messages) and deletes the document creation journal.
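
In plain Ruby, each worker’s run loop amounts to something like the sketch below (illustrative pseudocode, not the actual loops API or our schema):

# Runs forever: pull a journal ID off the queue, materialize the document,
# record the outcome on the receipt, and clean up the journal.
loop do
  message = queue.pop # blocks until a message is available
  journal = DocumentCreationJournal.find(message.journal_id)

  begin
    document = Document.create!(JSON.parse(journal.payload))
    journal.document_receipt.update!(status: 'complete', document_id: document.id)
  rescue => error
    journal.document_receipt.update!(status: 'failed', error_messages: [error.message])
  ensure
    # The journal is removed whether creation succeeded or failed; the receipt
    # is what the client polls for the outcome.
    journal.destroy
  end
end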

This brings us to one final aspect of the asynchronous API: how we make sure it keeps working. If our loops workers started failing, the document creation journals would back up without being processed, and no documents would be created/updated. To guard against this, we have built a monitoring framework that alerts us when the oldest record in the table is older than a certain threshold.

This solution has been successful for us in beta tests with our largest API users, and we have now rolled it out to everyone.

We hope this helps you build out your next high-throughput API. If this is the kind of thing you’re interested in, we’re hiring engineers for our core product and infrastructure teams.

ObjectIdColumns: Transparently Store MongoDB BSON IDs in a RDBMS

Here at Swiftype, we use both MongoDB and MySQL to store some of our core metadata — not search indexes themselves, but users, accounts, search engines, and so on. As we’ve migrated data from MongoDB to MySQL, we’ve found ourselves needing to store the primary keys of MongoDB documents in MySQL.

While it’s possible to use more-or-less arbitrary data in MongoDB as your _id, very, very frequently you will simply use MongoDB’s built-in ObjectId type. This is a data type similar in concept to a UUID: it can be generated on any machine at any time, and the chance that it will be globally unique is still extremely high. Some relational databases offer native support for UUIDs; we thought, why shouldn’t we teach Rails how to get as close to that ideal as possible with ObjectIds, too?

The result has been our objectid_columns RubyGem, which we are proud to release as open source under the MIT license. Using ObjectIdColumns, you can store MongoDB ObjectId values as a CHAR(24) or VARCHAR(24) (which stores the hexadecimal representation of the ObjectId in your database), or as a BINARY(12), which stores an efficient-as-possible binary representation of the ObjectId value in your database.
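
Adding such a column is just an ordinary Rails migration. Here’s a minimal sketch, with hypothetical table and column names:

class AddMyOidToMyModels < ActiveRecord::Migration
  def change
    # VARCHAR(24) stores the hexadecimal form of the ObjectId; use a 12-byte
    # binary column instead if you want the more compact representation.
    add_column :my_models, :my_oid, :string, limit: 24
  end
end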

No matter how you choose to store this data, it’s automatically exposed from your ActiveRecord models as an instance of the bson gem’s BSON::ObjectId class, or the moped gem’s Moped::BSON::ObjectId class. (ObjectIdColumns is compatible with both equally; the two are extremely similar.)

my_model = MyModel.find(...)
my_model.my_oid # => BSON::ObjectId('52eab2cf78161f1314000001')

You can assign values as an instance of either of these classes, or as a String representation of an ObjectId — in either hex or pure-binary form — and it will automatically translate for you:

my_model.my_oid = BSON::ObjectId.new # OK
my_model.my_oid = "52eab32878161f1314000002" # OK
my_model.my_oid = "R\xEA\xB2\xCFx\x16\x1F\x13\x14\x00\x00\x01" # OK

ObjectIdColumns even transparently supports queries; the following will all “just work”:

MyModel.where(:my_oid => BSON::ObjectId('52eab2cf78161f1314000001'))
MyModel.where(:my_oid => '52eab2cf78161f1314000001')
MyModel.where(:my_oid => "R\xEA\xB2\xCFx\x16\x1F\x13\x14\x00\x00\x01")

Enjoy! Head on over to the objectid_columns GitHub page for more details, or just drop gem 'objectid_columns' in your Gemfile and go for it!

If you enjoyed the tips in this tutorial, make sure to bookmark our blog and subscribe for more announcements like our new Swiftype Ruby Gem.

Our Cloud Stack at Swiftype

Swiftype site search was featured as LeanStack’s service of the week. As part of that, I wrote a guest blog post about how Swiftype uses cloud services to run our business.

“Implementing a better product with less hassle is really only half the advantage of using a service like ours. The other half — which doesn’t seem to get as much marketing play — is that by leveraging the product of a company dedicated to a single, specific technology, you realize the gains of having a full-time team of domain experts dedicated to improving your search feature, without assuming any of the cost. At Swiftype we spend all of our time thinking about, developing, and iterating on search, and every time we ship an improvement, all of our customers reap the benefits instantly. Our experience has shown that at most companies it can be a full-time job just maintaining an internal search system, much less improving it over time. When search isn’t a core competency of your company, we believe you’re better off letting us take care of the details. And of course the same philosophy applies to our company as well, which is why we leverage so many existing cloud-based services in our daily operations. Anywhere that we can save time and resources using a product that another company focuses their full effort on delivering is a win for us, because it allows us to spend our resources on what we do best — building great search software.”

Read the post to learn more about our cloud stack and the services we use.

If you liked this post, please remember to bookmark our blog and subscribe to our newsletter. We’ll be posting announcements and more from the Swiftype team, as well as our friends and partners who power their search with Swiftype, such as Laughing Squid.

Sitemap.xml Support for Swiftype

At Swiftype we’re always working on new ways to improve the quality of the crawl of your website, and today we’re announcing Swiftype crawler support for the Sitemap.xml protocol.

The Sitemap.xml protocol is a well-documented and widely implemented standard for specifying exactly which set of URLs you would like web crawlers to index on your website. If your website supplies a sitemap.xml file, our crawler will dutifully follow your specifications as it builds a search index for your website.

If you aren’t familiar with Sitemap.xml files, we’ll take you through a quick tutorial here, and there is additional information in our documentation section as well as the official protocol page.

To get started, create a simple sitemap.xml file. An example sitemap.xml that specifies 3 URLs might look as follows:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yourdomain.com/</loc>
  </url>
  <url>
    <loc>http://www.yourdomain.com/faq/</loc>
  </url>
  <url>
    <loc>http://www.yourdomain.com/about/</loc>
  </url>
</urlset>

Next, you’ll put the sitemap.xml file on your web server at a location that is accessible by our crawler. Many sites place the sitemap at the root of the domain (e.g. http://www.yourdomain.com/sitemap.xml), but any location is fine. Whatever location you choose, you should specify it in your robots.txt file as follows:

User-agent: *
Sitemap: http://www.yourdomain.com/sitemap.xml

If you’re unfamiliar with the Robots.txt file, you can find more information at the official Web Robots page.

Once your robots.txt file is updated and your sitemap.xml file has been uploaded, you’re finished. The next time the Swiftype crawler visits your website, we’ll recognize your sitemap.xml file and follow the links you specify.

As always, if you’re having trouble or want more information, feel free to get in touch. Also, don’t forget to follow the blog so you don’t miss out on great content from our friends like Bob Hiler from Mixergy.