The Swiftype Blog / Category: Site Search

WordPress Search Plugin Updated for WordPress 4.0

Last week, WordPress released its newest version, WordPress 4.0, named “Benny”. Because we know how valuable great search is for WordPress site owners, we’re happy to announce that the Swiftype WordPress Search Plugin has officially been updated for WordPress 4.0. If you are updating your site, you should be able to keep using the Swiftype WordPress Search Plugin without issues and with all the same great features you are accustomed to. If you’re just getting started with WordPress or Swiftype, you can start right away on the latest version.

We’ve got some more WordPress updates coming in the near future, so make sure you check the blog regularly so you don’t miss out. Also, if you have a WordPress site and are looking for a better search solution, go ahead and check out the plugin.

Why We Updated Our Site Search Algorithm

Since we launched Swiftype back in 2012, we’ve worked tirelessly to build the best website search product around. Our founders, Matt & Quin, started Swiftype after realizing that site search had been broken for a long time. From the searcher’s perspective, the single most important element of search is the relevance of the results. Many of our key features are centered around making it easy to ensure your website’s search results are relevant, from the search engine we build for you, which provides relevant and up-to-date results from the moment of your first crawl, to heavy customization of your results if you feel they aren’t quite right (by the way, keep an eye out: we’ve got some really exciting stuff coming on the customization front).

However, just because we’ve built a great product doesn’t mean we’ll stop making it better. Quin, our CTO, has mentioned that one of the things that makes search so interesting is that he considers it a moving target, something that can always be improved. In that spirit, we’re incredibly excited to announce a major update to our search algorithm, which now features phrase scoring! This means that our algorithm will now focus on matching your search results to the meaning of your entire query, rather than ranking them based on relevance to each individual keyword in the query. It’s a subtle difference, but it has major implications for search result relevance.

A common misconception about most site search products is that the search engines generally understand queries as humans do. For example: if I told you I needed to buy a pair of dress shoes, you would immediately recommend a shoe store, not a dress shop. However, most website search engines use simple keyword matching, which means they rank documents based on their relevance to the individual keywords in the query, rather than the meaning of the query as a whole. In the given example, this likely wouldn’t pose a huge problem, as the most relevant results would be the ones that included both dress and shoes – although there might be some unexpected products further down your result set! However, imagine if you had searched for “black dress shoes.” Without phrase scoring, black dresses are about as likely to show up as dress shoes.
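To make the difference concrete, here is a toy scorer for the “black dress shoes” query. It is a sketch, not Swiftype’s algorithm: the tokenizer, the adjacent-word-pair (bigram) bonus standing in for phrase scoring, and its weight of 2 are all assumptions chosen for illustration.

```ruby
# Toy scorer contrasting per-keyword matching with a phrase-aware bonus.
# Not Swiftype's algorithm; tokenization and weights are illustrative assumptions.

def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Keyword-only: one point per query term that appears anywhere in the document.
def keyword_score(query, doc)
  doc_terms = tokenize(doc)
  tokenize(query).count { |term| doc_terms.include?(term) }
end

# Phrase-aware: keyword score plus a bonus for each adjacent pair of query
# words that also appears adjacently, in the same order, in the document.
def phrase_score(query, doc, bigram_bonus: 2)
  doc_bigrams = tokenize(doc).each_cons(2).to_a
  query_bigrams = tokenize(query).each_cons(2).to_a
  keyword_score(query, doc) + bigram_bonus * query_bigrams.count { |bg| doc_bigrams.include?(bg) }
end

query = "black dress shoes"
["Little black dress with matching shoes", "Black dress shoes, leather oxford"].each do |doc|
  puts "keyword=#{keyword_score(query, doc)} phrase=#{phrase_score(query, doc)}  #{doc}"
end
```

Under keyword-only scoring both documents score 3, so the dress ties with the shoes; the bigram bonus (5 vs. 7) breaks the tie in favor of the document that contains the whole phrase.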

For another illustration, let’s go through an example that we’ve seen in some of our clients.

A very common relevance signal considered in ranking documents against queries is recency. For an illustrative example, consider TechCrunch. Some of their most popular content is news about the iPhone. As TechCrunch is a news site, recency is one of the most important relevance factors, so a search for iPhone will rank news on the new iPhone 6 highly. However, what if you are researching the history of the iPhone, and are looking for press on the iPhone 3? Since most site searches don’t offer phrase scoring, they’ll likely rank results based on relevance to each individual keyword, as well as recency. Since 3 is an extremely common keyword, the results will again be heavily weighted towards recent stories on the iPhone.

Now, with Swiftype, a search for “iPhone 3” will take into account both which words appear and the order they appear in. Results are ranked by how closely they match that sequence of words, so they are much more likely to be relevant to the actual meaning of the search. We’ve been beta testing this feature for a few months with select clients, who have seen improved click-through, conversion, and retention rates, as their visitors find what they are searching for much more often.
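The same idea applies to the recency problem. This toy ranking adds a recency boost to a text score; the headlines, dates, weights, and one-year half-life are invented, and the adjacency bonus is a simple stand-in for phrase scoring rather than Swiftype’s actual formula.

```ruby
require "date"

# Toy ranking for the "iPhone 3" example: text score plus a recency boost.
# Not Swiftype's scoring; weights, half-life, and articles are invented.
Article = Struct.new(:title, :published)

def terms(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Keyword-only counts matching query words; phrase mode also rewards the
# query's words appearing next to each other, in order (weight 3 is arbitrary).
def text_score(query, title, phrase:)
  q = terms(query)
  t = terms(title)
  score = q.count { |word| t.include?(word) }
  score += 3 * q.each_cons(2).count { |pair| t.each_cons(2).include?(pair) } if phrase
  score
end

# Recency boost of at most 2.0 that halves every 365 days (hypothetical half-life).
def recency_score(published, today)
  age_days = (today - published).to_f
  2.0 * 0.5**(age_days / 365.0)
end

# Returns titles sorted from highest to lowest combined score.
def rank(query, articles, today, phrase:)
  articles.sort_by { |a| -(text_score(query, a.title, phrase: phrase) + recency_score(a.published, today)) }
          .map(&:title)
end

TODAY = Date.new(2014, 9, 30)
ARTICLES = [
  Article.new("iPhone 6 review: 3 things we love", Date.new(2014, 9, 20)),
  Article.new("Hands on with the iPhone 3", Date.new(2008, 7, 11))
]

puts "keyword-only: #{rank('iPhone 3', ARTICLES, TODAY, phrase: false).first}"
puts "phrase-aware: #{rank('iPhone 3', ARTICLES, TODAY, phrase: true).first}"
```

With keyword-only scoring both titles contain “iphone” and “3”, so recency decides and the iPhone 6 story wins; with the adjacency bonus the iPhone 3 story moves to the top.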

Now that we’ve rolled the new algorithm out to all our users, you’ll see immediate improvements in your results. The impact will be felt most by websites with large amounts of text, since that is where the meaning of the entire query matters most, but all our users should see improvements. For an example of the impact Swiftype’s phrase scoring can have on search results, compare TechCrunch’s search to Mashable’s search. Since those are live search links, I went ahead and took screenshots in case your results look a bit different:

[Screenshot: search results for “internet of things” on TechCrunch (Swiftype) vs. Mashable]

All of TechCrunch’s results are clearly about the internet of things (and check out the great faceting they’ve implemented). However, none of the top results on Mashable are at all related to the internet of things, instead being ranked on relevance to the term “internet,” or if I had grabbed the sixth result (The 7 Most Worthless Things at Your Garage Sale), the term “things” (by the way, if anyone from Mashable reads this, go ahead and contact us, we’d love to help).

This is just one example of how we’re continuing to innovate and build in the website search world. This isn’t a feature widely available in website search solutions, or in the out-of-the-box search you get from a third-party website platform. We’re looking to fix site search for the broadest possible audience, which is a very difficult and complex task. Sometimes it means building a drag-and-drop results customization tool; other times it means releasing feature updates for those of you using our WordPress plugin. But at its core, search is about relevant results and ensuring your users find what they are looking for, and we are constantly working to make sure we offer the best solution you can find.

To find out more about how Swiftype can fix your search problems, drop us a line.

New Feature: Location Attributes Can Now Have Multiple Values

We’ve made a small change in how location attributes are handled within Swiftype: a single document can now have multiple values for a location attribute. This is great news for many of our customers, because if you have a page listing all your store locations, that page can now be associated with every one of those locations. Previously, to make sure searchers could find all your locations, you needed a unique page for each location.

Here’s how to do it using Meta Tags:

<head>
  <title>page title | website name</title>
  <meta class="swiftype" name="title" data-type="string" content="page title" />
  <meta class="swiftype" name="body" data-type="text" content="this is the body content" />
  <meta class="swiftype" name="url" data-type="enum" content="http://www.swiftype.com" />
  <meta class="swiftype" name="store_location" data-type="location" content="25,-10" />
  <meta class="swiftype" name="store_location" data-type="location" content="20,-15" />
  <meta class="swiftype" name="store_location" data-type="location" content="40,-10" />
  <meta class="swiftype" name="store_location" data-type="location" content="20,-20" />
  <meta class="swiftype" name="tags" data-type="string" content="tag1" />
  <meta class="swiftype" name="tags" data-type="string" content="tag2" />
</head>

As you can see, there are 4 store_location fields. You can also add the attributes to existing elements if you’d prefer. You can find out more in our Meta Tags 2 documentation.
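If your store list lives in code, the repeated tags don’t have to be written by hand. A minimal sketch, assuming the page is rendered from Ruby; `store_location_meta_tags` is a hypothetical helper, and the coordinates are the ones from the example above.

```ruby
# Hypothetical helper: emit one Swiftype meta tag per store so the page carries
# a repeated store_location field, using the same "lat,lon" content format.
def store_location_meta_tags(locations, field_name: "store_location")
  locations.map do |lat, lon|
    %(<meta class="swiftype" name="#{field_name}" data-type="location" content="#{lat},#{lon}" />)
  end
end

stores = [[25, -10], [20, -15], [40, -10], [20, -20]]
puts store_location_meta_tags(stores)
```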

You can also use repeated location fields with the Swiftype API. For example, to create a Document with multiple locations similar to the Meta Tags example above:

require 'swiftype'

Swiftype.api_key = 'your_api_key'
client = Swiftype::Client.new
client.create_document('your_engine', 'your_document_type', {
  :external_id => 'unique_id',
  :fields => [
    {:name => 'title', :type => 'string', :value => 'document title'},
    {:name => 'store_location', :type => 'location', :value => {:lat => 25, :lon => -10}},
    {:name => 'store_location', :type => 'location', :value => {:lat => 20, :lon => -15}},
    {:name => 'store_location', :type => 'location', :value => {:lat => 40, :lon => -10}},
    {:name => 'store_location', :type => 'location', :value => {:lat => 20, :lon => -20}}
  ]
})
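Likewise, the repeated API fields can be generated from a list of coordinates. `location_fields` is a hypothetical helper; the hash shape simply mirrors the `create_document` call above.

```ruby
# Hypothetical helper: build repeated store_location fields from [lat, lon] pairs
# instead of writing each hash by hand. The shape mirrors create_document's :fields.
def location_fields(locations, name: "store_location")
  locations.map do |lat, lon|
    { :name => name, :type => "location", :value => { :lat => lat, :lon => lon } }
  end
end

stores = [[25, -10], [20, -15], [40, -10], [20, -20]]
fields = [{ :name => "title", :type => "string", :value => "document title" }] +
         location_fields(stores)
# fields can then be passed as :fields => fields in client.create_document
puts fields.length
```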

We’re looking forward to this feature helping out a lot of our customers, so please, reach out with any questions or comments.

Feature Announcement: Cross Origin Resource Sharing


We’ve got some exciting news: we now offer Cross Origin Resource Sharing (CORS) for our public API. For a detailed explanation, the linked Wikipedia page is a good place to start. In layman’s terms, JavaScript running on another site can now call our read-only public API directly, so your website can pull information from our public API without using JSONP. A few customers requested this as an alternative to JSONP for situations where the request is too large for a GET. It’s also a somewhat cleaner implementation, so we rolled it out across the board. If you weren’t having issues with JSONP before, nothing changes on your end.

Here’s an example of how to execute a search query against the Swiftype public API using JSONP with jQuery:

var params = {
  q: "your search terms",
  engine_key: "YOUR_ENGINE_KEY"
};

function handleSearchResults(data) {
  // do something with the search results
  console.log(data);
}

// "callback=?" tells jQuery to make a JSONP request
$.getJSON("https://api.swiftype.com/api/v1/public/engines/search.json?callback=?", params).done(handleSearchResults);

And here is the same query using CORS:

$.getJSON("https://api.swiftype.com/api/v1/public/engines/search.json", params).done(handleSearchResults);

(Note there is no callback parameter.)
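What makes the CORS version work happens on the server: the browser sends an `Origin` header with the cross-site request and only hands the response to your JavaScript if it carries a matching `Access-Control-Allow-Origin` header. The sketch below illustrates that handshake with a plain Ruby lambda following the Rack calling convention; it is not Swiftype’s implementation, and the empty JSON body is a placeholder.

```ruby
# Illustration of the CORS handshake for a public, read-only endpoint.
# Rack convention: env hash in, [status, headers, body] out. Body is a placeholder.
SEARCH_APP = lambda do |env|
  headers = { "Content-Type" => "application/json" }
  # Cross-site requests from browsers carry an Origin header. A public endpoint
  # that needs no secret key or cookies can safely allow any origin.
  headers["Access-Control-Allow-Origin"] = "*" if env["HTTP_ORIGIN"]
  [200, headers, ['{"records":{}}']]
end

status, headers, _body = SEARCH_APP.call({ "HTTP_ORIGIN" => "https://example.com" })
puts "#{status} #{headers['Access-Control-Allow-Origin']}"
```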

We hope this makes the Swiftype public API easier to use for those of you who rely on it. JSONP will continue to be supported, of course!

NOTE: Like JSONP, CORS is only supported from our public API, as it isn’t secure to use your secret API key in front-end JavaScript.

Keep an eye on the blog for more engineering features, and as always, feel free to reach out for support.

We’ve Updated the Swiftype WordPress Search Plugin


We’ve released another new version of the Swiftype Search WordPress plugin. This release has two new features that will make Swiftype WordPress Search work even better.

First, we’ve added support for WP-CLI. This is especially great for those of you with 10,000+ posts, because indexing from the command line is much faster. You can also increase the post batch size to index more posts at once:

wp swiftype sync --index-batch-size=100
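As rough arithmetic for the batch-size flag: assuming each batch of posts becomes one indexing request (an assumption; the plugin’s internals aren’t covered here), raising the batch size cuts the number of round trips proportionally.

```ruby
# Rough arithmetic only: number of batches needed to cover total_posts,
# i.e. ceil(total_posts / batch_size). One request per batch is an assumption.
def batch_count(total_posts, batch_size)
  (1..total_posts).each_slice(batch_size).count
end

[10, 100].each do |size|
  puts "batch size #{size}: #{batch_count(10_000, size)} batches"
end
```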

You can also quickly re-index by destructively dropping all the documents that have been indexed (great for development).

wp swiftype sync --destructive --index-batch-size=100

If you’ve got WP-CLI installed (and if you don’t, why not?), type wp swiftype in your WordPress directory to see help text for the available commands.

Second, we’ve merged an open source contribution by Paul Morisson that makes it easier to modify the query before it is passed to Swiftype. The new swiftype_search_query_string filter lets you pre-process the search query: add or strip quotation marks, or add or remove words. For example, if you know that your visitors search for phrases more often than for combinations of words, you could pre-process every query as a phrase match to increase the likelihood that your visitors find what they are looking for.
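The filter itself is registered in PHP with WordPress’s `add_filter`; to keep this post’s examples in one language, the sketch below shows only the kind of transformation such a callback might perform, assuming it receives the raw query string and returns the modified one. `phrase_match` is a hypothetical name.

```ruby
# Logic only: wrap multi-word queries in quotes so they are treated as phrases.
# Assumes the filter callback receives the query string and returns a new one.
def phrase_match(query)
  q = query.strip
  return q if q.empty? || q.start_with?('"') # leave empty or already-quoted queries alone
  return q unless q.include?(" ")            # a single word needs no quoting
  %("#{q}")
end

puts phrase_match("black dress shoes")
```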

We hope this improves the flexibility and usefulness of our WordPress plugin for all our users. As always, feel free to reach out for help.

Ecommerce Industry Names Site Search Biggest Need for 2014

According to a recent Oracle survey of B2B companies, “advanced on-site search/navigation” was the most-cited “key capability” for B2B ecommerce, mentioned by 45% of respondents. In the survey’s words:

“Many respondents believe that their customers are looking for key capabilities such as custom pricelists, search & navigation, and mobile web/apps when buying online.”

Additionally, Oracle’s survey of B2C ecommerce companies drew the following conclusion:

“While there is much focus on building relationships with customers to increase loyalty, a huge challenge for organizations is the lack of visibility to the customer, their preferences, and their relationship with the brand as a whole (across channels).”

Customer feedback is essential to B2C ecommerce. We’ve seen our clients use Swiftype search to understand their customers better, since seeing exactly what your customers are searching for gives you direct insight into what they want. Our simple custom results offering makes it easy to respond quickly to this feedback and design optimal paths for searchers on your website.

At Swiftype, we power numerous ecommerce site searches, and have fantastic plugins built specifically for ecommerce platforms such as Shopify and Magento. If you have an online store and are ready to add powerful ecommerce search to your store, send us a note to set up a demo.

Why Good Mobile Search is Essential to Your Mobile Strategy

Google may not be the most important player in mobile search for much longer if the trends recently reported by eMarketer continue. A recent study highlights how critical good mobile site search is to anyone with a mobile app, as well as the value of powerful analytics such as those provided by us at Swiftype.

Google became the dominant player on the web thanks to an unrivaled ability to provide the best answer to a query quickly. With the ever-expanding native app world, Google now helps people discover apps that provide even better answers to hyper-specific queries – by allowing deep-links to specialty apps such as Yelp or Kayak. While most will continue to use Google for their broadest searches, those who have downloaded apps designed for specific verticals now expect searches within those apps to generate high-quality, relevant content.

A major trend in 2014 has been the growing adoption of deep-linking, enabling seamless app-to-app, app-to-web, or web-to-app navigation, similar to the site-to-site navigation familiar to web users. As this technology becomes more broadly adopted, marketers are realizing the potential to drive more engagement within their apps, often offering a much better mobile experience than on the mobile web. Google even recently announced that app content will now be indexed. We are now seeing the early changes driven by deep-links.

A new study by eMarketer (reported and analyzed by TechCrunch) reports that Google experienced a 17% drop in mobile ad revenue. Meanwhile, companies like Yelp, which generate substantial search volume within their mobile apps, are seeing major growth in mobile ad revenue. This shift toward searching within apps is compounded by the well-documented overall shift from desktop to mobile internet use (a Nielsen report claims that we spend ~34 hours a week on mobile internet compared to ~27 hours on desktops). Also, a recent study shows that nearly 90% of mobile internet time is spent within apps rather than on the mobile web.

This emerging trend highlights the importance of good search in your mobile app. Regardless of your plans to begin selling ads, if users are more likely to perform a search in your app than on Google, it’s critical to provide them a great experience to entice their return. The shift also creates a major opportunity to learn from your customers, which clients like Asana and SupportBee extensively leverage. If you use a tool such as Swiftype, you’ll even get incredibly simple custom results controls, letting you design better experiences for your searchers than a pure algorithm could generate. You’ll also get access to our powerful search analytics. At Swiftype, we power great site search in mobile apps such as Twitch, DramaFever, Shopify, Vayable, TechCrunch, and more.

Across the quarter billion queries we serve monthly, we’ve seen nearly 25% come from mobile, even though many of our customers, drawn by our low-cost plan, aren’t yet large enough to worry about mobile optimization. Overall, in the US, mobile and tablet search volume only recently reached 20% of total Google searches. Both of these numbers will only continue to grow as smartphones move closer to 100% adoption, their power increases, and the speed and availability of mobile internet improve. If you have a mobile app and would like to improve your search, reach out to us to schedule a demo.

Launch a Site Search Overlay from Any Clickable Element


A handy new tip has just been loaded into Swiftype’s Tutorials section.

It’s no longer an “undocumented feature”: you can now learn how to put your site search box in a pop-up overlay, similar to what you’ll see when clicking around swiftype.com!

Get the full scoop by visiting our tutorial doc here.

And as always, feel free to drop us a line with any questions or comments about this or any other features.

MetaEvents RubyGem: DRY Up, Structure, & Document Your Mixpanel Events

Here at Swiftype, we’re huge fans of Mixpanel. Mixpanel is a service that provides very easy, yet scalable and powerful, user-centric analytics for web and mobile applications. One of its great strengths is how easy it is to get started: embed just a few lines of JavaScript, and within minutes you can be gaining deep insight into how your users are using your product. For example:

# app/views/layouts/application.html.erb:
# app/views/pricing/show.html.erb:
# app/views/pricing/paid.html.erb:

These few little snippets of code will allow you to track user progress through a paid-plan flow; you’ll be able to see which users are looking at which plans, breaking them down by their current plan and/or which plan they looked at, track conversion rates through the flow, and so on.

Mixpanel is centered around events (like User Looked at Pricing Plan or User Signed Up for Pricing Plan), which are emitted when you call mixpanel.track and are the basic unit of pricing for Mixpanel, and properties (like currentPlan, newPlan, or oldPlan), which are included with events and are free. One of the keys to a high-quality Mixpanel integration is to pass lots and lots of properties: the more properties you pass, the more ways you’ll have to analyze your data. This pays off especially when done speculatively: if you pass lots of data now, the number of historical analyses you’ll be able to do in a month, six months, or a year goes up greatly. When you’re staring at the data, trying to figure out what’s going on, it’s so much nicer to think “ah, let me look at X!” than “gee, I really wish I’d measured X; maybe if I add it now, I can answer this question in another two months”.
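Here is a small in-memory illustration of why extra properties pay off: every property recorded on an event becomes a dimension you can segment by later. The events, property names, and `segment` helper are hypothetical; Mixpanel does this kind of slicing for you.

```ruby
# Hypothetical event log; each hash is one tracked event with its properties.
EVENTS = [
  { event: "User Looked at Pricing Plan",     current_plan: "basic", new_plan: "pro" },
  { event: "User Looked at Pricing Plan",     current_plan: "free",  new_plan: "pro" },
  { event: "User Looked at Pricing Plan",     current_plan: "free",  new_plan: "basic" },
  { event: "User Signed Up for Pricing Plan", current_plan: "free",  new_plan: "basic" }
]

# Count events with the given name, grouped by any recorded property.
def segment(events, name, by)
  events.select { |e| e[:event] == name }.group_by { |e| e[by] }.transform_values(&:count)
end

p segment(EVENTS, "User Looked at Pricing Plan", :current_plan)
p segment(EVENTS, "User Looked at Pricing Plan", :new_plan)
```

A property you never recorded can’t be segmented on after the fact, which is the argument for passing lots of them now.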

Expanding the implementation

Let’s expand our implementation. We might want to add a few more properties about the user; Mixpanel lets us do this using something called super properties (using the register call), which are passed with every event once set:

# app/views/layouts/application.html.erb:

And let’s pass considerably more properties on each event. The price of a plan is a critical factor, as are its capabilities; we definitely want to record those as well as the plan name, as we might change a plan’s price or capabilities over time. We’ll also throw in plan IDs, since the names might change:

# app/views/pricing/show.html.erb:
# app/views/pricing/paid.html.erb:

From a Mixpanel point of view, this is a lot more powerful: we can now do calculations based on differences in price, slice and dice by yearly and monthly prices, support type, max users, and so on. We’ll be able to answer questions like “what was the effect in upgrades when we increased the maximum number of users from 3 to 5 on the middle-level plan, for those users who were still on the basic plan?”. We’ll be able to be clear and consistent in our data, and observe historical prices and capabilities of plans, even if we change them later.

…And now, the problems ensue

However, from a code point of view, this is the leading edge of turning into what we technical folks call a big ol’ mess. Our code is verbose, it isn’t DRY at all, and, as a result, it’s very error-prone. (How many of you noticed that I accidentally passed the name of the old plan twice in the second example, rather than the ID and the name?) And this is with just two events — can you imagine what it’s going to look like when we have twenty, or fifty?

Further, maintaining this code in the long run is going to be a nightmare. If we add another property to plans that we’re interested in monitoring, we have to go update every single event and add that property, or we’ll have inconsistent data. If we want to change property names — again, we have to go update every single call site. We have two events right now; in a real production system, we might have thirty or fifty. Yuck.

Not quite as obvious, but perhaps even more important, is that having a clean record of exactly what an event means, and of changes that might affect that event, is critical for correct analysis. For example, if we decide to show pricing plans to everybody directly on their home page, the number of User Looked at Pricing Plan events is going to skyrocket, and so the conversion rate to User Signed Up for Pricing Plan is going to plummet. Sure, you might be able to remember this right now, but in another year, when you have six more people looking at the data, is everybody going to remember all fifteen significant changes you made over that year when looking at your results? There has to be a better way, right?

It may help to consider, on a more theoretical level, what’s happening when you use Mixpanel effectively. The properties you pass are effectively a snapshot of various parts of your database; the user is likely the single most important part of that snapshot, but there are plenty of other objects that contribute, too. That might be a database row representing a user-to-user communication, a taxi ride, a stay overnight, or a search engine, depending on your domain, but essentially you are reflecting a denormalized snapshot of a chunk of your database to Mixpanel with each event — this is how it can be so effective for you. When you consider it this way, it becomes even more clear why adding some structure and mechanism can be of huge advantage: with the right framework, you ought to be able to gather that database information and pass it very easily, almost implicitly, rather than having to maintain huge lists of properties all over your application.

What About Super Properties?

Mixpanel’s “super properties”, while incredibly useful, can also be problematic. Their implementation is straightforward: Mixpanel’s library issues a permanent cookie to your end user that records the current set of registered super properties; when firing an event client-side, it simply merges these properties in with any specified in the event. This is a simple, useful, and powerful model, and it’s great when you’re starting out. However, there are several caveats:

  • Perfect updating is required. If you change data server-side and forget to call mixpanel.register again, that data will be perpetually incorrect in Mixpanel.
  • They’re inaccessible server-side. If you fire events server-side (and, in our experience, you inevitably will at some point, for example during email generation or in background tasks), you simply won’t have access to that data at all.
  • They’re easy to tamper with. It’s really easy for users to change their own super properties.

As you’ll see below, our MetaEvents library replaces super properties with implicit properties, which largely eliminate all these issues.
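The “perfect updating” caveat is easy to reproduce in a small simulation. `SnapshotTracker` copies values when `register` is called, as client-side super properties do, while `ImplicitTracker` reads the current record each time an event fires. Both classes and the `User` struct are hypothetical stand-ins, not Mixpanel’s or MetaEvents’ code.

```ruby
User = Struct.new(:id, :account_type)

# Snapshot model: values are copied at register time and merged into later events.
class SnapshotTracker
  def initialize
    @super_properties = {}
  end

  def register(props)
    @super_properties.merge!(props)
  end

  def event(name, props = {})
    @super_properties.merge(props).merge(event: name)
  end
end

# Implicit model: the user record is re-read every time an event fires.
class ImplicitTracker
  def initialize(user)
    @user = user
  end

  def event(name, props = {})
    { account_type: @user.account_type }.merge(props).merge(event: name)
  end
end

user = User.new(1, "free")
snapshot = SnapshotTracker.new
snapshot.register(account_type: user.account_type)
implicit = ImplicitTracker.new(user)

user.account_type = "paid" # upgraded server-side; nobody called register again
puts snapshot.event("Viewed Dashboard")[:account_type] # stale value
puts implicit.event("Viewed Dashboard")[:account_type] # current value
```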

Introducing MetaEvents

We’d love to introduce you to Swiftype’s solution for all these problems: the MetaEvents RubyGem. Let’s take a look at what our code from the above example would look like using MetaEvents. First, we define methods on some of our models that convert them to properties, and set up MetaEvents in our ApplicationController:

# app/models/user.rb
class User < ActiveRecord::Base
  def to_event_properties
    { :signup_date => created_at, :account_type => account_type,
      :signin_count => signin_count }
  end
end

# app/models/plan.rb
class Plan < ActiveRecord::Base
  def to_event_properties
    { :id => id, :name => name, :monthly_price => monthly_price,
      :yearly_price => yearly_price, :max_users => max_users,
      :support_type => support_type }
  end
end

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  def meta_events_tracker
    @meta_events_tracker ||= MetaEvents::Tracker.new(
      current_user.id, request.remote_ip, :current_user => current_user)
  end
end

And now we can fire events from our controllers this easily:

# app/controllers/plans_controller.rb
class PlansController < ApplicationController
  def show
    @plan = Plan.find(params[:id])
    meta_events_tracker.event!(:plan, :show, :plan => @plan)
  end

  def pay
    # ...
    meta_events_tracker.event!(:plan, :paid,
      :old_plan => @old_plan, :new_plan => @new_plan)
  end
end

(Here, we’re firing events server-side; we’ve found this to be more flexible and consistent than client-side events, but it’s just as easy to fire the events client-side, if you prefer.)

Several interesting things are happening here:

  • MetaEvents allows us to pass implicit properties on every single request (the MetaEvents::Tracker.new call); this is like Mixpanel’s “super properties”, only more reliable (because they’re guaranteed always up-to-date) and in your full control;
  • MetaEvents lets us pass objects as properties; it expands them using their #to_event_properties method, and integrates them into events, prefixing them with whatever key you passed them in with;
  • MetaEvents provides a flexible model for firing events server-side; it’s easy to fire them asynchronously using Resque or a similar system.
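The second bullet describes expanding objects into prefixed properties. Here is a minimal re-implementation of that idea, assuming an underscore separator between the key and the property name; MetaEvents’ actual expansion rules may differ, and `expand_properties` and the `Plan` struct are hypothetical.

```ruby
# Stand-in for a model that knows how to describe itself as event properties.
Plan = Struct.new(:id, :name, :monthly_price) do
  def to_event_properties
    { id: id, name: name, monthly_price: monthly_price }
  end
end

# Expand any value responding to #to_event_properties into "<key>_<property>"
# entries; plain values are passed through under their own key.
def expand_properties(props)
  props.each_with_object({}) do |(key, value), out|
    if value.respond_to?(:to_event_properties)
      value.to_event_properties.each { |k, v| out["#{key}_#{k}"] = v }
    else
      out[key.to_s] = value
    end
  end
end

p expand_properties({ old_plan: Plan.new(1, "Basic", 10), new_plan: Plan.new(2, "Pro", 50), coupon: "FALL14" })
```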

We’re still passing through every bit as much data as before, only now it’s completely DRY. Adding properties is a piece of cake, properties will be completely consistent across events, and 100% up-to-date information about the current user will be passed on every single event.

Finally, let’s look at what config/meta_events.rb, which defines our events, might look like a year from now:

category :plan do
  event :show, "2014-03-03", "user looks at a pricing plan" do
    note "2014-06-14", "pburkart", "we moved plan display onto the dashboard...vast increase in displays"
    note "2014-07-11", "lweyand", "moved off dashboard unless a user was over plan limits"
    note "2014-10-17", "mvellez", "more aggressive visual display of plan on dashboard"
    note "2014-12-09", "lweyand", "holiday promotion lightbox added"
    note "2015-01-05", "mvellez", "holiday promotion lightbox removed"
  end

  event :paid, "2014-03-03", "user pays for a new pricing plan" do
    note "2014-08-09", "mvellez", "removed street address from payment form -- turns out we don't need it"
    note "2014-08-17", "lweyand", "added PayPal support"
    note "2014-11-13", "pburkart", "upped free trial from 30 to 60 days"
    note "2014-12-14", "pburkart", "reduced free trial back to 30...turns out 60 didn't make any difference"
  end
end

Not only does this file become a canonical record of what events you’re firing (as adding an event here is required before you can fire it), it also becomes a historical record of changes to your events. This tree is even exposed via MetaEvents, so you could easily turn this into an HTML report for the pointy-haired bosses around.
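To show how one definition file can serve as both the list of allowed events and a dated change log, here is a toy DSL with the same category/event/note shape. It is not MetaEvents’ implementation; `EventDefinitions`, `fetch`, and `report` are hypothetical names.

```ruby
# Toy DSL with the category/event/note shape of the config file above.
class EventDefinitions
  Event = Struct.new(:category, :name, :introduced, :description, :notes)
  Note = Struct.new(:date, :author, :text)

  attr_reader :events

  def initialize(&block)
    @events = []
    instance_eval(&block)
  end

  def category(name, &block)
    @current_category = name
    instance_eval(&block)
  ensure
    @current_category = nil
  end

  def event(name, introduced, description, &block)
    @current_event = Event.new(@current_category, name, introduced, description, [])
    instance_eval(&block) if block
    @events << @current_event
  ensure
    @current_event = nil
  end

  def note(date, author, text)
    @current_event.notes << Note.new(date, author, text)
  end

  # Looking up an undeclared event fails, so the file stays the canonical list.
  def fetch(category, name)
    @events.find { |e| e.category == category && e.name == name } or
      raise ArgumentError, "undeclared event #{category}/#{name}"
  end

  # Plain-text history, one line per event followed by its dated notes.
  def report
    @events.flat_map do |e|
      ["#{e.category}/#{e.name} (since #{e.introduced}): #{e.description}"] +
        e.notes.map { |n| "  #{n.date} #{n.author}: #{n.text}" }
    end.join("\n")
  end
end

defs = EventDefinitions.new do
  category :plan do
    event :show, "2014-03-03", "user looks at a pricing plan" do
      note "2014-06-14", "pburkart", "we moved plan display onto the dashboard"
    end
    event :paid, "2014-03-03", "user pays for a new pricing plan"
  end
end
puts defs.report
```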

MetaEvents, defined

MetaEvents is a RubyGem that provides a framework for structuring your events, efficiently exposing large numbers of properties, adding implicit properties based on the currently logged-in user and browser, and firing events either server-side or client-side. When used in a large-scale Ruby application:

  • You’ll be able to understand your events — forever. MetaEvents provides a Ruby-based DSL to document your events; you will have a permanent record of what each event is for, when it was introduced, and any changes you’ve made. This alone will probably make your product managers and business folks very happy, if experience is any guide.
  • You’ll pass far more properties, and they’ll be consistent across all events. MetaEvents encourages you to define #to_event_properties methods on your models, and then pass entire models to its methods; it then automatically merges all properties of those models into the event. Now, when you think “hey, I wonder if…”, you’re much more likely to already have that data in Mixpanel for weeks or months than to have to add it now.
  • You’ll capture environmental/contextual data automatically. MetaEvents lets you define implicit properties, which are fired with every single event and are typically properties from the currently-logged-in user, browser, account, or so on. Better than Mixpanel’s “super properties” because they come from your database right now and work server-side, these further increase the axes along which you can do analysis.
  • You’ll still reap these benefits when firing events from the client. MetaEvents provides very easy ways to define events server-side and fire them client-side, either automatically on links or via any mechanism you choose.

Implementing MetaEvents doesn’t take long at all, and it can happily coexist with your existing Mixpanel code — you should be up and running within an hour, tops, and be able to expand rapidly from there.

Get started with MetaEvents now!

If you need to track users who aren’t logged in (and who doesn’t?), you might also want to take a look at our WebServerUid RubyGem, which provides an easy way to generate a unique browser ID for visitors.

MetaEvents and a predecessor system have been used in two different large-scale Rails web applications, providing detailed analysis at scale: over 500,000 events per day, with between 20 and 50 properties each, all with extremely little maintenance or overhead. Although its release is recent, the ideas have proved themselves over the course of several years. It is also thoroughly tested and documented; we think you’ll find the code easy to read and well-structured.

WebServerUid: Easy Unique Browser IDs for Rails & Better Analytics

Here at Swiftype, one of the ways we work hard to improve our product is to use various analytics tools (including our own search analytics!) to watch how people are using our website, so we know what’s working for our customers and what isn’t. We use (and love!) Mixpanel, Google Analytics, and in-house tools built around various databases and log files.

When you’re doing these kinds of analytics, giving each user — or, especially, visitor — a unique ID is paramount. For logged-in users, this is easy; you just use their user ID. For visitors, you need to generate some kind of synthetic ID and use that.

This is easy enough to do: generate a UUID or sufficiently large random number, hand it to the visitor in a cookie that never expires, and be done with it. *wipes hands* Done!
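For reference, the naive approach looks something like this in Ruby. The cookie name and ten-year lifetime are assumptions (a cookie can’t literally never expire, so a far-future date stands in); in Rails, `cookies.permanent` gives you a similar long-lived cookie. Note that on the first request the new ID exists only in the outgoing Set-Cookie header.

```ruby
require "securerandom"

VISITOR_COOKIE = "visitor_id"            # hypothetical cookie name
TEN_YEARS = 10 * 365 * 24 * 60 * 60      # seconds; stands in for "never expires"

# Returns [visitor_id, set_cookie_header_or_nil] for the incoming cookies.
def visitor_id_for(request_cookies)
  existing = request_cookies[VISITOR_COOKIE]
  return [existing, nil] if existing

  id = SecureRandom.uuid
  expires = (Time.now.utc + TEN_YEARS).strftime("%a, %d %b %Y %H:%M:%S GMT")
  [id, "#{VISITOR_COOKIE}=#{id}; Path=/; Expires=#{expires}"]
end

id, set_cookie = visitor_id_for({})
puts set_cookie
```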

…well…almost.

There’s one problem, and it’s a big deal. Most web servers have, at best, a lot of difficulty logging outbound Set-Cookie headers; they typically only log inbound Cookie headers from cookies the client already has. This means you won’t get the generated ID in the log line for the very first request the user makes, and that is the single most important HTTP request they will ever make, because it tells you how they found your site and what page they landed on. You can see what’s going on below; the cookie simply isn’t present in the request that your server is logging:
[Diagram: first request, no cookie]
Fortunately, both Apache and nginx have modules that solve this problem quite nicely. Apache's mod_uid and nginx's ngx_http_userid_module can each generate a unique token for every visitor, issue it to them in a cookie, and add it to your HTTP log file, even on the first request.

Let's do this with nginx by adding the following to the http block of our /etc/nginx/nginx.conf (nginx builds ngx_http_userid_module by default, so no recompiling is needed):

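# In the http (or server) block of /etc/nginx/nginx.conf
userid on;           # issue a visitor ID cookie to clients that lack one
userid_name brid;    # name of the cookie
userid_path /;       # make the cookie valid for the whole site
userid_expires max;  # expire as late as nginx allows

# Pass the ID along to the upstream Rails app. Note: a level that defines
# its own proxy_set_header directives does not inherit these.
proxy_set_header X-Nginx-Browser-ID-Got $uid_got;  # ID the client sent back
proxy_set_header X-Nginx-Browser-ID-Set $uid_set;  # ID nginx issued on this request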

This tells nginx to issue a unique ID to every visitor who doesn't already have one, store it in a cookie named brid that covers the whole site, set the cookie's expiration as late as nginx allows, and pass the ID along to our Rails app in request headers. There are two headers because $uid_got contains any inbound value for the ID (i.e., from a cookie the client already had), while $uid_set contains an outbound value (i.e., the one nginx generated for this request).

Now, our situation looks like this:

[Diagram: cookie set by the web server]

We can read this value in Rails from request.env['HTTP_X_NGINX_BROWSER_ID_SET'], which will contain a value like brid=D07FA8C019EA0753B600AD0F02030303. If we configure our nginx logs to include $uid_set, it will show up there, too. On the next request, the exact same string arrives in request.env['HTTP_X_NGINX_BROWSER_ID_GOT'] instead, because nginx is now reporting a UID the client sent rather than one it generated itself:

[Diagram: cookie received from the client]
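Because the header value carries the cookie name as well as the ID, code that reads it has to split the two apart. Here is a minimal sketch of that parsing. uid_hex_from_header is a hypothetical helper, not WebServerUid's from_header implementation, and a plain Hash stands in for request.env.

```ruby
# Hypothetical helper: pull the 32-digit hex UID out of a header value such as
# "brid=D07FA8C019EA0753B600AD0F02030303", as passed in by the
# proxy_set_header lines in the nginx config. Returns nil if the header is
# absent, names a different cookie, or doesn't hold exactly 32 hex digits.
def uid_hex_from_header(value, cookie_name)
  return nil if value.nil?

  name, hex = value.strip.split('=', 2)
  return nil unless name == cookie_name && hex&.match?(/\A\h{32}\z/)

  hex.upcase
end

# A second request: the client sent its cookie back, so only the GOT header is present.
env = { 'HTTP_X_NGINX_BROWSER_ID_GOT' => 'brid=D07FA8C019EA0753B600AD0F02030303' }
set_hex = uid_hex_from_header(env['HTTP_X_NGINX_BROWSER_ID_SET'], 'brid') # nil: nothing new issued
got_hex = uid_hex_from_header(env['HTTP_X_NGINX_BROWSER_ID_GOT'], 'brid')
```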

From the second request on, we'll also get the value in cookies[:brid], but in a different format. nginx passes the ID to us as a hex string, but the cookie it sends to the client is Base64-encoded, so while it represents the same value, it looks like wKh/0FMH6hkPrQC2AwMDAg== instead.
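The two example values above decode to the same 16 bytes, with one wrinkle: judging by those values, the bytes within each 8-digit group of the hex string appear in reverse order relative to the cookie. That is consistent with nginx printing each of the UID's four 32-bit words as a native integer on a little-endian host such as x86. The sketch below assumes that byte order. The helper names are hypothetical, and it uses Ruby's built-in pack/unpack rather than WebServerUid.

```ruby
# Hypothetical helpers converting between the hex form nginx passes in headers
# and the raw 16 bytes held (Base64-encoded) in the cookie. Assumes each
# 8-digit hex group is a 32-bit word whose bytes appear reversed in the cookie,
# as the example values in this post suggest.
def uid_hex_to_binary(hex)
  raise ArgumentError, 'expected 32 hex digits' unless hex.match?(/\A\h{32}\z/)

  words = hex.scan(/\h{8}/).map { |w| w.to_i(16) }
  words.pack('V4') # four unsigned 32-bit integers, little-endian
end

def uid_binary_to_hex(binary)
  raise ArgumentError, 'expected 16 bytes' unless binary.bytesize == 16

  binary.unpack('V4').map { |w| format('%08X', w) }.join
end

def uid_base64_to_binary(base64)
  base64.unpack1('m0') # strict Base64 decode
end

def uid_binary_to_base64(binary)
  [binary].pack('m0') # strict Base64 encode, no trailing newline
end

hex = 'D07FA8C019EA0753B600AD0F02030303'
cookie = uid_binary_to_base64(uid_hex_to_binary(hex))
# cookie == "wKh/0FMH6hkPrQC2AwMDAg==", the cookie value quoted above
back = uid_binary_to_hex(uid_base64_to_binary('wKh/0FMH6hkPrQC2AwMDAg=='))
# back == hex
```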

These two formats are incompatible, and neither is ideal for storing these values in your database: to save precious buffer cache, you may prefer the most compact format possible, raw binary data. Further, these IDs have internal structure that can be useful; they're generated from things like the IP address of the server, the start time of the web server process, the process ID, and a sequence counter.
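To make that internal structure concrete, here is a sketch that splits the cookie's 16 bytes into four fields. The layout is our assumption, based on a reading of nginx's userid module with "userid on" and checked only against the example cookie above:

* the server's IPv4 address (when userid_service isn't set)
* the time the UID was issued
* a word whose low 16 bits hold the worker's PID (its high bits hold only a microsecond fragment of the worker's start time)
* a sequence counter

Apache's mod_uid may differ in detail, and describe_uid and its hash keys are hypothetical.

```ruby
# Hypothetical decoder. The field layout is assumed from nginx's "userid on"
# behaviour, not taken from WebServerUid. Input is the raw 16-byte UID
# (e.g. the Base64-decoded cookie); words are read big-endian.
def describe_uid(binary)
  raise ArgumentError, 'expected 16 bytes' unless binary.bytesize == 16

  _ip_word, issued_at, start_and_pid, sequence = binary.unpack('N4')
  {
    server_ip: binary.bytes.first(4).join('.'), # IPv4 address of the issuing server
    issued_at: Time.at(issued_at).utc,          # when this UID was generated
    pid: start_and_pid & 0xFFFF,                # low 16 bits: worker process ID
    sequence: sequence                          # per-worker counter
  }
end

info = describe_uid('wKh/0FMH6hkPrQC2AwMDAg=='.unpack1('m0'))
# info[:server_ip] == "192.168.127.208", info[:pid] == 182
```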

To help with this, we’re proud to offer WebServerUid, a small Ruby class that can:

  • Read any of these formats (hex, Base64, or binary);
  • Return any of these formats;
  • Compare and hash itself cleanly (i.e., <, >=, <=>, ==, and hash all work correctly);
  • Expose the internal structure of the UID (things like the IP address of the generating server, the PID of the generating process, and that process’s start time are in there);
  • Generate new UIDs from scratch, using the same algorithm as Apache and nginx.
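Clean comparison and hashing, as in the list above, is standard Ruby value-object plumbing. Here is a minimal sketch of how a UID class might provide it. SketchUid is a hypothetical class that wraps the raw 16 bytes; it is not WebServerUid's source.

```ruby
# Hypothetical value class: wraps a UID's raw 16 bytes so that instances
# compare by value and work as Hash keys and in Array#uniq.
class SketchUid
  include Comparable # derives <, <=, ==, >=, > and between? from <=>

  attr_reader :binary

  def initialize(binary)
    raise ArgumentError, 'expected 16 bytes' unless binary.bytesize == 16

    @binary = binary.b.freeze # binary encoding, so comparison is byte-wise
  end

  def <=>(other)
    return nil unless other.is_a?(SketchUid)

    binary <=> other.binary
  end

  # eql? and hash must agree so that equal UIDs collapse to one Hash key.
  def eql?(other)
    other.is_a?(SketchUid) && binary == other.binary
  end

  def hash
    binary.hash
  end
end

a = SketchUid.new("\x01" * 16)
b = SketchUid.new("\x01" * 16)
c = SketchUid.new("\x02" * 16)
```

Defining <=> and mixing in Comparable gives all the ordering operators at once. eql? and hash are defined separately because Hash and uniq rely on them rather than on ==.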

Using WebServerUid, it takes about one minute to configure your web server to generate unique IDs for visitors and have them easily accessible in your Rails application. Our ApplicationController contains something like this:

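def current_browser_id
  # Prefer the ID nginx just issued (the client had no cookie yet), then the
  # ID the client sent back, and finally fall back to the raw Base64 cookie.
  WebServerUid.from_header(request.env['HTTP_X_NGINX_BROWSER_ID_SET'], 'brid') ||
  WebServerUid.from_header(request.env['HTTP_X_NGINX_BROWSER_ID_GOT'], 'brid') ||
  WebServerUid.from_base64(cookies['brid'])
end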

…and we can now simply store current_browser_id.to_binary_string in our database, a compact 16-byte format, and use the same class on the way out to turn it back into something more human-readable.

Using these techniques, you can add unique IDs throughout your web stack in half an hour or so. Enjoy!

If you enjoyed this post, subscribe to our blog newsletter for more tips like this one, or on transparently storing MongoDB BSON IDs in an RDBMS.
