Here at Swiftype, one of the ways we work hard to improve our product is to use various analytics tools (including our own search analytics!) to watch how people are using our website, so we know what’s working for our customers and what isn’t. We use (and love!) Mixpanel, Google Analytics, and in-house tools built around various databases and log files.
When you’re doing these kinds of analytics, giving each user — or, especially, visitor — a unique ID is paramount. For logged-in users, this is easy; you just use their user ID. For visitors, you need to generate some kind of synthetic ID and use that.
This is easy enough to do: generate a UUID or sufficiently-large random number, hand it to the visitor in a cookie that never expires, and be done with it. (wipes hands) Done!
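A minimal sketch of that naive approach, in plain Ruby with no framework involved. The method name visitor_cookie_header, the cookie name visitor_id, and the 20-year lifetime are all illustrative assumptions rather than anything from our codebase; a cookie can't literally never expire, so a far-future date stands in for "never".

```ruby
require 'securerandom'
require 'time'

# Assumed lifetime: roughly 20 years, standing in for "never expires".
TWENTY_YEARS = 20 * 365 * 24 * 60 * 60

# Hypothetical helper: returns a fresh random visitor ID together with the
# Set-Cookie header value that would hand it to the visitor.
def visitor_cookie_header(cookie_name = 'visitor_id', now = Time.now)
  id = SecureRandom.uuid
  expires = (now + TWENTY_YEARS).httpdate
  [id, "#{cookie_name}=#{id}; Path=/; Expires=#{expires}"]
end

id, header = visitor_cookie_header
puts header
```

Note that the ID only exists in the outbound response here, which is exactly what causes the logging problem described next.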
There’s one problem, and it’s a big deal. Most web servers have, at best, a lot of difficulty logging outbound Set-Cookie headers; they typically only log inbound Cookie headers from cookies the client already has. This means you won’t get this generated ID in the log line for the very first request the user makes, and that is the single most important HTTP request they will ever make, because it tells you how they found your site and what page they landed on. You can see what’s going on below; the cookie simply isn’t present in the request that your server is logging:
Fortunately, both Apache and nginx provide modules that solve this problem quite nicely. Apache’s mod_uid and nginx’s ngx_http_userid_module can both generate a unique token for each visitor, issue it to them in a cookie, and add it to your HTTP log file, even on the first request.
Let’s do this with nginx, by adding the following to our /etc/nginx/nginx.conf (by default, nginx compiles in support for the userid module):

    userid on;
    userid_name brid;
    userid_path /;
    userid_expires max;
    proxy_set_header X-Nginx-Browser-ID-Got $uid_got;
    proxy_set_header X-Nginx-Browser-ID-Set $uid_set;

This tells nginx to generate a unique ID for all requests, to store it in a cookie named brid, to set its expiration to the maximum time allowed, and to pass it back to our Rails site in a header. There are actually two headers, because $uid_got will contain any inbound value for the ID (i.e., from a cookie the client already had) and $uid_set will contain an outbound value for the ID (i.e., the one that nginx generated for this request).
Now, our situation looks like this:
We can read this value from Rails by looking at request.env['HTTP_X_NGINX_BROWSER_ID_SET'], which will contain a value like brid=D07FA8C019EA0753B600AD0F02030303. If we configure our nginx logs to output the contents of $uid_set, we’ll get it there, too. On the next request, we’ll get the exact same string in request.env['HTTP_X_NGINX_BROWSER_ID_GOT'] instead, because nginx is telling us it’s a UID passed by the client, rather than generated by nginx.
Past the first request, we’ll also get the value in cookies[:brid], but in a different format. nginx sends us the ID as a hex string, but the cookie it sends to the client is Base64-encoded, so, while it represents the same value, it looks quite different.
These are two incompatible formats, and neither of them is ideal if you want to store these values in your database — you may wish to save on precious buffer cache by using the most-compact possible format of pure binary data. Further, these IDs actually have internal structure; they’re generated using things like the IP address of the server, the start time of the web server process, the process ID, and a sequence token, which can be of use.
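To make the relationship between the formats concrete, here is a rough sketch in plain Ruby. The helper names (hex_from_header, hex_to_binary, binary_to_base64) are hypothetical and exist only for this illustration. The byte-swapping step is an assumption: it should hold for nginx running on a little-endian (e.g. x86) server, where each 32-bit word is logged in host byte order, so check it against your own cookies before relying on it. Reading the first four bytes as the server's IPv4 address likewise assumes nginx's default userid_service behaviour.

```ruby
# Pulls the 32-digit hex part out of a header value such as "brid=D07F...".
# Returns nil when the header is missing or names a different cookie.
def hex_from_header(value, cookie_name)
  return nil if value.nil?
  name, hex = value.split('=', 2)
  return nil unless name == cookie_name && hex.to_s.match?(/\A\h{32}\z/)
  hex
end

# ASSUMPTION: every 8-digit group is a 32-bit word printed in host byte
# order, so on a little-endian server we pack each word little-endian ('V')
# to recover the raw 16 bytes that the cookie encodes.
def hex_to_binary(hex)
  hex.scan(/\h{8}/).map { |word| word.to_i(16) }.pack('V4')
end

# 'm0' is Array#pack's Base64 directive without line breaks.
def binary_to_base64(binary)
  [binary].pack('m0')
end

hex    = hex_from_header('brid=D07FA8C019EA0753B600AD0F02030303', 'brid')
binary = hex_to_binary(hex)

puts binary.bytesize                 # 16 bytes, versus 32 hex characters
puts binary_to_base64(binary)        # 24 characters of Base64
puts binary.unpack('C4').join('.')   # first four bytes read as an IPv4 address
```

The binary form is half the size of the hex string, which is where the storage savings come from.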
To help with this, we’re proud to offer WebServerUid, a small Ruby class that can:
- Read any of these formats (hex, Base64, or binary);
- Return any of these formats;
- Compare and hash itself cleanly (i.e., <, >=, <=>, ==, and hash all work correctly);
- Expose the internal structure of the UID (things like the IP address of the generating server, the PID of the generating process, and that process’s start time are in there);
- Generate new UIDs from scratch, using the same algorithm as Apache and nginx.
Using WebServerUid, it takes about one minute to configure your web server to generate unique IDs for visitors and have them easily accessible in your Rails application. Our ApplicationController contains something like this:

    def current_browser_id
      WebServerUid.from_header(request.env['HTTP_X_NGINX_BROWSER_ID_SET'], 'brid') ||
        WebServerUid.from_header(request.env['HTTP_X_NGINX_BROWSER_ID_GOT'], 'brid') ||
        WebServerUid.from_base64(cookies['brid'])
    end

…and we can now simply store current_browser_id.to_binary_string in our database, which is a highly efficient storage format; on the way out, we can use the same class to transform it into a more human-readable format.
Using these techniques, you can have unique IDs added throughout your Web stack in a matter of a half-hour or so. Enjoy!