Here at Swiftype, one of the ways we work hard to improve our product is to use various analytics tools (including our own search analytics!) to watch how people are using our website, so we know what’s working for our customers and what isn’t. We use (and love!) Mixpanel, Google Analytics, and in-house tools built around various databases and log files.
When you’re doing these kinds of analytics, giving each user — or, especially, visitor — a unique ID is paramount. For logged-in users, this is easy; you just use their user ID. For visitors, you need to generate some kind of synthetic ID and use that.
This is easy enough to do: generate a UUID or a sufficiently large random number, hand it to the visitor in a cookie that never expires, and be done with it. (Wipes hands.) Done!
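As a rough sketch of that naive approach, here is what generating the ID and the cookie might look like in plain Ruby. The cookie name `visitor_id` and the helper `visitor_cookie_header` are made up for illustration; a Rails app would more likely go through its own cookie jar than build the header by hand.

```ruby
require 'securerandom'
require 'time'

# Hypothetical helper: builds a Set-Cookie header value carrying a fresh
# random visitor ID. The cookie name "visitor_id" is an assumption.
def visitor_cookie_header(now = Time.now)
  id = SecureRandom.uuid
  # A cookie cannot literally "never expire", so pick a date far in the future.
  expires = (now + 20 * 365 * 24 * 60 * 60).httpdate
  "visitor_id=#{id}; Path=/; Expires=#{expires}"
end

puts visitor_cookie_header
```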
…well…almost.
There’s one problem, and it’s a big deal. Most web servers have, at best, a lot of difficulty logging outbound Set-Cookie headers; they typically only log inbound Cookie headers from cookies the client already has. This means you won’t get this generated ID in the log line for the very first request the user makes, and that is the single most important HTTP request they will ever make, because it tells you how they found your site and what page they landed on. You can see what’s going on below; the cookie simply isn’t present in the request that your server is logging:
Fortunately, both Apache and nginx provide modules that solve this problem quite nicely. Apache’s mod_uid and nginx’s http_userid_module can both generate a unique token for each visitor, issue it to them in a cookie, and add it to your HTTP log file, even on the first request.
Let’s do this with nginx, by adding the following to our /etc/nginx/nginx.conf (by default, nginx compiles in support for http_userid_module already):
userid on;
userid_name brid;
userid_path /;
userid_expires max;
proxy_set_header X-Nginx-Browser-ID-Got $uid_got;
proxy_set_header X-Nginx-Browser-ID-Set $uid_set;
This tells nginx to generate a unique ID for all requests, to store it in a cookie named brid, to set its expiration to the maximum time allowed, and to pass it back to our Rails site in a header. There are actually two headers, because $uid_got will contain any inbound value for the ID (i.e., from a cookie the client already had) and $uid_set will contain an outbound value for the ID (i.e., that which nginx generated for this request).
Now, our situation looks like this:
We can read this value from Rails by looking at request.env['HTTP_X_NGINX_BROWSER_ID_SET'], which will contain a value like brid=D07FA8C019EA0753B600AD0F02030303. If we configure our nginx logs to output the contents of $uid_set, we’ll get it there, too. On the next request, we’ll get the exact same string in request.env['HTTP_X_NGINX_BROWSER_ID_GOT'], instead, because nginx is telling us it’s a UID passed by the client, rather than generated by nginx itself:
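To make the two-header behavior concrete, here is a minimal sketch in plain Ruby of pulling the hex ID out of whichever header is present. The helper `hex_id_from_header` is hypothetical and is not part of any library; it only assumes the `name=HEX` shape shown above, plus the fact that nginx does not forward a `proxy_set_header` whose value is empty, so the unused header is simply absent.

```ruby
# Hypothetical helper: extracts the 32-digit hex ID from a header value such
# as "brid=D07FA8C019EA0753B600AD0F02030303". Returns nil when the header is
# missing, carries a different cookie name, or is malformed.
def hex_id_from_header(value, expected_name)
  return nil if value.nil? || value.empty?

  name, hex = value.split('=', 2)
  return nil unless name == expected_name && hex.to_s.match?(/\A\h{32}\z/)

  hex.upcase
end

# Simulated Rack env for a first-time visitor: only the "Set" header exists.
env = { 'HTTP_X_NGINX_BROWSER_ID_SET' => 'brid=D07FA8C019EA0753B600AD0F02030303' }

set = hex_id_from_header(env['HTTP_X_NGINX_BROWSER_ID_SET'], 'brid')
got = hex_id_from_header(env['HTTP_X_NGINX_BROWSER_ID_GOT'], 'brid')
puts(set ? "new visitor: #{set}" : "returning visitor: #{got}")
```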
Past the first request, we’ll also get the value in cookies[:brid], but in a different format. nginx sends us the ID as a hex string, but the cookie it sends to the client is Base64-encoded, so, while it represents the same value, it looks like wKh/0FMH6hkPrQC2AwMDAg== instead.
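The relationship between the two encodings can be sketched with the standard library alone. The byte order below is an assumption inferred from the two sample values in this post: the hex form appears to print the 16 raw bytes as four 32-bit little-endian words. The helper names are hypothetical, and a server running on different hardware might not follow the same layout.

```ruby
require 'base64'

# Assumption (inferred from the sample values above): the hex string is the
# 16 raw bytes read as four little-endian 32-bit words, each printed as %08X.
def cookie_to_hex(base64_value)
  Base64.strict_decode64(base64_value).unpack('V4').map { |w| format('%08X', w) }.join
end

# Inverse of cookie_to_hex under the same assumption.
def hex_to_cookie(hex_value)
  words = hex_value.scan(/\h{8}/).map { |w| w.to_i(16) }
  Base64.strict_encode64(words.pack('V4'))
end

puts cookie_to_hex('wKh/0FMH6hkPrQC2AwMDAg==')          # D07FA8C019EA0753B600AD0F02030303
puts hex_to_cookie('D07FA8C019EA0753B600AD0F02030303')  # wKh/0FMH6hkPrQC2AwMDAg==
```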
These are two incompatible formats, and neither of them is ideal if you want to store these values in your database — you may wish to save on precious buffer cache by using the most-compact possible format of pure binary data. Further, these IDs actually have internal structure; they’re generated using things like the IP address of the server, the start time of the web server process, the process ID, and a sequence token, which can be of use.
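To put rough numbers on the storage argument, this small sketch compares the size of the same ID in each representation. The helper `storage_sizes` is hypothetical, and it packs the hex digits straight into bytes without worrying about word order, since only the lengths matter here.

```ruby
require 'base64'

# Hypothetical helper: reports how many bytes one ID occupies as a hex
# string, as Base64 text, and as raw binary.
def storage_sizes(hex)
  binary = [hex].pack('H*') # two hex digits per byte
  {
    hex: hex.bytesize,
    base64: Base64.strict_encode64(binary).bytesize,
    binary: binary.bytesize
  }
end

storage_sizes('D07FA8C019EA0753B600AD0F02030303').each do |format_name, size|
  puts "#{format_name}: #{size} bytes"
end
```

For the sample ID that works out to 32 bytes as hex, 24 as Base64, and 16 as raw binary, so the binary column is half the size of the hex one.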
To help with this, we’re proud to offer WebServerUid, a small Ruby class that can:
- Read any of these formats (hex, Base64, or binary);
- Return any of these formats;
- Compare and hash itself cleanly (i.e., <, >=, <=>, ==, and hash all work correctly);
- Expose the internal structure of the UID (things like the IP address of the generating server, the PID of the generating process, and that process’s start time are in there);
- Generate new UIDs from scratch, using the same algorithm as Apache and nginx.
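For readers wondering what "compare and hash cleanly" involves in Ruby, the following is a generic sketch of a value object that gets those properties. `BrowserId` is a hypothetical stand-in written for this illustration; it is not WebServerUid's actual implementation.

```ruby
# Hypothetical stand-in for illustration only; not WebServerUid's code.
class BrowserId
  include Comparable

  attr_reader :bytes

  def initialize(bytes)
    @bytes = bytes.dup.force_encoding(Encoding::BINARY).freeze
  end

  # Comparable derives <, <=, ==, >= and > from <=>.
  def <=>(other)
    other.is_a?(BrowserId) ? bytes <=> other.bytes : nil
  end

  # eql? and hash must agree so instances work as Hash keys and in Sets.
  def eql?(other)
    other.is_a?(BrowserId) && bytes == other.bytes
  end

  def hash
    bytes.hash
  end
end

a = BrowserId.new("\x01\x02")
b = BrowserId.new("\x01\x02")
c = BrowserId.new("\x01\x03")

puts a == b                   # true
puts a < c                    # true
puts({ a => :seen }.key?(b))  # true
```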
Using WebServerUid, it takes about one minute to configure your web server to generate unique IDs for visitors and have them easily accessible in your Rails application. Our ApplicationController contains something like this:
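def current_browser_id
  # Prefer the ID nginx just issued, then the one the client sent back,
  # and finally fall back to the Base64 value in the cookie itself.
  WebServerUid.from_header(request.env['HTTP_X_NGINX_BROWSER_ID_SET'], 'brid') ||
    WebServerUid.from_header(request.env['HTTP_X_NGINX_BROWSER_ID_GOT'], 'brid') ||
    WebServerUid.from_base64(cookies['brid'])
end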
…and we can now simply store current_browser_id.to_binary_string in our database as a highly efficient storage format, then use the same class on the way out to transform it into a more human-readable format.
Using these techniques, you can have unique IDs added throughout your Web stack in a matter of a half-hour or so. Enjoy!