The Swiftype Blog

MetaEvents RubyGem: DRY Up, Structure, & Document Your Mixpanel Events

Here at Swiftype, we’re huge fans of Mixpanel. Mixpanel is a service that provides very easy, yet very scalable and powerful, user-centric analytics to Web and mobile applications — with just a few lines of code, you can be up and running, gaining deep insight into how your users are using your product. One of Mixpanel’s great strengths is how easy it is to get up and running; you can embed just a few lines of JavaScript and be analyzing your application in a few minutes. For example:

# app/views/layouts/application.html.erb:# app/views/pricing/show.html.erb:# app/views/pricing/paid.html.erb:

These few little snippets of code will allow you to track user progress through a paid-plan flow; you’ll be able to see which users are looking at which plans, breaking them down by their current plan and/or which plan they looked at, track conversion rates through the flow, and so on.

Mixpanel is centered around events (like User Looked at Pricing Plan or User Signed Up for Pricing Plan), which are emitted when you call mixpanel.track (and which are the basic unit of pricing for Mixpanel), andproperties (like currentPlan, newPlan, or oldPlan), which are included with events and which are free. One of the keys to a high-quality Mixpanel integration is to pass lots and lots of properties: the more properties you pass, the more ways you’ll have to analyze your data. This is particularly helpful when applied in a speculative fashion: if you work to pass lots of data now, then the number of historical analyses you’ll be able to do in a month, six months, or a year goes up greatly. When you’re staring at the data, trying to figure out what’s going on, it’s so much nicer to think “ah, let me look at X!” than to think “gee, I really wish I’d measured X — maybe if I add it now, I can answer this question in another two months”.

Expanding the implementation

Let’s expand our implementation. We might want to add a few more properties about the user; Mixpanel lets us do this using something called super properties (using the register call), which are passed with every event once set:

# app/views/layouts/application.html.erb:

And let’s pass considerably more properties on each event. The price of a plan is a critical factor, as are its capabilities; we definitely want to record those as well as the plan name, as we might change a plan’s price or capabilities over time. We’ll also throw in plan IDs, since the names might change:

# app/views/pricing/show.html.erb:# app/views/pricing/paid.html.erb:

From a Mixpanel point of view, this is a lot more powerful: we can now do calculations based on differences in price, slice and dice by yearly and monthly prices, support type, max users, and so on. We’ll be able to answer questions like “what was the effect in upgrades when we increased the maximum number of users from 3 to 5 on the middle-level plan, for those users who were still on the basic plan?”. We’ll be able to be clear and consistent in our data, and observe historical prices and capabilities of plans, even if we change them later.

…And now, the problems ensue

However, from a code point of view, this is the leading edge of turning into what we technical folks call a big ol’ mess. Our code is verbose, it isn’t DRY at all, and, as a result, it’s very error-prone. (How many of you noticed that I accidentally passed the name of the old plan twice in the second example, rather than the ID and the name?) And this is with just two events — can you imagine what it’s going to look like when we have twenty, or fifty?

Further, maintaining this code in the long run is going to be a nightmare. If we add another property to plans that we’re interested in monitoring, we have to go update every single event and add that property, or we’ll have inconsistent data. If we want to change property names — again, we have to go update every single call site. We have two events right now; in a real production system, we might have thirty or fifty. Yuck.

Not quite as obvious, but perhaps even more important, is the fact that having a clean record of exactly what an event means — and changes that might affect that event! — is critical for correct analysis. For example, if we decide to show pricing plans to everybody directly on their home page, the number of events for User Looked at Pricing Plan is going to skyrocket, and so the conversion rates to User Signed Up for Pricing Plan are going to plummet. Sure,you might be able to remember this, right now — but in another year, when you have six more people looking at it, is everybody going to remember all of the fifteen different significant changes you made over that year when looking at your results? There has to be a better way, right?

It may help to consider, on a more theoretical level, what’s happening when you use Mixpanel effectively. The properties you pass are effectively a snapshot of various parts of your database; the user is likely the single most important part of that snapshot, but there are plenty of other objects that contribute, too. That might be a database row representing a user-to-user communication, a taxi ride, a stay overnight, or a search engine, depending on your domain, but essentially you are reflecting a denormalized snapshot of a chunk of your database to Mixpanel with each event — this is how it can be so effective for you. When you consider it this way, it becomes even more clear why adding some structure and mechanism can be of huge advantage: with the right framework, you ought to be able to gather that database information and pass it very easily, almost implicitly, rather than having to maintain huge lists of properties all over your application.

What About Super Properties? Mixpanel’s “super properties”, while incredibly useful, can also be problematic. The implementation of these is straightforward: Mixpanel’s library issues a permanent cookie to your end user that records the current set of “super properties” that is registered; when firing an event client-side, it simply merges these properties in with any specified in the event. This is a really simple, useful, and powerful model, and is great when you’re starting out. However, there are several caveats: perfect updating is required — if you change data server-side and forget to re-call Mixpanel.register, that data will be perpetually incorrect in Mixpanel; inaccessible server-side — if you fire events server-side (and, in our experience, you inevitably will have to at some point, like email generation or background tasks), you simply won’t have access to that data at all; easy to tamper with — it’s really easy for users to change their own super properties. As you’ll see below, our MetaEvents library replaces “super properties” with implicit properties, which largely eliminate all these issues.

Introducing MetaEvents

We’d love to introduce you to Swiftype’s solution for all these problems: the MetaEvents RubyGem. Let’s take a look at what our code from the above example would look like using MetaEvents. First, we define methods on some of our models that convert them to properties, and set up MetaEvents in our ApplicationController:

# app/models/user.rb
  def to_event_properties
    { :signup_date => created_at, :account_type => account_type,
       :signin_count => signin_count }
  end

# app/models/plan.rb
  def to_event_properties
    { :id => id, :name => name, :monthly_price => monthly_price,
       :yearly_price => yearly_price, :max_users => max_users,
       :support_type => support_type }
  end

# app/controllers/application_controller.rb
  def meta_events_tracker
    @meta_events_tracker ||= MetaEvents::Tracker.new(
      current_user.id, request.remote_ip, :current_user => current_user)
  end

And now we can fire events from our controllers just this easily:

#app/controllers/plan.rb
  def show
    @plan = Plan.find(params[:id])
    meta_events_tracker.event!(:plan, :show, :plan => @plan)
  end

  def pay
    # ...
    meta_events_tracker.event!(:plan, :paid,
      :old_plan => @old_plan, :new_plan => @new_plan)
  end

(Here, we’re firing events server-side; we’ve found this to be more flexible and consistent than client-side events, but it’s just as easy to fire the events client-side, if you prefer.)

Several interesting things are happening here:

  • MetaEvents allows us to pass implicit properties on every single request (the MetaEvents::Tracker.new call); this is like Mixpanel’s “super properties”, only more reliable (because they’re guaranteed always up-to-date) and in your full control;
  • MetaEvents lets us pass objects as properties; it expands them using their #to_event_properties method, and integrates them into events, prefixing them with whatever key you passed them in with;
  • MetaEvents provides a flexible model for firing events server-side; it’s easy to fire them asynchronously using Resque or a similar system.

We’re still passing through every bit as much data as before, only now it’s completely DRY. Adding properties is a piece of cake, properties will be completely consistent across events, and 100% up-to-date information about the current user will be passed on every single event.

Finally, let’s look at what config/meta_events.rb, which defines our events, might look like a year from now:

category :plan do
  event :show, "2014-03-03", "user looks at a pricing plan" do
    note "2014-06-14", "pburkart", "we moved plan display onto the dashboard...vast increase in displays"
    note "2014-07-11", "lweyand", "moved off dashboard unless a user was over plan limits"
    note "2014-10-17", "mvellez", "more aggressive visual display of plan on dashboard"
    note "2014-12-09", "lweyand", "holiday promotion lightbox added"
    note "2015-01-05", "mvellez", "holiday promotion lightbox removed"
  end

event :paid, "2014-03-03", "user pays for a new pricing plan" do
  note "2014-08-09", "mvellez", "removed street address from payment form -- turns out we don't need it"
  note "2014-08-17", "lweyand", "added PayPal support"
  note "2014-11-13", "pburkart", "upped free trial from 30 to 60 days"
  note "2014-12-14", "pburkart", "reduced free trial back to 30...turns out 60 didn't make any difference"
  end
end

Not only does this file become a canonical record of what events you’re firing (as adding an event here is required before you can fire it), it also becomes a historical record of changes to your events. This tree is even exposed via MetaEvents, so you could easily turn this into an HTML report for the pointy-haired bosses around.

MetaEvents, defined

MetaEvents is a RubyGem that provides a framework for structuring your events, efficiently exposing large numbers of properties, adding implicit properties based on the currently logged-in user and browser, and firing events either server-side or client-side. When used in a large-scale Ruby application:

  • You’ll be able to understand your events — forever. MetaEvents provides a Ruby-based DSL to document your events; you will have a permanent record of what each event is for, when it was introduced, and any changes you’ve made. This alone will probably make your product managers and business folks very happy, if experience is any guide.
  • You’ll pass far more properties, and they’ll be consistent across all events. MetaEvents encourages you to define #to_event_properties methods on your models, and then pass entire models to its methods; it then automatically merges all properties of those models into the event. Now, when you think “hey, I wonder if…”, you’re much more likely to already have that data in Mixpanel for weeks or months than to have to add it now.
  • You’ll capture environmental/contextual data automatically. MetaEvents lets you define implicit properties, which are fired with every single event and are typically properties from the currently-logged-in user, browser, account, or so on. Better than Mixpanel’s “super properties” because they come from your database right now and work server-side, these further increase the axes along which you can do analysis.
  • You’ll still reap these benefits when firing events from the client. MetaEvents provides very easy ways to define events server-side and fire them client-side, either automatically on links or via any mechanism you choose.

Implementing MetaEvents doesn’t take long at all, and it can happily coexist with your existing Mixpanel code — you should be up and running within an hour, tops, and be able to expand rapidly from there.

Get started with MetaEvents now!

If you need to track users who aren’t logged in (and who doesn’t?), you might also want to take a look at our WebServerUid RubyGem, which provides an easy way to generate a unique browser ID for visitors.

MetaEvents and a predecessor system have been used in two different large-scale Rails web applications, providing detailed analysis at scale: over 500,000 events per day, with between 20-50 properties each, all with extremely little maintenance or overhead. Although its release is recent, the ideas have proved themselves over the course of several years. It is also thoroughly tested and documented; we think you’ll find the code easy to read and well-structured.

WebServerUid: Easy Unique Browser IDs for Rails & Better Analytics

Here at Swiftype, one of the ways we work hard to improve our product is to use various analytics tools (including our own search analytics!) to watch how people are using our website, so we know what’s working for our customers and what isn’t. We use (and love!) Mixpanel, Google Analytics, and in-house tools built around various databases and log files.

When you’re doing these kinds of analytics, giving each user — or, especially, visitor — a unique ID is paramount. For logged-in users, this is easy; you just use their user ID. For visitors, you need to generate some kind of synthetic ID and use that.

This is easy enough to do: generate a UUID or sufficiently-large random number, hand it to the visitor in a cookie that never expires, and be done with it. wipes hands Done!

…well…almost.

There’s one problem — and it’s a big deal. Most web servers have, at best, a lot of difficulty logging outbound Set-Cookie headers; they typically only log inbound Cookie headers from cookies the client already has. This means you won’t get this generated ID in the log line for the very first request the user makes — and this is the very most important single HTTP request they will ever make, because it tells you how they found your site and what page they landed on. You can see what’s going on below; the cookie simply isn’t present in the request that your server is logging:
Web_Server_UID_01_No_Cookie
Fortunately, both Apache and nginx provide modules that solve this problem quite nicely. Apache’s mod_uid and nginx’s httpuseridmodule both can generate a unique token for each visitor, issue it to them in a cookie, and add it to your HTTP log file, even on the first request.

Let’s do this with nginx, by adding the following to our /etc/nginx/nginx.conf (by default, nginx compiles in support for http_userid_module already):

userid on;
userid_name brid;
userid_path /;
userid_expires max;

proxy_set_header X-Nginx-Browser-ID-Got $uid_got;
proxy_set_header X-Nginx-Browser-ID-Set $uid_set;

This tells nginx to generate a unique ID for all requests, to store it in a cookie named brid, to set its expiration to the maximum time allowed, and to pass it back to our Rails site in a header. There are actually two headers, because $uid_got will contain any inbound value for the ID ( i.e. , from a cookie the client already had) and $uid_set will contain an outbound value for the ID ( i.e. , that which nginx generated for this request).

Now, our situation looks like this:

Web_Server_UID_02_Cookie_Set

We can read this value from Rails by looking at request.env['HTTP_X_NGINX_BROWSER_ID_SET'], which will contain a value like brid=D07FA8C019EA0753B600AD0F02030303. If we configure our nginx logs to output the contents of $uid_set, we’ll get it there, too. On the next request, we’ll get the exact same string in request.env['HTTP_X_NGINX_BROWSER_ID_GOT'], instead, because nginx is telling us it’s a UID passed by the client, rather than generated by nginx itself:

Web_Server_UID_03_Cookie_Got

Past the first request, we’ll also get the value in cookies[:brid] — but in a different format. nginx sends us the ID as a hex string, but the cookie it sends to the client is Base64-encoded, so, while it represents the same value, it looks like wKh/0FMH6hkPrQC2AwMDAg== instead.

These are two incompatible formats, and neither of them is ideal if you want to store these values in your database — you may wish to save on precious buffer cache by using the most-compact possible format of pure binary data. Further, these IDs actually have internal structure; they’re generated using things like the IP address of the server, the start time of the web server process, the process ID, and a sequence token, which can be of use.

To help with this, we’re proud to offer WebServerUid, a small Ruby class that can:

  • Read any of these formats (hex, Base64, or binary);
  • Return any of these formats;
  • Compare and hash itself cleanly ( i.e. , <, >=, <=>, ==, and hash all work correctly);
  • Expose the internal structure of the UID (things like the IP address of the generating server, the PID of the generating process, and that process’s start time are in there);
  • Generate new UIDs from scratch, using the same algorithm as Apache and nginx.

Using WebServerUid, it takes about one minute to configure your web server to generate unique IDs for visitors and have them easily accessible in your Rails application. Our ApplicationController contains something like this:

def current_browser_id
  WebServerUid.from_header(request.env['HTTP_X_NGINX_BROWSER_ID_SET'], 'brid') ||
  WebServerUid.from_header(request.env['HTTP_X_NGINX_BROWSER_ID_GOT'], 'brid') ||
  WebServerUid.from_base64(cookies['brid'])
end

…and we can now store simply current_browser_id.to_binary_string in our database for a highly-efficient storage format; we can use this class on the way out to transform it to a more human-readable format.

Using these techniques, you can have unique IDs added throughout your Web stack in a matter of a half-hour or so. Enjoy!

If you enjoyed this post, subscribe to our blog newsletter for more tips such as this one or transparently storing MongoDB BSON IDs in a RDBMS.

ObjectIdColumns: Transparently Store MongoDB BSON IDs in a RDBMS

Here at Swiftype, we use both MongoDB and MySQL to store some of our core metadata — not search indexes themselves, but users, accounts, search engines, and so on. As we’ve migrated data from MongoDB to MySQL, we’ve found ourselves needing to store the primary keys of MongoDB documents in MySQL.

While it’s possible to use more-or-less arbitrary data in MongoDB as your _id, very, very frequently you will simply use MongoDB’s built-in ObjectId type. This is a data type similar in concept to a UUID; it can be generated on any machine at any time, and the chance it will be globally-unique is still extremely high. Some relational databases offer native support for UUIDs; we thought, why shouldn’t we teach Rails how to get as close to that ideal as possible with ObjectIds, too?

The result has been our objectid_columns RubyGem, which we are proud to release as open source under the MIT license. Using ObjectIdColumns, you can store MongoDB ObjectId values as a CHAR(24) or VARCHAR(24) (which stores the hexadecimal representation of the ObjectId in your database), or as a BINARY(12), which stores an efficient-as-possible binary representation of the ObjectId value in your database.

No matter how you choose to store this data, it’s automatically exposed from your ActiveRecord models as an instance of the bson gem’s BSON::ObjectId class, or the moped gem’s Moped::BSON::ObjectId class. (ObjectIdColumns is compatible with both equally; the two are extremely similar.)

my_model = MyModel.find(...)
my_model.my_oid # => BSON::ObjectId('52eab2cf78161f1314000001')

You can assign values as an instance of either of these classes, or as a String representation of an ObjectId — in either hex or pure-binary forms — and it will automatically translate for you:

my_model.my_oid = BSON::ObjectId.new # OK
my_model.my_oid = "52eab32878161f1314000002" # OK
my_model.my_oid = "R\xEA\xB2\xCFx\x16\x1F\x13\x14\x00\x00\x01" # OK

ObjectIdColumns even transparently supports queries; the following will all “just work”:

MyModel.where(:my_oid => BSON::ObjectId('52eab2cf78161f1314000001'))
MyModel.where(:my_oid => '52eab2cf78161f1314000001')
MyModel.where(:my_oid => 'R\xEA\xB2\xCFx\x16\x1F\x13\x14\x00\x00\x01'))

Enjoy! Head on over to the objectid_columns GitHub page for more details, or just drop gem ‘objectid_columns’in your Gemfile and go for it!

If you enjoyed the tips in this tutorial, make sure to bookmark our blog and subscribe for more announcements like our new Swiftype Ruby Gem.

Our Cloud Stack at Swiftype

Swiftype site search was featured as LeanStack’s service of the week. As part of that I wrote a guest blog post about how Swiftype uses cloud services to run our business.

“Implementing a better product with less hassle is really only half the advantage of using a service like ours. The other half — which doesn’t seem to get as much marketing play — is that by leveraging the product of a company dedicated to a single, specific technology, you realize the gains of having a full-time team of domain experts dedicated to improving your search feature, without assuming any of the cost. At Swiftype we spend all of our time thinking about, developing, and iterating on search, and every time we ship an improvement, all of our customers reap the benefits instantly. Our experience has shown that at most companies it can be a full-time job just maintaining an internal search system, much less improving it over time. When search isn’t a core competency of your company, we believe you’re better off letting us take care of the details. And of course the same philosophy applies to our company as well, which is why we leverage so many existing cloud-based services in our daily operations. Anywhere that we can save time and resources using a product that another company focuses their full effort on delivering is a win for us, because it allows us to spend our resources on what we do best — building great search software.”

Read the post to learn more about our cloud stack and the services we use.

If you liked this post, please remember to bookmark our blog and subscribe to our newsletter. We’ll be posting announcements and more from the Swiftype team, as well as our friends and partners who power their search with Swiftype, such as Laughing Squid.

How We Use Swiftype to Understand our Customers

Paul Graham’s advice to entrepreneurs is simple – “Make something people want.” Make being the easy part and what people want being the much harder part. In the startup world, there are several interesting techniques for figuring out what people want. Customer Development, User Surveys, Crowdsourced idea generation etc. However my recent favorite is Swiftype’s weekly analytics email. Let me explain.

Quickly See What People Are Searching For

The following screenshot is from Swiftype’s sample report:

Top searches by number of queries

The first section of the email let’s you see at a glance what your users are searching for. We use Swiftype to power our documentation search, so our search terms tell us what our users most need help with. The top search for us right now is “email.” This make senses because our users typically want to know how to setup email. The top few keywords gave us a good sense of what our users are looking for right after signing up and have helped us shape up our product tour.

Figure out What New Stuff to Build

The second section of the email is more interesting. You can see which searches returned no results at all:

Top searches with No Results

In our case, the missing searches could mean one of the two things: * A feature/functionality that we have but which is missing documentation. * A feature that we don’t have.

For us it’s mostly the latter. For example the top result for us in this category is “reports”, since we don’t have reporting yet (our early adopters did not care for it but we are working on it now). Using this feature we also realized that people are looking for integrations like Pivotal, JIRA etc. Based on this, we decided to work on a hosted app platform that we will be rolling out in a few weeks.

Either way, we learn exactly where we need to improve. It could be improvements to an existing feature (adding documentation, improving the UX) or ideas for new features. Used with other techniques like user interviews and analytics, Swiftype has really helped us improve our app. In the future, we plan on using Swiftype to power our app directory search so we can find out ideas for new apps. The same technique can be applied to your marketing site as well.

Sitemap.xml Support for Swiftype

At Swiftype we’re always working on new ways to improve the quality of the crawl of your website, and today we’re announcing Swiftype crawler support for the Sitemap.xml protocol.

The Sitemap.xml protocol is a well-documented and widely implemented standard for specifying exactly which set of URLs you would like web crawlers to index on your website, and if your website supplies a sitemap.xml file to our crawler we will dutifully follow your specifications as our crawler builds a search index for your website.

If you aren’t familiar with Sitemap.xml files, we’ll take you through a quick tutorial here, and there is additional information in our documentation section as well as the official protocol page.

To get started, create a simple sitemap.xml file. An example sitemap.xml that specifies 3 URLs might look as follows:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yourdomain.com/</loc>
  </url>
  <url>
    <loc>http://www.yourdomain.com/faq/</loc>
  </url>
  <url>
    <loc>http://www.yourdomain.com/about/</loc>
  </url>
</urlset>

Next, you’ll put the sitemap.xml file on your web server at a location that is accessible by our crawler. Many sites place the sitemap at the root of the domain (i.e. http://www.yourdomain.com/sitemap.xml), but any location is fine. Whatever location you choose, you should specify the location in your Robots.txt file as follows:

User-agent: *
Sitemap: http://www.yourdomain.com/sitemap.xml

If you’re unfamiliar with the Robots.txt file, you can find more information at the official Web Robots page.

Once your robots.txt file is updated and your sitemap.xml file has been uploaded you’re finished. The next time the Swiftype crawler visits your website we’ll recognize your sitemap.xml file follow the links you specify.

As always, if you’re having trouble or want more information, feel free to get in touch. Also, don’t forget to follow the blog so you don’t miss out on great content from our friends like Bob Hiler from Mixergy.

Exclude Unwanted Content with Swiftype

Are there parts of your site you won’t want indexed? We’ve got you covered.

To exclude parts of your website by path, you can use Path Exclusions. You can exclude pages starting with, containing, or ending with the text you specify. For advanced users, we also support regular expression matches.

To add a path exclusion, click on a crawler-based engine, then select the Domains tab, then the domain to which you want to add path exclusions.

 

As you type your exclusion, we’ll show you a sample of the pages that will be removed from the index.

Once you’re happy with the exclusions, hit the Recrawl button to put them into effect.

On an individual page, you can exclude content (for example, your header or footer) by adding the data-swiftype-index attribute set to false.

Here’s an example:

An example page with content exclusion
  

 

This is your page content, which will be indexed by the Swiftype crawler.

This content will be indexed, since it isn’t surrounded by an excluded tag.

 

By combining Path Exclusions and Content Exclusion, you can precisely control how your website is indexed by Swiftype.

As always, if you have trouble, please reach out.

Announcing Swiftype: Modern Search for Sites and Apps

Swiftype founders Matt Riley and Quin Hoxie

Today we are announcing Swiftype, the best way to add search to your site or app. Quin and I have long been frustrated by how hard it is to add good search to web sites and apps, so we decided to do something about it.

Swiftype has an API you can use to index arbitrary content, but we also can crawl your site so you can get started in minutes. We’ll create a search engine literally while you watch. You can install it on your site using a simple JavaScript embed and your users will enjoy great, fast search results and autocompletion. In addition, Swiftype lets you customize search results with drag-and-drop and gives you detailed analytics about the queries your users are making.

Swiftype is already powering search for customers like Twilio, TwitchTV, Listia, and Fastly.

If you’d like to hear more or discuss adding Swiftype search to your site or app, please reach out. Also, be sure to keep an eye on the Swiftype blog – we’ll be releasing frequent updates.

You can read more about our launch on TechCrunch, or give us a try.

Subscribe to our blog