The Facebook Data Science Team announced its first software release, saying in a note on its page that it is making available an open-source version of its PlanOut tools for A/B testing and other field experiments.
Part of our job as data scientists is to inform strategic and product decisions. Does a new feature we are testing improve communication? Does having more friends on Facebook increase the value people get out of the service? While a correlation between variables may suggest a particular causal relationship, it is hard to use such data to credibly answer many of these questions because of difficult-to-adjust-for confounding factors. Furthermore, when you change the rules of the game — like launching a completely new feature — there is often no existing data at all from which to anticipate the effects of the change.
Because of this, data scientists, engineers, and managers turn to randomized experiments, which are commonly referred to as “A/B tests.” Typically, A/B tests are used as a kind of “bake-off” between proposed alternatives. But experiments can also go beyond bake-offs and be used to develop generalizable knowledge that is valuable throughout the design process.
Despite the abundance of experimental practices in the Internet industry, there are few tools or standard practices for running online field experiments. And existing tools tend to focus on rolling out new features, or automatically optimizing some outcome of interest.
On PlanOut specifically, they wrote:
PlanOut gives engineers and scientists a language for defining random assignment procedures. Experiments — ranging from simple A/B tests, to factorial designs that decompose large interface changes, to more complex within-subjects designs — can be expressed with only a few lines of code. In this way, PlanOut encourages running experiments that are more akin to the kind you see in the behavioral sciences.
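To make the idea of a random assignment procedure concrete, here is a minimal sketch of the technique such a language compiles down to: hashing a unit (e.g., a user id) together with a per-parameter salt so that each unit deterministically receives the same assignment on every request. The function and parameter names below are illustrative, not PlanOut's actual API.

```python
import hashlib

def uniform_choice(choices, unit, salt):
    """Deterministically map a unit (e.g., a user id) to one of the given
    choices by hashing the unit together with an experiment-specific salt.
    The same unit always receives the same choice for a given salt."""
    digest = hashlib.sha1(f"{salt}.{unit}".encode()).hexdigest()
    return choices[int(digest, 16) % len(choices)]

def assign(userid):
    """A toy 2x2 factorial design: each user is independently assigned a
    button color and a button label (parameter names are hypothetical)."""
    return {
        "button_color": uniform_choice(["#3b5998", "#44bec7"], userid, "color_exp"),
        "button_label": uniform_choice(["Post", "Share"], userid, "label_exp"),
    }

params = assign(42)
```

Because assignment is a pure function of the unit and the salt, no per-user assignment state needs to be stored, and a factorial design falls out of using independent salts for each parameter.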
Logging can also be a pain point for many experiments. Logging is often treated as separate from the randomization process, which can make it difficult to keep track of, or even define, exactly which units (e.g., user accounts) are in your experiments. This is especially problematic if engineers change experiments by adding or removing treatments midway through the experiment. PlanOut helps reduce these kinds of errors by automatically logging exposures and providing a way of tying outcomes to experiments. Finally, this kind of exposure logging increases the precision of experiments, which means fewer false negatives (“Type II errors”).
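The key idea behind exposure logging can be sketched in a few lines: an exposure record is written the first time a unit's parameters are actually requested, so the log itself defines which units are in the experiment. This is an illustration of the technique under assumed names, not PlanOut's implementation.

```python
import hashlib

class Experiment:
    """Sketch of automatic exposure logging: a record is appended the
    first time a unit's parameters are requested, so the exposure log
    defines exactly which units are in the experiment."""
    def __init__(self, name):
        self.name = name
        self.exposure_log = []   # stand-in for a real logging backend
        self._exposed = set()

    def _assign(self, userid):
        digest = hashlib.sha1(f"{self.name}.{userid}".encode()).hexdigest()
        return {"variant": ["control", "treatment"][int(digest, 16) % 2]}

    def get_params(self, userid):
        params = self._assign(userid)
        if userid not in self._exposed:   # log each unit once, on first exposure
            self._exposed.add(userid)
            self.exposure_log.append(
                {"experiment": self.name, "userid": userid, **params})
        return params

exp = Experiment("button_color_exp")
exp.get_params(1)
exp.get_params(1)   # repeated request: no duplicate exposure record
exp.get_params(2)
```

Tying outcomes back to the experiment then reduces to joining outcome events against this exposure log on the unit id, and precision improves because only actually-exposed units enter the analysis.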
A single experiment is rarely definitive. Instead, follow-on experiments might need to be run as development of a new product takes place, or as decision-makers evaluate the effects of a major product rollout. While it is fairly straightforward to run a single experiment, there are few established protocols for how follow-on experiments should be run, and how data for these experiments should be logged. PlanOut includes a management system that organizes experiments, and the parameters they manipulate, into namespaces. This allows distributed teams to work on related features, and launch follow-on experiments in a way that minimizes threats to validity.
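A common way to implement this kind of namespace management, which the passage above describes at a high level, is to hash units into a fixed number of segments and let each experiment claim a disjoint block of segments. The sketch below illustrates that scheme with hypothetical names; it is not PlanOut's actual namespace implementation.

```python
import hashlib

class Namespace:
    """Sketch of a namespace: units are hashed into a fixed number of
    segments, and each experiment claims a disjoint block of segments,
    so concurrent or follow-on experiments manipulating the same
    parameters never overlap in the units they touch."""
    def __init__(self, name, num_segments=100):
        self.name = name
        self.num_segments = num_segments
        self.available = set(range(num_segments))
        self.segment_to_experiment = {}

    def add_experiment(self, exp_name, num_segments):
        # Claim currently unused segments for this experiment.
        claimed = sorted(self.available)[:num_segments]
        for seg in claimed:
            self.segment_to_experiment[seg] = exp_name
        self.available -= set(claimed)

    def lookup(self, userid):
        """Return the experiment this unit belongs to, or None for the
        namespace default (no experiment)."""
        digest = hashlib.sha1(f"{self.name}.{userid}".encode()).hexdigest()
        segment = int(digest, 16) % self.num_segments
        return self.segment_to_experiment.get(segment)

ns = Namespace("button_color")
ns.add_experiment("v1_test", 10)        # first experiment: 10% of units
ns.add_experiment("v2_followup", 10)    # follow-on: a disjoint 10%
```

Because segment allocation is disjoint, a follow-on experiment is guaranteed fresh units that were never exposed to the earlier treatment, which is what minimizes threats to validity across a distributed team.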