EXPERIMENTATION WORKS: The Surprising Power of Business Experiments

Stefan Thomke

The following is an excerpt from EXPERIMENTATION WORKS: The Surprising Power of Business Experiments by Stefan Thomke (Harvard Business Review Press). In it, he argues that when it comes to improving customer experiences, trying out new business models, or developing new products, even the most experienced managers often get it wrong. They discover that intuition, experience and big data alone don’t work. What does work? Running disciplined business experiments. The book shows what constitutes best experimentation practice, illustrates how that works at leading companies, and answers some fundamental questions. What makes a good experiment? How can testing work in online and brick-and-mortar businesses? In B2B and B2C? How do companies build an experimentation culture? 

In 2015, IBM wasn’t much of an experimentation organization. The company’s IT function offered testing services, but the costly experiments (thousands of dollars per test) were charged back to business units and had to follow a rigid process. Service capacity was limited to just one testing specialist, who was also the gatekeeper; many proposed experiments weren’t accepted unless he felt that they were strong enough candidates for a “win.” The result: the company ran only ninety-seven tests in 2015. With a one-specialist bottleneck, no user-friendly testing tools, and low awareness in business groups, this low number should not have been surprising. And if the goal was to limit testing on IBM’s business customers—to keep this capability out of the hands of marketing groups—some managers certainly didn’t mind the small scale. The problem, of course, was that it’s hard to find many princes if you’re kissing only ninety-seven frogs per year.

This all changed when IBM’s testing philosophy shifted from central control to democratization, in spite of objections from the CIO’s office. Ari Sheinkin, vice president of marketing analytics, with support from Michelle Peluso, IBM’s new CMO, took over business experimentation. Sheinkin declared: “Making decisions through real-time feedback is my dream of how an organization should be run, and large-scale testing was at the core.” This meant convincing and empowering over fifty-five hundred marketers worldwide to run their own tests. To get started, Sheinkin’s team selected scalable, easy-to-use testing tools, introduced a framework for disciplined experiments, and made online tests free for all business groups—no more charge-backs. (Marketing analytics paid all support and software license fees from a central budget.) A Center of Excellence, which grew to twelve people by 2018, supported marketers in all aspects of designing and running experiments, making testing easy. Sheinkin explained: “Our communication made it clear that this was a new way of working, not just another path to get work done.”

Even with extra resources, organizational changes, and new tools, widening the scope of involvement required creative interventions. To get marketing units across all geographies to run their first experiment, IBM ran a “testing blitz” during which a total of thirty online experiments had to be run in thirty days. Units identified leaders who later played an important role in rolling out testing to more groups. Modifications to web pages were kept simple and structured: color changes, headline text, layout of buttons. Even though most tests didn’t result in any statistically significant improvements, a few were spectacularly successful. Because web pages had never been optimized with scientific rigor, key performance indicators jumped by more than 100 percent. Some groups worried that their landing pages didn’t have enough customer traffic to run meaningful tests. This led to a focus on the most important landing pages and a consolidation of web pages with low traffic. It also raised important questions: Did IBM really need millions of web pages if most were rarely visited?
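
The traffic concern can be made concrete with a standard power calculation. The sketch below is an illustration only, not IBM's method: it uses the textbook normal-approximation formula for comparing two conversion rates, and the function name, the 2 percent baseline rate, and the default significance level and power are all assumptions chosen for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variant of a two-sided A/B test.

    baseline_rate: conversion rate of the current page (hypothetical input).
    relative_lift: smallest relative improvement worth detecting (0.10 = +10%).
    alpha, power:  conventional defaults, assumed here rather than taken from the text.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2

    # Standard normal quantiles for the significance level and the desired power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Normal-approximation sample size for a two-proportion z-test.
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical landing page converting 2% of visitors.
    for lift in (0.10, 0.25, 1.00):
        n = visitors_per_variant(0.02, lift)
        print(f"relative lift {lift:.0%}: about {n} visitors per variant")
```

Under these assumed inputs, detecting a 10 percent relative lift takes tens of thousands of visitors per variant, while detecting a doubling of the conversion rate takes on the order of a thousand. Small effects are out of reach for rarely visited pages, which is consistent with the decision to concentrate tests on high-traffic pages and consolidate the rest.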

To reorient IBM’s culture toward experimentation, management followed a three-pronged approach: rituals, repetition, and recognition. Interventions included quarterly contests for the most innovative or most scalable experiments. Winners received publicity in a company newsletter and trips to professional conferences, where they could listen and speak to thought leaders and other experimentation practitioners. The growing testing community at IBM could also follow blogs, attend office hours with questions, and receive training at all levels of expertise. In a nutshell, corporate support was available to anyone interested in running experiments.

But not all interventions involved rewards; at times, IBM also had to change its policies to drive behavior. So, for example, marketing units were told that they could no longer draw from corporate budgets unless an experimentation plan was in place. Even when it came to spending their own ad budgets, marketers were strongly encouraged to start with such a plan. The policy changes were the result of an important insight: one-off experiments, even if there were many, often lacked follow-up action and iteration. An experimentation plan required a more holistic approach that considered how hypotheses informed each other, the sites on which they were run, goals and metrics linked to business outcomes, projected sample sizes, implementation steps, and so on. Most of all, a good plan allowed for iterations and led to the exploration and optimization of much bolder themes, such as “the introduction of emotional elements in online business-to-business interactions.” Not all experiments were about establishing the causal relationship between treatments and performance variables. Some experiments added value by moving teams away from a local optimum—the best solution in a small neighborhood of possibilities—and allowed for fresh approaches to improving customer experiences, or by appealing to a new breed of customers (e.g., young people who had never experienced IBM).

IBM’s efforts to democratize experimentation worked. The firm rolled out the new testing platform to twenty-three business units in 170 countries. In 2017, the company ran a total of 782 tests, involving nearly a quarter of marketers worldwide (see table 6-1). Some tests now involved personalization for customer experiences. As IBM had gotten better at running experiments with scientific precision and collecting massive data on individual customers, it could now test tailored experiences for smaller—and more homogenous—customer groups.

In 2018, the number of tests surged to 2,822, and by then, hundreds of marketers had become serious testers. Interest from other business groups grew as well: 12 percent of experiments originated outside marketing units. But, according to Sheinkin, more work is needed: “For many marketers, experimentation is still number three on their list of priorities. The top two items are usually day-to-day responsibilities, such as preparing for the next big meeting. Running experiments needs to become number one.” An ongoing cultural challenge is for people to develop a truly experimental mindset, which is more than just running experiments encouraged by senior leaders. IBM found that possibly the hardest group to get on board was middle management, whose traditional role of translating executive direction into action was upended by this new way of managing—that is, following the scientific method, whenever possible and in real time, and making decisions by experiments.

Reprinted by permission of Harvard Business Review Press. Excerpted from EXPERIMENTATION WORKS: The Surprising Power of Business Experiments by Stefan H. Thomke. Copyright 2020 Stefan H. Thomke. All rights reserved.

Stefan Thomke joined the Technology and Operations Management division at Harvard Business School (HBS) in 1995, on completion of his doctoral studies at MIT. Thomke's research is focused on the process, economics, and management of experimentation and testing in the context of innovation management. As a leading faculty member at HBS, Thomke has worked with practitioners across the world on the management of innovation and development of product, process, and technology.

Thomke's highly cited research has been published extensively as research articles, case studies, and notes in books and in leading journals such as California Management Review, Harvard Business Review, Journal of Product Innovation Management, Management Science, Organization Science, Research Policy, Sloan Management Review, Strategic Management Journal, and Scientific American.
