Everybody Lies
By: Seth Stephens-Davidowitz
Allison’s Rating:
Everybody Lies by Seth Stephens-Davidowitz uses humor and personal exposition to analyze current social issues with data science methodologies. The purpose of the book is not to build a foundation in the mathematics or statistics behind the data, but to show how to identify where big data can improve decision making on all issues, big and small. Stephens-Davidowitz dives deep into Facebook data, Google searches for porn, and baseball statistics. While these particular applications of data science may not be meaningful or profitable for businesses, understanding the fundamental powers of big data is paramount. Throughout the chapters, Stephens-Davidowitz’s data-driven passion for solving puzzles rings true, and he gives the reader an outline of what he frames as the four powers of Big Data.
Power 1: New Types of Data
The first power identified by Stephens-Davidowitz is that data is emerging from sources all around us. This new data (data reimagined) offers a new perspective on old problems. The primary focus of new data in Everybody Lies is digital data gathered from social media, mouse clicks, and page views. This data equips researchers to run quick litmus tests with cheap, easy-to-collect data. Applied to companies, this recalls the issue Stephens-Davidowitz dissected regarding slow government processing. Companies, particularly late adopters of Big Data, are bogged down by legacy systems that cause data latency, by competing priorities and disparate data sources, and by a general lack of the “data know-how” needed to harness this digital data. With these obstacles in mind, success is measured not by an increase in the data collected, but by structuring and looking at the data already collected in a new, iterative way.
Power 2: Providing Honest Data
Stephens-Davidowitz notes that research answers are often surprising, at times even counterintuitive. Companies that are hesitant to trust the output or recommendation of a model will struggle to accept one that flies in the face of past tradition. For this reason, it is essential that a company let the data tell a story and cultivate a culture that trusts the data it collects. This foundation of trust in data sources is nonnegotiable when seeking to make sound data-driven decisions, and even Stephens-Davidowitz acknowledges this caveat in his Facebook research and analysis (pg 151).
For some companies, this could be a dramatic change from the previous behavior of fitting the data to a narrative decided in advance. Stephens-Davidowitz addresses this issue by letting data “lead us from problems to solutions” (pg 162) and letting the data kindle curiosity by showcasing multiple perspectives and telling a powerful, data-fueled story. These companies will first require a shift in perspective toward big data. When a company has strong personalities with years of “tribal knowledge,” it is important to strike a balance, using this “small data” to complement and fill the gaps (pg 256) in decisions derived from big data.
One challenge I had with Stephens-Davidowitz’s perspective on collecting new digital data is the claim that “you just need to know that something works, not why” (pg 71). When this notion was introduced, I thought of the ethical implications (albeit addressed in later chapters) and the dangers of blindly trusting data without first doing a cognitive check for potential bias in the data set. In his example of strawberry Pop-Tarts and storms there may be little potential for harmful bias, but when this “don’t ask questions” mindset is applied to models for loan applications (pg 258) or jail time (pg 235), there could be serious ethical implications. It becomes a delicate balance between trusting the model and relying on human intervention when necessary.
Power 3: Zooming In on Big Data Sets
The third power of big data is the ability to bring clarity to an issue amid a sea of data. Stephens-Davidowitz calls this dissection of data “zooming in,” and it helps researchers gain new insights from small portions of big data sets (pg 171). By paying close attention to the granularity of the data that big data (particularly digital data) makes available, researchers can solve seemingly complex problems with ease.
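To make “zooming in” a little more concrete, here is a minimal Python sketch. The records, field names (`region`, `device`, `converted`), and helper functions are all hypothetical and made up for illustration; they are not from the book. The idea it shows is that a metric which looks flat across the whole data set can split sharply once you slice it by a more granular variable.

```python
from collections import defaultdict

# Hypothetical, made-up click records for illustration only.
RECORDS = [
    {"region": "east", "device": "mobile", "converted": True},
    {"region": "east", "device": "mobile", "converted": False},
    {"region": "east", "device": "desktop", "converted": True},
    {"region": "east", "device": "desktop", "converted": False},
    {"region": "west", "device": "mobile", "converted": False},
    {"region": "west", "device": "mobile", "converted": False},
    {"region": "west", "device": "desktop", "converted": True},
    {"region": "west", "device": "desktop", "converted": True},
]


def conversion_rate(rows):
    """Fraction of rows with converted == True (0.0 for an empty slice)."""
    if not rows:
        return 0.0
    return sum(r["converted"] for r in rows) / len(rows)


def zoom_in(rows, **filters):
    """Keep only the rows whose fields match every key=value filter."""
    return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


def breakdown(rows, key):
    """Conversion rate for each distinct value of `key`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return {value: conversion_rate(group) for value, group in groups.items()}


if __name__ == "__main__":
    print("overall:", conversion_rate(RECORDS))
    print("by region:", breakdown(RECORDS, "region"))
    west = zoom_in(RECORDS, region="west")
    print("west, by device:", breakdown(west, "device"))
```

On these toy records, the overall and per-region conversion rates are all 0.5, but zooming into the west region and breaking it down by device separates mobile (0.0) from desktop (1.0). Real data is rarely this tidy, which is exactly the cleaning problem raised next.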
One challenge with the simple notion of “zooming in” on data is the messy and often disparate state of the raw data. Companies must weigh the cost of cleaning and structuring the data against the value gained through analysis. Stephens-Davidowitz suggests that instead of tackling the entire data set, it is possible to zoom in on key variables and explore their relationship to the problem at hand. This raised a question for me: how do companies know when to stop cleaning or collecting data and start working with the data already collected? My team uses Google Analytics to track user engagement with our website, and requests to track clicks or page views for particular activities are common. But when is enough enough? When does the development cost of tracking an action outweigh the return on investment? One root cause of this issue is building solutions for problems the business believes exist, instead of listening to what the data tells us and basing decisions about engineering work on the data narrative. Stephens-Davidowitz would encourage companies that struggle with this dichotomy to “zoom in” and dive into the data.
Power 4: Continuous Causal Experiments
The fourth and final power of big data, as explained by Stephens-Davidowitz, is the newfound ability to run numerous inexpensive causal experiments. These rapid, randomized, controlled experiments expose causation, as opposed to mere correlation (pg 208). Working on an e-commerce team, I found this tenet of big data particularly interesting, especially since my company uses the same rapid testing tool, Optimizely. In the digital age, there is little to no cost associated with conducting A/B tests that can directly increase revenue for businesses. Still, companies must strike a balance with this causal testing, because “testing literally everything” (pg 217) works for digital mouse clicks, but not for understanding the biomechanics of a shoe or the popularity of shoe colors two years from now.
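For readers curious what reading an A/B test result can look like, below is a minimal Python sketch of one common approach: reproducible random assignment of users to variants, followed by a two-sided two-proportion z-test on conversion counts. This is not how Optimizely or the book computes results; the z-test is simply a standard choice, and the function names, seed, and counts are hypothetical.

```python
import math
import random


def assign_variant(user_id: str, seed: int = 2024) -> str:
    """Randomly, but reproducibly, assign a user to variant 'A' or 'B'.

    Seeding a private RNG with the user id means the same user always
    lands in the same variant across visits.
    """
    rng = random.Random(f"{seed}:{user_id}")
    return "A" if rng.random() < 0.5 else "B"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates (B minus A).

    Returns (z, p_value) using the pooled standard error.
    """
    if min(n_a, n_b) <= 0:
        raise ValueError("each variant needs at least one user")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share_a = sum(assign_variant(u) == "A" for u in users) / len(users)
    print(f"share assigned to A: {share_a:.3f}")

    # Hypothetical results: 200/5,000 conversions for A, 250/5,000 for B.
    z, p = two_proportion_z_test(200, 5_000, 250, 5_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

With these hypothetical counts (4% versus 5% conversion on 5,000 users each), z comes out around 2.4 and the two-sided p-value around 0.016. The random assignment is what does the causal work here: because chance, not user behavior, decides who sees which variant, a difference in outcomes can be attributed to the variant rather than to a lurking correlation.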
Closing Thoughts
I “sooooo” (pg 83) enjoyed reading Everybody Lies. The social questions Stephens-Davidowitz addresses are topical, and at times the findings were hilarious (and, as Stephens-Davidowitz warned, completely counterintuitive to my initial assumptions!). I wish I had read this book before my Data Analytics and Visualization classes. I found myself sharing my newly acquired random facts with family and friends, and jokingly telling them to stock up on strawberry Pop-Tarts as a storm rolled through. Most importantly, Stephens-Davidowitz sparked in me a curiosity to see whether data can help solve my problems. It is this thirst for answers that will allow researchers to look at data from new perspectives and help businesses use data, coupled with intuition, to solve the world’s seemingly complex and insurmountable problems.