The purpose of this presentation assignment is to research emerging technologies that affect businesses and society and to examine how individuals, businesses, and government organizations go about protecting us from the risks those technologies pose.

Does Big Data Provide the Answer? Case Study

Today’s companies are dealing with an avalanche of data from social media, search, and sensors, as well as from traditional sources. According to one estimate, 2.5 quintillion bytes of data per day are generated around the world. Making sense of “big data” to improve decision making and business performance has become one of the primary opportunities for organizations of all shapes and sizes, but it also represents big challenges.

Businesses such as Amazon, YouTube, and Spotify have flourished by analyzing the big data they collect about customer interests and purchases to create millions of personalized recommendations for books, films, and music. A number of online services analyze big data to help consumers, including services for finding the lowest price on autos, computers, mobile phone plans, clothing, airfare, hotel rooms, and many other types of goods and services. Big data is also providing benefits in sports (see the Interactive Session on Management), education, science, health care, and law enforcement.

Healthcare companies are currently analyzing big data to determine the most effective and economical treatments for chronic illnesses and common diseases and to provide personalized care recommendations to patients. For example, the state of Rhode Island has been using InterSystems’ HealthShare Active Analytics tool to collect and analyze patient data on a statewide level. The state’s Quality Institute found that about 10 percent of major lab tests performed in over 25 percent of the state’s population were medically unnecessary—a discovery that has since helped Rhode Island tighten spending as well as improve quality of care. Big data analytics are helping researchers pinpoint how variations among patients and treatments influence health outcomes. For instance, big data’s granularity could help experts detect and diagnose multiple variants of asthma, pointing physicians to the precise treatment plan called for by each patient’s unique case.

There are limits to using big data. A number of companies have rushed to start big data projects without first establishing a business goal for this new information or key performance metrics to measure success. Swimming in numbers doesn’t necessarily mean that the right information is being collected or that people will make smarter decisions. Experts in big data analysis believe that too many companies, seduced by the promise of big data, jump into big data projects only to end up with nothing to show for their efforts. They start amassing mountains of data with no clear objective or understanding of exactly how analyzing big data will achieve their goal or what questions they are trying to answer. Organizations also won’t benefit from big data that has not been properly cleansed, organized, and managed—think data quality.

Big data does not always reflect emotions or intuitive feelings. For example, when LEGO faced bankruptcy in 2002–2003, the company used big data to determine that Millennials have short attention spans and are easily bored. The message from the data led LEGO to de-emphasize its small iconic bricks in favor of large, simplistic building blocks. This change only accelerated LEGO’s decline, so the company decided to go into consumers’ homes to try to reconnect with once-loyal customers. After meeting with an 11-year-old German boy, LEGO discovered that, for children, playing and showing mastery in something were more valuable than instant gratification. LEGO then pivoted again and, following its successful 2014 movie, emerged as the world’s largest toy maker. Patterns and trends can sometimes be misleading.

Huge volumes of data do not necessarily provide more reliable insights. Sometimes the data being analyzed are not a truly representative sample of the data required. For example, election pollsters in the United States have struggled to obtain representative samples of the population because a majority of people do not have landline phones. It is more time-consuming and expensive for pollsters to contact mobile phone users, who now constitute 75 percent of some samples. U.S. law bans autodialing cell phones, so pollsters have to dial numbers by hand individually and make more calls, since mobile users tend to screen out unknown callers. Opinions expressed on Twitter do not reflect the opinions of the U.S. population as a whole. The elderly, the poor, and introverts, who tend not to use social media or even computers, are often excluded.
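The sampling problem described above can be illustrated with a small simulation. All numbers here are hypothetical (not actual polling data): the simulation simply shows that a large poll drawn from a non-representative slice of the population still produces a biased estimate.

```python
import random

random.seed(42)

# Hypothetical population: 75% mobile-only, 25% landline users.
# Suppose 60% of mobile-only users support a candidate vs. 40% of landline users.
population = ([("mobile", random.random() < 0.60) for _ in range(75_000)] +
              [("landline", random.random() < 0.40) for _ in range(25_000)])

true_support = sum(s for _, s in population) / len(population)

# A landline-only poll samples a non-representative slice of the population.
landline_only = [s for kind, s in population if kind == "landline"]
poll = random.sample(landline_only, 1000)
poll_estimate = sum(poll) / len(poll)

print(f"True support:  {true_support:.1%}")   # roughly 55%
print(f"Landline poll: {poll_estimate:.1%}")  # roughly 40% -- biased low
```

The poll here is large (1,000 respondents), yet it misses true support by about 15 percentage points, because more data from the wrong sample does not correct the bias.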

Although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, big data analysis doesn’t necessarily show causation or which correlations are meaningful. For example, examining big data might show that the decline in the U.S. crime rate was highly correlated with the decline in the market share of video rental stores such as Blockbuster. But that doesn’t necessarily mean there is any meaningful connection between the two phenomena. Data analysts need some business knowledge of the problem they are trying to solve with big data.
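A short numerical sketch makes the point. The figures below are illustrative, not real crime or retail data: any two series that both trend steadily in the same direction over time will show a very high correlation, even with no causal link between them.

```python
# Two made-up series that both decline over 2004-2019: a city's crime index
# and the number of video rental stores. Neither causes the other; both
# simply trend downward over time.
years = list(range(2004, 2020))
crime_index   = [100 - 4 * i for i in range(len(years))]   # steady decline
rental_stores = [900 - 55 * i for i in range(len(years))]  # steady decline

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Both series are linear in time, so they correlate perfectly
# even though neither has any causal connection to the other.
print(pearson(crime_index, rental_stores))  # 1.0
```

A correlation of 1.0 here is an artifact of shared trend, which is exactly why analysts need domain knowledge before treating a detected correlation as meaningful.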

Just because something can be measured doesn’t mean it should be measured. Suppose, for instance, that a large company wants to measure its website traffic in relation to the number of mentions on Twitter. It builds a digital dashboard to display the results continuously. In the past, the company had generated most of its sales leads and eventual sales from trade shows and conferences. Switching to Twitter mentions as the key metric to measure changes the sales department’s focus. The department pours its energy and resources into monitoring website clicks and social media traffic, which produce many unqualified leads that never lead to sales.

All data sets and data-driven forecasting models reflect the biases of the people selecting the data and performing the analysis. Google developed what it thought was a leading-edge algorithm using data it collected from web searches to determine exactly how many people had influenza and how the disease was spreading. It tried to calculate the number of people with flu in the United States by relating people’s location to flu-related search queries on Google. Google consistently overestimated flu rates when compared to conventional data collected afterward by the U.S. Centers for Disease Control and Prevention (CDC). Several scientists suggested that Google was “tricked” by widespread media coverage of that year’s severe flu season in the United States, which was further amplified by social media coverage. The model developed for forecasting flu trends was based on a flawed assumption—that the incidence of flu-related searches on Google was a precise indicator of the number of people who actually came down with the flu. Google’s algorithm only looked at numbers, not the context of the search results.
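The flawed-proxy problem can be sketched with a toy model. This is not Google’s actual algorithm, and every number below is invented: the sketch only shows how a model fit on seasons where searches tracked cases will overshoot once media coverage inflates search volume without a matching rise in actual illness.

```python
# Toy sketch (made-up numbers): a model that treats search volume as a direct
# proxy for flu incidence, fit on past seasons where the two moved together.

# Training seasons: cases per 100k people, and a flu-search query index.
train_cases    = [20, 35, 50, 65]
train_searches = [200, 350, 500, 650]

# Fit a simple least-squares line: cases ~ a * searches (no intercept).
a = (sum(c * s for c, s in zip(train_cases, train_searches))
     / sum(s * s for s in train_searches))

# A severe-season news cycle doubles search volume, but actual cases
# rise only 20% -- the proxy breaks down.
actual_cases   = 78    # 65 * 1.2
hyped_searches = 1300  # 650 * 2

predicted = a * hyped_searches
print(f"Predicted {predicted:.0f} cases per 100k vs. actual {actual_cases}")
```

The model is “correct” on its training data yet badly overestimates the new season, because the relationship it learned between searches and illness no longer holds once context (media hype) changes search behavior.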

The New York Police Department (NYPD) recently developed a tool called Patternizr, which uses pattern recognition to identify potential criminals. The software searches through hundreds of thousands of crime records across 77 precincts in the NYPD database to find a series of crimes likely to have been committed by the same individual or individuals, based on a set of identifying characteristics. In the past, analysts had to manually review reports to identify patterns, a very time-consuming and inefficient process. Some experts worry that Patternizr inadvertently perpetuates bias. The NYPD used 10 years of manually identified pattern data to train Patternizr, removing attributes such as gender, race, and specific location from the data. Nevertheless, such efforts may not eliminate racial and gender bias in Patternizr if race and gender played any role in past police actions used to model predictions. According to Gartner Inc. analyst Darin Stewart, Patternizr will sweep up individuals who fit a profile inferred by the system. At best, Stewart says, some people identified by Patternizr will be inconvenienced and insulted. At worst, innocent people will be incarcerated.
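The general idea of matching crime records on identifying characteristics can be illustrated with a toy similarity score. This is not Patternizr’s actual algorithm, and the records and attribute names below are invented; the sketch only shows how records sharing non-sensitive attributes can be ranked, and why excluding sensitive fields does not undo bias already baked into the historical records themselves.

```python
# Toy attribute-matching sketch (NOT Patternizr's real method): score how
# similar two crime records are on their shared, non-sensitive attributes.

def similarity(a: dict, b: dict) -> float:
    """Fraction of shared non-sensitive attributes whose values match."""
    excluded = {"race", "gender", "location"}  # deliberately ignored fields
    keys = (set(a) & set(b)) - excluded
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)

new_report = {"method": "lock pick", "time": "night", "item": "electronics"}
past_reports = [
    {"method": "lock pick", "time": "night", "item": "electronics"},
    {"method": "forced door", "time": "day", "item": "jewelry"},
]

# Rank past reports by similarity to the new one; top matches would be
# candidates for a "pattern" that a human analyst then reviews.
ranked = sorted(past_reports,
                key=lambda r: similarity(new_report, r), reverse=True)
print(similarity(new_report, ranked[0]))  # 1.0 -- a perfect attribute match
```

Note that even with race, gender, and location excluded from the score, the ranking can only surface patterns present in the historical reports, so if past enforcement was biased, the matches will reflect that bias.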

Companies are now aggressively collecting and mining massive data sets on people’s shopping habits, incomes, hobbies, residences, and (via mobile devices) movements from place to place. They are using such big data to discover new facts about people, to classify them based on subtle patterns, to flag them as “risks” (for example, loan default risks or health risks), to predict their behavior, and to manipulate them for maximum profit. Privacy experts worry that people will be tagged and suffer adverse consequences without due process, the ability to fight back, or even knowledge that they have been discriminated against.

Insurance companies such as Progressive offer a small device to install in your car to analyze your driving habits, ostensibly to give you a better insurance rate. However, some of the criteria for lower auto insurance rates are considered discriminatory. For example, insurance companies like people who don’t drive late at night and don’t spend much time in their cars. However, poorer people are more likely to work a late shift and to have longer commutes to work, which might increase their auto insurance rates.

More and more companies are turning to computerized systems to filter and hire job applicants, especially for lower-wage, service-sector jobs. The algorithms these systems use to evaluate job candidates may be preventing qualified applicants from obtaining these jobs. For example, some of these algorithms have determined that, statistically, people with shorter commutes are more likely to stay in a job longer than those with longer commutes, less reliable transportation, or a short history at their current address. If applicants are asked, “How long is your commute?” those with long commutes will be scored lower for the job. Although such considerations may be statistically accurate, is it fair to screen job applicants this way?

Sources: Grant Wernick, “Big Data, Small Returns,” Data-Driven Investor, January 13, 2020; “Big Data 2020: The Future, Growth and Challenges of the Big Data Industry,” www.i-scoop.com, accessed January 25, 2020; Brian Holak, “NYPD’s Patternizr Crime Analysis Tool Raises AI Bias Concerns,” searchbusinessanalytics.com, March 14, 2019; Lisa Hedges, “What Is Big Data in Healthcare and How Is It Already Being Used?” October 25, 2019; Alex Bekker, “Big Data: A Highway to Hell or a Stairway to Heaven? Exploring Big Data Problems,” ScienceSoft, May 19, 2018; and Gary Marcus and Ernest Davis, “Eight (No, Nine!) Problems With Big Data,” New York Times, April 6, 2014.