Draft revising

Introduction

The digital era and the increased use of the internet have led to the accumulation of enormous amounts of data, which must be processed before it makes sense to whoever owns it. The growing volume of data makes it easier to apply in social, economic and scientific fields. Current trends will generate more and more data, which calls for mathematics to make that data meaningful. Data processing draws on several families of mathematical algorithms, including statistical learning, signal analysis, distributed optimization and compressed sensing, among others.

Literature review

Snijders et al. (2012) set out to identify the sources of these large volumes of data, how much information can be extracted from them, and how that extraction can be done. There are roughly 11 zettabytes of data in the world, produced by everyday activities such as watching TV, making phone calls, surfing the web, buying and selling, and travelling (Snijders et al., 2012). The connection between this enormous amount of data and mathematics is that the information in big data can only be extracted with statistical techniques, mostly machine learning algorithms, applied by data scientists (Snijders et al., 2012). This argument is supported by the findings of Lohr (2012) that big data is the mathematics of effectiveness: when a lot of data is held in the same place, it can be normalized to conform to set standards, central rules and taxonomies, which reveals interesting patterns and outliers. Outliers may arise from the sheer volume of data or from human error. Statistical techniques can identify and remove them, and the patterns that remain can inform critical strategies in various fields, especially business, which is what keeps big data useful (Lohr, 2012).
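
The outlier removal described above can be illustrated with a small sketch. It assumes one common statistical rule, Tukey's interquartile-range fences, which is not necessarily the technique the cited authors had in mind. The function name and the sample figures are hypothetical.

```python
import statistics


def split_outliers(values, k=1.5):
    """Separate values into (kept, outliers) using Tukey's IQR fences."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Hypothetical daily order counts; 95 stands in for a data-entry error.
daily_orders = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
kept, outliers = split_outliers(daily_orders)
print("kept:", kept)
print("outliers:", outliers)
```

The multiplier k controls how aggressive the rule is. A larger value keeps more of the unusual points, which matters when an apparent outlier is a genuine signal rather than an error.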

Chen et al. (2014) report that most big data takes the form of images and that most of it comes from the internet, specifically social media. This creates a need for mathematical algorithms that classify, analyze, interpret and compress images to draw more meaningful information from them. The growth of such algorithms has brought pure mathematics into the analysis and interpretation of images, replacing the statistical techniques that have traditionally been used for this kind of analysis (Chen et al., 2014). Computers help apply highly technical tools from the theory of equations to the complicated equations involved. One area of pure mathematics that is important in image classification is algebraic topology, a branch that uses tools from algebraic structures to study a set of points together with a set of neighborhoods around each point. To see how the various components of an image fit together, category theory is used; this theory investigates mathematical structures and concepts at a highly abstract level (Chen et al., 2014).

In a survey comparing traditional data processing techniques with Big Data, Chang et al. (2014) find that traditionally the types of data available were limited and specific technologies existed to handle them. Computing devices were used to store and process data, but the recent growth in data volumes has forced these older methods out because they can no longer cope with big data (Chang et al., 2014). Big data has since been divided into structured, semi-structured and unstructured data. Structured data includes relational databases and spreadsheets, in which similar items with similar characteristics are stored (Chang et al., 2014). Unstructured data includes email bodies, photos, social media posts and multimedia, which are complicated and do not follow any particular rule, so they cannot be evaluated with standard statistical methods. Semi-structured data includes emails and EDI messages, which group similar entities together without requiring them to possess the same attributes (Chang et al., 2014). These data sets prove too complex for traditional data processing techniques to handle.
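
To make the three categories concrete, the following sketch shows one possible way each might look in practice. The records, field names and the helper has_uniform_schema are all hypothetical and serve only as illustration.

```python
import json

# Structured: every record has the same fields, as in a table or spreadsheet.
structured_rows = [
    {"customer_id": 1, "name": "Ana", "city": "Lisbon"},
    {"customer_id": 2, "name": "Ben", "city": "Leeds"},
]

# Semi-structured: similar entities, but the attributes differ per record.
semi_structured = [
    json.loads('{"customer_id": 1, "email": "ana@example.com"}'),
    json.loads('{"customer_id": 2, "phones": ["555-0100"], "tags": {"vip": true}}'),
]

# Unstructured: free text with no fields at all.
unstructured = "Hi, I loved the film last night but the stream froze twice."


def has_uniform_schema(records):
    """Return True when every record exposes exactly the same set of keys."""
    return len({frozenset(record) for record in records}) == 1


print(has_uniform_schema(structured_rows))   # True
print(has_uniform_schema(semi_structured))   # False
```

The unstructured example has no keys to compare, which is why it needs different tools, such as text or image analysis, before any statistics can be applied.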

The results of the survey by Chang et al. (2014) support the findings of LaValle et al. (2011), who studied the major changes in data processing brought about by the Big Data era. LaValle et al. (2011) state that, under traditional data processing, computers were explicitly programmed to perform specific tasks, and they performed the tasks within their reach perfectly. Programming machines in this way for today's tasks would be tedious and is in some cases impossible, hence the need for machine learning. In machine learning, machines are trained on images and voice recordings to help identify statistical irregularities in the data (LaValle et al., 2011). The computers follow mathematical algorithms that help them focus only on the relevant patterns; this is the contribution of mathematics to big data, and also to computer science and engineering. LaValle et al. (2011) explain that the computers are trained to transform a new input into a reasonable output. The algorithms used to process big data are developed by mathematicians.

Waller & Fawcett (2013) examined whether the unsupervised learning techniques used in machine learning could replace traditional data processing techniques and advance further in the future. They found that unsupervised learning is often used in machine learning to identify hidden connections in data sets. The technique aims to reduce the workload by identifying points that define a two-dimensional or three-dimensional space in place of the original multi-dimensional space (Waller & Fawcett, 2013). "This will reduce the need to have several data points and instead it would be easier to create a surface on which loops, folds and kinks can be identified to help explore the data" (Waller & Fawcett, 2013). This method is already in use, but as the data landscape keeps changing, more and more methods will be needed to solve the same problems.
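
The idea of replacing a multi-dimensional space with a two-dimensional one can be sketched with principal component analysis (PCA). PCA is a swapped-in, standard technique chosen for brevity; the source does not say which method Waller & Fawcett describe. The function names and the synthetic data are assumptions made for illustration.

```python
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    return [[r[d] - means[d] for d in range(dims)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    dims = len(matrix)
    vec = [1.0 / (k + 1) for k in range(dims)]  # arbitrary non-zero start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(dims))
                for i in range(dims))
    return value, vec


def pca(rows, n_components=2):
    """Project rows onto their leading principal components.

    Returns the projected rows and the share of total variance they retain.
    """
    centered = center(rows)
    cov = covariance(centered)
    dims = len(cov)
    total = sum(cov[i][i] for i in range(dims))
    components, explained = [], 0.0
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        explained += value
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(dims)]
               for i in range(dims)]
    projected = [[sum(r[d] * c[d] for d in range(dims)) for c in components]
                 for r in centered]
    return projected, explained / total


# Synthetic 3-D points that lie close to a plane, so two dimensions suffice.
rng = random.Random(0)
points = []
for _ in range(200):
    x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
    points.append([x, y, 2 * x - y + rng.gauss(0, 0.01)])

projected, ratio = pca(points, n_components=2)
print(len(projected[0]), "dimensions kept; variance retained:", round(ratio, 4))
```

PCA only finds flat, linear structure. The loops, folds and kinks mentioned in the quotation would call for non-linear or topological methods, which are beyond this sketch.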

Chen, Chiang & Storey (2012) investigated the impact of Big Data and its analytics on business intelligence and management. Their findings make it clear that the ability of mathematicians to extract important information from enormous data sets is of great benefit to business people. "Analytics is an important area where big data transforms businesses" (Chen, Chiang & Storey, 2012). Analytics lets a business make important decisions; the applications include analyzing customer behavior and preferences, reducing churn, lowering the cost of new customer acquisition, improving up-selling, cross-selling and customer targeting, allocating investments accurately across sales channels, improving channel management, and measuring campaigns precisely. These are some of the contributions big data has made to businesses through analytics.

In conclusion, the arrival of the big data era has created opportunities for improvement in several scientific disciplines. To realize them, experts will also have to address the challenges that come with big data. These include scale, privacy, error handling, lack of structure, heterogeneity, timeliness and visualization at all stages of data processing. The challenge of scale, for example, can be addressed by research into ways of making processors work faster. Scale is a major challenge because data volumes keep increasing while processor speeds remain the same, so computing resources fall behind the growing volume of data (McAfee & Brynjolfsson, 2012). Timeliness is becoming a problem because the larger the data set, the longer it takes to analyze, and computers have not yet been adapted to process enormous data sets quickly (McAfee & Brynjolfsson, 2012). Privacy has been an issue both socially and technically, since ensuring the privacy of very large data sets is difficult when data has to be linked from several sources (McAfee & Brynjolfsson, 2012). Addressing these challenges will help realize the benefits of Big Data.

But the mathematics and computer science approach is only half of the equation when it comes to big data and its possible effects on industry. With the technological advancements of the past decade, big data has become the new standard for how businesses run. More and more companies are not only starting to see the value of big data but have also begun to see the risk of being left behind if they fail to incorporate data science into their models. An Ernst & Young presentation on how big data is changing the landscape of modern business put it best: "The idea of data creating business value is not new, however, the effective use of data is becoming the basis of competition" [1]. The way the world works is fundamentally changing because of this newfound practice of collecting consumer data to create profiles, personalize products and ads, and find ways to reach new consumers. The EY presentation went on to say that "[c]ompanies that invest in and successfully derive value from their data will have a distinct advantage over their competitors - a performance gap that will continue to grow as more relevant data is generated, emerging technologies and digital channels offer better acquisition and delivery mechanisms, and the technologies that enable faster, easier data analysis continue to develop" [1]. So what, then, is big data? Big Data is "the dynamic, large and disparate volumes of data being created by people, tools and machines; it requires new, innovative and scalable technology to collect, host and analytically process the vast amount of data gathered in order to derive real-time business insights that relate to consumers, risk, profit, performance, productivity management and enhanced shareholder value" [1].

Surely if a company as invested in consumer markets and practices as EY can see the value in harnessing the power of data to fuel decision making, the average business owner can extrapolate that value to their own business. An increasingly digital world has opened up a whole new avenue for how we interact with people and, as is the nature of most things digital, lends itself to collecting and recording information. Take the example EY provides of the movie rental industry. In the days of locally owned video stores, customer profiles were skin deep, and movie recommendations were based on the opinions of the owner or employees and on customer purchases at that one location. With the advent of online streaming services and the advancement of big data technology, recommendations can now be made on the fly. They combine a customer's past rental history, data on what all of the business's other customers watched after seeing similar movies, and the customer's own likes and dislikes as shown by their review history. The result is detailed, accurate recommendations that make it more likely the consumer stays longer and spends more money on the service [1]. In this way, businesses are becoming better than ever at creating unique experiences that drive value for each individual customer. And this increased value proposition has not gone unnoticed. "In a survey of the state of business analytics by Bloomberg Businessweek (2011), 97 percent of companies with revenues exceeding $100 million were found to use some form of business analytics" [2].
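
The recommendation logic in the movie rental example can be sketched as a simple overlap-based recommender. This is a minimal illustration under stated assumptions and not the method any particular streaming service uses. The customer names, viewing histories and the recommend function are all hypothetical.

```python
from collections import Counter

# Hypothetical viewing histories keyed by customer.
histories = {
    "ana": {"Alien", "Blade Runner", "Arrival"},
    "ben": {"Alien", "Blade Runner", "Arrival", "Dune"},
    "cy": {"Alien", "Dune"},
    "dee": {"Notting Hill", "Amelie"},
    "eve": {"Alien", "Blade Runner"},
}


def recommend(user, histories, top_n=2):
    """Suggest unseen titles watched by customers with overlapping tastes.

    Each other customer votes for the titles the user has not seen, and the
    vote is weighted by how many titles the two customers have in common.
    """
    seen = histories[user]
    scores = Counter()
    for other, titles in histories.items():
        if other == user:
            continue
        overlap = len(seen & titles)
        if overlap == 0:
            continue  # no shared taste, so this customer's history is ignored
        for title in titles - seen:
            scores[title] += overlap
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]


print(recommend("eve", histories))  # ['Arrival', 'Dune']
```

A production system would also weigh review scores and recency, as the EY example suggests, but the core idea of borrowing evidence from similar customers is the same.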

Essentially, what all this information boils down to is that big data has taken the world by storm. Besides its usefulness as a predictive tool, big data has itself sparked the growth of an entirely new business analytics industry. The prevailing belief in business since the advent of big data technology can be summed up as follows: "[d]ata-driven decisions are better decisions" and "big data enables managers to decide on the basis of evidence rather than intuition" [3]. As such, data analytics as an industry has taken off. Forbes writer Bernard Marr wrote that "savvy businesses will start to offer data services to even very small companies" [4]. But even with this recognition that big data is the future of business, there is still a gap in both technology and ability when it comes to the analytical part of the equation. "A report last year by the McKinsey Global Institute, the research arm of the consulting firm, projected that the United States needs 140,000 to 190,000 more workers with 'deep analytical' expertise and 1.5 million more data-literate managers, whether retrained or hired" [5]. The shortage of qualified analytical experts is likely to be addressed in the coming years as more and more businesses, large and small alike, begin to see and act on the value of big data analytics. That shift rests largely on the widely shared view that big data is "a new class of economic asset, like currency or gold" [5]. Just as the microscope allowed scientists to see and measure things at the cellular level, a feat that revolutionized research, big data is now allowing businesses and managers to see a company's operations at an increasingly fine-grained level [5].

Big data isn't without its drawbacks, however. Plenty of new challenges have come up as a result of the new way of doing business. For one, "[p]rivacy advocates take a dim view [of business analytics], warning that Big Data is Big Brother, in corporate clothing" [5]. With the collection of data, which as stated earlier has become almost akin to currency in the digital age, privacy breaches have become a more serious concern than ever. Cyber security and legislation on information recording have come to the forefront of consumers' minds. Cyber security, for instance, has become a billion-dollar industry in its own right, as companies spend millions of dollars trying to protect both the sensitive information they collect, such as credit card details, and general data, such as consumer habits, that could ruin a competitive advantage if leaked. Another problem stems from how new big data is as an analytical tool. EY states that "[w]hile the ability to capture and store vast amounts of data has grown at an unprecedented rate, the technical capacity to aggregate and analyze these disparate volumes of information is only just now catching up" [1]. And as such, "[w]ith huge data sets and fine-grained measurement...there is increased risk of 'false discoveries.'" [5]. As Stanford statistics professor Trevor Hastie says, "the trouble with seeking a meaningful needle in massive haystacks of data is that many bits of straw look like needles" [5]. The tools for making accurate use of collected information are constantly evolving, and their models sometimes make inaccurate predictions or assumptions. For instance, "[a] model might spot a correlation and draw a statistical inference that is unfair or discriminatory, based on online searches, affecting the products, bank loans and health insurance a person is offered" [5].
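
The risk of false discoveries can be demonstrated with a small simulation. The sketch below generates purely random features and counts how many appear correlated with an equally random target. The function names, sample sizes and the threshold are assumptions chosen for illustration; 0.361 is the conventional two-sided 5% critical value of the correlation coefficient for 30 observations.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)


def count_chance_hits(n_features, n_samples, threshold, seed=0):
    """Count random features whose correlation with a random target
    exceeds the threshold purely by chance."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


# 1,000 meaningless features, 30 observations each.
hits = count_chance_hits(n_features=1000, n_samples=30, threshold=0.361)
print(hits, "of 1000 random features look 'significant'")
```

Around 5% of the features are expected to pass the test even though none of them carries any information. This is the "bits of straw" problem in miniature, and it is why analyses that screen many variables usually apply a correction for multiple comparisons.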

References

Chang, R. M., Kauffman, R. J., & Kwon, Y. (2014). Understanding the paradigm shift to computational social science in the presence of big data. Decision Support Systems, 63, 67-80.

Chen, H., Chiang, R. H., & Storey, V. C. (2012). Business intelligence and analytics: From big data to big impact. MIS Quarterly, 36(4).

Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications, 19(2), 171-209.

LaValle, S., Lesser, E., Shockley, R., Hopkins, M. S., & Kruschwitz, N. (2011). Big data, analytics and the path from insights to value. MIT Sloan Management Review, 52(2), 21.

Lohr, S. (2012, February 11). The age of big data. The New York Times.

McAfee, A., & Brynjolfsson, E. (2012). Big data: The management revolution. Harvard Business Review, 90(10), 60-68.

Snijders, C., Matzat, U., & Reips, U. D. (2012). "Big Data": Big gaps of knowledge in the field of internet science. International Journal of Internet Science, 7(1), 1-5.

Waller, M. A., & Fawcett, S. E. (2013). Data science, predictive analytics, and big data: a revolution that will transform supply chain design and management. Journal of Business Logistics, 34(2), 77-84.

[1] Ernst and Young. "Big Data: Changing the Way Businesses Compete and Operate." EY.com/publication. April 2014.

[2] Chen, Hsinchun, Roger HL Chiang, and Veda C. Storey. "Business intelligence and analytics: From big data to big impact." MIS Quarterly 36.4 (2012).

[3] McAfee, Andrew, and Erik Brynjolfsson. "Big data: the management revolution." Harvard Business Review 90.10 (2012): 60-68.

[4] Marr, Bernard. "4 Ways Big Data Will Change Every Business." Forbes. Forbes Magazine, 08 Sept. 2015. Web. 26 July 2017.

[5] Lohr, Steve. "The age of big data." The New York Times, 11 Feb. 2012.