A Proposal Report on Data Security in Big Data

Big Data and Mathematical Computation










Introduction

The digital era and the increased use of the internet have led to the accumulation of enormous amounts of data, which need to be processed before they can be of any use. The sheer scale of this data, commonly referred to as big data, has made it applicable and useful in a variety of social, economic and scientific fields. As technology continues to grow, more and more data will inevitably be generated, which in turn calls for advanced mathematics to make the data meaningful. Data processing relies on several families of mathematical algorithms, the most common of which is statistical learning, a framework used in machine learning. Others include signal analysis, an approach to analyzing, modifying and synthesizing signals in order to understand the information and attributes of a given element; distributed optimization; and compressed sensing, a signal processing technique for acquiring and reconstructing signals within linear systems.

So what, then, is the big data which all of these mathematical computations try to make use of? Big Data is "the dynamic, large and disparate volumes of data being created by people, tools, and machines; it requires new, innovative and scalable technology to collect, host and analytically processes the vast amount of data gathered. This is done to derive real-time business insights that relate to consumers, risk, profit, performance, productivity management and enhanced shareholder value" (Ernst and Young 2014). With the advent of companies like Google and Facebook that generate unfathomable amounts of data, companies now have a new way to drive management decisions and impact the future of their industries in new and exciting ways.

Literature Review

Snijders et al. (2012) carried out research to identify the sources of these vast amounts of data, determine how much information could be extracted from them, and conceptualize how that extraction could be done. There are roughly 11 zettabytes of information in the world, produced by everyday activities such as watching TV, making phone calls, surfing the web, buying and selling, and traveling (Snijders et al., 2012). This enormous amount of data is connected to mathematics in that its information can only be extracted with statistical techniques, mostly machine learning algorithms, applied by data scientists (Snijders et al., 2012).

This argument is supported by Lohr (2012), who describes big data as the mathematics of effectiveness: when a lot of data is gathered in one place, it can be normalized to conform to set standards (central rules and taxonomies), which exposes unusual patterns and outliers in the data. Outliers can generally be attributed to the sheer volume of data or, at times, to human error, and statistical techniques can usually identify and eliminate them. The observed patterns have turned out to be critical for strategic planning in various fields, especially business (Lohr 2012).
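
As an illustration of how a statistical technique can flag outliers, the following is a minimal sketch using Tukey's interquartile-range rule. The rule itself is a swapped-in example and is not taken from Lohr (2012); the function name `iqr_outliers`, the multiplier `k`, and the order counts are all hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's IQR fences."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

# Hypothetical daily order counts; 950 stands in for a data-entry error.
orders = [98, 102, 101, 97, 105, 99, 950, 103, 100, 96]
kept, outliers = iqr_outliers(orders)
print(outliers)  # [950]
```

Whether a flagged value should be dropped or investigated remains a judgment call; the rule only marks values that sit far outside the middle half of the data.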

Chen et al. (2014) published a survey which showed that most big data takes the form of images, and that most of it comes from the internet, specifically social media. This created a need for mathematical algorithms that classify, analyze, interpret and compress images to extract more meaningful information from them. The growth in mathematical algorithms has led to the use of pure mathematics in the analysis and interpretation of these images, in place of the statistical techniques that have traditionally been employed (Chen et al. 2014). This is where advanced computing comes in, applying highly technical tools from the theory of equations to solve complicated equations. One area of pure mathematics that is important in the classification of images is algebraic topology, a branch that uses tools from algebraic structures to study a set of points together with a set of neighborhoods at each point. To see how the various components of an image fit together, category theory is used. This approach investigates mathematical structures and concepts at a highly abstract level (Chen et al. 2014).

In a survey comparing traditional data processing techniques with Big Data, Chang et al. (2014) found that the types of data available were traditionally limited, and that specific technologies existed to handle them. Computing devices were used for storing and processing data, but the growing volume of data has recently made these older methods obsolete, as they have become incapable of handling big data (Chang et al., 2014). Big data has since been divided into structured data, semi-structured data, and unstructured data. Structured data includes relational databases and spreadsheets, in which similar items with similar characteristics are stored (Chang et al., 2014). Unstructured data includes email, photos, social media, and multimedia, which are complicated, do not follow any particular rule, and therefore cannot be evaluated using standard statistical methods. Semi-structured data includes emails and EDI (electronic data interchange) messages, which group similar entities together without requiring them to share the same attributes (Chang et al., 2014). These data sets have proven too complex for traditional data processing techniques to handle.
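
To make the distinction concrete, the sketch below takes two semi-structured records that describe the same kind of entity but carry different attributes, and flattens them into the uniform rows a structured table would require. The records, field names, and the helper `to_table` are hypothetical and serve only as an illustration.

```python
import json

# Two hypothetical customer records: same entity type, different attributes.
semi_structured = """[
  {"id": 1, "name": "Ana", "email": "ana@example.com"},
  {"id": 2, "name": "Ben", "phone": "555-0100", "tags": ["vip"]}
]"""

def to_table(records):
    """Flatten records with differing keys into uniform rows.

    Attributes missing from a record are filled with None.
    """
    columns = sorted({key for record in records for key in record})
    rows = [[record.get(col) for col in columns] for record in records]
    return columns, rows

columns, rows = to_table(json.loads(semi_structured))
print(columns)  # ['email', 'id', 'name', 'phone', 'tags']
for row in rows:
    print(row)
```

The gaps that appear as `None` are what separate semi-structured from structured data: a relational table fixes its columns in advance, while each semi-structured record carries only the attributes it actually has.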

The results of the survey by Chang et al. (2014) support the findings of LaValle et al. (2011), which were based on studies to identify the major evolutions in data processing in the Big Data era. LaValle et al. (2011) state that, under traditional data processing techniques, computers were explicitly programmed to perform each task, and they handled these now obsolete procedural analyses perfectly well. Programming machines in the same way for today's tasks can be tedious, or even impossible, which creates the need for machine learning. In machine learning, the machines are trained on data such as images and voice recordings to help identify statistical irregularities in the data (LaValle et al., 2011). The computers follow mathematical algorithms that help them focus only on the relevant patterns, which sums up the major contribution of mathematics to big data, computer science and engineering. LaValle et al. (2011) explain that the machines are trained to transform a new input into a reasonable output. The algorithms used in the processing of big data are established by mathematicians.
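
The idea of training a machine to turn a new input into a reasonable output can be sketched with one of the simplest learning algorithms, an ordinary least squares line fit. This is a deliberately small stand-in, not the method LaValle et al. (2011) describe; the functions `fit_line` and `predict` and the ad-spend figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training data: ad spend (in thousands) vs. units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 19.0, 31.0, 42.0, 48.0]

model = fit_line(ad_spend, units)   # "training"
forecast = predict(model, 6.0)      # output for an input not seen before
print(model, forecast)              # slope 9.5, intercept 1.9, forecast 58.9 (up to float rounding)
```

Nobody programs the relationship between spend and sales by hand here; the slope and intercept are learned from the examples, which is the shift from procedural programming that the paragraph above describes.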

Waller and Fawcett (2013) examined how the unsupervised learning techniques used in machine learning could replace traditional data processing techniques and advance further in the future. They found that unsupervised learning is often used in machine learning to identify hidden connections in data sets. The method aims to reduce the workload by identifying points that help establish a two-dimensional or three-dimensional space in place of a multi-dimensional one (Waller & Fawcett, 2013). “This will reduce the need to have several data points and instead it would be easier to create a surface on which loops, folds, and kinks can be identified to help explore the data” (Waller & Fawcett, 2013). This method is already in use, but as data continues to grow and change, more such methods will be needed to solve the same problems.
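
One common way to reduce a multi-dimensional data set to two dimensions is principal component analysis (PCA). The sketch below is a swapped-in illustration rather than the specific technique Waller and Fawcett (2013) discuss: it projects hypothetical three-dimensional points onto their two directions of greatest variance, using power iteration with deflation. All function names and data are assumptions, and the code assumes the data has at least two directions with nonzero variance.

```python
import math

def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def top_eigenpair(matrix, iterations=500):
    """Power iteration: repeated multiplication approaches the dominant eigenvector."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def pca(rows, k=2):
    """Return (projected rows, k components, k eigenvalues)."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    eigenvalues, components = [], []
    for _ in range(k):
        value, vec = top_eigenpair(cov)
        eigenvalues.append(value)
        components.append(vec)
        # Deflation: remove the direction just found so the next pass finds the next one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(a * b for a, b in zip(row, comp)) for comp in components]
                 for row in centered]
    return projected, components, eigenvalues

# Hypothetical 3-D measurements that lie close to a 2-D plane.
points = [(2.0, 0.0, 1.0), (4.0, 1.0, 2.1), (6.0, -1.0, 2.9),
          (8.0, 0.5, 4.0), (10.0, -0.5, 5.1), (12.0, 0.0, 5.9)]
projected, components, eigenvalues = pca(points, k=2)
for row in projected:
    print([round(x, 3) for x in row])
```

Each point is now described by two coordinates instead of three, and for data like this almost none of the variance is lost, which is what makes the lower-dimensional view useful for exploration.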

Chen, Chiang, and Storey (2012) carried out research to identify the impacts of Big Data and its analytics on business intelligence and management. Their findings make clear that the ability of mathematicians to extract valuable information from extensive data is of great benefit to business people. “Analytics is an important area where big data transforms businesses” (Chen, Chiang, & Storey, 2012), because analytics informs important decisions made by the firm. Its applications include analyzing customer behavior and preferences, reducing churn, lowering the cost of new customer acquisition, improving upselling, cross-selling, and customer targeting, allocating investments accurately across sales channels, improving channel management, and measuring campaigns precisely. These are just some of the contributions that big data has made to businesses through analytics.

Big Data's Application to Business

Using all these new mathematical measures and computations, business has been able to evolve and grow with technology to reach new heights. With the technological advancements of the past decade, big data has become the new norm - the new standard when it comes to businesses and how they run. More and more companies are not only starting to see the value of big data but also have begun to see the potential risk of being left behind if they fail to incorporate data sciences into their models. In an Ernst & Young presentation on big data and how it is changing the landscape of modern business, they said it best when they stated, "The idea of data creating business value is not new. However, the effective use of data is becoming the basis of competition" (Ernst and Young 2014). The way the world works is fundamentally changing based on this newfound practice of collecting consumer data to create profiles, personalize products and ads, and find ways to reach new customers. The EY presentation went on to say that "[c]ompanies that invest in and successfully derive value from their data will have a distinct advantage over their competitors — a performance gap that will continue to grow as more relevant data is generated, emerging technologies and digital channels offer better acquisition and delivery mechanisms, and the technologies that enable faster, easier data analysis continue to develop" (Ernst and Young 2014).

Surely if a company as invested in consumer markets and practices as EY can see the value in harnessing the power of data to fuel decision making, the average business owner can extrapolate that value to their own business. An increasingly digital world has opened up a whole new avenue for how we interact with people and, as is the nature of most things digital, lends itself to collecting and recording information. Take the example EY provides of the movie rental industry. In the days of locally owned video stores, customer profiles were skin deep, and movie recommendations were based on the owner's or employees' opinions and on customer purchases at just that location. With the advent of online streaming services and the advancement of big data technology, recommendations can now be made on the fly. They draw on a customer's past rental history, on what all of the business's other clients watched after seeing similar films, and on what the customer does and doesn't like based on review history. The result is detailed, accurate recommendations that make it more likely the consumer stays around longer and spends more money on the service (Ernst and Young 2014). In this way, businesses are becoming better than ever at creating unique experiences that drive value individually for each customer. And this increased value proposition has not gone unnoticed. "In a survey of the state of business analytics by Bloomberg Businessweek (2011), 97 percent of companies with revenues exceeding $100 million were found to use some form of business analytics" (Chen, Chiang, & Storey 2012).
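
The recommendation logic described above can be sketched as a simple co-occurrence count: titles that other customers watched are scored by how much those customers' histories overlap with the target customer's. This is a toy illustration under stated assumptions, not how any particular streaming service works; the viewing histories and the function `recommend` are hypothetical.

```python
from collections import Counter

def recommend(history, target, top_n=2):
    """Rank titles the target has not seen by overlap-weighted co-occurrence."""
    seen = history[target]
    scores = Counter()
    for user, titles in history.items():
        if user == target:
            continue
        overlap = len(seen & titles)   # shared titles measure similarity of taste
        if overlap == 0:
            continue                   # ignore customers with nothing in common
        for title in titles - seen:
            scores[title] += overlap
    # Sort by score (descending), then title, so ties are broken deterministically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]

# Hypothetical viewing histories.
history = {
    "ana": {"Alien", "Blade Runner", "The Matrix"},
    "ben": {"Alien", "Blade Runner", "Arrival"},
    "cho": {"Alien", "The Matrix", "Arrival", "Dune"},
    "dev": {"Notting Hill", "Amelie"},
    "eli": {"Blade Runner", "Dune"},
}
print(recommend(history, "ana"))  # ['Arrival', 'Dune']
```

A single video store could only see its own customers; the value of scale is that every additional history sharpens the scores for everyone else.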


Analysis

Essentially, what all this information boils down to is that big data has taken the world by storm. Besides its incredible usefulness as a predictive tool, big data has itself sparked the growth of an entirely new business analytics industry. The overwhelming belief in the market today can be summed up like this: "[d]ata-driven decisions are better decisions" and "big data enables managers to decide by evidence rather than intuition" (McAfee & Brynjolfsson 2012). As such, data analytics as an industry has taken off. Forbes writer Bernard Marr wrote that "savvy businesses will start to offer data services to even tiny companies" (Marr 2015). But even with this recognition that big data is the future of business as we know it, there is still a gap in both technology and ability when it comes to the analytical part of the equation. "A report last year by the McKinsey Global Institute, the research arm of the consulting firm, projected that the United States needs 140,000 to 190,000 more workers with “deep analytical” expertise and 1.5 million more data-literate managers, whether retrained or hired" (Lohr 2012). The shortage of qualified analytical experts is likely to ease in the coming years as more and more businesses, large and small alike, begin to see and act on the value provided by big data analytics, largely on the back of the widely shared view that big data is "a new class of economic asset, like currency or gold" (Lohr 2012). Just as the microscope allowed scientists to see and measure things at the cellular level, a feat which revolutionized research, big data is now allowing businesses and managers to see a company's operations at an increasingly fine-grained level (Lohr 2012).

Drawbacks of Big Data

Big data isn't without its drawbacks, however. Plenty of new challenges have come up as a result of the new way of doing business. For one, "[p]rivacy advocates take a dim view [of business analytics], warning that Big Data is Big Brother, in corporate clothing" (Lohr 2012). Because data, as stated earlier, has become almost akin to currency in the digital age, privacy breaches have become a more pressing concern than ever. Cyber security and legislation on information recording have come to the forefront of consumers' minds. Cyber security, for instance, has become a billion-dollar industry in its own right, as companies spend millions of dollars trying to protect both the sensitive information they collect, like credit card information, and general data like consumer habits that could ruin a competitive advantage if leaked. Another unique problem stems from how new big data is as an analytical tool. EY states that "[w]hile the ability to capture and store vast amounts of data has grown at an unprecedented rate, the technical capacity to aggregate and analyze these disparate volumes of information is only just now catching up" (Ernst and Young 2014). And as such, "[w]ith huge data sets and fine-grained measurement...there is increased risk of 'false discoveries.'" (Lohr 2012). As Stanford statistics professor Trevor Hastie says, "the trouble with seeking a meaningful needle in massive haystacks of data is that many bits of straw look like needles" (Lohr 2012). The tools for accurately using the information that is collected are always evolving, and their models sometimes make inaccurate predictions or assumptions. For instance, "[a] model might spot a correlation and draw a statistical inference that is unfair or discriminatory, based on online searches, affecting the products, bank loans and health insurance a person is offered" (Lohr 2012).
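
The risk of false discoveries can be made concrete with a little arithmetic. The sketch below assumes a simplified setting that is not taken from the sources: a number of independent statistical tests, each run at significance level alpha, where no real effect exists in any of them. The function names are hypothetical; the Bonferroni correction shown at the end is a standard, conservative remedy.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of spurious 'discoveries' when every null hypothesis is true."""
    return num_tests * alpha

def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests is a false alarm."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance level that keeps the overall false-alarm rate at or below alpha."""
    return alpha / num_tests

m = 100  # hypothetical number of correlations examined
print(expected_false_positives(m))                    # about 5 spurious findings
print(round(prob_at_least_one_false_positive(m), 3))  # about 0.994
print(bonferroni_threshold(m))                        # about 0.0005 per test
```

With only 100 comparisons, a false alarm is almost guaranteed; big data sets invite far more comparisons than that, which is why so many bits of straw end up looking like needles.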

Conclusion

In conclusion, the arrival of the big data era has created opportunities for improvement in several scientific disciplines. Realizing these improvements will require experts to address the challenges that come with big data. These challenges include scale, error handling, lack of structure, heterogeneity, timeliness, privacy, and visualization at all stages of data processing. Scale is a major challenge because data is continuously increasing in volume while processor speeds remain the same, which limits the computing resources available for the growing data volumes (McAfee & Brynjolfsson, 2012); it can be addressed by research into ways of making processors work faster. Timeliness is another challenge: the larger the data set, the more time is required to analyze it, because computers have not yet been adapted to process such large data quickly (McAfee & Brynjolfsson, 2012). Privacy in Big Data has been an issue both socially and technically, since ensuring the privacy of extensive data sets is difficult when data has to be linked from several sources (McAfee & Brynjolfsson, 2012). Addressing these challenges will be essential to realizing the benefits of Big Data.

References

Chang, R. M., Kauffman, R. J., & Kwon, Y. (2014). Understanding the paradigm shift to computational social science in the presence of big data. Decision Support Systems, 63, 67-80.

Chen, H., Chiang, R. H., & Storey, V. C. (2012). Business intelligence and analytics: From big data to big impact. MIS Quarterly, 36(4), 1165-1188.

Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications, 19(2), 171-209.

Ernst and Young. (2014, April). Big data: Changing the way businesses compete and operate. EY.com/publication.

LaValle, S., Lesser, E., Shockley, R., Hopkins, M. S., & Kruschwitz, N. (2011). Big data, analytics and the path from insights to value. MIT Sloan Management Review, 52(2), 21.

Lohr, S. (2012, February 11). The age of big data. The New York Times.

Marr, B. (2015, September 8). 4 ways big data will change every business. Forbes. Retrieved July 26, 2017.

McAfee, A., & Brynjolfsson, E. (2012). Big data: The management revolution. Harvard Business Review, 90(10), 60-68.

Snijders, C., Matzat, U., & Reips, U. D. (2012). "Big Data": Big gaps of knowledge in the field of internet science. International Journal of Internet Science, 7(1), 1-5.

Waller, M. A., & Fawcett, S. E. (2013). Data science, predictive analytics, and big data: a revolution that will transform supply chain design and management. Journal of Business Logistics, 34(2), 77-84.