Automating Big-Data Analytics

By Corey S.

Big data is the term for data sets so large that they are difficult to analyze with traditional data processing methods. Companies spend thousands of dollars hiring data scientists to put this mass of data to work instead of letting it sit stagnant on petabytes of storage in a data warehouse. The data could be used to track trends in consumer behavior or to identify peak sales times. Today, analyzing big data often involves licensing expensive software, hiring a data scientist to find those trends, or both. But what if this process could be improved? What if a large portion of it could take care of itself?

Seeking to automate data analytics, an MIT startup has developed software called the “Data Science Machine,” which is capable of analyzing and predicting trends in large oceans of raw data. The software tunes itself, working out which parameters would produce the most useful and accurate predictions. Once those parameters are decided, it selects “a subset of the most relevant variables and chooses the best machine learning technique for determining the relationship between the variables and the model predictions” (IEEE, 2015). When tested in three separate data science competitions, the Data Science Machine beat 615 of the 906 competing human teams, or about 68 percent. In addition, it produced its solutions in two to 12 hours, whereas the human teams took months on some problems. The software also “achieved predictions that were 94 percent, 96 percent and 87 percent as accurate as the winning models submitted in each competition” (IEEE, 2015). These percentages show that, despite the speed at which the predictions were produced, the skills and expertise of a good data scientist still yield more accurate, and therefore more valuable, results.
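To make the idea of selecting variables and choosing a technique more concrete, here is a minimal Python sketch. It is not the Data Science Machine's actual algorithm, which the sources above do not describe in detail. It assumes a toy setup: columns are ranked by their correlation with the target, and two simple candidate techniques (a constant mean and a one-variable least-squares line) are compared on held-out data. Every function name, the candidate techniques, and the synthetic data are hypothetical and chosen only for illustration.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def select_features(rows, target, k):
    """Keep the k column indices most correlated (in absolute value) with target."""
    n_cols = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], target)) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: scores[j], reverse=True)[:k]

def fit_mean(xs, ys):
    """Baseline technique: always predict the training mean."""
    m = statistics.fmean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """One-variable least-squares line; falls back to the mean if x is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return fit_mean(xs, ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

# Hypothetical pool of candidate techniques to choose from.
TECHNIQUES = {"mean": fit_mean, "line": fit_line}

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def auto_model(rows, target, k=2, seed=0):
    """Pick the (feature, technique) pair with the lowest validation error."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * len(idx))
    train, valid = idx[:cut], idx[cut:]
    y_tr = [target[i] for i in train]
    y_va = [target[i] for i in valid]
    # Feature selection uses the training split only.
    features = select_features([rows[i] for i in train], y_tr, k)
    best = None
    for j in features:
        x_tr = [rows[i][j] for i in train]
        x_va = [rows[i][j] for i in valid]
        for name, fit in TECHNIQUES.items():
            err = mse(fit(x_tr, y_tr), x_va, y_va)
            if best is None or err < best[0]:
                best = (err, j, name)
    return {"feature": best[1], "technique": best[2], "validation_mse": best[0]}

def make_demo_data(n=200, seed=42):
    """Synthetic data: only column 1 actually drives the target."""
    rng = random.Random(seed)
    rows = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(n)]
    target = [3 * r[1] + rng.gauss(0, 0.1) for r in rows]
    return rows, target

rows, target = make_demo_data()
print(auto_model(rows, target))
```

On this synthetic data the search settles on column 1 with the line technique, since that is the only column related to the target. A real system would search far more variables, techniques, and parameter settings, but the loop has the same shape: propose candidates, score them on held-out data, and keep the best.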

As stated, even with these advances and improved efficiencies, a good data scientist still can’t be beaten and will always be in demand. The software developed at MIT is nowhere near replacing those individuals. Instead, it increases their value by making them more efficient. For example, a scientist “could run Data Science Machine and use its results as a baseline to build a better predictive model” (IEEE, 2015). In other words, the software could quickly produce a solid starting point for scientists to dig deeper into, allowing humans and machines to work together toward more timely and informed decisions.

Beyond empowering data scientists at large companies that can afford to keep them on the payroll, the Data Science Machine can help smaller companies begin making well-informed, data-driven decisions as well, so that even companies outside the technology sector can base business decisions on big data. The automation of big data analytics will likely lead to better-informed, more strategic decision making at companies large and small in the years to come. Any company could begin to crunch numbers and put its data to work without necessarily paying the salary of a skilled data scientist.


Hsu, Jeremy. Artificial Intelligence Outperforms Human Data Scientists. (2015). Retrieved from human-data-scientists
Arthur, Lisa. What Is Big Data? (2013). Retrieved from
ValueWalk. MIT’s ‘data science machine’ beats humans in human intuition. (2015). Chatham: Newstex. Retrieved from