What to Do With Too Much Data

In the article, the author (Helland, 2011) discusses how the modern database often extends far beyond a few hundred entities; companies now regularly wade through terabytes of information, trying to draw useful and meaningful context out of it. Several major problems are raised: searching through the data is tedious and yields irrelevant results; metadata varies in usefulness, and its context may not be understood by others; attributes can mean the same thing but be recorded separately (e.g., Mac, Macintosh, Apple Computer, and iMac could all be different ways to describe the same product); and it is very difficult to standardize the data, to decide who regulates and carries out the standardization, and to judge whether it is even worth the time to do so. The solution offered is simple: relax the standard. Allow a little variation, create unified product descriptions that can catch multiple ways of describing the same object, and assign responsibility for ensuring data integrity. Even then, there is no hard solution. The conclusion is that future database management systems will need to find patterns and relationships in data, keep well-documented information on where data originates, and provide a way to understand how much is being lost to inaccuracies in the data.
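
To make the idea of a unified product description concrete, here is a minimal sketch of how several spellings could be mapped to one canonical name. It is not taken from the article: the alias table, the canonical name "Apple Macintosh", the "UNRESOLVED" label, and both function names are assumptions chosen only for illustration.

```python
# Hypothetical alias table: every key is a known variant, every value is the
# unified description it should map to. The entries are illustrative only.
ALIASES = {
    "mac": "Apple Macintosh",
    "macintosh": "Apple Macintosh",
    "apple computer": "Apple Macintosh",
    "imac": "Apple Macintosh",
}


def canonical_product(raw):
    """Return the unified description for a raw value, or None if unknown."""
    # Lowercase and collapse whitespace so "  iMac " and "imac" match.
    key = " ".join(raw.lower().split())
    return ALIASES.get(key)


def group_by_product(records):
    """Group raw values under their unified description.

    Values with no known alias are kept under "UNRESOLVED" instead of being
    dropped, so someone responsible for data integrity can review them.
    """
    groups = {}
    for raw in records:
        name = canonical_product(raw) or "UNRESOLVED"
        groups.setdefault(name, []).append(raw)
    return groups
```

Calling `group_by_product(["Mac", "Macintosh", "ThinkPad"])` would place the first two values under one description and leave "ThinkPad" for review. This is the "good enough" approach in miniature: the lookup tolerates variation rather than forcing every source to use a single spelling.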

This relates to several topics covered in class. For starters, as we learn how to sort through data, we begin to understand the constraints of using queries to retrieve and sort it. It becomes clear that massive amounts of data coming from varying sources would be an absolute nightmare to sort through, and if those sources don't have a standardized form of input, heaven help us. Additionally, the article discusses sorting and processing data at massive scale, which is something the book strongly pushes us to understand: the days of gigabytes of information per company are over, and terabytes of information now have to be sorted through quickly enough to be useful. Finally, the article mentions improving data integrity, which has been a focus of our own data input since the start of the class.

This topic is relevant to my interests in MIS. It reveals the shortcomings of current approaches to data management and database management systems, shows where database management should be heading, and even tells us specifically what we may expect in the near future. This is a great article for seeing where queries and information management need to improve, and we will no doubt be the ones who oversee this change.

Works Cited
Helland, P. (2011). If you have too much data, then good enough is good enough. Association for Computing Machinery. Communications of the ACM, 54(6), 40. http://search.proquest.com/docview/874208906?accountid=10357