by Sam T
In this peer-reviewed journal article, the author discusses how every company bases some of its decisions on external data sources. These sources range from publicly available web services to sales data to government census data. Because there is so much external data and so many ways to get it, it has to be treated differently from internal data. To serve every user’s data needs in a timely manner, data integration involves a trade-off among three factors: flexibility, quality, and cost.
In the article, the author describes flexibility as “how easily you can purpose the data for the end users’ needs,” quality as “both a function of the source of the data and the process by which it flows through the organization,” and cost as the expense of meeting the users’ flexibility and quality needs. External data can be incorporated at different layers, such as ETL (extract-transform-load) into the core data warehouse, or BI (business intelligence) via reports. In the end, the author concludes that there is no single correct layer at which to integrate external data into the enterprise flow, and that many factors need to be re-evaluated periodically. Choosing the right integration approach can give users timely answers with a good balance of flexibility, quality, and cost.
I chose this article because we discussed data warehouses for a couple of weeks, and I was intrigued by how companies get external data and how they use it. This article showed me how much of a problem integrating external data can be, since there is so much data out there. To make good use of external data, you have to sort out what the user needs, make sure the data is relevant and of good quality, and still keep costs as low as you can.
Petschulat, S. (2010, January). Other People’s Data. Communications of the ACM. Retrieved from http://0-web.ebscohost.com.opac.library.csupomona.edu/ehost/pdfviewer/pdfviewer?sid=b8ed2008-e60b-431b-9269-a72a891e5f14%40sessionmgr113&vid=5&hid=108