Hadoop or EDW

by Brian B
The article that I picked this week is called “Big Data Debate: End Near for Data Warehousing?” by Doug Henschen. The article starts off by giving some background on the EDW (Enterprise Data Warehouse). It says that while the technology behind EDW is time-tested and thoroughly developed, it remains rigid and inflexible when you have to go back and make changes to your data model. It also says that this is often a very time-consuming process that costs a lot of money to plan out and implement, if you ever actually finish modeling and developing the system. It then talks about Hadoop, “which lets you store data on a massive scale at low cost (compared with similarly scaled commercial databases)” (Henschen, 2012). The author says that this is an improvement over normal EDW because it allows more flexibility when it comes to making changes down the road. The problem is that it is not as developed as EDW, so it can be difficult to find people who have an intimate knowledge of the software. The article then opens up into a debate between Ben Werther (pro-Hadoop) and Scott Gnau (pro-EDW). Werther essentially says that EDW is a dated technology because by the time you push out the model and get everything implemented, you have what amounts to a view of the world from a year or more ago, which may or may not be applicable to your business needs today, wasting your company’s time and resources. Gnau’s argument boils down to the fact that while Hadoop may be more flexible, it does not allow you to have very good control over the data you have collected. He says that leaving all of that data un-modeled will make it hard for analysts to view and sort it, which is why EDW will stick around to make their jobs more manageable.

I thought that this article was relevant to our class because we talked about data warehousing last week and the article came out just five days ago, so it seemed like an excellent current event to cover. To me the best part was the debate between the two sides, because it really showed the pros and cons of both the new technology and the old tried-and-true method of data sorting and storage. It seems like right now we are in a transitional period between people holding on to the old methods they know, sort of like the pushback from older, more entrenched users when Windows 8 replaced Windows 7, and new people in the field who are jumping at the opportunity to work with the bleeding edge of tech and get in at the forefront of innovation.

I liked the article because not only did it cover both sides of the argument, it also had more than one person talking about the issue. Since the article featured three separate people, it gave three separate viewpoints, each with its own perspective on the topic at hand. It seems to me like it would be good practice to learn both the old method involving EDW and the new method of using Hadoop, because they seem like they could eventually become complements of each other. I also see value in both arguments: some companies just need to collect as much data as possible, using something like Hadoop, and some companies need to have structured data because of the type of business that they conduct.

Henschen, D. (2012, November 19). Big Data Debate: End Near for Data Warehousing? Retrieved from InformationWeek: http://www.informationweek.com/big-data/news/big-data-analytics/big-data-debate-end-near-for-data-warehousing/240142290