Data Warehouse

Data Cleansing

by Garcello D

The article I decided to blog about this week is called “Data Cleansing for Data Warehousing,” written by Ari Baumgarten and published on February 27, 2007. The author opens with an analogy: data cleansing is to data warehousing what raising money is to politicians, in the sense that one cannot exist without the other. Data cleansing is also known to be the most time-intensive and contentious process in data warehousing projects, but what really is data cleansing? Well, I’m about to break it down for you.

Data Warehouse Maturity Model

by Andrew M
The article I read this week was entitled “A Model of Data Warehousing Process Maturity” by Arun Sen, K. Ramamurthy and Atish Sinha. The point of the article was to discuss some of the issues that businesses experience when dealing with data warehouses, such as metadata management, data changes, and results that are not relevant to the end user. The authors say this is due to the inexperience of some data warehouse engineers and to the fact that data warehouses are hard to create and maintain. The authors suggest using a maturity model, which would help with both design and upkeep. It would also show interested parties what the lifetime of the data warehouse will be.

Nonprofits and Data Warehousing

by Kevin S
Data warehousing can be a large undertaking for any organization, and particularly so for nonprofits. In the article “Data Warehousing for Nonprofits,” Laura S. Quinn addresses the issue and offers advice that all organizations can take note of, nonprofit or not. The heart of the article is to inform readers of the potential that warehousing has and to help them analyze their own organization to decide whether or not a data warehouse would be a good business solution. She gives an overview of what a data warehouse is and its benefits, such as the ability to automatically pull data from it to produce detailed reports. She also covers some cons, such as the high costs and the technical experience required to implement one. There are also some helpful questions readers should ask themselves about their organization before considering a warehouse.

Facebook Effectively Using its Big Data

by Kathy S
According to the article, Facebook processes about 2.5 billion pieces of content and more than 500 terabytes of data every day. It receives 2.7 billion “Like” actions and 300 million photo uploads per day. Facebook’s Vice President of Engineering says that big data gives the company major insights and helps it make an impact on its business. He says that if they’re not taking advantage of the data they collect, then it’s just a pile of useless data. When Facebook’s data is processed into useful information, the company is able to create “new products, understand user reactions, and modify designs in near real-time.” The useful data that Facebook possesses is also passed on to its advertisers. Facebook tracks how ads perform across different dimensions of users, such as gender, age, and interests, so it can see which ads are more effective. Those ads are then shown more often to make them more successful. Lastly, the article talks about “Project Prism,” which is Facebook’s plan to have its live data set hosted across its data centers in different states across the country. The article also mentions that users might be uncomfortable with the idea that Facebook employees have access to their information and activity, but Facebook assures them that it has numerous protections against abuse. Whenever data is accessed, the access is logged so Facebook can track which employees are looking at what. The VP assures that any employee caught prying where they’re not supposed to is fired. They have a “zero-tolerance policy.”
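To make the ad-tracking idea a little more concrete, here is a minimal sketch of how ad performance could be compared across one user dimension at a time. Everything in it is made up for illustration: the sample records, the field names, and the `click_through_by` helper are my own assumptions and say nothing about how Facebook actually stores or analyzes its data.

```python
from collections import defaultdict

# Hypothetical impression log: one record per time an ad was shown to a user.
IMPRESSIONS = [
    {"ad": "A", "gender": "F", "age_group": "18-24", "clicked": True},
    {"ad": "A", "gender": "M", "age_group": "18-24", "clicked": False},
    {"ad": "A", "gender": "F", "age_group": "25-34", "clicked": True},
    {"ad": "B", "gender": "F", "age_group": "18-24", "clicked": False},
    {"ad": "B", "gender": "M", "age_group": "25-34", "clicked": True},
    {"ad": "B", "gender": "M", "age_group": "25-34", "clicked": False},
]


def click_through_by(records, dimension):
    """Return {(ad, dimension value): share of impressions that were clicked}."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for record in records:
        key = (record["ad"], record[dimension])
        shown[key] += 1
        if record["clicked"]:
            clicked[key] += 1
    return {key: clicked[key] / shown[key] for key in shown}


if __name__ == "__main__":
    # Compare how each ad performs for each gender in the sample data.
    for key, rate in sorted(click_through_by(IMPRESSIONS, "gender").items()):
        print(key, f"{rate:.0%}")
```

Passing "age_group" instead of "gender" slices the same log along a different dimension, which is the kind of comparison the article describes.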

Best Practices for Data Warehousing

by Robert T
In the peer-reviewed article “Best Practices in Data Warehousing to Support Business Initiatives and Needs,” authors Jeff Lawyer and Shamsul Chowdhury discuss the decisions businesses have to make about data warehousing in order to run efficiently. The authors illustrate how many companies in the 1990s had a difficult time adjusting to the success of their many computer applications and to the boom in internet use. What was most difficult, according to Chowdhury and Lawyer, was choosing which architecture to implement. The two general types of architecture were the Bill Inmon style and the Ralph Kimball style. The Inmon style is considered application neutral and could be called an enterprise data warehouse. The Kimball style, however, has its data prearranged. The authors also mention that with the “stove-pipes” of data, the cross use of data between businesses was unknown. “Under the Kimball approach, data are arranged in an application- or data-view-specific manner [8]. Under the Inmon approach, data are arranged according to the rules of normalization and remain application- and data-view-independent [13].”
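As a rough illustration of the quoted distinction, here is a small sketch that keeps the same data in two shapes. The tables, names, and the `sales_by_region` helper are hypothetical and heavily simplified; real Inmon-style and Kimball-style warehouses involve far more than this, so treat it as a toy picture of "normalized and view-independent" versus "arranged for one specific view," not as either author's method.

```python
# Inmon-style (simplified): normalized, application-neutral tables linked by keys.
# Each fact is stored once and no particular report is assumed.
regions = {10: "West", 20: "East"}
customers = {
    1: {"name": "Ana", "region_id": 10},
    2: {"name": "Ben", "region_id": 20},
}
sales = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 50.0},
]


def sales_by_region(regions, customers, sales):
    """Kimball-style (simplified): pre-arrange the data for one specific view.

    The result is ready to read for a "sales by region" report, but it can no
    longer answer questions that need the individual customers.
    """
    totals = {}
    for sale in sales:
        region_id = customers[sale["customer_id"]]["region_id"]
        region_name = regions[region_id]
        totals[region_name] = totals.get(region_name, 0.0) + sale["amount"]
    return totals


if __name__ == "__main__":
    print(sales_by_region(regions, customers, sales))
```

The normalized tables can feed many different views, while the prearranged result is quicker to use for the one question it was built for.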
As for data warehousing growth, most data warehousing initiatives have concluded that there is a continuous need for incremental updates to the data warehouse. The authors suggest treating the warehouse as an ongoing application. “Keeping your data warehouse team intact after the initial build is very important in order to sustain the capability to react to this need. To paraphrase a popular saying, ‘Data warehousing is not a destination – it is a journey’”. The authors studied the data warehouse journey of one U.S. retailing company, starting in 1995. At that point the company used its warehouse to store only 80 gigabytes of information. The 80-gigabyte Inmon-style data warehouse was used to select customers for a targeted credit-stimulation marketing program. The database has since grown to hold nearly 7 terabytes, with two hundred tables and two thousand seven hundred columns.

What is the Cloud?

by Nelson T
According to an article I read from PC Magazine Online, cloud computing is the next big thing coming to consumers and businesses. Cloud computing gives people complete access to their personal data as well as data from other users. This is called personal cloud computing. One of the many benefits of the personal cloud is automatic sync. If a user searches for music on their mobile device and purchases a piece of music from their favorite artist, the mp3 file downloads not only to their mobile device but also automatically to their home computer and any other devices linked to that cloud account. Other synced items include the user’s address book, email, and documents. Access to the cloud is fairly cheap, and this attracts businesses. With access to information coming from their users, businesses can use the cloud to do some serious data mining on the fly. A cloud can be distinguished by a few key attributes: it comes from a third-party vendor that is usually off site, it is accessed over the internet, and it requires minimal IT knowledge. Cloud computing is very useful for a connected consumer and for the success of a business. As an owner of an Apple iPhone, I am a user of Apple’s cloud service, iCloud. I find this service to be very useful. Just like in my example from earlier, anything I do on my smartphone is synced automatically to cloud servers and then pushed to my laptop and other devices that are connected to the iCloud service. I have personally realized the benefits of this service, and it’s been great since the start. I rely on it almost every day to make sure I get what I need from anywhere I am connected.

Healthcare and data warehousing

by Daniel M
The article that I read is about how the feds are building a data warehouse that will hold everyone’s healthcare data, their Social Security numbers, and all of their treatments and diagnoses. The data will be updated daily by healthcare providers, and it will cover current federal employees and retirees, military personnel, postal workers, and their families. It will also include data from participants in the national pre-existing-conditions insurance program and the multi-state option plan created under the Patient Protection and Affordable Care Act, according to the announcement. The idea behind this is that the best course of treatment can be chosen and that taxpayers aren’t wasting their money on treatments that don’t need to be done or have already been done by another doctor. The article also goes on to talk about how law enforcement and local government agencies can use this data for research purposes.

What to Expect in Data Warehousing

by Penny P
Data warehouses play a critical role in corporations and institutions. They allow users to quickly retrieve data for use in data mining or data analysis tools. The results from these tools are then used to support business decisions. The authors of the article designed a case study to help beginners understand the basics of data warehousing. Using enrollment data from universities, the main objective was to prepare the data for a mining system that would be used to predict student enrollment.