Database

Getting Top-Quality Data for Your Database {1}

by Leonardo S
The article I chose for this week is titled “Managing Data Source quality for data warehouse in manufacturing services.” Its main topic is data quality: according to the article, one of the primary success factors of a data warehouse is the quality of its data. Low-quality data has several downsides. Often someone has to go back into the database and fix the mistakes, which takes time and effort that could be better spent on other projects. Bad data also corrupts any analysis built on it, so database analysts end up wasting even more time reviewing the data again after it has been fixed. The article mentions a few ways to reduce the amount of low-quality data going into your database, two of which are Total Data Quality Management and Quality Management System requirements. From what I understand, these are guidelines for collecting and inputting data that help limit the amount of low-quality data. read more...
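The article stays at the level of management frameworks, but the underlying idea of a quality gate, checking records against rules before they enter the warehouse, is easy to illustrate. Below is a minimal sketch in Python; the field names and rules are hypothetical, not taken from the article.

```python
# Minimal data-quality gate: validate records before loading them.
# The fields and rules here are invented for illustration.

def validate(record):
    """Return a list of quality problems found in one record."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    if record.get("quantity", 0) <= 0:
        problems.append("non-positive quantity")
    if "@" not in record.get("email", ""):
        problems.append("malformed email")
    return problems

records = [
    {"customer_id": "C001", "quantity": 3, "email": "a@example.com"},
    {"customer_id": "", "quantity": 5, "email": "b@example.com"},
    {"customer_id": "C003", "quantity": -1, "email": "not-an-email"},
]

clean = [r for r in records if not validate(r)]
rejected = [r for r in records if validate(r)]
print(f"loaded {len(clean)} records, rejected {len(rejected)}")
```

Rejecting or flagging bad records at load time is exactly the kind of discipline such guidelines aim to institutionalize, so nobody has to go back and fix the warehouse later.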

Google’s Solution to Unify Their Databases {4}

by Brian B
The article I chose this week is named “Google Spans Entire Planet With GPS-Powered Database” by Cade Metz. It opens with a Google engineer named Vijay Gill at a conference, where he was asked how he would change Google’s data centers if he had a magic wand. His answer was that “he would use that magic wand to build a single system that could automatically and instantly juggle information across all of Google’s data centers (Metz, 2012).” The interesting part of this article is that Google has done just that. The solution he described is called Spanner, a system that lets Google “juggle data across as many as 10 million servers sitting in ‘hundreds to thousands’ of data centers across the globe (Metz, 2012).” The power of Spanner is that it lets many people handle data around the world while “all users see the same collection of information at all times (Metz, 2012).” Spanner accomplishes this with its TrueTime API. To support that API, Google has also gone to the trouble of setting up master servers with built-in atomic clocks coupled with GPS to ensure accurate server times, which keeps the different parts of Google’s data infrastructure roughly synchronized. The article goes on to say that companies usually rely on a third party for their clock instead of installing their own, and it ends by noting that this kind of approach would cost too much for most companies to implement, but that Google tends to be ahead of the curve. read more...
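The article doesn’t show the API itself, but the idea behind TrueTime, reporting the current time as an interval of uncertainty rather than a single instant, can be sketched. The code below is a toy illustration of that idea, not Google’s actual interface, and the 7 ms uncertainty figure is invented.

```python
import time
from dataclasses import dataclass

# Toy illustration of the TrueTime idea: the clock returns an interval
# [earliest, latest] guaranteed to contain the true time. The uncertainty
# value is invented for the example.

@dataclass
class TTInterval:
    earliest: float
    latest: float

def tt_now(uncertainty: float = 0.007) -> TTInterval:
    t = time.time()
    return TTInterval(t - uncertainty, t + uncertainty)

def commit_wait(interval: TTInterval) -> None:
    """Wait until the timestamp is definitely in the past, so every
    later reader, anywhere, observes a consistent commit order."""
    while time.time() < interval.latest:
        time.sleep(0.001)

ts = tt_now()
commit_wait(ts)  # after this, ts.latest has certainly passed
print(f"committed at interval [{ts.earliest:.3f}, {ts.latest:.3f}]")
```

The tighter the bound the atomic clocks and GPS receivers can guarantee, the shorter that wait, which is presumably why Google went to the expense of installing its own time hardware.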

DBAs Move to the Cloud {5}

by Garcello D
My final blog is called “Database Administrators prepare to move to the Cloud,” written by Maxwell Cooter of techworld.com about a year ago. The article starts off by stating that cloud computing is expected to transform the use of databases within enterprises. According to a survey on database trends, more than a third of database professionals think cloud computing will have the biggest transformational effect on database technology. Seventy-three percent of the survey’s respondents voted for the cloud, meaning they believe a move to the cloud would have the most effect on their working lives. The survey results also showed that production database performance is the biggest factor keeping staff awake at night, with 43 percent placing it at the top of their list. read more...

A Cloud-Based Data Warehouse Service from Treasure Data {2}

by Kathy S
The author of this article focused on the cloud-based data warehouse company Treasure Data. The company received $1.5 million in funding, including an investment from Yukihiro “Matz” Matsumoto, the creator of the Ruby programming language. Treasure Data has developed a service that brings high-end analysis to businesses that don’t have the resources to afford a solution from major companies like IBM, Oracle, or Teradata. According to the CEO of Treasure Data, Hiro Yoshikawa, the total cost of ownership for a data warehouse suite from one of the enterprise players can run as high as $5 million. Treasure Data is a subscription service that, at the low end, costs $1,500 per month, or $1,200 per month with a 12-month commitment. Yoshikawa says that on average the cost over time is less than a tenth of what an enterprise data warehouse offering would cost. Treasure Data has more than 10 customers, including Fortune 500 companies, and it has more than 100 billion records stored and processes 10,000 messages per second. Treasure Data also borrows from Hadoop, but with a twist: unlike Hadoop, it does not require an infrastructure investment. read more...
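For rough scale, the low-end committed rate works out to $1,200 × 12 = $14,400 per year in subscription fees alone, which makes the gap with a multi-million-dollar enterprise suite easy to see even before operating costs are counted.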

DBA, a Great Position for a CIS Major {4}

by Kevin S
The main purpose of a DBA is to perform maintenance and optimization tasks on a daily basis. However, according to Craig Mullins, DBAs often become much more than that. Because the DBA is relied upon by both IT and business associates, the realm of what is asked and expected of them constantly grows. A DBA should expect this and accept it, since it makes them more valuable to the company while also extending their own abilities. Opportunities to grow may include, in addition to the standard DBA duties, experience with new technologies, a better understanding of the meaning of data, active participation in application development, or simply a better understanding of the business. read more...

Data Warehouses for Educational Analysis {1}

by Eric C
In a peer-reviewed journal article entitled “Building a Data Warehouse to Analyze Entrance Exams,” Dr. Kornelije Rabuzin explains how to build a data warehouse dedicated to student data for analysis. Using business intelligence and a combination of databases and data warehouses, it is possible to analyze student data for educational purposes. Such analysis determines, based on an entrance exam administered to high school students, whether students are ready for college and which material they do or do not know. Having such data and information can give administrators a handful of reports to determine the best steps for students going to college after high school. Based on the types of information mentioned in the article, it would not seem that much could be done with entrance exam scores, high school grades, and regular exam scores. The hard part was extracting that information from various databases, which took two months of heavy programming. The data was then loaded and organized into a star schema structure. Once everything was loaded into the new data warehouse, it proved very useful for the student selection process and for targeting high schools whose students show certain skills. read more...
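The article describes the warehouse design only at a high level, but a star schema is easy to sketch: one central fact table of exam results surrounded by the dimension tables it references. The table and column names below are hypothetical, using Python’s built-in SQLite for a self-contained example.

```python
import sqlite3

# Hypothetical star schema for entrance-exam analysis: a fact table of
# scores referencing student, school, and subject dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_school  (school_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_subject (subject_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_exam (
    student_id INTEGER REFERENCES dim_student(student_id),
    school_id  INTEGER REFERENCES dim_school(school_id),
    subject_id INTEGER REFERENCES dim_subject(subject_id),
    score      REAL
);
""")
con.execute("INSERT INTO dim_school VALUES (1, 'North High'), (2, 'South High')")
con.execute("INSERT INTO dim_subject VALUES (1, 'Math')")
con.execute("INSERT INTO fact_exam VALUES (101, 1, 1, 82.0), (102, 2, 1, 61.5)")

# The payoff: reports such as "average math score per high school"
# become a single join-and-group query.
query = """
    SELECT s.name, AVG(f.score)
    FROM fact_exam f JOIN dim_school s ON f.school_id = s.school_id
    GROUP BY s.name
"""
for row in con.execute(query):
    print(row)
```

Once the extracted data is loaded into tables shaped like this, the kinds of reports the article mentions become straightforward queries rather than programming projects.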

A Strong Database Tool {1}

by Shigom H
Regis Charlot, author of the peer-reviewed journal article “Providing an Infrastructure for a Cross-Database Management Tool,” presents his team’s own software tool, dbAnalyst, as a solution to the many problems encountered in managing a large database. Database software tools are typically vendor-specific and require extensive knowledge from experienced database professionals. dbAnalyst is a database software tool with many features, such as the ability to reverse engineer schemas, generate database alerts, and explore database content across different databases. The authors make a case for dbAnalyst by presenting several real-life examples where using their tool can be beneficial. For example, an electronic medical record system assembled from multiple vendors’ software can be a headache to deal with. As discussed in class, some of the challenges with heterogeneous data include maintaining similar schema structures across databases and migrating content between them. Although the task might seem simple, it is actually costly and requires a handful of experienced database administrators. Thus, dbAnalyst is an open-architecture software tool that works across different platforms to address these problems. read more...
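dbAnalyst itself is the authors’ own tool, so its interface isn’t something to reproduce here; as a rough illustration of one task it automates, the sketch below compares a table’s schema across two databases. Two SQLite connections stand in for different vendors’ systems, and a real heterogeneous tool would have to bridge each vendor’s catalog format.

```python
import sqlite3

def table_columns(con, table):
    """Return {column_name: declared_type} from SQLite's catalog."""
    return {row[1]: row[2] for row in con.execute(f"PRAGMA table_info({table})")}

# Two databases that are supposed to share a schema but have drifted.
site_a = sqlite3.connect(":memory:")
site_b = sqlite3.connect(":memory:")
site_a.execute("CREATE TABLE patient (id INTEGER, name TEXT, dob TEXT)")
site_b.execute("CREATE TABLE patient (id INTEGER, name TEXT)")  # dob missing

cols_a = table_columns(site_a, "patient")
cols_b = table_columns(site_b, "patient")
for col in cols_a.keys() - cols_b.keys():
    print(f"patient.{col} exists at site A but not at site B")
```

Catching that kind of drift by hand across many vendor systems is the costly, expert-heavy work the article says a cross-database tool is meant to replace.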

Data Warehousing: Inmon Style and Kimball Style {2}

by Leonardo S
The article I chose for this week is titled “Best Practices in Data Warehousing to Support Business Initiatives and Needs.” It discusses two different styles used in data warehousing and lists when you would use one versus the other. The first is the “Bill Inmon style,” which takes a top-down approach; in contrast, the “Ralph Kimball style” takes a bottom-up approach. The article uses a major U.S. retail company as an example. This company uses the Inmon style simply because it fits that particular business well. The Inmon style demands that your data be kept in a third-normal-form relational format, while the Kimball style requires a multidimensional arrangement. In the case of this retail company, the application-neutral aspect of the Inmon style made it an easy pick. read more...
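As a thumbnail of the difference, here are the two styles applied to a hypothetical retail sales model; neither schema comes from the article. The Inmon core keeps data normalized and application-neutral, while the Kimball mart flattens it into a fact table with denormalized dimensions.

```python
# Hypothetical contrast between the two modeling styles for retail sales.

# Inmon-style warehouse core: third normal form, application-neutral.
inmon_3nf = {
    "customer": ["customer_id", "name", "city_id"],
    "city":     ["city_id", "city_name", "state"],
    "product":  ["product_id", "product_name", "category_id"],
    "category": ["category_id", "category_name"],
    "sale":     ["sale_id", "customer_id", "product_id", "amount"],
}

# Kimball-style mart: one fact table plus flattened dimensions,
# shaped for the queries analysts actually run.
kimball_dimensional = {
    "fact_sale":    ["customer_key", "product_key", "date_key", "amount"],
    "dim_customer": ["customer_key", "name", "city_name", "state"],
    "dim_product":  ["product_key", "product_name", "category_name"],
    "dim_date":     ["date_key", "day", "month", "year"],
}
```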

Data Cleansing {4}

by Garcello D
The article I decided to blog about this week is called “Data Cleansing for Data Warehousing,” written by Ari Baumgarten on February 27, 2007. The author opens with an analogy between politicians and data cleansing: the way politicians raise money can be compared to data cleansing for a warehouse, in the sense that one cannot exist without the other. Data cleansing is also known as the most time-intensive and contentious process in data warehousing projects. But what really is data cleansing? Well, I’m about to break it down for you. read more...
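As a concrete taste of what that breakdown involves, here is a minimal cleansing sketch over invented customer records: normalize formats first, then collapse the duplicates the normalization exposes.

```python
# Minimal data-cleansing sketch: normalize, then deduplicate.
# The records and rules are invented for illustration.

raw = [
    {"name": " Jane Doe ", "phone": "(909) 555-1234"},
    {"name": "JANE DOE",   "phone": "909-555-1234"},
    {"name": "John Smith", "phone": "909.555.9876"},
]

def normalize(rec):
    """Standardize casing, whitespace, and phone formatting."""
    digits = "".join(ch for ch in rec["phone"] if ch.isdigit())
    return {"name": rec["name"].strip().title(), "phone": digits}

seen, cleaned = set(), []
for rec in map(normalize, raw):
    key = (rec["name"], rec["phone"])
    if key not in seen:  # the two spellings of Jane collapse to one record
        seen.add(key)
        cleaned.append(rec)

print(cleaned)
```

Real projects add many more rules (addresses, missing values, conflicting sources), which is why cleansing eats so much of a warehouse project’s schedule.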

SQL Packages and Data Warehousing {1}

by Ming X
The article I read is called “The Development of Ordered SQL Packages to Support Data Warehousing,” by Wilfred Ng and Mark Levene. “Data warehousing is a corporate strategy that needs to integrate information from several sources of separately developed Database Management Systems.” The authors argue that future database management systems should provide adequate facilities to manage the wide range of information arising from such integration. Since the order of data is usually involved in business queries, the authors extend the relational model to incorporate partial orderings into data domains, which they call the ordered relational model. Queries over ordered relational databases are written in Ordered SQL (OSQL), an extension of the Data Definition Language (DDL) and Data Manipulation Language (DML) of SQL for the ordered relational model. OSQL can be applied to various problems that arise in relational DBMSs involving temporal information, incomplete information, and fuzzy information, all under the unifying framework of the ordered relational model. There are three OSQL packages: OSQL_TIME, OSQL_INCOMP, and OSQL_FUZZY. read more...
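The summary doesn’t give OSQL’s actual syntax, so the sketch below only illustrates the core idea: letting a data domain carry its own ordering so queries can compare values that aren’t naturally numeric. The “risk” domain and its ordering are invented.

```python
# Sketch of the ordered-domain idea behind the ordered relational model:
# a domain carries a user-defined ordering, and queries compare against it.
# The "risk" domain and its ordering are invented for illustration.

RISK_ORDER = {"low": 0, "medium": 1, "high": 2}  # user-defined linear order

loans = [
    {"id": 1, "risk": "high"},
    {"id": 2, "risk": "low"},
    {"id": 3, "risk": "medium"},
]

# Roughly "SELECT * FROM loans WHERE risk >= 'medium' ORDER BY risk",
# evaluated over the ordered domain by hand:
result = sorted(
    (loan for loan in loans if RISK_ORDER[loan["risk"]] >= RISK_ORDER["medium"]),
    key=lambda loan: RISK_ORDER[loan["risk"]],
)
print(result)  # loan 3 (medium), then loan 1 (high)
```

Swapping in a temporal, incompleteness, or fuzzy-membership ordering gives a feel for what the OSQL_TIME, OSQL_INCOMP, and OSQL_FUZZY packages each specialize in.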