Uncategorized Archive

Other People’s Data

by Sam T
In this peer-reviewed journal, the author discusses how every company bases some of its decisions on external data sources. These sources range from publicly available web services to sales data to government census data. Because there is so much external data, and so many ways to obtain it, it has to be treated differently than internal data. In order to accommodate every user’s data in a timely manner, data integration involves a trade-off among three factors: flexibility, quality, and cost. read more...

DBAs Move to the Cloud

by Garcello D
My final blog is on an article called “Database Administrators prepare to move to the Cloud,” written by Maxwell Cooter of techworld.com about a year ago. The article starts off by stating how cloud computing is expected to transform the use of databases within enterprises. According to a survey on database trends, more than a third of database professionals think that cloud computing will have the biggest transformational effect on database technology. Seventy-three percent of the individuals who took the survey voted for the cloud, meaning they believe that moving to the cloud would have the greatest effect on their work. The results of the survey also showed that production database performance was the biggest factor keeping staff awake at night, with 43 percent placing it at the top of their list. read more...

Data Warehousing and Data Quality

by Asim K
In their peer-reviewed journal, published in the proceedings of Hawaii’s International Conference on System Sciences, Rudra and Yeo explore the key factors that determine what makes data in a data warehouse inefficient or lacking in quality. They begin with a basic introduction to the concept of data warehousing, its history, and its purpose, then go into the aim of the study (which is mainly focused on data warehousing for companies and industries in Australia). Data quality is then explained in quick-and-dirty detail, bullet-pointed in a very direct manner, mentioning that the quality of data refers to “how relevant, precise, useful, and timely data is”. Rudra and Yeo explain that many end users, such as managers, are unaware of the quality of the data they use in a data warehouse, so there are many setbacks because of ineffective planning. They then explain data inconsistency, in which there are different versions of the same data in a database; this section concludes by noting that there is a direct relationship between data consistency and data integrity (chart provided in citation). After the background of the research is given, the authors go into their findings, in which they see that the quality of data is measured by: completeness of data, consistency of entries, accuracy of data, uniqueness of account numbers, and durability of the business rules that pin down the data. Rudra and Yeo conclude that the most common ways data gets polluted in a data warehouse are that data is never fully captured, “heterogeneous” systems are integrated incorrectly, and there is a lack of planning on the part of management. read more...
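As a rough illustration of the quality dimensions the authors list (completeness, uniqueness, and consistency with business rules), here is a minimal Python sketch that runs simple checks over a batch of warehouse records. The field names, example rows, and the non-negative-balance rule are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch of data-quality checks along the dimensions the paper lists:
# completeness, uniqueness, and a basic business-rule check.
# Field names and example records are hypothetical.

records = [
    {"account_no": "A-100", "name": "Acme Pty Ltd", "balance": 1200.0},
    {"account_no": "A-101", "name": "", "balance": -50.0},              # incomplete name
    {"account_no": "A-100", "name": "Acme Pty Ltd", "balance": 1200.0}, # duplicate account
]

def completeness(rows, required=("account_no", "name", "balance")):
    """Fraction of rows in which every required field is present and non-empty."""
    ok = sum(all(r.get(f) not in (None, "") for f in required) for r in rows)
    return ok / len(rows)

def uniqueness(rows, key="account_no"):
    """Fraction of rows whose key value is unique across the batch."""
    values = [r[key] for r in rows]
    return sum(values.count(v) == 1 for v in values) / len(rows)

def rule_violations(rows):
    """Example business rule (assumed): balances must be non-negative."""
    return [r["account_no"] for r in rows if r["balance"] < 0]

print(f"completeness: {completeness(records):.2f}")
print(f"uniqueness:   {uniqueness(records):.2f}")
print(f"rule violations: {rule_violations(records)}")
```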

A Vulnerability in Microsoft XML

by Sam T
This article discusses a flaw in Microsoft XML that can allow attackers to gain access to a system. The vulnerability affects all versions of the Microsoft OS, as well as other popular supported programs such as Microsoft Office 2003 and 2007. It can be exploited simply by loading a malicious web page, but most users have been taught not to click on suspicious links. One way attackers get around this is to take over a well-known site that many users already visit. Experts say one European medical site was hijacked and implanted with corrupted code that exploited the XML flaw. Although Microsoft has given advice on how to reduce the risk of the flaw, it has not released an update to cover this exploit. The author goes on to discuss how to protect yourself, such as making sure your security software is up to date with its definitions and using Microsoft’s Fix-it tool, which implements measures to block exploitation of the vulnerability. read more...

Design and Performance of Databases

by Eric C
When it comes to building databases, performance is perhaps the top priority for database engineers. Speed and efficiency are key when gathering information through queries for end users. There are various ways to improve the efficiency of databases, and they include the incorporation of vertical and horizontal partitioning. In a peer-reviewed article written by Agrawal, Narasayya, and Yang, entitled “Integrating Vertical and Horizontal Partitioning into Automated Physical Database Design,” the authors discuss how vertical and horizontal partitioning can improve both the performance and the manageability of databases. Horizontal partitioning takes the rows of tables that have common values and puts them together into one or more tables. Vertical partitioning is the same concept, except it uses the columns of several tables. Furthermore, there are two main methods of partitioning, as mentioned in the article: hash and range. Hash partitioning distributes rows evenly across different tables with the use of a hash key. Range partitioning divides rows based on ranges of values (minimums and maximums) in one or more columns. In order to make the database easier to manage with these methods, the indexes and tables must be aligned. An index is aligned when it uses the same partitioning technique as the tables. However, achieving such tasks is complex, and therefore the physical design is very important. For example, there are many ways to horizontally and vertically partition a database, and many alignment choices, so choosing the correct design is critical. If such designs are not implemented correctly, the database could perform slower or even lock up when running queries. Once correctly implemented, the database can see an improvement of up to 20% in query performance. read more...
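As a rough sketch of the two partitioning methods described above, the hypothetical Python example below routes rows to partitions by a hash of a key and by value range. The in-memory “table”, the number of partitions, and the range boundaries are assumptions for illustration, not drawn from the article.

```python
# Minimal sketch of hash vs. range partitioning over a simple in-memory
# "table" of (customer_id, order_total) rows. Partition counts and range
# boundaries are arbitrary choices for illustration.
from collections import defaultdict

rows = [(101, 250.0), (205, 80.0), (317, 1200.0), (442, 15.0), (509, 640.0)]

def hash_partition(rows, num_partitions=4, key=lambda r: r[0]):
    """Route each row to a partition by hashing its key."""
    parts = defaultdict(list)
    for r in rows:
        parts[hash(key(r)) % num_partitions].append(r)
    return dict(parts)

def range_partition(rows, boundaries=(100.0, 500.0), key=lambda r: r[1]):
    """Place each row in the partition whose value range contains its key."""
    parts = defaultdict(list)
    for r in rows:
        idx = sum(key(r) >= b for b in boundaries)  # 0 .. len(boundaries)
        parts[idx].append(r)
    return dict(parts)

print(hash_partition(rows))   # rows bucketed by hash of customer_id
print(range_partition(rows))  # rows grouped by order_total ranges
```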

Megaupload Not Secure Enough

by Andrew H
For this week’s blog I chose to read an article called “Megaupload Is Dead. Long Live Mega!” by Charles Graeber. The article talks about Kim Dotcom, who was recently indicted by the U.S. government for conspiracy and briefly thrown in jail, and how he and his partners in the digital storage locker Megaupload have no intention of quitting. Instead, they have decided to introduce a new technology later this year that will again allow users to share and upload big data files, but under different rules. It is a subscriber-based cloud system that allows users to upload, access, and store huge data files on its databases; however, this time each file will be encrypted and the user will be sent a unique key for the file’s decryption. The point of this is to put any liability for stolen or illegal documents, videos, music, etc. in the hands of the users who uploaded them. The article continues on about how the company Mega will have no means of decrypting the files, so it cannot be held responsible for them. read more...
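A minimal sketch of the idea the article describes: the client encrypts a file before upload and keeps the key, so the service stores only ciphertext it cannot read. The sketch uses the symmetric Fernet recipe from the third-party Python cryptography package as an assumed stand-in; Mega’s actual encryption scheme is not detailed in the article.

```python
# Sketch of client-side encryption: the service only ever sees ciphertext,
# and only the holder of the key can decrypt. Uses the Fernet recipe from
# the "cryptography" package (pip install cryptography) as a stand-in for
# whatever scheme Mega actually uses.
from cryptography.fernet import Fernet

def encrypt_for_upload(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt locally; return (ciphertext to upload, key the user keeps)."""
    key = Fernet.generate_key()
    ciphertext = Fernet(key).encrypt(plaintext)
    return ciphertext, key

def decrypt_after_download(ciphertext: bytes, key: bytes) -> bytes:
    """Only someone holding the key can recover the original file."""
    return Fernet(key).decrypt(ciphertext)

ciphertext, key = encrypt_for_upload(b"contents of some large file")
assert decrypt_after_download(ciphertext, key) == b"contents of some large file"
```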

ASP.net Code Cloning

by Robert L
Frequent changes in requirements, tight delivery deadlines, and complex application architectures slow down web application development and encourage code cloning. Web application frameworks mainly help developers speed up development by providing libraries for database access and session management, and they often promote code reuse. In this paper, the authors provide a systematic study of cloning in six web applications of different sizes, developed using the Classic ASP.NET and ASP.NET MVC frameworks, to find out whether there is any relation between frameworks and code cloning. The contributions of the study are: 1) the results show which framework in .NET technology can be chosen to avoid cloning in web application development; 2) the cloning metrics calculated and applied in the study may be useful in other similar studies. read more...
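As a rough illustration of the kind of clone metric such a study might compute, the hypothetical Python sketch below estimates a cloned-line ratio by hashing normalized sliding windows of source lines. The window size, normalization, and sample snippet are assumptions for illustration, not the paper’s actual method or data.

```python
# Very rough clone-ratio sketch: normalize source lines, take fixed-size
# windows of consecutive lines, and count lines that fall inside a window
# that appears more than once. Window size and normalization are arbitrary
# choices, not the metric defined in the paper.
from collections import Counter

def normalize(line: str) -> str:
    return " ".join(line.split()).lower()

def clone_ratio(source: str, window: int = 3) -> float:
    lines = [normalize(l) for l in source.splitlines() if l.strip()]
    if len(lines) < window:
        return 0.0
    windows = [tuple(lines[i:i + window]) for i in range(len(lines) - window + 1)]
    counts = Counter(windows)
    cloned = set()
    for i, w in enumerate(windows):
        if counts[w] > 1:                      # this window appears elsewhere too
            cloned.update(range(i, i + window))
    return len(cloned) / len(lines)

sample = """
user = fetch("users")
log.info(user)
save(user)
order = fetch("orders")
log.info(order)
save(order)
user = fetch("users")
log.info(user)
save(user)
"""
print(f"cloned-line ratio: {clone_ratio(sample):.2f}")
```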

Lawsuit Against Google Books

by Rizwan A
Summary:

This article talks about how the digitizing, archiving, and copying of print media is becoming one of the largest copyright infringements in history. One of the most notable digital infringements is Google’s Books project. Google Books has become an easy way for anyone to find references within its millions of books and magazines. Many of these books are claimed to have been illegally scanned by Google, which then gives its database of books to HathiTrust, a partnership of universities and libraries. Several writers’ unions have filed a lawsuit against Google, stating that Google Books is publishing copyrighted works without the authors’ authorization. Currently, more than 7 million books from Google are claimed to have been illegally scanned; because these scans were unauthorized, the writers are seeking to withdraw all of the illegally scanned books from Google’s database. Google is currently offering the authors the choice of either signing a contract that gives them their share or removing their book from the database. If the authors do not contact Google, the books will become available to the public. According to the article, the lawsuit is still active as of September 15, 2011, and hopefully, in the interest of everyone, Google and the writers’ unions will reach an agreement soon. read more...

Insight to Namespaces

by Bernard T
This week’s blog assignment gave us a choice between several topics; I chose to do mine on namespaces, more specifically .NET namespaces. Namespaces, the article mentions, enable users to group logically related classes together, but it points out that classes are not required in order to use a namespace. Namespaces are collections of objects, each containing different sets of objects grouped according to their functionality. Advantages of namespaces include preventing naming collisions; this means that if, for example, two or more companies produce a component with the same name, namespaces provide a way to distinguish them from each other. Namespaces also have the added benefit of making it easier to understand the purpose of a given class; grouping your classes that manipulate images into a System.Drawing namespace, for example, makes it easier to remember what the classes do and where they are. The article gave numerous examples of namespaces; System.Data is one that contains all the classes needed to interact with data sources, and without it, it would be impossible for .NET to work with ActiveX Data Objects for .NET. The article also pointed out, however, that some namespaces are automatically imported into ASP.NET. read more...
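The article’s examples are .NET-specific, but the naming-collision point can be sketched by analogy in Python, where namespace objects and modules play a similar role. The vendor and class names below are hypothetical, used only to show how qualified names keep two identically named components apart.

```python
# Rough analogy to the naming-collision point above, using Python namespace
# objects in place of .NET namespaces such as System.Drawing or System.Data.
# Vendor and class names here are hypothetical.
from types import SimpleNamespace

class _AcmeLogger:
    def log(self, msg: str) -> str:
        return f"[acme] {msg}"

class _GlobexLogger:
    def log(self, msg: str) -> str:
        return f"[globex] {msg}"

# Two vendors each ship a component called "Logger"; qualifying the name by
# its namespace keeps the two from colliding, just as CompanyA.Util.Logger
# and CompanyB.Util.Logger would stay distinct in .NET.
acme = SimpleNamespace(Logger=_AcmeLogger)
globex = SimpleNamespace(Logger=_GlobexLogger)

print(acme.Logger().log("hello"))    # [acme] hello
print(globex.Logger().log("hello"))  # [globex] hello
```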

THE BIG DATA?

by Rizwan A
Summary:
Big data means very large datasets. Many companies have been dealing with big data for a long time, but in the past it was expensive, so only large companies were able to utilize it. As the article states, as computing and storage became cheaper, it became possible for smaller companies to harvest this data as well. It also mentions that the next big leap in the data world is the concept of cloud computing: with cloud computing we can use thousands of virtual computers to do the data mining for us, without having to worry about maintenance or the initial cost of acquiring computing and storage. Google is a company that utilizes big data better than many others, and it also has a massive amount of computing power. As virtualized computing becomes more popular, big data technologies such as MapReduce, Hadoop, and Hive allow workloads to be divided across servers located in different locations. read more...
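To make the MapReduce idea mentioned above concrete, here is a small single-process Python sketch of the map, shuffle, and reduce phases, counting words in a couple of documents. Real Hadoop jobs distribute these same phases across many servers; this toy version and its sample documents are only an assumed illustration.

```python
# Toy single-process illustration of the MapReduce pattern named above:
# map emits (key, value) pairs, shuffle groups them by key, reduce combines
# each group. Hadoop runs the same phases in parallel across many machines.
from collections import defaultdict

documents = [
    "big data needs big storage",
    "cloud computing makes big data cheaper",
]

def map_phase(doc: str):
    """Emit a (word, 1) pair for every word in the document."""
    for word in doc.split():
        yield word, 1

def shuffle(pairs):
    """Group all values by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine the values for one key; here, sum the counts."""
    return key, sum(values)

pairs = (pair for doc in documents for pair in map_phase(doc))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # e.g. {'big': 3, 'data': 2, ...}
```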