Distributed Data


In his Forward Thinking column, Michael Miller talks about
the many new systems that have emerged to handle
the huge amounts of data generated by internet
companies. He refers to these systems as distributed
file systems. Because this type of data is different
from the typical data stored in RDBMSs, these companies
have had to develop their own software to handle and
mine it. The data includes log data, click data,
and Web traffic. Among the organizations that
have created these new systems is the Apache Software
Foundation, which created Hadoop in conjunction with
the Apache Hive data warehousing tools and the Pig platform.
Google has also developed the Google File System and
Bigtable, while Amazon has created Dynamo.

This week we are learning about data warehousing, and
many of these companies must use warehousing in one
form or another, meaning they must store their data
either on site or off. I wonder whether data warehousing
has become the new Cloud. The Cloud is a place to store
data and run analytics, and that sounds like data warehousing.

I am curious about the term “distributed file system.” Apache
states that Hadoop (specifically its distributed file system, HDFS)
is fault tolerant and that it relaxes a few POSIX
requirements to enable streaming access to file system data.
As I understand it, this means it stores redundant copies of data
across machines so that the business can keep running continuously
and quickly even when hardware fails. I also now understand
that NoSQL does not mean “no SQL” but “not only SQL.”
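
To make the redundancy idea concrete for myself, here is a minimal
Python sketch of storing several copies of each block of a file on
different machines. It is only an illustration under my own
assumptions: the function names, the node names, and the simple
round-robin placement are hypothetical and are not Hadoop’s actual
API or placement policy. The replication factor of 3 matches HDFS’s
documented default but is just a parameter here.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (round-robin).

    Hypothetical helper for illustration; real systems use smarter,
    rack-aware placement.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes, wrapping around, so replicas never share a node.
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    ]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # made-up node names
placement = place_blocks(num_blocks=6, nodes=nodes, replication=3)

# With three copies of every block, losing any two nodes loses no data.
survivors = readable_blocks(placement, failed={"node-a", "node-b"})
print(len(survivors) == len(placement))
```

The trade-off is the one noted above: every block takes three times
the disk space, and in exchange the data stays readable when a
machine or two goes down.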

Miller, M. (2012, May 4). Storing Massive Data: Distributed Data
and the noSQL Movement. PCMagazine.com.
Retrieved from http://forwardthinking.pcmag.com/pc-hardware/297512-storing-massive-data-distributed-data-and-the-nosql-movement