Big Data and Language: How do they relate?


In an interview with WIRED.com, Martin Wattenberg, a mathematician and computer scientist at IBM’s Watson Research Center in Cambridge, explains the importance of “Big Data”. Wattenberg specializes in large textual data sets, meaning that he works with terabytes of language. Wired.com picks Wattenberg’s brain, asking why he is so captivated by reading such data sets, and Wattenberg points to the importance of language. Language is one of humanity’s core mediums, one through which we read, explore, and encode our identity as human beings, much like the blog post I’m typing right now. For example, one may have a database holding petabytes of words, literature, and books, and yet even twelve words from Voltaire, an example given by Wattenberg, can hold a lifetime of experience. So, with petabytes upon petabytes of information to analyze, Wattenberg has created a visual representation of everyone’s favorite quick-information website: Wikipedia. The visual representation assigns a color to each word in the dictionary and then maps out how often each word is used, which helps Wattenberg analyze the data easily.
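To make the idea of coloring words and mapping their usage a little more concrete, here is a minimal sketch in Python. It is only my own illustration, not Wattenberg’s actual method: the function names `word_color` and `color_frequency_table` are made up for this post, and deriving a color from a hash of the word is an assumption I chose simply because it gives every word the same color each time.

```python
import hashlib
import re
from collections import Counter


def word_color(word):
    """Map a word to a stable hex color (a hypothetical scheme, not Wattenberg's)."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    # The first six hex digits of the hash serve as an RRGGBB color.
    return "#" + digest[:6]


def color_frequency_table(text):
    """Return (word, count, color) rows, most frequently used word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [(w, n, word_color(w)) for w, n in counts.most_common()]


sample = "the cat saw the dog and the dog saw the cat"
for word, count, color in color_frequency_table(sample):
    print(f"{word:<5} {count:>2} {color}")
```

Each row pairs a word with how often it appears and the color it would be drawn in. A real visualization would then paint those colors on a page, sized or ordered by the counts.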

Wattenberg, I believe, is no less than a genius. He is taking language that is freely available to the world and, rather than just benefiting from it at the surface level as many of us do, using it to construct a diagnosis of the human state of mind and to track patterns and behaviors. Comparing a bar chart of numbers with actual text, Wattenberg argues that making a bar chart for a complex set of numbers is actually very easy, and that there are a million ways to do it. Analyzing language, he states, is a lot more difficult, and to aid the analysis one must visualize the information. As a visual learner myself, I wholeheartedly agree with this statement and encourage Wattenberg to continue his study of human behavior.

Mark Horowitz (2008). Visualizing Big Data: Bar Chart for Words. [ONLINE] Available at: http://www.wired.com/science/discoveries/magazine/16-07/pb_visualizing. [Last Accessed October 21, 2012].