Compress or not to Compress

by Joeydes M
Summary:

This article is about compressing HTML documents to make the web more efficient and ultimately reduce traffic. Compression also reduces the storage requirements of HTML data. The research was conducted using Lossless HTML Transform (LHT), which works in conjunction with general-purpose compression algorithms. The author describes the two main parts of the algorithm: “The main components of the algorithm are: a static dictionary or a semi-static dictionary of frequent alphanumerical phrases, and binary encoding of popular patterns, like numbers, dates or IP addresses.” The article then goes on to discuss the two types of compression paired with this technology: one is the Deflate method, similar to that of a “zip” file, and the second is PPMVC. The tests were run on HTML documents without images. The author found that when the algorithm was used in conjunction with standard compression, the data was compressed further, by an average of 17%.
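
To make the dictionary idea concrete, here is a minimal Python sketch of a phrase-substitution step run before Deflate. It is not the LHT algorithm from the paper: the phrase list, the byte codes, and the function names are all hypothetical, the input is assumed to be ASCII-only, and the binary encoding of numbers, dates, and IP addresses is left out.

```python
import zlib

# Hypothetical static dictionary: frequent HTML phrases mapped to single
# bytes above 0x7F, which never occur in pure-ASCII input.
PHRASES = [b"</div>", b"<div", b"class=", b"href=", b"</a>", b"<p>", b"</p>"]
CODES = {phrase: bytes([0x80 + i]) for i, phrase in enumerate(PHRASES)}


def transform(html: bytes) -> bytes:
    """Replace each dictionary phrase with its one-byte code."""
    if any(b > 0x7F for b in html):
        raise ValueError("this sketch assumes ASCII-only input")
    for phrase, code in CODES.items():
        html = html.replace(phrase, code)
    return html


def inverse(data: bytes) -> bytes:
    """Expand one-byte codes back into the original phrases."""
    for phrase, code in CODES.items():
        data = data.replace(code, phrase)
    return data


def compress_html(html: bytes) -> bytes:
    """Apply the dictionary transform, then Deflate (via zlib)."""
    return zlib.compress(transform(html))


def decompress_html(blob: bytes) -> bytes:
    """Undo Deflate, then undo the dictionary transform."""
    return inverse(zlib.decompress(blob))
```

Whether a step like this actually helps depends on the dictionary and on the pages being compressed; the sketch only shows that the substitution is reversible and can sit in front of a standard compressor.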

Reflection:

I had mixed feelings about this article. On one hand, I like the idea of reducing storage space and enabling the web to be more efficient; on the other hand, the research seems to be flawed. The author did not use any pages that have images, and a page without images is a very rare thing in today’s website development. The Web 2.0 initiative is also saturated with graphics. Almost all, if not all, of the high-traffic sites on the net today have a massive amount of images. I would recommend that this study be done again with test pages that contain images. I like the thought and I can appreciate the effort. The logic is there and the researchers are heading down the right path; there just needs to be more practical and concrete evidence to support going forward with the idea.

The article was also a little confusing to me because it didn’t really explain the back end of a site and how that would be compressed. I realize that databases are already fairly compressed, but the objects that the database refers to (images, music, video, etc.) will still need a considerable amount of space. I would also suggest that the researchers work toward reducing the size of the larger objects on the web, not the small, kilobyte-sized HTML documents.

Citation:

Skibinski, P., “Improving HTML Compression,” Data Compression Conference, 2008 (DCC 2008), p. 545, 25-27 March 2008. doi: 10.1109/DCC.2008.74. URL: http://0-ieeexplore.ieee.org.opac.library.csupomona.edu/stamp/stamp.jsp?tp=&arnumber=4483372&isnumber=4483270

1 thought on “Compress or not to Compress”

  1. I was under the impression that we were already compressing this stuff. A common compressed image file we use already is JPEG, so I can see why they didn’t bother. Compressing a file also adds overhead, so an already compressed file can actually be bigger than the original if you try to compress it more. I agree with you that they should work on larger files so my movies load faster. I’m looking at you, Netflix.
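
The overhead point can be checked with a short Python sketch using the standard zlib module. Seeded pseudo-random bytes stand in for an already compressed file such as a JPEG (an assumption: both leave a compressor very little redundancy to remove), and the function name is hypothetical.

```python
import random
import zlib


def size_change(data: bytes) -> int:
    """Return compressed size minus original size (positive means it grew)."""
    return len(zlib.compress(data)) - len(data)


# Repetitive markup: lots of redundancy for Deflate to remove.
html_like = b"<p>hello</p>" * 100

# Stand-in for already compressed data: seeded pseudo-random bytes.
rng = random.Random(0)
dense = bytes(rng.getrandbits(8) for _ in range(10_000))

print("html-like:", size_change(html_like))
print("dense:", size_change(dense))
```

The repetitive markup shrinks, while the dense data comes out slightly larger because the compressor still has to add its own header and block bookkeeping.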
