A New Form of HTML Compression

by Taylor G
I found the article I chose to write about really interesting, and since this web development class deals with understanding HTML, I especially enjoyed it. The article explains that HTML is the standard for web pages but has disadvantages such as “verbosity.” The authors’ solution to this problem is data compression. Deflate is a general-purpose compression algorithm that is not tailored to HTML documents, so they argue that an algorithm designed specifically for HTML could achieve a much better compression ratio. The main goal of their research was to find an efficient way to compress HTML documents, which in the long run would reduce both internet traffic and the space needed to store HTML.

They named their algorithm “Lossless HTML Transform” (LHT). It comes in two versions, one using a static dictionary and one using a semi-static dictionary, and each has its own disadvantages. Static LHT relies on a fixed English dictionary, so it is geared toward English text. Semi-static LHT does not support streams and requires two passes over an input file. The authors say their transform can be combined with a general compression algorithm; in their case they used Deflate and PPMVC. PPMVC achieves a very good compression ratio in a relatively short time and without using a lot of memory.

In their experiments they used HTML files without images, taken from the internet, ranging in size from 5 kB to 170 kB. Compared with the general compression algorithms alone, LHT improved HTML compression by an average of 17% when used with Deflate and by nearly 8% when used with PPMVC.
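The paper’s actual LHT is not reproduced here. To get a feel for the general idea the article describes (a reversible, HTML-aware transform applied before a general-purpose compressor like Deflate), I put together a toy Python sketch. The token dictionary, the 0x00 escape scheme, and the names `encode`, `decode`, `semi_static_dictionary`, `compress` and `decompress` are all my own hypothetical choices, not Skibinski’s design. The static version uses a small fixed list of HTML tokens, while the semi-static version makes a first pass over the input to pick its most frequent tokens and then a second pass to encode, which is why that variant cannot work on a stream. On small or very repetitive inputs the transform may not beat plain Deflate at all.

```python
import re
import zlib
from collections import Counter

# Hypothetical toy dictionary of frequent HTML tokens (an assumption for
# illustration; NOT the English dictionary used by static LHT).
STATIC_DICT = [b"<div", b"</div>", b'class="', b"<span", b"</span>",
               b'<a href="', b"</a>", b"<p>", b"</p>", b"<li>", b"</li>"]

ESC = 0x00          # escape byte that introduces a dictionary code
LITERAL_ESC = 0xFF  # code meaning "a literal 0x00 byte was here"


def _pattern(dictionary):
    # Longest tokens first, so the regex alternation prefers the longest match.
    alts = sorted(dictionary, key=len, reverse=True) + [b"\x00"]
    return re.compile(b"|".join(re.escape(t) for t in alts))


def encode(data: bytes, dictionary: list[bytes]) -> bytes:
    """Replace dictionary tokens with 2-byte codes; fully reversible."""
    assert len(dictionary) <= LITERAL_ESC, "at most 255 entries (codes 0..254)"
    assert all(t and b"\x00" not in t for t in dictionary)
    index = {tok: i for i, tok in enumerate(dictionary)}

    def sub(m):
        tok = m.group(0)
        if tok == b"\x00":
            return bytes([ESC, LITERAL_ESC])  # escape a literal 0x00 byte
        return bytes([ESC, index[tok]])

    return _pattern(dictionary).sub(sub, data)


def decode(data: bytes, dictionary: list[bytes]) -> bytes:
    """Invert encode(): expand each 2-byte code back into its token."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            code = data[i + 1]
            out += b"\x00" if code == LITERAL_ESC else dictionary[code]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def semi_static_dictionary(data: bytes, size: int = 64) -> list[bytes]:
    # Pass 1 over the input: count candidate tokens (tags and longer words)
    # and keep the most frequent ones. The decoder needs this same list.
    counts = Counter(re.findall(rb"</?[A-Za-z][A-Za-z0-9]*>?|[A-Za-z]{4,}", data))
    return [tok for tok, _ in counts.most_common(min(size, 255))]


def compress(data: bytes, dictionary: list[bytes]) -> bytes:
    # Pass 2: apply the transform, then Deflate it with zlib.
    return zlib.compress(encode(data, dictionary), 9)


def decompress(blob: bytes, dictionary: list[bytes]) -> bytes:
    return decode(zlib.decompress(blob), dictionary)


if __name__ == "__main__":
    page = b"<ul>" + b"".join(
        b'<li><a href="/page%d">Page %d</a></li>' % (i, i) for i in range(200)
    ) + b"</ul>"
    semi = semi_static_dictionary(page)
    for name, d in [("deflate only", []), ("static", STATIC_DICT), ("semi-static", semi)]:
        blob = compress(page, d)
        assert decompress(blob, d) == page  # lossless round trip
        print(f"{name:13} {len(page)} -> {len(blob)} bytes")
```

The sizes this prints leave out the cost of storing the semi-static dictionary, which a real format would have to send along with the data, so they flatter the semi-static case. The main point of the sketch is that the transform is lossless: every byte comes back exactly after decompression.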

I found this article very appealing because most of us use the internet and download HTML pages every day. When we load these pages, we don’t think about what is going on in the background: how the information is transferred to our computers, or what programs run on the servers that send it across a data line. One of the main reasons I enjoy the CIS major so much is that I get to learn how the technology we use every day works. This article deals with something that seems simple but has the potential to save time, money, and resources on a large scale, and that was appealing enough for me. I hope you enjoy this article as much as I did, and I recommend reading the short article yourself.

Skibinski, P. (2008). Improving HTML compression. Data Compression Conference, 545. Retrieved from