HTML Elements Used to Detect Dangerous Web Pages

by Antonio M
This article was very interesting and fascinating to follow. It talked about a proposed
algorithm for detecting malicious and hazardous web pages. This particular algorithm uses
the strings that appear in HTML elements, rather than an older approach that actually
checks the text within a web page to decide whether it is hazardous. According to the
authors, a text-based algorithm can have a harder time flagging hazardous web pages
because some pages may not contain much text to evaluate. There may also be malicious
links in pictures, pop-ups, or other things that aren't exactly written out on the page.
Without going into too much detail, this HTML-based algorithm looks at the HTML code of
a web page and extracts the HTML elements (<body>, <p>, etc.). Once these HTML elements
have been extracted, they are parsed into strings "with the separating characters
\t , . / ! " = % & { } [ ] _" and so on, which helps determine which strings mark a web
page as malicious or hazardous. Once these strings have been extracted, the authors apply
what they call a "Support Vector Machine" (SVM). The SVM is trained to remember what
malicious HTML looks like, because it is the SVM that actually looks at a web page and
determines whether it is a hazardous site or not.
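
To make that pipeline more concrete, here is a minimal sketch of how I picture it working:
pull the element strings out of the HTML, split them on the separator characters the
authors quote, and train an SVM on the resulting tokens. The sample pages, the tokenizer,
and the scikit-learn setup are my own illustrative assumptions, not the authors' actual
implementation.

```python
import re
from html.parser import HTMLParser

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Separator characters quoted in the article: \t , . / ! " = % & { } [ ] _
# (plus whitespace). Exactly how the paper splits strings is an assumption here.
SEPARATORS = re.compile(r'[\t,./!"=%&{}\[\]_\s]+')


class ElementStringExtractor(HTMLParser):
    """Collect the strings that appear in HTML elements (tag names and
    attribute values), ignoring the visible text of the page."""

    def __init__(self):
        super().__init__()
        self.strings = []

    def handle_starttag(self, tag, attrs):
        self.strings.append(tag)
        self.strings.extend(value for _, value in attrs if value)


def element_tokens(page_html):
    """Extract element strings and split them on the separator characters."""
    extractor = ElementStringExtractor()
    extractor.feed(page_html)
    tokens = []
    for s in extractor.strings:
        tokens.extend(t for t in SEPARATORS.split(s) if t)
    return " ".join(tokens)


# Toy training data: 1 = hazardous, 0 = safe. Made-up examples, not the
# paper's data set.
pages = [
    '<body onload="evil()"><iframe src="http://bad.example/x"></iframe></body>',
    '<body><p class="article">Welcome to my home page</p></body>',
]
labels = [1, 0]

# Turn each page's element tokens into a bag-of-words feature vector.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform([element_tokens(p) for p in pages])

# The SVM learns which element strings tend to show up on hazardous pages.
classifier = LinearSVC()
classifier.fit(X, labels)

# Classify a new page the same way.
new_page = '<body><iframe src="http://bad.example/y"></iframe></body>'
features = vectorizer.transform([element_tokens(new_page)])
print("hazardous" if classifier.predict(features)[0] == 1 else "safe")
```

Running this prints a label for the test page; in a real system the training set would be
thousands of labeled pages rather than two toy examples.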

I think this relates to our class because it talks about HTML and, I guess you could say,
an indirect way it can be used to determine whether a web page is safe. I also think that
if we ever make web design a career, we need to learn how to protect web sites and the
different techniques that can be used to do that.

At first this article was a little hard to comprehend because there were a lot of
references to different kinds of algorithms that would use other kinds of algorithms to
do their calculations. But with some time and patience I was somewhat able to understand
what was going on. I was enlightened by the way they did their calculations and by the
fact that they just read the HTML code itself to see if a web page is hazardous.

Reference:
K. Ikeda, T. Yanagihara, K. Matsumoto, and Y. Takishima, "Detection of Hazardous
Information Based on HTML Elements," 2010 IEEE RIVF International Conference on Computing
and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF),
Nov. 2010, pp. 1-4, doi: 10.1109/RIVF.2010.5633302.