HTML Elements Used to Detect Dangerous Web Pages

by Antonio M
This article was fascinating to follow. It proposes an algorithm for detecting malicious
and hazardous web pages. The algorithm uses the strings that appear in HTML elements rather
than an older approach that checks only the text of a web page to decide whether the page
is hazardous. According to the authors, a text-based algorithm can have a harder time
detecting hazardous web pages because some pages do not contain much text to evaluate.
There may also be malicious links, pictures, pop-ups, or other things that aren’t actually
written as text on the page. Without going into too much detail, the HTML algorithm looks
at the HTML code of a web page and extracts its HTML elements (<body>, <p>, etc.). Once these
HTML elements have been extracted, they are split into strings “with the
separating characters \t , . / ! ” = % & { } [ ] _” and so on, which can then help determine
which strings indicate a malicious or hazardous web page. Once these strings have been
extracted, the authors use what is called a “Support Vector Machine” (SVM). The SVM is
trained on examples so that it learns what malicious HTML looks like, because it is the SVM
that actually looks at a web page’s strings and determines whether the site is hazardous or not.
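
To make the extraction and splitting steps more concrete, here is a minimal sketch in Python. It is not the authors' implementation. The names ElementCollector and html_to_tokens are made up for illustration, the choice to collect tag names, attribute names, attribute values, and text is my own assumption, and the space and newline separators are added to stand in for the "and so on" in the quoted list.

```python
import re
from html.parser import HTMLParser

# Separator characters quoted in the post. The trailing space and newline
# are an assumption standing in for the "and so on".
SEPARATORS = "\t,./!\"=%&{}[]_" + " \n"
SPLIT_RE = re.compile("[" + re.escape(SEPARATORS) + "]+")


class ElementCollector(HTMLParser):
    """Hypothetical helper: gathers tag names, attributes, and text."""

    def __init__(self):
        super().__init__()
        self.raw_strings = []

    def handle_starttag(self, tag, attrs):
        # Keep the element name plus every attribute name and value.
        self.raw_strings.append(tag)
        for name, value in attrs:
            self.raw_strings.append(name)
            if value is not None:
                self.raw_strings.append(value)

    def handle_data(self, data):
        # Keep the visible text between tags as well.
        self.raw_strings.append(data)


def html_to_tokens(html):
    """Extract strings from the HTML and split them on the separators."""
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    tokens = []
    for raw in collector.raw_strings:
        # Drop the empty pieces that splitting leaves at the edges.
        tokens.extend(piece for piece in SPLIT_RE.split(raw) if piece)
    return tokens


sample = '<p class="promo_box"><a href="http://example.com/win.php?id=1">Click here!</a></p>'
print(html_to_tokens(sample))
```

Strings like these could then be turned into numeric features and used to train a classifier such as an SVM. How the authors build those features is not covered in this summary, so that step is left out of the sketch.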

I think this relates to our class because it talks about HTML and another way it can be
used, indirectly, to help determine whether a web page is safe. I also think that if we
are ever to make web design a career, we need to learn how to protect web sites and
the different techniques that can be used to do so.

At first this article was a little hard to comprehend because there were a lot of references
to different kinds of algorithms that use other algorithms to do their
calculations. But with some time and patience I was somewhat able to understand what was
going on. I was enlightened by the way they did their calculations and by the fact
that just reading the HTML code can show whether a web page is hazardous.

reference:
Ikeda, K.; Yanagihara, T.; Matsumoto, K.; Takishima, Y., “Detection of Hazardous Information
Based on HTML Elements,” 2010 IEEE RIVF International Conference on Computing and
Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), pp. 1-4,
1-4 Nov. 2010. doi: 10.1109/RIVF.2010.5633302

2 thoughts on “HTML Elements Used to Detect Dangerous Web Pages”

  1. I find it interesting that HTML5 is used to detect malicious sites. I would have thought that HTML5 would have been exploited to disrupt the user experience, but I guess that’s not the case. I hope it stays that way.

  2. Wow, that’s so cool. I did not know HTML had the ability to sniff out hazardous webpages, let alone an algorithm for it. It is a good idea because, as the article said, some pages don’t have a lot of text, just multimedia. Overall it is good to see that HTML has many faces, and I believe that is why everyone will be transitioning to HTML5.
