Category: | Web Stuff |
Author/Contact Info | Briac Pilpré |
Description: | The following code acts as an HTML script filter, stripping JavaScript, VBScript, JScript, PerlScript, etc. from HTML. It removes every scriptable event attribute defined in the HTML 4.01 specification, along with all <script> elements. It takes a filename as its argument or, if no argument is given, reads from STDIN. All output goes to STDOUT. This code should be pretty reliable, but I'd be interested to know if anyone finds a flaw in it. |
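    #!/usr/bin/perl -w
    use strict;
    use HTML::Parser;
    use HTML::Entities qw(encode_entities);
    use vars qw(%attribs @elements);

    # Scriptable event attributes from the HTML 4.01 specification.
    # Only the keys matter, since JSstrip tests them with exists().
    @attribs{qw(
        onblur onchange onclick ondblclick onfocus
        onkeydown onkeypress onkeyup onload
        onmousedown onmousemove onmouseout onmouseover onmouseup
        onreset onselect onsubmit onunload
    )} = ();

    # Elements dropped entirely, content included.
    @elements = qw( script );

    my $parser = HTML::Parser->new(
        # Everything except start tags is passed through untouched.
        default_h       => [ sub { print shift }, 'text' ],
        start_h         => [ \&JSstrip, 'tagname, attr, attrseq' ],
        ignore_elements => \@elements,
    );

    if (@ARGV) {
        $parser->parse_file( $ARGV[0] )
            or die "Can't open $ARGV[0]: $!\n";
    }
    else {
        $parser->parse_file( \*STDIN );
    }

    # Re-emit a start tag without its script event handlers.
    sub JSstrip {
        my ( $tagname, $attr, $attrseq ) = @_;

        print "<$tagname";
        foreach (@$attrseq) {
            # Skip the attribute if it is a script event handler
            next if exists $attribs{$_};

            # HTML::Parser hands over entity-decoded values, so picking a
            # quote character by regex breaks when a value holds both
            # kinds of quote. Re-encode and always use double quotes.
            my $value = encode_entities( $attr->{$_}, '<>&"' );
            print qq( $_="$value");
        }
        print ">";
    }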
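To reuse the filter inside a larger program rather than from the command line, the same handlers can append to a string instead of printing. This is only a minimal sketch under a few assumptions: `strip_scripts` is a hypothetical helper name, the handler list is shortened for the example, and attribute values are re-encoded with `HTML::Entities` before being written back out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities qw(encode_entities);

# Abbreviated list for illustration; the full script covers all of the
# HTML 4.01 event attributes.
my %handlers = map { $_ => 1 } qw(onclick onload onmouseover onsubmit);

# Hypothetical helper: returns $html with <script> elements and the
# event-handler attributes above removed.
sub strip_scripts {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        # Text, end tags, comments, etc. are copied through unchanged.
        default_h => [ sub { $out .= shift }, 'text' ],
        # Start tags are rebuilt attribute by attribute.
        start_h   => [
            sub {
                my ( $tagname, $attr, $attrseq ) = @_;
                $out .= "<$tagname";
                for my $name (@$attrseq) {
                    next if exists $handlers{$name};
                    my $value = encode_entities( $attr->{$name}, '<>&"' );
                    $out .= qq( $name="$value");
                }
                $out .= '>';
            },
            'tagname, attr, attrseq',
        ],
        ignore_elements => ['script'],
    );

    $parser->parse($html);
    $parser->eof;    # flush any buffered text
    return $out;
}

print strip_scripts(
    '<p onclick="alert(1)" class="x">hi</p><script>evil()</script>'
), "\n";
# prints: <p class="x">hi</p>
```

Collecting into a string keeps the parser setup identical to the command-line version, so the two can share one handler list if both are needed.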
Replies are listed 'Best First'.
Re: Script Stripper
by Juerd (Abbot) on Dec 26, 2001 at 03:37 UTC
by japhy (Canon) on Dec 26, 2001 at 09:58 UTC
by Juerd (Abbot) on Dec 26, 2001 at 11:21 UTC