PerlMonks  

HTML Utility

by vroom (His Eminence)
on May 25, 2000 at 20:16 UTC
Create HTML files using perl on the fly
on Nov 20, 2007 at 01:18 UTC
by sKore
Create HTML files using perl on the fly.
Prst
on Mar 11, 2007 at 23:14 UTC
by Sixtease

Prst stands for "Preprocessor for Static HTML".

Prst is for generating an HTML page with a lot of code that would be cumbersome to write by hand. It's like PHP, except that you write in Perl and it's not intended for dynamic page generation.

To generate a webpage, you write a template in which <% %> tags contain calls to Perl functions; each tag is replaced by the output of its call.
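
The tag substitution described above can be sketched in a few lines of core Perl. This is a minimal illustration of the general technique, not Prst's actual implementation: the function names heading, item_list and expand_template are hypothetical, and the sketch assumes each tag holds a single Perl expression that is safe to eval.

```perl
use strict;
use warnings;

# Hypothetical helper functions that a template might call.
sub heading { my ($text) = @_; return "<h1>$text</h1>"; }

sub item_list {
    my @items = @_;
    return '<ul>' . join('', map { "<li>$_</li>" } @items) . '</ul>';
}

# Replace each <% ... %> tag with the value of the Perl expression inside it.
# A failing expression becomes an HTML comment rather than aborting the run.
sub expand_template {
    my ($template) = @_;
    $template =~ s{<%\s*(.*?)\s*%>}{
        my $out = eval $1;
        defined $out ? $out : "<!-- template error: $@ -->";
    }gse;
    return $template;
}

print expand_template(<<'END');
<body>
<% heading('Fruit') %>
<% item_list(qw(apple pear plum)) %>
</body>
END
```

Because the tag contents go through a string eval, a sketch like this is only suitable for templates you wrote yourself.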

See Prst's webpage.

Format POD as XHTML with embedded stylesheet
on Jun 09, 2006 at 20:39 UTC
by TGI
A simple script to convert POD into XHTML with an embedded stylesheet. I tend to write most of my notes in POD instead of using MS Word or similar tools, and I often need to provide a self-contained HTML file to coworkers so that they can read my notes.
NetIQ Report Parser
on Mar 09, 2006 at 18:28 UTC
by JSchmitz
This is a script that uses the HTML::TableExtract module that Mojotoad wrote. It is very handy for stripping out just the error messages in NetIQ reports for emailing them out. Matt helped me with this a lot so I have to give him props for this one.
Lowercase link text
on Aug 20, 2005 at 23:15 UTC
by parv

I had enough of mixed case titles in my bookmarks. Compared to upper case characters, lower case characters occupy less screen space in menus (in variable width font). Thus, the genesis of the following program.

Using HTML::TreeBuilder, the following program lowercases the link text. In addition, you can specify your own code to be run on each link text and URL; use the -text and -url options.

Since there are some known bugs in the HTML::TreeBuilder module, running tidy on the output (file) is recommended.

UPDATE, Aug 21 2005:
-- Corrected error in code string compilation (related to symbolic references (oops!)).
-- Moved the options handling near the end.

html2code.pl
on Aug 20, 2005 at 06:31 UTC
by tcf03
A utility to translate an email address or any other text to HTML numeric entities. I generally use this to translate email addresses on web pages into numeric entities; the thought is that it may keep spiders from grabbing email addresses. I'm not exactly sure whether it does or not, but ignorance is bliss ;) It doesn't tolerate undefined characters.
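
The core of such a translation can be sketched as follows. This is an illustration of the idea rather than the posted script, and the function name to_numeric_entities is made up for the example.

```perl
use strict;
use warnings;

# Encode every character of the input as a decimal numeric character
# reference, e.g. "@" becomes "&#64;".
sub to_numeric_entities {
    my ($text) = @_;
    return join '', map { '&#' . ord($_) . ';' } split //, $text;
}

print to_numeric_entities('a@b.c'), "\n";    # prints &#97;&#64;&#98;&#46;&#99;
```

A browser renders the result as the original text, while the page source no longer contains a literal address.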
Yet another POD to HTML converter
on Mar 12, 2005 at 09:54 UTC
by Corion

Thilosophy was looking for a converter of pod to HTML. I uploaded mine, which mostly generates a local HTML tree with relative internal links.

Steal the corresponding css from http://www.corion.net/perl-dev/style.css.

Yet another HTML diff
on Sep 03, 2004 at 12:17 UTC
by zby
Compares two versions of an HTML page. The output is a list of "What is New" on the second one, constructed so that it can be safely included on a web page while still keeping some basic formatting; relative link and image addresses are fixed. The output also contains some context information useful for filtering changes (for example, to reject changes that contain only the publication date or random quotes).

It's all very heuristic and there is no guarantee it will work for all sites, but I hope that with some tuning it can be useful for most. For me it worked much better than HTML::Diff. It does the parsing entirely separately from the diffing, so both can be fine-tuned independently. For parsing it uses HTML::PullParser, so it should be more reliable than other diffs that parse with regexps only.

It is a work in progress. I use it as a basis for my Active Bookmarks change aggregator.

tmpldoc.pl - generate documentation for HTML::Template templates
on May 18, 2004 at 17:44 UTC
by LTjake

I noticed the post "Can I automatically generate documentation from HTML::Template?" yesterday and thought that it was a neat idea. So, I whipped this script up.

Example Usage: tmpldoc.pl test.tmpl | pod2html > test.html

Update: jeffa said it should use HTML::Template -- DUH! :)

Screen-scraping using XSH - O'Reilly Animal lister
on Oct 22, 2003 at 21:32 UTC
by merlyn
Using the XSH language, screen-scrape O'Reilly's "Animals" page, generating a new XML file showing the list organized alphabetically by animals and the covers that use that animal.

From a forthcoming Linux Magazine column of mine.

The output looks like:

<root>
  ..
  <cover>
    <animal>Turtle</animal>
    <book>Using and Managing PPP</book>
  </cover>
  <cover>
    <animal>Victoria crowned pigeons</animal>
    <book>lex &amp; yacc</book>
  </cover>
  <cover>
    <animal>Wall creepers</animal>
    <book>Transact-SQL Programming </book>
  </cover>
  <cover>
    <animal>Wallaby &amp; joey</animal>
    <book>Enterprise JavaBeans</book>
    <book>WebLogic Server 6.1 Workbook for Enterprise Javabeans</book>
    <book>WebSphere 4.0 AEs Workbook for Enterprise Javabeans</book>
  </cover>
  <cover>
    <animal>Warriors</animal>
    <book>Security Warrior</book>
  </cover>
  <cover>
    <animal>Weasel</animal>
    <book>Web Design in a Nutshell</book>
  </cover>
  ..
</root>
Bulk HTML Munging
on Aug 13, 2003 at 05:16 UTC
by Ovid

At my last job, we had a problem whereby many static HTML documents needed their footer replaced with a server side include, but the documents had been coded by hand and the HTML in the footers was very irregular. Because there were hundreds of documents, I wrote the following tool to allow for bulk matching and replacing of messy HTML. It's more powerful than you might think, so please read the POD for more information. The actual code is a wee bit sloppy as I had to get this written quickly. This is real, live production code from yours truly :)

Many thanks to ONSITE! Technology, Inc. for giving me permission to release this as open source.

(Trivia note: this program inspired $bad_names eq $bad_design)

makebnlinks
on Jul 22, 2003 at 07:19 UTC
by jaldhar

If you are a Barnes & Noble affiliate, you'll find this script handy. It allows you to make a link to a book on the site without going through their clunky web form. You can also customize the output unlike on the B & N site. This script only does books but I'm going to do a CPAN module which will support all the types of merchandise B & N sells. I'm thinking of calling it Business::Bnaffiliate. Does that sound ok?

Can-o-Raid v1.0
on May 15, 2003 at 13:11 UTC
by hacker
Can-o-Raid is an offensive CGI that will pollute the data stores of web-based email address harvesters with thousands upon thousands of fake (nonexistent) email addresses. The script is re-entrant, but doesn't look like it to the harvesters.

It generates a page of fake email addresses, which all "look" perfectly valid but aren't. Many of the addresses shown in the page are mailto links that lead nowhere; others LOOK like mailto links but are actually hrefs back into the script itself, trapping the harvester. A recent scan of my web logs shows one harvester getting 21,598 hits to this page in a night, which is roughly 4,319,600 fake email addresses that I stuffed their system with.

The benefit of this script is that those fake email addresses will eventually outnumber the "real" email addresses they have stored. If they sell their collection of email addresses to someone else, most of the collection will be invalid junk. Eventually they'll have to delete their entire database of email addresses and start again. Also, trying to deliver to a nonexistent email address at a nonexistent domain will slow down the delivery with millions of bogus DNS queries.

You can see this in action here. Hit reload a few times, and look VERY closely at some of those links.

This can certainly be improved and probably refactored; patches are welcome. I've forgotten where I got the idea for this, so apologies to whoever started me down this path, but here's the code thus far. Enjoy.

Update: Reduced the number of unnecessary comments (thanks halley)

Update: Added LAI's fix using map();

HTML::Scrubber - Perl extension for scrubbing/sanitizing html
on Apr 18, 2003 at 10:26 UTC
by PodMaster
    use HTML::Scrubber;

    my $html = q[ <B> bold </B> <i> italic </i> <u> underlined </u> ];

    my $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] );
    print $scrubber->scrub($html);

    $scrubber->deny( qw[ p b i u hr br ] );
    print $scrubber->scrub($html);

    __END__
bold italic underlined bold italic underlined

Grab a vanilla tarball here.

update:
I've done the TODO, and uploaded HTML-Scrubber-0.02.tar.gz to CPAN. Version 0.01 kept here for historical purposes.

SlurpPal v1.0
on Apr 11, 2003 at 17:14 UTC
by hacker
I was discussing on ChatterBox several different approaches to creating a Donation Tracking "Thermometer" for tracking our community donations to some of our Free Software projects, with the intent of graphing a bandwidth-over-donations display for the users of the various projects. SlurpPal (name subject to change) is the result of some of that code to automate this process.

SlurpPal v1.0 will log into PayPal, "click" the History button, fill out a small form there to set the display History values to the past year of donations for a specified project (based on the email address registered in PayPal with that project), and then print the results. It uses WWW::Mechanize and HTML::TableExtract to get the bits it needs.

I'm going to be integrating this with some actual graphs (or graphics) to draw the thermometer as discussed in the node above soon. Expect that in version 1.1 of SlurpPal.

Comments, optimizations, criticism, and discussion welcome. Thanks go to (no particular order) tye, bart, Corion, castaway, arturo, and others I may have forgotten.

Photo Album HTML Adder
on Oct 08, 2002 at 09:44 UTC
by DarkSniper
I wrote this code a few months ago because I needed to add photo galleries to my site. Instead of building 15 pages manually, I let it do the work by itself. Improvements can be made, especially to update_menus() and the progress meters. It is a massive piece of code, I know, but I'm kind of proud of it :). Comments?
HTML::LinkExtractor
on Aug 20, 2002 at 14:04 UTC
by PodMaster
Finally, a better link extractor, in a module: HTML::LinkExtractor (does the things people wished HTML::LinkExtor did).

See pod for description and documentation.

Use pod2html with a patched version of Pod::Html which correctly interprets <a href="">f</a> in verbatim blocks (my mail to perl5 porters).

update:
I do have a HTML::TokeParser::Simple version of this ;D

Later, I fixed a typo.

UPDATE: Mon Aug 26 11:09:37 2002 GMT
I just put it up on CPAN (version 0.04). Enjoy HTML::LinkExtractor

HTML::TokeParser::Listerine
on Apr 13, 2002 at 19:04 UTC
by Amoe

Makes HTML::TokeParser return a list when get_tag and get_token are called in list context. Other than that, identical to using a while to iterate over. It's to enable me to say:

my @links = map { $_->[1]{href} } $parser->get_tag('a')

And expect it to work. Sating the addiction of map-junkies. :)

This code would possibly be better applied to HTML::PullParser, but if I applied it to that I'd have to reimplement get_tag and do some other stuff which I don't want to. I think, anyway.

Extracting information from the SETI@Home PM group
on Dec 11, 2001 at 20:51 UTC
by Rhose
I had never used the HTML::TableExtract module, so I created this script as a learning experience. As normal, if anyone has suggestions on things which I could do better, I would love to read them.
Web Deployment Schemes
on Nov 25, 2001 at 01:13 UTC
by Rich36

These applications allow the user to store deployment schemes in a database and use that information to FTP the files to a remote location.
The idea behind this code is to have a method to easily send a set of files for a web page out to an ftp server and to be able to resend the set of files when necessary.

add2deploy.pl - store the deployment schemes in a database
deploy.pl - ftp the files to the specified remote location.

See the POD for more information. These applications have only been tested on the Windows platform.

UPDATE: Added checking for binary/ascii file in deploy.pl (19:35 11/24/01).

SCS2CSS
on Nov 16, 2001 at 09:55 UTC
by staeryatz
This is a utility for a larger program of mine, Webcpp (http://webcpp.sf.net). This utility will convert Webcpp's native colour schemes (*.scs) to CSS, which is a new compatible scheme format for the Webcpp 0.6+ series.
XML Pretty Printer
on Sep 04, 2001 at 02:21 UTC
by OeufMayo

This is a small script that turns a valid XML file into a colorful HTML file! Yay!

Some handlers have not been used (most notably entities, notations), but they should be, eventually.

Update Tue Sep 4 07:46:13 UTC 2001: added mirod's suggestion. Thanks mirod!

html2pyx
on Aug 31, 2001 at 03:13 UTC
by OeufMayo

Pyxie is an alternative way of representing XML data. The data is represented in a really simple way, one piece of information per line.
The nice thing about PYX is how easy it is to parse the information you get. On the other hand, a lot of features found in the XML format can't be represented in PYX (CDATA, entities, ...).

Now, I know the module XML::PYX exists, and it even comes with a script called pyxhtml, which does pretty much what this code does.
But XML::PYX per se isn't really flexible if you want a finer control over what's being kept or not in the HTML file.

Hopefully, this code can be easily customized to suit your needs, provided you know how to use HTML::Parser (which is really fun to use, especially the v.3).

And the really cool thing is that your HTML doesn't have to be a valid XML file! (I wouldn't try to feed it Word 2000 pseudo-HTML though...)

More info on PYX

Very Flexible HTML Template System
on Jun 19, 2001 at 21:09 UTC
by Torgo
This is a little bit of Perl code that can be included into any CGI script or HTML file generator that has been an absolute life-saver for me, both at home and at work. I'm new to the site, so I thought I'd share it with yous all.
HTML To ASP Converter
on Apr 30, 2001 at 21:48 UTC
by patgas
This script grabs an HTML file, and converts it into a VBScript Response.Write command for use in ASP pages. Allows custom levels of indenting, and does proper double-quote escaping. Simple, really, but I find myself using it all the time.
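
The conversion of a single line might look like the sketch below. It is an illustration under stated assumptions rather than the posted script: the function name html_line_to_asp, the tab-based indentation and the trailing vbCrLf are choices made for the example. The quote handling relies on the VBScript rule that a double quote inside a string literal is written as two double quotes.

```perl
use strict;
use warnings;

# Turn one line of HTML into a VBScript Response.Write statement.
# $indent_level is the number of tabs to put in front of the statement.
sub html_line_to_asp {
    my ($line, $indent_level) = @_;
    $indent_level //= 0;
    chomp $line;
    (my $escaped = $line) =~ s/"/""/g;    # double every quote for VBScript
    return ("\t" x $indent_level) . qq{Response.Write "$escaped" & vbCrLf};
}

print html_line_to_asp('<a href="index.html">Home</a>', 1), "\n";
```

Running each line of an HTML file through a function like this yields a block of statements that can be pasted between the ASP script delimiters.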
DBIx::XHTML_Table
on Apr 30, 2001 at 08:51 UTC
by jeffa
This is now available as the CPAN module DBIx::XHTML_Table. Get it at CPAN or this cool mirror. Feel free to visit the homepage. The code posted here is left for others to point and laugh at. :D
Update HTML Doc
on Apr 16, 2001 at 16:09 UTC
by Rudif
The ActiveState Perl installer creates an HTML tree and a TOC file for access to the Perl documentation from a browser.
The PPM updates this tree and the TOC when installing packages.

However, in several circumstances you may wish to use this script here, to convert pod found in module files to html and/or to update the TOC:
  1. you added html files found on the web
  2. you installed a module from CPAN whose files contain pod
  3. you installed your own scripts or modules containing pod
Update: ryddler's 'quick and dirty utility' that I started from was originally posted right here at PM. Thanks to $code or die for making the connection.

Web Color Spectrum Generator
on Apr 06, 2001 at 21:01 UTC
by extremely
This is a simple little color generator much like the ones discussed in this node Shading with HTML colors - color_munge. This one can do spectral rotation from red to green to blue without shifting brightness or can do all kinds of wacky color shifts. It can go thru the spectrum in either direction too. I'll post the code on my website too, and maybe even a CGI that you can tinker with. As a bonus I'll put up the original code for you to laugh at on the site this weekend.
Make and index html doc files
on Mar 03, 2001 at 23:11 UTC
by Rudif
The script pods2html extracts the pod documentation from a multitude of pod, pm and pl files in a source directory tree into the corresponding HTML files. It will create/update an HTML directory tree, populate it with HTML files, and optionally create an index file and a 2-frame browser frameset with the index in the left-hand frame and the current HTML file in the right-hand frame.

My script is an extension of the script of the same name which is distributed with the module Pod-Tree-1.06 by Steven McDougall.

I added the option and code that generates the 2-frame frameset similar to that used in ActiveState Perl doc.
I also fixed a few minor problems, documented in my script.

To install, drop the script below into a directory that is in your path and name it pods2html.pl. Next, install the prerequisite modules from CPAN: Pod-Tree and HTML-Stream.
To create or update a html doc tree from pods in your perl work directory, invoke
pods2html <workdir> <htmldir> --frames
To view the html doc index, point your browser to file <htmldir>/default.html.

Rudif
Template HTML
on Feb 13, 2001 at 21:00 UTC
by thealienz1

Takes a directory of files, including all subdirectories, then copies the files and parses them into a template HTML file.

Used for a site I made where the people were too lazy to just insert the template into each page; this also makes it easier to update the pages if you change the template again.

Yes, I understand that SSI can be used, but I'm still too lazy to do that too... ENJOY!

Change Absolute to Relative links in HTML files
on Feb 05, 2001 at 02:22 UTC
by dkubb

This utility will recurse through a specified directory, parse all the .htm and .html files, and replace any absolute URLs with URLs relative to a base you define.

You can also specify what types of links to parse: img, src, action, or any others. Please see HTML::Tagset's %linkElements hash, in the module's source, for a precise breakdown of supported tag-types.

This program was good practice for trying out Getopt::Declare, an excellent command-line parser. Please note the parameter specification below the __DATA__ tag.

Disclaimer: Always use the -b switch to force backups, just in case you have non-standard HTML and the HTML::TreeBuilder parser mangles it.

Comments and suggestions for improvement are always welcome and very much appreciated.

shtml publisher
on Jan 25, 2001 at 20:17 UTC
by willdooUK
Utility to expand Include statements in html files, allowing them to be viewed without running a web server.
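
A minimal sketch of this kind of expansion, using only core modules, is shown below. It is not the posted utility: the function names slurp and expand_includes are hypothetical, and the sketch assumes includes are written as <!--#include file="..." --> with paths relative to the including file (the virtual= form is not handled).

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Read a whole file into a string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Replace each include directive with the contents of the named file.
# Included files are expanded recursively; $depth guards against loops.
sub expand_includes {
    my ($path, $depth) = @_;
    $depth //= 0;
    die "Includes nested too deeply at $path" if $depth > 10;
    my $dir  = dirname($path);
    my $html = slurp($path);
    $html =~ s{<!--\s*#include\s+file="([^"]+)"\s*-->}{
        expand_includes(File::Spec->catfile($dir, $1), $depth + 1)
    }ge;
    return $html;
}

print expand_includes($ARGV[0]) if @ARGV;
```

Redirecting the output to a new .html file gives a page that can be opened straight from disk.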
Image to table converter
on Sep 30, 2000 at 08:11 UTC
by bastard
I hacked this thing together during my quest to get around the "no images on the home node under level 5" rule. (Yes, I know there are other ways.) I'm not sure how useful it is, but since someone requested it I'll post it here in case anyone else is interested. (I suppose the code could also provide a simple example of the use of the GD image module.)

What does it do you may ask? Basically it converts an image to a relatively optimized table representation of the image. It accepts one parameter which is the image file you are going to convert. It dumps the table to STDOUT. It can accept the following image types: PNG, JPEG, XPM and GD2

Warning: this will create very large and complex tables. I have created a 120k table from a 6k PNG image, so this thing is not appropriate for larger images. (Before the COLSPAN enhancements it could generate tables many times larger.)
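
The COLSPAN idea can be illustrated without GD. The sketch below is an assumption about how such an optimization might work, not the posted code: it takes pixel colours that have already been read into arrays of hex strings, and the function name pixels_to_table is made up for the example. Each horizontal run of one colour becomes a single cell.

```perl
use strict;
use warnings;

# Convert rows of pixel colours (array refs of hex strings such as "FF0000")
# into an HTML table, merging each horizontal run of identical colours into
# one cell with a colspan attribute.
sub pixels_to_table {
    my (@rows) = @_;
    my $html = qq{<table cellspacing="0" cellpadding="0" border="0">\n};
    for my $row (@rows) {
        $html .= '<tr>';
        my $i = 0;
        while ($i < @$row) {
            my $colour = $row->[$i];
            my $run    = 1;
            $run++ while $i + $run < @$row && $row->[$i + $run] eq $colour;
            my $span = $run > 1 ? qq{ colspan="$run"} : '';
            $html .= qq{<td bgcolor="#$colour"$span></td>};
            $i += $run;
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

print pixels_to_table(
    [qw(FF0000 FF0000 FF0000 0000FF)],
    [qw(0000FF 0000FF 00FF00 00FF00)],
);
```

In this example the eight pixels collapse into four cells, which is why images with large flat areas produce much smaller tables than photographs.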

Gtk+ HTML Tree Viewer
on Sep 20, 2000 at 21:17 UTC
by mdillon

this is a rewrite of a utility i did for a job where i was using HTML::TreeBuilder and XML::XPath to parse and search normal HTML documents using the powerful XPath query language.

this utility uses HTML::TreeBuilder to parse an HTML document from a URL specified on the command line or from an internal browser location line and displays it as a Gtk+ Tree in a window. only subtrees with text nodes or anchors are expanded.

there are (simple) XPath queries displayed in the status bar that could be used to extract that node from the document (for example, by converting it to XHTML with HTML::TreeBuilder and then using XML::XPath, or by traversing the TreeBuilder parse tree and programmatically constructing an XPath parse tree).

it's probably not a bad example of simple Gtk+ GUI programming. more may be yet to come in the way of functionality (and comments).

this was written and tested against Gtk 0.7003.

there is support for using GtkHTML as well, if your installation is functional (mine was partially functional when i wrote the code, but stopped working after i upgraded from GtkHTML 0.4 to 0.6.1 and recompiled Gtk::HTML)

most recently updated: 24 Sep 2000

delirium
on Jul 05, 2000 at 03:23 UTC
by beppu
A filter to make your HTML delirious.
Automatic CODE-tag creation (Prototype)
on Jun 21, 2000 at 20:28 UTC
by Corion
Out of a discussion about how we can prevent newbies from posting unreadable rubbish, here is a program that tries to apply some heuristics to make posts more readable. This version isn't the most elegant, so it's called a prototype.
Random Color Generator
on Feb 03, 2000 at 08:05 UTC
by Elihu
This is a cgi script that generates an 8 by 8 grid of random colors with their appropriate hex values. Useful for picking colors for web pages.
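
A grid like this can be sketched in core Perl as follows. This is an illustration of the idea, not the posted script; the function names random_hex_colour and colour_grid and the exact table markup are assumptions made for the example.

```perl
use strict;
use warnings;

# Return a random colour as a six-digit uppercase hex string.
sub random_hex_colour {
    return sprintf '%02X%02X%02X', map { int rand 256 } 1 .. 3;
}

# Build an HTML table of $size x $size cells, each filled with a random
# background colour and labelled with its hex value.
sub colour_grid {
    my ($size) = @_;
    my $html = "<table>\n";
    for (1 .. $size) {
        $html .= '<tr>';
        for (1 .. $size) {
            my $hex = random_hex_colour();
            $html .= qq{<td bgcolor="#$hex">#$hex</td>};
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

# As a CGI script: emit the header, then the 8 by 8 grid.
print "Content-type: text/html\n\n", colour_grid(8);
```

Each reload of the page produces a fresh set of 64 colours.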
Create HTML files using perl on the fly
on Nov 20, 2007 at 01:18 UTC
by sKore
Create HTML files using perl on the fly.
Prst
on Mar 11, 2007 at 23:14 UTC
by Sixtease

Prst stands for "Preprocessor for Static HTML".

Prst is for generating a HTML page with a lot of code that would be cumbersome to write by hand. It's like PHP only you write in Perl and it's not intended for dynamic page generation.

To generate a webpage, you write a template where inside <% %> tags you call perl functions and their output is substituted for the tag.

See Prst's webpage.

Format POD as XHTML with embedded stylesheet
on Jun 09, 2006 at 20:39 UTC
by TGI
A simple script to convert pod into xhtml with an embedded stylesheet. I tend to write most of my notes in pod instead of using MS Word or similar tools, and I often need to provide a self contained html file to coworkers so that they can read my notes.
NetIQ Report Parser
on Mar 09, 2006 at 18:28 UTC
by JSchmitz
This is a script that uses the HTML::TableExtract module that Mojotoad wrote. It is very handy for stripping out just the error messages in NetIQ reports for emailing them out. Matt helped me with this a lot so I have to give him props for this one.
Lowercase link text
on Aug 20, 2005 at 23:15 UTC
by parv

I had enough of mixed case titles in my bookmarks. Compared to upper case characters, lower case characters occupy less screen space in menus (in variable width font). Thus, the genesis of the following program.

Using HTML::TreeBuilder, following program lcs the link text. In addition, one can specify one's own code to be run on each text and URL; use -text and -url options.

Since there are some known bugs in the HTML::TreeBuilder module, running tidy on the output (file) is recommended.

UPDATE, Aug 21 2005:
-- Corrected error in code string compilation (related to symbolic references (oops!)).
-- Moved the options handling near the end.

html2code.pl
on Aug 20, 2005 at 06:31 UTC
by tcf03
a utility to translate an email or any text to HTML numeric entities. I generally use this to translate email addresses on web pages into numeric entities - the thought is that it may keep spiders from grabbing email addresses. Im not exactly sure if it does or not, but ignorance is bliss ;) It doesnt tolerate undefined characters.
Yet another POD to HTML converter
on Mar 12, 2005 at 09:54 UTC
by Corion

Thilosophy was looking for a converter of pod to HTML. I uploaded mine, which mostly generates a local HTML tree with relative internal links.

Steal the corresponding css from http://www.corion.net/perl-dev/style.css.

Yet another HTML diff
on Sep 03, 2004 at 12:17 UTC
by zby
Compares two versions of a HTML page. The output is a list of "What is New" on the second one and is constructed so that can be safely included on a web page, but still contains some basic formatting, link and image relative addresses are fixed. The output contains as well some context information usefull for filtering changes (for example to reject changes that contain only the publication date or random quotes).

It's all very heuristic and there is no guarantee it would work for all sites, but I hope with some tuning it can be usefull for most. For me it worked much better than HTML::Diff. It does the parsing totally separately from the diffing, thus they both can be independently fine tuned. For parsing it uses HTML::PullParser so it shoudl be more reliable than the other diffs that use only regexp for the parsing.

It is a work in progress. I use it as a basis for my Active Bookmarks change aggregator.

tmpldoc.pl - generate documentation for HTML::Template templates
on May 18, 2004 at 17:44 UTC
by LTjake

I noitced the post "Can I automatically generate documentation from HTML::Template?" yesterday and thought that it was a neat idea. So, i whipped this script up.

Example Usage: tmpldoc.pl test.tmpl | pod2html > test.html

Update: jeffa said it should use HTML::Template -- DUH! :)

Screen-scraping using XSH - O'Reilly Animal lister
on Oct 22, 2003 at 21:32 UTC
by merlyn
Using the XSH language, screen-scrape O'Reilly's "Animals" page, generating a new XML file showing the list organized alphabetically by animals and the covers that use that animal.

From a forthcoming Linux Magazine column of mine.

The output looks like:

<root> .. <cover> <animal>Turtle</animal> <book>Using and Managing PPP</book> </cover> <cover> <animal>Victoria crowned pigeons</animal> <book>lex &amp; yacc</book> </cover> <cover> <animal>Wall creepers</animal> <book>Transact-SQL Programming </book> </cover> <cover> <animal>Wallaby &amp; joey</animal> <book>Enterprise JavaBeans</book> <book>WebLogic Server 6.1 Workbook for Enterprise Javabeans</book> <book>WebSphere 4.0 AEs Workbook for Enterprise Javabeans</book> </cover> <cover> <animal>Warriors</animal> <book>Security Warrior</book> </cover> <cover> <animal>Weasel</animal> <book>Web Design in a Nutshell</book> </cover> .. </root>
Bulk HTML Munging
on Aug 13, 2003 at 05:16 UTC
by Ovid

At my last job, we had a problem whereby many static HTML documents needed their footer replaced with a server side include, but the documents had been coded by hand and the HTML in the footers was very irregular. Because there were hundreds of documents, I wrote the following tool to allow for bulk matching and replacing of messy HTML. It's more powerful than you might think, so please read the POD for more information. The actual code is a wee bit sloppy as I had to get this written quickly. This is real, live production code from yours truly :)

Many thanks to ONSITE! Technology, Inc. for giving me permission to release this as open source.

(Trivia note: this program inspired $bad_names eq $bad_design)

makebnlinks
on Jul 22, 2003 at 07:19 UTC
by jaldhar

If you are a Barnes & Noble affiliate, you'll find this script handy. It allows you to make a link to a book on the site without going through their clunky web form. You can also customize the output unlike on the B & N site. This script only does books but I'm going to do a CPAN module which will support all the types of merchandise B & N sells. I'm thinking of calling it Business::Bnaffiliate. Does that sound ok?

Can-o-Raid v1.0
on May 15, 2003 at 13:11 UTC
by hacker
Can-o-Raid is an offensive CGI that will pollute web-based email address harvester's data stores with thousands upon thousands of fake (non-existant) email addresses. The script is re-entrant, but doesn't look like it to the harvesters.

What it does, is generate a page of fake email addresses, which all "look" perfectly valid, but aren't. Many of the addresses shown in the page are mailto links, which lead nowhere, and others LOOK like mailto links, but are actually hrefs back into the script itself, trapping the harvester. A recent scan of my web logs shows one harvester getting 21,598 hits to this page in a night, which is roughly 4,319,600 fake email addresses that I stuffed their system with.

The benefit of this script, is that those fake email addresses will eventually overpopulate the "real" email addresses they have stored. If they sell their collection of email addresses to someone else, most of their collection will be junk, invalid. Eventually they'll have to delete their entire database of email addresses, and start again. Also, trying to deliver to a non-existant domain with a non-existant email address will slow down the delivery with millions of bogus DNS queries.

You can see this in action here. Hit reload a few times, and look VERY closely at some of those links.

This can certainly be improved and probably refactored, patches are welcome. I've forgotten where I got the idea for this, so apologies to whomever started me down this path, but here's the code thus far. Enjoy.

Update: Reduced the number of unnecessary comments (thanks halley)

Update: Added LAI's fix using map();

HTML::Scrubber - Perl extension for scrubbing/sanitizing html
on Apr 18, 2003 at 10:26 UTC
by PodMaster
use HTML::Scrubber; my $html = q[ <B> bold </B> <i> italic </i> <u> underlined </u> ]; my $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] ); print $scrubber->scrub($html); $scrubber->deny( qw[ p b i u hr br ] ); print $scrubber->scrub($html); __END__
bold italic underlined bold italic underlined

Grab a vanilla tarball here.

update:
I've done the TODO, and uploaded HTML-Scrubber-0.02.tar.gz to CPAN. Version 0.01 kept here for historical purposes.

SlurpPal v1.0
on Apr 11, 2003 at 17:14 UTC
by hacker
I was discussing on ChatterBox several different approaches to creating a Donation Tracking "Thermometer" for tracking our community donations to some of our Free Software projects, with the intent of graphing a bandwidth-over-donations display for the users of the various projects. SlurpPal (name subject to change) is the result of some of that code to automate this process.

SlurpPal v1.0 will log into PayPal, "click" the History button, fill out a small form there to set the display History values to the past year of donations for a specified project (based on the email address registered in PayPal with that project), and then print the results. It uses WWW::Mechanize and HTML::TableExtractor to get the bits it needs.

I'm going to be integrating this with some actual graphs (or graphics) to draw the thermometer as discussed in the node above soon. Expect that in version 1.1 of SlurpPal.

Comments, optimizations, criticism, and discussion welcome. Thanks go to (no particular order) tye, bart, Corion, castaway, arturo, and others I may have forgotten.

Photo Album HTML Adder
on Oct 08, 2002 at 09:44 UTC
by DarkSniper
I wrote this code a few months ago because i need to add photogalleries to my site. Instead of doing 15pages manual i let it do it by it self. improvements can be done,especially with update_menus() and the progress meters. It is a massive code i know but im kinda proud of it :). Comments?
HTML::LinkExtractor
on Aug 20, 2002 at 14:04 UTC
by PodMaster
Finally, a better link extractor, in a module, HTML::LinkExtractor (does the things people wished HTML::LinkExtor did )

See pod for description and documentation.

Use pod2html with a patched version Pod::Html which correctly interprets <a href="">f</a> in verbatim blocks (my mail to perl5 porters).

update:
I do have a HTML::TokeParser::Simple version of this ;D

and later i fixed a typo

UPDATE: Mon Aug 26 11:09:37 2002 GMT
I just put it up on CPAN (version 0.04). Enjoy HTML::LinkExtractor

HTML::TokeParser::Listerine
on Apr 13, 2002 at 19:04 UTC
by Amoe

Makes HTML::TokeParser return a list when get_tag and get_token are called in list context. Other than that, identical to using a while to iterate over. It's to enable me to say:

my @links = map { $_->[1]{href} } $parser->get_tag('a')

And expect it to work. Sating the addiction of map-junkies. :)

This code would possibly be better applied to HTML::PullParser, but if I applied it to that I'd have to reimplement get_tag and do some other stuff which I don't want to. I think, anyway.
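
For readers who want the idea without the Perl, here is a minimal sketch of the same "give me every matching tag as a list" pattern. It uses Python's standard html.parser rather than HTML::TokeParser, and the names TagCollector and get_tags are made up for illustration; it is not the module's code.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the attributes of every start tag with a given name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        if tag == self.wanted:
            self.found.append(dict(attrs))


def get_tags(html, wanted):
    """Return a list of attribute dicts, one per matching start tag."""
    collector = TagCollector(wanted)
    collector.feed(html)
    collector.close()
    return collector.found


# The list result can be fed straight into a comprehension, much like map.
page = '<a href="/a">A</a> <p>x</p> <a href="/b">B</a>'
links = [attrs.get("href") for attrs in get_tags(page, "a")]
```

Returning the whole list up front trades memory for convenience, which is the same trade the list-context version makes.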

Extracting information from the SETI@Home PM group
on Dec 11, 2001 at 20:51 UTC
by Rhose
I had never used the HTML::TableExtract module, so I created this script as a learning experience. As normal, if anyone has suggestions on things which I could do better, I would love to read them.
Web Deployment Schemes
on Nov 25, 2001 at 01:13 UTC
by Rich36

These applications allow the user to store deployment schemes in a database and use that information to FTP the files to a remote location.
The idea behind this code is to have a method to easily send a set of files for a web page out to an ftp server and to be able to resend the set of files when necessary.

add2deploy.pl - store the deployment schemes in a database
deploy.pl - ftp the files to the specified remote location.

See the POD for more information. These applications have only been tested on the Windows platform.

UPDATE: Added checking for binary/ASCII file in deploy.pl (19:35 11/24/01).

SCS2CSS
on Nov 16, 2001 at 09:55 UTC
by staeryatz
This is a utility for a larger program of mine, Webcpp (http://webcpp.sf.net). This utility will convert Webcpp's native colour schemes (*.scs) to CSS, which is a new compatible scheme format for the Webcpp 0.6+ series.
XML Pretty Printer
on Sep 04, 2001 at 02:21 UTC
by OeufMayo

This is a small script that turns a valid XML file into a colorful HTML file! Yay!

Some handlers have not been used (most notably entities, notations), but they should be, eventually.

Update Tue Sep 4 07:46:13 UTC 2001: added mirod's suggestion. Thanks mirod!

html2pyx
on Aug 31, 2001 at 03:13 UTC
by OeufMayo

Pyxie is an alternative way of representing XML data. The data is represented in a really simple way, one piece of information per line.
The nice thing about PYX is how easy the resulting information is to parse. On the other hand, there are a lot of features found in the XML format that can't be represented in PYX (CDATA, entities, ...).

Now, I know the module XML::PYX exists, and it even comes with a script called pyxhtml, which does pretty much what this code does.
But XML::PYX per se isn't really flexible if you want finer control over what is kept from the HTML file.

Hopefully, this code can be easily customized to suit your needs, provided you know how to use HTML::Parser (which is really fun to use, especially the v.3).

And the really cool thing is that your HTML doesn't have to be a valid XML file! (I wouldn't try to feed it Word 2000 pseudo-HTML though...)

More info on PYX
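
To make the "one piece of information per line" idea concrete, here is a rough sketch of an HTML-to-PYX conversion. It is written against Python's html.parser instead of HTML::Parser, the names PyxEmitter and html_to_pyx are hypothetical, and the escaping rules (backslash and newline only) are a simplifying assumption. It is not the script posted in this node.

```python
from html.parser import HTMLParser


class PyxEmitter(HTMLParser):
    """Emit PYX-style lines: '(' start tag, 'A' attribute, '-' text, ')' end tag."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("(" + tag)
        for name, value in attrs:
            # Attributes without a value (e.g. <input checked>) arrive as None.
            self.lines.append("A%s %s" % (name, "" if value is None else value))

    def handle_endtag(self, tag):
        self.lines.append(")" + tag)

    def handle_data(self, data):
        # Keep each text chunk on a single line by escaping newlines.
        escaped = data.replace("\\", "\\\\").replace("\n", "\\n")
        self.lines.append("-" + escaped)


def html_to_pyx(html):
    """Return the PYX lines for an HTML string (which need not be valid XML)."""
    emitter = PyxEmitter()
    emitter.feed(html)
    emitter.close()
    return emitter.lines


pyx = html_to_pyx('<p class="x">Hi<br>there</p>')
```

Because the unclosed `<br>` simply produces a start line with no matching end line, tag-soup input passes through without complaint; filtering what to keep is a matter of adding conditions inside the handlers.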

Very Flexible HTML Template System
on Jun 19, 2001 at 21:09 UTC
by Torgo
This is a little bit of Perl code that can be included into any CGI script or HTML file generator that has been an absolute life-saver for me, both at home and at work. I'm new to the site, so I thought I'd share it with yous all.
HTML To ASP Converter
on Apr 30, 2001 at 21:48 UTC
by patgas
This script grabs an HTML file, and converts it into a VBScript Response.Write command for use in ASP pages. Allows custom levels of indenting, and does proper double-quote escaping. Simple, really, but I find myself using it all the time.
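
As a rough illustration of the transformation described above (not patgas's script), the sketch below wraps each HTML line in a Response.Write statement and doubles embedded double quotes, which is how VBScript escapes them inside string literals. Emitting one statement per line, the trailing vbCrLf, and the function and parameter names are all assumptions made for the example.

```python
def html_to_response_write(html, indent_level=0, indent="\t"):
    """Turn each line of HTML into a VBScript Response.Write statement."""
    prefix = indent * indent_level
    statements = []
    for line in html.splitlines():
        # VBScript escapes a double quote inside a string by doubling it.
        escaped = line.replace('"', '""')
        statements.append('%sResponse.Write "%s" & vbCrLf' % (prefix, escaped))
    return "\n".join(statements)


asp = html_to_response_write('<a href="x">y</a>', indent_level=1)
```

The indent_level argument only affects the generated VBScript, so the output can be pasted inside an existing If block or loop without reformatting.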
DBIx::XHTML_Table
on Apr 30, 2001 at 08:51 UTC
by jeffa
This is now available as the CPAN module DBIx::XHTML_Table. Get it at CPAN or this cool mirror. Feel free to visit the homepage. The code posted here is left for others to point and laugh at. :D
Update HTML Doc
on Apr 16, 2001 at 16:09 UTC
by Rudif
ActiveState Perl installer creates a html tree and a TOC file for access to the perl documentation from a browser.
The PPM updates this tree and the TOC when installing packages.

However, in several circumstances you may wish to use this script here, to convert pod found in module files to html and/or to update the TOC:
  1. you added html files found on the web
  2. you installed a module from CPAN whose files contain pod
  3. you installed your own scripts or modules containing pod
Update: ryddler's 'quick and dirty utility' that I started from was originally posted right here at PM. Thanks to $code or die for making the connection.

Web Color Spectrum Generator
on Apr 06, 2001 at 21:01 UTC
by extremely
This is a simple little color generator much like the ones discussed in this node Shading with HTML colors - color_munge. This one can do spectral rotation from red to green to blue without shifting brightness or can do all kinds of wacky color shifts. It can go thru the spectrum in either direction too. I'll post the code on my website too, and maybe even a CGI that you can tinker with. As a bonus I'll put up the original code for you to laugh at on the site this weekend.
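
One common way to get a spectral rotation at constant brightness is to step the hue in HSV space while holding saturation and value fixed. The sketch below does that with Python's colorsys module; it is an assumption about the general technique, not a port of the generator posted here, and the function name and parameters are invented for the example.

```python
import colorsys


def spectrum(steps, saturation=1.0, value=1.0, reverse=False):
    """Return `steps` hex colors evenly spaced around the hue wheel."""
    colors = []
    for i in range(steps):
        hue = i / steps
        if reverse:
            # Walk the wheel the other way (red, blue, green, ...).
            hue = (1.0 - hue) % 1.0
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors.append("#%02x%02x%02x" % (round(r * 255), round(g * 255), round(b * 255)))
    return colors


forward = spectrum(3)
backward = spectrum(3, reverse=True)
```

Note that a constant HSV value is not the same as constant perceived brightness (pure blue looks darker than pure green), so a generator that cares about that would need a different color model.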
Make and index html doc files
on Mar 03, 2001 at 23:11 UTC
by Rudif
Script pods2html extracts the pod documentation from a multitude of pod, pm and pl files in a source directory tree into the corresponding html files. It will create/update a html directory tree, populate it with html files, and optionally create an index file and a 2-frame browser frameset with the index in the l.h. frame and the current html file in the r.h. frame.

My script is an extension of the script of the same name which is distributed with the module Pod-Tree-1.06 by Steven McDougall.

I added the option and code that generates the 2-frame frameset similar to that used in ActiveState Perl doc.
I also fixed a few minor problems, documented in my script.

To install, drop the script below into a directory that is in your path and name it pods2html.pl. Next, install the prerequisite modules from CPAN: Pod-Tree and HTML-Stream.
To create or update a html doc tree from pods in your perl work directory, invoke
pods2html <workdir> <htmldir> --frames
To view the html doc index, point your browser to file <htmldir>/default.html.

Rudif
Template HTML
on Feb 13, 2001 at 21:00 UTC
by thealienz1

Takes a directory of files, and all its subdirectories, and copies and parses them into a template HTML file.

Used for a site I made where the people were too lazy to just insert the template into each page, but this also makes it easier to update the pages if you change the template again.

Yes, I understand that SSI can be used, but I'm still too lazy to do that too... ENJOY!

Change Absolute to Relative links in HTML files
on Feb 05, 2001 at 02:22 UTC
by dkubb

This utility will recurse through a specified directory, parse all the .htm and .html files, and replace any absolute URLs with URLs relative to a base you define.

You can also specify what types of links to parse: img, src, action, or any others. Please see HTML::Tagset's %linkElements hash, in the module's source, for a precise breakdown of supported tag-types.

This program was good practice for trying out Getopt::Declare, an excellent command-line parser. Please note the parameter specification below the __DATA__ tag.

Disclaimer: Always use the -b switch to force backups, just in case you have non-standard HTML and the HTML::TreeBuilder parser mangles it.

Comments and suggestions for improvement are always welcome and very much appreciated.
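
The core of such a rewrite is computing a relative path from the base to the link target. Here is a small sketch of that one step using only Python's standard library; it is not dkubb's code, the name to_relative is hypothetical, and it assumes the base is the URL of the document that contains the link. Links on a different host are returned unchanged.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative(link, base_url):
    """Rewrite an absolute URL as a path relative to the document at base_url."""
    link_parts = urlsplit(link)
    base_parts = urlsplit(base_url)
    same_site = (link_parts.scheme, link_parts.netloc) == (base_parts.scheme, base_parts.netloc)
    if not link_parts.netloc or not same_site:
        # Already relative, or pointing at another site: leave it alone.
        return link
    # Relative URLs resolve against the directory of the base document.
    base_dir = posixpath.dirname(base_parts.path) or "/"
    relative = posixpath.relpath(link_parts.path or "/", start=base_dir)
    if link_parts.query:
        relative += "?" + link_parts.query
    if link_parts.fragment:
        relative += "#" + link_parts.fragment
    return relative


base = "http://example.com/docs/index.html"
image = to_relative("http://example.com/img/logo.png", base)
```

A full tool would apply this to every link attribute found by the HTML parser and, as the disclaimer above says, keep a backup of each file before writing.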

shtml publisher
on Jan 25, 2001 at 20:17 UTC
by willdooUK
Utility to expand Include statements in html files, allowing them to be viewed without running a web server.
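
A sketch of what expanding includes can look like, assuming the common server-side include syntax `<!--#include file="..." -->` (or `virtual="..."`). The names expand_includes and load_from_dir are made up for the example, other SSI directives are ignored, and this is not willdooUK's utility.

```python
import os
import re

# Matches <!--#include file="name" --> and <!--#include virtual="name" -->.
INCLUDE_RE = re.compile(r'<!--#include\s+(?:file|virtual)="([^"]+)"\s*-->')


def expand_includes(text, load, max_depth=10):
    """Replace include directives with the text returned by load(name)."""

    def replace(match):
        if max_depth <= 0:
            return match.group(0)  # stop runaway recursive includes
        try:
            included = load(match.group(1))
        except (OSError, KeyError):
            return match.group(0)  # leave unresolved directives untouched
        return expand_includes(included, load, max_depth - 1)

    return INCLUDE_RE.sub(replace, text)


def load_from_dir(base_dir):
    """Build a loader that reads included files from base_dir."""

    def load(name):
        path = os.path.join(base_dir, name.lstrip("/"))
        with open(path, encoding="utf-8") as handle:
            return handle.read()

    return load


# In-memory fragments stand in for files here.
fragments = {"header.inc": "<h1>Hi</h1>"}
page = expand_includes('<!--#include file="header.inc" -->body', fragments.__getitem__)
```

For files on disk, pass `load_from_dir("/path/to/site")` as the loader instead of the dictionary lookup.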
Image to table converter
on Sep 30, 2000 at 08:11 UTC
by bastard
I hacked this thing together during my quest to get around the "no images on the home node under level 5" rule. (yes i know there are other ways) I'm not sure how useful it is, but since someone requested it i'll post it here in case anyone else is interested. (I suppose the code could also provide a simple example of the use of the GD image module.)

What does it do, you may ask? Basically it converts an image to a relatively optimized table representation of the image. It accepts one parameter, which is the image file you are going to convert. It dumps the table to STDOUT. It can accept the following image types: PNG, JPEG, XPM and GD2.

Warning: this will create very large and complex tables. I have created a 120k table from a 6k PNG image, so this thing is not appropriate for larger images. (Before the COLSPAN enhancements it could generate tables many times larger.)
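
The COLSPAN saving mentioned above comes from merging runs of identically colored pixels in a row into one cell. The sketch below shows that run-length step on a plain grid of (r, g, b) tuples; reading the image (with GD or anything else) is left out, and the function name and exact cell markup are assumptions rather than the posted code.

```python
from itertools import groupby


def image_to_table(pixels):
    """Render rows of (r, g, b) tuples as an HTML table, one cell per color run."""
    rows = []
    for row in pixels:
        cells = []
        # groupby yields consecutive runs of equal colors within the row.
        for color, run in groupby(row):
            span = len(list(run))
            colspan = ' colspan="%d"' % span if span > 1 else ""
            r, g, b = color
            cells.append(
                '<td%s width="%d" height="1" bgcolor="#%02x%02x%02x"></td>'
                % (colspan, span, r, g, b)
            )
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return (
        '<table cellspacing="0" cellpadding="0" border="0">\n'
        + "\n".join(rows)
        + "\n</table>"
    )


red, blue = (255, 0, 0), (0, 0, 255)
table = image_to_table([[red, red, blue]])
```

Images with long flat runs shrink considerably this way, while noisy photographs gain almost nothing, which matches the size warning above.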

Gtk+ HTML Tree Viewer
on Sep 20, 2000 at 21:17 UTC
by mdillon

this is a rewrite of a utility i did for a job where i was using HTML::TreeBuilder and XML::XPath to parse and search normal HTML documents using the powerful XPath query language.

this utility uses HTML::TreeBuilder to parse an HTML document from a URL specified on the command line or from an internal browser location line and displays it as a Gtk+ Tree in a window. only subtrees with text nodes or anchors are expanded.

there are (simple) XPath queries displayed in the status bar that could be used to extract that node from the document (for example, by converting it to XHTML with HTML::TreeBuilder and then using XML::XPath, or by traversing the TreeBuilder parse tree and programmatically constructing an XPath parse tree).

it's probably not a bad example of simple Gtk+ GUI programming. more may be yet to come in the way of functionality (and comments).

this was written and tested against Gtk 0.7003.

there is support for using GtkHTML as well, if your installation is functional (mine was partially functional when i wrote the code, but stopped working after i upgraded from GtkHTML 0.4 to 0.6.1 and recompiled Gtk::HTML)

most recently updated: 24 Sep 2000

Personal PerlMonks Stats plot creator
on Jul 31, 2000 at 10:47 UTC
by ase
Here's my contribution to statistics nuts like myself.
This utility logs in to PerlMonks (using ZZamboni's PerlMonksChat module), gets your writeup page and creates 3 plots from the data, which are ftp'd to a server of your choice.
I run it every few days to update the graphs. All modules besides PerlMonksChat.pm are available at CPAN. See my home node for an example of the results.

Update: I no longer post the graphs on my home node. The updated code given in the replies to this node is more modern. Thanks to everyone for the kind comments I received when I first wrote this.

Automatic CODE-tag creation (Prototype)
on Jun 21, 2000 at 20:28 UTC
by Corion
Out of a discussion about how we can prevent newbies from posting unreadable rubbish, here is a program that tries to apply some heuristics to make posts more readable. This version isn't the most elegant, so it's called a prototype.
delirium
on Jul 05, 2000 at 03:23 UTC
by beppu
a filter to make your HTML delirious
Random Color Generator
on Feb 03, 2000 at 08:05 UTC
by Elihu
This is a cgi script that generates an 8 by 8 grid of random colors with their appropriate hex values. Useful for picking colors for web pages.
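
A minimal sketch of the grid generation itself, leaving out the CGI plumbing. The function name, the markup, and the optional rng parameter (useful for reproducible output) are assumptions for illustration, not Elihu's script.

```python
import random


def random_color_grid(size=8, rng=random):
    """Return an HTML table of size x size random colors labeled with hex values."""
    rows = []
    for _ in range(size):
        cells = []
        for _ in range(size):
            # Pick one of the 16,777,216 possible 24-bit colors.
            color = "#%06X" % rng.randrange(0x1000000)
            cells.append('<td bgcolor="%s">%s</td>' % (color, color))
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


grid = random_color_grid()
```

Passing `rng=random.Random(42)` gives the same grid on every run, which is handy when comparing layouts.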
embedded table remover
on May 26, 2000 at 11:15 UTC
by BigJoe
You can run this script on an HTML document to remove all the embedded (nested) tables in it, assuming that the tables were coded into the document correctly. By default it will remove all embedded tables and leave the main table, but you can also set how many embedded tables are allowed by changing the numofTables variable.
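
One way to approach this is to track table nesting depth while parsing and drop everything below the allowed depth. The sketch below does that with Python's html.parser. It is not BigJoe's script: max_nested is a hypothetical parameter counting allowed nesting levels (which may not match the meaning of numofTables), and comments and doctype declarations are dropped for brevity.

```python
from html.parser import HTMLParser


class NestedTableStripper(HTMLParser):
    """Re-emit HTML, skipping tables nested deeper than max_nested levels."""

    def __init__(self, max_nested=0):
        super().__init__(convert_charrefs=False)
        self.max_nested = max_nested
        self.depth = 0  # 0 outside any table, 1 inside a top-level table
        self.out = []

    def _keeping(self):
        return self.depth <= self.max_nested + 1

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        if self._keeping():
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self._keeping():
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._keeping():
            self.out.append("</%s>" % tag)
        if tag == "table" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self._keeping():
            self.out.append(data)

    def handle_entityref(self, name):
        if self._keeping():
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if self._keeping():
            self.out.append("&#%s;" % name)


def strip_nested_tables(html, max_nested=0):
    stripper = NestedTableStripper(max_nested)
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.out)


sample = "<table><tr><td>a<table><tr><td>b</td></tr></table>c</td></tr></table>"
flat = strip_nested_tables(sample)
```

Like the original, this relies on the tables being properly closed; a missing `</table>` leaves the depth counter too high and hides the rest of the document.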
Code Viewer
on May 19, 2000 at 01:49 UTC
by BigJoe
This is a script that I put together for use on my source code page. This script then allows me to copy html and scripts into a dir and let people pick the ones they want to view and I don't have to set up a page for each. It does require a param sent to it by using ?html=filename.
Update 6/2/2000: With the help of Fastolfe I have added some testing on the $in{html} to make sure it is not tainted.
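
The kind of check involved can be sketched as a whitelist on the requested file name, so that a parameter such as ?html=../../etc/passwd is rejected before any file is opened. This is a loose analogue of untainting in Perl, not the test added to the script; the pattern and the function name is_safe_name are assumptions.

```python
import re

# Letters, digits, underscore, dot and hyphen only; no leading dot, no slashes.
SAFE_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]*")


def is_safe_name(name):
    """Return True if name is a plain file name with no path components."""
    return SAFE_NAME.fullmatch(name) is not None


ok = is_safe_name("viewer.pl")
bad = is_safe_name("../etc/passwd")
```

Checking against an explicit list of files in the source directory is stricter still, and is worth considering if the directory contains anything that should not be shown.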
Dark Theme for /. through Perl
on May 04, 2000 at 23:07 UTC
by PipTigger
Here's a little script I wrote for myself since I like light text on dark backgrounds (thanks again for the nice PerlMonks theme Vroom!) and /. doesn't have one... I know it's pretty suckie and could be a lot simpler. If you can make it better, please email it to me (piptigger@dimensionsoftware.com) as I use it everyday now. It doesn't werk yet for the ask/. section but when I find some time, I'll add that too. I hope someone else finds this useful. TTFN & Shalom.

-PipTigger
p.s. I tried to submit this to the CUFP section but it didn't work so I thought I'd try here before giving up. Please put the script on your own server and change the $this to reflect your locale. Thanks!
Propaganda Tile Browser
on Apr 14, 2000 at 05:14 UTC
by Anonymous Monk
This perl script (when deployed and executed in a directory containing images) will generate a nice HTML front-end for viewing the images remotely.

For an example of this script's output, have a look here.