PerlMonks  

Save parsed text to file

by pritchard12 (Initiate)
on Jul 16, 2009 at 22:46 UTC

pritchard12 has asked for the wisdom of the Perl Monks concerning the following question:

I am working with the HTML::Parser example that extracts plain text from HTML (code below). After parsing the HTML, what is the best way to strip out special characters and extra blank lines, and then save the plain text to a file? Thanks for your help.
#!/usr/bin/perl -w
# Extract all plain text from an HTML file
use strict;
use HTML::Parser 3.00 ();

# Nesting depth per tag name, so we know when we are inside <script> or <style>
my %inside;

# Start tags call this with +1, end tags with -1
sub tag {
    my ($tag, $num) = @_;
    $inside{$tag} += $num;
    print " ";    # not for all tags
}

# Text handler: receives entity-decoded text ("dtext")
sub text {
    return if $inside{script} || $inside{style};
    print $_[0];
}

HTML::Parser->new(
    api_version => 3,
    handlers    => [
        start => [\&tag,  "tagname, '+1'"],
        end   => [\&tag,  "tagname, '-1'"],
        text  => [\&text, "dtext"],
    ],
    marked_sections => 1,
)->parse_file(shift) || die "Can't open file: $!\n";
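One way to handle the cleanup and saving steps is sketched below. It assumes "special characters" means anything outside printable ASCII; since the `dtext` argspec already decodes entities such as `&amp;`, what typically remains is non-breaking spaces (a decoded `&nbsp;` becomes U+00A0) and other non-ASCII characters. The helpers `clean_text` and `save_text` are hypothetical names, and the character class also drops accented letters, so widen it if you need those.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tidy text extracted from HTML before saving it.
# Assumption: "special characters" = anything outside printable ASCII.
sub clean_text {
    my ($text) = @_;
    $text =~ s/\x{A0}/ /g;             # non-breaking spaces (decoded &nbsp;) -> plain spaces
    $text =~ s/\r\n?/\n/g;             # normalise CRLF / CR line endings to LF
    $text =~ tr/\t/ /;                 # tabs -> spaces
    $text =~ s/[^\x{20}-\x{7E}\n]//g;  # drop everything except printable ASCII and newlines
    $text =~ s/ {2,}/ /g;              # collapse runs of spaces
    $text =~ s/^ +| +$//mg;            # trim spaces at the start and end of every line
    $text =~ s/\n{3,}/\n\n/g;          # allow at most one blank line in a row
    $text =~ s/\A\s+|\s+\z//g;         # trim the document as a whole
    return "$text\n";
}

# Hypothetical helper: write the text to $file, overwriting any existing file.
sub save_text {
    my ($file, $text) = @_;
    open my $out, '>', $file or die "Can't write $file: $!\n";
    print {$out} $text;
    close $out or die "Can't close $file: $!\n";
}

# Usage: perl clean_text.pl raw.txt clean.txt  (input assumed to be UTF-8)
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $in, '<:encoding(UTF-8)', $in_file or die "Can't open $in_file: $!\n";
    my $raw = do { local $/; <$in> };
    close $in;
    save_text($out_file, clean_text($raw));
}
```

To hook this into the parser script, one option is to declare a string next to `%inside` (say `my $plain = '';`), replace the two `print` calls with `$plain .= " ";` and `$plain .= $_[0];`, and after `parse_file` call `save_text('out.txt', clean_text($plain));`.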

Re: Save parsed text to file
by poolpi (Hermit) on Jul 17, 2009 at 06:46 UTC

    See HTML::TokeParser::get_trimmed_text

    From the doc:

    Any entities will be converted to their corresponding character...
    ( HTML::Entities )
    ...Leading and trailing white space is removed.


    hth,
    PooLpi

    'Ebry haffa hoe hab im tik a bush'. Jamaican proverb
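The get_trimmed_text suggestion can be sketched as follows, assuming the input file is UTF-8 and that one output line per run of text is acceptable. `html_text_chunks` and `save_lines` are hypothetical names. Called without arguments, `get_trimmed_text` returns the entity-decoded text up to the next tag with whitespace collapsed and trimmed, and `get_tag` then steps past that tag (start tags come back as `[name, ...]`, end tags as `['/name', ...]`). Script and style contents are skipped, much as the `%inside` hash does in the question's script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: visible text of an HTML string as trimmed chunks,
# one per run of text between tags; <script>/<style> contents are skipped.
sub html_text_chunks {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't create parser\n";
    my @chunks;
    while (1) {
        # Text up to the next tag: entities decoded, whitespace collapsed and trimmed
        my $text = $p->get_trimmed_text;
        push @chunks, $text if length $text;
        # Step past that tag; undef means we reached the end of the document
        my $tag = $p->get_tag or last;
        # For <script> and <style>, discard everything up to the matching end tag
        $p->get_tag("/$tag->[0]") if $tag->[0] eq 'script' || $tag->[0] eq 'style';
    }
    return @chunks;
}

# Hypothetical helper: write one chunk per line to $file as UTF-8.
sub save_lines {
    my ($file, @lines) = @_;
    open my $out, '>:encoding(UTF-8)', $file or die "Can't write $file: $!\n";
    print {$out} map { "$_\n" } @lines;
    close $out or die "Can't close $file: $!\n";
}

# Usage: perl html2txt.pl page.html page.txt
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $in, '<:encoding(UTF-8)', $in_file or die "Can't open $in_file: $!\n";
    my $html = do { local $/; <$in> };
    close $in;
    save_lines($out_file, html_text_chunks($html));
}
```

Because each call stops at the next tag of any kind, inline markup such as `<b>` splits a sentence across lines; passing a list of end tags to `get_trimmed_text` (for example `'/p'`) is one way to keep whole paragraphs together.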

Node Type: perlquestion [id://780869]
Approved by AnomalousMonk