PerlMonks  

Re: Dump Text from HTML

by alfie (Pilgrim)
on Jul 18, 2001 at 12:50 UTC (#97581)


in reply to Dump Text from HTML

Your script has some common mistakes:
You are matching greedily. Add a ? after your +, like this:
$riga =~ s/<\/.+?>//g;
Also, you assume that the opening bracket and the closing one are on the same line, which isn't necessarily the case. So adding the s modifier to your substitutions would help, too. And you forgot to substitute &gt; with > :)
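To make the greedy versus non-greedy point concrete, here is a minimal sketch. The sample line is made up for illustration, and $riga simply mirrors the variable name used above; it is not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample line; $riga mirrors the variable name used above.
my $riga = '<b>bold</b> and <i>italic</i> 1 &gt; 0';

# Greedy: .+ runs from the first < to the last >, taking the text with it.
(my $greedy = $riga) =~ s/<.+>//g;

# Non-greedy: .+? stops at the nearest >, so each tag is matched on its own.
(my $lazy = $riga) =~ s/<.+?>//g;

# Decode the entity only after the tags are gone, so the new > cannot be
# mistaken for the end of a tag.
$lazy =~ s/&gt;/>/g;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

The greedy version throws away the words between the first and the last tag, while the non-greedy one keeps them.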

There is a lot of room for optimization, like using different delimiters to avoid having to escape the slash, or combining several substitutions into one line, like the first two:

$riga =~ s!</?.+?>!!g;
I hope you get what I mean, nice script anyway.
--
use signature; signature(" So long\nAlfie");

Replies are listed 'Best First'.
Re: Re: Dump Text from HTML
by Sigmund (Pilgrim) on Jul 28, 2001 at 20:06 UTC
    Just a question: how can I parse my HTML using the /s modifier if the input is read line by line with the angle operator? I mean, if I read one line using <INF>, how can I expect my script to look into the following line just by using /s? Thanks, bye. SiG
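    One possible answer, as a sketch only: /s can only help if the whole input sits in a single string, so the file has to be slurped instead of read line by line. The example below assumes a lexical handle $inf in place of the bareword INF, and uses an in-memory filehandle over a made-up two-line document so it runs without a real file.

```perl
use strict;
use warnings;

# Stand-in for the real input file: an in-memory filehandle over a
# hypothetical document whose <a ...> tag spans a line break.
my $html = "Hello <a\nhref=\"x.html\">world</a>!\n";
open my $inf, '<', \$html or die "Cannot open in-memory handle: $!";

# Undefine the input record separator locally so that <$inf> returns
# the whole input as one string instead of one line at a time.
my $text = do { local $/; <$inf> };
close $inf;

# With /s the dot also matches newlines, so the split tag is removed.
$text =~ s/<.+?>//gs;

print $text;
```

    A negated character class such as s/<[^>]+>//g also matches across newlines without /s, but it still needs the whole input in one string.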
10x 4 reply
by Sigmund (Pilgrim) on Jul 22, 2001 at 18:34 UTC
    I just read in the Camel book about the use of "?"! I'll post my progress. Thanks for all your comments. Others advise me to use HTML::Parse, but I don't want any module at all. After all, I am trading efficiency for portability at a very good rate! See ya. SiG
Re: Re: Dump Text from HTML
by dentargiano (Initiate) on Jul 10, 2002 at 10:10 UTC
    Hi everyone. As you may see, I'm a newbie in Perl. I have used the "Dump text from html" code, but I still have problems with some tags and other symbols that I can't erase when I try to convert an HTML file to a text file. I also end up with a lot of blank space that I can't get rid of. Thanks. Dani
Re: Re: Dump Text from HTML
by dentargiano (Initiate) on Jul 10, 2002 at 10:22 UTC
    Hi, I have been using your code Dump Text from HTML, but I still have problems when I try to convert an HTML file to a text file. First of all, I get a lot of blank space that I would like to reduce. Second, I have some tags like <FONT or <href that I want to erase. Finally, I want to erase all scripts and images. Could you send me any changes or improvements you have made to your code? Thank you for your help.
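    A rough sketch of how those three requests could be handled with plain substitutions, without any module. The input document here is made up, and it is assumed to be already read into a single string. Regex-based stripping like this is fragile on real-world HTML, so treat it as a starting point rather than a complete converter.

```perl
use strict;
use warnings;

# Hypothetical input, already read into a single string.
my $html = <<'END';
<html><head><script type="text/javascript">
  var x = "<b>not text</b>";
</script></head>
<body><FONT size="2">Some   text</FONT>
<img src="pic.gif">   <a href="page.html">a link</a>


</body></html>
END

# Drop <script> and <style> elements together with their contents.
$html =~ s!<(script|style)\b.*?</\1\s*>!!gis;

# Drop every remaining tag, including <FONT ...>, <a href=...> and <img ...>.
$html =~ s/<[^>]*>//g;

# Collapse runs of spaces and tabs, trim each line, squeeze blank lines.
$html =~ s/[ \t]+/ /g;
$html =~ s/^ +| +$//gm;
$html =~ s/\n{2,}/\n/g;
$html =~ s/\A\n+//;

print $html;
```

    The script block is removed first so that its contents do not survive as text once the tags around them are gone.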
