PerlMonks  

Re^3: How to output the words that you want that came thru an html file?

by NetWallah (Canon)
on May 04, 2012 at 06:19 UTC


in reply to Re^2: How to output the words that you want that came thru an html file?
in thread How to output the words that you want that came thru an html file?

Looks like you need an HTML parsing module such as HTML::Parser.

If that is too much, you can start with code like this:

my @content = split /^/, $response->content();
my $introduction;
for my $line (@content){
    next unless ( $introduction ) ||= $line=~/Game Introduction - Marvel: Avengers Alliance/;
    my $EndSection = $line=~/<\/section>/;
    local $_ = $line;                   # Make a copy, so we do not modify @content..
    m|^\s*<[^/>]+>(.+)</| and $_=$1;    # Zap tags on both sides, if any
    s|<[^>]+>||g;                       # Zap single </onetag> tags
    print;
    last if $EndSection ;
}

which, I suspect, is close to what you are looking for.

             I hope life isn't a big joke, because I don't get it.
                   -SNL


Re^4: How to output the words that you want that came thru an html file?
by stone_ice (Initiate) on May 04, 2012 at 07:09 UTC

    It displayed the output, thanks for that, sir. Could you please explain lines 17 through 20 to me? I'm confused about that part.

    Thanks

    Update

    I've read the code but I'm still confused, especially about lines 20-21. If I'm not mistaken, it's a regular expression. So far the output is almost OK; the last thing I want to do is arrange it in order. Right now the output looks like this:

    Game Introduction - Marvel: Avengers Alliance Welcome to the quick start guide for Marvel: Avengers Alliance, a superhero role role playing game from Playdom. Recruit heroes, battle enemies and take on your friend’s teams in this action/strategy game.

    I would like the text to be aligned at the left side, removing the spacing at the "Game Introduction" and "role" parts. What I would like to achieve now is something like this:

    Game Introduction - Marvel: Avengers Alliance Welcome to the quick start guide for Marvel: Avengers Alliance, a superhero role playing game from Playdom. Recruit heroes, battle enemies and take on your friend’s teams in this action/strategy game.

    Is there a way to arrange the text inside the string? Or is this kind of output normal because the input came from an HTML file?

    Thanks
      I should have indicated in my previous post that the sample code is rather fragile, and very dependent on the way the website developer chooses to store his/her HTML.

      As Ea says below, HTML::TokeParser is a much better choice for robust processing.
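
A minimal sketch of the HTML::TokeParser approach, assuming the module is installed from CPAN (it is not part of core Perl). The sample markup in $html is a made-up stand-in, not the real page, and the choice of the section tag as the boundary is an assumption carried over from the sample code above:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical stand-in for $response->content()
my $html = <<'HTML';
<section>
  <h2>Game Introduction</h2>
  <p>Welcome to the <b>quick start</b> guide.</p>
</section>
HTML

# Parse from a string reference rather than a file
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

my $text = '';
if ( $parser->get_tag('section') ) {
    # Collect all text up to the closing tag, with whitespace
    # collapsed and leading/trailing whitespace removed
    $text = $parser->get_trimmed_text('/section');
}

print "$text\n";
```

Because the parser works on tokens rather than on lines, it should not matter how the markup happens to be wrapped in the source.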

      Having said that, to answer your questions (I'm assuming that your lines 20 and 21 are these):

      m|^\s*<[^/>]+>(.+)</| and $_=$1; # Zap tags on both sides, if any
      # The line above looks for text enclosed in html tokens, and extracts the text.
      # Eg: applying the regex to : "<h2>Some text</h2>" places "Some text" into "$1", which is then copied into "$_"
      s|<[^>]+>||g; # Zap single </onetag> tags
      # The line above handles left-over single tags:
      # Eg: it zaps "<sometag/>" from "text1 <sometag/> text2"
      # Actually, it is rather crude, and does not care about tag termination, or matching.

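
To see the two regexes in action, here is a small self-contained sketch that applies them to a few sample strings. The helper name strip_tags_crudely and the sample lines are made up for illustration, and the regexes are bound to a named variable instead of $_ so the helper is easy to call:

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same two regexes to one line
sub strip_tags_crudely {
    my ($line) = @_;
    my $text = $line;                 # work on a copy
    # If the line starts with an opening tag and contains a closing
    # tag, keep only what lies between them
    $text =~ m|^\s*<[^/>]+>(.+)</| and $text = $1;
    # Remove any tags that are still left
    $text =~ s|<[^>]+>||g;
    return $text;
}

my @samples = (
    '<h2>Some text</h2>',
    'text1 <sometag/> text2',
    '<p>Welcome to the <b>quick</b> start guide.</p>',
);

for my $sample (@samples) {
    print strip_tags_crudely($sample), "\n";
}
```

The second sample keeps both spaces that surrounded the removed tag, which is one source of uneven spacing in the output.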
      In order to format the text better, you need to collect it into a scalar. Instead of "print", collect it using:
      $collected_text .= $_;
      Of course, you should declare $collected_text outside the loop.
      Then, after the loop, you will need to parse and clean $collected_text, before printing it.
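
One possible way to do that clean-up, as a rough sketch: collapse every run of whitespace (including line breaks) into a single space, then trim the ends. The @extracted list is a made-up stand-in for what the loop would otherwise print, and tidy_text is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the pieces produced inside the loop
my @extracted = (
    "      Game Introduction - Marvel: Avengers Alliance\n",
    "   Welcome to the quick start guide,\n",
    "        a superhero role playing game.\n",
);

my $collected_text = '';              # declared outside the loop
for my $piece (@extracted) {
    $collected_text .= $piece;        # collect instead of print
}

# Hypothetical helper: normalise whitespace in the collected text
sub tidy_text {
    my ($text) = @_;
    $text =~ s/\s+/ /g;               # runs of spaces/newlines become one space
    $text =~ s/^\s+|\s+$//g;          # trim both ends
    return $text;
}

print tidy_text($collected_text), "\n";
```

This flattens everything onto one line; if the heading should stay on its own line, it would need to be collected separately.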

