PerlMonks  

meta parsing problems

by Anonymous Monk
on May 23, 2004 at 20:20 UTC ( [id://355760]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

The snippet below isn't parsing meta tags the way it should. I'm using LWP::Simple, and the page source (fetching it works) is in $content.

This needs to match ANY meta tag in the format shown in the script. Yes, I know I shouldn't do it this way, but the question isn't about using a different module; it's about how to fix this problem. The trouble is that meta tags aren't always on separate lines. They can all be bunched together like a paragraph, and that's where this must be messing up.

Nothing prints at all. When I use an array that's split on \n it doesn't work either, because not all meta tags are separated by newlines.

Where is the error in this? Anyone know?

my @meta_results;
my $count = 0;
my @lines = split /\n/, $content;
while(<$content>) {
    if (/<meta name=\"(.+?)\" content=\"(.+?)\">/gi) {
        $count++;
        $meta_results[$count] = "$1::$2";
    }
}
foreach (@meta_results) {print "$_\n";}
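
One likely reading of the failure, offered as a sketch rather than a definitive diagnosis: `<$content>` treats `$content` as a filehandle rather than as a string, so the loop body never runs (under `use strict` the program dies instead), and `@lines` is never used. Matching against the whole string in a `while` loop with `/g` sidesteps both the filehandle problem and the dependence on line breaks. The helper name `extract_meta` and the sample HTML are assumptions made for illustration, and the pattern still only handles the name-then-content form from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "name::content" strings for every meta tag
# written in the name-then-content form used in the question.
sub extract_meta {
    my ($content) = @_;
    my @meta_results;
    # /g in a while loop steps through every match in the string; \s+ also
    # matches newlines, so a tag may be split across lines; /i ignores case.
    while ( $content =~ /<meta\s+name="([^"]*)"\s+content="([^"]*)"\s*\/?>/gi ) {
        push @meta_results, "${1}::${2}";
    }
    return @meta_results;
}

# Assumed sample input: two tags bunched on one line, one split across lines.
my $content = <<'HTML';
<head><meta name="author" content="A. Monk"><meta name="keywords" content="perl, regex">
<meta name="description"
      content="multi-line tag">
</head>
HTML

print "$_\n" for extract_meta($content);
```

Using `push` instead of `$meta_results[++$count]` also avoids leaving an undefined element at index 0, which would otherwise print as an empty line with a warning.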

Replies are listed 'Best First'.
Re: meta parsing problems
by Joost (Canon) on May 23, 2004 at 21:59 UTC
Re: meta parsing problems
by exussum0 (Vicar) on May 23, 2004 at 20:27 UTC
      So what's the solution for the string problem? The array didn't work, so I was forced to use a scalar, which obviously doesn't work either.

      I'm prepared for the reversed meta tags. I thought about that already, but that's to be worked on after the meta tags work the first time around with this.

      thanks
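
For the reversed-attribute case mentioned above, one possible approach (a sketch under stated assumptions, not a robust HTML parser) is to capture each tag's attribute text first and then collect its `key="value"` pairs into a hash, so the order of `name` and `content` no longer matters. The helper name `meta_pairs` is hypothetical, and the pattern assumes quoted attribute values and no `>` character inside a tag.

```perl
use strict;
use warnings;

# Hypothetical helper: maps meta name => content regardless of attribute order.
sub meta_pairs {
    my ($content) = @_;
    my %pairs;
    # Outer loop: capture the attribute text of each <meta ...> tag.
    # [^>]* matches newlines too, so multi-line tags are covered.
    while ( $content =~ /<meta\b([^>]*)>/gi ) {
        my $attr_text = $1;
        my %attr;
        # Inner loop: collect every key="value" or key='value' pair.
        while ( $attr_text =~ /([\w-]+)\s*=\s*(["'])(.*?)\2/gs ) {
            $attr{ lc $1 } = $3;
        }
        # Keep only tags that carry both attributes.
        $pairs{ $attr{name} } = $attr{content}
            if defined $attr{name} && defined $attr{content};
    }
    return %pairs;
}

# Assumed sample input with reversed attributes and a multi-line tag.
my %meta = meta_pairs(
    qq{<meta content="perl, regex" name="keywords">\n}
  . qq{<meta name='author'\n      content='A. Monk'>}
);
print "$_ => $meta{$_}\n" for sort keys %meta;
```

Tags without a `name` attribute, such as `http-equiv` tags, are skipped by the final `if`.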

Re: meta parsing problems
by Ctrl-z (Friar) on May 23, 2004 at 22:54 UTC
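    This isn't foolproof, but it'll probably do what you want...
    # Capture the attribute text of each <meta ...> tag; /s lets . span newlines.
    foreach my $attrs ( $content =~ m#<meta\s+(.*?)>#sgi ) {
        # No /g here: in scalar context it would make the second match
        # resume where the first one stopped.
        my ($name) = $attrs =~ m#name\s*=\s*["'](.*?)["']#si;
        my ($cont) = $attrs =~ m#content\s*=\s*["'](.*?)["']#si;
        print "${name}::${cont}\n" if defined $name && defined $cont;
    }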



    time was, I could move my arms like a bird and...
      Even though the thread's a bit old... There are problems with this approach. You should really consider using HTML::TreeBuilder. It's as easy as:
      use HTML::TreeBuilder;
      my %kWords;
      my $tree = HTML::TreeBuilder->new_from_content($data);
      for my $tag ( $tree->look_down( _tag => "meta" ) ) {
          my $name = $tag->attr("name");
          next unless defined $name;    # skip http-equiv and similar tags
          $kWords{$name} = $tag->attr("content");
      }
      $tree->delete;    # free the tree when done
      The above code takes care of spaces, line breaks and so on. And it's fast and widely used. Just my 5 cents. FJ
Re: meta parsing problems
by mrpeabody (Friar) on May 24, 2004 at 23:37 UTC
    This needs to match ANY meta tag in the format shown in the script. Yes, I know I shouldn't do it this way, but the question isn't about using a different module; it's about how to fix this problem. The trouble is that meta tags aren't always on separate lines. They can all be bunched together like a paragraph, and that's where this must be messing up.
    You don't want to hear about using a module, but then you complain about having to solve precisely the nontrivial problem that the module is designed to solve. "I'm having trouble mowing my lawn with these scissors. I don't want to use my lawnmower, so don't suggest that. How can I make these scissors work better?"

    You can either use the module and be done with a robust solution in ten minutes, or spend hours re-implementing the module and have a weak, brittle solution. Your choice.

Node Type: perlquestion [id://355760]
Approved by Corion
Front-paged by grinder