Mojo::DOM help

by Anonymous Monk
on May 20, 2020 at 04:44 UTC ( #11116959=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise monks, I'm trying to parse several HTML pages, but their formats differ: the info that I want comes after three different class tags. I can find one tag and get the data I want just fine; the problem is with the multiple tags. I can return values related to any one of them, but not all three, which is what I want. I tried to use "each" and "match" in Mojo with no luck. I also tried looking for a null string value so I wouldn't overwrite the mapping if I had already found what I needed after matching one tag and extracting the values. Anyway, here's the code that I thought could work:
$r2 = $dom2->find( '[class="LC20lb DKV0Md"]' or '[class="BNeawe vvjwJb AP7Wnd"]' or '[class="CVA68e qXLe6d"]' )->map( sub { $_->text } );
Any ideas on how to pull out the data I need behind the 3 tags and map it so I can write it out would be greatly appreciated. Thanks, Newbmyer

Replies are listed 'Best First'.
Re: Mojo::DOM help
by haukex (Chancellor) on May 20, 2020 at 07:01 UTC

    You're using Perl's or, which is taking the logical combination of those three strings before calling the find method, and since the first string is a true value, the only thing you're passing to the find method is the first string. See Mojo::DOM::CSS's selectors: "or" is a comma, and "and" for attributes is [...][...]. But note that class gets special treatment, and you can simply use the .class selector to match classes, stringing them together for an "and". Also note the order of classes in the class attribute can change, which the following shows, but if you really want exact string matches you can use the [class="value"] selectors.

    use warnings;
    use 5.012;
    use Mojo::DOM;

    my $dom = Mojo::DOM->new(<<'HTML');
    <div>
    <div class="LC20lb DKV0Md"> matches </div>
    <div class="DKV0Md"> no match </div>
    <div class="DKV0Md LC20lb"> matches </div>
    <div class="BNeawe vvjwJb AP7Wnd"> matches </div>
    <div class="BNeawe AP7Wnd vvjwJb"> matches </div>
    <div class="AP7Wnd vvjwJb BNeawe"> matches </div>
    <div class="BNeawe AP7Wnd"> no match </div>
    <div class="CVA68e qXLe6d"> matches </div>
    <div class="qXLe6d CVA68e"> matches </div>
    <div class="qXLe6d"> no match </div>
    </div>
    HTML

    $dom->find('.LC20lb.DKV0Md, .BNeawe.vvjwJb.AP7Wnd, .CVA68e.qXLe6d')
        ->each(sub { say });
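To make the "or" pitfall concrete, and to get back to the map-to-text form the question asked for, here is a minimal sketch. The names @selectors, $collapsed and $selector and the tiny HTML sample are illustrative, not from the thread. The Mojo::DOM part only runs if the module is installed, and it trims whitespace by hand on the assumption that text may return surrounding whitespace.

```perl
use strict;
use warnings;
use 5.014;    # needed for the /r flag on s///

# Illustrative list of the three class selectors discussed above.
my @selectors = ( '.LC20lb.DKV0Md', '.BNeawe.vvjwJb.AP7Wnd', '.CVA68e.qXLe6d' );

# Perl resolves `or` before any method call sees the value: the first
# true operand wins, so only the first selector string survives.
my $collapsed = ( $selectors[0] or $selectors[1] or $selectors[2] );
say "or gives:   $collapsed";

# In a CSS selector, "or" is spelled as a comma, so join the parts.
my $selector = join ', ', @selectors;
say "comma list: $selector";

# Optional part: skipped when Mojo::DOM is not available.
if ( eval { require Mojo::DOM; 1 } ) {
    my $dom = Mojo::DOM->new(
            '<div class="DKV0Md LC20lb"> first </div>'
          . '<p class="CVA68e qXLe6d"> second </p>'
          . '<div class="DKV0Md"> skipped </div>' );

    # map('text') calls ->text on every match; to_array gives a plain
    # array reference that can be written out however you like.
    my @texts = map { s/^\s+|\s+$//gr }
      @{ $dom->find($selector)->map('text')->to_array };
    say "matched text: $_" for @texts;
}
```

The first line printed shows only the first selector, which is why the original find call could only ever match one of the three classes.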
Re: Mojo::DOM help
by marto (Archbishop) on May 20, 2020 at 07:05 UTC

    N.B. Google changes the SERP markup every once in a while, and blacklists the IPs of systems they believe are scraping them.
