PerlMonks  

Mojo Dom extract

by ribo75017 (Initiate)
on May 27, 2015 at 08:55 UTC ( [id://1127948]=perlquestion )

ribo75017 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I want to extract the following with Mojo::DOM, but I have not been able to:

- the text of the elements with the classes date, title, cat and lieu
- the anchors

Is it possible to get the data like this: for each anchor inside the "ret" div, get the anchor, the date elements, the title, cat and lieu?

Thanks a lot

use Mojo::DOM;

my $dom = Mojo::DOM->new(<<'HTML');
<div class="ret">
  <a href="blabla.com/1234" title="Text 1">
    <div class="rtm">
      <div class="date">
        <div>22</div>
        <div>mai</div>
        <div>19:52</div>
      </div>
      <div class="image">
        <div class="imageclass-and-nb"><img src="blabla.com/img1234" alt="Text 1"></div>
      </div>
      <div class="all">
        <h2 class="title">Text 1 Title</h2>
        <div class="cat">Blue</div>
        <div class="lieu">Dourdan</div>
      </div>
    </div>
  </a>
  <a href="blabla.com/1212" title="Text 2">
    <div class="rtm">
      <div class="date">
        <div>22</div>
        <div>mai</div>
        <div>11:55</div>
      </div>
      <div class="image">
        <div class="imageclass"><img src="blabla.com/img1212" alt="Text 2"></div>
      </div>
      <div class="detail">
        <h2 class="title">Text 2 title</h2>
        <div class="cat">Blue</div>
        <div class="lieu">Champigny-sur-Marne</div>
      </div>
    </div>
  </a>
</div>
HTML

print $dom->find('div.date')->map(sub{$_->children->each})->map(sub{$_->text})->each;

Replies are listed 'Best First'.
Re: Mojo Dom extract
by Anonymous Monk on May 27, 2015 at 09:05 UTC

    I wanna extract.... Is it possible...thanks

    Yes, it's possible. You already have a start, so why don't you simply do it?

      Because I don't know how to do it efficiently, if that is possible, i.e. without 10 lines like:

      $dom->find('div.date')->map(sub{$_->children->each})->map(sub{$_->text})->each

      $dom->find('div.lieu')->map(sub{$_->children->each})->map(sub{$_->text})->each .. etc

      I would like something like:

      Dom->ret->a => anchor

      Dom->ret->rtm->date => date fields .. etc

        Because I don't know how to do it efficiently, if that is possible, i.e. without 10 lines like:

        First do it any way possible, then reduce it later :)

        Anyway,

        for my $ret ( $dom->find('div.ret')->each ){
            for my $aaaa ( $ret->find('a')->each ){
                my $date = $aaaa->find('div.date')->first->all_text;
                my $lieu = $aaaa->find('div.lieu')->first->all_text;
                my $cat  = $aaaa->find('div.cat')->first->all_text;
                my $titl = $aaaa->find('h2.title')->first->all_text;
                print join( "\t#\t", $date, $lieu, $cat, $titl), "\n";
            }
        }
        __END__
        22 mai 19:52 # Dourdan # Blue # Text 1 Title
        22 mai 11:55 # Champigny-sur-Marne # Blue # Text 2 title
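The loop above can probably be shortened further by driving it from a table of field names and selectors, which is one way to avoid a separate line per class. The following is only a sketch under some assumptions: the `%selector` table, the `@records` array and the trimmed-down HTML sample are made up for illustration, and the date parts are joined by hand rather than with `all_text`, because whitespace handling in `all_text` differs between Mojolicious versions.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Shortened copy of the HTML from the question (one anchor only).
my $dom = Mojo::DOM->new(<<'HTML');
<div class="ret">
  <a href="blabla.com/1234" title="Text 1">
    <div class="rtm">
      <div class="date"><div>22</div><div>mai</div><div>19:52</div></div>
      <div class="all">
        <h2 class="title">Text 1 Title</h2>
        <div class="cat">Blue</div>
        <div class="lieu">Dourdan</div>
      </div>
    </div>
  </a>
</div>
HTML

# Hypothetical lookup table: field name => CSS selector relative to the anchor.
my %selector = (
    title => 'h2.title',
    cat   => 'div.cat',
    lieu  => 'div.lieu',
);

my @records;
for my $anchor ( $dom->find('div.ret > a')->each ) {
    my %rec = ( href => $anchor->attr('href') );

    # Join the text of the inner date divs with single spaces.
    $rec{date} = join ' ', map { $_->text } $anchor->find('div.date > div')->each;

    # at() returns the first match or undef, so guard against missing elements.
    for my $field ( keys %selector ) {
        my $el = $anchor->at( $selector{$field} );
        $rec{$field} = $el ? $el->text : '';
    }
    push @records, \%rec;
}

# One tab-and-hash separated line per anchor, as in the reply above.
print join( "\t#\t", @{$_}{qw(href date title cat lieu)} ), "\n" for @records;
```

Each anchor ends up as one hash reference in `@records`, so adding another field should only need one more entry in `%selector`.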

Approved by Athanasius