Hello folks,
I must admit: I don't understand the DOM, and scraping websites is a pain if you don't know it.
I'm using Mojo::DOM to parse a document, but I also want to select and grab some sub-elements by specifying their class attribute.
The following code is the best I was able to produce, but I want to know, for example, whether a phone number comes from class="fa fa fa-phone" or class="fa fa fa-mobile-phone" (in fact Francesco Petrarca has no mobile number), and I also want to grab the URL of the avatar image.
I hope my code and data are not too long to read.
use strict;
use warnings;
use Mojo::DOM;
my $data = join '',<DATA>;
my $dom = Mojo::DOM->new( $data );
# Walk every <li> of the members list and print all non-empty text nodes
foreach my $memb ( $dom->find('[id="members-list"] li')->each ){
    print "\n########\n";
    my $writers_list = $memb
        ->find('*')
        ->map( 'text' )
        ->grep( qr/\S/ )
        ->join("\n")
    ;
    print $writers_list;
}
__DATA__
<ul id="members-list" class="item-list" role="main">
  <li>
    <div class="item-avatar">
      <a href="https://intra.example.com/coworkers/dantealighieri/"><img src="SRCURL"/></a>
      <span class="member-role">Sottoscrittore</span>
    </div> <!-- .item-avatar -->
    <div class="item">
      <div class="item-title">
        <a href="https://intra.example.com/coworkers/dantealighieri/" class="heading"><h3>Dante Alighieri</h3></a>
      </div>
      <div class="item-meta"><span class="activity">active 6 days ago, 19 hours ago</span></div>
      <div class="woffice-xprofile-list">
        <span><i class="fa fa fa-phone"></i>011111111</span>
        <span><i class="fa fa fa-mobile-phone"></i>333333333</span>
        <span><i class="fa fa fa-envelope-o"></i>dante.alighieri@example.com</span>
        <span><i class="fa fa fa-check"></i>Poets and Writers</span>
      </div>
    </div>
    <div class="action"></div>
    <div class="clear"></div>
  </li>
  <li>
    <div class="item-avatar">
      <a href="https://intra.example.com/coworkers/francescopetrarca/"><img src="SRCURL"/></a>
      <span class="member-role">Sottoscrittore</span>
    </div> <!-- .item-avatar -->
    <div class="item">
      <div class="item-title">
        <a href="https://intra.example.com/coworkers/francescoptetrarca/" class="heading"><h3>Francesco Petrarca</h3></a>
      </div>
      <div class="item-meta"><span class="activity">active 7 days ago, 22 hours ago</span></div>
      <div class="woffice-xprofile-list">
        <span><i class="fa fa fa-phone"></i>02222222</span>
        <span><i class="fa fa fa-mobile-phone"></i></span>
        <span><i class="fa fa fa-envelope-o"></i>francesco.petrarca@example.com</span>
        <span><i class="fa fa fa-check"></i>Poets and Writers</span>
      </div>
    </div>
    <div class="action"></div>
    <div class="clear"></div>
  </li>
</ul>
Ideally, I'd like to populate a hash like this:
my %members = (
    'Dante Alighieri' => {
        'avatar_url'            => 'URL',
        'fa fa fa-phone'        => '011111111',
        'fa fa fa-mobile-phone' => '333333333',
        'fa fa fa-envelope-o'   => 'dante.alighieri@example.com',
        'fa fa fa-check'        => 'Poets and Writers',
    },
    ...
);
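To make the goal concrete, here is a rough sketch of one way such a hash might be built. It is only an illustration under a few assumptions: the $html string is a hypothetical, shortened copy of the data above with a single member, the img src attribute is assumed to be properly quoted, and the icon's class attribute is used verbatim as the hash key. It relies only on the find, at, attr and text methods of Mojo::DOM.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical, shortened copy of the markup above (one member only).
my $html = <<'HTML';
<ul id="members-list">
<li>
<div class="item-avatar"><a href="https://intra.example.com/coworkers/dantealighieri/"><img src="SRCURL"/></a></div>
<div class="item">
<div class="item-title"><a class="heading"><h3>Dante Alighieri</h3></a></div>
<div class="woffice-xprofile-list">
<span><i class="fa fa fa-phone"></i>011111111</span>
<span><i class="fa fa fa-mobile-phone"></i></span>
</div>
</div>
</li>
</ul>
HTML

my %members;
for my $li ( Mojo::DOM->new($html)->find('#members-list li')->each ) {
    my $name = $li->at('h3')->text;              # member name from the heading
    my $img  = $li->at('div.item-avatar img');   # avatar <img>, if present
    my %info = ( avatar_url => $img ? $img->attr('src') : undef );

    # Each <span> holds an <i> whose class names the field,
    # followed by the value as the span's own text.
    for my $span ( $li->find('div.woffice-xprofile-list span')->each ) {
        my $icon = $span->at('i') or next;       # skip spans without an icon
        my $value = $span->text;
        $info{ $icon->attr('class') } = $value if $value =~ /\S/;
    }
    $members{$name} = \%info;
}
```

With this approach an empty span simply produces no key, so a test such as exists $members{$name}{'fa fa fa-mobile-phone'} would tell whether a member has a mobile number.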
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.