PerlMonks  

How to browse using WWW::Mechanize::Firefox

by ckj (Chaplain)
on May 29, 2012 at 07:33 UTC ( #972942=perlquestion )
ckj has asked for the wisdom of the Perl Monks concerning the following question:

My current link is http://www.lcps.org/Page/2309. On this page I have to get all the event information together with the event dates. Currently I'm only able to fetch the events, not their respective dates, since all of this is generated using JavaScript. This project is for turning this site into a mobile app.

Re: How to browse using WWW::Mechanize::Firefox
by Corion (Pope) on May 29, 2012 at 07:58 UTC

    I recommend using a tool such as Firebug, which allows you to select the elements of interest by mouse and shows you an XPath expression for the element(s). Firebug also allows you to inspect the classes of the elements.

    Inspecting one element, I see class="fc-event-title". That might already be enough to find the elements that contain the event description:

    X:\repos\WWW-Mechanize-Firefox>perl -Ilib -w examples\scrape-ff.pl http://www.lcps.org/Page/2309 .fc-event-title
    Spanish I A Days
    Mi Familia- Oral presentation TODAY!!! Spanish I B days
    Mi Familia Oral Presentation TODAY!!! Spanish I A days
    Spanish II
    Spanish II Chapter 3A Quiz
    HOLIDAY (Memorial Day)
      I'm getting this output through my perl script also,
      #!perl -w
      use strict;
      use WWW::Mechanize::Firefox;

      my $mech = WWW::Mechanize::Firefox->new();
      $mech->get('http://www.lcps.org/Page/2309');

      # Pull the rendered page source and print the text of every event title span
      my $cal_content = $mech->content;
      while ($cal_content =~ m/"fc-event-title\s*ellipsis"(\s|\w)*>(.*?)<\/span>/g) {
          print $2 . "\n";
      }
      But the issue is how to get the date for each event as well. For example, the output should look like this:

      01/05/2012 Spanish I A Days
      03/05/2012 Mi Familia- Oral presentation TODAY!!! Spanish I B days
      04/05/2012 Mi Familia Oral Presentation TODAY!!! Spanish I A days
      01/05/2012 Spanish II
      08/05/2012 Spanish II Chapter 3A Quiz
      28/05/2012 HOLIDAY (Memorial Day)

      Please make the changes in the Perl script itself.

        You will have to do some programming then. You will need to correlate the positions of the events with the date information. Personally, I would do that by using the page co-ordinates, but likely you can also get by by determining the column in which an element is positioned.

        I won't write a program for you because that requires deeper analysis and investment of more time than I'm willing to spend on what is mostly trial and error.
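For illustration only, the correlation step by itself can be sketched without a browser. This is not a complete scraper: it assumes you have already read a bounding box for each day cell and a top-left corner for each event from the rendered page. The hash layouts, the coordinates, and the helper names `date_for_position` and `events_by_date` are all hypothetical.

```perl
#!perl -w
use strict;

# Hypothetical input: one hash per day cell, with its date and its bounding
# box in page coordinates. Reading these from the live page is not shown.
my @cells = (
    { date => '01/05/2012', left =>   0, top => 0, width => 100, height => 80 },
    { date => '02/05/2012', left => 100, top => 0, width => 100, height => 80 },
    { date => '03/05/2012', left => 200, top => 0, width => 100, height => 80 },
);

# Hypothetical input: one hash per event, with its title and top-left corner.
my @events = (
    { title => 'Spanish I A Days', left => 2, top => 20 },
    { title => 'Spanish II',       left => 3, top => 40 },
    { title => 'Mi Familia- Oral presentation TODAY!!!', left => 202, top => 20 },
);

# Return the date of the cell whose box contains the point, or undef if none does.
sub date_for_position {
    my ($cells, $x, $y) = @_;
    for my $cell (@$cells) {
        next if $x < $cell->{left} || $x >= $cell->{left} + $cell->{width};
        next if $y < $cell->{top}  || $y >= $cell->{top}  + $cell->{height};
        return $cell->{date};
    }
    return;
}

# Group event titles under the date of the cell they fall into.
sub events_by_date {
    my ($cells, $events) = @_;
    my %by_date;
    for my $event (@$events) {
        my $date = date_for_position($cells, $event->{left}, $event->{top});
        next unless defined $date;    # event lies outside every known cell
        push @{ $by_date{$date} }, $event->{title};
    }
    return \%by_date;
}

my $by_date = events_by_date(\@cells, \@events);
for my $date (sort keys %$by_date) {
    print "$date\n";
    print "  $_\n" for @{ $by_date->{$date} };
}
```

If only the column matters, the `top`/`height` check can be dropped and the column index mapped to a date separately. Events that span several days would need extra handling that this sketch does not attempt.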

        Update: Consider switching the view to "List" view, then you should be able to easily extract the date and the description from the same HTML element. Also, that page exports RSS and ICal views as well - instead of scraping, I recommend you use the data in these formats.
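To show why the exported formats are easier to work with, here is a minimal sketch that pulls dates and descriptions out of iCal text using only core Perl. The sample feed is made up (the real export may use different fields and time values), `parse_events` is a hypothetical helper, and folded lines and escaped characters are ignored. A CPAN iCal parser would be the more robust choice for real use.

```perl
#!perl -w
use strict;

# Hypothetical sample of the iCal export; the real feed's contents may differ.
my $ical = <<'ICS';
BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART:20120508T080000
SUMMARY:Spanish II Chapter 3A Quiz
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20120528
SUMMARY:HOLIDAY (Memorial Day)
END:VEVENT
END:VCALENDAR
ICS

# Return a list of { date => 'dd/mm/yyyy', summary => '...' } hashes,
# one per VEVENT that has both a DTSTART and a SUMMARY line.
sub parse_events {
    my ($text) = @_;
    my (@events, $current);
    for my $line (split /\r?\n/, $text) {
        if ($line eq 'BEGIN:VEVENT') {
            $current = {};
        }
        elsif ($line eq 'END:VEVENT') {
            push @events, $current
                if $current && defined $current->{date} && defined $current->{summary};
            undef $current;
        }
        elsif ($current && $line =~ /^DTSTART[^:]*:(\d{4})(\d{2})(\d{2})/) {
            $current->{date} = "$3/$2/$1";    # reorder yyyymmdd as dd/mm/yyyy
        }
        elsif ($current && $line =~ /^SUMMARY[^:]*:(.*)$/) {
            $current->{summary} = $1;
        }
    }
    return @events;
}

printf "%s %s\n", $_->{date}, $_->{summary} for parse_events($ical);
```

Because each event carries its own start date, no position matching against the calendar grid is needed.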

Node Type: perlquestion [id://972942]
Approved by Old_Gray_Bear