http://www.perlmonks.org?node_id=972944


in reply to How to browse using WWW::Mechanize::Firefox

I recommend using a tool such as Firebug, which lets you select the elements of interest with the mouse and shows you an XPath expression for the element(s). Firebug also lets you inspect the classes of the elements.

Inspecting one element, I see class="fc-event-title". That might already be enough to find the elements that contain the event descriptions:

X:\repos\WWW-Mechanize-Firefox>perl -Ilib -w examples\scrape-ff.pl http://www.lcps.org/Page/2309 .fc-event-title
Spanish I A Days Mi Familia- Oral presentation TODAY!!! Spanish I B days Mi Familia Oral Presentation TODAY!!! Spanish I A days Spanish II Spanish II Chapter 3A Quiz HOLIDAY (Memorial Day)

Re^2: How to browse using WWW::Mechanize::Firefox
by ckj (Chaplain) on May 29, 2012 at 08:13 UTC
    I'm getting this output from my Perl script as well:
    #!perl -w
    use strict;
    use WWW::Mechanize::Firefox;

    my $mech = WWW::Mechanize::Firefox->new();
    $mech->get('http://www.lcps.org/Page/2309');
    my $cal_content = $mech->content;
    while ($cal_content =~ m/"fc-event-title\s*ellipsis"(\s|\w)*>(.*?)<\/span>/g) {
        print $2."\n";
    }
    But the issue is how to also get the date of each event. For example, the output should look like this:

    01/05/2012 Spanish I A Days
    03/05/2012 Mi Familia- Oral presentation TODAY!!! Spanish I B days
    04/05/2012 Mi Familia Oral Presentation TODAY!!! Spanish I A days
    01/05/2012 Spanish II
    08/05/2012 Spanish II Chapter 3A Quiz
    28/05/2012 HOLIDAY (Memorial Day)

    Please make the changes in the Perl script itself.

      You will have to do some programming then. You need to correlate the positions of the events with the date information. Personally, I would do that using the page co-ordinates, but you can likely also get by with determining the column in which each element is positioned.
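A minimal sketch of the co-ordinate approach, assuming you have already obtained a pixel box (left, top, width, height) for every day cell and a position for every event title. How to read those values from the rendered page is not shown here, and the cells, dates and numbers below are invented for illustration. The helper `date_at` is hypothetical: it assigns each event the date of the cell containing the event's top-left corner, and the result is printed in the `dd/mm/yyyy title` format ckj asked for.

```perl
#!perl -w
use strict;

# Hypothetical input: the box of every day cell and the page
# co-ordinates of every event title, both in pixels. In a real script
# these would have to be read from the rendered page; the dates and
# numbers below are invented for illustration.
my @days = (
    { date => '01/05/2012', left => 100, top => 50,  width => 100, height => 80 },
    { date => '02/05/2012', left => 200, top => 50,  width => 100, height => 80 },
    { date => '08/05/2012', left => 200, top => 130, width => 100, height => 80 },
);
my @events = (
    { title => 'Spanish I A Days',           left => 105, top => 70  },
    { title => 'Spanish II Chapter 3A Quiz', left => 205, top => 150 },
);

# Hypothetical helper: return the date of the day cell whose box
# contains the point ($x, $y), or undef if no cell contains it.
sub date_at {
    my ($x, $y, @cells) = @_;
    for my $cell (@cells) {
        my $in_column = $x >= $cell->{left} && $x < $cell->{left} + $cell->{width};
        my $in_row    = $y >= $cell->{top}  && $y < $cell->{top}  + $cell->{height};
        return $cell->{date} if $in_column && $in_row;
    }
    return;
}

# Pair every event with its date in the "dd/mm/yyyy title" format.
my @lines;
for my $event (@events) {
    my $date = date_at($event->{left}, $event->{top}, @days);
    push @lines, ($date // '??/??/????') . " $event->{title}";
}
print "$_\n" for @lines;
```

Checking only the column (`$in_column`) would be enough within a single week row; the row test is what tells the same weekday apart in different weeks of the month view.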

      I won't write a program for you because that requires deeper analysis and investment of more time than I'm willing to spend on what is mostly trial and error.

      Update: Consider switching the calendar to the "List" view; there you should be able to extract the date and the description from the same HTML element. The page also offers RSS and iCal exports. Instead of scraping, I recommend you use the data in one of these formats.
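A rough sketch of reading event dates and titles from an iCal export instead of scraping. The thread does not give the export URL, so the sample below is made up, and the hypothetical `parse_events` helper covers only what is needed here (line unfolding, property parameters such as `;VALUE=DATE`, and the `\,` `\;` `\\` text escapes), not all of RFC 5545. A maintained iCalendar module from CPAN, if available to you, would be the more robust choice.

```perl
#!perl -w
use strict;

# Made-up sample in iCalendar (RFC 5545) format. In practice this text
# would be downloaded from the page's iCal export, whose URL is not
# given in this thread.
my $ics = <<'ICS';
BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART;VALUE=DATE:20120528
SUMMARY:HOLIDAY (Memorial Day)
END:VEVENT
BEGIN:VEVENT
DTSTART:20120508T080000
SUMMARY:Spanish II Chapter 3A
  Quiz
END:VEVENT
END:VCALENDAR
ICS

# Hypothetical helper: return one hash per VEVENT, keyed by property
# name (DTSTART, SUMMARY, ...). Handles only line unfolding, property
# parameters and simple text escapes.
sub parse_events {
    my ($text) = @_;
    $text =~ s/\r?\n[ \t]//g;    # unfold: a leading space or tab continues the previous line
    my (@events, $current);
    for my $line (split /\r?\n/, $text) {
        if ($line eq 'BEGIN:VEVENT') {
            $current = {};
        }
        elsif ($line eq 'END:VEVENT') {
            push @events, $current if $current;
            undef $current;
        }
        elsif ($current && $line =~ /^([A-Z-]+)(?:;[^:]*)?:(.*)$/) {
            my ($name, $value) = ($1, $2);
            $value =~ s/\\([,;\\])/$1/g;    # undo \, \; and \\ escapes
            $current->{$name} = $value;
        }
    }
    return @events;
}

# Print "dd/mm/yyyy summary" for each event. This takes the date as
# written and ignores any time zone on DTSTART.
my @lines;
for my $event (parse_events($ics)) {
    my ($y, $m, $d) = ($event->{DTSTART} // '') =~ /^(\d{4})(\d\d)(\d\d)/
        or next;
    push @lines, "$d/$m/$y " . ($event->{SUMMARY} // '');
}
print "$_\n" for @lines;
```

Because each VEVENT carries its own DTSTART, the date and the description arrive together and no positional correlation is needed.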