PerlMonks  

Difficulty with Logic parsing ICAL feed

by ChocolateCake (Initiate)
on Aug 17, 2012 at 16:16 UTC (#988015=perlquestion)
ChocolateCake has asked for the wisdom of the Perl Monks concerning the following question:

I have been working on code that will parse event information from an iCal feed. It is a huge block of data that I want to divide by key term, and I need it done in an orderly way. I tried indexing the key terms and then having the program print what is between those indexes. However, for some reason it became an infinite loop that printed all the data. I don't know how to fix it. DO NOT RUN MY CODE, IT KEEPS FREEZING MY COMPUTER. I was hoping someone could show me what my problem is.

DO NOT RUN THIS PROGRAM

use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder;
use HTML::FormatText;

my $URL= get("https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&sponsor%5B%5D=&audience%5B%5D=&category%5B%5D=");
my $Format=HTML::FormatText->new;
my $TreeBuilder=HTML::TreeBuilder->new;
$TreeBuilder->parse($URL);
my $Parsed=$Format->format($TreeBuilder);
open(FILE, ">UOTSUMMER.txt");
print FILE "$Parsed";
close (FILE);
open (FILE, "UOTSUMMER.txt");
my @array=<FILE>;
my $string ="@array";
my $offset = 0; # Where are we in the string?
my $numResults = 0;
while (1) {
    my $idxSummary = index($string, "SUMMARY", $offset);
    my $result = "";
    my $idxDescription = index ($string, "DESCRIPTION", $offset);
    my $result2= "";
    if ($idxSummary > -1) {
        $offset = $idxSummary + length("SUMMARY");
        my $idxDescription = index($string, "DESCRIPTION", $offset);
        if ($idxDescription == -1) {
            print "(Data malformed: missing DESCRIPTION line.)\n";
            last;
        }
        if ($idxDescription > -1) {
            $offset = $idxDescription+ length("DESCRIPTION");
            my $idxLocation= index($string, "LOCATION", $offset);
            if ($idxLocation == -1) {
                print "(Data malformed: missing LOCATION line.)\n";
                last;
            }
            my $length = $idxDescription - $offset;
            my $length2= $idxLocation - $offset;
            $result = substr($string, $offset, $length);
            $result2= substr ($string, $offset, $length2);
            $offset = $idxDescription + length("DESCRIPTION");
            $result =~ s/^\s+|\s+$//g ; # Strip leading and trailing whitespace,
            $result2 =~ s/^\s+|\s+$//g ; # including newlines.
            $numResults++;
        }
        else {
            print "(All done. $numResults result(s) found.)\n";
            last;
        }
        open (FILE2, "UOT123.txt")
        print FILE2 "TITLE: <$result>\n DESCRIPTION: <$result2>\n";

Any guidance you may have will be greatly appreciated! Thanks!

Re: Difficulty with Logic parsing ICAL feed
by linuxkid (Sexton) on Aug 17, 2012 at 16:26 UTC

    Take a look at the URL you're trying to get; it looks malformed.

    --linuxkid


    imrunningoutofideas.co.cc

      It fetches an iCal record. Pretty easy to copy it and try it. What looks malformed about it?

        https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&sponsor%5B%5D=&audience%5B%5D=&category%5B%5D= seems like there are some extra '='s

        --linuxkid


        imrunningoutofideas.co.cc
Re: Difficulty with Logic parsing ICAL feed
by tobyink (Abbot) on Aug 17, 2012 at 16:28 UTC

    Why on earth are you parsing iCalendar by hand? Use Text::vFile::asData.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Difficulty with Logic parsing ICAL feed
by roboticus (Canon) on Aug 17, 2012 at 16:30 UTC

    ChocolateCake:

    It looks like it doesn't find "SUMMARY", so it keeps resetting $offset to the same location--character 10: -1 + length("DESCRIPTION"). I'd suggest terminating the loop if it can't find SUMMARY or DESCRIPTION.
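
    A minimal sketch of that suggestion follows. It is an illustration only: the extract_events helper, the marker strings with trailing colons, and the sample data are assumptions, not code from the original post. Every index() result is checked, the loop ends as soon as a marker is missing, and $offset only ever moves forward.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collect the text between
# each SUMMARY: and the next DESCRIPTION:, and between that DESCRIPTION: and
# the next LOCATION:, stopping as soon as any marker is missing.
sub extract_events {
    my ($string) = @_;
    my $offset = 0;
    my @events;

    while (1) {
        my $idx_summary = index($string, "SUMMARY:", $offset);
        last if $idx_summary == -1;        # no more records: leave the loop

        my $title_start     = $idx_summary + length("SUMMARY:");
        my $idx_description = index($string, "DESCRIPTION:", $title_start);
        last if $idx_description == -1;    # malformed record: stop

        my $desc_start   = $idx_description + length("DESCRIPTION:");
        my $idx_location = index($string, "LOCATION:", $desc_start);
        last if $idx_location == -1;       # malformed record: stop

        my $title       = substr($string, $title_start, $idx_description - $title_start);
        my $description = substr($string, $desc_start, $idx_location - $desc_start);
        s/^\s+|\s+$//g for $title, $description;   # trim both ends

        push @events, { title => $title, description => $description };
        $offset = $idx_location + length("LOCATION:");   # always advances
    }
    return @events;
}

# Made-up sample data, shaped loosely like iCalendar content lines.
my $sample = "SUMMARY:Open house\nDESCRIPTION:Campus tour\nLOCATION:Hall A\n"
           . "SUMMARY:Lecture\nDESCRIPTION:Guest speaker\nLOCATION:Room 2\n";

my @events = extract_events($sample);
print "TITLE: <$_->{title}>\n DESCRIPTION: <$_->{description}>\n" for @events;
```

    Because each failed lookup exits with last, a missing marker can never feed -1 back into the offset calculation.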

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      THANK YOU!! YOU FIXED MY PROBLEM AND MY CODE WORKS!! BEST FEELING IN THE WORLD!!

        ChocolateCake:

        I should've mentioned: If you're having trouble with a loop like this, you might find it helpful to print the "interesting" variables (e.g. $offset) at the top of the loop. For example:

        while (1) {
            my $idxSummary = index($string, "SUMMARY", $offset);
            my $result = "";
            my $idxDescription = index ($string, "DESCRIPTION", $offset);
            my $result2= "";
            print "idxSum:$idxSummary, idxDesc:$idxDescription, offs:$offset\n";

        Then I'd imagine you'd see something like:

        idxSum:47, idxDesc:62, offs:0
        idxSum:122, idxDesc:143, offs:73
        idxSum:-1, idxDesc:-1, offs:10
        idxSum:-1, idxDesc:-1, offs:10
        .....

        Then you'd probably figure it processed the first two records, and had a problem with the third. I didn't download your code, nor the ICAL url or anything, so the numbers are entirely fictitious.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Difficulty with Logic parsing ICAL feed
by davido (Archbishop) on Aug 17, 2012 at 16:42 UTC

    Crossposted on Stack Overflow: http://stackoverflow.com/q/12009527/716443

    There's nothing wrong with minimal crossposting. But it's polite and useful to link to the other copies so that people don't put effort into a question that already has a solution elsewhere, and so that the collaborative effort can be based on responses from all incarnations of the question.


    Dave

      You are absolutely right, I am new to this and don't know the proper etiquette. Thank you for sharing that with me! How do I link the crosspost?
Re: Difficulty with Logic parsing ICAL feed
by davido (Archbishop) on Aug 17, 2012 at 16:55 UTC

    If you don't have https support installed this line will be a problem (maybe you do, as it's not the problem you're posting about):

    my $URL= get("https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&sponsor%5B%5D=&audience%5B%5D=&category%5B%5D=");

    You should be checking the return value from LWP::Simple::get(). If it's undef, you didn't get a successful response. In this case you can solve this part of the problem simply by switching to http:// instead of https://, or by installing LWP::Protocol::https. But still get in the habit of checking for undef after using LWP::Simple::get.
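
    A rough sketch of that habit is below. The fetch_checked wrapper and the example.com URL are hypothetical; $fetch stands for any code reference that behaves like LWP::Simple::get(), returning the content on success and undef on failure. Stand-in fetchers are used so the sketch runs without LWP or network access.

```perl
use strict;
use warnings;

# fetch_checked() is a hypothetical wrapper, not part of LWP.
# $fetch follows the LWP::Simple::get() contract: content or undef.
sub fetch_checked {
    my ($fetch, $url) = @_;
    my $content = $fetch->($url);
    die "Could not fetch $url\n" unless defined $content;
    return $content;
}

# With LWP::Simple loaded, the call would look like:
#   my $ical = fetch_checked(\&LWP::Simple::get, $url);

# Stand-in fetchers (assumptions for illustration only).
my $url = "http://example.com/feed";
my $ok  = fetch_checked(sub { "BEGIN:VCALENDAR" }, $url);

# A fetcher that returns undef makes fetch_checked() die; eval traps it.
my $failed = eval { fetch_checked(sub { undef }, $url); 1 } ? 0 : 1;

print "fetched: $ok\n";
print "failure detected\n" if $failed;
```

    In a short script the same check can be written inline: assign the result of get() to a variable and die unless it is defined.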

    You should be able to enable https support in LWP::Simple by following the advice in the README for LWP:

    If you want to access sites using the https protocol, then you need to install the LWP::Protocol::https module from CPAN.

    Once this is done, LWP::Simple is able to fetch the https request. (And maybe you've done this already).


    Dave
