http://www.perlmonks.org?node_id=988015

ChocolateCake has asked for the wisdom of the Perl Monks concerning the following question:

I have been working on code that parses event information from an iCal feed. It is a huge block of data that I want to divide by key term, in an orderly way. I tried finding the index of each key term and then having the program print what is between those indexes. However, for some reason it became an infinite loop that printed all the data, and I don't know how to fix it. DO NOT RUN MY CODE; IT KEEPS FREEZING MY COMPUTER. I was hoping someone could show me what my problem is.

DO NOT RUN THIS PROGRAM

use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder;
use HTML::FormatText;

my $URL= get("https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&sponsor%5B%5D=&audience%5B%5D=&category%5B%5D=");
my $Format=HTML::FormatText->new;
my $TreeBuilder=HTML::TreeBuilder->new;
$TreeBuilder->parse($URL);
my $Parsed=$Format->format($TreeBuilder);
open(FILE, ">UOTSUMMER.txt");
print FILE "$Parsed";
close (FILE);
open (FILE, "UOTSUMMER.txt");
my @array=<FILE>;
my $string ="@array";
my $offset = 0;    # Where are we in the string?
my $numResults = 0;
while (1) {
    my $idxSummary = index($string, "SUMMARY", $offset);
    my $result = "";
    my $idxDescription = index ($string, "DESCRIPTION", $offset);
    my $result2= "";
    if ($idxSummary > -1) {
        $offset = $idxSummary + length("SUMMARY");
        my $idxDescription = index($string, "DESCRIPTION", $offset);
        if ($idxDescription == -1) {
            print "(Data malformed: missing DESCRIPTION line.)\n";
            last;
        }
        if ($idxDescription > -1) {
            $offset = $idxDescription+ length("DESCRIPTION");
            my $idxLocation= index($string, "LOCATION", $offset);
            if ($idxLocation == -1) {
                print "(Data malformed: missing LOCATION line.)\n";
                last;
            }
            my $length = $idxDescription - $offset;
            my $length2= $idxLocation - $offset;
            $result = substr($string, $offset, $length);
            $result2= substr ($string, $offset, $length2);
            $offset = $idxDescription + length("DESCRIPTION");
            $result =~ s/^\s+|\s+$//g ;     # Strip leading and trailing whitespace,
            $result2 =~ s/^\s+|\s+$//g ;    # including newlines.
            $numResults++;
        } else {
            print "(All done. $numResults result(s) found.)\n";
            last;
        }
        open (FILE2, "UOT123.txt")
        print FILE2 "TITLE: <$result>\n DESCRIPTION: <$result2>\n";

Any guidance you may have will be greatly appreciated! Thanks!

Replies are listed 'Best First'.
Re: Difficulty with Logic parsing ICAL feed
by roboticus (Chancellor) on Aug 17, 2012 at 16:30 UTC

    ChocolateCake:

    It looks like it doesn't find "SUMMARY", so it keeps resetting $offset to the same location--character 10: -1 + length("DESCRIPTION"). I'd suggest terminating the loop if it can't find SUMMARY or DESCRIPTION.
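A minimal sketch of that suggestion, not the original poster's fixed code: the loop below leaves as soon as either marker is missing, and it only ever moves `$offset` forward. The helper name `extract_between` and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the text between each start marker and the
# end marker that follows it, stopping as soon as either one is missing.
sub extract_between {
    my ($string, $start_key, $end_key) = @_;
    my @results;
    my $offset = 0;
    while (1) {
        my $start = index($string, $start_key, $offset);
        last if $start == -1;                  # no more records: leave the loop
        my $from = $start + length $start_key;
        my $end  = index($string, $end_key, $from);
        last if $end == -1;                    # malformed tail: leave the loop
        my $chunk = substr($string, $from, $end - $from);
        $chunk =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        push @results, $chunk;
        $offset = $end + length $end_key;      # always moves forward
    }
    return @results;
}

# Invented sample data; the last SUMMARY has no DESCRIPTION after it.
my $sample = "SUMMARY: Talk A DESCRIPTION: about A "
           . "SUMMARY: Talk B DESCRIPTION: about B SUMMARY: orphan";
my @titles = extract_between($sample, "SUMMARY:", "DESCRIPTION:");
print "TITLE: <$_>\n" for @titles;
```

Because every pass either exits with `last` or advances `$offset` past a marker it just found, the loop cannot revisit the same position.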

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      THANK YOU!! YOU FIXED MY PROBLEM AND MY CODE WORKS!! BEST FEELING IN THE WORLD!!

        ChocolateCake:

        I should've mentioned: If you're having trouble with a loop like this, you might find it helpful to print the "interesting" variables (e.g. $offset) at the top of the loop. For example:

        while (1) {
            my $idxSummary = index($string, "SUMMARY", $offset);
            my $result = "";
            my $idxDescription = index ($string, "DESCRIPTION", $offset);
            my $result2= "";
            print "idxSum:$idxSummary, idxDesc:$idxDescription, offs:$offset\n";

        Then I'd imagine you'd see something like:

        idxSum:47, idxDesc:62, offs:0
        idxSum:122, idxDesc:143, offs:73
        idxSum:-1, idxDesc:-1, offs:10
        idxSum:-1, idxDesc:-1, offs:10
        .....

        Then you'd probably figure it processed the first two records, and had a problem with the third. I didn't download your code, nor the ICAL url or anything, so the numbers are entirely fictitious.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Difficulty with Logic parsing ICAL feed
by davido (Cardinal) on Aug 17, 2012 at 16:42 UTC

    Crossposted on Stack Overflow: http://stackoverflow.com/q/12009527/716443

    There's nothing wrong with minimal crossposting. But it's polite and useful to link to the other copies so that people don't put effort into a question that already has a solution elsewhere, and so that the collaborative effort can be based on responses from all incarnations of the question.


    Dave

      You are absolutely right, I am new to this and don't know the proper etiquette. Thank you for sharing that with me! How do I link the crosspost?
Re: Difficulty with Logic parsing ICAL feed
by tobyink (Canon) on Aug 17, 2012 at 16:28 UTC

    Why on earth are you parsing iCalendar by hand? Use Text::vFile::asData.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Difficulty with Logic parsing ICAL feed
by davido (Cardinal) on Aug 17, 2012 at 16:55 UTC

    If you don't have https support installed this line will be a problem (maybe you do, as it's not the problem you're posting about):

    my $URL= get("https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&sponsor%5B%5D=&audience%5B%5D=&category%5B%5D=");

    You should be checking the return value from LWP::Simple::get(). If it's undef, you didn't get a successful response. In this case you can solve this part of the problem simply by switching to http:// instead of https://, or by installing LWP::Protocol::https. But still get in the habit of checking for undef after using LWP::Simple::get.
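One way to make that check a habit is to put it in a small wrapper. This is only a sketch: `fetch_checked` is a hypothetical helper, and the two stand-in fetchers and the example.invalid URL exist only so it runs without network access or LWP installed.

```perl
use strict;
use warnings;

# Hypothetical wrapper: $fetch is any code ref that returns the body on
# success and undef on failure, which is how LWP::Simple::get behaves.
sub fetch_checked {
    my ($fetch, $url) = @_;
    my $content = $fetch->($url);
    die "Could not fetch $url\n" unless defined $content;
    return $content;
}

# Stand-in fetchers (assumptions for illustration, not real HTTP calls).
my $ok      = sub { return "BEGIN:VCALENDAR\nEND:VCALENDAR\n" };
my $failing = sub { return undef };

my $url   = "http://example.invalid/feed.ics";
my $body  = fetch_checked($ok, $url);
my $error = eval { fetch_checked($failing, $url); 1 } ? "" : $@;

print "fetched ", length($body), " bytes\n";
print "second call failed: $error";
```

With LWP::Simple loaded, the same wrapper could be called as `fetch_checked(\&LWP::Simple::get, $url)`, so a failed download stops the program before any parsing starts.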

    You should be able to enable https support in LWP::Simple by following the advice in the README for LWP:

    If you want to access sites using the https protocol, then you need to install the LWP::Protocol::https module from CPAN.

    Once this is done, LWP::Simple is able to fetch the https request. (And maybe you've done this already).


    Dave
