PerlMonks  

Difficulty with Logic parsing ICAL feed

by ChocolateCake (Initiate)
on Aug 17, 2012 at 16:16 UTC (#988015=perlquestion)
ChocolateCake has asked for the wisdom of the Perl Monks concerning the following question:

I have been working on code that will parse event information from an iCal feed. It is a huge block of data that I want to divide by key term, and I need it to be done in an orderly way. I tried indexing the key terms and then having the program print what is between those indexes. However, for some reason it became an infinite loop that printed all the data. I don't know how to fix it. DO NOT RUN MY CODE, IT KEEPS FREEZING MY COMPUTER. I was hoping someone could show me what my problem is.

DO NOT RUN THIS PROGRAM

use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder;
use HTML::FormatText;

my $URL= get("https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&sponsor%5B%5D=&audience%5B%5D=&category%5B%5D=");
my $Format=HTML::FormatText->new;
my $TreeBuilder=HTML::TreeBuilder->new;
$TreeBuilder->parse($URL);
my $Parsed=$Format->format($TreeBuilder);
open(FILE, ">UOTSUMMER.txt");
print FILE "$Parsed";
close (FILE);
open (FILE, "UOTSUMMER.txt");
my @array=<FILE>;
my $string ="@array";
my $offset = 0; # Where are we in the string?
my $numResults = 0;
while (1) {
    my $idxSummary = index($string, "SUMMARY", $offset);
    my $result = "";
    my $idxDescription = index ($string, "DESCRIPTION", $offset);
    my $result2= "";
    if ($idxSummary > -1) {
        $offset = $idxSummary + length("SUMMARY");
        my $idxDescription = index($string, "DESCRIPTION", $offset);
        if ($idxDescription == -1) {
            print "(Data malformed: missing DESCRIPTION line.)\n";
            last;
        }
        if ($idxDescription > -1) {
            $offset = $idxDescription+ length("DESCRIPTION");
            my $idxLocation= index($string, "LOCATION", $offset);
            if ($idxLocation == -1) {
                print "(Data malformed: missing LOCATION line.)\n";
                last;
            }
            my $length = $idxDescription - $offset;
            my $length2= $idxLocation - $offset;
            $result = substr($string, $offset, $length);
            $result2= substr ($string, $offset, $length2);
            $offset = $idxDescription + length("DESCRIPTION");
            $result =~ s/^\s+|\s+$//g ;  # Strip leading and trailing whitespace,
            $result2 =~ s/^\s+|\s+$//g ; # including newlines.
            $numResults++;
        }
        else {
            print "(All done. $numResults result(s) found.)\n";
            last;
        }
        open (FILE2, "UOT123.txt")
        print FILE2 "TITLE: <$result>\n DESCRIPTION: <$result2>\n";

Any guidance you may have will be greatly appreciated! Thanks!

Re: Difficulty with Logic parsing ICAL feed
by linuxkid (Sexton) on Aug 17, 2012 at 16:26 UTC

    Take a look at the URL you're trying to get; it looks malformed.

    --linuxkid


    imrunningoutofideas.co.cc

      It fetches an iCal record. Pretty easy to copy it and try it. What looks malformed about it?

        https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&sponsor%5B%5D=&audience%5B%5D=&category%5B%5D= seems like there are some extra '='s

        --linuxkid


        imrunningoutofideas.co.cc
Re: Difficulty with Logic parsing ICAL feed
by tobyink (Abbot) on Aug 17, 2012 at 16:28 UTC

    Why on earth are you parsing iCalendar by hand? Use Text::vFile::asData.
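A minimal sketch of what that suggestion could look like, assuming Text::vFile::asData is installed from CPAN. The sample lines are made up and stand in for the real feed; with a saved .ics file you would pass an open filehandle to `parse` instead of calling `parse_lines`.

```perl
use strict;
use warnings;
use Text::vFile::asData;

# Made-up sample data standing in for the real feed (an assumption).
my @lines = (
    'BEGIN:VCALENDAR',
    'VERSION:2.0',
    'BEGIN:VEVENT',
    'SUMMARY:Guest lecture',
    'DESCRIPTION:An evening talk',
    'LOCATION:Room 101',
    'END:VEVENT',
    'END:VCALENDAR',
);

# parse_lines returns a nested structure: each object has a type,
# a properties hash, and a list of child objects.
my $data = Text::vFile::asData->new->parse_lines(@lines);

my @events;
for my $calendar (@{ $data->{objects} || [] }) {
    for my $object (@{ $calendar->{objects} || [] }) {
        next unless $object->{type} eq 'VEVENT';
        my $props = $object->{properties};
        my %event;
        for my $key (qw(SUMMARY DESCRIPTION LOCATION)) {
            # Each property is a list of hashes; take the first value if present.
            $event{$key} = $props->{$key} ? $props->{$key}[0]{value} : '';
        }
        push @events, \%event;
    }
}

print "TITLE: <$_->{SUMMARY}>\nDESCRIPTION: <$_->{DESCRIPTION}>\n" for @events;
```

The module handles line unfolding and BEGIN/END nesting, which the index-based approach has to get right by hand.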

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Difficulty with Logic parsing ICAL feed
by roboticus (Canon) on Aug 17, 2012 at 16:30 UTC

    ChocolateCake:

    It looks like it doesn't find "SUMMARY", so it keeps resetting $offset to the same location--character 10: -1 + length("DESCRIPTION"). I'd suggest terminating the loop if it can't find SUMMARY or DESCRIPTION.
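One way to apply that suggestion is sketched below. It is not the original poster's final code: `extract_events` is a hypothetical helper name, the sample text is made up, and moving `$offset` past LOCATION and stripping the leading colon are choices made for this sketch. The key point is that every failed `index` call ends the loop, and `$offset` only ever moves forward.

```perl
use strict;
use warnings;

# Hypothetical helper: pull SUMMARY/DESCRIPTION pairs out of a text blob.
sub extract_events {
    my ($string) = @_;
    my @events;
    my $offset = 0;    # Where are we in the string?

    while (1) {
        my $idxSummary = index($string, "SUMMARY", $offset);
        last if $idxSummary == -1;        # no more events: stop looping

        my $titleStart     = $idxSummary + length("SUMMARY");
        my $idxDescription = index($string, "DESCRIPTION", $titleStart);
        last if $idxDescription == -1;    # malformed data: stop looping

        my $descStart   = $idxDescription + length("DESCRIPTION");
        my $idxLocation = index($string, "LOCATION", $descStart);
        last if $idxLocation == -1;       # malformed data: stop looping

        my $title       = substr($string, $titleStart, $idxDescription - $titleStart);
        my $description = substr($string, $descStart,  $idxLocation - $descStart);

        # Strip the leading colon plus leading and trailing whitespace.
        s/^[\s:]+|\s+$//g for $title, $description;

        push @events, { title => $title, description => $description };

        # Always advance past what was just consumed.
        $offset = $idxLocation + length("LOCATION");
    }
    return @events;
}

# Made-up sample input (an assumption, not the real feed).
my $sample = "SUMMARY:Talk\nDESCRIPTION:About Perl\nLOCATION:Room 1\n"
           . "SUMMARY:Concert\nDESCRIPTION:Live music\nLOCATION:Hall\n";

for my $event (extract_events($sample)) {
    print "TITLE: <$event->{title}>\nDESCRIPTION: <$event->{description}>\n";
}
```

This still breaks if a description happens to contain one of the key terms, which is one reason a real iCalendar parser is the safer route.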

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      THANK YOU!! YOU FIXED MY PROBLEM AND MY CODE WORKS!! BEST FEELING IN THE WORLD!!

        ChocolateCake:

        I should've mentioned: If you're having trouble with a loop like this, you might find it helpful to print the "interesting" variables (e.g. $offset) at the top of the loop. For example:

        while (1) {
            my $idxSummary = index($string, "SUMMARY", $offset);
            my $result = "";
            my $idxDescription = index ($string, "DESCRIPTION", $offset);
            my $result2= "";
            print "idxSum:$idxSummary, idxDesc:$idxDescription, offs:$offset\n";

        Then I'd imagine you'd see something like:

        idxSum:47, idxDesc:62, offs:0
        idxSum:122, idxDesc:143, offs:73
        idxSum:-1, idxDesc:-1, offs:10
        idxSum:-1, idxDesc:-1, offs:10
        .....

        Then you'd probably figure it processed the first two records, and had a problem with the third. I didn't download your code, nor the ICAL url or anything, so the numbers are entirely fictitious.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Difficulty with Logic parsing ICAL feed
by davido (Archbishop) on Aug 17, 2012 at 16:42 UTC

    Crossposted on Stack Overflow: http://stackoverflow.com/q/12009527/716443

    There's nothing wrong with minimal crossposting. But it's polite and useful to link to the other copies so that people don't put effort into a question that already has a solution elsewhere, and so that the collaborative effort can be based on responses from all incarnations of the question.


    Dave

      You are absolutely right, I am new to this and don't know the proper etiquette. Thank you for sharing that with me! How do I link the crosspost?
Re: Difficulty with Logic parsing ICAL feed
by davido (Archbishop) on Aug 17, 2012 at 16:55 UTC

    If you don't have https support installed this line will be a problem (maybe you do, as it's not the problem you're posting about):

    my $URL= get("https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&sponsor%5B%5D=&audience%5B%5D=&category%5B%5D=");

    You should be checking the return value from LWP::Simple::get(). If it's undef, you didn't get a successful response. In this case you can solve this part of the problem simply by switching to http:// instead of https://, or by installing LWP::Protocol::https. But still get in the habit of checking for undef after using LWP::Simple::get.

    You should be able to enable https support in LWP::Simple by following the advice in the README for LWP:

    If you want to access sites using the https protocol, then you need to install the LWP::Protocol::https module from CPAN.

    Once this is done, LWP::Simple is able to fetch the https request. (And maybe you've done this already).
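A small sketch of the undef check described above. `fetch_or_die` is a hypothetical helper name, and its optional second argument is an assumption added here so the check can be tried without a network connection; by default it loads LWP::Simple and calls its `get`.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch a URL and die with a clear message on failure.
# $getter is an optional code ref (an assumption of this sketch);
# when omitted, LWP::Simple::get is loaded and used.
sub fetch_or_die {
    my ($url, $getter) = @_;
    $getter ||= do { require LWP::Simple; \&LWP::Simple::get };

    my $content = $getter->($url);

    # LWP::Simple::get returns undef when the request fails.
    die "Could not fetch $url (is LWP::Protocol::https installed?)\n"
        unless defined $content;

    return $content;
}

# Typical use with the feed from the question:
# my $ical = fetch_or_die("https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&sponsor%5B%5D=&audience%5B%5D=&category%5B%5D=");
```

LWP::Simple::get gives no reason for a failure, so if you need the status line you would have to move to LWP::UserAgent and inspect the response object.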


    Dave

Node Type: perlquestion [id://988015]
Approved by GrandFather