PerlMonks  

Problems with seemingly simple string matching...

by desertrat (Acolyte)
on Oct 04, 2013 at 21:21 UTC (#1056939)
desertrat has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a script that parses long .ics (iCal export) files and removes every event that occurs after a given date.

I'm looking at the date portion of the DTSTART property of each VEVENT (see the RFC) to decide whether the entire VEVENT component (everything between BEGIN:VEVENT and END:VEVENT) should be included in the output file.

But for some reason this isn't working: the entire input file ends up in the output file.

If I instead match the start of an event with !~ /BEGIN:VEVENT/ in line 29 and =~ /BEGIN:VEVENT/ in line 30 (the first two tests inside the PARSE block), no events are put into the output file at all. This is quite confusing.

    #!/usr/bin/perl
    use strict;

    my ($foo, $bar);
    my $infile  = $ARGV[0];
    my $enddate = $ARGV[1];
    if (!$infile || !$enddate) {
        print "Syntax calarchiver.pl <filename> <enddate> (as 'YYYYMMDD')";
        exit;
    }
    my $outfile = $infile;
    $outfile =~ s/.ics/.archive.ics/g;
    open (IN,  '<', $infile)  or die "Cannot open $infile";
    open (OUT, '>', $outfile) or die "cannot open $outfile";
    my $inevt  = 0;
    my $evt    = '';
    my $badevt = 0;
    my $instr;
    my @parts;
    my @dparts;
    while (<IN>) {
        $instr = $_;
        chomp($instr);
        @parts = split(/:/, $instr);
        PARSE: {
            if (($instr ne "BEGIN:VEVENT") && !$inevt) { print OUT $_; last PARSE; }
            if ($instr eq "BEGIN:VEVENT") { $inevt = 1; $evt .= $instr; last PARSE; }
            if ($parts[0] =~ /DTSTART/) {
                @dparts = split(/T/, $parts[1]);
                if ($dparts[0] > $enddate) { $badevt = 1; last PARSE; }
                else { $badevt = 0; $evt .= $instr; last PARSE; }
            }
            if ($inevt && !$badevt) { $evt .= $instr; last PARSE; }
            if ($instr eq "END:VEVENT") {
                if ($badevt) {
                    $inevt = 0; $badevt = 0; $evt = '';
                    print "bad event \n";
                    last PARSE;
                }
                else {
                    $evt .= $instr;
                    print " good event \n";
                    print OUT $evt;
                    $inevt = 0; $badevt = 0; $evt = '';
                    last PARSE;
                }
            }
            $foo = $bar;
        }
    }
    close OUT;
    close IN;
    exit;
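For comparison, a minimal sketch of the filter described above. This is not the poster's code, and several things are assumptions: each VEVENT has at most one DTSTART line, that line is not folded, its value begins with an eight-digit YYYYMMDD date, and the sub name filter_events is hypothetical. It strips an optional \r before comparing, buffers each event, and decides only at END:VEVENT whether to emit the whole block:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes an array ref of raw lines (endings intact) and a
# YYYYMMDD cutoff; returns the lines to keep. Lines outside VEVENT blocks pass
# through; a VEVENT block is kept only if its DTSTART date is <= $enddate.
sub filter_events {
    my ($lines, $enddate) = @_;
    my (@out, @evt);
    my ($inevt, $badevt) = (0, 0);
    for my $line (@$lines) {
        (my $clean = $line) =~ s/\r?\n\z//;   # compare without CRLF or LF
        if (!$inevt) {
            if ($clean eq 'BEGIN:VEVENT') {
                ($inevt, $badevt, @evt) = (1, 0, $line);   # start buffering
            }
            else {
                push @out, $line;                         # not inside an event
            }
            next;
        }
        push @evt, $line;                                 # buffer the event line
        if ($clean =~ /^DTSTART[;:]/) {
            # date = eight digits right after the last ':', e.g. 20131005T120000Z
            my ($date) = $clean =~ /:(\d{8})[^:]*\z/;
            $badevt = 1 if defined $date && $date > $enddate;
        }
        elsif ($clean eq 'END:VEVENT') {
            push @out, @evt unless $badevt;               # emit the whole event
            ($inevt, $badevt, @evt) = (0, 0);
        }
    }
    return @out;
}

# Command-line use, like the original: calarchiver.pl <file.ics> <YYYYMMDD>
if (@ARGV == 2) {
    my ($infile, $enddate) = @ARGV;
    (my $outfile = $infile) =~ s/\.ics\z/.archive.ics/;
    die "Input file name must end in .ics\n" if $outfile eq $infile;
    open my $in,  '<:raw', $infile  or die "Cannot open $infile: $!";
    open my $out, '>:raw', $outfile or die "Cannot open $outfile: $!";
    print {$out} filter_events([<$in>], $enddate);   # original endings kept as-is
    close $out or die "Cannot close $outfile: $!";
    close $in;
}
```

Buffering the event and deciding at END:VEVENT avoids the ordering problem in the original PARSE block, where the "$inevt && !$badevt" test runs before the END:VEVENT test and swallows the closing line of every good event.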

Re: Problems with seemingly simple string matching...
by desertrat (Acolyte) on Oct 04, 2013 at 22:23 UTC

    Damned line endings, it was. When I replaced chomp($instr) with chop($instr); chop($instr); everything worked as expected.
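Two chop calls always remove the last two characters, whatever they are, so the workaround only holds if every line ends in exactly \r\n. A small sketch with made-up sample strings, contrasting that with a substitution that removes an optional carriage return plus the newline:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $crlf = "END:VEVENT\r\n";   # CRLF-terminated line, as in the .ics file
my $lf   = "END:VEVENT\n";     # LF-only line

# Two chop() calls drop the last two characters, whatever they are
my ($crlf_chopped, $lf_chopped) = ($crlf, $lf);
chop $crlf_chopped for 1 .. 2;
chop $lf_chopped   for 1 .. 2;
print "$crlf_chopped|$lf_chopped\n";   # prints "END:VEVENT|END:VEVEN"

# Removing an optional CR plus the LF handles both endings
my ($crlf_clean, $lf_clean) = ($crlf, $lf);
s/\r?\n\z// for $crlf_clean, $lf_clean;
print "$crlf_clean|$lf_clean\n";       # prints "END:VEVENT|END:VEVENT"
```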

    NOW what puzzles me is why chomp didn't work. The input file was created on the same platform as the program and the perl interpreter. Why didn't it properly remove the <CR>:<NEWLINE> pair?

      NOW what puzzles me is why chomp didn't work.

      ddumper $line and $/ (see the Basic debugging checklist)

      OTOH, $line =~ s/\s+$//; ## trim trailing whitespace
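Assuming "ddumper" here means dumping the values with Data::Dumper and $Data::Dumper::Useqq turned on (so invisible characters appear as escapes), a sketch applied to a sample CRLF-terminated line like the ones in the poster's file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

local $Data::Dumper::Useqq = 1;   # print control characters as \r, \n, ...
local $Data::Dumper::Terse = 1;   # omit the "$VAR1 = " prefix

my $line = "BEGIN:VEVENT\r\n";    # sample line with a Windows-style ending
chomp $line;
print Dumper($line, $/);          # the leftover \r and the value of $/ become visible

# Trimming all trailing whitespace also removes the stray carriage return
$line =~ s/\s+$//;
print Dumper($line);
```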

      G'day desertrat,

      "NOW what puzzles me is why chomp didn't work. The input file was created on the same platform as the program and the perl interpreter. Why didn't it properly remove the <CR>:<NEWLINE> pair?"

      From your shebang line, it looks like you're on a *nix OS, where the default line ending is "\n". The file you're dealing with has "\r\n", which is the MSWin default.

      I'm not familiar with iCal. Perhaps its default output uses the same line ending as MSWin. You may be able to change that via configuration or options.

      Assuming you are on a *nix OS, given a string ending in "\r\n", I'd expect chomp to remove the trailing newline and leave the carriage return.
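That expectation is easy to confirm. It is also worth noting that RFC 5545 specifies CRLF as the line break between iCalendar content lines, so \r\n endings are to be expected from a compliant exporter regardless of platform. A minimal check with a sample line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line    = "BEGIN:VEVENT\r\n";
my $removed = chomp $line;   # chomp returns the number of characters removed
printf "removed %d char(s); last char is now ord %d\n",
    $removed, ord substr($line, -1);   # prints "removed 1 char(s); last char is now ord 13"
```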

      -- Ken

Re: Problems with seemingly simple string matching...
by boftx (Deacon) on Oct 05, 2013 at 00:09 UTC

    This is just a guess at this point (just got off work and I'm still sober), and I probably ought to double-check the man page, but I'm pretty sure that chomp() will remove only a single whitespace character from the end of the line. Even though the \r\n pair might be considered a single char in other contexts, it is two distinct characters so far as chomp() is concerned.

    Update: Well, I guess I just stuck my foot in my mouth. I just checked the man page and it says:

    This safer version of chop removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module).

    So kindly disregard my guess. :)
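Since chomp removes whatever $/ currently holds, one option (a sketch, not something proposed in the thread) is to set the input record separator to "\r\n" while reading. Both the record splitting and chomp then follow CRLF. This assumes every line ends in CRLF; an LF-only line would be merged into the following record. The sample data is read from an in-memory filehandle to keep the example self-contained:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "BEGIN:VEVENT\r\nSUMMARY:test\r\nEND:VEVENT\r\n";   # sample CRLF text
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @lines;
{
    local $/ = "\r\n";    # records end at CRLF, and chomp now strips CRLF
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
}
close $fh;
print join('|', @lines), "\n";   # prints "BEGIN:VEVENT|SUMMARY:test|END:VEVENT"
```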

    On time, cheap, compliant with final specs. Pick two.
Re: Problems with seemingly simple string matching...
by aitap (Deacon) on Oct 05, 2013 at 17:21 UTC
    You may want to use the :crlf PerlIO layer, which transforms \r\n line endings to \n on input (leaving a bare "\n" unchanged), like this:
    open (IN,'<:crlf', $infile) or die "Cannot open $infile";
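To see the :crlf layer at work, a sketch that writes a small sample file with mixed line endings to a temporary location (via the core File::Temp module) and reads it back through '<:crlf'; a plain chomp with the default $/ then handles both lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the sample file byte for byte: one CRLF line, one LF-only line
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "BEGIN:VEVENT\r\nEND:VEVENT\n";
close $tmp or die "Cannot close $path: $!";

# Read it back through the :crlf layer: "\r\n" arrives as "\n"
open my $in, '<:crlf', $path or die "Cannot open $path: $!";
my @lines;
while (my $line = <$in>) {
    chomp $line;                 # the default $/ = "\n" is now enough
    push @lines, $line;
}
close $in;
print join('|', @lines), "\n";   # prints "BEGIN:VEVENT|END:VEVENT"
```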
