PerlMonks  

Parsing/regex question

by vxp (Pilgrim)
on Jul 06, 2009 at 18:28 UTC

vxp has asked for the wisdom of the Perl Monks concerning the following question:

I'm parsing input and placing the parsed pieces into variables, so I can mangle them and do some logical operations on them.

The file is read line-by-line and a bunch of regexes parse the relevant bits.

My problem comes in the form of "Position/Spot/Mark has rolled into ...." lines seen below in the script's output. Since input is read line by line, I can't craft a multiline regex to create a "$position_rolled_into = <datestamp>" for me. I was wondering if anyone can suggest a solution to this.

Thanks!

=========== DATE INITIALIZATION ===========
Today is: 07/06/2009
Tomorrow is: 07/07/2009
Yesterday is: 07/05/2009

=========== INPUT ===========
table_name keyid description
------------ ------------ ----------------------------------------
COCARRY StoreToDDI DONE
COCARRY rundate 7/3/2009
ROLL ignore_local
ROLL mark DONE 07/03/2009 22:09
ROLL position DONE 07/03/2009 22:08
ROLL rundate 07/03/2009
ROLL spot DONE 07/03/2009 22:08
SPLAdj data DONE Jul 4 2009 12:51AM
SPLAdj lastcompdate Jul 3 2009 12:00AM
SPLAdj rundate Jul 3 2009 12:00AM
SPLbatch data DONE Jul 3 2009 11:55PM
SPLbatch lastcompdate Jul 3 2009 12:00AM
SPLbatch rundate Jul 3 2009 12:00AM
SPLbatchNew data DONE Jul 4 2009 12:50AM
SPLbatchNew lastcompdate Jul 3 2009 12:00AM
SPLbatchNew rundate Jul 3 2009 12:00AM
(16 rows affected)
Position has rolled into
--------------------------
Jul 6 2009 12:00AM
(1 row affected)
Spot has rolled into
--------------------------
Jul 6 2009 12:00AM
(1 row affected)
Mark has rolled into
--------------------------
Jul 6 2009 12:00AM
(1 row affected)

=========== AFTER PARSING ===========
-- COCARRY StoreToDDI: DONE
-- COCARRY rundate: 7/3/2009
-- ROLL ignore_local: nothing
-- ROLL mark desc: DONE - 07/03/2009 - 22:09
-- ROLL position: DONE - 07/03/2009 - 22:08
-- ROLL rundate: 07/03/2009
-- ROLL spot desc: DONE - 07/03/2009 - 22:08
-- SPLAdj data desc: DONE - Jul 4 2009 - 12:51 - AM
-- SPLAdj lastcompdate desc: Jul 3 2009 - 12:00 - AM
-- SPLAdj rundate desc: Jul 3 2009 - 12:00 - AM
-- SPLbatch data desc: DONE - Jul 3 2009 - 11:55 - PM
-- SPLbatch lastcompdate desc: Jul 3 2009 - 12:00 - AM
-- SPLbatch rundate desc: Jul 3 2009 - 12:00 - AM
-- SPLbatchNew data desc: DONE - Jul 4 2009 - 12:50 - AM
-- SPLbatchNew lastcompdate desc: Jul 3 2009 - 12:00 - AM
-- SPLbatchNew rundate desc: Jul 3 2009 - 12:00 - AM

Full code below.

#!/usr/bin/perl

my $tmp_dir = "./tmp";
my $region = shift;

###################################################
# is the region correct? was it even given?
###################################################
$region = lc($region);
unless (($region =~ /^ny$/) || ($region =~ /^ln$/) || ($region =~ /^tk$/) ||
        ($region =~ /^hk$/) || ($region =~ /^se$/) || ($region =~ /^mo$/)) {
    print "The acceptable regions are:\nNY\nLN\nTK\nHK\nSE\nMO\n";
    exit(-1);
}

print "=========== DATE INITIALIZATION ===========\n";

############################################
# get today's date in the mm/dd/yyyy format
############################################
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime (time);
$year = 1900 + $year;
$mon = $mon + 1;
# zero-pad single-digit values only (the anchors keep 10..31 untouched)
if ($mon =~ /^(\d)$/) {
    $mon = "0" . $1;
}
if ($mday =~ /^(\d)$/) {
    $mday = "0" . $1;
}
my $today = $mon . "/" . $mday . "/" . $year;
print "Today is: $today\n";

###############################################
# get tomorrow's date in the mm/dd/yyyy format
###############################################
($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime (time + 86400);
$year = 1900 + $year;
$mon = $mon + 1;
if ($mon =~ /^(\d)$/) {
    $mon = "0" . $1;
}
if ($mday =~ /^(\d)$/) {
    $mday = "0" . $1;
}
my $tomorrow = $mon . "/" . $mday . "/" . $year;
print "Tomorrow is: $tomorrow\n";

###############################################
# get yesterday's date in the mm/dd/yyyy format
###############################################
($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime (time - 86400);
$year = 1900 + $year;
$mon = $mon + 1;
if ($mon =~ /^(\d)$/) {
    $mon = "0" . $1;
}
if ($mday =~ /^(\d)$/) {
    $mday = "0" . $1;
}
my $yesterday = $mon . "/" . $mday . "/" . $year;
print "Yesterday is: $yesterday\n";

################################################################
# process the input. parse it, place everything into variables.
################################################################
open(MSOUT, "$tmp_dir/$region") or die "Couldn't open $tmp_dir/$region: $!\n";

print "\n=========== INPUT ===========\n";
while (<MSOUT>) {
    s/^\s+//;
    s/\s+$//;
    my $line = $_;
    chomp($line);
    next unless length > 0;
    print "$line\n";

    #
    # COCARRY StoreToDDI DONE
    #
    if ($line =~ /^COCARRY\s+StoreToDDI\s+(.*)/) {
        $COCARRY_StoreToDDI_desc = $1;
    }
    #
    # COCARRY rundate 7/3/2009
    #
    elsif ($line =~ /^COCARRY\s+rundate\s+(.*)/) {
        $COCARRY_rundate_desc = $1;
    }
    #
    # ROLL ignore_local
    #
    elsif ($line =~ /^ROLL\s+ignore_local$/) {
        $ROLL_ignore_local_desc = "nothing";
    }
    #
    # ROLL ignore_local
    #
    elsif ($line =~ /^ROLL\s+ignore_local\s+(.*)/) {
        $ROLL_ignore_local_desc = $1;
    }
    #
    # ROLL mark DONE 07/03/2009 22:09
    #
    elsif ($line =~ /^ROLL\s+mark\s+(\w+)\s+(\d+\/\d+\/\d+)\s(\d+:\d+)/) {
        $ROLL_mark_desc = $1;
        $ROLL_mark_desc_date = $2;
        $ROLL_mark_desc_time = $3;
    }
    #
    # ROLL position DONE 07/03/2009 22:08
    #
    elsif ($line =~ /^ROLL\s+position\s+(\w+)\s+(\d+\/\d+\/\d+)\s(\d+:\d+)/) {
        $ROLL_position_desc = $1;
        $ROLL_position_desc_date = $2;
        $ROLL_position_desc_time = $3;
    }
    #
    # ROLL rundate 07/03/2009
    #
    elsif ($line =~ /^ROLL\s+rundate\s+(.*)/) {
        $ROLL_rundate_desc = $1;
    }
    #
    # ROLL spot DONE 07/03/2009 22:08
    #
    elsif ($line =~ /^ROLL\s+spot\s+(\w+)\s+(\d+\/\d+\/\d+)\s(\d+:\d+)/) {
        $ROLL_spot_desc = $1;
        $ROLL_spot_desc_date = $2;
        $ROLL_spot_desc_time = $3;
    }
    #
    # SPLAdj data DONE Jul 4 2009 12:51AM
    #
    elsif ($line =~ /^SPLAdj\s+data\s+(\w+)\s+(\w+\s+\d+\s\d+)\s(\d+:\d+)(\w+)/) {
        $SPLAdj_data_desc = $1;
        $SPLAdj_data_desc_date = $2;
        $SPLAdj_data_desc_time = $3;
        $SPLAdj_data_desc_ampm = $4;
    }
    #
    # SPLAdj lastcompdate Jul 3 2009 12:00AM
    #
    elsif ($line =~ /^SPLAdj\s+lastcompdate\s+(\w+\s+\d+\s\d+)\s(\d+:\d+)(\w+)/) {
        $SPLAdj_lastcompdate_desc_date = $1;
        $SPLAdj_lastcompdate_desc_time = $2;
        $SPLAdj_lastcompdate_desc_ampm = $3;
    }
    #
    # SPLAdj rundate Jul 3 2009 12:00AM
    #
    elsif ($line =~ /^SPLAdj\s+rundate\s+(\w+\s+\d+\s\d+)\s(\d+:\d+)(\w+)/) {
        $SPLAdj_rundate_desc_date = $1;
        $SPLAdj_rundate_desc_time = $2;
        $SPLAdj_rundate_desc_ampm = $3;
    }
    #
    # SPLbatch data DONE Jul 3 2009 11:55PM
    #
    elsif ($line =~ /^SPLbatch\s+data\s+(\w+)\s+(\w+\s+\d+\s\d+)\s(\d+:\d+)(\w+)/) {
        $SPLbatch_data_desc = $1;
        $SPLbatch_data_desc_date = $2;
        $SPLbatch_data_desc_time = $3;
        $SPLbatch_data_desc_ampm = $4;
    }
    #
    # SPLbatch lastcompdate Jul 3 2009 12:00AM
    #
    elsif ($line =~ /^SPLbatch\s+lastcompdate\s+(\w+\s+\d+\s\d+)\s(\d+:\d+)(\w+)/) {
        $SPLbatch_lastcompdate_desc_date = $1;
        $SPLbatch_lastcompdate_desc_time = $2;
        $SPLbatch_lastcompdate_desc_ampm = $3;
    }
    #
    # SPLbatch rundate Jul 3 2009 12:00AM
    #
    elsif ($line =~ /^SPLbatch\s+rundate\s+(\w+\s+\d+\s\d+)\s(\d+:\d+)(\w+)/) {
        $SPLbatch_rundate_desc_date = $1;
        $SPLbatch_rundate_desc_time = $2;
        $SPLbatch_rundate_desc_ampm = $3;
    }
    #
    # SPLbatchNew data DONE Jul 4 2009 12:50AM
    #
    elsif ($line =~ /^SPLbatchNew\s+data\s+(\w+)\s+(\w+\s+\d+\s\d+)\s(\d+:\d+)(\w+)/) {
        $SPLbatchNew_data_desc = $1;
        $SPLbatchNew_data_desc_date = $2;
        $SPLbatchNew_data_desc_time = $3;
        $SPLbatchNew_data_desc_ampm = $4;
    }
    #
    # SPLbatchNew lastcompdate Jul 3 2009 12:00AM
    #
    elsif ($line =~ /^SPLbatchNew\s+lastcompdate\s+(\w+\s+\d+\s\d+)\s(\d+:\d+)(\w+)/) {
        $SPLbatchNew_lastcompdate_desc_date = $1;
        $SPLbatchNew_lastcompdate_desc_time = $2;
        $SPLbatchNew_lastcompdate_desc_ampm = $3;
    }
    #
    # SPLbatchNew rundate Jul 3 2009 12:00AM
    #
    elsif ($line =~ /^SPLbatchNew\s+rundate\s+(\w+\s+\d+\s\d+)\s(\d+:\d+)(\w+)/) {
        $SPLbatchNew_rundate_desc_date = $1;
        $SPLbatchNew_rundate_desc_time = $2;
        $SPLbatchNew_rundate_desc_ampm = $3;
    }
}
close(MSOUT);

print "\n=========== AFTER PARSING ===========\n";
print "-- COCARRY StoreToDDI: $COCARRY_StoreToDDI_desc\n";
print "-- COCARRY rundate: $COCARRY_rundate_desc\n";
print "-- ROLL ignore_local: $ROLL_ignore_local_desc\n";
print "-- ROLL mark desc: $ROLL_mark_desc - $ROLL_mark_desc_date - $ROLL_mark_desc_time\n";
print "-- ROLL position: $ROLL_position_desc - $ROLL_position_desc_date - $ROLL_position_desc_time\n";
print "-- ROLL rundate: $ROLL_rundate_desc\n";
print "-- ROLL spot desc: $ROLL_spot_desc - $ROLL_spot_desc_date - $ROLL_spot_desc_time\n";
print "-- SPLAdj data desc: $SPLAdj_data_desc - $SPLAdj_data_desc_date - $SPLAdj_data_desc_time - $SPLAdj_data_desc_ampm\n";
print "-- SPLAdj lastcompdate desc: $SPLAdj_lastcompdate_desc_date - $SPLAdj_lastcompdate_desc_time - $SPLAdj_lastcompdate_desc_ampm\n";
print "-- SPLAdj rundate desc: $SPLAdj_rundate_desc_date - $SPLAdj_rundate_desc_time - $SPLAdj_rundate_desc_ampm\n";
print "-- SPLbatch data desc: $SPLbatch_data_desc - $SPLbatch_data_desc_date - $SPLbatch_data_desc_time - $SPLbatch_data_desc_ampm\n";
print "-- SPLbatch lastcompdate desc: $SPLbatch_lastcompdate_desc_date - $SPLbatch_lastcompdate_desc_time - $SPLbatch_lastcompdate_desc_ampm\n";
print "-- SPLbatch rundate desc: $SPLbatch_rundate_desc_date - $SPLbatch_rundate_desc_time - $SPLbatch_rundate_desc_ampm\n";
print "-- SPLbatchNew data desc: $SPLbatchNew_data_desc - $SPLbatchNew_data_desc_date - $SPLbatchNew_data_desc_time - $SPLbatchNew_data_desc_ampm\n";
print "-- SPLbatchNew lastcompdate desc: $SPLbatchNew_lastcompdate_desc_date - $SPLbatchNew_lastcompdate_desc_time - $SPLbatchNew_lastcompdate_desc_ampm\n";
print "-- SPLbatchNew rundate desc: $SPLbatchNew_rundate_desc_date - $SPLbatchNew_rundate_desc_time - $SPLbatchNew_rundate_desc_ampm\n";
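
On the multiline-regex remark in the question: a multiline regex only becomes possible if the whole input sits in one string. The sketch below assumes the file is small enough to slurp and that the heading, the dashed rule, and the datestamp always appear on three consecutive lines, as they do in the sample output. The names %rolled_into and $rolled_re are made up for illustration, and \h needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text copied from the INPUT section of the question. A real script
# would slurp the file instead:  my $text = do { local $/; <$fh> };
my $text = <<'END';
(16 rows affected)
Position has rolled into
--------------------------
Jul 6 2009 12:00AM
(1 row affected)
Spot has rolled into
--------------------------
Jul 6 2009 12:00AM
(1 row affected)
Mark has rolled into
--------------------------
Jul 6 2009 12:00AM
(1 row affected)
END

# /m lets ^ and $ match at every line boundary inside the slurped string;
# /x allows the layout and the comments below.
my $rolled_re = qr{
    ^ \h* (Position|Spot|Mark) \h+ has \h+ rolled \h+ into \h* \n   # heading line
      \h* -+ \h* \n                                                 # dashed rule
      \h* (\S.*?) \h* $                                             # datestamp line
}mx;

# Hypothetical container: lowercased heading word => datestamp text.
my %rolled_into;
while ($text =~ /$rolled_re/g) {
    $rolled_into{lc $1} = $2;
}

print ucfirst($_), " rolled into: $rolled_into{$_}\n" for qw(position spot mark);
```

The trade-off is memory for simplicity: the line-by-line loop stays untouched for the table rows, and the three "rolled into" values come from one pass over the slurped text.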

Replies are listed 'Best First'.
Re: Parsing/regex question
by ikegami (Patriarch) on Jul 06, 2009 at 18:37 UTC
    Basically,
    my $position_rolled = 0;
    while (<$fh>) {
        chomp;
        if ($position_rolled) {
            $position_rolled = 0;
            $position_rolled_into = $_;
        }
        elsif ($_ eq 'Position has rolled into') {
            $position_rolled = 1;
        }
        ...
    }
Re: Parsing/regex question
by superfrink (Curate) on Jul 06, 2009 at 18:39 UTC
    What do you want the script to do when it encounters a "has rolled into" line? eg: skip the line and the following line or two?

    You could set a variable used to keep state when a line matches and at the top of the loop skip lines if the variable is set. Here is some sample completely un-tested code:
    my $lines_to_skip = 0;
    LINE: while (<MSOUT>) {
        if (0 < $lines_to_skip) {
            $lines_to_skip--;
            next LINE;
        }
        ...
        if ($line =~ /^(Position|Spot|Mark) has rolled into$/) {
            $lines_to_skip = 1;
            next LINE;
        }
    }

      When the script encounters a "Position has rolled into" line, it should skip everything up to the datestamp and place that datestamp into a $position_rolled_into variable.

      Same goes for the "spot/mark has rolled into" :)

        should skip everything up to the datestamp

        oops, missed the "---" line. Adding one line of code addresses that:

        my $position_rolled = 0;
        while (<$fh>) {
            chomp;
            if ($position_rolled) {
                next if !/^... .\d 20\d\d/;
                $position_rolled = 0;
                $position_rolled_into = $_;
            }
            elsif ($_ eq 'Position has rolled into') {
                $position_rolled = 1;
            }
            ...
        }
        Then in the while loop you can do something like this:
        ...
        if (0 < $lines_to_skip) {
            if ($line !~ /datestamp-regex/) {
                next LINE;
            }
            # we are at the next date stamp line so fall through, etc.
        }
        ...
        Also rename $lines_to_skip to something suitable.
Re: Parsing/regex question
by vxp (Pilgrim) on Jul 06, 2009 at 19:34 UTC

    Thanks guys!

    I ended up doing the following:

    Right before the while (<MSOUT>) line:

    my $position_rolled = 0;
    my $spot_rolled = 0;
    my $mark_rolled = 0;

    inside the while (<MSOUT>) loop:

    ...
    next if $line =~ /^-/;
    if ($position_rolled) {
        $position_rolled = 0;
        $position_rolled_into = $line;
    }
    if ($spot_rolled) {
        $spot_rolled = 0;
        $spot_rolled_into = $line;
    }
    if ($mark_rolled) {
        $mark_rolled = 0;
        $mark_rolled_into = $line;
    }
    ...
    if ($line =~ /Position has rolled into/) {
        $position_rolled = 1;
    }
    if ($line =~ /Spot has rolled into/) {
        $spot_rolled = 1;
    }
    if ($line =~ /Mark has rolled into/) {
        $mark_rolled = 1;
    }

    And then printing it out:

    print "Position rolled into: $position_rolled_into\n";
    print "Spot rolled into: $spot_rolled_into\n";
    print "Mark rolled into: $mark_rolled_into\n";

    Works like a charm :)
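
The three flags above all do the same job, so they can probably be folded into one. A minimal sketch, assuming each heading is followed by a dashed rule and then the datestamp as in the sample input; $pending and %rolled_into are hypothetical names, and the in-memory filehandle only stands in for MSOUT so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines copied from the INPUT section of the question.
my $text = <<'END';
(16 rows affected)
Position has rolled into
--------------------------
Jul 6 2009 12:00AM
(1 row affected)
Spot has rolled into
--------------------------
Jul 6 2009 12:00AM
(1 row affected)
Mark has rolled into
--------------------------
Jul 6 2009 12:00AM
(1 row affected)
END

# Read from a string so the sketch is self-contained.
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my %rolled_into;   # hypothetical: 'position' => datestamp, and so on
my $pending;       # heading whose datestamp has not been seen yet, else undef

while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^-+$/;              # dashed rule under a heading
    if (defined $pending) {
        $rolled_into{$pending} = $line;   # first non-rule line after the heading
        undef $pending;
    }
    elsif ($line =~ /^(Position|Spot|Mark) has rolled into$/) {
        $pending = lc $1;                 # remember which value comes next
    }
}
close $fh;

print ucfirst($_), " rolled into: $rolled_into{$_}\n" for qw(position spot mark);
```

Using a hash keyed by the heading word means a fourth "has rolled into" heading would only need one more alternative in the regex.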

Re: Parsing/regex question
by toolic (Bishop) on Jul 06, 2009 at 19:45 UTC
    Unrelated to your problem ... here is a suggestion to simplify your code. Use POSIX to determine your dates:
    print "=========== DATE INITIALIZATION ===========\n";

    ############################################
    # get dates in the mm/dd/yyyy format
    ############################################
    use POSIX;
    print "Today is: "     , strftime('%m/%d/%Y', localtime), "\n";
    print "Tomorrow is: "  , strftime('%m/%d/%Y', localtime(time + 86400)), "\n";
    print "Yesterday is: " , strftime('%m/%d/%Y', localtime(time - 86400)), "\n";

      incorporated into my code, less lines now :D

      thanks for the tip!

        Don't, it's wrong.

        86400 seconds sooner/later can be the same day.
        86400 seconds sooner/later can be two days earlier/later.

        use DateTime qw( );

        my $today     = DateTime->today( time_zone => 'local' );
        # add() and subtract() modify the object in place, so work on clones
        my $tomorrow  = $today->clone->add     ( days => 1 );
        my $yesterday = $today->clone->subtract( days => 1 );

        print "Today is: ",     $today    ->strftime('%m/%d/%Y'), "\n";
        print "Tomorrow is: ",  $tomorrow ->strftime('%m/%d/%Y'), "\n";
        print "Yesterday is: ", $yesterday->strftime('%m/%d/%Y'), "\n";

        Update: Added solution.
        Update: Fixed spelling of time_zone. (underscore was missing)
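
Why 86400 can miss: with daylight saving time a local day may be 23 or 25 hours long. At 00:30 on a 25-hour day, adding 86400 seconds lands at 23:30 the same day; at 23:30 on the eve of a 23-hour day, it lands at 00:30 two days later. If installing DateTime is not an option, one core-only workaround is to do the arithmetic on the calendar day and anchor the result at noon. This is a sketch, not a replacement for DateTime: shift_days is a made-up helper, and it assumes local noon always exists in the time zone in use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# Hypothetical helper: mm/dd/yyyy for the local date that lies $delta
# calendar days away from the local date containing epoch time $now.
sub shift_days {
    my ($now, $delta) = @_;
    my ($mday, $mon, $year) = (localtime $now)[3, 4, 5];
    # Anchor at 12:00 local time. mktime() normalizes an out-of-range day
    # of the month (0, 32, ...) into the neighboring month or year.
    my $noon = mktime(0, 0, 12, $mday + $delta, $mon, $year);
    return strftime('%m/%d/%Y', localtime $noon);
}

my $now = time;
print "Today is: ",     shift_days($now,  0), "\n";
print "Tomorrow is: ",  shift_days($now,  1), "\n";
print "Yesterday is: ", shift_days($now, -1), "\n";
```

Noon is far enough from midnight that a one-hour clock shift cannot push the result onto a neighboring date.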
