PerlMonks  

pattern match or key problem

by brassmon_k (Sexton)
on Jul 10, 2001 at 01:02 UTC (#95162=perlquestion)
brassmon_k has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks I need your brain power,

I've been doing Perl for about six months, and I'm not that good at regular expressions such as the ones used for pattern matching. First I'll show you the script I have, then the data it needs to read, and then what I want out of it. Here is the script:
#!/usr/bin/perl -w
use strict;
my %mylist;
my $min;
my $max;
my $range;
my $line;
# get the range (in this throwtogether it must be entered
# with no spaces in the form HHMMSSTT-HHMMSSTT)
# Hour, Minute, Second, Tenths of seconds
# Get the range
$range = <STDIN>;
# Break up the range
($min, $max) = split /-/, $range;
# Squeeze out leading and trailing spaces
$min =~ s/^\s+//;
$min =~ s/\s+$//;
$max =~ s/^\s+//;
$max =~ s/\s+$//;
# Get the filenames and break into fields
open(CAT, "cat mon402.log |");
(@ARGV = <CAT>) unless @ARGV;
for (@ARGV) {
    if ($_ =~ m/(\d{2}\-\w{3}\-\d{4})\ (\d{2}\:\d{2}\:\d{2}\.\d{2})/) {
        # push the restricted range of filenames onto a hash of arrays
        # keyed on the date field
        if (($2>= $min) && ($2 <= $max)) {
            push(@{$mylist{$2}}, $_);
        }
    }
}
my @keys = sort (keys %mylist);
foreach my $key (@keys) {
    foreach my $thing (@{%mylist}{$key}){
        foreach my $it (@$thing) {
            print "$min\n";
            print "$max\n";
            print "$it\n";
            print "$key\n";
            print "$thing\n";
            print "@keys\n";
            last;
        }
    }
}
Okay, "mon402.log" is the file I'm trying to pattern match in. I left the "$" off the end of the pattern match line because with it the script returned nothing, not even errors; I'd just get my prompt again. When I leave it off, I get the following errors:
Argument "27-Jun-2001 00:23:59.37 D:19208797684 O:19209891234" isn't numeric or ge line 27

I think it's reading the whole line when it finds the time, but it isn't even finding times in the time range I specify, and I'm lost. It shows all times, each with the error listed above. I think either my pattern matching is wrong or the way I pushed the data through the arrays and keys is screwed up.
The print statements at the bottom are just there to see what prints out for the values given. Here is the type of information in "mon402.log":
27-Jun-2001 12:08:19.17 SendSMReq:T:1069 D:19203311338 O:Voice Mail MWI P:0x20 C:0xe0 V:07-05 12:08 <20>
27-Jun-2001 12:08:19.36 SendSMCnf:AbsSub:1068
27-Jun-2001 12:08:19.56 SubmitInd:T:0 D:19207024720 O:19202928411 P:0x0 C:0x0 V:09-11 12:08 <yeah>
27-Jun-2001 12:08:20.05 SubmitInd:T:0 D:19207137406 O:19203314700 P:0x0 C:0x0 V:06-28 12:08 <LAST GF WAS 19 WAY TO IMMATURE, SUCKED IN BED>
27-Jun-2001 12:08:20.30 SendSMReq:T:1070 D:16083089587 O:16083089560 P:0x20 C:0x0 V:07-05 11:02 <good morning sleepy head>
27-Jun-2001 12:08:20.31 SubmitRsp:0
27-Jun-2001 12:08:20.32 SubmitRsp:0
27-Jun-2001 12:08:21.62 SendSMCnf:StatOK:1065
27-Jun-2001 12:08:22.36 SendSMCnf:StatOK:1060
27-Jun-2001 12:08:22.56 SendSMCnf:StatOK:1063
What I need is for the script to match a time range against the second field, the one right after the date. So when I give it a range such as:
10:38:38.42-10:39:39.42

I want the script to print the first 3 fields of each line between the times I specified on STDIN. I looked on the web and in books for days, but I can't seem to find an example of pattern matching internal to a file. External I can do, but I've never done internal matching before, and I'm not that good at regular expressions, at least not ones as complex as I need in order to search the way I want.

The brassmonk,
I relish the answer to this problem

Edit: chipmunk 2001-07-09

Re: pattern match or key problem
by John M. Dlugosz (Monsignor) on Jul 10, 2001 at 02:01 UTC
    I don't see how your regex can return the whole line as $2, because there are no +'s or *'s in there that can run away!

    But I can help with the numeric part. You ask, essentially,

    '12:08:19.17' >= '12:00:00'
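    And >= compares numbers, so Perl converts each string to a number using only its leading digits: '12:08:19.17' becomes plain 12 (and under -w you get an "isn't numeric" warning), so only the hours are ever compared. Doing a string comparison (ge instead of >=) won't give you correct results either unless both sides are padded to the same fixed-width layout.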
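    To make this concrete, here is a small sketch (not from the thread, core Perl only) showing what each operator actually compares, using time strings from the question. The warnings are switched off only so the output stays readable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'numeric';   # hide the "isn't numeric" warnings for this demo

my $t = '12:08:19.17';
print 0 + $t, "\n";      # numifies to 12: only the leading digits count

# numeric >= sees 12 >= 12, so this is true even though 12:08 is before 12:59
print( ('12:08:19.17' >= '12:59:59.99') ? "true\n" : "false\n" );

# string ge compares character by character: "12:0" lt "12:5", so false
print( ('12:08:19.17' ge '12:59:59.99') ? "true\n" : "false\n" );

# string ge is only safe when both sides are zero-padded the same way:
# "9" gt "1", so an unpadded 9 AM sorts after noon and this prints true
print( ('9:08:19.17' ge '12:00:00.00') ? "true\n" : "false\n" );
```

    This prints 12, true, false, true. Because the log always writes times as zero-padded HH:MM:SS.TT, ge/le work as long as the range you type uses the same layout.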

    —John

Re: pattern match or key problem
by tachyon (Chancellor) on Jul 10, 2001 at 02:23 UTC

    OK, so let's break it down. You want the data between a range of times. Your times in the logfiles are stored as strings, which makes comparing them to a range difficult. Comparing numbers, on the other hand, is easy. So here is how you convert a string that represents your time into Unix epoch time, which is the number of seconds between this time and 00:00:00 UTC on 1 Jan 1970:

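    use strict;
    use Time::Local;
    # timelocal() wants a 0-based month number (0..11), not a name like "Jun"
    my %mon_num = (Jan=>0, Feb=>1, Mar=>2, Apr=>3, May=>4,  Jun=>5,
                   Jul=>6, Aug=>7, Sep=>8, Oct=>9, Nov=>10, Dec=>11);
    my ($day,$mon,$year,$hours,$min,$sec) = split /[- :]/,"27-Jun-2001 12:08:19.17";
    print "$day,$mon,$year,$hours,$min,$sec\n";
    # pass whole seconds to timelocal() and add the hundredths back afterwards
    my $time = timelocal(int($sec),$min,$hours,$day,$mon_num{$mon},$year) + ($sec - int($sec));
    print "$time\n";
    print scalar localtime $time;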

    Now that we can generate a number that represents our time from the sort of data in the logfile, the problem becomes much easier. You need to convert your time range to Unix epoch seconds. You then read each line of the file, strip out the string that represents the time, and convert it to epoch seconds as well. If it is within the range of interest you are good to go; otherwise you go on to the next line. Here is some code that does just this. I read from the DATA filehandle to make it easy.

    #!/usr/bin/perl -w
    use strict;
    use Time::Local;

    # timelocal() wants a 0-based month number (0..11), not a name like "Jun"
    my %mon_num = (Jan=>0, Feb=>1, Mar=>2, Apr=>3, May=>4,  Jun=>5,
                   Jul=>6, Aug=>7, Sep=>8, Oct=>9, Nov=>10, Dec=>11);

    # convert a "27-Jun-2001 12:08:19.50" style string to epoch seconds
    sub to_epoch {
        # split into day, month, year, hours, minutes, seconds
        my @time = split /[- :]/, shift;
        my $sec  = $time[5];
        # whole seconds go to timelocal(); the hundredths are added back after
        return timelocal(int($sec), @time[4,3,0], $mon_num{$time[1]}, $time[2])
             + ($sec - int($sec));
    }

    my $start  = to_epoch("27-Jun-2001 12:08:19.50");
    my $finish = to_epoch("27-Jun-2001 12:08:21.50");

    # iterate over our data
    while (<DATA>) {
        next if m/^\s*$/;
        # split the line into whitespace-separated fields
        my @data = split ' ', $_;
        # convert the date and time fields to epoch time
        my $time = to_epoch("$data[0] $data[1]");
        # print the first three fields if in range
        if ($time > $start and $time < $finish) {
            print "$data[0] $data[1] $data[2]\n";
        }
    }
    __DATA__
    27-Jun-2001 12:08:19.17 SendSMReq:T:1069 D:19203311338 O:Voice Mail MWI P:0x20 C:0xe0 V:07-05 12:08 <20>
    27-Jun-2001 12:08:19.36 SendSMCnf:AbsSub:1068
    27-Jun-2001 12:08:19.56 SubmitInd:T:0 D:19207024720 O:19202928411 P:0x0 C:0x0 V:09-11 12:08 <yeah>
    27-Jun-2001 12:08:20.05 SubmitInd:T:0 D:19207137406 O:19203314700 P:0x0 C:0x0 V:06-28 12:08 <LAST GF WAS 19 WAY TO IMMATURE, SUCKED IN BED>
    27-Jun-2001 12:08:20.30 SendSMReq:T:1070 D:16083089587 O:16083089560 P:0x20 C:0x0 V:07-05 11:02 <good morning sleepy head>
    27-Jun-2001 12:08:20.31 SubmitRsp:0
    27-Jun-2001 12:08:20.32 SubmitRsp:0
    27-Jun-2001 12:08:21.62 SendSMCnf:StatOK:1065
    27-Jun-2001 12:08:22.36 SendSMCnf:StatOK:1060
    27-Jun-2001 12:08:22.56 SendSMCnf:StatOK:1063

    So all you need to do now is convert your input time range into Unix epoch seconds and you are set. As you have used a kludge I will not code this for you. If you just want to use the current day/month/year, use the localtime() function to get what you need.
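    For readers who want that last step spelled out, here is a minimal sketch, assuming the range is typed as HH:MM:SS.TT-HH:MM:SS.TT and refers to the current local date taken from localtime(). The helper name range_to_epoch is hypothetical. Only Time::Local's timelocal() and the built-in localtime() are used, and timelocal() is given whole seconds with the fraction added back afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;

# Hypothetical helper: turn "HH:MM:SS.TT-HH:MM:SS.TT" into two epoch times.
# Assumption: the range refers to the current local date from localtime().
sub range_to_epoch {
    my ($range) = @_;
    $range =~ s/\s+//g;                      # tolerate stray spaces/newline
    # day of month, 0-based month, years since 1900 (what timelocal expects)
    my ($mday, $mon, $year) = (localtime)[3, 4, 5];
    my @epoch;
    for my $t (split /-/, $range, 2) {
        my ($h, $m, $s) = $t =~ /^(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)$/
            or die "Bad time '$t'\n";
        # whole seconds go to timelocal(); the fraction is added back after
        push @epoch, timelocal(int($s), $m, $h, $mday, $mon, $year) + ($s - int $s);
    }
    die "Expected HH:MM:SS.TT-HH:MM:SS.TT\n" unless @epoch == 2;
    return @epoch;
}

my ($start, $finish) = range_to_epoch("10:38:38.42-10:39:39.42");
printf "start=%.2f finish=%.2f\n", $start, $finish;
```

    Each log line's time can then be converted the same way and tested with $time >= $start and $time <= $finish. If the log covers a different day than today (the sample is from 27-Jun-2001), take the day, month and year from the line's date field instead of from localtime().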

    cheers

    PS: By the way, did you know that a brass monkey was the thing that the old man-o'-war sailing ships used to store their cannon balls on? In the Arctic, the cold made the brass contract and the cannon balls fell off. Thus was born the saying "It's cold enough to freeze the balls off a brass monkey".

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: pattern match or key problem
by frag (Hermit) on Jul 10, 2001 at 02:40 UTC
    Concerning matching within a file: does this help?
    my $line;
    open(FILE, "your.log") or die "Can't open your.log: $!";
    # read each line from the file
    while ($line = <FILE>) {
        if ($line =~ m/the pattern you're looking for/) {
            # do something with the matched line
        }
    }
    close(FILE);

    If that's not what you meant and is something you already know, please clarify.

    As for handling time ranges, there are a number of existing nodes here that can give you some help; the search box and Super Search are a monk's best friends. In particular, check out the responses to Within A Date Time Range.
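    Putting the read-each-line loop together with a time-range test, here is one possible sketch (a swapped-in technique, not taken from the linked node): convert HH:MM:SS.TT to whole hundredths of a second since midnight, so every comparison is between integers, and keep the first three fields of each line in range. The helper names to_hundredths and lines_in_range are hypothetical, and the date field is deliberately ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "HH:MM:SS.TT" -> hundredths of a second since midnight. Working in whole
# hundredths keeps every value an integer, so comparisons are exact.
sub to_hundredths {
    my ($t) = @_;
    my ($h, $m, $s, $cs) = $t =~ /^(\d{2}):(\d{2}):(\d{2})\.(\d{2})$/
        or die "Bad time '$t'\n";
    return (($h * 60 + $m) * 60 + $s) * 100 + $cs;
}

# Hypothetical helper: return "date time event" (the first three fields)
# for every log line whose time of day lies in the inclusive range
# "HH:MM:SS.TT-HH:MM:SS.TT". The date field is not compared.
sub lines_in_range {
    my ($range, @lines) = @_;
    $range =~ s/\s+//g;                      # tolerate stray spaces/newline
    my ($lo, $hi) = map { to_hundredths($_) } split /-/, $range, 2;
    die "Expected HH:MM:SS.TT-HH:MM:SS.TT\n" unless defined $hi;
    my @hits;
    for my $line (@lines) {
        my ($date, $time, $event) = split ' ', $line;
        # skip blank or malformed lines instead of dying on them
        next unless defined $event and $time =~ /^\d{2}:\d{2}:\d{2}\.\d{2}$/;
        my $t = to_hundredths($time);
        push @hits, "$date $time $event" if $t >= $lo and $t <= $hi;
    }
    return @hits;
}

# sample lines from the question
my @log = (
    '27-Jun-2001 12:08:19.36 SendSMCnf:AbsSub:1068',
    '27-Jun-2001 12:08:19.56 SubmitInd:T:0 D:19207024720 O:19202928411 P:0x0 C:0x0 V:09-11 12:08 <yeah>',
    '27-Jun-2001 12:08:20.31 SubmitRsp:0',
    '27-Jun-2001 12:08:21.62 SendSMCnf:StatOK:1065',
);

# prints the 12:08:19.56 and 12:08:20.31 lines
print "$_\n" for lines_in_range("12:08:19.50-12:08:21.50", @log);
```

    On the real file you could pass the lines straight in, e.g. lines_in_range($range, <$fh>) after open my $fh, '<', 'mon402.log' or die $!; since split ' ' ignores the trailing newline.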

    -- Frag.

Re: pattern match or key problem
by brassmon_k (Sexton) on Jul 11, 2001 at 19:12 UTC
    This is brassmonk. I figured it out; I had a big (DA). It was in the line that has "if (($2 >= $min) && ($2 <= $max)) {".

    Before my post I had tried "ge" in both comparisons, but by accident I forgot to use "le" in the other one. As soon as I fixed that it worked, because I had forgotten that "ge" and "le" compare strings while ">=" and "<=" compare numbers. Anyway, here's the revised script. Granted, it doesn't do everything I want yet, but it's pretty cool right now and I'm still working on it. I'm going to turn it into a CGI after I get the blasted thing to do what I want.
    #!/usr/bin/perl -w
    use strict;
    my %mylist;
    my $min;
    my $max;
    my $range;
    my $line;
    # get the range (in this throwtogether it must be entered
    # with no spaces in the form HH:MM:SS.TS-HH:MM:SS.TS)
    # Get the range
    $range = <STDIN>;
    # Break up the range
    ($min, $max) = split /-/, $range;
    # Squeeze out leading and trailing spaces
    $min =~ s/^\s+//;
    $min =~ s/\s+$//;
    $max =~ s/^\s+//;
    $max =~ s/\s+$//;
    # Open the datafile and break into fields
    open(FILE, "mon402.log") or die "Can't open mon402.log: $!";
    while (<FILE>) {
        # Replace the customer's message text with TEXT; lines without
        # any <...> text are still checked against the range below
        s/<(.*?)>/TEXT/;
        if ($_ =~ m/(\d+\-\w+\-\d+)\s(\d{2}\:\d{2}\:\d{2}\.\d{2})\s+(\w*)/) {
            # Push the restricted range of times onto a hash of arrays
            # keyed on the time field (ge/le compare strings, which works
            # because the times are zero-padded to a fixed width)
            if (($2 ge $min) && ($2 le $max)) {
                push(@{$mylist{$2}}, $_);
            }
        }
    }
    close(FILE);
    my @keys = sort (keys %mylist);
    foreach my $key (@keys) {
        # print every line logged at this time ($it still ends in "\n")
        foreach my $it (@{ $mylist{$key} }) {
            print $it;
        }
    }
    
    I did a few extra thingies. This is an SMS message log for cell phones, and one of the new things is that I replace the customers' message text with TEXT, because it's a privacy liability: our company got sued because one of our customer care reps was reading them (too bad for him, he got the boot). Now all it says is TEXT for the text (I love it): s/<(.*?)>/TEXT/. I also improved the pattern match line. It's uglier than last time, but it does see the fields, and I like it when it looks cryptic like that! I had some test prints (print $1, $2, etc.) in there before to check that it sees separate fields, and it did find them. (I love this stuff, it's fun.) So everything is cool now. Thanks big time for the help, though. You people in this forum, oops, "monks", are so good at Perl. I'm getting better, but you guys rule at this. I'll get there someday. Anywho, THANX!
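    A caution on the redaction, offered as a sketch rather than anything from the thread: the non-greedy s/<(.*?)>/TEXT/ stops at the first ">" after the "<", so a message that itself contains ">" would leak its remainder. Assuming the message is always the last field on the line (as in the sample log), a greedy match up to the last ">" hides all of it. The sub names and the sample message below are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Non-greedy, as in the script: stops at the FIRST '>' after the '<'
sub redact_lazy   { my $l = shift; $l =~ s/<(.*?)>/TEXT/; return $l }

# Greedy: runs to the LAST '>' on the line. Assumes the message is the
# last field on the line, as in the sample log.
sub redact_greedy { my $l = shift; $l =~ s/<.*>/TEXT/;    return $l }

# made-up message that itself contains a '>'
my $line = '27-Jun-2001 12:08:19.56 SubmitInd:T:0 V:09-11 12:08 <meet at 5 > 4:30 ok>';

print redact_lazy($line), "\n";    # ... 12:08 TEXT 4:30 ok>  (part of the message leaks)
print redact_greedy($line), "\n";  # ... 12:08 TEXT
```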
    DMH
    brassmonk
    Sorry I have no clever annotation or quote to spew out at the moment.

Node Type: perlquestion [id://95162]
Approved by root