
Problems with formatting the results of my regex

by superwombat (Novice)
on Jul 07, 2012 at 05:43 UTC (#980443=perlquestion)
superwombat has asked for the wisdom of the Perl Monks concerning the following question:

So, I've been working on a simple program to help me parse some log files. I have a regular expression that searches each line of the file. If a line matches my search string, it returns the date and timestamp, as well as the result (a numerical value) at the end of the line.

Here's a sample line from the file I'm parsing

20120704 00:05:53.46;CmdTask(0);EV;FBLdxPreAlignment rtn=0, Yoffset=-4278

My $tags[0] is set to "FBLdxPreAlignment".

Here's my regex

if ($_=~/(\d+ \d+:\d+:\d+.\d+).*$tags[0].*\=(\-?\d+.?\d*).*/){

When I just print the results, either to the console or to a file, it works perfectly.

print "$1,$2\n";

returns "20120704 00:05:53.46,-4278"

What I need is for it to print the results, then several commas, then a newline. This is so I can have separate columns of data in an output CSV for more than one search string at a time. When I change the code to print a string of trailing commas, I get the following output.

print "$1,$2,,,,,\n";

returns ",,,,,704 00:05:53.46,-4278"

I'm not super experienced with regexes, so I thought maybe $1 and $2 were somehow being modified on the fly. I tried saving them to a variable as soon as they were generated:

my $data = "$1,$2\n";
chomp $data;
print "$data\n";
print "$data,,,,,\n";


"20120704 00:05:53.46,-4278"

",,,,,704 00:05:53.46,-4278"

Now, you'll notice I added a newline and then chomped it. If I don't have the newline on the end, I get no output at all.

print "$1,$2";

Results: blank

The log files I'm working with are from a machine running VMS. I think the issues I'm having may be related to different formatting of the data (newline characters or something). If I copy several lines from the file and save them as a .log file on my own computer, it works as expected (the commas go at the end where I want them). I'll go ahead and post the entirety of my current code here at the end, in case there's some other error I've made that I'm missing.

foreach (@files){
    $logfile=$_;
    print "$_\n";
    print OUTFILE "Time,$tags[0],$tags[1],$tags[2],$tags[3],$tags[4]\n";
    #
    open LOGFILE, "$logfile";
    while (<LOGFILE>){
        chomp $_;
        if ($_=~/(\d+ \d+:\d+:\d+.\d+).*$tags[0].*\=(\-?\d+.?\d*).*/){
            my $data = "$1,$2\n";
            print "$data";
            chomp $data;
            print "$data,,,,,\n";
            print "$data";
        }
    }
    close LOGFILE;
}

Replies are listed 'Best First'.
Re: Problems with formatting the results of my regex
by monsoon (Pilgrim) on Jul 07, 2012 at 06:18 UTC
    I think you may have a carriage return character (\r) at the end of the lines in your input file. chomp doesn't remove it. So after $data with a trailing \r is printed, the cursor returns to the beginning of the line, and the start of the line is overwritten with whatever follows $data in the print statement. You can remove the \r with, for example, tr/\015//d;.
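
A minimal sketch of that effect, assuming a line that ends in CRLF like the ones in the VMS log files. The sample string and the helper name strip_eol are hypothetical, not taken from the original post. It shows that chomp (with the default $/ of "\n") leaves the \r in place, and that either tr/\015//d or a single substitution removes it.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing CRLF or LF in one step.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# Sample input with a CRLF ending (an assumption for this demo).
my $raw = "Yoffset=-4278\r\n";

my $chomped = $raw;
chomp $chomped;              # removes the "\n" only; the "\r" stays

my $cleaned = $chomped;
$cleaned =~ tr/\015//d;      # delete every carriage return

printf "after chomp: %d chars\n", length $chomped;
printf "after tr:    %d chars\n", length $cleaned;
print  strip_eol($raw), ",,,,,\n";
```

With the \r gone, the trailing commas stay at the end of the printed line instead of overwriting its start.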

      Fantastic! Thanks for the quick response. I tested the new code and it's working perfectly now. I still can't understand why the \r was getting captured in $2, since that portion supposedly should end with \d* zero or more digits. In any case, I'm glad it's fixed now.

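        The .? in (\-?\d+.?\d*) matched the \r: an unescaped dot matches any character except a newline, and that includes a carriage return.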
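
As an illustration, here is a small sketch comparing the original capture group with a variant in which the decimal point is escaped. The sample line (already chomped, \r still attached) and the variable names are assumptions made for the demonstration; the second pattern is one possible rewrite, not the original poster's final code.

```perl
use strict;
use warnings;

my @tags = ('FBLdxPreAlignment');

# Sample line after chomp, with the carriage return still attached (assumed).
my $line = "20120704 00:05:53.46;CmdTask(0);EV;FBLdxPreAlignment rtn=0, Yoffset=-4278\r";

# Original pattern: the unescaped ".?" is free to consume the trailing \r.
my ($t1, $v1) = $line =~ /(\d+ \d+:\d+:\d+.\d+).*$tags[0].*\=(\-?\d+.?\d*).*/;

# Escaped decimal point with an optional fraction: the capture ends at the last digit.
my ($t2, $v2) = $line =~ /(\d+ \d+:\d+:\d+\.\d+).*\Q$tags[0]\E.*=(-?\d+(?:\.\d+)?)/;

printf "original capture: %d chars\n", length $v1;
printf "escaped capture:  %d chars\n", length $v2;
print "$t2,$v2,,,,,\n";
```

The \Q...\E around the tag is an extra precaution so that any regex metacharacters in a tag name are treated literally.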
Re: Problems with formatting the results of my regex
by johngg (Canon) on Jul 07, 2012 at 11:07 UTC

    If your log files have CRLF line endings you can also open them for reading via the :crlf layer so that chomp will remove the carriage return as well as the line feed.

    knoppix@Microknoppix:~$ hexdump -C xxx.crlf
    00000000  4c 69 6e 65 20 31 0d 0a  4c 69 6e 65 20 32 0d 0a  |Line 1..Line 2..|
    00000010  4c 69 6e 65 20 33 0d 0a  4c 69 6e 65 20 34 0d 0a  |Line 3..Line 4..|
    00000020  4c 69 6e 65 20 35 0d 0a                           |Line 5..|
    00000028
    knoppix@Microknoppix:~$ perl -E '
    > open $in, q{<:crlf}, q{xxx.crlf} or die $!;
    > while ( <$in> )
    > {
    >     chomp;
    >     say qq{>$_<};
    > }'
    >Line 1<
    >Line 2<
    >Line 3<
    >Line 4<
    >Line 5<
    knoppix@Microknoppix:~$

    I hope this is helpful.



Re: Problems with formatting the results of my regex
by Marshall (Abbot) on Jul 07, 2012 at 07:14 UTC
    Why make it more complicated than it needs to be?
    #!/usr/bin/perl -w
    use strict;

    my $in ="20120704 00:05:53.46;CmdTask(0);EV;FBLdxPreAlignment rtn=0, Yoffset=-4278 ";
    #desired: 20120704,00:05:53.46,,,-4278

    my ($num, $time, $offset) = $in =~ /\s*(\d+)\s+([\d:.]+).*=([-\d]+)\s*$/;
    print "$num,$time,,,$offset\n";
    __END__
    prints:
    20120704,00:05:53.46,,,-4278
      Thanks so much for the additional help. I know my Regexes are not as streamlined as they could/should be, and I really appreciate the advice on how to improve them beyond just fixing the issue I was having.
      Doesn't match this: "20120704 00:05:53.46;CmdTask(0);EV;FBLdxPreAlignment rtn=0, Yoffset=-4278.25". The more I look at it the more I think the regex for $2 was supposed to match floating point numbers.
        Exactly, needs to match the possibility of a floating point number.
        The ".25" wasn't defined in the input spec, but this is a fine addition that makes things more flexible.
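
One way to accommodate that, sketched under the assumption that the value is an optionally signed integer with an optional fractional part. The helper name parse_line and the second sample line are hypothetical; the pattern is a variation on the one above, not code posted in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, time, value) in list context,
# or an empty list when the line does not match.
sub parse_line {
    my ($in) = @_;
    return $in =~ /^\s*(\d+)\s+([\d:.]+).*=(-?\d+(?:\.\d+)?)\s*$/;
}

for my $in (
    "20120704 00:05:53.46;CmdTask(0);EV;FBLdxPreAlignment rtn=0, Yoffset=-4278",
    "20120704 00:05:53.46;CmdTask(0);EV;FBLdxPreAlignment rtn=0, Yoffset=-4278.25\r\n",
) {
    my ($num, $time, $offset) = parse_line($in) or next;
    print "$num,$time,,,$offset\n";
}
```

Because the pattern ends in \s*$, a trailing \r or \r\n is absorbed outside the capture group, so the value comes back clean in both cases.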
Re: Problems with formatting the results of my regex
by 2teez (Vicar) on Jul 07, 2012 at 07:04 UTC

    OR you could use this, if you want:

    while(...){
        chomp;
        if (m{([[:digit:]].+?);.+?$tags[0].+?=.+?((-)?[[:digit:]].+?)$}s){
            printf "%s,%s\n", $1, $2;
        }
        ....
    }
    Why use '.*' after $2, i.e. (\-?\d+.?\d*).*/? Your code would still have worked with just (\-?\d+.?\d*) for your $2.

      How does your regex not match a \r?
      $_ = "20120704 00:05:53.46;CmdTask(0);EV;FBLdxPreAlignment rtn=0, Yoffset=-4278\r";
      $tags[0] = "FBLdxPreAlignment";
      if (m{([[:digit:]].+?);.+?$tags[0].+?=.+?((-)?[[:digit:]].+?)$}s){
          printf "%s,%s,,,,,,\n", $1, $2;
      }
      ,,,,,,04 00:05:53.46,-4278
      I put the trailing ".*" on the end to attempt to capture whatever end of line symbol was thwarting me. It's completely not needed.

Node Type: perlquestion [id://980443]
Approved by Corion