PerlMonks  

grep only lines having matched pattern

by noviceuser (Acolyte)
on Apr 01, 2021 at 09:09 UTC ([id://11130663]=perlquestion)

noviceuser has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I want to get the data below in my Perl script, but I am not aware of a grep option to do that.

Suppose I have a file with the input lines below, and I want to grep only the lines having "03-15-2021" and not "03-15-2021-1" or "03-15-2021-2" etc.

e.g. input:

03-15-2021-1 21.1.0-s103 2021/03/15:14:16:39 21.1 21.10-s103

03-15-2021-2 21.1.0-s103 2021/03/15:14:16:39 21.1 21.10-s103

03-15-2021 21.1.0-s102 2021/03/15:04:00:09 21.1 21.10-s102

output

03-15-2021 21.1.0-s102 2021/03/15:04:00:09 21.1 21.10-s102

Replies are listed 'Best First'.
Re: grep only lines having matched pattern
by Discipulus (Canon) on Apr 01, 2021 at 09:18 UTC
Re: grep only lines having matched pattern
by Tux (Canon) on Apr 01, 2021 at 09:59 UTC

    Both suggestions check for a space following the date. That works well in this example, but it will fail for dates that are located at the end of the line.

    As you posted your data explicitly, that won't be a problem, but looking at the criterion a bit more defensively, you can also say: match a date *not* followed by any of -, digit, letter or underscore (identifier characters).

    my @lines = grep { m/ \b (?: 0[1-9] | 1[0-2] ) - (?: 0[1-9] | [12][0-9] | 3[01] ) - [0-9]{4} (?! [-\w] ) /x } @data;
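A minimal sketch of the lookahead idea above, run on lines that have already been chomped so that one date sits at the very end of the string. The helper name `plain_date_lines` and the shortened sample lines are assumptions made for illustration, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: keep lines that contain an MM-DD-YYYY date which is
# not followed by '-' or a word character.
sub plain_date_lines {
    my @lines = @_;
    my @kept = grep {
        m/ \b (?: 0[1-9] | 1[0-2] )            # month 01..12
           - (?: 0[1-9] | [12][0-9] | 3[01] )  # day 01..31
           - [0-9]{4}                          # four-digit year
           (?! [-\w] )                         # no suffix such as -1
         /x
    } @lines;
    return @kept;
}

# Chomped sample lines: no trailing newline, one date at the very end.
my @data = (
    '03-15-2021-1 21.1.0-s103',
    '03-15-2021 21.1.0-s102',
    '21.1.0-s102 03-15-2021',
);

print "$_\n" for plain_date_lines(@data);
```

Because the lookahead consumes nothing, it also succeeds at the end of the string, so the second and third sample lines are kept and the first is dropped.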

    And your input data is horrific: MM-DD-YYYY ... YYYY/MM/DD. How on earth does someone come up with a mixed format like that? (/me is all for a global ban on M/D/Y and Y/D/M format)


    Enjoy, Have FUN! H.Merijn
      Well, \n counts as a whitespace character, so \s still matches after a date at the end of a line - see below. However, it appears to me that anchoring this regex to the beginning of the line is just fine.
      use strict;
      use warnings;

      while (<DATA>) {
          if (/\d{2}-\d{2}-\d{4}\s+/) {
              print;
          }
      }

      =Prints:
      03-15-2021 21.1.0-s102 2021/03/15:04:00:09 21.1 21.10-s102
      21.1.0-s102 2021/03/15:04:00:09 21.1 21.10-s102 03-15-2021 **works**
      =cut

      __DATA__
      03-15-2021-1 21.1.0-s103 2021/03/15:14:16:39 21.1 21.10-s103
      03-15-2021-2 21.1.0-s103 2021/03/15:14:16:39 21.1 21.10-s103
      03-15-2021 21.1.0-s102 2021/03/15:04:00:09 21.1 21.10-s102
      21.1.0-s102 2021/03/15:04:00:09 21.1 21.10-s102 03-15-2021
      21.1.0-s102 2021/03/15:04:00:09 21.1 21.10-s102 03-15-2021-4
      I guess that /\d{2}-\d{2}-\d{4}[^-]/ would also work?
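A small sketch comparing the variants discussed so far on input with and without a trailing newline. The pattern labels, the helper `matching_patterns`, and the shortened sample lines are assumptions for illustration; the `(?!-)` variant is a simplified form of the lookahead idea from earlier in the thread.

```perl
use strict;
use warnings;

# Three ways to reject a "-N" suffix after the date.
my %pattern = (
    'space'     => qr/\d{2}-\d{2}-\d{4}\s/,
    'not dash'  => qr/\d{2}-\d{2}-\d{4}[^-]/,
    'lookahead' => qr/\d{2}-\d{2}-\d{4}(?!-)/,
);

# Hypothetical helper: sorted names of the patterns that match $line.
sub matching_patterns {
    my ($line) = @_;
    my @names = sort grep { $line =~ $pattern{$_} } keys %pattern;
    return @names;
}

for my $line ("x 03-15-2021\n", "x 03-15-2021", "03-15-2021-4\n") {
    (my $shown = $line) =~ s/\n/\\n/;    # make the newline visible
    my $names = join(', ', matching_patterns($line)) || '(none)';
    printf "%-16s %s\n", $shown, $names;
}
```

Both `\s` and `[^-]` have to consume one character after the date, so they only match a date at the end of a line while the newline is still there. The lookahead consumes nothing and keeps working after a chomp.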

        Using DATA in example code tells me nothing about the *real* source for the data. It can be a log file or a database or a process that pipes other sources into a (stream of) single lines of log that have no line endings at all.

        For *me*, thinking outside that box has sometimes made me overprotective. It not only makes many lines in my code show the intent more explicitly, but it also protects against the other ways in which this data may be supplied (in the future).

        Be liberal on the receiving end and be strict on the producing end.

        Been there, done that: you have no idea how completely valid CSV files get corrupted by people in the chain who want to "check" the content using a spreadsheet program like Excel and, instead of exiting, hit "OK" when the program asks them to write the changed data, even if the change is just widening a column or changing the font.


        Enjoy, Have FUN! H.Merijn
Re: grep only lines having matched pattern
by philipbailey (Curate) on Apr 01, 2021 at 09:19 UTC
    The approach does depend on how your input data varies, but the code below will work for your example.
    use strict;
    use warnings;

    for my $line (<DATA>) {
        print $line if $line =~ /^\d{2}-\d{2}-\d{4}\s/;
    }

    __END__
    03-15-2021-1 21.1.0-s103 2021/03/15:14:16:39 21.1 21.10-s103
    03-15-2021-2 21.1.0-s103 2021/03/15:14:16:39 21.1 21.10-s103
    03-15-2021 21.1.0-s102 2021/03/15:04:00:09 21.1 21.10-s102
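Since the question asks about a file rather than DATA, here is a hedged sketch of the same anchored test applied with Perl's built-in grep to lines read from a filehandle. The helper name `bare_date_lines` is hypothetical, and an in-memory filehandle stands in for the real file, whose name the question does not give.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from an open filehandle and keep those
# that start with a bare MM-DD-YYYY date followed by whitespace.
sub bare_date_lines {
    my ($fh) = @_;
    my @kept = grep { /^\d{2}-\d{2}-\d{4}\s/ } <$fh>;
    return @kept;
}

# An in-memory filehandle stands in for the real file here (assumption).
my $sample = <<'END';
03-15-2021-1 21.1.0-s103 2021/03/15:14:16:39 21.1 21.10-s103
03-15-2021-2 21.1.0-s103 2021/03/15:14:16:39 21.1 21.10-s103
03-15-2021 21.1.0-s102 2021/03/15:04:00:09 21.1 21.10-s102
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print bare_date_lines($fh);
close $fh;
```

For a real file, the same helper would take a handle opened on the file path instead of on `\$sample`.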
Re: grep only lines having matched pattern
by BillKSmith (Monsignor) on Apr 01, 2021 at 17:13 UTC
    My understanding of your requirement is that you only want to accept lines starting with a date not followed by a dash and a non-zero digit. I recommend using a module to match the date. My choice is no better than Tux's hand-coded regex, but at least it makes the intention clear. It would also be convenient if you have an additional requirement to match other date/time formats in your data.
    use strict;
    use warnings;
    use Regexp::Common qw(time);

    my $DATE = $RE{time}{tf}{-pat => 'mm-dd-yyyy'};
    my @data = <DATA>;
    my @wanted = grep {/^$DATE(?!-[1-9])/} @data;
    print @wanted;

    __DATA__
    03-15-2021-1 21.1.0-s103 2021/03/15:14:16:39 21.1 21.10-s103
    03-15-2021-2 21.1.0-s103 2021/03/15:14:16:39 21.1 21.10-s103
    03-15-2021 21.1.0-s102 2021/03/15:04:00:09 21.1 21.10-s102
    Bill
      You wrote: "My understanding of your requirement is that you only want to accept lines starting with a date not followed by a dash and a non-zero digit."

      My understanding is different.
      "not followed by a dash" is sufficient.
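The two readings of the requirement only differ on suffixes that are not a dash plus a non-zero digit. A short sketch of that difference follows. The date pattern is simplified to plain digits, and the helper names as well as the "-0" and "-rc" sample lines are assumptions for illustration that do not occur in the posted data.

```perl
use strict;
use warnings;

# The date itself is simplified to digits only for this comparison.
my $DATE = qr/\d{2}-\d{2}-\d{4}/;

# Hypothetical helpers, one per reading of the requirement.
sub no_dash_digit { return $_[0] =~ /^$DATE(?!-[1-9])/ ? 1 : 0 }
sub no_dash       { return $_[0] =~ /^$DATE(?!-)/      ? 1 : 0 }

for my $line ('03-15-2021 x', '03-15-2021-1 x', '03-15-2021-0 x', '03-15-2021-rc x') {
    printf "%-16s no dash+digit: %d   no dash: %d\n",
        $line, no_dash_digit($line), no_dash($line);
}
```

On the lines from the question both tests agree. Only the hypothetical "-0" and "-rc" lines are accepted by the first test and rejected by the second.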

Re: grep only lines having matched pattern
by Marshall (Canon) on Apr 02, 2021 at 00:11 UTC
    This is only a comment:
    I suppose that you don't have any control at all over the format of this file. However, Tux, I, and many other Monks would counsel you to use a YYYY-MM-DD format with leading zeros required in case of single digits (if you have a choice). This format winds up being "ASCII sortable", and that is a huge advantage with Perl. I personally generate all log files in UTC, but I am often looking at international network files. Mileage varies with choice of time zone.
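To illustrate the "ASCII sortable" point, here is a minimal sketch that rewrites a leading MM-DD-YYYY date as YYYY-MM-DD and compares a plain string sort before and after. The helper name `to_iso` and the sample lines are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a leading MM-DD-YYYY as YYYY-MM-DD.
sub to_iso {
    my ($line) = @_;
    (my $iso = $line) =~ s/^(\d{2})-(\d{2})-(\d{4})/$3-$1-$2/;
    return $iso;
}

my @lines = ('03-15-2021 c', '12-01-2020 a', '01-07-2021 b');

# A plain string sort of the original lines orders by month first ...
print "as-is: $_\n" for sort @lines;

# ... while the rewritten lines sort chronologically.
print "iso:   $_\n" for sort map { to_iso($_) } @lines;
```

With the year first, the default string comparison of `sort` is enough to get chronological order, and no date parsing is needed at sort time.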

Node Type: perlquestion [id://11130663]
Approved by Corion