PerlMonks  

How to grep exact string

by Divakar (Sexton)
on Nov 15, 2012 at 07:14 UTC ( #1003945=perlquestion )
Divakar has asked for the wisdom of the Perl Monks concerning the following question:

Hi Folks,

I have two files (file1.txt and file2.txt) and need to find which entries are newly added in file2.txt. I am using the grep function to check whether each entry is already present in file1.txt, but it is not matching the exact string. Does anyone have any idea?

file1.txt contains

nt64osbld2tmp vm-nt64osbld2 vm-nt64osbld3 vm-nt64osbld4 vm-nt64osbld5 vm-nt64osremot1 vm-nt64osremot2 vm-nt64osremot3 vm-nt64osremot4 vm-nt64osremot5 vm-nt64osremot6 vm-nt64osremot7 vm-nt64osremot8 vm-ntdivakar1 vm-ntdivakar2 vm-ntdivakar4 vm-ntosbld1 vm-ntosbld5 vm-ntoscert3 vm-ntosdev1 vm-ntskommare1 vm-ntskommare2 vm-ntskommare4 vm-os2k8r264-01 vm-osremote1 vm-osremote10 vm-osremote2 vm-osremote3 vm-osremote4 vm-osremote5 vm-osremote6 vm-osw2k8-1 vm-osw2k8-2 vm-oswin2k3-32 vm-oswin2k3-64

file2.txt contains
nt64osbld2 nt64osbld2tmp nt64osbld3 nt64oscitrix1 NTOSBLD4 ntosbld5 ntosbld6 VM-NT64OSBLD2 VM-NT64OSBLD3 VM-NT64OSBLD4 VM-NT64OSBLD5 vm-nt64osremot1 vm-nt64osremot2 vm-nt64osremot3 vm-nt64osremot4 vm-nt64osremot5 vm-nt64osremot6 VM-NT64OSREMOT7 VM-NT64OSREMOT8 vm-ntosbld1 VM-NTOSBLD5 vm-ntoscert3 vm-ntosdev1 VM-OS2K8R264-01 vm-osremote1 vm-osremote10 vm-osremote2 vm-osremote3 vm-osremote4 VM-OSREMOTE5 vm-osremote6 VM-OSW2K8-1 VM-OSW2K8-2 VM-OSW2K8X64-1 VM-OSW2K8X64-2 VM-OSWIN2K3-32 VM-OSWIN2K3-64

Below is my script.

    use warnings;
    use strict;

    my @first_list;
    my @second_list;
    my @first_list_new;
    my @second_list_new;

    open FIRST_LIST, "< first_list.txt" or print $! "\n";
    @first_list=<FIRST_LIST>;
    close (FIRST_LIST);
    chomp (@first_list);

    open SECOND_LIST, "< second_list.txt" or print $! "\n";
    @second_list=<SECOND_LIST>;
    close (SECOND_LIST);
    chomp (@second_list);

    foreach my $machine (@first_list)
    {
        my $machine_new=lc($machine);
        push(@first_list_new,$machine_new);
    }

    foreach my $machine (@second_list)
    {
        my $machine_new=lc($machine);
        push(@second_list_new,$machine_new);
    }

    print "Machines extra in second list\n\n";

    foreach my $unique (@second_list_new)
    {
        if (grep (/^$unique$/,@first_list_new))
        {
            print "already there $unique\n\n";
        }
        else
        {
            print "newly in this $unique\n\n";
        }
    }

Thanks & Regards,

Divakar

Re: How to grep exact string
by tobyink (Abbot) on Nov 15, 2012 at 07:18 UTC
    if (grep { $_ eq $unique } @first_list_new) { ...; }
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      Hi tobyink, did you try this? I am getting a syntax error when I replace my grep with what you gave. Thanks & Regards, Divakar
        I believe the way you have it is a correct way to do it (see grep):
        grep ( /^$unique$/, @first_list_new )
        An alternate syntax is:
        grep { /^$unique$/ } @first_list_new

        My suggested grep has been correct Perl syntax since Perl 5.0. (Perhaps before?)

        Note there are no parentheses after the grep... not this:

        grep({ ... } @list)

        And note that there's no comma after the block... not this:

        grep { ... }, @list

        It just needs to be like this:

        grep { ... } @list

        If you really want parentheses, they can go on the outside:

        (grep { ... } @list)
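        As a minimal sketch of the block form in use (the helper name in_list and the three sample machine names are made up for illustration, not taken from the original script), the exact-match test could look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $wanted is string-equal (eq) to some element of @$list.
sub in_list {
    my ($wanted, $list) = @_;
    # grep BLOCK LIST: no comma after the block. In scalar context
    # grep returns the number of elements for which the block is true.
    my $count = grep { $_ eq $wanted } @$list;
    return $count > 0;
}

# Made-up sample data standing in for the lower-cased first list.
my @first_list_new = qw(vm-osremote1 vm-osremote10 vm-ntosbld5);

for my $unique (qw(vm-osremote1 ntosbld5)) {
    if (in_list($unique, \@first_list_new)) {
        print "already there $unique\n";
    }
    else {
        print "newly in this $unique\n";
    }
}
```

        Because eq compares whole strings, "ntosbld5" is not treated as a match for "vm-ntosbld5", and nothing in $unique is interpreted as a regex.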
        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: How to grep exact string
by frozenwithjoy (Curate) on Nov 15, 2012 at 08:25 UTC
    What output are you expecting and what are you getting? I get the following with what you posted. Is it not what you wanted/intended?

    EDIT: After incorporating tobyink's or ColonelPanic's suggestion, I get the same results, but it is now safe against regex metacharacters

Re: How to grep exact string
by ColonelPanic (Friar) on Nov 15, 2012 at 08:46 UTC
    This is a perfect case for the smart match operator. You can do it like this:
    foreach my $unique (@second_list_new)
    {
        if ($unique ~~ @first_list_new)
        {
            print "already there $unique\n\n";
        }
        else
        {
            print "newly in this $unique\n\n";
        }
    }
    This checks if any of @first_list_new is equal to $unique. Unlike grep, it will stop as soon as it finds a match.

    If you do stick with the regex method, it would be a good idea to use \Q...\E around your variable:
    if (grep (/^\Q$unique\E$/,@first_list_new))
    This ensures that it will match the literal content of $unique. Otherwise, some characters within $unique could be interpreted as special regex characters. This wouldn't be a problem with the sample data you have shown, but if you were searching for a value like "foo.bar", for example, it wouldn't do what you want without \Q...\E, because the . would match any character rather than a literal dot.
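    A small sketch of that difference, using two made-up values ("fooxbar" is invented purely to show the false positive):

```perl
use strict;
use warnings;

# Made-up sample values; only "foo.bar" should count as an exact match.
my @names  = qw(fooxbar foo.bar);
my $unique = 'foo.bar';

# Without \Q...\E the dot is a regex wildcard, so "fooxbar" matches as well.
my @loose = grep { /^$unique$/ } @names;

# With \Q...\E the dot is escaped, so only the literal string matches.
my @strict = grep { /^\Q$unique\E$/ } @names;

print "without \\Q: @loose\n";
print "with \\Q:    @strict\n";
```

    The first grep returns both values, the second only "foo.bar".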

    Update: Of course, this doesn't answer why your original match failed. Could it be a whitespace problem? The second file that you posted above seems to have spaces at the end of the lines. If this is the problem, you could either remove whitespace when you read the files in, or ignore it in your comparison.

    Here is an example of how you could remove trailing whitespace from one of the files, as well as do all of your other processing in one go. Note that you no longer need to chomp, because \s also matches the newline, so it is removed along with the rest of the trailing whitespace:
    my @first_list_new = map { s/\s*$//m; lc $_ } <FIRST_LIST>;
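    To see what that map does without opening a file, here is a sketch on a few in-memory lines (the @lines array and its contents are made up to stand in for what <FIRST_LIST> would return):

```perl
use strict;
use warnings;

# Made-up lines with trailing spaces, a tab, and newlines.
my @lines = ("VM-OSW2K8-1  \n", "vm-osremote1\n", "NTOSBLD4 \t\n");

# s/\s*$//m strips trailing whitespace, newline included (\s matches "\n");
# lc then normalises the case. Note that s/// edits $_ in place, so the
# elements of @lines are modified too.
my @cleaned = map { s/\s*$//m; lc $_ } @lines;

print "[$_]\n" for @cleaned;
```

    The brackets in the output make any leftover whitespace visible, which is a handy check when a comparison fails for no obvious reason.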


    When's the last time you used duct tape on a duct? --Larry Wall

      The behaviour of smart match on arrays may well change in Perl 5.18 or 5.20.

      This works quite nicely for me...

      use strict;
      use warnings;
      use File::Slurp qw( slurp );
      use Syntax::Keyword::Junction qw( any );

      my @first  = map { lc($_) } slurp('file1.txt');
      my @second = map { lc($_) } slurp('file2.txt');
      chomp(@first, @second);

      print "Machines extra in second list...\n";
      for my $machine (@second)
      {
          if ($machine eq any(@first))
          {
              print "$machine is already there\n";
          }
          else
          {
              print "$machine is new\n";
          }
      }
      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
        I think (hope!) that smart match is mature enough at this point that you can rely on its behavior not changing for such a basic case.

        Nevertheless, another way of solving the problem is always useful.



        When's the last time you used duct tape on a duct? --Larry Wall

        Wasn't familiar with Syntax::Keyword::Junction, so I appreciate seeing it used here. Also, File::Slurp's a good module to use...

        Since the OP's comparing lists, List::Compare can help find unique entries in file2.txt:

        use Modern::Perl;
        use File::Slurp qw/read_file/;
        use List::Compare;

        my %hash;
        my @file1 = map { chomp; lc } read_file 'file1.txt';
        my @file2 = map { chomp; lc } read_file 'file2.txt';

        my $lc = List::Compare->new( \@file1, \@file2 );
        my @file2Only = $lc->get_Ronly;

        say for @file2Only;

        Output:

        nt64osbld2
        nt64osbld3
        nt64oscitrix1
        ntosbld4
        ntosbld5
        ntosbld6
        vm-osw2k8x64-1
        vm-osw2k8x64-2

        Not in the OP's output formatting, but that wouldn't take much more...

Use hashes instead of grep
by space_monk (Chaplain) on Nov 15, 2012 at 09:05 UTC
    Yes instead of using grep, why aren't you using hashes to determine whether there is anything new in the file? You can build up a hash of all the terms (machines) in file1 and then check the hash to see whether an entry exists for each line of file2.

    This would be much quicker than using grep for what you seem to be doing.

    A Monk aims to give answers to those who have none, and to learn from those who know more.
      I agree that, all else being equal, this would be the best design.

      However, it ultimately depends on what the rest of the code does.



      When's the last time you used duct tape on a duct? --Larry Wall
        Not really. Even if the code uses both arrays then there is nothing preventing building a temporary hash through
        my %hash = map { $_ => 1 } @first_list;
        ..and then discarding the hash after the check for new machines. More likely is that he is only interested in the second list, or just new machines, and they could be in an array just like before.
        A Monk aims to give answers to those who have none, and to learn from those who know more.
Re: How to grep exact string
by space_monk (Chaplain) on Nov 15, 2012 at 13:44 UTC
    This is a rough draft for the hash idea explained earlier. It is not debugged or tested for compilation, so feel free to fix any minor issues....
    use warnings;
    use strict;

    open FIRST_LIST, '<', 'first_list.txt' or die "Cannot open first_list.txt: $!\n";
    my @first_list = <FIRST_LIST>;
    close (FIRST_LIST);
    chomp (@first_list);

    open SECOND_LIST, '<', 'second_list.txt' or die "Cannot open second_list.txt: $!\n";
    my @second_list = <SECOND_LIST>;
    close (SECOND_LIST);
    chomp (@second_list);

    # create a lookup of machines in lower case
    # (lc($_) rather than a bare lc, which => would quote as the string "lc")
    my %first_lc = map { lc($_) => 1 } @first_list;

    # not strictly necessary; could call lc inside the loop instead
    my @second_lc = map { lc } @second_list;

    print "Machines extra in second list\n\n";

    foreach my $unique (@second_lc)
    {
        if (exists $first_lc{$unique})
        {
            print "already there $unique\n\n";
        }
        else
        {
            print "newly in this $unique\n\n";
        }
    }
    A Monk aims to give answers to those who have none, and to learn from those who know more.

Node Type: perlquestion [id://1003945]
Approved by tobyink