http://www.perlmonks.org?node_id=930718

mdegges has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, I'm a new Perl user and have been working on a Perl web crawler for a couple of weeks now. A few days ago I was really stuck on the normalize-url part, so I showed my friend my code and asked him if he had any ideas. He emailed me back with everything from the if (grep ...) part down to the bottom, but I have no idea what it's doing. I know grep takes the form grep {expr} list and filters through the list, but I'm still confused about it and want to understand it. I think it might actually be better to just not use grep altogether, but is there a way of doing that?
use File::Basename;

#list of filenames to normalize
my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);

sub normalize_url {
    my $old_url = $_[0];
    chomp($old_url);

    #saves name at the end
    my $filename = basename($old_url);

    if (grep {$_ eq $filename} @index_file_names) {
        #saves the directory part
        my $normalized_url = dirname($old_url);
        $normalized_url;
    }
    else {
        #don't need to normalize url
        $old_url;
    }
}
Thanks for any help!

Replies are listed 'Best First'.
Re: Is there a way around grep?
by toolic (Bishop) on Oct 11, 2011 at 01:20 UTC
    In this case, think of grep as a for loop. It loops through all 5 of the filenames in your array variable (@index_file_names). For each filename ($_), it compares against the $filename value. If they are equivalent, an internal counter is incremented. grep returns the number of matches. If at least one match occurred, the if clause is executed.
    use File::Basename;

    #list of filenames to normalize
    my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);

    normalize_url('foo');
    normalize_url('bar/index.php');

    sub normalize_url {
        my $old_url = $_[0];
        chomp($old_url);

        #saves name at the end
        my $filename = basename($old_url);

        if (grep {$_ eq $filename} @index_file_names) {
            #saves the directory part
            my $normalized_url = dirname($old_url);
            #$normalized_url;
            print "$old_url: found\n";
        }
        else {
            #don't need to normalize url
            #$old_url;
            print "$old_url: not found\n";
        }
    }

    __END__
    foo: not found
    bar/index.php: found
    This is a common use of grep.
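    As a quick sketch of that counting behaviour (the filename here is just made up for illustration): in scalar or boolean context, grep returns how many elements matched.

    my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);
    my $filename = 'index.php';

    my $count = grep { $_ eq $filename } @index_file_names;
    print "matches: $count\n";    # prints "matches: 1"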
Re: Is there a way around grep?
by BrowserUk (Patriarch) on Oct 11, 2011 at 03:01 UTC

    Why iterate when you can lookup?:

    use File::Basename;

    #list of filenames to normalize
    my %index_file_names = map { $_ => 1 } qw(
        index.html index.htm index.php index.asp index.cgi
    );

    sub normalize_url {
        my $old_url = $_[0];
        chomp($old_url);

        #saves name at the end
        my $filename = basename($old_url);

        if ( exists $index_file_names{ $filename } ) {
            #saves the directory part
            my $normalized_url = dirname($old_url);
            $normalized_url;
        }
        else {
            #don't need to normalize url
            $old_url;
        }
    }
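    Roughly how the lookup behaves, as a sketch (the example URLs are made up):

    # hypothetical calls, just to show the effect of the hash lookup
    print normalize_url('http://example.com/docs/index.html'), "\n";
    # http://example.com/docs
    print normalize_url('http://example.com/docs/page.html'), "\n";
    # http://example.com/docs/page.html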

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Is there a way around grep?
by ikegami (Patriarch) on Oct 11, 2011 at 02:35 UTC
    if (grep {$_ eq $filename} @index_file_names) {

    is basically the same as

    my $found;
    for (@index_file_names) {
        ++$found if $_ eq $filename;
    }

    if ($found) {

    grep

      Of course, if you are only looking for a match, a good addition would be a last to break the loop as soon as a match is found.
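      Something like this, say (just a sketch of the same loop with last added; it assumes the @index_file_names and $filename from the earlier code):

      my $found;
      for my $name (@index_file_names) {
          if ($name eq $filename) {
              $found = 1;
              last;    # stop scanning at the first match
          }
      }
      print "found\n" if $found;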

      Or use List::Util::first
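      For instance (a sketch, again assuming $filename and @index_file_names from the earlier code):

      use List::Util qw(first);

      # first() stops at the first element for which the block is true,
      # and returns that element (or undef if nothing matches)
      my $match = first { $_ eq $filename } @index_file_names;
      if ( defined $match ) {
          # normalize the url as before
      }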

                      - Ant
                      - Some of my best work - (1 2 3)

        If you're looking for speed, I bet grep is normally faster than for+last.
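        If anyone wants to measure it, a rough Benchmark sketch along these lines should do (the filename and list are just the ones from above; results will vary by perl version and list size):

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);
        use List::Util qw(first);

        my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);
        my $filename = 'index.cgi';    # worst case: the last element

        cmpthese( -1, {
            grep  => sub { my $hit = grep { $_ eq $filename } @index_file_names },
            last  => sub {
                my $hit;
                for (@index_file_names) {
                    if ( $_ eq $filename ) { $hit = 1; last }
                }
            },
            first => sub { my $hit = first { $_ eq $filename } @index_file_names },
        } );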