http://www.perlmonks.org?node_id=930718

mdegges has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, I'm a new Perl user and have been working on a Perl web crawler for a couple of weeks now. A few days ago I was really stuck on the normalize-url part, so I showed my friend my code and asked him if he had any ideas. He emailed me back with everything from the if (grep ...) part down to the bottom, but I have no idea what it's doing. I know grep takes the form grep {expr} list and filters through the list, but I'm still confused about it and want to understand it. I think it might actually be better to just not use grep altogether, but is there a way of doing that?
use File::Basename;

#list of filenames to normalize
my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);

sub normalize_url {
    my $old_url = $_[0];
    chomp($old_url);

    #saves name at the end
    my $filename = basename($old_url);

    if (grep {$_ eq $filename} @index_file_names) {
        #saves the directory part
        my $normalized_url = dirname($old_url);
        $normalized_url;
    }
    else {
        #don't need to normalize url
        $old_url;
    }
}
Thanks for any help!

Replies are listed 'Best First'.
Re: Is there a way around grep?
by toolic (Bishop) on Oct 11, 2011 at 01:20 UTC
    In this case, think of grep as a for loop. It loops through all 5 of the filenames in your array variable (@index_file_names). For each filename ($_), it compares against the $filename value. If they are equivalent, an internal counter is incremented. grep returns the number of matches. If at least one match occurred, the if clause is executed.
    use File::Basename;

    #list of filenames to normalize
    my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);

    normalize_url('foo');
    normalize_url('bar/index.php');

    sub normalize_url {
        my $old_url = $_[0];
        chomp($old_url);

        #saves name at the end
        my $filename = basename($old_url);

        if (grep {$_ eq $filename} @index_file_names) {
            #saves the directory part
            my $normalized_url = dirname($old_url);
            #$normalized_url;
            print "$old_url: found\n";
        }
        else {
            #don't need to normalize url
            #$old_url;
            print "$old_url: not found\n";
        }
    }

    __END__
    foo: not found
    bar/index.php: found
    This is a common use of grep.
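    As a quick sketch of that counting behaviour (the filename here is just made up for illustration): in scalar or boolean context, grep returns how many elements matched.

    my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);
    my $filename = 'index.php';

    my $count = grep { $_ eq $filename } @index_file_names;
    print "matches: $count\n";    # prints "matches: 1"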
Re: Is there a way around grep?
by BrowserUk (Patriarch) on Oct 11, 2011 at 03:01 UTC

    Why iterate when you can lookup?:

    use File::Basename;

    #list of filenames to normalize
    my %index_file_names = map { $_ => 1 } qw(
        index.html index.htm index.php index.asp index.cgi
    );

    sub normalize_url {
        my $old_url = $_[0];
        chomp($old_url);

        #saves name at the end
        my $filename = basename($old_url);

        if ( exists $index_file_names{ $filename } ) {
            #saves the directory part
            my $normalized_url = dirname($old_url);
            $normalized_url;
        }
        else {
            #don't need to normalize url
            $old_url;
        }
    }
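    Roughly how the lookup behaves, as a sketch (the example URLs are made up):

    # hypothetical calls, just to show the effect of the hash lookup
    print normalize_url('http://example.com/docs/index.html'), "\n";
    # http://example.com/docs
    print normalize_url('http://example.com/docs/page.html'), "\n";
    # http://example.com/docs/page.html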

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Is there a way around grep?
by ikegami (Patriarch) on Oct 11, 2011 at 02:35 UTC
    if (grep {$_ eq $filename} @index_file_names) {

    is basically the same as

    my $found;
    for (@index_file_names) {
        ++$found if $_ eq $filename;
    }

    if ($found) {

    grep

      Of course, if you are only looking for a match, a good addition would be a last to break the loop as soon as a match is found.
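      Something like this, say (just a sketch of the same loop with last added; it assumes the @index_file_names and $filename from the earlier code):

      my $found;
      for my $name (@index_file_names) {
          if ($name eq $filename) {
              $found = 1;
              last;    # stop scanning at the first match
          }
      }
      print "found\n" if $found;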

      Or use List::Util::first
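      For instance (a sketch, again assuming $filename and @index_file_names from the earlier code):

      use List::Util qw(first);

      # first() stops at the first element for which the block is true,
      # and returns that element (or undef if nothing matches)
      my $match = first { $_ eq $filename } @index_file_names;
      if ( defined $match ) {
          # normalize the url as before
      }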

                      - Ant
                      - Some of my best work - (1 2 3)

        If you're looking for speed, I bet grep is normally faster than for+last.
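        If anyone wants to measure it, a rough Benchmark sketch along these lines should do (the filename and list are just the ones from above; results will vary by perl version and list size):

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);
        use List::Util qw(first);

        my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);
        my $filename = 'index.cgi';    # worst case: the last element

        cmpthese( -1, {
            grep  => sub { my $hit = grep { $_ eq $filename } @index_file_names },
            last  => sub {
                my $hit;
                for (@index_file_names) {
                    if ( $_ eq $filename ) { $hit = 1; last }
                }
            },
            first => sub { my $hit = first { $_ eq $filename } @index_file_names },
        } );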