PerlMonks  

Is there a way around grep?

by mdegges (Initiate)
on Oct 11, 2011 at 01:06 UTC ( #930718=perlquestion )
mdegges has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, I'm a new Perl user, and have been working on a Perl web crawler for a couple of weeks now. A few days ago I was really stuck on the URL normalization part, so I showed my friend my code and asked him if he had any ideas. He emailed me back with the if (grep ...) part near the bottom, but I have no idea what it's doing. I know grep takes the form grep {expr} list and filters through the list, but I'm still confused about it and want to understand. I think it might actually be better to just not use grep at all, but is there a way of doing that?
use File::Basename;

# list of filenames to normalize
my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);

sub normalize_url {
    my $old_url = $_[0];
    chomp($old_url);

    # saves name at the end
    my $filename = basename($old_url);

    if (grep {$_ eq $filename} @index_file_names) {
        # saves the directory part
        my $normalized_url = dirname($old_url);
        $normalized_url;
    } else {
        # don't need to normalize url
        $old_url;
    }
}
Thanks for any help!

Replies are listed 'Best First'.
Re: Is there a way around grep?
by toolic (Bishop) on Oct 11, 2011 at 01:20 UTC
    In this case, think of grep as a for loop. It loops through all 5 of the filenames in your array (@index_file_names). For each filename ($_), it compares it against the value of $filename. If they are equal, an internal counter is incremented. In scalar context, such as the condition of an if, grep returns the number of matches. If at least one match occurred, the if branch is executed.
    use File::Basename;

    # list of filenames to normalize
    my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);

    normalize_url('foo');
    normalize_url('bar/index.php');

    sub normalize_url {
        my $old_url = $_[0];
        chomp($old_url);

        # saves name at the end
        my $filename = basename($old_url);

        if (grep {$_ eq $filename} @index_file_names) {
            # saves the directory part
            my $normalized_url = dirname($old_url);
            #$normalized_url;
            print "$old_url: found\n";
        } else {
            # don't need to normalize url
            #$old_url;
            print "$old_url: not found\n";
        }
    }

    __END__
    foo: not found
    bar/index.php: found
    This is a common use of grep.
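The point about the internal counter is easier to see with the two contexts side by side. This is only a small sketch: the helper names count_matches and matching_names are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;

my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);

# Scalar context: grep returns how many elements made the block true.
sub count_matches {
    my ($filename) = @_;
    my $count = grep { $_ eq $filename } @index_file_names;
    return $count;
}

# List context: grep returns the matching elements themselves.
sub matching_names {
    my ($filename) = @_;
    my @matches = grep { $_ eq $filename } @index_file_names;
    return @matches;
}

for my $name ('index.php', 'page.html') {
    my $count   = count_matches($name);
    my @matches = matching_names($name);
    print "$name: count=$count matches=(@matches)\n";
}
```

An if condition supplies scalar (boolean) context, so if (grep {...} @list) is testing the count: zero is false, anything else is true.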
Re: Is there a way around grep?
by BrowserUk (Pope) on Oct 11, 2011 at 03:01 UTC

    Why iterate when you can look up?

    use File::Basename;

    # list of filenames to normalize
    my %index_file_names = map{ $_ => 1 } qw(
        index.html index.htm index.php index.asp index.cgi
    );

    sub normalize_url {
        my $old_url = $_[0];
        chomp($old_url);

        # saves name at the end
        my $filename = basename($old_url);

        if( exists $index_file_names{ $filename } ) {
            # saves the directory part
            my $normalized_url = dirname($old_url);
            $normalized_url;
        } else {
            # don't need to normalize url
            $old_url;
        }
    }
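To see the lookup version in action, here is a condensed variant with two sample calls. Treat it as a sketch: the hash name %is_index_file and the example.com URLs are assumptions added for illustration, and the expected results assume a Unix-like system, since File::Basename follows the path rules of the platform it runs on.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

# Build the lookup table once; the keys are the index file names.
my %is_index_file = map { $_ => 1 }
    qw(index.html index.htm index.php index.asp index.cgi);

sub normalize_url {
    my ($url) = @_;
    chomp $url;
    # One hash lookup replaces the scan over the whole list.
    return $is_index_file{ basename($url) } ? dirname($url) : $url;
}

for my $url ('http://example.com/docs/index.php',
             'http://example.com/docs/page.html') {
    my $normalized = normalize_url($url);
    print "$url -> $normalized\n";
}
```

With only five names the difference from grep is negligible. The hash pays off when the list is long or the check runs for every URL the crawler sees.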

Re: Is there a way around grep?
by ikegami (Pope) on Oct 11, 2011 at 02:35 UTC
    if (grep {$_ eq $filename} @index_file_names) {

    is basically the same as

    my $found;
    for (@index_file_names) {
        ++$found if $_ eq $filename;
    }
    if ($found) {

    See perldoc -f grep.

      Of course, if you are only looking for a match, a good addition would be a last statement to break out of the loop as soon as a match is found.
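The loop with an early exit might look like the following. It is a minimal sketch; the sub name found_with_last is invented here for illustration.

```perl
use strict;
use warnings;

my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);

sub found_with_last {
    my ($filename) = @_;
    my $found = 0;
    for my $name (@index_file_names) {
        if ($name eq $filename) {
            $found = 1;
            last;    # stop scanning: later elements cannot change the answer
        }
    }
    return $found;
}

for my $name ('index.htm', 'about.html') {
    my $status = found_with_last($name) ? 'found' : 'not found';
    print "$name: $status\n";
}
```

Unlike grep, which always visits every element, this stops at the first hit.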

      Or use List::Util::first
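A sketch of the List::Util version, assuming the same list of names as in the question (the sub name found_with_first is made up for illustration):

```perl
use strict;
use warnings;
use List::Util qw(first);

my @index_file_names = qw(index.html index.htm index.php index.asp index.cgi);

sub found_with_first {
    my ($filename) = @_;
    # first() returns the first element that makes the block true, or undef
    # if there is none, and it stops scanning once it has a match.
    my $match = first { $_ eq $filename } @index_file_names;
    # Test with defined: a matching element could itself be a false value
    # such as "0".
    return defined $match ? 1 : 0;
}

for my $name ('index.asp', 'about.html') {
    my $status = found_with_first($name) ? 'found' : 'not found';
    print "$name: $status\n";
}
```

If only a yes/no answer is needed, List::Util 1.33 and later also export any, which returns a boolean directly and avoids the defined check.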


        If you're looking for speed, I bet grep is normally faster than for+last.
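A bet like that is easy to check with the core Benchmark module. The sketch below is one way to set it up. The sub names and the choice of target are assumptions, and the numbers will vary with the Perl version, the list length, and where in the list the match sits.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @names  = qw(index.html index.htm index.php index.asp index.cgi);
my $target = 'index.cgi';    # last element: the worst case for an early exit

sub with_grep {
    return scalar( grep { $_ eq $target } @names );
}

sub with_for_last {
    my $found = 0;
    for my $name (@names) {
        if ($name eq $target) {
            $found = 1;
            last;
        }
    }
    return $found;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'grep'     => \&with_grep,
    'for_last' => \&with_for_last,
});
```

Changing $target to the first element, or making @names much longer, shows how far the result depends on the data.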
