PerlMonks  

Re: Hash/Array of Regular Expressions?

by bikeNomad (Priest)
on Jun 23, 2001 at 20:52 UTC ( [id://90974]=note )


in reply to Hash/Array of Regular Expressions?

You can pre-compile a regular expression using the qr// quote-like operator:
    #!/usr/bin/perl -w
    use strict;

    my @array = map { qr{$_} } ('^abcd', 'cd[ef]g', 'cat$');
    while (<>) {
        chomp;
        foreach my $re (@array) {
            print "Matched $re\n" if m{$re};
        }
    }
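Since the original question also asked about a hash, here is a minimal sketch of the same idea with a hash of pre-compiled patterns. The labels and the `matching_labels` helper are hypothetical names made up for this illustration; only `qr//` itself comes from the post above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical labels and patterns, chosen only for illustration.
my %regex_for = (
    starts_abcd => qr{^abcd},
    has_cdXg    => qr{cd[ef]g},
    ends_cat    => qr{cat$},
);

# Return the (sorted) labels of every pattern that matches $line.
sub matching_labels {
    my ($line) = @_;
    return grep { $line =~ $regex_for{$_} } sort keys %regex_for;
}

print join(', ', matching_labels('abcdeg')), "\n";   # prints "has_cdXg, starts_abcd"
```

Keying the patterns by a label makes it easy to report which rule matched without printing the stringified regex.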

Replies are listed 'Best First'.
Re (2): Hash/Array of Regular Expressions? (code)
by deprecated (Priest) on Jun 24, 2001 at 18:00 UTC
      my @array = map { qr{$_} } ('^abcd', 'cd[ef]g', 'cat$');
    I'm curious why you chose to use map here. I've used arrays of qr!! (Parse Loops with flat text files. (code)), and I also think I have a good sense of when map is appropriate (Sort a long list of hosts by domain (code)). But in some cases, using map seems to be either obfuscation or bloat for the sake of shorter code.

    Observe:

      # timtowtdi...
      foreach ( qw{ ^abcd cd[ef]g cat$ } ) { push @array, qr{$_} };

      # shorter still ...
      push @array, qr{$_} for qw{ ^abcd cd[ef]g cat$ };

      # or the way I think is most clear and most sane....
      @array = ( qr{^abcd}, qr{cd[ef]g}, qr{cat$} );

    Isn't there additional overhead from calling map? Some people are frenzied map lovers. And in some cases, it is the most appropriate way to do something. I just don't see why you used it here.

    Enlighten me?

    brother dep.

    --
    Laziness, Impatience, Hubris, and Generosity.

      Good question. I agree that map sometimes obscures things, particularly when it's used in void context (instead of a foreach loop). But I also find that using a foreach loop where a mapping is happening obscures things.

      I used map because:

    • I assumed that the patterns weren't necessarily going to be in the program text (ruling out your literal
      @array = ( qr{^abcd}, qr{cd[ef]g}, qr{cat$} );
      option)
    • To me, the operation reads more clearly this way: I'm transforming an array of strings into an array of regexes by applying an operation to each of them. This is communicated most clearly (IMO) by the map operator. My problem with the foreach is that it tends to obscure the meaning of the code. We see a loop, then we have to decode it to figure out that it is in fact doing the same thing as a map. Kent Beck would call using map intention revealing.
    • I mostly use Smalltalk, where this is the idiomatic way to do it.

      In Smalltalk, this operation would be written as:
      regexes := strings collect: [ :ea | Regex new: ea ].
      In Smalltalk, every collection responds to the collect: message, which passes each element of the collection into a block (equivalent to a Perl CODE ref) whose output is collected into a collection of the same species as the original collection. So Perl's map operator corresponds directly to Smalltalk's collect: methods.

      Also, Perl's grep operator corresponds directly to Smalltalk's select: methods.
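      As a rough Perl-side illustration of that correspondence, the sketch below applies map (the collect: analogue) and grep (the select: analogue) to the same list. The `@strings` data and the "anchored" filter are assumptions picked for the example, standing in for patterns that arrive at run time rather than in the program text.

```perl
use strict;
use warnings;

# Patterns as plain strings, as if read from a file (hypothetical data).
my @strings = ('^abcd', 'cd[ef]g', 'cat$');

# map ~ collect: transform every element and keep all the results.
my @regexes = map { qr{$_} } @strings;

# grep ~ select: keep only the elements for which the block is true.
# Here: pattern strings that begin with '^' or end with '$'.
my @anchored = grep { m{\A\^} || m{\$\z} } @strings;

print scalar(@regexes), " regexes, anchored: @anchored\n";
# prints "3 regexes, anchored: ^abcd cat$"
```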

      update: changed title because of topic change

      Since you've called me out by name, I guess I have to respond. And I wouldn't call myself a "frenzied map lover". Like any function, map has its place.

      When is the proper time to use map? I would argue that you should use map whenever you want to apply an expression to all the members of an array, and actually intend to use the array of results.

      Map is contraindicated when you're not going to use the result; use foreach in that case, because map adds extra overhead compared to foreach if the result is thrown away. (That's the only additional overhead that I know of.) In this case, bikeNomad is using the result, and the code is therefore concise and correct.
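      A small sketch of that rule of thumb, using made-up data (`@words`): map where the list of results is wanted, foreach where only a side effect is wanted. The commented-out line shows the void-context form being argued against.

```perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);

# The result is used: map states the transformation directly.
my @lengths = map { length } @words;

# Only a side effect is wanted: a foreach loop says so plainly.
my $total = 0;
foreach my $word (@words) {
    $total += length $word;
}

# The discouraged form: map called in void context purely for its side effect.
# map { $total += length } @words;

print "@lengths / $total\n";   # prints "5 4 5 / 14"
```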

      Dep, you haven't shown any reason why map is a bad idea in this case. I think that clarity here is achieved by separating out the array of regexps, so that they are visually distinct and clear. Then, the code that maps that with qr() is also visually separate. I think this is a good thing.

      Since you implied you wanted it, here's my stylistic criticism of your alternatives:

      @array = ( qr{^abcd}, qr{cd[ef]g}, qr{cat$} );

      Putting a qr{} around each search term is terrible, IMHO. If you had a list with many search terms, it would result in much more typing. Even with a few terms, it means that each search expression that the author is trying to express is wrapped in a little bit of ugliness. (I do appreciate your use of qw{} elsewhere, to reduce quotes.)

      foreach ( qw{ ^abcd cd[ef]g cat$ } ) { push @array, qr{$_} }

      This isn't bad, but recommending it is the same as saying that map() shouldn't exist, since it's exactly the same, except with more typing.

       push @array, qr{$_} for qw{ ^abcd cd[ef]g cat$ };

      This is worst of all, I think, because it relies on the weird semantic order of things in perl that few other languages implement (like putting the loop conditions after the loop body). Don't get me wrong, I think that sort of thing is cool, and is great for some circumstances. But really, the point of the backwards syntax is to make perl read more like English. I'd rather my perl code read like C or TCL or lisp than English. Those types of constructs are exactly the sorts of things that make perl hard to read for novice perl programmers. The fact that I can iterate the push after-the-fact like you suggest here is non-obvious to someone coming from another language. Even someone familiar with perl might wonder whether the precedence rules will do what you want. As it turns out, your code is correct, of course. But someone could easily read it as meaning something like: push @array, { qr{$_} } for qw{ ^abcd cd[ef]g cat$ };, which, of course, is wrong.

      My own stylistic fetishes aside, you never said what you thought was wrong with using map. What is it that you object to?
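      For reference, the sketch below builds the list with map and with each of the three alternatives criticised above, then checks that they all produce the same regexes, so the disagreement really is about style. The `@via_*` variable names are invented for this illustration. Comparing the stringified forms is enough here because a compiled regex stringifies to its pattern and flags.

```perl
use strict;
use warnings;

my @patterns = qw{ ^abcd cd[ef]g cat$ };

# 1. map
my @via_map = map { qr{$_} } @patterns;

# 2. foreach block with push
my @via_foreach;
foreach (@patterns) { push @via_foreach, qr{$_} }

# 3. push with a trailing "for" statement modifier
my @via_modifier;
push @via_modifier, qr{$_} for @patterns;

# 4. literal list of qr{} terms
my @via_literal = ( qr{^abcd}, qr{cd[ef]g}, qr{cat$} );

print "all four forms agree\n"
    if  "@via_map" eq "@via_foreach"
    and "@via_map" eq "@via_modifier"
    and "@via_map" eq "@via_literal";
```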

      Map has lower overhead than many other list-changing algorithms... this is mostly because it uses better, faster, fewer temporary variables. Let's compare using our good old friend Devel::OpProf.

      #!/usr/bin/perl
      use warnings;
      use strict;
      use Devel::OpProf qw'profile print_stats zero_stats';

      my @source = ( 1..10_000 );
      my @dest = ();

      # measure the map
      profile(1);
      @dest = map { $_ * 10 } @source;
      profile(0);
      print "*** map ***\n";
      print_stats();
      zero_stats();
      @dest = ();

      # measure the foreach
      # (as profiled, this loop assigns to @dest on each pass rather than pushing)
      profile(1);
      foreach (@source) { @dest = $_ * 10; }
      profile(0);
      print "\n*** foreach ***\n";
      print_stats();
      zero_stats();
      @dest = ();

      # measure the for
      profile(1);
      push @dest, $_ * 10 for @source;
      profile(0);
      print "\n*** for ***\n";
      print_stats();

      The output:

      *** map ***
      null operation           10005
      constant item            10001
      scalar variable          10000
      map iterator             10000
      multiplication (*)       10000
      block                    10000
      pushmark                 4
      next statement           2
      private array            2
      list assignment          1
      map                      1
      subroutine entry         1
      glob value               1
      
      *** foreach ***
      null operation           20005
      pushmark                 20002
      next statement           20002
      glob value               10002
      logical and (&&)         10001
      private array            10001
      constant item            10001
      foreach loop iterator    10001
      iteration finalizer      10000
      multiplication (*)       10000
      scalar dereference       10000
      list assignment          10000
      foreach loop entry       1
      subroutine entry         1
      loop exit                1
      
      *** for ***
      next statement           10003
      glob value               10002
      pushmark                 10002
      logical and (&&)         10001
      private array            10001
      constant item            10001
      foreach loop iterator    10001
      multiplication (*)       10000
      push                     10000
      iteration finalizer      10000
      scalar dereference       10000
      null operation           5
      foreach loop entry       1
      subroutine entry         1
      loop exit                1
      

      So we see that a map executes fewer operations than a foreach block, and that hanging the for off the push as a statement modifier is almost as good as a map, with many of the same operations going on.
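      Op counts are only one view of cost. As a complementary sketch (not part of the profile above), the core Benchmark module can time the same three styles on wall-clock terms. The `%variant` hash and its key names are invented for this example, and the foreach variant here uses push so that all three build the same list. Timings depend on the perl version and machine, so no numbers are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @source = (1 .. 10_000);

# Each variant builds the same list and returns a reference to it.
my %variant = (
    'map'          => sub { my @dest = map { $_ * 10 } @source; \@dest },
    'foreach_push' => sub { my @dest; foreach (@source) { push @dest, $_ * 10 } \@dest },
    'push_for'     => sub { my @dest; push @dest, $_ * 10 for @source; \@dest },
);

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%variant);
```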
      --
      Snazzy tagline here
