PerlMonks  

Re: Question on Regex

by Athanasius (Chancellor)
on Nov 18, 2012 at 14:20 UTC


in reply to Question on Regex

You can also do this without a regex, by using split on the ‘/’ character to produce a list, and then subscripting the list to get the desired field:

    #! perl
    use strict;
    use warnings;

    my $count = 0;

    for my $line (<DATA>)
    {
        # Field 0 is the text before the first '/', so the name is field 4
        my $name = (split '/', $line)[4];
        print "Name #", ++$count, " is '", $name, "'\n";
    }

    __DATA__
    .co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0
    .com/Stuff/Somewhere/ABCD789/Peabody/?APath=2.0.12.1.3
    .com.au/More-Stuff/Anywhere/XYZ12345/Perkins/?APath=4.5.6.7.8

Output:

    0:14 >perl 390_SoPW.pl
    Name #1 is 'Tradewind'
    Name #2 is 'Peabody'
    Name #3 is 'Perkins'

    0:18 >
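To see why the subscript is 4, it may help to print the whole list that split returns for one sample line. This is a small illustrative sketch, not code from the reply; it simply hard-codes the first __DATA__ line. The text before the first '/' (here '.co.uk') becomes element 0, so the name ends up as the fifth element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First sample line from the __DATA__ section, hard-coded for illustration.
my $line = '.co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0';

# split returns every '/'-separated field, including the leading
# '.co.uk' as element 0.
my @fields = split '/', $line;

# Print each field next to its index.
for my $i (0 .. $#fields) {
    print "$i: $fields[$i]\n";
}
```

Element 4 is 'Tradewind' and element 5 is the '?APath=...' query part, which is why the original code subscripts the list with [4].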

Hope that helps,

Athanasius <°(((><contra mundum

Replies are listed 'Best First'.
Re^2: Question on Regex
by Anonymous Monk on Nov 18, 2012 at 14:26 UTC

    Thanks Athanasius. I got what I was looking for. Thanks a lot. :)

      FYI:

          #!/usr/bin/perl

          use strict;
          use warnings;
          use Benchmark qw ( :hireswallclock cmpthese timethese );

          our $string = qq (.co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0);

          sub karlgoethebier {
              our $string;
              $string =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/;
              return $1;
          }

          sub athanasius {
              our $string;
              return (split '/', $string)[4];
          }

          my $results = timethese (-10, {
              'karlgoethebier' => 'karlgoethebier',
              'athanasius'     => 'athanasius',
          });

          cmpthese($results);

          __END__

          Karls-Mac-mini:Desktop karl$ ./tradewind.pl
          Benchmark: running athanasius, karlgoethebier for at least 10 CPU seconds...
          athanasius: 10.4769 wallclock secs (10.47 usr + 0.00 sys = 10.47 CPU) @ 627362.46/s (n=6568485)
          karlgoethebier: 10.4287 wallclock secs (10.42 usr + 0.00 sys = 10.42 CPU) @ 105188.77/s (n=1096067)
                             Rate karlgoethebier athanasius
          karlgoethebier 105189/s             --       -83%
          athanasius     627362/s           496%         --

      Regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

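        Which shows that sub athanasius is about five times faster than sub karlgoethebier (496% faster, per cmpthese). This is even easier to see when timethese is given a positive COUNT, i.e. a fixed number of iterations instead of a minimum number of CPU seconds, so the elapsed times can be compared directly: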

            #! perl
            use strict;
            use warnings;
            use Benchmark qw( :hireswallclock cmpthese timethese );

            my $string = ".co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0";

            cmpthese(
                timethese
                (
                    1_000_000,
                    {
                        'karlgoethebier' => sub { $string =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/; return $1; },
                        'athanasius'     => sub { return (split '/', $string)[4] },
                    }
                )
            );

        Output:

            13:09 >perl 390_SoPW.pl
            Benchmark: timing 1000000 iterations of athanasius, karlgoethebier...
            athanasius:  5.19894 wallclock secs ( 5.15 usr +  0.00 sys =  5.15 CPU) @ 194250.19/s (n=1000000)
            karlgoethebier: 20.2485 wallclock secs (20.08 usr +  0.00 sys = 20.08 CPU) @ 49808.24/s (n=1000000)
                               Rate karlgoethebier athanasius
            karlgoethebier  49808/s             --       -74%
            athanasius     194250/s           290%         --

            13:16 >

        Not really surprising, since regexen with quantifiers can be expensive:

        Avoid regular expressions with many quantifiers.... Such patterns can result in exponentially slow backtracking behavior unless the quantified subpatterns match on their first “pass”.
        The Camel Book, 4th Edition, p. 693.
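One rough way to see the backtracking the Camel Book warns about is to instrument both patterns with an embedded code block, (?{ ... }), that bumps a counter each time the engine reaches the point just after the fourth '/'. This is an illustrative sketch, not code from the thread: the counter names are made up, and the exact greedy count may depend on the regex optimizations in a given Perl version, so only the comparison between the two counts is of interest.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = '.co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0';

# Hypothetical counters: how often each pattern reaches the point just
# after its fourth '/'. Side effects inside (?{ ... }) are not undone
# on backtracking, so every attempt through that point is counted.
my $greedy_hits = 0;
my $lazy_hits   = 0;

# Greedy pattern from the thread, with a counting code block inserted.
$string =~ m/.+\/.+\/.+\/.+\/(?{ $greedy_hits++ })(.+)\/.+/
    or die "greedy pattern did not match";
my $greedy_name = $1;

# Non-greedy pattern from the thread, instrumented the same way.
$string =~ m/.+?\/.+?\/.+?\/.+?\/(?{ $lazy_hits++ })(.+?)\/.+/
    or die "non-greedy pattern did not match";
my $lazy_name = $1;

print "greedy:     captured '$greedy_name', reached capture point $greedy_hits time(s)\n";
print "non-greedy: captured '$lazy_name', reached capture point $lazy_hits time(s)\n";
```

Both patterns capture 'Tradewind', but the greedy one only gets there after each leading .+ has first swallowed the rest of the line and then given characters back, whereas the non-greedy one reaches the capture point once, on its first pass.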

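        So, what happens if we limit the backtracking by making each quantifier non-greedy (.+? instead of .+), so that every segment tries the shortest match first and stops at the next '/'?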

            #! perl
            use strict;
            use warnings;
            use Benchmark qw( :hireswallclock cmpthese timethese );

            my $string = ".co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0";

            cmpthese(
                timethese
                (
                    1_000_000,
                    {
                        'karlgoethebier'  => sub { $string =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/; return $1; },
                        'karlgoethebier2' => sub { $string =~ m/.+?\/.+?\/.+?\/.+?\/(.+?)\/.+/; return $1; },
                        'athanasius'      => sub { return (split '/', $string)[4] },
                    }
                )
            );

        Result:

            13:28 >perl 390_SoPW.pl
            Benchmark: timing 1000000 iterations of athanasius, karlgoethebier, karlgoethebier2...
            athanasius:  4.91799 wallclock secs ( 4.88 usr +  0.00 sys =  4.88 CPU) @ 204792.14/s (n=1000000)
            karlgoethebier: 20.1721 wallclock secs (20.05 usr +  0.00 sys = 20.05 CPU) @ 49885.26/s (n=1000000)
            karlgoethebier2:  2.55908 wallclock secs ( 2.53 usr +  0.00 sys =  2.53 CPU) @ 395569.62/s (n=1000000)
                                Rate karlgoethebier athanasius karlgoethebier2
            karlgoethebier   49885/s             --       -76%            -87%
            athanasius      204792/s           311%         --            -48%
            karlgoethebier2 395570/s           693%        93%              --

            13:29 >perl 390_SoPW.pl

        The non-greedy regex is now significantly faster than split-with-subscript (93% faster), and nearly eight times as fast as the original greedy regex. Interesting!
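A third option, not tried in the thread, is to keep the regex but replace . with a negated character class, [^/], so that no quantified part can run past a separator in the first place. The sketch below assumes the same sample string and wraps each approach in a sub (the sub names are made up for illustration); it checks that all three agree before handing them to cmpthese. No timings are claimed here, since they will depend on the Perl version and machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my $string = '.co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0';

# Hypothetical helper: skip four "field/" groups, then capture the fifth
# field. [^/]* can never match a '/', so each group stops at the next
# separator and there is nothing to backtrack into.
sub negated_class {
    my ($s) = @_;
    my ($name) = $s =~ m{^(?:[^/]*/){4}([^/]*)};
    return $name;
}

# The non-greedy regex from the thread, wrapped in a sub.
sub lazy_regex {
    my ($s) = @_;
    return $s =~ m/.+?\/.+?\/.+?\/.+?\/(.+?)\/.+/ ? $1 : undef;
}

# split with subscript, as in the original reply.
sub split_subscript {
    my ($s) = @_;
    return (split '/', $s)[4];
}

# All three should agree on the sample line before timing them.
for my $sub (\&negated_class, \&lazy_regex, \&split_subscript) {
    my $got = $sub->($string);
    die "unexpected result" unless defined $got && $got eq 'Tradewind';
}

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, {
    negated_class   => sub { negated_class($string) },
    lazy_regex      => sub { lazy_regex($string) },
    split_subscript => sub { split_subscript($string) },
});
```

One design note: the greedy original is not equivalent to the others in general. On a line with more than five '/' characters (and no empty fields) it captures the second-to-last field, whereas split with [4], the non-greedy regex, and the negated-class pattern all return the fifth field.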

        Athanasius <°(((><contra mundum
