Re: Question on Regex

in reply to Question on Regex

You can also do this without a regex, by using split on the ‘/’ character to produce a list, and then subscripting the list to get the desired field:

#! perl
use strict;
use warnings;

my $count = 0;

for my $line (<DATA>)
{
    my $name = (split '/', $line)[4];
    print "Name #", ++$count, " is '", $name, "'\n";
}

__DATA__
.co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0
.com/Stuff/Somewhere/ABCD789/Peabody/?APath=2.0.12.1.3
.com.au/More-Stuff/Anywhere/XYZ12345/Perkins/?APath=4.5.6.7.8

Output:

 0:14 >perl 390_SoPW.pl
Name #1 is 'Tradewind'
Name #2 is 'Peabody'
Name #3 is 'Perkins'

 0:18 >
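
To see why the subscript is 4, it can help to print every field that split returns for one of the sample lines. The following is only an illustrative sketch: the helper name fields_of is my own assumption and is not part of the script above.

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): remove the trailing
# newline from one line, then split it on '/'.
sub fields_of
{
    my ($line) = @_;
    chomp $line;
    return split '/', $line;
}

my @fields = fields_of(".com/Stuff/Somewhere/ABCD789/Peabody/?APath=2.0.12.1.3\n");

# Print each field next to its index; the name is the field at index 4.
print "[$_] $fields[$_]\n" for 0 .. $#fields;
```

Field 0 is the domain suffix before the first slash, so the name is the fifth field and therefore sits at index 4.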

Hope that helps,

Athanasius <°(((>< contra mundum

Re^2: Question on Regex by Anonymous Monk on Nov 18, 2012 at 14:26 UTC
Thanks Athanasius. I got what I was looking for. Thanks a lot. :)
Re^3: Question on Regex by karlgoethebier (Abbot) on Nov 18, 2012 at 16:41 UTC
FYI:

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark qw ( :hireswallclock cmpthese timethese );

our $string = qq (.co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0);

sub karlgoethebier {
    our $string;
    $string =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/;
    return $1;
}

sub athanasius {
    our $string;
    return (split '/', $string)[4];
}

my $results = timethese (-10, {
    'karlgoethebier' => 'karlgoethebier',
    'athanasius'     => 'athanasius',
});

cmpthese($results);

__END__

Karls-Mac-mini:Desktop karl$ ./tradewind.pl
Benchmark: running athanasius, karlgoethebier for at least 10 CPU seconds...
athanasius: 10.4769 wallclock secs (10.47 usr + 0.00 sys = 10.47 CPU) @ 627362.46/s (n=6568485)
karlgoethebier: 10.4287 wallclock secs (10.42 usr + 0.00 sys = 10.42 CPU) @ 105188.77/s (n=1096067)
                   Rate karlgoethebier     athanasius
karlgoethebier 105189/s             --           -83%
athanasius     627362/s           496%             --

Regards, Karl

«The Crux of the Biscuit is the Apostrophe»
Re^4: Question on Regex by Athanasius (Archbishop) on Nov 19, 2012 at 03:39 UTC
Which shows that `sub athanasius` is up to 5 times faster than `sub karlgoethebier`. Even easier to see when supplying a positive `COUNT` value to `timethese`:

#! perl
use strict;
use warnings;
use Benchmark qw( :hireswallclock cmpthese timethese );

my $string = ".co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0";

cmpthese
(
    timethese
    (
        1_000_000,
        {
            'karlgoethebier' => sub { $string =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/; return $1; },
            'athanasius'     => sub { return (split '/', $string)[4] },
        }
    )
);

Output:

13:09 >perl 390_SoPW.pl
Benchmark: timing 1000000 iterations of athanasius, karlgoethebier...
athanasius: 5.19894 wallclock secs ( 5.15 usr + 0.00 sys = 5.15 CPU) @ 194250.19/s (n=1000000)
karlgoethebier: 20.2485 wallclock secs (20.08 usr + 0.00 sys = 20.08 CPU) @ 49808.24/s (n=1000000)
                   Rate karlgoethebier     athanasius
karlgoethebier  49808/s             --           -74%
athanasius     194250/s           290%             --

13:16 >

Not really surprising, since regexen with quantifiers can be expensive:

    Avoid regular expressions with many quantifiers.... Such patterns can result in exponentially slow backtracking behavior unless the quantified subpatterns match on their first “pass”.
    -- The Camel Book, 4th Edition, p. 693.

So, what happens if we limit the backtracking?

#! perl
use strict;
use warnings;
use Benchmark qw( :hireswallclock cmpthese timethese );

my $string = ".co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0";

cmpthese
(
    timethese
    (
        1_000_000,
        {
            'karlgoethebier'  => sub { $string =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/; return $1; },
            'karlgoethebier2' => sub { $string =~ m/.+?\/.+?\/.+?\/.+?\/(.+?)\/.+/; return $1; },
            'athanasius'      => sub { return (split '/', $string)[4] },
        }
    )
);

Result:

13:28 >perl 390_SoPW.pl
Benchmark: timing 1000000 iterations of athanasius, karlgoethebier, karlgoethebier2...
athanasius: 4.91799 wallclock secs ( 4.88 usr + 0.00 sys = 4.88 CPU) @ 204792.14/s (n=1000000)
karlgoethebier: 20.1721 wallclock secs (20.05 usr + 0.00 sys = 20.05 CPU) @ 49885.26/s (n=1000000)
karlgoethebier2: 2.55908 wallclock secs ( 2.53 usr + 0.00 sys = 2.53 CPU) @ 395569.62/s (n=1000000)
                    Rate  karlgoethebier      athanasius karlgoethebier2
karlgoethebier   49885/s              --            -76%            -87%
athanasius      204792/s            311%              --            -48%
karlgoethebier2 395570/s            693%             93%              --

13:29 >perl 390_SoPW.pl

The regex is now significantly faster than `split`-with-subscript. Interesting!

Athanasius <°(((>< contra mundum
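
One further way to limit backtracking, not benchmarked anywhere in this thread, is to swap the dot for a negated character class such as [^/]+, which cannot run past a slash at all. The sketch below only checks that the three patterns capture the same field. The sub names and the anchored pattern are my own assumptions, and no claim is made here about relative speed.

```perl
#! perl
use strict;
use warnings;

my $string = ".co.uk/Jobs/Company-Sector/C8A6446X4PND86M9WYJ/Tradewind/?APath=2.21.0.0.0";

# Greedy pattern from the thread: each .+ first grabs as much as it can,
# then has to give characters back until the rest of the pattern matches.
sub greedy
{
    my ($s) = @_;
    return $s =~ m/.+\/.+\/.+\/.+\/(.+)\/.+/ ? $1 : undef;
}

# Non-greedy pattern from the thread: each .+? stops at the first slash
# that lets the rest of the pattern match.
sub non_greedy
{
    my ($s) = @_;
    return $s =~ m/.+?\/.+?\/.+?\/.+?\/(.+?)\/.+/ ? $1 : undef;
}

# Assumption (not in the thread): skip four slash-terminated fields from the
# start of the string, then capture the fifth. [^/]+ can never match a slash.
sub negated_class
{
    my ($s) = @_;
    return $s =~ m{^(?:[^/]+/){4}([^/]+)/} ? $1 : undef;
}

print "$_\n" for greedy($string), non_greedy($string), negated_class($string);
```

To compare speed, the three subs could be passed to cmpthese in the same way as the benchmarks above; the outcome would need to be measured rather than assumed.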
Re^5: Question on Regex by karlgoethebier (Abbot) on Nov 19, 2012 at 14:15 UTC
Re^6: Question on Regex by karlgoethebier (Abbot) on Nov 19, 2012 at 20:24 UTC