
Re^2: Substitute 'bad words' with 'good words' according to lists

by sk (Curate)
on Sep 26, 2005 at 03:00 UTC (#495004)


in reply to Re: Substitute 'bad words' with 'good words' according to lists
in thread Substitute 'bad words' with 'good words' according to lists

pg,

I don't think the original code is necessarily inefficient. I feel the performance depends on the number of words your split returns. Here is a benchmark against the original (with the keys call added, which was missing). I have modified the text to be 100 times the length of the original.

Again, the story could be different when you have a large number of replacements and fewer words.

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);

my $txt = "ugly anotherugly " x 100;
# print $txt,$/;

sub pg {
    my %words = (
        ugly        => 'ug**',
        anotherugly => 'anot*******',
    );
    my @words = split / /, $txt;    # largely simplified, you have to count ,.:; etc
    for my $i (0 .. $#words) {
        $words[$i] = $words{$words[$i]} if (exists($words{$words[$i]}));
    }
    # print join(' ', @words),$/;
}

sub orig {
    my %words = (
        ugly        => 'ug**',
        anotherugly => 'anot*******',
    );
    $txt =~ s/$_/$words{$_}/g foreach keys(%words);
    # print $txt,$/;
}

my $test   = { 'pg' => \&pg, 'Original' => \&orig, };
my $result = timethese(-10, $test);
cmpthese($result);

Output

Benchmark: running Original, pg for at least 10 CPU seconds...
  Original: 11 wallclock secs (10.86 usr +  0.00 sys = 10.86 CPU) @ 43770.26/s (n=475345)
        pg: 11 wallclock secs (10.68 usr +  0.00 sys = 10.68 CPU) @ 4328.46/s (n=46228)

            Rate       pg Original
pg        4328/s       --     -90%
Original 43770/s     911%       --

NOTE: I removed the join from your code just to show the looping differences.
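
For the case with many replacements, one option is to fold every key into a single alternation so the text is scanned once rather than once per key. This is only a sketch and was not part of the benchmark above. The helper name replace_all is made up for illustration, and unlike the original loop it matches whole words only (the \b anchors), so 'ugly' inside 'anotherugly' is left to the longer key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace every listed word in a
# single pass instead of running one s///g per key.
sub replace_all {
    my ($text, $map) = @_;
    return $text unless %$map;    # nothing to replace

    # Longest keys first so 'anotherugly' is tried before 'ugly';
    # quotemeta guards against regex metacharacters in the keys.
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) }
        keys %$map;

    # Work on a copy so the caller's string is left untouched.
    (my $out = $text) =~ s/\b($alt)\b/$map->{$1}/g;
    return $out;
}

my %words = (
    ugly        => 'ug**',
    anotherugly => 'anot*******',
);

print replace_all("ugly anotherugly ugly", \%words), "\n";
# prints: ug** anot******* ug**
```

Whether this beats the per-key loop depends on the number of keys and the length of the text, so it would need its own timing run.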


Re^3: Substitute 'bad words' with 'good words' according to lists
by pg (Canon) on Sep 26, 2005 at 05:14 UTC

    You are right, and thanks for pointing that out. My original analysis assumed that s/// and split iterate through the sentence with the same performance. That assumption was wrong: split() is much slower.

    use strict;
    use warnings;
    use Benchmark qw(:all);

    my $txt = "a" x 100;

    sub seperate {
        split //, $txt;
    }

    sub replace {
        $txt =~ s/a/b/g;
    }

    my $result = timethese(100000, {'seperate' => \&seperate, 'replace' => \&replace});

    This gives:

    Benchmark: timing 10000 iterations of replace, seperate...
       replace:  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
                (warning: too few iterations for a reliable count)
      seperate:  2 wallclock secs ( 1.20 usr +  0.00 sys =  1.20 CPU) @ 8305.65/s (n=10000)

    C:\Perl\bin>perl -w math1.pl
    Benchmark: timing 100000 iterations of replace, seperate...
       replace:  1 wallclock secs ( 0.03 usr +  0.00 sys =  0.03 CPU) @ 3125000.00/s (n=100000)
                (warning: too few iterations for a reliable count)
      seperate: 16 wallclock secs (12.50 usr +  0.00 sys = 12.50 CPU) @ 8000.00/s (n=100000)
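
    One caveat with the benchmark above: replace rewrites $txt in place, so after the first iteration there are no a's left to substitute, and seperate then splits a string of b's. A minimal sketch that gives each iteration a fresh copy, so both subs see the same input every time, might look like this. The sub names separate_copy and replace_copy are made up for the sketch, and no timings are claimed for it.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $txt = "a" x 100;

# Both subs work on a private copy, so $txt is unchanged between iterations.
sub separate_copy {
    my $copy  = $txt;
    my @chars = split //, $copy;
    return scalar @chars;             # number of characters produced
}

sub replace_copy {
    my $copy  = $txt;
    my $count = ($copy =~ s/a/b/g);   # number of substitutions made
    return $count;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    separate => \&separate_copy,
    replace  => \&replace_copy,
});
```

    The copy adds the same small cost to both subs, so the comparison stays like for like.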
