
Fast Replacement

by sathishselvam (Initiate)
on Jun 14, 2013 at 03:13 UTC ( #1038877=perlquestion )
sathishselvam has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, is there any way to speed up this substitution?

while ( index($group, "!") > -1 and $index < 50000 ) {
    $group =~ s/!/\n/;
    $index++;
}
The substitution above sits in a loop that runs up to 50,000 times and replaces one '!' on each pass. It takes around 10 minutes, and the input string is very large. Thanks in advance.

Re: Fast Replacement
by davido (Archbishop) on Jun 14, 2013 at 04:22 UTC

    If I'm reading correctly (Update: I wasn't reading correctly), you're substituting all occurrences of "!" with a "\n" newline as long as it falls within the portion of the string that comes before the 50,000th position.

    substr( $group, 0, 50000 ) =~ tr/!/\n/;

    That's about the best I can come up with. You're using substr as an lvalue so that the change propagates back to $group, but is constrained to just the range specified. And you're using tr/// which is faster than s/// for single-character transliteration, where a search pattern isn't required.
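
    A minimal sketch of that lvalue-substr idiom on a small made-up string; $text and the range of 5 are illustrative stand-ins, not values from the question:

```perl
use strict;
use warnings;

# Hypothetical sample data; the range of 5 stands in for the 50,000 above.
my $text = "a!b!c!d!e!";

# substr used as an lvalue: tr/// modifies only the first 5 characters
# of $text and returns the number of characters it translated.
my $changed = ( substr( $text, 0, 5 ) =~ tr/!/\n/ );

print "translated $changed characters\n";
```

    Characters beyond the given range are left untouched, which is why this only fits the "first 50k positions" reading of the question.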

    Update: Bah, I can see already that I misread what you're doing. Looks more like you want to replace the first 50k "!" characters with newlines, not all "!" characters that reside in the first 50k positions. Pardon me. ;)

    Update 2: Here's a version that will substitute all '!' characters with \n, up to 50k times. After that, it will no longer match. It will be faster, but not necessarily legible:

    $group =~ s/!(??{ ( $myregexp::count++ < 50000 ) ? '' : '(?!)' })/\n/g;
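
    To see the counter at work, here is the same pattern on a small made-up string. The names $count and $limit are hypothetical, and both are declared with our on the assumption that package variables are the safest thing to reference from inside a regex code block:

```perl
use strict;
use warnings;

# Package variables (hypothetical names) so the (??{ ... }) block can see them.
our $count = 0;
our $limit = 3;          # stands in for 50000

my $group = "a!b!c!d!e!";

# After $limit matches the code block returns '(?!)', which can never match,
# so every remaining "!" is left alone.
$group =~ s/!(??{ ( $count++ < $limit ) ? '' : '(?!)' })/\n/g;

print $group, "\n";
```

    Note that the counter keeps its value afterwards, so it would need to be reset to 0 before the substitution is run on another string.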

    Dave

Re: Fast Replacement
by gurpreetsingh13 (Scribe) on Jun 14, 2013 at 04:44 UTC
    Assuming you want to replace all occurrences within the file, why not just use the Perl command line?
    perl -pi -e 's/!/\n/g' <filename>
Re: Fast Replacement
by muba (Priest) on Jun 14, 2013 at 05:21 UTC

    The reason it is slow is that on every call to index, Perl has to scan the string from the beginning, checking each character to see whether it is an "!". Again and again and again. So you could use the $index variable to tell Perl not to bother with the characters it has already checked:

    while ( index($group, "!", $index) > -1 and $index<50000 ) {
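
    Spelled out in full on made-up data, the loop with that change might look like the sketch below; the sample string and the limit of 4 are assumptions for illustration:

```perl
use strict;
use warnings;

my $group = "x!" x 10;   # hypothetical sample data
my $limit = 4;           # stands in for 50000
my $index = 0;

# After $index replacements, none of the first $index characters can still
# be a "!", so index() is allowed to skip them.
while ( index( $group, "!", $index ) > -1 and $index < $limit ) {
    $group =~ s/!/\n/;   # still scans from the start of the string
    $index++;
}

print $group;
```

    This only shortens the index search; the s/// inside the loop still starts from the beginning of the string on every pass.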

    Alternatively, because TIMTOWTDI:

    use strict;
    use warnings;

    my $string = "abc!def!ghi!jkl!mno!pqr!stu!vwx!yz";
    my $limit  = 3;   # split returns at most $limit fields, so $limit - 1 "!" are replaced

    $string = join( "\n", split( /!/, $string, $limit ) );
    print $string;
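
    Because the third argument to split caps the number of fields rather than the number of split points, replacing the first N "!" needs a limit of N + 1. A small sketch, with replace_first_n as a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: replace at most $n leading "!" characters with newlines.
sub replace_first_n {
    my ( $string, $n ) = @_;
    return $string if $n < 1;

    # A limit of $n + 1 yields at most $n + 1 fields, i.e. $n split points.
    # A positive limit also preserves trailing empty fields.
    return join "\n", split /!/, $string, $n + 1;
}

print replace_first_n( "abc!def!ghi!jkl", 2 ), "\n";
```

    For the original question that would mean a limit of 50,001.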
Re: Fast Replacement
by hdb (Prior) on Jun 14, 2013 at 06:32 UTC

    The call to index returns the position of the found character, so if you capture that return value you can replace the character directly using substr. Additionally, as pointed out by muba, you do not need to start from the beginning every time; you can save time by starting from the last position found.

    use strict;
    use warnings;

    my $group = "a!" x 50001;
    my $count = 0;
    my $pos   = 0;

    substr $group, $pos, 1, "\n"
        while ( ( $pos = index( $group, "!", $pos ) ) > -1 and $count++ < 50000 );

    print $group;
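
    The same index-plus-substr idea could be wrapped in a reusable routine. This is a sketch only; replace_in_place is a hypothetical name, and it takes a reference so a large string is not copied:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first $limit "!" in the referenced string,
# in place, and return how many replacements were made.
sub replace_in_place {
    my ( $ref, $limit ) = @_;
    my ( $pos, $count ) = ( 0, 0 );

    while ( $count < $limit ) {
        $pos = index $$ref, "!", $pos;   # resume where the last search ended
        last if $pos < 0;                # no "!" left
        substr $$ref, $pos, 1, "\n";     # 4-argument substr overwrites one character
        $pos++;
        $count++;
    }
    return $count;
}

my $group = "a!" x 5;
my $done  = replace_in_place( \$group, 3 );
print "replaced $done\n";
```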
Re: Fast Replacement
by choroba (Abbot) on Jun 14, 2013 at 07:26 UTC
    If you are reading $group from a file, you can probably replace the exclamation marks when reading the file already. Something like
    {
        local $/ = '!';
        while (<>) {
            chomp;
            print "$_\n";
            last if 50_000 <= $.;
        }
    }
    print <>;
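
    The record-separator approach can be tried without a real file by reading from an in-memory handle. In this sketch $input, $limit and $output are made-up names, and the result is collected in a string instead of being printed record by record:

```perl
use strict;
use warnings;

my $input  = "one!two!three!four";   # hypothetical file contents
my $limit  = 2;                      # stands in for 50_000
my $output = '';

open my $in, '<', \$input or die "Cannot open in-memory handle: $!";
{
    local $/ = '!';                  # read "!"-terminated records
    while ( my $record = <$in> ) {
        # chomp strips a trailing "!" (the current $/) and returns the number
        # of characters removed, so a final record without "!" stays as it is.
        $output .= chomp($record) ? "$record\n" : $record;
        last if $. >= $limit;
    }
}
{
    local $/;                        # slurp whatever remains, unchanged
    my $rest = <$in>;
    $output .= $rest if defined $rest;
}
close $in;

print $output, "\n";
```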
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Fast Replacement (0.000025s)
by BrowserUk (Pope) on Jun 14, 2013 at 10:20 UTC

    Try it this way. This replaces all the '!'s in the first 50k bytes of the string with newlines in 25 microseconds:

    $x = '1234!' x 11000;; say length $x;;
    55000
    $t=time; substr( $x, 0, 50e3 ) =~ tr[!][\n]; printf "%.9f\n", time() - $t;;
    0.000025034

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      While normally I greatly respect your insight, appreciate your input, and value your code, in this case I feel I have to point out that sathishselvam doesn't seem to want to replace every "!" occurring in the first 50k bytes of the input string, but rather the first 50k occurrences of "!". davido seems to agree with me on this one.

        Looking again I see you're right.

        But still, rather than invoking the regex engine 50,000 times, better to search for the position of the 50,000th ! and then replace in one pass.

        #! perl -slw
        use strict;
        use Time::HiRes qw[ time ];

        my $s = '1234!' x 55e3;

        my $start = time;
        my( $p, $c ) = ( 0, 50e3 );
        # Advance $p to one past the 50,000th '!'; the post-decrement
        # makes index run exactly 50e3 times.
        1 while $c-- and $p = 1 + index $s, '!', $p;
        # Translate every '!' in that prefix in a single pass.
        substr( $s, 0, $p ) =~ tr[!][\n];
        printf "Took %f seconds\n", time() - $start;

        __END__
        C:\test>junk71;;
        Took 0.011771 seconds

        C:\test>junk71;;
        Took 0.009690 seconds

        That could probably be sped up with a binary chop for the position, but it hardly seems worth it.
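
        For reuse, the locate-then-translate idea might be wrapped as below. This is only a sketch: replace_prefix is a hypothetical name, and unlike the script above it also stops early when the string holds fewer than the requested number of '!' characters.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first $limit "!" in the referenced string,
# in place, and return how many were replaced.
sub replace_prefix {
    my ( $ref, $limit ) = @_;
    my $end = 0;                          # one past the last "!" found so far

    for ( 1 .. $limit ) {
        my $found = index $$ref, "!", $end;
        last if $found < 0;               # fewer than $limit "!" in the string
        $end = $found + 1;
    }

    # tr/// on an lvalue substr rewrites only the located prefix and
    # returns the number of characters it translated.
    return $end ? ( substr( $$ref, 0, $end ) =~ tr/!/\n/ ) : 0;
}

my $s = '1234!' x 6;
my $n = replace_prefix( \$s, 4 );
print "replaced $n\n";
```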


