Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

I think this could come down to a few things. One of the most important IMO is the size of buffer being used. For the perl variant you read a line and replace it. Assuming your file is a "standardish" text file its going to have rows of around a hundred chars. IIRC in substitutions perl allocates a buffer slightly larger than is needed so that it doesnt have to grow the output buffer on every replace, so its quite possible that on a single line it only actually does the malloc/realloc once. Even if it does it multiple times on most O/S'es this wont involve a buffer copy as the malloc implementation will most likely be able to simply resize the allocated buffer.

Making things more interesting your regex is not actually going to be handled by the regex engine. It will be handled by a subset of it that is used to implement index(). More specifically it will be done via boyer-moore matching. This means that it can potentially determine if the string is present in the line in less char comparisons than there are in the line. In fact s/word/words/ is more like the following code:

sub replace { my ($str,$find,$repl,$pos)=@_; $pos||=0; while ((my $index=index($str,$find,$pos))>=0) { substr($str,$index,length($find))=$repl; $pos+=length($repl); } return $str; }

Contrast this situation to a word processor scenario. Most likely it doesn't do the boyer-moore optimisation, most likely its working on buffers that are larger than a single line, and there are good odds that the lines are stored in a way that makes the substitution less efficient anyway as word processors tend not to store their data in a way that would not be efficient for S&R but would be efficient for insert/delete operations. IE, if each char were actually a link in a double linked list each insertion/deletion event would require updating four pointers, a cost that the perl s/// doesnt have to incur.

It wouldn't surprise me if you found considerable difference in run time when you worked against a single buffer containing the full file, or used a different pattern for your comparisons.

---
$world=~s/war/peace/g


In reply to Re: Word replace - notetab light vs perl by demerphq
in thread Word replace - notetab light vs perl by kiat

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (5)
As of 2024-03-19 10:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found