PerlMonks  


After some benchmarking...

I will use the xor approach (fast!). I stand corrected about substr(): it is much faster than unpack, so I will use it as well. Last but not least, I will use pos(). The benchmarks were inconclusive between length() and pos(), but to my mind pos() makes the code more expressive.
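As a minimal sketch of the xor-plus-pos() idea on its own: XORing two strings yields a NUL byte wherever they agree, so the first non-NUL byte marks the first difference. The helper name first_diff_offset and the sample paths are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: byte offset of the first difference between two
# strings, or -1 if they are byte-for-byte identical.
sub first_diff_offset {
    my ( $s, $t ) = @_;
    use bytes;                   # work on raw bytes, not characters
    my $xor = $s ^ $t;           # "\0" wherever the two strings agree
    return $xor =~ /[^\x00]/g    # first non-NUL byte is the difference
        ? pos($xor) - 1          # pos() points just past the match
        : -1;
}

print first_diff_offset( "/usr/lib/perl", "/usr/local/bin" ), "\n";   # 6
print first_diff_offset( "/usr",          "/usr/lib" ),       "\n";   # 4
print first_diff_offset( "same",          "same" ),           "\n";   # -1
```

When one string is a prefix of the other, the shorter one is padded with NULs for the xor, so the offset returned equals the length of the shorter string.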

Thanks to you all!

For completeness, I post my sub below. It is the back end of another sub that creates relative paths for a fixlinks utility.

After reading the chapter on Unicode in the Camel Book, I conclude that this sub will do the right thing whatever the locale. Perl strings are encoded as either Latin-1 or UTF-8, and both are ASCII-transparent (a byte with value 47 is always a real slash, never part of a multibyte character), so the requirement to look for a slash after deciding $pos does the trick. It is a shame, however, that it cannot be generalized to a common substring function like the one I presented above.
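The ASCII-transparency argument can be checked with a small sketch, assuming the core Encode module. The helper name slash_bytes and the sample path are invented for illustration. It counts bytes with value 47 after encoding, which should match the number of slash characters in either encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: count the bytes equal to 47 ("/") in $text once it
# has been encoded with $encoding.
sub slash_bytes {
    my ( $encoding, $text ) = @_;
    my $bytes = encode( $encoding, $text );
    return scalar grep { $_ == 47 } unpack 'C*', $bytes;
}

# Made-up path with two non-ASCII characters and two slashes.
my $path = "caf\x{e9}/na\x{ef}ve/index.html";

for my $encoding ( 'UTF-8', 'ISO-8859-1' ) {
    printf "%-10s has %d slash bytes\n", $encoding,
        slash_bytes( $encoding, $path );
}
```

In UTF-8 every byte of a multibyte character has its high bit set, so none of them can be mistaken for byte 47.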

sub SLASH() { 47 }

sub _common_path {
    if ( $_[0] eq $_[1] ) {
        return(
            $_[0],    # all path components are common
            "",       # nothing remains of first
            "",       # nothing remains of second
        );
    }
    else {
        use bytes;
        my ( $len0, $len1 ) = ( length($_[0]), length($_[1]) );

        # find the offset of the first byte that differs
        my $pos = $_[0] ^ $_[1];
        $pos =~ m/[^\x00]/g;
        $pos = pos($pos) - 1;

        # if some bytes are common but the last one wasn't the separator
        # we must decide which path components are common
        if ( $pos > 0 && vec($_[0], ($pos - 1), 8) != SLASH ) {

            # check if first path is just longer than the second
            if ( $pos == $len1 && vec($_[0], $pos, 8) == SLASH ) {
                $pos++;
                return(
                    substr($_[0], 0, $pos),    # common path with slash
                    substr($_[0], $pos),       # extra in first
                    "",                        # nothing remains of second
                );
            }

            # check if second path is just longer than the first
            if ( $pos == $len0 && vec($_[1], $pos, 8) == SLASH ) {
                $pos++;
                return(
                    substr($_[1], 0, $pos),    # common path with slash
                    "",                        # nothing remains of first
                    substr($_[1], $pos),       # extra in second
                );
            }

            # otherwise, rewind until last common path component
            while ( $pos > 0 ) {
                $pos--;
                if ( vec($_[0], $pos, 8) == SLASH ) {
                    $pos++;    # and keep the common slash
                    last;
                }
            }
        }

        return(
            substr($_[0], 0, $pos),    # common path components (with slash)
            substr($_[0], $pos),       # extra in first
            substr($_[1], $pos),       # extra in second
        );
    }
}
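For context, here is a rough sketch of how a front end might turn the second and third return values into a relative link. This is not the actual fixlinks code. The name relative_link and the sample paths are hypothetical, and edge cases such as an empty remainder are not handled.

```perl
use strict;
use warnings;

# Hypothetical front end: given what remains of the source file's path and
# of the target's path after the common prefix has been split off, climb
# out of each remaining source directory and descend into the target.
sub relative_link {
    my ( $rest_from, $rest_to ) = @_;
    my $ups = () = $rest_from =~ m{/}g;    # number of directories to leave
    return ( '../' x $ups ) . $rest_to;
}

# Assumed split: common prefix "site/", remainders as below.
print relative_link( "docs/a/index.html", "img/logo.png" ), "\n";
# prints ../../img/logo.png
```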

In reply to Re: Common Substrings by Anonymous Monk
in thread Common Substrings by Anonymous Monk
