Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

OK, I'll try my best at explaining the "unrolling the loop" mechanism (I don't have MRE at hand, so feel free to correct me if I am blatantly wrong!):

I use this technique in 2 cases:

  • When I want to match a string between 2 1-char delimiters, but the end delimiter can be included in the string through an escape mechanism. For example a double-quoted string that can include the double-quote if backslashed. In this case the "naive" m/"[^"]*"/ would not work.
  • When I want to match a string between 2 delimiters but the end delimiter is several characters long, such as in C comments (/*comment*/). In this case /*(.*?)*/ would work but it might be slow (I think, I have not benchmarked it).

In the first case here is how you want to match:

  • the start delimiter,
  • then an "easy" string (the normal* part): anything that does not include the escape character,
  • if you find the escape character then you want to skip the next character (the special part), whatever it is, and then match the following "easy" string (the second normal*), until you get to the next escape character or the end delimiter,
  • finally you match the end delimiter

Now if you want to match a string with a multi-character end delimiter here is how to do it:

  • match the start delimiter,
  • match anything until you get to the first character of the end delimiter,
  • if that character is not followed by the rest of the end delimiter then you can keep on matching "regular" characters until the next time you find that character,
  • match the end delimiter

A potential pitfall is that you want to make sure you don't consume the characters just after the first character of the end delimiter, or things like **/ (the first character of the end delimiter is there twice in a row, once as a regular character and once as the start of the end delimiter) would not be processed properly.

I guess a couple of examples might be appropriate.

First matching a double-quoted string, double quotes can be escaped using \":

#!/bin/perl -w use strict; while( <DATA>) { next if(/^\s*#/); # skip comments in DATA chomp; # split data into string to match and expected result(s) my( $string, @expected)= split /\s*=>\s*/; while( $string=~ m{" # the start deli +miter ([^\\"]* # anything but t +he end of the string or the escape char (?:\\. # the escape + char preceeding an escaped char (any char) [^\\"]* # anything b +ut the end of the string or the escape char )*) # repeat "}gx) # the end delimi +ter { my $match= $1; my $expected= shift @expected; unless( $match eq $expected) { print "unexpected result line $.: found /$match/, expectin +g /$expected/\n"; } } } __DATA__ # string to match => expected results(s) toto "a string" tata => a string toto "a string" => a string toto "a string" tata => a string toto "a \" string" tata => a \" string toto "\" string" tata => \" string toto "a\"" tata => a\" toto "\"" tata => \" toto "\"\"" tata => \"\" toto "string 1" "string 2" tata => string 1 => string 2 toto "string 1 => toto "string 1" "string => string 1 toto "tata\\" tutu => tata\\ toto "tata\\\"" tutu => tata\\\"

And now how to match C-like comments:

#!/bin/perl -w use strict; while( <DATA>) { chomp; next if(^\s*#); # skip comments # split the data into the string to match and the expected result( +s) my( $string, @expected)= split /\s*=>\s*/; while( $string=~ m{/\* # the delimite +r ([^*]* # anything but + the beginning of the delimiter (?:\*(?!>/) # the beginn +ing of the delimiter, not preceeding the rest of the delimiter # (?!>/) +means "not before /, do not use the next char) [^/]* # anything b +ut the beginning of the delimiter )*) # repeat \*/}gx) # the end of t +he delimiter { my $match= $1; my $expected= shift @expected || ''; unless( $match eq $expected) { print "unexpected result line $.: found /$match/, expectin +g /$expected/\n"; } } } __DATA__ # string => result(s) toto => toto /*foo*/ tata => foo /*foo*/ tata => foo toto /*foo*/ => foo toto /*foo*bar*/ => foo*bar toto /*foo**/ => foo* toto /**/ => /***/ => * /*/*/ => /

In reply to Re: Unrolling the loop technique by mirod
in thread Unrolling the loop technique by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (2)
As of 2021-06-24 05:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)












    Results (123 votes). Check out past polls.

    Notices?