PerlMonks  

Hello,

Here I will share a collection of examples of how to use regexes instead of the usual loops (for, foreach, while) on one-dimensional data.
The idea is this: if our array holds a known type of data (letters, words, numbers, and the like) and we can reserve a few characters as separators (characters that our data does not contain), then we can join the array into a string with a separator and apply an iterative regex search (or substitution) to it.
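As a minimal sketch of that idea (the array contents and the names @items, $joined and @seen are made up for illustration and do not come from the scripts in this post), the join-then-search step might look like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; ',' is the one reserved separator character.
my @items = ( 'ab', 'cd', 12, 345 );

# Stop early if any element contains the separator.
for my $item ( @items ){
    die "Element '$item' contains the separator!\n" if $item =~ m/,/;
}

my $joined = join ',', @items;

# Iterative search: in scalar context every m//g call resumes
# where the previous match ended, so each pass yields one element.
my @seen;
while( $joined =~ m/([^,]+)/g ){
    push @seen, $1;
}

print "[$_]" for @seen;    # [ab][cd][12][345]
print "\n";
```

The separator check comes first on purpose: if an element contained ',', the joined string would silently describe a different array.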

This text is for advanced users of regular expressions. See perldoc perlre if you need help. Pay attention to any warnings that come up in the scripts.

Firstly, as the simplest case, I will show three simple lines, which operate on a simple string, character by character.
s/./ do_smth( $& ) /eg;
m/.(?{ do_smth( $& ); })(*FAIL)/;
do_smth( $& ) while m/./g;
  • The first case uses substitution. It is destructive -- it changes the characters of the string. It traverses the string by a series of consecutive single-character matches, and it does not stop after the first successful match, because the modifier /g is in use.
  • The second case is non-destructive and never matches successfully: every attempt is forced to fail by (*FAIL) (synonyms: (*F) or (?!)). But we can do something with the matched character before the engine reaches (*FAIL). For that we use a code block (?{ }). (The related (??{ }) construct also runs code, but its return value is then matched as a pattern.)
  • The 2nd case is an alternative to the common 3rd case -- a while loop. The while loop evaluates the match in scalar context, and the modifier /g makes every next iteration start at the position (pos()) where the previous match ended.
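To make the three lines above runnable, here is a small sketch. The body of do_smth() is only a stand-in that I assume for illustration (it logs the character and returns it upper-cased); the string 'abc' and the names @log and $copy are likewise hypothetical. It assumes Perl 5.18 or newer, where lexical variables behave predictably inside code blocks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @log;    # records every character handed to do_smth()

# Hypothetical stand-in for do_smth(): log the character, return it upper-cased.
sub do_smth {
    my( $char ) = @_;
    push @log, $char;
    return uc $char;
}

my $str = 'abc';

# 1) Substitution: destructive, so it is applied to a copy here.
my $copy = $str;
$copy =~ s/./ do_smth( $& ) /eg;

# 2) Code block plus (*FAIL): the string is left untouched.
$str =~ m/.(?{ do_smth( $& ) })(*FAIL)/;

# 3) while loop over a scalar-context m//g match.
do_smth( $& ) while $str =~ m/./g;

print "copy: $copy\n";    # copy: ABC
print "str:  $str\n";     # str:  abc
print "log:  @log\n";     # log:  a b c a b c a b c
```

Each of the three forms visits every character exactly once, which is why the log repeats "a b c" three times.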

Next I will show a self-documented script with examples. Every example is written in a few different ways: as a common-style for (and/or foreach) loop and as a regex-style "loop". Before looking at the examples, I want to emphasize the importance of a correct border check between elements. If an element is multi-character and the border is not clear, the regex may split the element and match any substring of it. I usually join array elements with a space or a punctuation mark, so that a simple m/\b/ can serve as the border of an element (as long as the elements contain only alphanumeric characters).
Right after the code of the script comes its OUTPUT. Some comments about the script follow after the OUTPUT. As I used some destructive examples (appending a constant letter to the variables), I remove these letters with a simple substitution after every example (hoping that no array element contains these letters).
#!/usr/bin/perl

use strict;
use warnings;

print "# The need of clear borders of the elements:\n";
print "## Without borders (wrong):\n";
"1 23 456" =~ m/\d+(?{ print "[$&]" })(*FAIL)/;
print "\n";
print "## With borders (correct):\n";
"1 23 456" =~ m/\b\d+\b(?{ print "[$&]" })(*FAIL)/;
print "\n";
print "## Alternative (correct):\n";
"1 23 456" =~ m/\d+(*SKIP)(?{ print "[$&]" })(*FAIL)/;
print "\n";

my @A = ( 1 .. 3, 'abc', 'zz', 79, 444 );
my $A = join ',', @A; # ',' -- one reserved character;
m/[,]/ and die "Elem '$_' of \@A contains separator '$&'!\n" for @A;

print "# SIMPLE LOOPING through an array:\n";
print "## NON-DESTRUCTIVE:\n";
for( my $i = 0; $i < @A; $i ++ ){
    print "[$A[ $i ]]";
}
print "\n";
for my $A ( @A ){
    print "[$A]";
}
print "\n";
$A =~ m/
    \b([^,]+)\b
    (?{ print "[$1]" })
    (*FAIL)
/x;
print "\n";

print "## DESTRUCTIVE:\n";
for( my $i = 0; $i < @A; $i ++ ){
    $A[ $i ] .= 'X';
    print "[$A[ $i ]]";
}
print "\n";
chop for @A;
for my $A ( @A ){
    $A .= 'X';
    print "[$A]";
}
print "\n";
chop for @A;
$A =~ s/ \b([^,]+)\b / $1 . 'X' /gex;
print $A =~ s/\b([^,]+)\b,?/[$1]/gr;
print "\n";
$A =~ s/X//g;

print "# LOOPING through an array by evaluating several (2-3) consecutive elements:\n";
print "## NON-DESTRUCTIVE:\n";
for( my $i = 0; $i < @A - 1; $i ++ ){
    print "[$A[ $i ]-$A[ $i + 1 ]]";
}
print "\n";
for my $i ( 0 .. @A - 2 ){
    print "[$A[ $i ]-$A[ $i + 1 ]]";
}
print "\n";
$A =~ m/
    \b([^,]+)\b,
    \b([^,]+)\b
    (?{ print "[$1-$2]" })
    (*FAIL)
/x;
print "\n";
# ----
for( my $i = 0; $i < @A - 1; $i += 2 ){
    print "[$A[ $i ]-$A[ $i + 1 ]]";
}
print "\n";
for my $i ( grep $_ % 2 == 0, 0 .. @A - 2 ){
    print "[$A[ $i ]-$A[ $i + 1 ]]";
}
print "\n";
$A =~ m/
    \b([^,]+)\b,(*SKIP)
    \b([^,]+)\b
    (?{ print "[$1-$2]" })
    (*FAIL)
/x;
print "\n";
# ----
for( my $i = 0; $i < @A - 2; $i ++ ){
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
}
print "\n";
for my $i ( 0 .. @A - 3 ){
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
}
print "\n";
$A =~ m/
    \b([^,]+)\b,
    \b([^,]+)\b,
    \b([^,]+)\b
    (?{ print "[$1-$2-$3]" })
    (*FAIL)
/x;
print "\n";
# ----
for( my $i = 0; $i < @A - 2; $i += 2 ){
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
}
print "\n";
for my $i ( grep $_ % 2 == 0, 0 .. @A - 3 ){
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
}
print "\n";
$A =~ m/
    \b([^,]+)\b,(*SKIP)
    \b([^,]+)\b,
    \b([^,]+)\b
    (?{ print "[$1-$2-$3]" })
    (*FAIL)
/x;
print "\n";
# ----
for( my $i = 0; $i < @A - 2; $i += 3 ){
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
}
print "\n";
for my $i ( grep $_ % 3 == 0, 0 .. @A - 3 ){
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
}
print "\n";
$A =~ m/
    \b([^,]+)\b,
    \b([^,]+)\b,(*SKIP)
    \b([^,]+)\b
    (?{ print "[$1-$2-$3]" })
    (*FAIL)
/x;
print "\n";

print "## DESTRUCTIVE:\n";
# ----
for( my $i = 0; $i < @A - 2; $i ++ ){
    $A[ $i ] .= $A[ $i + 1 ] gt $A[ $i + 2 ] ? 'X' : 'Y';
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
}
print "\n";
s/[XY]// for @A;
for my $i ( 0 .. @A - 3 ){
    $A[ $i ] .= $A[ $i + 1 ] gt $A[ $i + 2 ] ? 'X' : 'Y';
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
}
print "\n";
s/[XY]// for @A;
$A =~ s/
    \b([^,]+)\b
    (?= ,\b([^,]+)\b ,\b([^,]+)\b )
/
    my $new = $1 . ( $2 gt $3 ? 'X' : 'Y' );
    print "[$new-$2-$3]";
    $new
/gex;
print "\n";
$A =~ s/[XY]//g;

print "# 'TRIANGLE' LOOPING through an array (loop in loop):\n";
for my $i ( 0 .. @A - 2 ){
    for my $j ( $i + 1 .. @A - 1 ){
        printf "%10s", " [$A[ $i ]-$A[ $j ]]";
    }
    print "\n";
}
$A =~ m/
    \b([^,]+)\b
    .*?
    \b([^,]+)\b
    (?{ printf "%10s", " [$1-$2]" })
    (?=$)
    (?{ print "\n" })
    (*FAIL)
/x;

print "# 'RECTANGLE' LOOPING through two arrays (loop in loop):\n";
my @B = @A[ 2 .. 4 ];
my $AB = $A . ';' . join ',', @B; # ',' and ';' -- two reserved characters;
m/[,;]/ and die "Elem '$_' of set { \@A, \@B } contains separator '$&'!\n" for @A, @B;
for my $i ( 0 .. @A - 1 ){
    for my $j ( 0 .. @B - 1 ){
        printf "%10s", " [$A[ $i ]-$B[ $j ]]";
    }
    print "\n";
}
$AB =~ m/
    \b([^,]+)\b
    .*;
    .*?
    \b([^,]+)\b
    (?{ printf "%10s", " [$1-$2]" })
    (?=$)
    (?{ print "\n" })
    (*FAIL)
/x;
OUTPUT:
# The need of clear borders of the elements:
## Without borders (wrong):
[1][23][2][3][456][45][4][56][5][6]
## With borders (correct):
[1][23][456]
## Alternative (correct):
[1][23][456]
# SIMPLE LOOPING through an array:
## NON-DESTRUCTIVE:
[1][2][3][abc][zz][79][444]
[1][2][3][abc][zz][79][444]
[1][2][3][abc][zz][79][444]
## DESTRUCTIVE:
[1X][2X][3X][abcX][zzX][79X][444X]
[1X][2X][3X][abcX][zzX][79X][444X]
[1X][2X][3X][abcX][zzX][79X][444X]
# LOOPING through an array by evaluating several (2-3) consecutive elements:
## NON-DESTRUCTIVE:
[1-2][2-3][3-abc][abc-zz][zz-79][79-444]
[1-2][2-3][3-abc][abc-zz][zz-79][79-444]
[1-2][2-3][3-abc][abc-zz][zz-79][79-444]
[1-2][3-abc][zz-79]
[1-2][3-abc][zz-79]
[1-2][3-abc][zz-79]
[1-2-3][2-3-abc][3-abc-zz][abc-zz-79][zz-79-444]
[1-2-3][2-3-abc][3-abc-zz][abc-zz-79][zz-79-444]
[1-2-3][2-3-abc][3-abc-zz][abc-zz-79][zz-79-444]
[1-2-3][3-abc-zz][zz-79-444]
[1-2-3][3-abc-zz][zz-79-444]
[1-2-3][3-abc-zz][zz-79-444]
[1-2-3][abc-zz-79]
[1-2-3][abc-zz-79]
[1-2-3][abc-zz-79]
## DESTRUCTIVE:
[1Y-2-3][2Y-3-abc][3Y-abc-zz][abcX-zz-79][zzX-79-444]
[1Y-2-3][2Y-3-abc][3Y-abc-zz][abcX-zz-79][zzX-79-444]
[1Y-2-3][2Y-3-abc][3Y-abc-zz][abcX-zz-79][zzX-79-444]
# 'TRIANGLE' LOOPING through an array (loop in loop):
     [1-2]     [1-3]   [1-abc]    [1-zz]    [1-79]   [1-444]
     [2-3]   [2-abc]    [2-zz]    [2-79]   [2-444]
   [3-abc]    [3-zz]    [3-79]   [3-444]
  [abc-zz]  [abc-79] [abc-444]
   [zz-79]  [zz-444]
  [79-444]
     [1-2]     [1-3]   [1-abc]    [1-zz]    [1-79]   [1-444]
     [2-3]   [2-abc]    [2-zz]    [2-79]   [2-444]
   [3-abc]    [3-zz]    [3-79]   [3-444]
  [abc-zz]  [abc-79] [abc-444]
   [zz-79]  [zz-444]
  [79-444]
# 'RECTANGLE' LOOPING through two arrays (loop in loop):
     [1-3]   [1-abc]    [1-zz]
     [2-3]   [2-abc]    [2-zz]
     [3-3]   [3-abc]    [3-zz]
   [abc-3] [abc-abc]  [abc-zz]
    [zz-3]  [zz-abc]   [zz-zz]
    [79-3]  [79-abc]   [79-zz]
   [444-3] [444-abc]  [444-zz]
     [1-3]   [1-abc]    [1-zz]
     [2-3]   [2-abc]    [2-zz]
     [3-3]   [3-abc]    [3-zz]
   [abc-3] [abc-abc]  [abc-zz]
    [zz-3]  [zz-abc]   [zz-zz]
    [79-3]  [79-abc]   [79-zz]
   [444-3] [444-abc]  [444-zz]
As you can see, I used a C-style for loop at the beginning of every example. It is versatile, because we can manipulate its 2nd and 3rd fields. However, when we operate on several consecutive elements, it takes additional logic to handle arrays of any length correctly.
Note that a "true" foreach loop cannot perform a 'triangle' loop (it can only do a strict 'square' loop over one array, or a strict 'rectangle' loop over two arrays). Therefore I used an "indexed" foreach loop where the "true" foreach was not enough.
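The (*SKIP) control verb is often useful together with (*FAIL): once the engine backtracks into (*SKIP), the current attempt fails at once, and start positions before the (*SKIP) point are not tried again.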
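The effect of (*SKIP) can be seen in isolation with a sketch like the following. It mirrors the "Without borders" and "Alternative" lines of the script, but the input string '7 42 365' and the arrays @plain and @skipped are hypothetical names of mine, and results are collected instead of printed. It assumes Perl 5.18 or newer for lexicals inside code blocks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = '7 42 365';    # hypothetical input
my( @plain, @skipped );

# Without (*SKIP): after every forced failure the engine backtracks
# into \d+ and also retries from start positions inside each number.
$str =~ m/\d+(?{ push @plain, $& })(*FAIL)/;

# With (*SKIP): backtracking into the verb abandons the attempt,
# so no shorter piece of an already visited number is reported.
$str =~ m/\d+(*SKIP)(?{ push @skipped, $& })(*FAIL)/;

print "plain:   @plain\n";      # plain:   7 42 4 2 365 36 3 65 6 5
print "skipped: @skipped\n";    # skipped: 7 42 365
```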
Note the non-greedy .*? in the 'triangle' loop. A greedy .* would reverse the order in which the second element of each pair is visited. Whether the greedy or the non-greedy form performs better is open to discussion.
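A small sketch of the difference in traversal order, using a hypothetical three-element string and collecting the pairs into arrays (@lazy and @greedy are my names, not part of the script above). It assumes Perl 5.18 or newer for lexicals inside code blocks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $joined = join ',', qw( ab cd ef );    # hypothetical data
my( @lazy, @greedy );

# Non-greedy .*? : the second element moves left to right.
$joined =~ m/
    \b([^,]+)\b
    .*?
    \b([^,]+)\b
    (?{ push @lazy, "$1-$2" })
    (*FAIL)
/x;

# Greedy .* : the second element moves right to left.
$joined =~ m/
    \b([^,]+)\b
    .*
    \b([^,]+)\b
    (?{ push @greedy, "$1-$2" })
    (*FAIL)
/x;

print "lazy:   @lazy\n";      # lazy:   ab-cd ab-ef cd-ef
print "greedy: @greedy\n";    # greedy: ab-ef ab-cd cd-ef
```

Both forms visit the same set of pairs; only the order within each row of the 'triangle' changes.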
The modifier /x is crucial for the readability of longer regex examples.
Note that 'rectangle' looping requires one additional separator character.
IMPORTANT: the distances between elements in the joined string grow as the elements themselves get longer. Therefore this method may be time-inefficient when the elements of the array are, e.g., long strings. But if the elements are practical numbers, they rarely exceed billions of billions (that is, they are no longer than a couple of dozen characters each).

A word on a new experimental feature in Perl 5.36. From 'perldelta':
"You can now iterate over multiple values at a time by specifying a list of lexicals within parentheses. For example, for my ($left, $right, $gripping) (@moties) { ... }". More in: Foreach Loops.
This looks like a useful option. But it has a couple of limitations: 1) it supplies additional undef values if the number of array elements is not divisible by the number of iterators, 2) its step is constant and equal to the number of iterators (i.e. chunks of iterators cannot overlap, somewhat like using the \G anchor in a regex). As the feature is experimental, its behavior may change in the future.
Example code:
#!/usr/bin/perl

use strict;
use warnings;

my @A = ( 1 .. 3, 'abc', 'zz', 79, 444 );
my $A = join ',', @A;

print "# With 'undef's:\n";
for my( $i, $j, $k )( @A ){
    print "[$i-$j-$k]";
}
print "\n";
for( my $i = 0; $i < @A; $i += 3 ){
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
}
print "\n";

print "Without 'undef's -- no stepping out of an array:\n";
for my $i ( grep $_ % 3 == 0, 0 .. @A - 3 ){
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
}
print "\n";
$A =~ m/
    \b([^,]+)\b,
    \b([^,]+)\b,(*SKIP)
    \b([^,]+)\b
    (?{ print "[$1-$2-$3]" })
    (*FAIL)
/x;
print "\n";
OUTPUT:
for my (...) is experimental at <...>
# With 'undef's:
Use of uninitialized value <...>
Use of uninitialized value <...>
[1-2-3][abc-zz-79][444--]
Use of uninitialized value <...>
Use of uninitialized value <...>
[1-2-3][abc-zz-79][444--]
Without 'undef's -- no stepping out of an array:
[1-2-3][abc-zz-79]
[1-2-3][abc-zz-79]
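The comparison with the \G anchor made above can be sketched as follows. This is my own illustration, not part of the original script: it reuses the same @A and $A, and collects the chunks in a hypothetical @chunks array. Every match must begin exactly where the previous one ended, so the chunks cannot overlap, and an incomplete last chunk simply does not match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @A = ( 1 .. 3, 'abc', 'zz', 79, 444 );
my $A = join ',', @A;

my @chunks;
# \G pins each attempt to the end of the previous match;
# the trailing (?: , | \z ) consumes the separator after a chunk.
while( $A =~ m/ \G ([^,]+) , ([^,]+) , ([^,]+) (?: , | \z ) /gx ){
    push @chunks, "[$1-$2-$3]";
}

print @chunks, "\n";    # [1-2-3][abc-zz-79]
```

The leftover element 444 is not reported, because it cannot form a full triple; this matches the "Without 'undef's" lines of the OUTPUT above.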
Exercises:
  • Squeeze an array.
  • Check if an array is monotonic.
  • Count inversions (the number of index pairs i and j, i < j, such that a_i > a_j).
Some exercises on Codeforces.com platform:
Thank you for reading.

In reply to Using regex as an alternative to usual loops on 1D data by rsFalse
