PerlMonks  

Re: Applying regex to each line in a record.

by haukex (Bishop) on Oct 24, 2020 at 22:26 UTC (#11123139)


in reply to Applying regex to each line in a record.

    Even after multiple attempts, I am at a total loss of how the "m" and "s" works for regex.
  • /m changes the meaning of ^ and $:
    • Without /m,
      • ^ matches only at the very beginning of the string. (This is the same as \A, except that \A is not affected by /m.)
      • $ matches at the very end of the string, but if the string ends with \n, it will match just before and just after this \n. (This is the same as \Z, except that \Z is not affected by /m.)
    • With /m,
      • ^ matches at the very beginning of the string, and just after any \n, except if the \n is the last character in the string. In other words, it matches at the beginning of each line within the string.
      • $ matches just before each \n (in other words, at the end of every line within the string) and at the very end of the string.
  • /s changes the meaning of .:
    • Without /s, . matches any character except a newline, i.e. [^\n]. In other words, a regex of /.+/g is limited to matching one line within the string at a time.
    • With /s, . matches absolutely any character, including \n.
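A minimal sketch of the rules above, assuming a made-up two-line string (the record from the original question is not shown here). It prints the positions at which ^, $, \A and \Z match with and without /m, and counts what /.+/g returns with and without /s. The helper positions() is hypothetical, written only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: two lines, each ending in \n.
my $str = "foo\nbar\n";

# Collect every position at which $re matches in $str by running a //g
# match in scalar context and reading pos() after each success.
sub positions {
    my ($re) = @_;
    my @pos;
    push @pos, pos($str) while $str =~ /$re/g;
    return join ',', @pos;
}

print '^  without /m: ', positions(qr/^/),   "\n";   # -> 0
print '^  with /m:    ', positions(qr/^/m),  "\n";   # -> 0,4
print '$  without /m: ', positions(qr/$/),   "\n";   # -> 7,8
print '$  with /m:    ', positions(qr/$/m),  "\n";   # -> 3,7,8
print '\A with /m:    ', positions(qr/\A/m), "\n";   # -> 0   (/m has no effect on \A)
print '\Z with /m:    ', positions(qr/\Z/m), "\n";   # -> 7,8 (/m has no effect on \Z)

# Without /s, . never matches \n, so each /.+/g match stays within one line;
# with /s, .+ runs straight across the newlines and grabs everything.
my @lines = $str =~ /.+/g;    # ("foo", "bar")
my @whole = $str =~ /.+/gs;   # ("foo\nbar\n")
print scalar(@lines), " matches without /s, ", scalar(@whole), " with /s\n";   # -> 2 ... 1
```

Note how /m adds position 4 (the start of the second line) for ^ and position 3 (the end of the first line) for $, but never a position after the final \n.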

Note that /m and /s are completely independent of one another. Keep in mind that ^ and $ are zero-width matches: they match a position, not a character. For example, with $_ = "a\nb", the match /$/gm will succeed and leave the regex engine's position (pos) just before the \n, so a following /./gs would then match that \n.

Here is some code to play around with (try changing the list of strings in the outer loop and the list of regexes in the inner loop). As you can see from its output, /m only makes a difference if the string contains a \n somewhere other than as its very last character. And of course there's the WebPerl Regex Tester, which visualizes this as well (modern browser required).

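use warnings;
use strict;
use open qw/:std :utf8/;    # UTF-8 on STDIN/STDOUT/STDERR for the symbols below
use Term::ANSIColor qw/colored/;

for my $str ( "a", "a\n", "a\nb", "a\n\nb", "a\nb\nc\n", "a\nb\nc\nd" ) {
    for my $regex ( '/^/g', '/^/gm', '/$/g', '/$/gm', '/./g', '/./gs' ) {
        # Show each character in a two-column cell (" a", " ␊", ...), replacing
        # control characters, space and DEL with their Unicode "control
        # picture" symbols, plus a trailing space for matches at the very end.
        my $o = join( '', map { sprintf "%2s",
                chr( $_<0x21 ? 0x2400+$_ : $_==0x7F ? 0x2421 : $_ ) }
            map ord, split //, $str )." ";
        # The regex is a string that includes its modifiers, so the //g loop
        # is built with a string eval; it saves the start (@-) and end (@+)
        # offsets of every match.
        my @matches;
        eval qq{ push \@matches, [[\@-],[\@+]] while \$str=~$regex ;1} or die $@;
        # Map the matches onto columns of $o: a zero-width match marks the
        # space in front of its position, a longer match marks each
        # character it covers.
        my ($matchcnt,%matches) = (1);
        for my $match (@matches) {
            my @pos = $match->[0][0]==$match->[1][0]
                ? ( $match->[0][0] * 2 )
                : map { $_*2+1 } $match->[0][0]..$match->[1][0]-1;
            for my $p (@pos) {
                die "overlapping matches not supported" if exists $matches{$p};
                $matches{$p} = $matchcnt;
            }
        } continue { $matchcnt++ }
        # Underline the marked columns from right to left, so the escape
        # codes inserted at one column don't shift the columns still to do.
        substr($o, $_, 1) = colored(['underline'], substr($o, $_, 1))
            #"<u>".substr($o, $_, 1)."</u>" # alternative for HTML
            for sort { $b<=>$a } keys %matches;
        printf "%6s: %s\n", $regex, $o;
    }
}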

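Output (the program underlines each match; since underlining doesn't survive as plain text, a ^ under each line marks the underlined positions, and a ^ under a space denotes a zero-width match at that point):

  /^/g:  a 
        ^
 /^/gm:  a 
        ^
  /$/g:  a 
          ^
 /$/gm:  a 
          ^
  /./g:  a 
         ^
 /./gs:  a 
         ^
  /^/g:  a ␊ 
        ^
 /^/gm:  a ␊ 
        ^
  /$/g:  a ␊ 
          ^ ^
 /$/gm:  a ␊ 
          ^ ^
  /./g:  a ␊ 
         ^
 /./gs:  a ␊ 
         ^ ^
  /^/g:  a ␊ b 
        ^
 /^/gm:  a ␊ b 
        ^   ^
  /$/g:  a ␊ b 
              ^
 /$/gm:  a ␊ b 
          ^   ^
  /./g:  a ␊ b 
         ^   ^
 /./gs:  a ␊ b 
         ^ ^ ^
  /^/g:  a ␊ ␊ b 
        ^
 /^/gm:  a ␊ ␊ b 
        ^   ^ ^
  /$/g:  a ␊ ␊ b 
                ^
 /$/gm:  a ␊ ␊ b 
          ^ ^   ^
  /./g:  a ␊ ␊ b 
         ^     ^
 /./gs:  a ␊ ␊ b 
         ^ ^ ^ ^
  /^/g:  a ␊ b ␊ c ␊ 
        ^
 /^/gm:  a ␊ b ␊ c ␊ 
        ^   ^   ^
  /$/g:  a ␊ b ␊ c ␊ 
                  ^ ^
 /$/gm:  a ␊ b ␊ c ␊ 
          ^   ^   ^ ^
  /./g:  a ␊ b ␊ c ␊ 
         ^   ^   ^
 /./gs:  a ␊ b ␊ c ␊ 
         ^ ^ ^ ^ ^ ^
  /^/g:  a ␊ b ␊ c ␊ d 
        ^
 /^/gm:  a ␊ b ␊ c ␊ d 
        ^   ^   ^   ^
  /$/g:  a ␊ b ␊ c ␊ d 
                      ^
 /$/gm:  a ␊ b ␊ c ␊ d 
          ^   ^   ^   ^
  /./g:  a ␊ b ␊ c ␊ d 
         ^   ^   ^   ^
 /./gs:  a ␊ b ␊ c ␊ d 
         ^ ^ ^ ^ ^ ^ ^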
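A minimal sketch of the zero-width point made before the code, using the same $_ = "a\nb" example. Reading pos() and capturing with /(.)/gs instead of the plain /./gs are illustrative choices made here so the result can be printed; they are not part of the original code.

```perl
use strict;
use warnings;

$_ = "a\nb";

# /$/gm is a zero-width match: it succeeds at the position just before the
# \n without consuming any characters, and //g in scalar context leaves
# pos() at that position.
/$/gm or die "no match";
print 'pos after /$/gm:   ', pos($_), "\n";   # -> 1

# The next //g match against the same string resumes from pos() 1, so the
# first character /(.)/gs can see is the \n itself.
/(.)/gs or die "no match";
print "captured a newline\n" if $1 eq "\n";   # -> captured a newline
print 'pos after /(.)/gs: ', pos($_), "\n";   # -> 2
```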

Re^2: Applying regex to each line in a record.
by pritesh_ugrankar (Monk) on Oct 25, 2020 at 16:55 UTC

    Hi Haukex,

    I'm truly at a loss for words. While the code you've written here is truly advanced for me, the output is teaching me a lot.
