Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Modifying muliple matched strings in text

by nysus (Parson)
on Jul 12, 2020 at 12:18 UTC ( #11119206=perlquestion: print w/replies, xml ) Need Help??

nysus has asked for the wisdom of the Perl Monks concerning the following question:

In the following piece of markdown text, I want to replace the spaces with a hyphen and also make the link absolute with a slash in front. So I want to go from:

blah blah [click me](click me) more stuff blah [link here](link here) blah blah ...

to:

blah blah [click me](/click-me) more stuff blah [link here](/link-here) blah blah ...

So this is as far as I got and I'm stymied:

#!/usr/bin/env perl use strict; use warnings; my $text = "blah blah [click me](click me) more stuff\nblah [link here +](link here) blah blah"; $text =~ s/(\[[^]]+]\()/$1\//g; # make links absolute $text =~ /(\[[^]]+]\(\/)([^\)]+)/; my $part1 = $1; my $part2 = $2; my $orig_part2 = $part2; $part2 =~ s/ /-/g; $text =~ s/\Q$part1$orig_part2\E/$part1$part2/g;

The obvious problem with this code is that only the first link gets the space replaced:

blah blah [click me](/click-me) more stuff blah [link here](/link here) blah blah

I'm drawing a blank on how I might loop through all matches to links and modify those matches. Or maybe there is an entirely better way of pulling this off. Thanks!

UPDATE:

OK, I noodled around with this some more. On a lark, I tried a "global" match (which I didn't know existed outside substitution), and came up with this:

my $text = "blah blah [click me](click me) more stuff\nblah [link here +](link here) blah blah"; $text =~ s/(\[[^]]+]\()/$1\//g; # make links absolute my @matches = $text =~ /(\[[^]]+]\(\/)([^\)]+)/g; while (@matches) { my $part1 = shift @matches; my $part2 = shift @matches; my $orig_part2 = $part2; $part2 =~ s/ /-/g; $text =~ s/\Q$part1$orig_part2\E/$part1$part2/g; }

Though it seems to work, I have a hunch this isn't ideal.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;
Click here if you love Perl Monks

Replies are listed 'Best First'.
Re: Modifying muliple matched strings in text (updated)
by haukex (Bishop) on Jul 12, 2020 at 13:36 UTC
    Though it seems to work, I have a hunch this isn't ideal.

    Of course, the best way is to use a proper parser. In this case, the original Markdown is a Perl script itself, and the regex extracted from it looks like the following (modified slightly to put it into qr// form). Of course, now there are tons of different Markdown variations and parsers, and probably several that are more robust than this "just a regex" parser, but as long as your Markdown input is simple enough, it should probably be ok. <update> There are caveats to this approach, though - for example, the following regex will also operate on code blocks! As usual, more representative sample data will result in more accurate solutions :-) </update>

    my $g_nested_brackets; $g_nested_brackets = qr{ (?> # Atomic matching [^\[\]]+ # Anything other than brackets | \[ (??{ $g_nested_brackets }) # Recursive set of nested brackets \] )* }x; my $anchors = qr{ ( # wrap whole match in $1 \[ ($g_nested_brackets) # link text = $2 \] \( # literal paren [ \t]* <?(.*?)>? # href = $3 [ \t]* ( # $4 (['"]) # quote char = $5 (.*?) # Title = $6 \5 # matching quote )? # title is optional \) ) }xs;

    I've taken this regex and modified it to modernize it a bit and only capture the things we're interested in:

    use warnings; use strict; my $anchors = qr{ (?(DEFINE) (?<nested_brackets> (?> [^\[\]]+ | \[ (?&nested_brackets) \] )* ) ) \[ (?<text> (?&nested_brackets) ) \] \( (?<link> [ \t]* <? .*? >? [ \t]* (?: (?<titlequote>['"]) .*? \k<titlequote> )? ) \) }xs; my $input = <<'END'; blah blah [click me](click me) more stuff blah [link here](link here) blah blah END my $expect = <<'END'; blah blah [click me](/click-me) more stuff blah [link here](/link-here) blah blah END (my $output = $input) =~ s{$anchors}{ my ($t, $l) = @+{qw/ text link /}; $l =~ s/\s+/-/g; "[$t](/$l)" }ge; use Test::More tests=>1; is $output, $expect;

      Holy cow, there's some really advanced regex stuff going on here I've never seen before. I will study this closely and pick up some new tricks. Thanks.

      I thought about researching a markdown parser but I figured it was easier, for now, just to roll my own before going down a big wormhole.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
      $nysus = $PM . ' ' . $MCF;
      Click here if you love Perl Monks

Re: Modifying muliple matched strings in text
by LanX (Sage) on Jul 12, 2020 at 13:00 UTC
    Hint:

    Call a function in the substitution part (activated by /e modifier) which returns your transformation.

    s/PATTERN/replace($1,$2)/eg

    Not sure if you even need to pass the matchgroups $1,... into the sub, but it's certainly more readable.

    Edit

    Pattern should match the links [$1]($2) (pseudo syntax).

    I recommend using the /x flag for readability.

    The transformations of $2 should happen inside the replace() sub

    HTH!

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Very cool. Didn't know that was even possible (well probably read about it once and then forgot). Thanks.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
      $nysus = $PM . ' ' . $MCF;
      Click here if you love Perl Monks

Re: Modifying muliple matched strings in text
by tybalt89 (Prior) on Jul 12, 2020 at 15:01 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11119206 use warnings; my $text = "blah blah [click me](click me) more stuff\nblah [link here +](link here) blah blah"; s!]\(\K.*?\)! '/' . $& =~ tr~ ~-~r !ge for $text; print "$text\n";

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11119206]
Approved by LanX
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (3)
As of 2021-11-29 19:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?