I see. If your data already contains HTML tags and you want to skip over them, the best plan is to use a proper HTML parser.
If you cannot (or will not) use a parser, here is a crude plan B that works for your supplied dataset. It inserts a newline after every $len visible characters and does not count anything between < and >:
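#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;

# Each case: an input string and the expected output, with a newline
# after every $len visible (non-tag) characters.
my @set = (
    {
        in   => 'ABCDEFGHI',
        want => "ABCD\nEFGH\nI\n",
    },
    {
        in   => '<b>ABC</b>DEFGHI',
        want => "<b>ABC</b>D\nEFGH\nI\n",
    },
);
plan tests => scalar @set;

my $len = 4;    # visible characters per line
for my $x (@set) {
    my $i     = 0;     # visible characters seen so far
    my $out   = '';
    my $intag = 0;     # non-zero while inside <...>
    for my $c (split //, $x->{in}) {
        $out .= $c;
        $intag++ if $c eq '<';
        $intag-- if $c eq '>';
        # Characters inside a tag, and the closing '>', are not counted.
        next if $intag || $c eq '>';
        $i++;
        $out .= "\n" unless $i % $len;
    }
    # Terminate the last line, without adding a blank line when the
    # visible length is an exact multiple of $len.
    $out .= "\n" unless $out =~ /\n\z/;
    is($out, $x->{want});
}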
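This isn't robust (a > inside a quoted attribute value, as in <a title="x>y">, throws the tag tracking off, and an entity such as &amp; is counted as five characters), and it is rather C-ish for my taste, but it serves to illustrate the general approach. Have fun with it.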
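For comparison, the same crude idea can be written less C-ishly as a single substitution. This is only a sketch, not the code above: wrap_visible is a hypothetical name, and like the loop it assumes that every < starts a tag which ends at the next >. Each match is either a whole tag, which is copied through, or one visible character, which bumps a counter. The final newline is only added if the output does not already end with one.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): insert a newline after
# every $len visible characters; anything matching <...> is copied
# through without being counted.
sub wrap_visible {
    my ($in, $len) = @_;
    my $count = 0;
    # Each match is either a whole tag ($1, copied verbatim) or a single
    # non-'<' character ($2), which bumps the counter and gets a newline
    # appended whenever the counter reaches a multiple of $len.
    (my $out = $in) =~ s{(<[^>]*>)|([^<])}{
        defined $1 ? $1 : $2 . (++$count % $len ? '' : "\n")
    }ge;
    # Terminate the last line, without adding a blank line when the
    # visible length is an exact multiple of $len.
    $out .= "\n" unless $out =~ /\n\z/;
    return $out;
}

print wrap_visible('<b>ABC</b>DEFGHI', 4);
```

The /e modifier makes the replacement part Perl code rather than a string, which keeps the counting inside the substitution instead of a character-by-character loop. It is no more robust than the loop: the same attribute-value and entity caveats apply.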
Update: edited source for improved generality.