use strict;
use warnings;
use HTML::Truncate;
my $snippet = <<'HTML';
First a technological update - The code that drives this site is available for free on <a href="https://github.com/amacks/vatican_mss">GitHub</a>. I've just merged in a rather complex change to create proper shelfmark sorting, fixing things like numbers-stored-as-strings and handling roman numerals. Two problems yet unfixed are Fonds <strong>P.I.O</strong>, with the middle "I" reading as a roman numeral, and <strong>Arch.Cap.S.Pietro</strong> where sub-set "I" is read as roman 1 and everything gets confused.</p>
HTML
my $ht = HTML::Truncate->new();
$ht->chars(100);
print $ht->truncate($snippet), $/;
__END__
First a technological update - The code that drives this site is available for free on <a href="https://github.com/amacks/vatican_mss">GitHub</a>. I've…
HTML::Truncate. It's been a long time since I used it for anything, but it helped me with similar needs years ago. There are lots of lower-level tools for this kind of thing, but with them you end up doing a lot of #text-node character counting and such.
That’s why it breaks much later than your raw substr: it counts only displayed characters, not HTML markup.
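To see why a raw substr goes wrong, here is a toy core-Perl sketch. It is not how HTML::Truncate works internally; the sample string, the 30-character limit, and the naive tag-stripping regex are all assumptions for illustration. It compares where a cut lands when you count the raw HTML versus only the displayed text.

```perl
use strict;
use warnings;

# Hypothetical sample, shortened from the input above.
my $html  = 'available for free on <a href="https://github.com/amacks/vatican_mss">GitHub</a>. More text.';
my $limit = 30;    # hypothetical character budget

# A raw substr counts the markup too, so the cut can land inside a tag.
my $raw_cut = substr($html, 0, $limit);

# Naive tag stripping to get the displayed text.
# Assumption: no '>' inside attribute values, no comments, no entities.
(my $visible = $html) =~ s/<[^>]*>//g;

print "raw HTML length:  ", length($html),    "\n";
print "displayed length: ", length($visible), "\n";
print "raw cut:          $raw_cut\n";
print "displayed cut:    ", substr($visible, 0, $limit), "\n";
```

Here the raw cut stops at `<a href=`, in the middle of the tag, while the displayed cut ends cleanly after "GitHub. ". A real truncator also has to re-emit the open tags and close them, which is the bookkeeping HTML::Truncate does for you.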
Update: if we add the opening paragraph tag that is missing from your input, this is the output:
<p>First a technological update - The code that drives this site is available for free on <a href="https://github.com/amacks/vatican_mss">GitHub</a>. I've…</p>