PerlMonks  

inserting html tags after X characters

by Anonymous Monk
on Nov 01, 2002 at 15:48 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I am trying to break up a long character string in an html document when it is displayed on a web page by inserting a <BR> tag after every 80 characters that occur in the document outside the pre-existing html tags. I've tried this:
s/(>[^<]*)([.\n]{80})/$1$2<BR>/sg
but the problem is that it "resets" its counter every time it encounters an html tag. I.e., if it finds 72 characters and then an html tag, I want it to resume counting at 73 once it leaves the tag; instead, it will not insert a <BR> until it finds an "unbroken" string of 80 characters. Is there a way to do this with a single regular expression, or will I need multiple lines of code? Thanks.

edit (broquaint): added formatting + <code> tags

Replies are listed 'Best First'.
Re: inserting html tags after X characters
by fruiture (Curate) on Nov 01, 2002 at 16:10 UTC

    Don't try to parse HTML with simple regular expressions. I recommend using something like HTML::TreeBuilder to parse the HTML correctly and then doing your thing on the tree. This way you can be sure not to destroy valid HTML, and you can, for example, notice BR, HR, P etc. elements to make your counter clever...

    --
    http://fruiture.de
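
To illustrate the "counter that survives tags" idea from the question, here is a minimal sketch in core Perl. It is not a parser and does not replace the HTML::TreeBuilder advice above: it assumes simple markup in which "<" and ">" only ever delimit tags (no comments, scripts, or angle brackets inside attribute values). The helper name break_text and its parameters are hypothetical, chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $break after every $width text characters,
# carrying the running count across tags instead of resetting it.
# Assumption: '<' and '>' appear only as tag delimiters.
sub break_text {
    my ($html, $width, $break) = @_;
    $break = '<BR>' unless defined $break;

    my $count = 0;
    my $out   = '';

    # The capturing group makes split return the tags as well as
    # the text between them.
    for my $piece (split /(<[^>]*>)/, $html) {
        if ($piece =~ /^</) {
            $out .= $piece;          # copy tags through, uncounted
            next;
        }
        for my $char (split //, $piece) {
            $out .= $char;
            if (++$count == $width) {
                $out .= $break;      # break after every $width characters
                $count = 0;
            }
        }
    }
    return $out;
}

# Width 5 keeps the demonstration short; the question uses 80.
print break_text('abc<b>defg</b>hij', 5), "\n";
# prints: abc<b>de<BR>fg</b>hij<BR>
```

Every character outside a tag is counted, including newlines and each character of an entity such as &amp;amp;, so a real document would probably need the smarter counter that the tree-based approach allows.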
Re: inserting html tags after X characters
by Rich36 (Chaplain) on Nov 01, 2002 at 17:52 UTC

    A simpler option than using regular expressions to insert the breaks is to use Text::Wrap. Once you've extracted the string of text from the HTML tags, use Text::Wrap (setting the columns to 80 characters) on the text. You could then replace each "\n" in the string with "<br>\n" to get the HTML line breaks.


    «Rich36»
      Just using Text::Wrap on HTML data might not do any damage by itself -- it would just be mucking with whitespace characters.

      But doing something like s/\n/<br>\n/g; either with or without Text::Wrap would be a really bad idea. You could end up with <br> embedded inside other tags.

      Sorry, but I had to "--" that suggestion. I misunderstood your suggestion.

      update: re-read the idea, realized that you were actually suggesting using Text::Wrap on just the text data, after it was separated from the HTML tags. Right. Sorry about the misunderstanding. (The first reply still seems the better starting point.)
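
As a rough sketch of the Text::Wrap suggestion as clarified in the update, the snippet below wraps text that has already been separated from its tags and only then turns the newlines into <br> tags. The helper name wrap_with_br is hypothetical. Text::Wrap keeps every line to at most $Text::Wrap::columns - 1 characters, so the sketch sets the column count one higher than the desired width.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: wrap plain text (no tags in it) to at most
# $width characters per line, then mark each line break with <br>.
sub wrap_with_br {
    my ($text, $width) = @_;

    # Text::Wrap produces lines of at most $columns - 1 characters.
    local $Text::Wrap::columns = $width + 1;

    my $wrapped = wrap('', '', $text);
    $wrapped =~ s/\n/<br>\n/g;   # safe here: the input contains no tags
    return $wrapped;
}

my $text = join ' ', ('word') x 50;
print wrap_with_br($text, 80), "\n";
```

The substitution is only safe because the input is plain text; run over a whole HTML document, it could place a <br> inside another tag, which is the objection raised above.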
