Removing space at end of field and converting to text

by htmanning (Scribe)
on Apr 12, 2012 at 23:27 UTC (#964832)
htmanning has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I have a form that people enter large amounts of text in. I convert any line breaks to HTML line break tags before writing to the file. Most people end up hitting enter several times at the end of the field and I'm unable to strip all of those line breaks out, so I end up with four or five HTML line breaks at the end. I've tried this but it doesn't work.
$field =~ s/\\n\\n\\n/<br\>/;
How can I strip whatever whitespace and line breaks happen to be at the end of the text? Also, a lot of people copy and paste directly from Word into the form (even though we ask them not to). I strip out what I can, but there are always some characters I haven't thought of. Is there a way to automatically convert the input to plain text so it doesn't screw up my RSS feeds? Thanks.

Re: Removing space at end of field and converting to text
by GrandFather (Cardinal) on Apr 12, 2012 at 23:37 UTC

    Read perlretut for a refresher on regular expressions. You can use $field =~ s/\n+/<br>/g; to replace runs of newlines with single break tags which may be what you want. Note that you probably want to use \n instead of \\n to replace newline characters instead of pairs of \ and n characters.

    Extra newlines should not affect how HTML/XML is parsed so the final rendered text should not be affected by "extra" white space including newline characters.
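
A minimal sketch of the point above, assuming the form data arrives with real newline characters. The helper name collapse_newlines is hypothetical and only there for illustration; it applies the s/\n+/<br>/g substitution and also shows why a pattern written with \\n never matches a real newline.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each run of newlines with one <br> tag.
sub collapse_newlines {
    my ($field) = @_;
    $field =~ s/\n+/<br>/g;
    return $field;
}

my $text = "first line\n\n\nsecond line\n";

# In a regex, \\n means a literal backslash followed by the letter n,
# so it does not match the newline characters in $text.
print "doubled-backslash pattern did not match\n" unless $text =~ /\\n/;

print collapse_newlines($text), "\n";
```

Note that this also turns the trailing newline into a tag, so trailing whitespace still needs to be stripped first if no break is wanted at the end.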

    True laziness is hard work
Re: Removing space at end of field and converting to text
by Riales (Hermit) on Apr 12, 2012 at 23:41 UTC

    This might work for the first problem:

    $field =~ s/(?:\\n)+$/<br\>/;

    EDIT: Forgot to note that this only works if your newline characters really are being seen at that point as \\n and not simply \n.

    As for your second question, I suspect there may be a CPAN module that could help, but don't know of one specifically...

Re: Removing space at end of field and converting to text
by GrandFather (Cardinal) on Apr 13, 2012 at 00:02 UTC

    For your second problem it may be that the best approach is to use a white list - a list of characters that are OK. You can use the translate operator tr (see Quote Like Operators - look for tr/ ) to delete all characters from a string that are not in a given list.
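
A rough sketch of the white list idea with tr. The helper name keep_whitelisted and the choice of white list (printable ASCII plus tab, line feed and carriage return) are assumptions for illustration, not something specified in this thread; adjust the list to whatever the feed can safely carry.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every character that is not on the white list.
# Assumed white list: printable ASCII plus tab, LF and CR.
sub keep_whitelisted {
    my ($field) = @_;
    # /c complements the listed characters, /d deletes whatever then matches.
    $field =~ tr/\x20-\x7E\t\n\r//cd;
    return $field;
}

# Sample input with Word-style curly quotes and an en dash.
my $pasted = "\x{201C}Quoted\x{201D} text \x{2013} from Word\n";
print keep_whitelisted($pasted);
```

Deleting is lossy: the curly quotes simply disappear. If they should survive as plain quotes, they would have to be translated before the delete step.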

Re: Removing space at end of field and converting to text
by dineen5214 (Initiate) on Apr 13, 2012 at 01:37 UTC
    When text is copied from Word on Windows into a form and passed to a backend script, the line endings may arrive as "\r\n" pairs, so a search for consecutive \n\n might not match. If that is the case, try looking for the paired sequence \r\n\r\n instead, and replace that sequence with an HTML line break tag. See: http://www.perlmonks.org/?node_id=549385
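
One way to sidestep the \r\n question, sketched under the assumption that it is acceptable to normalize every line ending to a plain \n before doing anything else. The helper name normalize_newlines is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn CRLF pairs and lone CRs into plain LF.
sub normalize_newlines {
    my ($field) = @_;
    $field =~ s/\r\n?/\n/g;
    return $field;
}

my $from_windows = "para one\r\n\r\npara two\r\n";
my $normalized   = normalize_newlines($from_windows);

# After normalizing, a single newline pattern covers every case.
(my $html = $normalized) =~ s/\n+/<br>/g;
print "$html\n";
```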
Re: Removing space at end of field and converting to text
by ww (Bishop) on Apr 13, 2012 at 02:07 UTC
    "a lot of people copy and paste directly from Word... "

    GrandFather's suggestion that you use a whitelist and tr/// to remove anything NOT on the whitelist is a decent solution, assuming you don't have any need for chars outside printable ASCII.

    An alternative -- draconian, but even less INsecure (all other things being equal) -- is to flat-out reject any entry that contains (for example) MS Word artifacts -- smart quotes and so forth -- and return the poster to the data entry form with an explanation that your form accepts only text. (Posters need not be excessively inconvenienced: any M$ Word doc can be saved as text within Word itself and, when copy-pasted from the text version, should contain none of the chars which "screw up (your) RSS feeds.")
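
The reject-outright approach might look something like this. The validator name is_plain_text and the allowed character set (printable ASCII plus tab, LF and CR) are assumptions chosen for the example; how the form is redisplayed is left out.

```perl
use strict;
use warnings;

# Hypothetical validator: true only if the field contains nothing but
# printable ASCII, tab, LF and CR. The allowed set is an assumption.
sub is_plain_text {
    my ($field) = @_;
    return $field !~ /[^\x20-\x7E\t\n\r]/;
}

for my $entry ("Plain text entry\n", "\x{201C}Smart quotes\x{201D}") {
    print is_plain_text($entry)
        ? "accepted\n"
        : "rejected: please paste plain text only\n";
}
```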

Re: Removing space at end of field and converting to text
by tangent (Deacon) on Apr 13, 2012 at 10:48 UTC
    Assuming you don't want any linebreaks or whitespace at the end of the field, why not just:
    $field =~ s/\s+$//;
    For linebreaks in the middle of the text, and coming from different machines, this works for me:
    if ($field =~ m/\r/) {
        $field =~ s/\n//g;
        $field =~ s/\r/<br>/g;
    }
    elsif ($field =~ m/\n/) {
        $field =~ s/\n/<br>/g;
    }
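
The two steps above can be combined into one routine. This is a sketch, not tangent's exact code: the helper name field_to_html is hypothetical, it anchors with \z instead of $, and it uses a single alternation in place of the if/elsif so that mixed line endings in one field are also handled.

```perl
use strict;
use warnings;

# Hypothetical helper: trim the end of the field, then convert line breaks.
sub field_to_html {
    my ($field) = @_;
    $field =~ s/\s+\z//;            # drop trailing whitespace and line breaks
    $field =~ s/\r\n|\r|\n/<br>/g;  # CRLF is listed first so it becomes one tag
    return $field;
}

print field_to_html("line one\r\nline two\n\n\n  \r\n"), "\n";
```

Trimming before converting matters: once the trailing newlines have become tags, \s+ no longer matches them.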
Re: Removing space at end of field and converting to text
by Anonymous Monk on Apr 13, 2012 at 13:16 UTC
    When speed matters: $field = unpack 'A*', $field
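
A small illustration of what that unpack call does, as far as I understand the 'A' template: it returns the string with trailing whitespace and NUL characters removed, and leaves leading and inner whitespace alone. The wrapper name rtrim_unpack is made up for this example, and no speed claim is tested here.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the unpack trick: strips trailing whitespace
# (including newlines) and NULs, keeps everything else.
sub rtrim_unpack {
    my ($field) = @_;
    return unpack 'A*', $field;
}

my $field = "  some text\nmore text \n\n\n";
print "[", rtrim_unpack($field), "]\n";
```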
      Thanks for all the suggestions!
