PerlMonks  

Matching positions with lookarounds

by blahblah (Friar)
on Mar 22, 2004 at 15:29 UTC ([id://338644]=perlquestion)

blahblah has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I have a long string that I am trying to break into 76-character lines. The hitch is that a line may only be broken at a whitespace character. I can easily break a line on any character:
$finaloutput =~ s/(.{76})/$1\r\n/g;
but finding the whitespace closest to, but not past, the 76th character is driving me nuts. I feel confident that there must be a regexy way to do it. I've been poring over Mr. Friedl's great "Mastering Regular Expressions" and feel like I could do this with some lookarounds, but the incantation escapes me.
Any ideas?

Thanks

Replies are listed 'Best First'.
Re: Matching positions with lookarounds
by ctilmes (Vicar) on Mar 22, 2004 at 15:33 UTC
Easy Solution
by Sprenger000 (Initiate) on Mar 22, 2004 at 16:43 UTC
    If you just want to split on "whitespace as close to 76 characters as possible" then you can use:

    $finaloutput =~ s/(.{0,76}\s)/$1\r\n/g;

    This simply asks for "between zero and 76 characters followed by whitespace". Since the match is greedy by default, it will always grab as much as it can. If the 76 characters should include the trailing space, change the upper limit to 75; I don't know if that's the case.
      n.b. Please drop the "\r". You should never hardcode a CR into plain text in Perl. When you print to a filehandle that has not had binmode applied, on a platform that wants the CRs, "\n" is converted to CRLF automatically; let that take care of it. "\n" is the logical end-of-line character on any platform.

      But, that aside, even though you're well on the way, your program has a bug. It will try to add a linebreak in the last line, even if it's narrow enough to fit onto one line. Why would it do that? Because

      $_ = "Hello, world!"; /.{0,76}\s/;
      matches the space between "Hello," and "world!".
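      A minimal sketch of that failure case, assuming the substitution from the parent reply with "\n" in place of "\r\n" (the variable names are only illustrative):

```perl
use strict;
use warnings;

# A short line that already fits within 76 columns.
my $text = "Hello, world!";

# The substitution from the parent reply, with "\n" instead of "\r\n".
(my $wrapped = $text) =~ s/(.{0,76}\s)/$1\n/g;

# Make the inserted newline visible before printing.
(my $visible = $wrapped) =~ s/\n/\\n/g;
print "$visible\n";    # prints: Hello, \nworld!
```

      The greedy .{0,76} first grabs the whole string, then backtracks until a whitespace character follows, so the break lands after "Hello, " even though no break was needed.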

      I'd change the regexp to the following:

      s/[^\n\S]*(.{1,76})(?:\s|$)/$1\n/g;
      with the following rationale:
      • It'll match as many characters up to 76, until the end of the string (!) or to the last whitespace character in that substring, whichever is longer
      • /./ doesn't match newlines, thus it'll leave embedded short lines (ending with a "\n") unchanged, and try to match again, directly after the following newline.
      • You likely don't want leading whitespace after a wrapped line — though you probably will want to keep embedded empty lines.
      • You're not interested in autogenerated empty lines, hence the requirement for at least one character. /.{0,76}(\s|$)/g tends to match twice at the end of the string: first with a non-empty string, till the end, and then again with an empty string. BTW IMHO this is a bug — I don't think anybody actually wants this behaviour.
      But, I admit: mine doesn't quite look as easy as yours, any more. :)
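      The points above can be tried out with a small sketch like the following. The helper name wrap_text and the sample data are made up for illustration; only the substitution itself comes from this reply.

```perl
use strict;
use warnings;

# wrap_text is a hypothetical helper name, used here only for illustration.
sub wrap_text {
    my ($text) = @_;
    # Skip leading non-newline whitespace, capture 1 to 76 characters, and
    # require whitespace or end-of-string right after the captured text.
    $text =~ s/[^\n\S]*(.{1,76})(?:\s|$)/$1\n/g;
    return $text;
}

# 40 four-letter words joined by single spaces: 199 characters, no newlines.
my $long = join ' ', ('word') x 40;
print wrap_text($long);

# A line that already fits stays in one piece (a "\n" is appended).
print wrap_text("Hello, world!");

# Counting matches on a short string: allowing an empty match gives one
# extra, empty match at the end; requiring one character does not.
my $with_zero = () = "Hello, world!" =~ /.{0,76}(?:\s|$)/g;
my $with_one  = () = "Hello, world!" =~ /.{1,76}(?:\s|$)/g;
print "matches with {0,76}: $with_zero, with {1,76}: $with_one\n";
```

      With this sample input the long string comes out as three lines, none longer than 76 characters, and the last line reports 2 matches for the {0,76} form against 1 for the {1,76} form. Note that a single word longer than 76 characters is not handled gracefully by this pattern.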
      Er, that should be =~, not =, of course. Typo!
Re: Matching positions with lookarounds
by artist (Parson) on Mar 22, 2004 at 16:13 UTC
    warning: Untested
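    $_ = $finaloutput;
    my $replace_from  = ' ';
    my $replace_to    = "\n";    # double quotes; '\n' would insert a literal backslash and n
    my $last_position = 76;
    # last space within the first 76 characters
    my $location = rindex(substr($_, 0, $last_position), $replace_from);
    # breaks the first line only; skipped when no space was found
    s/(.{0,$location})./$1$replace_to/ if $location >= 0;
    print;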
Re: Matching positions with lookarounds
by podian (Scribe) on Mar 22, 2004 at 17:41 UTC
    This sounds like a homework question to me. I remember doing it in my first year of college.

    Any thoughts?

    Update: do you need to have a regular expression?

    If not, you need to start at column 76 and go forward or backward until you find a whitespace character.
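    A rough sketch of that non-regex approach, assuming words are separated by single spaces and the text has no leading whitespace. The name wrap_by_scanning and its $width parameter are hypothetical, and the forward scan is only a fallback for chunks with no earlier space:

```perl
use strict;
use warnings;

# wrap_by_scanning is a hypothetical name; $width defaults to 76 as in the question.
sub wrap_by_scanning {
    my ($text, $width) = @_;
    $width = 76 unless defined $width;
    my @lines;
    while (length($text) > $width) {
        # Scan backward from column $width for the last space that keeps
        # the line at or under $width characters.
        my $cut = rindex($text, ' ', $width);
        # Nothing behind us: scan forward for the next space instead.
        $cut = index($text, ' ', $width) if $cut < 0;
        last if $cut < 0;    # one unbreakable chunk remains
        push @lines, substr($text, 0, $cut);
        $text = substr($text, $cut + 1);
    }
    push @lines, $text if length $text;
    return join "\n", @lines;
}

# 40 four-letter words joined by single spaces: 199 characters.
my $long = join ' ', ('word') x 40;
print wrap_by_scanning($long), "\n";
```

    Unlike the substitutions above, this one joins the lines with "\n" and does not add a trailing newline; a chunk longer than the width is left overlong instead of being split mid-word.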

      Oh please. If this were homework I'd be in college, and boy wouldn't THAT be great! No, amazingly I'm this dumb even years after college. ;) Guess I should have gone for that CS degree....
        Not dumb, just learning, like every other monk. If you keep finding yourself faced with "This should be easy!" problems, you might want to consider arming yourself with The Perl Cookbook from O'Reilly.

        --
        Spring: Forces, Coiled Again!

Node Type: perlquestion [id://338644]
Approved by delirium
Front-paged by broquaint