PerlMonks  

Split string using regex on \n or max line length

by Anonymous Monk
on Feb 10, 2017 at 07:47 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that splits strings into an array based on a specified line length. I'm using the following:
@invoice_note_lines = $invoice_data_ref->{'invoice_note'} =~ /(.{1,$invoice_note_line_length}\W)/gms;
This works well, but I would also like to split if \n exists in the string. Basically, I want to split on \n and still provide a maximum length per string in the array. What is the simplest way to do this?

Re: Split string using regex on \n or max line length (updated x3)
by haukex (Archbishop) on Feb 10, 2017 at 08:04 UTC

    Hi Anonymous,

    The /s modifier means:

    Treat the string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.

    So if I understand your question correctly, simply removing that modifier should do what you want:

    my $len = 20;
    # 345678901234567890
    my $text = <<'ENDTXT';
    One
    Two Three Four Five Six Seven
    Eight Nine Ten Eleven Twelve Thirteen Fourteen
    ENDTXT
    my @lines = $text =~ /(.{1,$len}\W)/gm;
    # remove leftover whitespace at ends of lines
    s/\s+$// for @lines;
    print "<$_>\n" for @lines;
    __END__
    <One>
    <Two Three Four Five>
    <Six Seven>
    <Eight Nine Ten>
    <Eleven Twelve>
    <Thirteen Fourteen>
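    To make the effect of /s visible, here is a small side-by-side sketch that runs the same regex on a three-line sample text (similar to the one above, hard-coded as a plain string) with and without /s. With /s, "." may consume a "\n", so a single chunk can span two input lines; without it, every "\n" ends a chunk.

```perl
use strict;
use warnings;

my $len  = 20;
my $text = "One\nTwo Three Four Five Six Seven\n"
         . "Eight Nine Ten Eleven Twelve Thirteen Fourteen\n";

# With /s, "." also matches "\n", so a chunk may run across a line break.
my @with_s = $text =~ /(.{1,$len}\W)/gms;

# Without /s, "." stops at "\n"; the "\n" itself can still be taken by \W.
my @without_s = $text =~ /(.{1,$len}\W)/gm;

# Remove leftover whitespace at the ends of the chunks (as in the reply).
s/\s+$// for @with_s, @without_s;

print "with /s:\n";
print "<$_>\n" for @with_s;
print "without /s:\n";
print "<$_>\n" for @without_s;
```

    In the /s output, "<One" and "Two Three Four>" end up in the same element, separated by an embedded newline, which is exactly what the question wanted to avoid.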

    Note that there's also the core module Text::Wrap that you could take a look at. Update 2: Here is an example of Text::Wrap that is similar to the above. Uncomment the tr/// operation to reflow the entire text:

    use Text::Wrap;
    $Text::Wrap::columns = 20;
    #$text=~tr/\n/ /;
    print wrap('', '', $text);
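    Since the question asks for an array rather than printed output, one way to get that from Text::Wrap is to split the wrapped string on "\n". A minimal sketch; the helper name wrap_to_lines is made up for illustration. The Text::Wrap documentation says each resulting line is at most $Text::Wrap::columns - 1 characters long, hence the + 1, and setting $Text::Wrap::unexpand to false stops wrap() from turning runs of spaces back into tabs.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: wrap $text to at most $max characters per line,
# keeping the line breaks already present, and return the lines as a list.
sub wrap_to_lines {
    my ( $text, $max ) = @_;
    local $Text::Wrap::columns  = $max + 1;   # lines are at most columns - 1 long
    local $Text::Wrap::unexpand = 0;          # do not turn spaces back into tabs
    return split /\n/, wrap( '', '', $text );
}

my $text = "One\nTwo Three Four Five Six Seven\n"
         . "Eight Nine Ten Eleven Twelve Thirteen Fourteen\n";
my @lines = wrap_to_lines( $text, 20 );
print "<$_>\n" for @lines;
```

    Unlike the regex, wrap() breaks a word that is longer than the limit instead of skipping it; that behaviour is controlled by $Text::Wrap::huge, which defaults to 'wrap'.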

    Update 1: The following modification to the regex eliminates the need for the s/\s+$// for @lines; above (works because \s includes newline).

    Update 3: Actually, the following doesn't behave the same way the original regex does. It's hard to make an alternative suggestion without knowing what your intentions are here: do you definitely want to include one more non-word character at the end of the matched string, even whitespace, or did you perhaps mean a word boundary \b? If you could provide some sample input and expected output for different cases, and/or explain more about how you want the splitting to occur, that would help in making an alternate suggestion.

    my @lines = $text =~ /(.{1,$len})\s+/gm;
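    To illustrate the difference Update 3 refers to, here is a sketch with made-up input and a deliberately small $len of 6. The original \W regex can end a chunk at a hyphen, while the \s+ variant can only end one at whitespace, so it skips ahead until it finds at most $len characters followed by whitespace, and "well" is silently dropped.

```perl
use strict;
use warnings;

my $len  = 6;
my $text = "well-known fact\n";

# The OP's regex (without /s): a chunk may end after any non-word character.
my @orig = $text =~ /(.{1,$len}\W)/gm;
s/\s+$// for @orig;

# The Update 1 variant: a chunk must be followed by whitespace.
my @ws = $text =~ /(.{1,$len})\s+/gm;

print "\\W  version: ", join( '|', @orig ), "\n";
print "\\s+ version: ", join( '|', @ws ),   "\n";
```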

    Hope this helps,
    -- Hauke D

      > Update: The following modification to the regex eliminates the need for the s/\s+$// for @lines; above (works because \s includes newline):

      I still plead for keeping the \W, ideally made optional (\W?), so that a final punctuation mark or hyphen is still captured even if that exceeds the limit by one character.

        Hi flowdy,

        You're right that my second regex didn't operate like the OP's, but if you write /(.{1,$len}\W?)/gm, then that may split in the middle of a word, like your suggestion and unlike the OP's regex. I've updated my node.
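        A small illustration of this point, using made-up input that contains a "word" longer than $len = 5. The \W? version cuts inside words; the OP's \W version never does, but only because the regex engine moves its start position forward until a non-word character is within reach, so the leading "abc" is silently dropped.

```perl
use strict;
use warnings;

my $len  = 5;
my $text = "abcdefgh ij\n";

# flowdy's suggestion: the trailing non-word character is optional.
my @optional = $text =~ /(.{1,$len}\W?)/gm;

# The OP's regex (without /s): a non-word character is required at the end.
my @required = $text =~ /(.{1,$len}\W)/gm;

s/\s+$// for @optional, @required;
print "\\W? : ", join( '|', @optional ), "\n";   # splits inside words
print "\\W  : ", join( '|', @required ), "\n";   # loses "abc"
```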

        Thanks,
        -- Hauke D

      > simply removing that modifier

      But the Best Practices tell us to always keep it!

      ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Yes, that is very helpful. Removing the /s modifier took care of the problem. Thank you.
Re: Split string using regex on \n or max line length
by flowdy (Scribe) on Feb 10, 2017 at 08:08 UTC

    Hi,

    I would do it in a string digestion loop:

    # length() so that a chunk consisting of just "0" does not end the loop early
    while ( length( my $ltd = substr $string, 0, $LIMIT, '' ) ) {
        my @parts = split /\n/, $ltd;
        ... # do sth. with @parts, otherwise they will be sad
    }

    Instead of substr, there are certainly other ways of splitting at fixed length, but this is the first one that came to mind.
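    A runnable version of the loop, with a made-up $string and $LIMIT and hypothetical @chunks/@out arrays to collect the results, shows that the chunks are cut at fixed offsets (even inside words) before being split on "\n". The last statement shows one of the "other ways" of fixed-length splitting, a /(.{1,$LIMIT})/gs match, which yields the same chunks.

```perl
use strict;
use warnings;

my $LIMIT    = 10;
my $original = "Hello\nworld, this is a test";
my $string   = $original;    # the loop below consumes $string

my ( @chunks, @out );
# 4-arg substr removes the first $LIMIT characters and returns them;
# length() keeps a chunk of just "0" from ending the loop early.
while ( length( my $ltd = substr $string, 0, $LIMIT, '' ) ) {
    push @chunks, $ltd;
    my @parts = split /\n/, $ltd;
    push @out, @parts;
}
print "<$_>\n" for @out;

# An alternative fixed-length split: /s lets "." match "\n" as well.
my @regex_chunks = $original =~ /(.{1,$LIMIT})/gs;
```

    Note how "world" comes out as "worl" and "d, this is": fixed-length cutting does not look for word boundaries.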

    Update: As $string is rewritten to another place in memory on every iteration, haukex' solution probably performs better. Still, I would prefer a while loop; swallowing everything into an array first is seldom needed.
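    The while-loop preference can also be combined with haukex's regex: in scalar context, m//g returns one match per call and resumes from pos($text), so each line can be handled as soon as it is found, without building the whole array first. A sketch with a hard-coded sample text similar to haukex's; the @seen array exists only so the result can be checked.

```perl
use strict;
use warnings;

my $len  = 20;
my $text = "One\nTwo Three Four Five Six Seven\n"
         . "Eight Nine Ten Eleven Twelve Thirteen Fourteen\n";

my @seen;
# Scalar-context m//g: one match per iteration, resuming at pos($text).
while ( $text =~ /(.{1,$len}\W)/g ) {
    ( my $line = $1 ) =~ s/\s+$//;   # strip the trailing whitespace/newline
    print "<$line>\n";               # ... process $line here ...
    push @seen, $line;
}
```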

Node Type: perlquestion [id://1181636]
Approved by gargle