PerlMonks  

Using split() to divide a string by length

by japhy (Canon)
on Apr 13, 2006 at 19:22 UTC

I caught the tail end of a discussion on irc.freenode.net's #perl channel about how to split a string into equal-sized chunks. Some people were trying to use split() to accomplish this; one person fell prey to this:
my $string = "abcdefghi";
my @fields = split /(?=.{3})/, $string;
They expected this to mean "split $string at every location that is followed by three characters (and then skip ahead three characters!)", but what it really means is "split $string at every location that is followed by three characters". They ended up getting ("a", "b", "c", "d", "e", "f", "ghi").
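
The over-splitting is easy to reproduce. The following is a minimal sketch (the print formatting is my own, not from the IRC discussion) showing that a zero-width lookahead matches at every position that still has three characters ahead of it:

```perl
use strict;
use warnings;

my $string = "abcdefghi";

# A lookahead consumes nothing, so split() finds a separator at every
# position that is still followed by three characters.
my @fields = split /(?=.{3})/, $string;

print join(",", @fields), "\n";   # a,b,c,d,e,f,ghi
```

The last field is "ghi" because no position after the "g" has three characters left to look ahead at.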

So how can you use split() to do this? Someone said "Couldn't you abuse \G?", and that reminded me of the internal assignment to $_ of the string being matched against, and the resulting use of pos()! I present:

my @fields = split /(?(?{pos() % 3})(?!))/, $string;

Update: Yes, I know about unpack(), etc. This was merely presented as the most direct way to accomplish the task using split().
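
For anyone who wants to try the pos() pattern, here is a small sketch. The helper name chunks_of_three is hypothetical and the chunk size is hard-coded in the pattern; the only thing taken from the post above is the regex itself.

```perl
use strict;
use warnings;

# Hypothetical helper; wraps the split() pattern from the post.
sub chunks_of_three {
    my ($string) = @_;
    # (?{ pos() % 3 }) is true unless the current position is a multiple
    # of 3. When it is true, the "yes" branch (?!) always fails, so the
    # pattern only matches (with zero width) at positions 3, 6, 9, ...
    return split /(?(?{ pos() % 3 })(?!))/, $string;
}

print join(",", chunks_of_three("abcdefghi")), "\n";   # abc,def,ghi
print join(",", chunks_of_three("abcdefgh")),  "\n";   # abc,def,gh
```

A string whose length is not a multiple of three simply ends with a shorter final chunk.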


Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

Replies are listed 'Best First'.
Re: Using split() to divide a string by length
by Zaxo (Archbishop) on Apr 13, 2006 at 19:39 UTC

    Another trick that works is to capture the split characters, which also places them in @fields and makes pos advance beyond them. Because every group except possibly the last is matched as a separator, the ordinary split fields are mostly empty, so we need to filter out false elements with grep:

    my $string = join '', 'a'..'z';
    my @fields = grep {$_} split /(.{3})/, $string;
    print "@fields\n";
    __END__
    abc def ghi jkl mno pqr stu vwx yz

    After Compline,
    Zaxo

        Yep, I tried with defined first because that's the way I thought it worked, too. With that, the result of mine is,

         abc  def  ghi  jkl  mno  pqr  stu  vwx yz
        Note the extra spaces, indicating that there are defined empty strings instead of undefs in those positions.

        Update: Good idea, ikegami++.

        After Compline,
        Zaxo
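
One caveat with grep {$_} is that it also discards any chunk that is false as a string, such as a final "0". Filtering on length avoids that. This is only a sketch of that idea, not necessarily the change that was suggested in the thread, and the sample string is made up:

```perl
use strict;
use warnings;

my $string = "abcdef0";   # made-up input whose last chunk is "0"

# Capturing parentheses make split() return the separators (our chunks);
# the fields between them are defined but empty strings.
my @truthy   = grep { $_ }     split /(.{3})/, $string;
my @nonempty = grep { length } split /(.{3})/, $string;

print "@truthy\n";     # abc def
print "@nonempty\n";   # abc def 0
```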

Re: Using split() to divide a string by length
by BrowserUk (Patriarch) on Apr 13, 2006 at 20:08 UTC

    I find unpack more suitable for this task.

    print for unpack '(A3)*', "abcdefghi";;
    abc
    def
    ghi
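
One detail worth knowing when choosing the template letter: 'A' strips trailing spaces (and NULs) from each chunk, while 'a' returns the data verbatim. A small sketch with a made-up input string:

```perl
use strict;
use warnings;

my $string = "ab cd  ef";   # made-up sample with embedded spaces

# 'A' strips trailing whitespace from each chunk; 'a' leaves it alone.
my @stripped = unpack '(A3)*', $string;
my @verbatim = unpack '(a3)*', $string;

print join("|", @stripped), "\n";   # ab|cd| ef
print join("|", @verbatim), "\n";   # ab |cd | ef
```

If the chunks must reassemble into the original string, '(a3)*' is probably the safer choice.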

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Note: Parens in pack/unpack templates require Perl 5.8.0.

        Yep, I know. I remember it being added.

        I also remember it from the last time you told me.

        And the time before that.

        So, what is your point?

        • I can't use parens in pack/unpack templates because it's only been available for 4 years*?
        • I shouldn't mention my preference for a solution because it's only been available for the last 8 releases?
        • Every time I suggest a solution that uses a feature that isn't available in every build of perl, I should add a footnote that ikegami has (unnecessarily) reminded me that this feature has only been available for the last 8 releases and 4 years*?

        I know, I know. You're just "expanding knowledge".

        Perhaps you should also consider adding footnotes to all your posts that use or recommend other features that have not been around forever? Like say, the 3-arg open; or even hashes?

        (*) For the pedantic, 3 years, 8 months, 16 days 4 hours (approx. at the time of posting).


Re: Using split() to divide a string by length
by duff (Parson) on Apr 13, 2006 at 22:09 UTC

    I hereby propose that we patch split such that its first argument, if it's a reference to an integer, will split the string into chunks of characters, each with as many chars as that integer (except the last, of course). Come on! Who's with me? :-)

    (for the humor impaired, I'm not being serious)

Re: Using split() to divide a string by length
by chibiryuu (Beadle) on Apr 13, 2006 at 20:10 UTC
    This doesn't use split, but is the first thing I think of:
    my $string = join '', 'a'..'z';
    my @fields = $string =~ /.{1,3}/g;
    my @fields2 = grep {$a=!$a} @fields;
    Hmm, $string =~ /.{1,3}/g should even be faster than split /(?(?{pos() % 3})(?!))/, $string.  I guess not as fast as unpack, though.
      What is your grep line all about?

      Caution: Contents may have been coded under pressure.
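
As far as I can tell, the grep line toggles $a between true and false on each element, so it keeps every other chunk. The sketch below reproduces both lines, with two changes that are my own assumptions rather than part of the original post: a lexical $keep in place of the global $a, and an added /s flag so that "." also matches newlines.

```perl
use strict;
use warnings;

my $string = join '', 'a'..'z';

# In list context, /g returns every match; {1,3} lets the last chunk be short.
my @fields = $string =~ /.{1,3}/gs;

# The flag flips on every element, so the 1st, 3rd, 5th, ... chunks are kept.
my $keep = 0;
my @every_other = grep { $keep = !$keep } @fields;

print "@fields\n";        # abc def ghi jkl mno pqr stu vwx yz
print "@every_other\n";   # abc ghi mno stu yz
```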
Re: Using split() to divide a string by length
by radiantmatrix (Parson) on Apr 14, 2006 at 16:12 UTC

    I'm not sure split is the right choice for extracting fixed-length substrings. Isn't that really what substr is for (I mean, if you don't want to use unpack)?

    sub split_len {
        ## split_len( $chars, $string[, $limit] )
        ## - splits $string into chunks of $chars chars
        ## - limits number of segments returned to $limit, if provided
        my ($chars, $string, $limit) = @_;   # $limit was never unpacked before
        my ($i, @result);
        for ($i = 0; ($i+$chars) < length($string); $i+=$chars) {
            last if (defined $limit && @result >= $limit);
            push @result, substr($string, $i, $chars);
        }
        # deal with any short remainders
        return @result if (defined $limit && @result >= $limit);
        # use < so a final full-length chunk is not dropped
        if ($i < length($string)) {
            push @result, substr($string, $i);
        }
        return @result;
    }
    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
      # deal with any short remainders

      substr does it for us: "If OFFSET and LENGTH specify a substring that is partly outside the string, only the part within the string is returned". This is my version. Doesn't implement $limit (nor parameter checking) but features $start:

      sub split_len {
          my ($str, $start, $len) = @_;
          my @ret;
          # use < rather than <= so no empty trailing chunk is pushed
          # when the remaining length is an exact multiple of $len
          for (my $strlen = length $str; $start < $strlen; $start += $len) {
              push @ret, substr $str, $start, $len;
          }
          return @ret;
      }
      my $c = join '', 'a'..'z';
      print "@{[ split_len $c, 0, 3 ]}\n";
      print "@{[ split_len $c, 0, 4 ]}\n";
      print "@{[ split_len $c, 3, 4 ]}\n";
      __END__
      abc def ghi jkl mno pqr stu vwx yz
      abcd efgh ijkl mnop qrst uvwx yz
      defg hijk lmno pqrs tuvw xyz

      --
      David Serrano
