
Are strings lists of characters?

by John M. Dlugosz (Monsignor)
on Oct 17, 2002 at 18:33 UTC ( #206090=perlmeditation )

It's been mentioned a couple of times that it would be nice if strings acted like arrays of characters.

I agree that would be great for many things, but worry about the ramifications in the details.

So, if we decide not to present a string as a list magically or in a consistent manner across all the builtins, here is a light-duty idea.

Right now, we split on the empty pattern to break a string into a list of chars. First, split needs a way to specify u0, u1, u2, etc. Second, if split simply produced a lazy list rather than doing it all up front, that would satisfy the people who hate using split and wish it were more direct. Third, some syntactic sugar could be used, such as a special symbol or even a named sub that takes one argument. E.g. map { whatever } unravel($x).

So, does the existence of lazy lists / generators take care of 90% of the issues that made people wish for strings-as-arrays? We can make a (lazy) array out of a string to suit the job, whenever we need.


Replies are listed 'Best First'.
Re: Are strings lists of characters?
by Ovid (Cardinal) on Oct 17, 2002 at 18:54 UTC

    If you really want a lazy list, could you just use an iterator? The following will return the individual characters, but only as you need them. Further, it won't do a split, but it does reverse the string internally, so a very long string may be an issue. I just hacked it together to demonstrate one strategy. It could use some clean up.

    #!/usr/bin/perl -w
    use strict;

    sub NEXT { $_[0]->() }

    sub string_to_char_iter {
        my $string = shift;
        $string = reverse $string;
        sub { '' ne $string ? chop $string : undef }
    }

    my $string = join '', 'a' .. 'z';
    my $iter = string_to_char_iter $string;
    while ( defined ( my $char = NEXT $iter ) ) {
        print "$char\n";
    }


    Update: Just in case it's not clear, this was just some demo code. Obviously, for a string of 26 characters, creating an iterator would be overkill. Iterators are going to be more useful if you have a large amount of data that is difficult to fit into memory, such as reading from a file, or if you need to keep track of where you are in your data while reading it.

    Oh, and I tweaked the code just a hair.
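
    As a variation on the demo above, here is a minimal sketch of an iterator that walks the string by offset with substr instead of reversing a copy first. The name string_to_char_iter_pos is made up for this illustration, and the closure still holds one copy of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant: track an offset instead of reversing the string.
sub string_to_char_iter_pos {
    my $string = shift;
    my $pos    = 0;
    return sub {
        return undef if $pos >= length $string;    # exhausted
        return substr $string, $pos++, 1;          # next character
    };
}

my $iter = string_to_char_iter_pos('abc');
my @seen;
while ( defined( my $char = $iter->() ) ) {
    push @seen, $char;
}
print "@seen\n";    # prints "a b c"
```

    Testing with defined, as the original loop does, keeps a literal "0" character from ending the iteration early.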

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

      That's fine for a while loop, but map wouldn't know what to do with it. That's why we need the iterator at the language level. I suppose the implementation of the iterator would not be any different, rather Perl 6 "knows" that the iterator is in fact an iterator and will use it transparently.

        Yes, but you can write your own version of map that takes a code reference as the first argument and an iterator as the second argument, thus solving your problem for Perl 5, rather than having to wait for Perl 6 to come out during Christmas :)

        For more information on this, you can go to, subscribe to the mailing list and read the sample chapter. While I don't think that Dominus would mind my posting a brief code snippet to illustrate, I'm not entirely certain if that's appropriate, because he has asked that the chapter not be distributed (or even saved). As a result, I'm not entirely certain if it would be appropriate to post the code.

        However, if you check it out, search for the &imap function. It seems to resolve what you're looking for. Again, I'd post it myself, but I'm not sure of what's appropriate there.

        Update: I contacted Dominus via email to inquire about the appropriateness of this and he replied that his only reason for wanting to prevent distribution is to revise and correct the chapter so as to avoid error-filled drafts floating around the 'Net. Posting a snippet is therefore okay.

        #!/usr/bin/perl -w
        use strict;

        sub NEXT { $_[0]->() }

        sub imap (&$) {
            my ($transform, $it) = @_;
            return sub {
                my $next = NEXT($it);
                return unless defined $next;
                return $transform->($next);
            }
        }

        sub string_to_char_iter {
            my $string = shift;
            $string = reverse $string;
            sub { '' ne $string ? chop $string : undef }
        }

        my $string = join '', 'a' .. 'z';
        my $iter = string_to_char_iter $string;
        my $uc_chars = imap { uc $_[0] } $iter;
        while ( my $char = NEXT $uc_chars ) {
            print "$char\n";
        }

        For that code, we pass in a subref and an iterator (which is also a subref). We return yet another sub reference that will apply the first subref to the value returned from the iterator. In other words, we use the imap() function to transform one iterator into another, getting the results that you may need.
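
        Because imap() returns an iterator, calls can be chained, and nothing is computed until a value is requested. A small sketch of that property, reusing NEXT and imap from the snippet above; the counting $source closure and the variable names are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub NEXT { $_[0]->() }

# imap as in the snippet above: wrap one iterator in another.
sub imap (&$) {
    my ($transform, $it) = @_;
    return sub {
        my $next = NEXT($it);
        return unless defined $next;
        return $transform->($next);
    };
}

# Hypothetical counting source, used only to observe laziness.
my $pulled = 0;
my @chars  = ('a' .. 'z');
my $source = sub { $pulled++; return shift @chars };

# Chain two transformations; nothing runs until NEXT is called.
my $upper  = imap { uc $_[0] } $source;
my $tagged = imap { my ($c) = @_; "<$c>" } $upper;

my @first3 = map { NEXT($tagged) } 1 .. 3;
print "@first3 (pulled $pulled)\n";    # prints "<A> <B> <C> (pulled 3)"
```

        Only three characters are pulled from the source, even though it could supply twenty-six.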


        Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

        How about wrapping Ovid's while loop in another subroutine:
        use strict;

        sub NEXT { $_[0]->() }

        sub string_to_char_iter {
            my $string = shift;
            $string = reverse $string;
            sub { chop $string }
        }

        sub get_all {
            my $iter = shift;
            my (@list, $char);
            push @list, $char while $char = NEXT $iter;
            return @list;
        }

        my $string = join '', 'a' .. 'z';
        my $iter = string_to_char_iter $string;
        print $_, $/ for map uc, get_all($iter);
        UPDATE: changed while loop to one liner to irk the Java types >:)


        (the triplet paradiddle with high-hat)
        While I personally miss built-in support for iterators (I was working on a patch for iterator context, but haven't had the time to finish it), you can get the DWIM behavior with iterator closures.

        #!/usr/bin/perl
        use warnings;
        use strict;

        sub char_iterator(\$) {
            my $str   = shift;
            my $count = 0;
            return sub {
                if (wantarray) {
                    # List context: hand back everything that is left.
                    my ($tc, $len) = ($count, (length($$str) - $count));
                    $count = length($$str);
                    return split('', substr($$str, $tc, $len));
                }
                else {
                    # Scalar context: one character per call.
                    return substr($$str, $count++, 1);
                }
            }
        }

        my $string = join('', ('a'..'z') x 3);
        my $chariter = char_iterator($string);

        # Get one char at a time.
        while (my $char = $chariter->()) {
            print "Got $char\n";
        }

        # Get it in list context.
        my $mapiter = char_iterator($string);
        my @upper = map { uc($_)."\n" } $mapiter->();
        print @upper;

        update I had a possible workaround to the "lazy evaluate" situation here. I still plan to explore this further as time permits. I'd be interested to hear your thoughts.


        "To be civilized is to deny one's nature."
(jeffa) Re: Are strings lists of characters?
by jeffa (Bishop) on Oct 17, 2002 at 20:40 UTC
    This reply is not an answer to your question, but instead another question. One of the first things that struck me when reading the Cookbook was the recipe for treating a string like an array of characters. Code was given, but with the caveat "don't do that." Why? Because you don't need to in Perl. I am curious to see some arguments that insist we need to treat strings as arrays in Perl. Are regexes that daunting?


    (the triplet paradiddle with high-hat)
Re: Are strings lists of characters?
by Aristotle (Chancellor) on Oct 18, 2002 at 03:54 UTC
    Why split? Perl 5 can already do all that with an iterator that is context aware to boot.
    $_ = join '', 'a' .. 'z';
    print "$1\n" while /(.)/sg;
    print map "$_\n", /(.)/sg;
    Did I miss anything?
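
    To make the "iterator" reading of //g concrete, here is a small sketch that records pos() after each scalar-context match and then runs the same pattern in list context. The variable names are arbitrary; the behavior shown is that of the standard /g modifier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = join '', 'a' .. 'e';

# Scalar context: each evaluation resumes where the last match ended.
my @steps;
while ( $str =~ /(.)/sg ) {
    push @steps, pos($str) . ":$1";    # pos() is the offset just past the match
}
print "@steps\n";    # prints "1:a 2:b 3:c 4:d 5:e"

# List context: all matches at once.
my @all = $str =~ /(.)/sg;
print scalar(@all), "\n";    # prints "5"
```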

    Makeshifts last the longest.

        Did I miss anything?

      print map chr."\n", unpack "C*", $_;     ?   (TIMTOWTDI :-) )

      update:   and:     print chr for unpack "C*", $_

        Now put that in a while loop condition.

        Makeshifts last the longest.

Re: Are strings lists of characters?
by Juerd (Abbot) on Oct 18, 2002 at 12:34 UTC

    package NeedsAName;

    sub TIEHASH {
        my ($class, $ref) = @_;
        return bless \$ref, $class;
    }

    sub FETCH {
        my ($self, $key) = @_;
        return substr $$$self, $key, 1;
    }

    sub STORE {
        my ($self, $key, $data) = @_;
        return substr($$$self, $key, 1) = $data;
    }

    sub CLEAR {
        my ($self) = @_;
        $$$self = '';
    }

    sub DELETE {
        my ($self, $key) = @_;
        $self->STORE($key, '');
    }

    sub EXISTS {
        my ($self, $key) = @_;
        return $key < length $$$self;    # valid keys are 0 .. length - 1
    }

    sub FIRSTKEY {
        my ($self) = @_;
        return length($$$self) ? 0 : undef;
    }

    sub NEXTKEY {
        my ($self, $lastkey) = @_;
        return $lastkey + 1 < length($$$self) ? $lastkey + 1 : undef;
    }

    # tie %foo, 'NeedsAName', \$string;

    I'm in a hurry, so I did not test and I used a hash because implementing FETCHSIZE, STORESIZE, EXTEND, PUSH, POP, SHIFT, UNSHIFT and SPLICE is too much work :)

    And there's no error checking.

    Hmmm... maybe I should just have said 'Why not tie?'...

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.

      Says juerd:
      sub DELETE { my ($self, $key) = @_; $self->STORE($key, ''); }
      That, unfortunately, doesn't work well. Suppose %h is tied to the string converted, and then you do delete @h{2,5}. You'd like to delete the n and the r, yielding coveted, but that's not what happens. Instead, Perl calls DELETE(2), which deletes the n, leaving coverted, and then DELETE(5), which deletes the t, not the r, leaving covered instead of coveted.

      Of course, that's not your fault, but at present it can't really be made to work right. I was going to put in a patch to fix this (motivated by the same problem using delete with Tie::File) but I haven't gotten around to it yet. The easy solution is that if you're deleting a list of values, Perl should delete them in order from last to first instead of from first to last. That fixes the delete @h{2,5} problem, but unfortunately the same problem persists with delete @h{2,5,3}.
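
      The index shift described above can be reproduced on a plain string. This is a minimal sketch, not the DELETESLICE patch itself: the helper names are made up, and sorting the indices in descending order stands in for what a slice-aware method could do. It assumes the indices are distinct and in range.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Naive: delete each index in the order given, as repeated DELETE calls would.
sub delete_in_given_order {
    my ($str, @indices) = @_;
    substr($str, $_, 1, '') for @indices;
    return $str;
}

# Hypothetical fix: remove the highest index first so that earlier
# positions are not shifted by a previous deletion.
sub delete_descending {
    my ($str, @indices) = @_;
    substr($str, $_, 1, '') for sort { $b <=> $a } @indices;
    return $str;
}

print delete_in_given_order('converted', 2, 5), "\n";    # covered
print delete_descending('converted', 2, 5), "\n";        # coveted
print delete_descending('converted', 2, 5, 3), "\n";     # coeted
```

      Sorting, rather than merely reversing the argument list, is what handles the 2,5,3 case: the deletions then run as 5, 3, 2 regardless of the order in which the indices were written.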

      The patch I planned to make would allow the tied hash class to request that Perl call a special DELETESLICE method instead of making multiple calls to DELETE in such cases. It would follow the same form as the NEGATIVE_INDICES feature in the current bleadperl.

      Mark Dominus
      Perl Paraphernalia
      package StringArray;
      require Tie::Array;
      use base 'Tie::Array';

      sub TIEARRAY  { bless $_[1], $_[0] }
      sub FETCH     { substr(${$_[0]}, $_[1], 1) }
      sub STORE     { substr(${$_[0]}, $_[1], 1) = $_[2] }
      sub FETCHSIZE { length(${$_[0]}) }
      sub STORESIZE { ${$_[0]} = substr(${$_[0]}, 0, $_[1]) }
      sub DELETE    { substr(${$_[0]}, $_[1], 1) = '' }

      1;

      #!perl -w
      use StringArray;
      use strict;

      my $test = "Hello dolly";
      my @testa;
      tie @testa, 'StringArray', \$test;

      print "\$#testa = $#testa\n";
      print "testa[1] = $testa[1]\n";

      $testa[1] = 'b';
      print "testa[1] = $testa[1]\n";

      map {$_ = uc $_} @testa;
      print "after map: test = $test\n";

      delete $testa[1];
      print "after delete: test = $test\n";

      push @testa, "C";
      print "after push: test = $test\n";
        Nope - still the same problem.
        #!/usr/bin/perl -w
        use strict;

        $_ = "converted";
        tie my @test, 'StringArray', $_;

        print map "$_\n", map {
            delete @$_[2,5];
            join '', grep defined, @$_;
        } \@test, [ /(.)/sg ];

        package StringArray;
        require Tie::Array;
        use base 'Tie::Array';

        sub TIEARRAY  { my $str = pop; bless \$str, shift }
        sub FETCH     { substr(${$_[0]}, $_[1], 1) }
        sub STORE     { substr(${$_[0]}, $_[1], 1) = $_[2] }
        sub FETCHSIZE { length(${$_[0]}) }
        sub STORESIZE { ${$_[0]} = substr(${$_[0]}, 0, $_[1]) }
        sub DELETE    { substr(${$_[0]}, $_[1], 1) = '' }

        1;

        __END__
        covered
        coveted

        Makeshifts last the longest.

Re: Are strings lists of characters?
by jepri (Parson) on Oct 18, 2002 at 14:45 UTC
    Petruchio was expanding on this in the chatterbox a while back...but he wanted it as part of a grander plan to have polymorphic functions that worked on all data types. e.g. delete should delete entries from arrays and strings, we should be able to push and pop strings, length to work properly on arrays, etc.

    So in that sense, it's a desire for the language to be a bit more consistent, rather than an implementation issue.
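
    For a sense of what push and pop on strings might look like, here is a rough sketch in plain Perl 5. The str_* helpers are hypothetical names invented for this illustration, not existing builtins or a proposed syntax; each takes a reference to the string it modifies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers; each takes a reference to the string to modify.
sub str_push    { my $ref = shift; $$ref .= join '', @_; return length $$ref }
sub str_unshift { my $ref = shift; substr($$ref, 0, 0, join '', @_); return length $$ref }
sub str_pop     { my $ref = shift; return length($$ref) ? chop $$ref : undef }
sub str_shift   { my $ref = shift; return length($$ref) ? substr($$ref, 0, 1, '') : undef }

my $word = 'ell';
str_push(\$word, 'o');
str_unshift(\$word, 'h');
print "$word\n";                  # hello
print str_pop(\$word), "\n";      # o
print str_shift(\$word), "\n";    # h
print "$word\n";                  # ell
```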

    I didn't believe in evil until I dated it.

Re: Are strings lists of characters?
by gjb (Vicar) on Oct 20, 2002 at 17:09 UTC

    There happens to be a CPAN module that allows one to tie a list to a string: Tie::CharArray.

    I've never used it since I generally got by with substr for iterating over strings, but some tests I ran just now seem to show that it works pretty well.

    There have been a few situations where I really wished I could treat strings as arrays, but since substr can be assigned to I could get around this limitation. A specific example that comes to mind is the implementation of a genetic programming algorithm where I preferred to have strings rather than lists as datatypes for the chromosomes.

    Hope this helps, -gjb-

Re: Are strings lists of characters?
by Aristotle (Chancellor) on Oct 20, 2002 at 17:53 UTC

    I had to think very long and hard for a case where having strings be character arrays would offer syntactically superior, more concise ways of expressing things than using substr. I have finally come up with something. Consider this:

    my @string = "hubris" =~ /(.)/sg;
    @string[0,2,5] = "etv" =~ /(.)/sg;
    print reverse @string;
    __END__
    virtue

    This would be very awkward to achieve with substrs, particularly for more complex examples. (Bioinformatics might be an area where such things could be useful.) But thanks to /(.)/sg, expressiveness doesn't suffer much even here; the only concern I see is efficiency, if you do this a lot. But if that really is a problem, a class with a real array in its guts and an overloaded stringification operator would probably suffice. And that one is downright trivial, something like: sub stringify { local $"; "@{$_[0]}" }

    Assignment needs to be overloaded too, I guess, and would be slightly less trivial.
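
    A minimal sketch of the class idea above, assuming a blessed array of characters with only stringification overloaded. The package name CharString is made up for the illustration, and assignment is left alone, so assigning a plain string to the variable replaces the object rather than updating it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class: an array of characters that prints as a string.
package CharString;

use overload
    '""'     => \&stringify,
    fallback => 1;

sub new {
    my ($class, $string) = @_;
    return bless [ split //, $string ], $class;
}

sub stringify { return join '', @{ $_[0] } }

package main;

my $s = CharString->new('hubris');
@$s[0, 2, 5] = split //, 'etv';      # slice assignment on the underlying array
print "$s\n";                         # eutriv
print scalar reverse("$s"), "\n";     # virtue
```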

    All in all, I conclude that there's not much need for such a feature at the language level. A module should suffice.

    Makeshifts last the longest.

      A few quite common tasks map more naturally to a string-as-array approach. As an example, consider determining the common prefix (or suffix) of two strings.

      It can be done by regex matching:

      my @str = ('ABCD', 'ABEF');
      my $str = join('-', @str);
      if ($str =~ /^([A-Z]+)[A-Z]*\-\1[A-Z]*$/) {
          print "common: '$1'\n";
      }
      else {
          print "no match\n";
      }
      but it's much more natural to do it with a simple for over the characters. Of course one can get around with substr, but it looks decidedly weird.

      Regards, -gjb-

        Way too hackish, not to mention it breaks if the chosen delimiter appears in your input strings - I'll get back to that in a bit though.

        To find the common prefix, you have to iterate over two variables; be that scalars or arrays. Using a for loop:

        my (@str1, @str2);
        my ($i, @prefix) = (0);
        for (@str1) {
            last if $_ ne $str2[$i++];
            push @prefix, $_;
        }
        Or a while loop:
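        my (@str1, @str2);
        my ($i, @prefix) = (0);
        # The bounds checks stop the loop when one array is a prefix of the other.
        push @prefix, $str1[$i++]
            while $i < @str1 && $i < @str2 && $str1[$i] eq $str2[$i];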
        I'd definitely prefer the while version, simply because the arrays are treated equally. Now let's look at how you'd do that over scalars:
        my ($s1, $s2) = ("ABCD", "ABEF");
        my $i = 0;
        # The length check stops the loop when the two strings are identical.
        $i++ while $i < length($s1) && substr($s1, $i, 1) eq substr($s2, $i, 1);
        my $prefix = substr $s1, 0, $i;

        That's hardly any different to read, way clearer than the regex solution, shorter and more idiomatic to boot, and doesn't break regardless of input. Ok, using the ternary operator for your code would shorten the regex approach, but if anything, it would probably conceal the code's intent even further.

        No, the only advantage strings-as-arrays would offer as far as I can see is for simultaneously replacing multiple non-contiguous parts of the string with parts of some other string. But then, that's such a rare circumstance that it shouldn't be unacceptably painful to just listify the strings using /(.)/sg for that, then glue the result back together.

        I do see compelling reasons to syntactically extend push, shift and friends for dealing with strings (although I also see reasons not to), but definitely not for making strings full-blown arrays.

        Makeshifts last the longest.

Node Type: perlmeditation [id://206090]
Approved by Ovid
Front-paged by hsmyers