It's been mentioned a couple times that it would be nice if strings acted like arrays of characters.
I agree that would be great for many things, but worry about the ramifications in the details.
So, if we decide not to present a string as a list magically or in a consistent manner across all the builtins, here is a light-duty idea.
Right now, we split on empty to break into a list of chars. First we need split to specify u0,u1,u2, etc. Second, if split simply produced a lazy list rather than doing it all up front, that would satisfy the people who hate using split but wish it was more direct. Third, some syntactic sugar could be used, such as a special symbol or even a named sub that takes one argument. E.g. map { whatever } unravel($x).
So, does the existence of lazy lists / generators take care of 90% of the issues that made people wish for strings-as-arrays? We can make a (lazy) array out of a string to suit the job, whenever we need.
—John
Re: Are strings lists of characters?
by Ovid (Cardinal) on Oct 17, 2002 at 18:54 UTC
|
If you really want a lazy list, could you just use an iterator? The following will return the individual characters, but only as you need them. Further, it won't do a split, but it does reverse the string internally, so a very long string may be an issue. I just hacked it together to demonstrate one strategy. It could use some clean up.
#!/usr/bin/perl -w
use strict;

sub NEXT { $_[0]->() }

sub string_to_char_iter {
    my $string = shift;
    $string = reverse $string;
    sub { '' ne $string ? chop $string : undef }
}

my $string = join '', 'a' .. 'z';
my $iter = string_to_char_iter $string;
while ( defined ( my $char = NEXT $iter ) ) {
    print "$char\n";
}
Cheers,
Ovid
Update: Just in case it's not clear, this was just some demo code. Obviously, for a string of 26 characters, creating an iterator would be overkill. Iterators are going to be more useful if you have a large amount of data that is difficult to fit into memory, such as reading from a file, or if you need to keep track of where you are in your data while reading it.
Oh, and I tweaked the code just a hair.
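The file-reading case mentioned in the update can be sketched the same way. This is only an illustration: fh_to_char_iter is a made-up name, and an in-memory filehandle stands in for a file too large to slurp. Whether getc hands back bytes or characters depends on the I/O layers on the handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wrap an open filehandle in a
# closure that returns one character per call and undef at end of file.
sub fh_to_char_iter {
    my $fh = shift;
    return sub { getc($fh) };    # getc returns undef once input is exhausted
}

# An in-memory filehandle stands in for a large file here.
my $data = "a0b\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $iter = fh_to_char_iter($fh);
my $seen = '';
while ( defined( my $char = $iter->() ) ) {
    $seen .= $char;              # the defined() test keeps the "0" character
}
print $seen;
```

Only one character is held at a time, which is the point of using an iterator instead of split here.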
Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
|
That's fine for a while loop, but map wouldn't know what to do with it. That's why we need the iterator at the language level. I suppose the implementation of the iterator would not be any different, rather Perl 6 "knows" that the iterator is in fact an iterator and will use it transparently.
|
Yes, but you can write your own version of map that takes a code reference as the first argument and an iterator as the second argument, thus solving your problem for Perl 5, rather than having to wait for Perl 6 to come out during Christmas :)
For more information on this, you can go to http://perl.plover.com/book/, subscribe to the mailing list and read the sample chapter. While I don't think that Dominus would mind my posting a brief code snippet to illustrate, he has asked that the chapter not be distributed (or even saved), so I'm not entirely certain it would be appropriate to post the code.
However, if you check it out, search for the &imap function. It seems to resolve what you're looking for. Again, I'd post it myself, but I'm not sure of what's appropriate there.
Update: I contacted Dominus via email to inquire about the appropriateness of this and he replied that his only reason for wanting to prevent distribution is to revise and correct the chapter so as to avoid error-filled drafts floating around the 'Net. Posting a snippet is therefore okay.
#!/usr/bin/perl -w
use strict;

sub NEXT { $_[0]->() }

# Wrap an iterator: the returned iterator applies $transform to each value.
sub imap (&$) {
    my ($transform, $it) = @_;
    return sub {
        my $next = NEXT($it);
        return unless defined $next;
        return $transform->($next);
    }
}

sub string_to_char_iter {
    my $string = shift;
    $string = reverse $string;
    sub { '' ne $string ? chop $string : undef }
}

my $string = join '', 'a' .. 'z';
my $iter = string_to_char_iter $string;
my $uc_chars = imap { uc $_[0] } $iter;
# Test definedness so that a false value such as "0" does not end the loop early.
while ( defined( my $char = NEXT $uc_chars ) ) {
    print "$char\n";
}
For that code, we pass in a subref and an iterator (which is also a subref). We return yet another sub reference that will apply the first subref to the value returned from the iterator. In other words, we use the imap() function to transform one iterator into another, getting the results that you may need.
Cheers,
Ovid
|
How about wrapping Ovid's while loop in another
subroutine:
use strict;

sub NEXT { $_[0]->() }

sub string_to_char_iter {
    my $string = shift;
    $string = reverse $string;
    sub { chop $string }
}

sub get_all {
    my $iter = shift;
    my (@list,$char);
    push @list,$char while $char = NEXT $iter;
    return @list;
}

my $string = join '', 'a' .. 'z';
my $iter = string_to_char_iter $string;
print $_,$/ for map uc, get_all($iter);
UPDATE: changed while loop to one liner to irk the Java types >:)
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
|
While I personally miss built-in support for iterators (I was working on a patch to Want.pm for iterator context, but haven't had the time to finish), you can get the DWIM with iterator closures.
#!/usr/bin/perl
use warnings;
use strict;

sub char_iterator(\$){
    my $str = shift;
    my $count = 0;
    return sub {
        if (wantarray){
            # List context: return all remaining characters at once.
            my ($tc,$len) = ($count, (length($$str) - $count) );
            $count = length($$str);
            return split('', substr($$str,$tc,$len));
        }else{
            # Scalar context: return the next single character.
            return substr($$str,$count++,1);
        }
    }
}

my $string = join('', ('a'..'z') x 3);
my $chariter = char_iterator($string);

# Get one char at a time.
while(my $char = $chariter->() ){
    print "Got $char\n";
}

# Get it in list context.
my $mapiter = char_iterator($string);
my @upper = map { uc($_)."\n" } $mapiter->();
print @upper;
update
I had a possible workaround to the "lazy evaluate" situation here. I still plan to explore this further as time permits. I'd be interested to hear your thoughts.
-Lee
"To be civilized is to deny one's nature."
(jeffa) Re: Are strings lists of characters?
by jeffa (Bishop) on Oct 17, 2002 at 20:40 UTC
|
Re: Are strings lists of characters?
by Aristotle (Chancellor) on Oct 18, 2002 at 03:54 UTC
|
Why split? Perl 5 can already do all that with an iterator that is context aware to boot.
$_ = join '', 'a' .. 'z';
print "$1\n" while /(.)/sg;
print map "$_\n", /(.)/sg;
Did I miss anything?
Makeshifts last the longest.
|
  Did I miss anything?
print map chr."\n", unpack "C*", $_; ? (TIMTOWTDI :-) )
update: and: print chr for unpack "C*", $_
p
Re: Are strings lists of characters?
by Juerd (Abbot) on Oct 18, 2002 at 12:34 UTC
|
package NeedsAName;

sub TIEHASH {
    my ($class, $ref) = @_;
    return bless \$ref, $class;
}

sub FETCH {
    my ($self, $key) = @_;
    return substr $$$self, $key, 1;
}

sub STORE {
    my ($self, $key, $data) = @_;
    return substr($$$self, $key, 1) = $data;
}

sub CLEAR {
    my ($self) = @_;
    $$$self = '';
}

sub DELETE {
    my ($self, $key) = @_;
    $self->STORE($key, '');
}

sub EXISTS {
    my ($self, $key) = @_;
    # Valid positions run from 0 to length - 1.
    return $key < length $$$self;
}

sub FIRSTKEY {
    my ($self) = @_;
    return length $$$self ? 0 : undef;
}

sub NEXTKEY {
    my ($self, $lastkey) = @_;
    # Stop once the next position would fall past the end of the string.
    return $lastkey + 1 < length($$$self) ? $lastkey + 1 : undef;
}

# tie %foo, 'NeedsAName', \$string;
I'm in a hurry, so I did not test and I used a hash because implementing FETCHSIZE, STORESIZE, EXTEND, PUSH, POP, SHIFT, UNSHIFT and SPLICE is too much work :)
And there's no error checking.
Hmmm... maybe I should just have said 'Why not tie?'...
- Yes, I reinvent wheels.
- Spam: Visit eurotraQ.
|
Says juerd:
sub DELETE {
    my ($self, $key) = @_;
    $self->STORE($key, '');
}
That, unfortunately, doesn't work well. Suppose %h is tied
to the string converted, and then you do
delete @h{2,5}. You'd like to delete the n
and the r, yielding coveted, but that's not what happens.
Instead,
Perl calls DELETE(2), which deletes the n,
leaving coverted, and then DELETE(5),
which deletes the t, not the r,
leaving covered instead of coveted.
Of course, that's not your fault, but at present it
can't really be made to work right. I was going to put
in a patch to fix this (motivated by the same problem
using delete with Tie::File) but
I haven't gotten around to it yet. The easy solution is
that if you're deleting a list of values, Perl should
delete them in order from last to first instead of from
first to last. That fixes the delete @h{2,5} problem,
but unfortunately the same problem persists with
delete @h{2,5,3}.
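The ordering problem can be reproduced without tie at all. The sketch below uses two made-up helper names and plain substr; sorting the positions into descending numeric order is one assumption about how a slice-aware delete might behave, not the patch described here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: delete positions one by one in the order given,
# which is what repeated DELETE calls amount to.
sub delete_in_given_order {
    my ($string, @positions) = @_;
    substr($string, $_, 1) = '' for @positions;
    return $string;
}

# Hypothetical helper: delete the highest position first, so that earlier
# deletions never shift the characters still waiting to be deleted.
sub delete_highest_first {
    my ($string, @positions) = @_;
    substr($string, $_, 1) = '' for sort { $b <=> $a } @positions;
    return $string;
}

print delete_in_given_order('converted', 2, 5), "\n";    # covered
print delete_highest_first('converted', 2, 5), "\n";     # coveted
print delete_highest_first('converted', 2, 5, 3), "\n";  # coeted
```

Sorting numerically, rather than merely reversing the argument list, is what keeps the 2,5,3 case from going wrong.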
The patch I planned to make would allow the tied hash class
to request that Perl call a special DELETESLICE
method instead of making multiple calls to DELETE
in such cases. It would follow the same form as the
NEGATIVE_INDICES feature in the current bleadperl.
--
Mark Dominus
Perl Paraphernalia
|
package StringArray;
require Tie::Array;
use base 'Tie::Array';
sub TIEARRAY { bless $_[1], $_[0] }
sub FETCH { substr(${$_[0]}, $_[1], 1) }
sub STORE { substr(${$_[0]}, $_[1], 1) = $_[2] }
sub FETCHSIZE { length(${$_[0]}) }
sub STORESIZE { ${$_[0]} = substr(${$_[0]}, 0, $_[1]) }
sub DELETE { substr(${$_[0]}, $_[1], 1) = '' }
1;
Example:
#!perl -w
use StringArray;
use strict;
my $test = "Hello dolly";
my @testa;
tie @testa, 'StringArray', \$test;
print "\$#testa = $#testa\n";
print "testa[1] = $testa[1]\n";
$testa[1] = 'b';
print "testa[1] = $testa[1]\n";
map {$_ = uc $_} @testa;
print "after map: test = $test\n";
delete $testa[1];
print "after delete: test = $test\n";
push @testa, "C";
print "after push: test = $test\n";
|
Nope - still the same problem.
#!/usr/bin/perl -w
use strict;
$_ = "converted";
tie my @test, 'StringArray', $_;
print map "$_\n", map {
delete @$_[2,5];
join '', grep defined, @$_;
} \@test, [ /(.)/sg ];
package StringArray;
require Tie::Array;
use base 'Tie::Array';
sub TIEARRAY { my $str = pop; bless \$str, shift }
sub FETCH { substr(${$_[0]}, $_[1], 1) }
sub STORE { substr(${$_[0]}, $_[1], 1) = $_[2] }
sub FETCHSIZE { length(${$_[0]}) }
sub STORESIZE { ${$_[0]} = substr(${$_[0]}, 0, $_[1]) }
sub DELETE { substr(${$_[0]}, $_[1], 1) = '' }
1;
__END__
covered
coveted
Makeshifts last the longest.
Re: Are strings lists of characters?
by jepri (Parson) on Oct 18, 2002 at 14:45 UTC
|
Petruchio was expanding on this in the chatterbox a while back...but he wanted it as part of a grander plan to have polymorphic functions that worked on all data types. e.g. delete should delete entries from arrays and strings, we should be able to push and pop strings, length to work properly on arrays, etc.
So in that sense, it's a desire for the language to be a bit more consistent, rather than an implementation issue.
____________________
Jeremy
I didn't believe in evil until I dated it.
Re: Are strings lists of characters?
by gjb (Vicar) on Oct 20, 2002 at 17:09 UTC
|
There happens to be a CPAN module that allows one to tie a list to a string: Tie::CharArray.
I've never used it since I generally got around with substr for iterating over strings, but some tests I ran just now seem to show that it works pretty well.
There have been a few situations where I really wished I could treat strings as arrays, but since substr can be assigned to I could get around this limitation. A specific example that comes to mind is the implementation of a genetic programming algorithm where I preferred to have strings rather than lists as datatypes for the chromosomes.
Hope this helps, -gjb-
Re: Are strings lists of characters?
by Aristotle (Chancellor) on Oct 20, 2002 at 17:53 UTC
|
I had to think very long and hard for a case where having strings be character arrays would offer a syntactically superior, more concise way of expressing something than using substr. I have finally come up with something. Consider this:
my @string = "hubris" =~ /(.)/sg;
@string[0,2,5] = "etv" =~ /(.)/sg;
print reverse @string;
__END__
virtue
This would be very awkward to achieve with substrs, particularly for more complex examples. (Bioinformatics might be an area where such could be useful.) But thanks to /(.)/sg expressiveness doesn't suffer much even here; the only concern I see is efficiency, if you do this a lot. But if that really is a problem, a class with a real array in its guts and an overloaded stringification operator would probably suffice. And that one is downright trivial, something like:
sub stringify { local $" = ''; "@{$_[0]}" }
Assignment needs to be overloaded too, I guess, and would be slightly less trivial.
All in all, I conclude that there's not much need for such a feature at the language level. A module should suffice.
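A minimal sketch of such a class, assuming the made-up package name CharArray and a plain array reference as the object, might look like this. It overloads only stringification and leaves assignment alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package CharArray;    # hypothetical name, not an existing module

# Stringify by gluing the characters back together.
use overload '""' => \&stringify, fallback => 1;

sub new {
    my ($class, $string) = @_;
    return bless [ split //, $string ], $class;
}

sub stringify {
    my ($self) = @_;
    return join '', @$self;
}

package main;

my $word = CharArray->new('hubris');
@{$word}[0, 2, 5] = split //, 'etv';      # slice assignment on the guts
print scalar reverse("$word"), "\n";      # virtue
```

The object behaves as an array when dereferenced and as a string when interpolated, which covers the slice-assignment case without any change to the language.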
Makeshifts last the longest.
|
A few quite common tasks map more naturally to a string-as-array approach. As an example, consider determining the common prefix (or suffix) of two strings.
It can be done by regex matching:
my @str = ('ABCD', 'ABEF');
my $str = join('-', @str);
if ($str =~ /^([A-Z]+)[A-Z]*\-\1[A-Z]*$/) {
print "common: '$1'\n";
} else {
print "no match\n";
}
but it's much more natural to do it with a simple for over the characters. Of course one can get around with substr, but it looks decidedly weird.
Regards, -gjb-
|
Way too hackish, not to mention it breaks if the chosen delimiter appears in your input strings - I'll get back to that in a bit though.
To find the common prefix, you have to iterate over two variables; be that scalars or arrays. Using a for loop:
my (@str1, @str2);
my ($i, @prefix) = (0);
for(@str1) {
last if $_ ne $str2[$i++];
push @prefix, $_;
}
Or a while loop:
my (@str1, @str2);
my ($i, @prefix) = (0);
push @prefix, $str1[$i++] while $i < @str1 && $i < @str2 && $str1[$i] eq $str2[$i];
I'd definitely prefer the while version, simply because the arrays are treated equally. Now let's look at how you'd do that over scalars:
my ($s1, $s2) = ("ABCD", "ABEF");
my $i = 0;
$i++ while $i < length($s1) && substr($s1, $i, 1) eq substr($s2, $i, 1);
my $prefix = substr $s1, 0, $i;
That's hardly any different to read, way clearer than the regex solution, shorter and more idiomatic to boot, and doesn't break regardless of input. Ok, using the ternary operator for your code would shorten the regex approach, but if anything, it would probably conceal the code's intent even further.
No, the only advantage strings-as-arrays would offer as far as I can see is for simultaneously replacing multiple non-contiguous parts of the string with parts of some other string. But then, that's such a rare circumstance that it shouldn't be unacceptably painful to just listify the strings using /(.)/sg for that, then glue the result back together.
I do see compelling reasons to syntactically extend push, shift and friends for dealing with strings (although I also see reasons not to), but definitely not for making strings full-blown arrays.
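To give a rough idea of what push, shift and friends on strings could mean, here is a sketch with ordinary subs. The str_* names are invented for illustration, and each one takes a reference to the string because a plain sub cannot modify its caller's scalar the way a builtin could.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical string counterparts of the array builtins.
sub str_push {
    my ($ref, @chars) = @_;
    $$ref .= join '', @chars;
    return length $$ref;                   # new length, like push
}

sub str_pop {
    my ($ref) = @_;
    return length($$ref) ? chop $$ref : undef;
}

sub str_shift {
    my ($ref) = @_;
    # Four-argument substr removes and returns the first character.
    return length($$ref) ? substr($$ref, 0, 1, '') : undef;
}

sub str_unshift {
    my ($ref, @chars) = @_;
    substr($$ref, 0, 0, join('', @chars)); # insert at the front
    return length $$ref;
}

my $s = 'bcd';
str_unshift(\$s, 'a');
str_push(\$s, 'e', 'f');
print "$s\n";                              # abcdef
print str_shift(\$s), str_pop(\$s), "\n";  # af
print "$s\n";                              # bcde
```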
Makeshifts last the longest.