bsb has asked for the
wisdom of the Perl Monks concerning the following question:
Well, it's more of a recursive "for".
I've changed treemap around and called it do_rec.
It now recurses data structures composed of hashes,
arrays, scalars and their references, executing
a block/coderef with $_ aliased appropriately
so that changing $_ modifies the element.
By default, the block is only executed for leaves.
If you pass in a prototyped subroutine the prototype
is (mis)used to decide when to run your code
eg. do_rec sub(%;) {...} $data; # is run for values & HASHes
Blessing is ignored, objects are scanned (unless you "cut")
I have some biggish questions about my
implementation:
Is this (ab)use of prototypes as out-of-band info evil?
Would attributes be better?
Does $seen->{ ref($_) ? $_ : \$_} = undef;
really mark a value as seen? (demerphq?)
I use special "cut" return value to signal that this
thing should not be recursed. I'm trying to make it
impossible to accidently return it..?
Is there an easier way to keep the $_ aliasing?
I'd also like local-ized variables, for example
to build up a "1.3.4" type path, but it needs more
thought...
The code as I go home:
#!/usr/bin/perl -w
=pod
do_rec BLOCK LIST
do_rec CODEREF LIST
=head1 SYNOPSIS
# upper case your leaves
do_rec { $_ = uc $_; } $data, $structures;
# upper case your leaves (except children of arrays)
do_rec sub (*) {
$_ = uc $_;
cut if ref($_) eq 'ARRAY';
} $hashref;
=head1 DESCRIPTION
...
=head1 NOTES
Originally stolen from treemap:
http://www.perlmonks.org/index.pl?node_id=60829&displaytype=displayco
+de
Two paths to the same thing will be taken randomly
if there's an intervening hash
If there's two paths to a thing, both must be cut.
=cut
use strict;
use Carp ();
our $cut = \'cut magic cookie';
sub cut() { $cut }
sub VALUE() { 1 }
sub HASH() { 2 }
sub ARRAY() { 4 }
sub SCALAR() { 8 }
sub REF() { 16 }
sub OBJECT() { 32 }
#$ @ % * & ; \@ \% \$
my %proto_map = (
';' => VALUE,
'$' => SCALAR,
'\\' => REF,
'%' => HASH,
'@' => ARRAY,
'*' => HASH|ARRAY|SCALAR|REF|VALUE,
'&' => OBJECT,
# The proto isn't used anyway, these compile for me...
'V' => VALUE,
'S' => SCALAR,
'R' => REF,
'H' => HASH,
'A' => ARRAY,
'O' => OBJECT,
);
sub do_rec (&@) {
my $code = shift;
my $want;
if(my $proto = prototype $code) {
$want = 0;
$proto =~ s{([;$%@*&\\VSRHAO])}
{ $want |= $proto_map{ uc $1 }; '' }gei;
Carp::croak "Unknown prototype: ",prototype $code if $proto;
} else {
$want = VALUE;
}
_do_rec($code, $want, my $seen = {}, @_);
}
sub _do_rec {
my ($code, $want, $seen) = (shift, shift, shift);
for (@_) { # just one after the first;
next if (exists $seen->{$_});
my ($key, $type);
if(ref($_)) {
$key = $_; # gets stringified
# ref can only be blessed into one thing at a time (so it'
+s ok)
if (UNIVERSAL::isa($_,'HASH')) { # isa to ignore blessing
$type = HASH;
} elsif (UNIVERSAL::isa($_,'ARRAY')) {
$type = ARRAY;
} elsif (UNIVERSAL::isa($_,'SCALAR')) {
$type = SCALAR;
} elsif (UNIVERSAL::isa($_,'REF')) {
$type = REF;
}
# if they want an OBJECT, OR it in
if($want & OBJECT && m/=/) {
$type |= OBJECT;
}
} else {
$key = \$_; # reference to where value is stored
$type = VALUE;
}
$seen->{$key} = undef;
# Call the $code
if($want & $type) {
my $ret = $code->(); # pass type/ref?
next if(ref($ret) && $ret == $cut); # return cut cookie ->
+ next
}
# Recurse appropriately
if ($type & HASH) {
_do_rec($code, $want, $seen, $_) for values %$_;
# keys? do in higher level
}
elsif ($type & ARRAY) {
# could hoist nonrefs like reply on pm: (need to mark seen
+)
_do_rec($code, $want, $seen, $_) for @$_;
}
elsif ($type & (SCALAR|REF) ) {
_do_rec($code, $want, $seen, $_) for $$_;
}
}
}
####################### EXAMPLE #############################
our $data = {
'arrays' => [[ 'shared', 'not_shared' ]],
'num' => 2,
#'blahs' => \'blah blah', # mod of ro var
'blahs' => \do { my $a = 'blah blah' },
'hash' => {
'a' => 'vala',
'b' => 'valb',
'c' => bless({ qn=> 'no_qnum' },'Obj'),
},
'blessed' => bless(\{ qn=> 'that' },'Obj::Ref'),
};
# shared value
$data->{arrays} = bless(\$data->{arrays}[0][0],"Not::Scalar");
# ref to the same hash via 2 scalar refs with different blessing
$data->{another_obj} = bless(\do{ my $o = ${$data->{blessed}}},"Not::O
+bj");
use Data::Dumper;
$Data::Dumper::Purity=1;
# objects:
do_rec sub (&) { print ref($_)," " } => $data;
print "\n\n";
# all the leaves
do_rec { $_ = "#$_#"; } $data;
# (*) = everything
my $count = 1;
do_rec sub (*) {
return cut if ref($_) eq 'ARRAY';
return $_ = "=\U$_=" if !ref($_); # leaves
$_->{qnum} = $count++ if ref($_) eq 'HASH' && exists $_->{qn};
} => $data;
print Dumper $data;
print "-----------------------------------------\n\n";
# Another example
$data = [ \do{ my $s = "string"} ];
$data->[1] = \$data->[0];
$data->[2] = \do{ my $s = "last" };
do_rec { $_ = uc $_; } $data;
print Dumper $data;
print "-----------------------------------------\n\n";
do_rec sub (Obj) { print ref($_),"\n" } => $data;
__END__
Brad
Re: Recursive map Design Questions by Abigail-II (Bishop) on Oct 01, 2003 at 14:53 UTC |
$seen->{ ref($_) ? $_ : \$_} = undef
You are aware that ref can return a false value even if
its argument is a reference?
$ perl -wle 'print ref bless [] => 0'
0
You ought to check for definedness, which is easy in 5.8.1
with the defined-or patch:
$seen -> {ref () // \$_} = undef;
Abigail | [reply] [d/l] [select] |
|
$ perl -wle 'print defined ref 0'
1
$ perl -wle 'my $a; print defined ref $a'
1
$ perldoc -f ref
ref EXPR
ref Returns a true value if EXPR is a reference, false otherwise.
If EXPR is not specified, $_ will be used. The value returned
depends on the type of thing the reference is a reference to.
# Salvation?
$ perl -MScalar::Util=blessed -wle 'my $a; print defined blessed(bles
+s [] => 0)'
1
$ perl -MScalar::Util=blessed -wle 'my $a; print defined blessed($a)'
| [reply] [d/l] |
|
Yes, my bad. If the argument isn't reference, 'ref' returns
the empty string, not undef. But the first line of the
documentation you quote is provebly wrong. I'll write a patch.
Abigail
| [reply] |
|
Abigail. Are you seriously suggesting that people should cater to the possibility that references may have been blessed into a package of 0?
Firstly, it would have to be a deliberate, willful act to do it as it isn't possible (that I am aware of) to actually have a package 0;
Secondly, shouldn't bless [], 0; be considered a bug and reported for fixing?
Is there any possible legitimate use of this?
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.
| [reply] [d/l] [select] |
|
| [reply] [d/l] |
|
|
| |
|
| |
Well, people are expected to consider the possibility files
are named "0", or that files can end with "\n0" (for the
latter, there are even warnings build in in Perl).
I'd say that if you code general solutions, and that's what
the OP was doing, you should consider such possibilities.
Programs that run in a specific, or controlled environment
may not have to take such things into consideration.
Oh, and don't get the idea that a package of '0' has never
happened. The original post triggered a memory. Years ago,
a co-worker was working on some complicated template system.
Some code also dealt with 'ref' and blessed objects. And
code *did* fail because some other code (which generated
code) blessed objects in the package '0'.
OTOH, I've never encountered a supposedly text file that
ended with "\n0". That doesn't mean I won't check for
definedness when necessary.
Secondly, shouldn't bless [], 0; be considered a bug and reported for fixing?
Uhm, why? Just because it's inconvenient? Then we might as
well forbid filenames to contain spaces and Microsoft Outlook - that's inconvenient as well. ;-)
Abigail
| [reply] |
|
Re: Recursive map Design Questions by simonm (Vicar) on Oct 01, 2003 at 18:10 UTC |
Is this (ab)use of prototypes as out-of-band info evil? Would attributes be better?
Weird, but not pure evil.
I use special "cut" return value to signal that this thing should not be recursed. I'm trying to make it impossible to accidently return it..?
I think the solution you have is pretty decent; unless someone traverses over your package's symbol table or otherwise goes looking for trouble, it should be pretty darn unlikely that they produce a string that matches your unique reference's ID. | [reply] |
Re: Recursive map Design Questions by demerphq (Chancellor) on Oct 03, 2003 at 16:12 UTC |
Interesting stuff. I think you have some potential zaps and traps with how you are identifying type and handling seen items. It looks to me that something like this would cause trouble:
my $x=\"foo";
my @array=("$x",$x);
As would any object with stringification overloaded. Can you see the havoc this would cause:
package Bitch;
use overload qw("" bitch);
my $bitch="Bitch0000";
sub bitch { $bitch++ };
package main;
my ($x,$y);
$y=bless \$x,'Bitch';
$x=bless \$y,'Bitch';
Also for this type of scenario don't use UNIVERSAL. I know this goes against common advice, but ref and UNIVERSAL are not meant for type checks except in the loosest sense. Just as Abigail-II correctly pointed out that an object of class '0' would cause trouble so too would an object such as would be returned by
bless [],'HASH'
With regard to BrowserUks point on this matter I generally agree that code need not worry about such nasty tricks. If an enduser wants to do such silly things then its their own fault. But with code deliberately designed to traverse a perl data structure correctly IMO should be coded to bypass such foolishness. Especially as it would not be particularly difficult to handle these cases.
What I would do is utilize Scalar::Util (now core as of 5.8), more specifically reftype() and refaddr(). Using refaddr is much more suitable for tracking seen items, and reftype provides a neat and infalable method of determinging an objects underlying type.
Im worried about the protyping trick. I think its cool on face value just for being tricky, but I think its a bit limiting. Why couldnt one provide a hash of types and callbacks for each? Also, $cut is a problem in my eyes. I think I would prefer to just have it return true or false rather than deal with this special variable in user code. (And File::Find and friends trick of localized vars isnt the best plan either IMNSHO.)
A few last minor thoughts are that I don't like the name of the sub (sorry), i prefer 'treemap' to 'do_rec' and I prefer 'apply' to 'treemap', and a little more documentation would have been nice. :-) It took me while to come to grips with what was going on. (IMO you shouldnt have to read the treemap thread first to understand your post... But maybe i'm just hungover. :-) Overall this looks pretty interesting, and leads me to return to thinking about the overall design of Data::Dumper and Data::BFDump. Good stuff. Thanks.
:-)
---
demerphq
First they ignore you, then they laugh at you, then they fight you, then you win.
-- Gandhi
| [reply] [d/l] [select] |
|
Thanks for the Scalar::Util pointers
Im worried about the prototyping trick
I went down that path so that the leaf case could
use the bare block syntax like a map. Also I wanted
a variadic function and since the remaining args could
be anything... there's not much room. I could have
two entry functions, one for leaves and one with a hash
mapping of functions.
I do like the minimal
do {this} to @these ... Maybe 3 calling modes, leaf,
prototype hack or sub hash. I'll think some more.
$cut is a problem in my eyes.
I think I would prefer to just have it return true or false rather than deal with this special variable
Again, this was for aesthetics of the simple case.
Most of the time you won't be cutting, just tweaking,
or at least that the case I want to make easy.
(It's also a design hang over from when it was
more map-like and used the return value as replacement)
Maybe die "cut" could be an appropriate out of band
return value.
I don't like the name either. Still considering traverse,
visit, for_tree, apply. I suspect that functional
languages have a good name for this, I should research it.
| [reply] |
|
I went down that path so that the leaf case could use the bare block syntax like a map.
And I can see your point as well. You already have a wrapper function however. If the underlying implementation used a hash of callbacks then you could provide the neater codeblock style as well as the clunkier but more powerful hash of callbacks style as well. Everybody wins then.
Maybe die "cut" could be an appropriate out of band return value.
Yep, I can see that working nicely.
Cheers,
---
demerphq
First they ignore you, then they laugh at you, then they fight you, then you win.
-- Gandhi
| [reply] [d/l] |
|
|