Grabbing Variable Names

vbrtrmn has asked for the wisdom of the Perl Monks concerning the following question:

Re: Grabbing Variable Names by theorbtwo (Prior) on Jan 11, 2003 at 22:29 UTC
Both pfaut and cecil36 got part of the answer.

The problem with pfaut's answer is that the symbol table (which, BTW, is in `%main::`, not `%$main::`) only lists non-`my` variables. Lexicals live in what's called a "scratchpad", which is much harder to access from Perl code.

The problem with cecil36's answer is that Perl is very difficult to parse. Since perl offers amazing introspection, you might as well use it instead of reimplementing perl.

The easiest answer is to use B::Xref, which will give you a nice cross-reference of where all variables are used, defined, etc. (That's actually more information than you wanted. I assume it's fairly easy to change the output format, though.) B::Xref, like the other B modules, looks through the internal bytecode generated by perl, and thus reuses perl's parser. This means you're guaranteed to get the same interpretation as actually running the code. (This doesn't mean that it's impossible to fool, just that it's more difficult. Using symbolic references will fool anything that doesn't profile the running code, as will `eval STRING`.)

Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).
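Here is a minimal sketch of the package-versus-lexical split described above. It checks whether a package variable and a lexical appear in the current package's symbol table. The variable names are made up for illustration. The stash is found through `__PACKAGE__`, so the sketch works in whichever package it runs in.

```perl
use strict;
use warnings;

our $visible_global = 'package variable';   # gets a typeglob in the symbol table
my  $hidden_lexical = 'lexical variable';   # lives in a scratchpad, not the symbol table

# The current package's symbol table is the hash named "PACKAGE::".
my $stash = do { no strict 'refs'; \%{ __PACKAGE__ . '::' } };

for my $name (qw(visible_global hidden_lexical)) {
    printf "%-15s %s\n", $name,
        exists $stash->{$name} ? 'in symbol table' : 'not in symbol table';
}
```

This sketch reports `visible_global` as present and `hidden_lexical` as absent. That absence is why walking `%main::` alone cannot find `my` variables.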
Re^2: Grabbing Variable Names by diotalevi (Canon) on Jan 12, 2003 at 05:57 UTC
"B::Xref, like the other B modules, looks through the internal bytecode generated by perl ..."

Oh, I hate that mischaracterization. Perl doesn't normally have anything to do with bytecode. When the perl compiler parses Perl (between toke.c and perly.y), it just constructs a bunch of C structs in memory. That's it. Those structs are the OP codes people sometimes mention. They have names like enter, leave, const, padsv, print, etc. Perl executes by following these structs around and occasionally triggering a C routine or two.

The only reason people ever mention bytecode is when making some Java comparison or when attempting to use B::Bytecode. Perl's "bytecode" is just a serialization of the opcode tree. It's also not a particularly effective hack, either (it's never worked for me). So please, don't go on about Perl's bytecode. Perl doesn't use any. You could force the issue, but that doesn't count, because it's just loading another module. It's not actually 'perl'.
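You can inspect the op tree diotalevi describes from Perl with the core B module. The sketch below is only an illustration, and it is not how B::Xref works internally. It walks the tree of a small hypothetical sub, `sample_sub`, using `svref_2object`, `ROOT`, `first` and `sibling`, and lists each op's name indented by depth. The exact ops depend on your perl version, but names like `leavesub`, `print` and `padsv` should appear. For everyday use, `perl -MO=Concise foo.pl` prints a much richer version of this listing.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Hypothetical sub whose op tree we want to look at.
sub sample_sub {
    my $x = 1;
    print $x;
}

# Recursively collect op names, indented by depth in the tree.
sub op_names {
    my ($op, $depth, $out) = @_;
    push @$out, ('  ' x $depth) . $op->name;
    if ($op->flags & OPf_KIDS) {
        # Children are a linked list: first child, then siblings,
        # ending in a null op ($$kid is 0).
        for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
            op_names($kid, $depth + 1, $out);
        }
    }
    return $out;
}

my $tree = op_names(svref_2object(\&sample_sub)->ROOT, 0, []);
print "$_\n" for @$tree;
```

Nothing in `sample_sub` runs. The sketch only reads the structs that perl built while compiling it.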
Re: Re^2: Grabbing Variable Names by theorbtwo (Prior) on Jan 12, 2003 at 07:28 UTC
I stand corrected. Though, actually, I think people use the term "bytecode" somewhat incorrectly (and I say only somewhat) because the B::* tree is named B-for-Bytecode (OK, it's actually not, it's B-for-Backend, but I think that's confusingly similar), and its original purpose was that ineffective hack. In any case, do you have a better term than bytecode, other than "opcode tree", which seems a little unwieldy?
Re^4: Grabbing Variable Names by diotalevi (Canon) on Jan 12, 2003 at 16:32 UTC
Re^4: Grabbing Variable Names by Aristotle (Chancellor) on Jan 12, 2003 at 14:55 UTC
Re: Grabbing Variable Names by ihb (Deacon) on Jan 12, 2003 at 00:14 UTC
As pointed out in previous answers, this issue splits into two totally separate subissues, depending on whether you want to find lexicals or globals. For people who know how perl is built up with symbol tables and typeglobs, the globals part is an easy match. But I've found that symbol tables ("symtables" for short) and typeglobs are confusing and hard to grasp for many. Personally, I had to read an extra (non-perldoc) document or two to understand them when I first fiddled with them. perldata certainly wasn't enough. Neither was perlref or perlmod.

The documentation needed to solve this problem is so scattered over different documents in perldoc that I figured I'd cook up something and give a little step-by-step explanation. It could perhaps be better, but hopefully this gives you some sort of understanding of how symtables and typeglobs can be used in practice. (After rereading my own post, I realized that this turned out to be a very compact mini-tutorial on symtables and especially typeglobs. That wasn't intentional. I just kept filling in explanations as I felt necessary. It's quite possible that it got a little bit too dense.)

```perl
use strict;

# We define this subroutine anonymously and store it in a lexical
# scalar to avoid it getting found by itself.
my $find_globs;
$find_globs = sub {
    my ($pkg) = @_;
    my @vars;

    no strict 'refs';

    # The symbol table is made available through a hash. That hash is
    # the package's name plus two colons. The values in the symbol
    # table are typeglobs. Typeglobs are holders of the values used
    # when you access a global variable, like $foo. That accesses
    # the foo typeglob's scalar value. Typeglobs are prefixed with
    # a "*", as you can see.
    foreach my $glob (values %$pkg) {
        # Not every symbol table value is a real typeglob (a bare
        # forward declaration, for example, may store just -1 or a
        # prototype string), so skip anything that isn't one.
        next unless ref \$glob eq 'GLOB';

        my $name = *$glob{NAME};
        # Each glob also saves its name, next to the variable-data.

        if ($name =~ /::\z/) {
            # As you might recall, a symbol table is made available
            # through a hash which ends in "::". This hash also lives in
            # a typeglob, and is thus stored in the symbol table.
            # (Actually, I'm kind of cheating here. There could be
            # other datatypes defined here than the hash, and perhaps
            # the hash isn't even defined. The program won't break
            # if there are; it just won't return non-hashes that end
            # with "::".)
            push @vars => $find_globs->("$pkg$name")
                unless $name eq 'main::';
            # From the main package all packages can be reached, even
            # main itself. That means main::main:: points to main. You
            # see where this is leading us: nowhere. So don't follow
            # any mains.
        }
        else {
            # Here we see which data types are defined. (defined() on
            # arrays and hashes is a fatal error in modern perls, so
            # arrays and hashes are tested for being non-empty instead.)
            my @types = (
                defined $$glob ? '$' : (),
                @$glob         ? '@' : (),
                %$glob         ? '%' : (),
                defined &$glob ? '&' : (),
            );
            push @vars => map $_ . *$glob{PACKAGE} . "::$name", @types;
            # Not only does it save the name, it also saves the package
            # it lives in.
        }
    }
    return @vars;
};

use Data::Dumper;    # Just to get extra packages. :)

# Here you might want to do() your program file.
print "$_\n" for $find_globs->('main::');
```

A typeglob's variable data can also be reached by subscripting it with the type of data you want (the `*foo{THING}` syntax): `SCALAR`, `ARRAY`, `HASH`, `CODE`, `IO`, `GLOB` (and the older `FILEHANDLE`). For example, `*foo{ARRAY}` gives a reference to `@foo`.

Be aware that `*foo{SCALAR}` always returns a reference. If the scalar slot for that typeglob isn't defined, a reference to an anonymous scalar is returned instead. This means you cannot write `if (*foo{SCALAR})` instead of `if (defined $foo)`, because the former will always be true.

Also, for subroutines there is a difference between definedness and typeglob slot existence. A forward-declared subroutine will have `*foo{CODE}` return true. However, if it isn't later defined with a body (`sub foo { ... }`), `defined &foo` will return false. You can also check whether a subroutine has been declared, independently of whether it is defined, with `exists &foo`. It's up to you how you choose to handle this.
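Here is a small sketch of the `*foo{THING}` points above. The names `colors`, `settings`, `later` and `now` are made up for illustration. The sketch shows three things: `*colors{ARRAY}` refers to `@colors`, `*settings{SCALAR}` is a true reference even though `$settings` was never used, and a forward-declared sub exists without being defined.

```perl
use strict;
use warnings;

our @colors   = ('red', 'green');   # package array: *colors{ARRAY} refers to it
our %settings = (verbose => 1);     # the scalar $settings is never used

sub later;                          # forward declaration only, no body
sub now { 'ready' }                 # declared and defined

my $colors_ref    = *colors{ARRAY};     # same as \@colors
my $scalar_ref    = *settings{SCALAR};  # still a reference, though $settings is unused
my $now_defined   = defined &now;
my $later_exists  = exists &later;
my $later_defined = defined &later;

print "via *colors{ARRAY}: @$colors_ref\n";
print "*settings{SCALAR} is ", ($scalar_ref ? 'true' : 'false'),
      " but \$settings is ", (defined $$scalar_ref ? 'defined' : 'undef'), "\n";
print "&now defined: ",  ($now_defined   ? 'yes' : 'no'), "\n";
print "&later exists: ", ($later_exists  ? 'yes' : 'no'),
      ", defined: ",     ($later_defined ? 'yes' : 'no'), "\n";
```

The `*settings{SCALAR}` line is the trap mentioned above. A truth test on it says nothing, so you have to dereference it and call `defined`.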
A forward declaration might indicate that the subroutine will be generated or handled by an `AUTOLOAD` routine, so you could claim that it exists when needed.

There's more to it than I have explained here, but this will hopefully get you started. Quite frankly, I don't know all the magic that goes on under the hood. I know enough to use them effectively, but what actually happens is not for me to answer. (In fact, if I've gotten something wrong, or explained something in a weird or backward manner, please notify me one way or another. I'd be nothing but glad if someone would fill in the goriest details or correct my ignorance.) Further reading can be found in perldata, perlref, and perlmod.

Hope I've helped,
`ihb`
Re^2: Grabbing Variable Names by mab (Acolyte) on Mar 30, 2005 at 19:17 UTC
Hi, I'm very interested in your comments about determining whether a function has been forward declared but not yet loaded, e.g., when using AutoLoader. For some reason, though, I can't get this to work. Here's my test case:

```perl
sub hello;
*fn = $::{'hello'};
print "declared\n"    if *fn{CODE};
print "not defined\n" if ! defined &fn;
print "exists\n"      if exists &fn;
print "glob exists\n" if exists $::{'hello'};
```

For me, this test correctly prints "not defined" and "glob exists", but shouldn't it also print "declared" and "exists"? Thanks for any insight you can give me on this.

Best,
Mike
Re^3: Grabbing Variable Names by ihb (Deacon) on Mar 30, 2005 at 21:19 UTC
Long story short: a forward declaration doesn't create a glob. You can try this by doing:

```perl
sub foo;
print ref \$::{foo};   # "SCALAR"

sub bar { 1 };
print ref \$::{bar};   # "GLOB"

sub baz;
print ref \$::{baz};   # "GLOB"
$baz = $baz;           # Just to mention the symbol "baz".
```

As it turns out, if you inspect this a bit further, the prototype of the subroutine (or `-1` if there is no prototype) is stored in place of a glob, if no glob has been created already. A glob is created once a symbol with the same name as the forward declaration is mentioned. So when you do `*fn = $::{hello}`, you actually assign `-1` to `*fn`, which makes `*fn` an alias of `*{-1}`. (`*{something}` is just the "safe" way to write `*something` for symbols that Perl doesn't generally recognize as identifiers. The braces can be used with any sigil, as in `&{subname}`.)

```perl
sub foo;
*{-1} = sub { 'oh' };   # Create a sub called "-1".
print &{-1}();          # "oh"

*bar = $::{foo};        # Really?
print defined &bar;     # "1"  Huh? So now &bar is defined
                        # even though &foo isn't?
print bar();            # "oh"

#$foo;                  # But all this would change if you
                        # uncommented "$foo;".
```

Sneaky, eh?

`ihb`

See perltoc if you don't know which perldoc to read!
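Here is one way to put ihb's explanation to work. This sketch defines a hypothetical helper, `sub_status`, that classifies a name in a symbol table without assuming every stash value is a typeglob. A glob is judged by its CODE slot, and a plain non-reference value is treated as a bare forward declaration (the `-1` or prototype case described above). Any reference is treated as a defined sub. That last rule is an assumption about perls newer than this thread, which may store references directly in the stash for subs and constants, so please verify it on your perl.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 'absent', 'none', 'declared' or 'defined'.
sub sub_status {
    my ($stash, $name) = @_;
    return 'absent' unless exists $stash->{$name};
    my $entry = $stash->{$name};

    if (ref \$entry eq 'GLOB') {
        # A real typeglob: look at its CODE slot.
        my $code = *{$entry}{CODE};
        return 'none' unless $code;
        return defined &$code ? 'defined' : 'declared';
    }
    # ASSUMPTION: newer perls may keep a reference (code ref, or a ref
    # for a constant) directly in the stash for a defined sub.
    return 'defined' if ref $entry;

    # A plain value: -1 (no prototype) or the prototype string of a
    # bare forward declaration.
    return 'declared';
}

sub full { 1 }              # declared and defined
sub stub_only;              # forward declaration only, never mentioned again
our $just_scalar = 'x';     # a glob with no CODE slot

my $stash = do { no strict 'refs'; \%{ __PACKAGE__ . '::' } };
printf "%-12s %s\n", $_, sub_status($stash, $_)
    for qw(full stub_only just_scalar missing_name);
```

The names are passed as strings on purpose. Mentioning `&stub_only` anywhere in the program would create a glob and change what the stash holds, as ihb's `$foo;` example shows.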
Re: Grabbing Variable Names by pfaut (Priest) on Jan 11, 2003 at 21:36 UTC
You could probably work something up from perl's symbol tables. The root is at `%$main::`. Look at `perldoc perlmod` or Programming Perl page 293 under 'Symbol Tables'. For a quick dump of the symbol table, try this:

`perl -MData::Dumper -e "print Dumper(\%main::)"`

---
`print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';`
Re^2: Grabbing Variable Names by Aristotle (Chancellor) on Jan 11, 2003 at 23:19 UTC
That's not going to help with lexicals though. You might have more luck with one of the B::* modules.

Makeshifts last the longest.
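The B modules can also reach lexicals, because a sub's lexical names are recorded in its pad. This is a minimal sketch that assumes perl 5.22 or later, where the first element of `PADLIST->ARRAY` is a `B::PADNAMELIST` of `B::PADNAME` objects. Older perls expose the names differently. The sub names `with_lexicals` and `lexical_names` are hypothetical, and the sketch looks at one sub's pad only.

```perl
use strict;
use warnings;
use B ();

# Hypothetical example sub with three lexicals.
sub with_lexicals {
    my $alpha = 1;
    my @beta  = (2, 3);
    my %gamma = (k => 4);
    return $alpha + @beta + keys %gamma;
}

# Return the names of lexicals recorded in a sub's pad.
# ASSUMPTION: perl 5.22+, where PADLIST->ARRAY starts with a B::PADNAMELIST.
sub lexical_names {
    my ($code) = @_;
    my ($namelist) = B::svref_2object($code)->PADLIST->ARRAY;
    my @names;
    for my $padname ($namelist->ARRAY) {
        next unless $padname->can('PV');    # empty slots come back as B::SPECIAL
        my $name = $padname->PV;
        # Keep only real variable names; unnamed slots have no PV.
        next unless defined $name && $name =~ /^[\$\@%]\w/;
        push @names, $name;
    }
    return @names;
}

print "$_\n" for lexical_names(\&with_lexicals);
```

The sketch never calls `with_lexicals`. The names are available as soon as the sub has been compiled, which is the same advantage B::Xref gets from reusing perl's parser.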
Re: Grabbing Variable Names by bart (Canon) on Jan 11, 2003 at 23:04 UTC
Try B::Xref on your script through the O module. You load the O module, and B::Xref acts as a backend driver for it. The documentation you need is in B::Xref. The basic command-line syntax is:

`perl -MO=Xref foo.pl`

This produces a report on what is defined and/or used where. Whether it is readable is another question. You can write a Perl script to post-process this output and turn it into something you like. For that, the `-r` option might be interesting, because you then get a raw table in text format instead of an outline. The command line looks something like:

`perl -MO=Xref,-r foo.pl`

I don't know if anybody has written something that produces output resembling what you want, or, if so, whether they have made it available. So this is a public call: make yourself known!...
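Here is a sketch of the post-processing step. The helper `lines_by_variable` is hypothetical, and so is the column layout it relies on. It assumes each raw `-r` line holds seven whitespace-separated fields, in the order file, context, line, package, type, name, event. Check that assumption against your own perl's output before relying on it. The helper groups line numbers by sigil plus name and skips any line that doesn't fit the assumed shape.

```perl
use strict;
use warnings;

# Hypothetical post-processor for `perl -MO=Xref,-r foo.pl` output.
# ASSUMPTION: each raw line has seven whitespace-separated fields:
#   file  context  line  package  type  name  event
sub lines_by_variable {
    my (@raw) = @_;
    my %lines;    # "$type$name" => [ line numbers, in input order ]
    for my $row (@raw) {
        my @f = split ' ', $row;
        next unless @f == 7 && $f[2] =~ /^\d+\z/;   # skip anything else
        my ($lineno, $type, $name) = @f[2, 4, 5];
        push @{ $lines{"$type$name"} }, $lineno;
    }
    return \%lines;
}

# Usage sketch (list-form pipe open, so no shell quoting is involved):
#   open my $fh, '-|', $^X, '-MO=Xref,-r', 'foo.pl' or die "xref: $!";
#   my $by_var = lines_by_variable(<$fh>);
#   print "$_: @{ $by_var->{$_} }\n" for sort keys %$by_var;
```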
Re: Grabbing Variable Names by cecil36 (Pilgrim) on Jan 11, 2003 at 22:16 UTC
When I took a class in compilers, our first project was to read in a source file written in Pascal and match up the appropriate tokens for the variables, static values, and operators. I would take a similar approach. Since you know how Perl identifies variables, you can write a script that looks for scalars, arrays, and hashes by searching for the appropriate sigil. It would capture each one along with all the text up to the next symbol separating the variable from the rest of the program. I'm assuming that your input is a program that is not obfuscated.
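Here is a naive sketch of the regex-scanning approach to make the trade-off concrete. The helper `naive_variable_names` is hypothetical, not cecil36's code. It collects anything that looks like `$name`, `@name` or `%name`. Because it does not parse Perl, it also reports a name that appears only in a comment, which is the weakness pointed out earlier in the thread.

```perl
use strict;
use warnings;

# Naive scanner: sigil followed by an identifier (optionally Pkg::qualified).
# It does not parse Perl, so it also matches inside strings, comments and
# POD, and it cannot see through symbolic references or eval STRING.
sub naive_variable_names {
    my ($source) = @_;
    my %seen;
    while ($source =~ /([\$\@%])(\w+(?:::\w+)*)/g) {
        $seen{"$1$2"}++;
    }
    return sort keys %seen;
}

my $code = <<'END';
my $count = 0;
my @items = (1, 2);
# $not_really_a_variable is only in a comment
$count += @items;
END

print "$_\n" for naive_variable_names($code);
```

This prints `$count`, `$not_really_a_variable` and `@items`. The middle one is a false positive that a real parser would not report.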

