PerlMonks  

Grabbing Variable Names

by vbrtrmn (Pilgrim)
on Jan 11, 2003 at 21:18 UTC (#226128=perlquestion)
vbrtrmn has asked for the wisdom of the Perl Monks concerning the following question:

I was thinking about this today.
Let's say I had a fairly large program. Is there an easy way to find every variable that is in the program?
And have it output something like this:

Main Routine
    $abc
    $cci
    $abd
    @stuff1
    @thisorthat
    %allstuff
    %morestuff
SubRoutine1
    $alidf
    $ssfla
    @test
_so forth and so on_

Just wondering

--
paul

Re: Grabbing Variable Names
by pfaut (Priest) on Jan 11, 2003 at 21:36 UTC

    You could probably work something up from perl's symbol tables. The root is at %$main::. Look at perldoc perlmod or Programming Perl page 293 under 'Symbol Tables'.

    For a quick dump of the symbol table, try this.

    perl -MData::Dumper -e "print Dumper(\%main::)"
    --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';
      That's not going to help with lexicals though. You might have more luck with one of the B::* modules.

      Makeshifts last the longest.

Re: Grabbing Variable Names
by cecil36 (Monk) on Jan 11, 2003 at 22:16 UTC
    When I took a class in compilers, our first project was to read in a source file written in Pascal, and match up the appropriate tokens for the variables, static values, and operators. I would take a similar approach. Since you know how perl identifies variables, you can write a script that will look for scalars, lists, and hashes by searching for the appropriate variable, and capturing it along with all the text up to the next symbol separating the variable from the rest of the program. I'm assuming that your input is a program that is not obfuscated.
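    A rough sketch of the textual approach described above: scan the source with a regex and collect anything that looks like a variable. The sub name scan_variables and the sample source are made up for illustration. This is deliberately naive: it will be fooled by strings, comments, regexes, heredocs and POD, and it reports an element access such as $stuff1[0] as $stuff1 rather than @stuff1.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct sigil+name tokens found in a
# string of Perl source, in order of first appearance.
sub scan_variables {
    my ($source) = @_;
    my (%seen, @found);
    # A sigil followed by an identifier, optionally package-qualified.
    while ($source =~ /([\$\@%][A-Za-z_]\w*(?:::\w+)*)/g) {
        push @found, $1 unless $seen{$1}++;
    }
    return @found;
}

# Made-up input, quoted with a non-interpolating heredoc.
my $sample = <<'END';
my $abc = 1;
my @stuff1 = (1, 2, 3);
my %allstuff = (a => $abc);
END

print "$_\n" for scan_variables($sample);
```

    Grouping the names per subroutine, as the question asks, would need real parsing on top of this, which is where the textual approach starts to get hard.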
Re: Grabbing Variable Names
by theorbtwo (Prior) on Jan 11, 2003 at 22:29 UTC

    Both pfaut and cecil36 got part of the answer. The problem with pfaut's answer is that the symbol table (which, BTW, is in %main::, not %$main::) only lists non-my variables. Lexicals are in what's called a "scratchpad" which is much less easy to access from perl code. The problem with cecil36's answer is that perl is very difficult to parse, and since perl offers amazing introspection, you might as well use it instead of reimplementing perl.
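    To illustrate the symbol-table point, here is a small sketch (variable names are invented for the example). Package variables declared with our get an entry in %main::, while a my variable does not.

```perl
use strict;
use warnings;

our $global_scalar = 1;        # package variable: has an entry in %main::
our @global_array  = (1, 2);   # likewise
my  $lexical       = 2;        # lexical: lives in a pad, not in %main::

# Look each name up in the main symbol table (stash).
for my $name (qw(global_scalar global_array lexical)) {
    printf "%-14s %s\n", $name,
        exists $main::{$name} ? 'in %main::' : 'not in %main::';
}
```

    The keys of %main:: are bare names without sigils, so one entry covers $name, @name, %name and &name together.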

    The easiest answer is to use B::Xref, which will give you a nice cross-reference of where all variables are used, defined, etc. (That's actually more information than you wanted. I assume it's fairly easy to change the output format, though.)

    B::Xref, like the other B modules, looks through the internal bytecode generated by perl, so thus reuses perl's parser, meaning you're guaranteed to get the same interpretation as actually running the code. (This doesn't mean that it's impossible to fool, just that it's more difficult. Using symbolic references will fool anything that doesn't profile the running code, as will eval STRING.)


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

      B::Xref, like the other B modules, looks through the internal bytecode generated by perl, so thus reuses perl's parser
      Oh I hate that mischaracterization. Perl doesn't normally have anything to do with bytecode. When the perl compiler parses perl (between toke.c and perly.y) it just constructs a bunch of C structs in memory. That's it. Those structs are the OP codes people sometimes mention. They're named stuff like enter, leave, const, padsv, print, etc. Perl executes by following these things around and occasionally triggering a C routine or two.

      The only reason people ever mention bytecode is when making some Java comparison or when attempting to use B::Bytecode. Perl's bytecode is just a serialization of the opcode tree. It's not a particularly effective hack either (it's never worked for me).

      So please, don't go on about Perl's bytecode. It doesn't use any. Unless you really mean to force the issue, but that doesn't count because it's just loading another module. It's not actually 'perl'.

        I stand corrected. Though, actually, I think people use the term "bytecode" somewhat incorrectly (and I say only somewhat) because the B::* tree is named B-for-Bytecode (OK, it's actually not, it's B-for-Backend, but I think it's confusingly similar), and its original purpose was that ineffective hack.

        In any case, do you have a better term than bytecode, other than "opcode tree", which seems a little unwieldy?



Re: Grabbing Variable Names
by bart (Canon) on Jan 11, 2003 at 23:04 UTC
    Try B::Xref on your script through the O module — meaning you load the O module; B::Xref is like a driver for it; documentation for it is in the latter. The command line syntax basically is:
    perl -MO=Xref foo.pl
    This will produce a report on what is defined and/or used where. Whether it is readable is another question. You can write a Perl script to post-process this output, turning it into something that you like. For that, the -r option might be interesting, because then you get a table in text format instead of an outline. The command line then looks something like:
    perl -MO=Xref,-r foo.pl
    I don't know if anybody has written something to produce output resembling what you want, and if so, whether he/she has made it available. So this is a public call: make yourself known!...
Re: Grabbing Variable Names
by ihb (Deacon) on Jan 12, 2003 at 00:14 UTC
    As pointed out in previous answers this issue is two totally separate subissues depending on whether you want to find lexicals or globals.

    For people that know how perl is built up with symbol tables and typeglobs this is an easy match. But I've found that symtables (for short) and typeglobs are something confusing and hard-to-grasp for many. Personally I had to read an extra (non-perldoc) document or two to understand them when I first fiddled with them. perldata certainly wasn't enough. Neither was perlref or perlmod. The documentation needed to solve this problem is so scattered over different documents in perldoc I figured I'd cook up something and give a little step-by-step explanation. It could perhaps be better, but hopefully this gives you some sort of understanding of how symtables and typeglobs can be used in practice.

    (After rereading my own post I realized that this turned out to be a very compact mini-tutorial on symtables and especially typeglobs. That wasn't intentional. I just kept filling in with explanations as I felt necessary. It's quite possible that it got a little bit too dense.)

    # We define this subroutine anonymously and store it in a lexical
    # scalar to avoid it getting found by itself.
    my $find_globs;
    $find_globs = sub {
        my ($pkg) = @_;
        my @vars;

        no strict 'refs';

        # The symbol table is made available through a hash. That hash is
        # the package's name plus two colons. The values in the symbol
        # table are typeglobs. Typeglobs are holders of the values used
        # when you access a global variable, like $foo. That accesses
        # the *foo typeglob's scalar value. Typeglobs are prefixed with
        # a "*", as you can see.
        foreach my $glob (values %$pkg) {
            # Skip entries that are not real globs (for example the
            # placeholder left by a forward-declared sub).
            next unless ref(\$glob) eq 'GLOB';

            my $name = *$glob{NAME};
            # Each glob also saves its name, next to the variable-data.

            if ($name =~ /::\z/) {
                # As you might recall, a symbol table is made available
                # through a hash which ends in "::". This hash also lives in
                # a typeglob, and is thus stored in the symbol table.
                # (Actually, I'm kind of cheating here. There could be
                # other datatypes defined here than the hash, and perhaps
                # the hash isn't even defined. The program won't break
                # if there are; it just won't return non-hashes that end
                # with "::".)
                push @vars => $find_globs->("$pkg$name")
                    unless $name eq 'main::';
                # From the main package all packages can be reached, even
                # main itself. That means *main::main:: points to main. You
                # see where this is leading us: nowhere. So don't follow
                # any mains.
            }
            else {
                # Here we see which data types are defined. The array and
                # hash slots are tested with *glob{THING}, because
                # defined @array and defined %hash are fatal errors as of
                # perl 5.22.
                my @types = (
                    defined $$glob ? '$' : (),
                    *$glob{ARRAY}  ? '@' : (),
                    *$glob{HASH}   ? '%' : (),
                    defined &$glob ? '&' : (),
                );

                push @vars => map $_ . *$glob{PACKAGE} . "::$name", @types;
                # Not only does it save the name, it also saves the package
                # it lives in.
            }
        }

        return @vars;
    };

    use Data::Dumper; # Just to get extra packages. :)

    # Here you might want to do() your program file.

    print "$_\n" for $find_globs->('main::');

    A typeglob's variable data can also be reached by subscripting it with the type of data you want (called the *foo{THING} syntax): SCALAR, ARRAY, HASH, CODE, IO, GLOB, (FILEHANDLE). E.g. *foo{ARRAY} gives a reference to @foo. Something to be aware of is that SCALAR always returns a reference. If the scalar slot for that typeglob isn't defined, an anonymous scalar reference will be returned instead. This means that you cannot do if (*foo{SCALAR}) instead of if (defined $$foo), because the former will always be true.

    Also, for subroutines there is a difference between definedness and typeglob slot existence. A forward-declared subroutine will have *foo{CODE} return true, but if it isn't defined with a body later on (sub foo { ... }), defined &foo will return false. (You can also check if a subroutine has been declared, independently of defined, with exists &foo.) It's up to you how you choose to handle this. A forward declaration might indicate that the subroutine will be generated or handled by an AUTOLOAD routine, so you could claim that it exists, when needed.
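    A short sketch of the points above, with names (@foo, $unset, declared_only) invented for the example. It only exercises the behaviours described in the paragraph and is not meant as a complete tour of *foo{THING}.

```perl
use strict;
use warnings;

our @foo = (1, 2, 3);
our $unset;          # declared, never assigned
sub declared_only;   # forward declaration, no body

# *foo{ARRAY} is a reference to the very same array as \@foo.
print "same array\n" if *foo{ARRAY} == \@foo;

# *unset{SCALAR} is a reference even though $unset is undefined, so its
# truth value says nothing; defined $unset is the meaningful test.
print "SCALAR slot gives a ", ref(*unset{SCALAR}), " ref\n";
print "but \$unset is undefined\n" unless defined $unset;

# A forward declaration exists but is not defined.
print "declared_only exists\n"      if exists &declared_only;
print "declared_only not defined\n" unless defined &declared_only;
```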

    There's more to it than I have explained here. This will hopefully get you started though. Quite frankly, I don't know all the magic that goes on under the hood. I know enough to use them effectively, but what actually happens is not for me to answer. (In fact, if I've gotten something wrong, or something is explained in a weird or backward manner, please notify me one way or another. I'd be nothing but glad if someone would fill in the goriest details or correct my ignorance.)

    Further reading can be found in perldata, perlref, and perlmod.

    Hope I've helped,
    ihb
      Hi,

      I'm very interested in your comments about determining if a function has been forward declared but not yet loaded, e.g., when using AutoLoader. For some reason, though, I can't get this to work. Here's my test case:
      sub hello;
      *fn = $::{'hello'};
      print "declared\n" if *fn{CODE};
      print "not defined\n" if ! defined &fn;
      print "exists\n" if exists &fn;
      print "glob exists\n" if exists $::{'hello'};
      For me, this test correctly prints "not defined" and "glob exists" but shouldn't it also print "declared" and "exists?"

      Thanks for any insight you can give me on this.

      Best,
      Mike

        Long story short is that a forward declaration doesn't create a glob. You can try this by doing

        sub foo;
        print ref \$::{foo};   # "SCALAR"

        sub bar { 1 };
        print ref \$::{bar};   # "GLOB"

        sub baz;
        print ref \$::{baz};   # "GLOB"
        $baz = $baz;           # Just to mention the symbol "baz".

        As it turns out, if you inspect this a bit further, the prototype of the subroutine (or "-1" if no prototype) is stored instead of a glob if no glob has been created already. When a symbol with the same name as the forward declaration is mentioned, a glob is created. So when you do *fn = $::{hello} you actually assign -1 to *fn, making *fn aliased to *{-1}. (*{something} is just the "safe" way to write *something for symbols that aren't generally recognized as symbols by Perl and can be used with any sigil, like &{subname}.)

        sub foo;
        *{-1} = sub { 'oh' };  # Create a sub called "-1".
        print &{-1}();         # "oh"

        *bar = $::{foo};       # Really?
        print defined &bar;    # "1" Huh? So now &bar is defined
                               # even though &foo isn't?
        print bar();           # "oh"

        #$foo;                 # But all this would change if you
                               # uncommented "$foo;".

        Sneaky, eh?

        ihb

        See perltoc if you don't know which perldoc to read!
