Of Symbol Tables and Globs

by broquaint (Abbot)
on Nov 08, 2002 at 14:53 UTC ( [id://211441]=perlmeditation )

Welcome to Of Symbol Tables and Globs where you'll be taken on a journey through the inner workings of those mysterious perlish substances: globs and symbol tables. We'll start off in the land of symbol tables where the globs live and in the second part of the tutorial progress onto the glob creatures themselves.

Symbol tables

Perl has two different types of variables: lexical and package global. In this particular tutorial we'll only be covering package global variables, as lexical variables have nothing to do with globs or symbol tables (see Lexical scoping like a fox for more information on lexical variables).

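Now a package global variable can only live within a symbol table, and it is visible from anywhere in the program through its fully qualified name (a lexical variable, by contrast, is only visible within its enclosing scope). Its value can also be temporarily replaced with local, which is where dynamic scoping comes in. To be more accurate, package global variables live in slots within globs, which themselves live in the symbol tables.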
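
To make the distinction concrete, here is a minimal sketch (the variable and sub names are made up for illustration) showing that only the package global gets a symbol table entry, and that local swaps its value for the duration of a call:

```perl
use strict;
use warnings;

our $global  = "package global";   # lives in the *main::global glob
my  $lexical = "lexical";          # lives in a scratchpad, not in a symbol table

# Only the package global has an entry in the %main:: symbol table.
my $global_in_stash  = exists $main::{global}  ? 1 : 0;
my $lexical_in_stash = exists $main::{lexical} ? 1 : 0;

# show() sees whatever value $global holds at call time
sub show { return $global }

sub with_local {
    local $global = "temporarily replaced";   # dynamic scoping via local
    return show();
}

my $during = with_local();   # value seen while local was in effect
my $after  = show();         # the original value is back once the scope exits

print "global in symbol table:  $global_in_stash\n";
print "lexical in symbol table: $lexical_in_stash\n";
print "during local: $during\n";
print "after local:  $after\n";
```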

A symbol table comes about in various ways, but the most common way in which one is created is through the package declaration. Every variable, subroutine, filehandle and format declared within a package will live in a glob slot within the given package's symbol table (this is of course excluding any lexical declarations).

    ## create an anonymous block to limit the scope of the package
    {
        package globtut;

        $var = "a string";
        @var = qw( a list of strings );

        sub var { }
    }

    use Data::Dumper;
    print Dumper(\%globtut::);

    __output__

    $VAR1 = {
              'var' => *globtut::var
            };

There we create a symbol table with package globtut, then the scalar, array and subroutine are all 'put' into the *var glob because they all share the same name. This is implicit behavior for the vars, so if we wanted to explicitly declare the vars into the globtut symbol table we'd do the following:

    $globtut::var = "a string";
    @globtut::var = qw( a list of strings );

    sub globtut::var { }

    use Data::Dumper;
    print Dumper(\%globtut::);

    __output__

    $VAR1 = {
              'var' => *globtut::var
            };

Notice how we didn't use a package declaration there? This is because the globtut symbol table is auto-vivified when $globtut::var is declared.
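
The auto-vivification can be observed directly. The following is only a sketch: AutoViv is a hypothetical package name chosen because nothing else mentions it, and a symbolic reference is used so that the symbol table is created at run time rather than when the code is compiled.

```perl
use strict;
use warnings;

# 'AutoViv' is a hypothetical package name that appears nowhere else.
my $before = exists $main::{"AutoViv::"} ? 1 : 0;

{
    no strict 'refs';
    # A symbolic reference is resolved at run time, so the symbol
    # table only comes into being when this statement executes.
    ${"AutoViv::var"} = "a string";
}

my $after = exists $main::{"AutoViv::"} ? 1 : 0;
my @keys  = do { no strict 'refs'; sort keys %{"AutoViv::"} };

print "before: $before, after: $after, keys: @keys\n";
```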

Something else to note about the symbol table is that it has two colons appended to the name, so globtut became %globtut::. This means that any packages that live below it will have :: prepended to their name, so if we add a child package it is neatly separated by the double colons, e.g.

    use Data::Dumper;
    {
        package globtut;
        package globtut::child;
        ##             ^^
    }

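
As a small sketch of how that nesting shows up in practice (the $var variables are only there to give each package something to hold), the child package's symbol table appears in the parent's symbol table under the key 'child::':

```perl
use strict;
use warnings;

{
    package globtut;
    our $var = "parent";

    package globtut::child;
    our $var = "child";
}

# The child symbol table is stored in the parent under the key 'child::'.
my @parent_keys = sort keys %globtut::;
my $has_child   = exists $globtut::{"child::"} ? 1 : 0;

print "parent keys: @parent_keys\n";
print "child's var: $globtut::child::var\n";
```
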
Another attribute of symbol tables, demonstrated when %globtut:: was dumped above, is that they are accessed just like normal perl hashes. In fact, if you're brave enough to look under the hood you'll notice that they are hashes, but with a touch of that perl Magic: you can perform all the normal hash operations on a symbol table and add normal key-value pairs. Here are some examples of hash operations being used on symbol tables:

    use Data::Dumper;
    {
        package globtut;

        $foo = "a string";
        $globtut::{bar} = "I'm not even a glob!";
        %globtut::baz:: = %globtut::;

        print Data::Dumper::Dumper(\%globtut::baz::);
        print "keys: ", join(', ', keys %globtut::), $/;
        print "values: ", join(', ', values %globtut::), $/;
        print "each: ", join(' => ', each %globtut::), $/;
        print "exists: ", (exists $globtut::{foo} && "exists"), $/;
        print "delete: ", (delete $globtut::{foo} && "deleted"), $/;
        print "defined: ", (defined $globtut::{foo} || "no foo"), $/;
    }

    __output__

    $VAR1 = {
              'foo' => *globtut::foo,
              'bar' => 'I\'m not even a glob!',
              'baz::' => *{'globtut::baz::'}
            };
    keys: foo, bar, baz::
    values: *globtut::foo, I'm not even a glob!, *globtut::baz::
    each: foo => *globtut::foo
    exists: exists
    delete: deleted
    defined: no foo

So to access the globs within the globtut symbol table we access the desired key, which will correspond to a variable name:

    {
        package globtut;

        $variable = "a string";
        @variable = qw( a list of strings );

        sub variable { }

        print $globtut::{variable}, "\n";
    }

    __output__

    *globtut::variable

And if we want to add another glob to a symbol table we add it exactly like we would with a hash:

    {
        package globtut;

        $foo = "a string";
        $globtut::{variable} = *foo;

        print "\$variable: $variable\n";
    }

    __output__

    $variable: a string

If you'd like to see some more advanced uses of symbol tables and symbol table manipulation then check out the Symbol module which comes with the core perl distribution, and more specifically the Symbol::gensym function.
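
For a taste of Symbol::gensym, here is a minimal sketch that uses the anonymous glob it returns as a filehandle. Reading from an in-memory string is an assumption made purely so that the example needs no file on disk.

```perl
use strict;
use warnings;
use Symbol qw(gensym);

my $data = "first line\nsecond line\n";

# gensym returns a reference to a fresh anonymous glob
my $fh = gensym;

# Use the glob reference as a filehandle; here it reads from an
# in-memory string instead of a real file.
open($fh, '<', \$data) or die "cannot open in-memory file: $!";
my @lines = <$fh>;
close($fh);

my $type = ref $fh;
print "handle type: $type, lines read: ", scalar(@lines), "\n";
```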

Globs

So we can now see that globs live within symbol tables, but that doesn't tell us a lot about globs themselves and so this section of the tutorial shall endeavour to explain them.

Within a glob are 6 slots where the various perl data types will be stored. The 6 slots which are available are

  • SCALAR - scalar variables
  • ARRAY - array variables
  • HASH - hash variables
  • CODE - subroutines
  • IO - directory/file handles
  • FORMAT - formats

All these slots are accessible bar the FORMAT slot. Why this is I don't know, but I don't think it's of any great loss.

It may be asked as to why there isn't a GLOB type, and the answer would be that globs are containers or meta-types (depending on how you want to see it) not data types.
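
A quick way to see the slots is to fill several of them under one name and ask each for its reference type. This is just a sketch, and the name thing is arbitrary:

```perl
use strict;
use warnings;

# Fill four slots of the single glob *thing
our $thing = "a scalar";
our @thing = qw( an array );
our %thing = ( a => "hash" );
sub thing { "a subroutine" }

# Each filled slot yields a reference of the matching type.
my %slot_type = (
    SCALAR => ref(*thing{SCALAR}),
    ARRAY  => ref(*thing{ARRAY}),
    HASH   => ref(*thing{HASH}),
    CODE   => ref(*thing{CODE}),
);

printf "%-6s => %s\n", $_, $slot_type{$_} for qw( SCALAR ARRAY HASH CODE );
```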

Accessing globs is similar to accessing hashes, except we use the * sigil and the only keys are the slot names listed above:

    $scalar = "a simple string";
    print *scalar{SCALAR}, "\n";

    __output__

    SCALAR(0x8107e78)

"$Exclamation", you say, "I was expecting 'a simple string', not a reference!". This is because the slots within the globs only contain references, and these references point to the values. So what we really wanted to say was

    $scalar = "a simple string";
    print ${ *scalar{SCALAR} }, "\n";

    __output__

    a simple string

Which is essentially just a complex way of saying

    $scalar = "a simple string";
    print $::scalar, "\n";

    __output__

    a simple string

So as you can probably guess perl's sigils are the conventional method of accessing the individual data types within globs. As for the likes of IO it has to be accessed specifically as perl doesn't provide an access sigil for it.
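
Since there is no sigil for it, the IO slot has to be fetched with the slot syntax. A minimal sketch using the built-in STDOUT handle follows; the exact class of the returned object varies between perl versions, so the code only reports it rather than relying on a particular name.

```perl
use strict;
use warnings;

# Fetch the IO object stored in the STDOUT glob
my $io    = *STDOUT{IO};
my $class = ref $io;    # an IO::* class; the exact name depends on the perl version

# The IO object can be used wherever a filehandle is expected.
print {$io} "printed through the IO slot of *STDOUT\n";
print "IO slot holds an object of class $class\n";
```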

Something you may have noticed is that we're referencing the globs directly, without going through the symbol table. This is because globs are "global" and are not affected by strict. But if we wanted to access the globs via the symbol table then we would do it like so:

    $scalar = "a simple string";
    print ${ *{$main::{scalar}}{SCALAR} }, "\n";

    __output__

    a simple string

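
Going through the symbol table also makes it possible to inspect a whole package. The sketch below lists the subroutines a package defines; subs_in is a hypothetical helper written for this example, and it looks each name up symbolically rather than dereferencing the stored value, on the assumption that a symbol table entry is not guaranteed to be a glob on every perl version.

```perl
use strict;
use warnings;

{
    package globtut;
    our $var  = "a string";
    our @list = qw( a list );
    sub first  { }
    sub second { }
}

# subs_in() is a hypothetical helper written for this sketch
sub subs_in {
    my ($pkg) = @_;
    no strict 'refs';
    # Keep only the names that have a defined subroutine behind them
    my @names = grep { defined &{"${pkg}::$_"} } keys %{"${pkg}::"};
    return sort @names;
}

my @subs = subs_in('globtut');
print "subs in globtut: @subs\n";
```
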
Now the devious among you may be thinking something along the lines of "If it's a hash then why don't I just put any old value in there?". The answer to this of course, is that you can't as globs aren't hashes! So we can try, but we will fail like so
    ${ *scalar{FOO} } = "the FOO data type";

    __output__

    Can't use an undefined value as a SCALAR reference at - line 1.

So we can't force a new type into the glob, we'll only ever get an undefined value when an undefined slot is accessed. But if we were to use SCALAR instead of FOO then the $scalar variable would contain "the FOO data type".
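
One way to avoid that run-time error is to check whether the slot returned anything before dereferencing it. This is a sketch with a made-up variable name; it also shows that a genuine slot which hasn't been used yet comes back undefined, just like a bogus one.

```perl
use strict;
use warnings;

our $item = "a simple string";

my $bogus  = *item{FOO};      # not a real slot name, so undef
my $array  = *item{ARRAY};    # a real slot, but @item has never been used
my $filled = *item{SCALAR};   # a reference to $item

# Check for definedness before dereferencing to avoid the run-time error.
my @report = (
    "FOO: "    . (defined $bogus  ? "defined"  : "undef"),
    "ARRAY: "  . (defined $array  ? "defined"  : "undef"),
    "SCALAR: " . (defined $filled ? ${$filled} : "undef"),
);
print "$_\n" for @report;
```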

Another thing to be noted from the FOO example above is that you can't assign to a glob slot directly (that is, to the *foo{SCALAR} expression itself); you can only assign through the reference it returns, by dereferencing it.

    ## this is fine as we're dereferencing the stored reference
    ${ *foo{SCALAR} } = "a string";

    ## this will generate a compile-time error
    *foo{SCALAR} = "a string";

    __output__

    Can't modify glob elem in scalar assignment at - line 5, near ""a string";"

As one might imagine, having to dereference the correct glob slot every time one wants to assign to a glob can be tedious and occasionally prohibitive. Thankfully, globs come with some of perl's yet to be patented Magic, so that when you assign a reference to a glob the correct slot will be filled depending on the type of reference being assigned, e.g.
    *foo = \"a scalar";
    print $foo, "\n";

    *foo = [ qw( a list of strings ) ];
    print @foo, "\n";

    *foo = sub { "a subroutine" };
    print foo(), "\n";

    __output__

    a scalar
    alistofstrings
    a subroutine

Note that we're using references there as globs only contain references, not the actual values. If you assign a plain value to a glob instead, the value is treated as the name of another glob, and the glob being assigned to becomes an alias of that glob. Here's some code to help clarify that last sentence:
    use Data::Dumper;
    ## use a fresh uncluttered package for minimal Dumper output
    {
        package globtut;

        *foo = "string";

        print Data::Dumper::Dumper(\%globtut::);
    }

    __output__

    $VAR1 = {
              'string' => *globtut::string,
              'foo' => *globtut::string
            };

So when the glob *foo is assigned "string" it then points to the glob *string. But this is generally not what you want, so moving on swiftly ...
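
What you usually do want from glob assignment is aliasing through a reference. Here is a minimal sketch (variable names invented for the example) showing that assigning a scalar reference to a glob makes the two scalars the same variable while leaving the glob's other slots alone:

```perl
use strict;
use warnings;

our ($original, $alias, @alias);
$original = "before";

# Assigning a scalar reference fills only the SCALAR slot of *alias,
# making $alias another name for $original.
*alias = \$original;
$alias = "after";

# The ARRAY slot of *alias was not touched by the assignment above.
@alias = qw( still separate );

print "original: $original\n";
print "alias array: @alias\n";
```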

Bringing it all together

Now that we have some knowledge of symbol tables and globs let's put them to use by implementing an import method.

When use()ing a module the import method is called from that module. The purpose of this is so that you can import things into the calling package. This is what Exporter does, it imports the things listed in @EXPORT and optionally @EXPORT_OK (see the Exporter docs for more details). An import method will do this by assigning things to the caller's symbol table.

We'll now write a very simple import method to import all the subroutines into the caller's package

    ## put this code in Foo.pm
    package Foo;

    use strict;

    sub import {
        ## find out who is calling us
        my $pkg = caller;

        ## while strict doesn't deal with globs, it still
        ## catches symbolic de/referencing
        no strict 'refs';

        ## iterate through all the names in the symbol table
        foreach my $glob (keys %Foo::) {
            ## skip anything without a subroutine and 'import'
            ## (the glob is looked up by name, as newer perls may store
            ## something other than a glob in a symbol table entry)
            next if not defined *{"Foo::$glob"}{CODE}
                 or $glob eq 'import';

            ## assign subroutine into caller's package
            *{$pkg . "::$glob"} = \&{"Foo::$glob"};
        }
    }

    ## this won't be imported ...
    $Foo::testsub = "a string";

    ## ... but this will
    sub testsub {
        print "this is a testsub from Foo\n";
    }

    ## and so will this
    sub fooify {
        return join " foo ", @_;
    }

    q</package Foo>;

Now for the demonstration code
    use Data::Dumper;
    ## we'll stay out of the 'polluted' %main:: symbol table
    {
        package globtut;

        use Foo;

        testsub();
        print "no \$testsub defined\n" unless defined $testsub;
        print "fooified: ", fooify(qw( ichi ni san shi )), "\n";

        print Data::Dumper::Dumper(\%globtut::);
    }

    __output__

    this is a testsub from Foo
    no $testsub defined
    fooified: ichi foo ni foo san foo shi
    $VAR1 = {
              'testsub' => *globtut::testsub,
              'BEGIN' => *globtut::BEGIN,
              'fooify' => *globtut::fooify
            };

Hurrah, we have successfully imported Foo's subroutines into the globtut symbol table (the BEGIN there is somewhat magical and created during the use).
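
A natural next step is to import only the names the caller asks for, in the spirit of Exporter's @EXPORT_OK. The following is a sketch of that idea, not how Exporter itself is implemented. The Foo package is defined inline here instead of in Foo.pm, so import is called by hand from a BEGIN block in place of a use statement.

```perl
use strict;
use warnings;

{
    package Foo;

    sub import {
        my ($class, @names) = @_;
        my $pkg = caller;

        no strict 'refs';
        foreach my $name (@names) {
            # only install names that really are subroutines in Foo
            die "Foo has no subroutine '$name'\n"
                unless defined &{"Foo::$name"};
            *{"${pkg}::$name"} = \&{"Foo::$name"};
        }
    }

    sub testsub { return "this is a testsub from Foo" }
    sub fooify  { return join " foo ", @_ }
}

# 'use Foo qw(fooify)' would do this at compile time if Foo lived in Foo.pm
BEGIN { Foo->import('fooify') }

my $joined   = fooify(qw( ichi ni san ));
my $imported = defined &testsub ? 1 : 0;   # testsub was never requested

print "fooified: $joined\n";
print "testsub imported: $imported\n";
```

Asking for a name that Foo doesn't define makes this import die, which is a design choice of the sketch: a misspelt import is caught as soon as the module is loaded.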

Summary

So in summary, symbol tables store globs and can be treated like hashes. Globs are accessed like hashes and store references to the individual data types. I hope you've learned something along the way and can now go forth and munge these two no longer mysterious aspects of perl with confidence!

_________
broquaint

Replies are listed 'Best First'.
Re: Of Symbol Tables and Globs
by chromatic (Archbishop) on Nov 08, 2002 at 18:22 UTC
    Perl has two different types of variables - lexical and dynamic.

    I think package global is a much better description than dynamic.

    All these slots are accessible bar the FORM slot. Why this is I don't know...

    Because it's spelled FORMAT.

    nother thing to be noted from the above example is that you can't assign toglob slots directly,

    Judging by your next paragraph, I think you need to clarify what you mean by assign directly.

      I think package global is a much better description than dynamic.

      I think neither is that good. Package global implies that it is only global to the package. Dynamic is probably the better term, problem is that almost nobody will know what it means until they've learned enough perl to not care what term is used.

      I agree with your point about the assigning directly.

      --- demerphq
      my friends call me, usually because I'm late....

        I think saying "lexical (my), and dynamic (use vars, our (with caveats), undeclared (except under use strict), all subs, direct playing with globs (see below), filehandles/dirhandles, and formats)" would do the trick. You might just want to say "(all others)" instead of listing all of those.

        Also, more importantly, you've got a terminology mismatch in the whole document: symbol tables aren't like hashes, they are hashes. %main:: (AKA %::, which you might want to mention explicitly) is a hash. *main::{somevar} isn't a hash, it's a hash element, which contains a glob, which is like a hash in many ways.


        Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

      *glob{FORMAT} is not mentioned in my copy of perlref where this {THING} syntax is described.

      Was that added later? What version? What would you assign it to?

      It also notes that the {IO} slot comingles two distinct values, the file handle and directory handle.

        I believe the FORMAT slot has been present since the birth of typeglobs, but I don't have the source code to prove it. It's like the IO slot -- you can't really assign to it directly. (Since formats are associated strongly with filehandles, it makes a certain amount of sense.)

      I think package global is a much better description than dynamic.
      I'll go with this description then. I went with dynamic because I thought it was a more symmetrical description, but I can see that it is a little too technical for a tutorial.
      Because it's spelled FORMAT.
      Indeed it is, but it's still totally inaccessible through native perl code i.e you can't access the glob slot at all.
      ## IO example
      open(0);
      print "[", *0{IO}, "]", $/;

      ## FORMAT example
      format foo =
      .
      print "[", *foo{FORMAT}, "]", $/

      __output__

      [IO::Handle=IO(0x80fbb0c)]
      []

      Judging by your next paragraph, I think you need to clarify what you mean by assign directly.
      Alrighty, I'll sort that out.

      Thanks for the input :)

      _________
      broquaint

Re: Of Symbol Tables and Globs
by dws (Chancellor) on Nov 08, 2002 at 21:58 UTC
    No treatment of symbol tables and globs is truly complete without a walkthrough (or at least a mention) of that wondrous piece of twisted beauty, Symbol::gensym().

      For those intrigued as I was, here is a link to the Symbol documentation. And a brief cut'n'paste from the aforementioned docs ...

      Symbol::gensym creates an anonymous glob and returns a reference to it. Such a glob reference can be used as a file or directory handle.

      A search found it used to open a bunch of file handles in the days before you could do lexical file handles (see first answer in Open file issues)

      Cheers,
      R.

      Pereant, qui ante nos nostra dixerunt!
Re: Of Symbol Tables and Globs
by John M. Dlugosz (Monsignor) on Nov 08, 2002 at 20:26 UTC
    Perl has two different types of variables - lexical and dynamic

    I agree with chromatic, that's not a dynamic variable per-se, that's a global variable. Saving it and restoring it via local is a separate effect and only meaningful when a variable is shared among functions, but that doesn't mean it must be a package global to be "dynamic". Although local doesn't work, a lexical variable at package scope (or in a block surrounding several related functions) could also push/pop the value in a similar manner.

    but the most common way in which they are created is through the package declaration...Notice how we didn't use a package declaration there?

    That's confusing. The first should say that the symbol table is created when a symbol in a package is mentioned. This doesn't have to be done by using the local package within that package, just refer to a variable anywhere. Saying it's created by the package declaration just introduces more rules later, but the general rule would be right in all cases.

    The double colons allow the separation of symbol tables

    Huh?

    we treat it like we would a hash

    It is a hash; you just said so. I think you mean "...would any other hash.".

    Ever assign something to a symbol table hash entry that wasn't a glob?

    Note that $globtut::{variable} = \$foo; means different things depending on whether there is already an entry by that name or it is added by the assignment. Contrast with ${"*globtut::variable"}= \$foo; which auto-vivifies as a glob.

    To access the individual slots just treat the glob like a hash e.g

    I would prefer something like, "...use a similar syntax to accessing a hash".

    ${ *scalar{FOO} } = "the FOO data type";

    Your error is not due to any protection on symbol tables. Rather, *scalar{FOO} gives undef (rather than a run-time error; go figure) and then the $ dereference gives the error. It's not auto-vivifying like a hash entry would, but it's not a hash! It's a special hash-like syntax. Interesting that the reading of it is not an error.

    Another thing to be noted from the above example is that you can't assign toglob slots directly, only through dereferencing them.

    Very interesting... I always thought I could assign a reference to it, just never bothered because you don't have to use the subscript on the left-hand-side since it knows what kind of reference you are assigning. That is, *scalar= \$x instead of *scalar{SCALAR}=\$x

    we have succesfully imported Foo's subroutines into the main package and nothing else

    1) you imported into globtut package, not main package. 2) the grammar is compound-noun((noun(main package) AND noun(nothing else)), which I don't think is what you meant. A couple typos. missing space, "the the", "callers symbol table", "callers package".

    —John

      The first should say that the symbol table is created when a symbol in a package is mentioned.
      The symbol table is created by the package declaration, which then sets the default package for variables to be declared into for the rest of the lexical scope.
      Saying it's created by the package declaration just introduces more rules later, but the general rule would be right in all cases.
      How does the package declaration introduce more rules later? I think it keeps things a lot simpler as it is explicitly creating a symbol table, whereas referring to variables is implicitly creating a symbol table through auto-vivification.
      Huh?
      Indeed, I was never too happy about that particular passage, I'll either try and clear it up or drop it altogether as it isn't terribly relevant to the rest of the tutorial.
      It is a hash; you just said so.
      Ah, I did just say so and I shouldn't have because symbol tables are not hashes, they are just accessed like hashes. Will clear that up also.
      I would prefer something like, "...use a similar syntax to accessing a hash".
      Ok, will clear that up as well. I'll try and make it clear throughout the tutorial that symbol tables and globs are not hashes, just a similar (or in the case of symbol tables, exact) way of being accessed.
      Your error is not due to any protection on symbol tables. Rather, *scalar{FOO} gives undef (rather than a run-time error; go figure) and then the $ dereference gives the error.
      That is true indeed, but I was trying to illustrate the point that although they may look like hashes they most certainly are not, will also do my best to clarify that section as well.
      1) you imported into globtut package, not main package. 2) the grammar is compound-noun((noun(main package) AND noun(nothing else)), which I don't think is what you meant. A couple typos. missing space, "the the", "callers symbol table", "callers package".
      Yes, yes and yes :)

      Many thanks for the input!

      _________
      broquaint

Re: Of Symbol Tables and Globs
by trammell (Priest) on Sep 17, 2007 at 16:35 UTC
    Minor nitpick: s/effected/affected/.

Node Type: perlmeditation [id://211441]
Approved by adrianh