The Concept of References

by leriksen (Curate)
on Apr 14, 2005 at 06:19 UTC (#447659=perlmeditation)

Some people have a hard time understanding references in Perl - I know, because I did. Over time, and because I have ended up using references a lot in the code I write, I have become much more comfortable with them. Here's how I explain references to new Perl users.

Picture how your code may be laid out in memory. A statement like

$count = 10;
might be laid out as follows
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |			|         | 1001
________________			___________
...					|         | 1002
________________			___________
Here Perl has indicated that the contents of the variable $count are stored at memory location 1000. It stores that information in the symbol table, and in memory, at the location 1000, it writes the value 10. When Perl wants to access the value associated with $count, it looks in the symbol table first, then it looks at the memory location indicated.

Now suppose we add some code like this ...

$copy = $count;
The resultant memory layout could be something like this
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |		$copy ->|   10    | 1001
________________			___________
| $copy  | 1001 |			|         | 1002
________________			___________
...					|         | 1003
________________			___________

That is, $copy is a new entry in the symbol table, and the symbol table entry indicates that the place in memory to hold $copy's value is different to the location of $count. Once the symbol table entry is created, Perl looks at the value in memory for $count (by looking in the symbol table for where to look in memory) and places the same value in $copy's memory location. So the values are the same, but the locations are different.
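
A minimal sketch of what "same value, different location" means in practice. The second assignment to $count is my own addition for illustration; it is not part of the example above.

```perl
use strict;
use warnings;

my $count = 10;
my $copy  = $count;    # the value 10 is copied into $copy's own storage

$count = 11;           # change the original after the copy was made

print "count $count\n";    # prints: count 11
print "copy $copy\n";      # prints: copy 10
```

Because $copy has its own location, later changes to $count never show up in it.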

But neither of these is a reference. Let's see how some code that uses references might end up getting laid out in memory.

$ref = \$count; # initialise $ref's value to be a reference to $count
The resultant memory layout could be something like this
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |		$copy ->|   10    | 1001
________________			___________
| $copy  | 1001 |		$ref  ->|  1000   | 1002
________________			___________
| $ref   | 1002 |			|         | 1003
________________			___________
...					|         | 1004
________________			___________
So the symbol table entry for $ref looks pretty much the same - it indicates where $ref's value will be stored in memory. And what is that value? Well, because we said to initialise it to be a reference to $count, it stores $count's memory address. If we printed this out we'd see something like this
print "ref $ref\n";   # prints: ref SCALAR(0x000003e8)
This says that $ref is a reference to a scalar stored at 3e8 (which is 1000 written in hex).

So what does that get us? Well, $ref is a reference to a scalar stored at 1000, so to get the value stored at that location we need to 'dereference' $ref - and we do this with the following code

print "ref value $$ref\n";   # prints: ref value 10
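
Here is a small sketch showing that $ref and $count really do share one location, so a change made through either name is visible through the other. The values 11 and 12 are just illustrative.

```perl
use strict;
use warnings;

my $count = 10;
my $ref   = \$count;          # $ref refers to $count; no value is copied

$count = 11;                  # change the original...
print "ref value $$ref\n";    # prints: ref value 11

$$ref = 12;                   # ...or assign through the reference
print "count $count\n";       # prints: count 12
```

Compare this with the $copy example earlier, where changing $count left $copy alone.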
Now that may seem a hard way to access $count's value. After all, we can just use $count instead of $$ref. But we can take references to other things - arrays, hashes, subroutines etc. Let's look at how an array reference could be done.

Let's start with our symbol table and memory map layouts.

@array = (10, 20, 30);
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
...					|   30    | 1002
________________			___________
					|         | 1003
					___________

So here we see that the symbol table says that @array is stored at memory location 1000, and the memory map shows how the initial values might be laid out - in this case memory location 1000 is where the first element goes, 1001 is the second element etc.

Later we have some code like this

@copy = @array;
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
| @copy  | 1003 |			|   30    | 1002
________________			___________
...				@copy ->|   10    | 1003
________________			___________
					|   20    | 1004
					___________
					|   30    | 1005
					___________
					|         | 1006
					___________

This all seems pretty much what we would expect.

What about a reference ?

$ref = \@array;
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
| @copy  | 1003 |			|   30    | 1002
________________			___________
| $ref   | 1006 |		@copy ->|   10    | 1003
________________			___________
...					|   20    | 1004
________________			___________
					|   30    | 1005
					___________
				$ref  ->|  1000   | 1006
					___________
					|         | 1007
					___________

So $ref is just like before - it holds the location in memory where @array is stored. And we can see this in a print statement
print "ref $ref\n";   # prints: ref ARRAY(0x000003e8)
This time we see that $ref is a reference to an array - $ref knows what it is referring to. And we can dereference it too - the most common notations for this are
print $ref->[1];   # prints: 20
or
print $$ref[1];   # prints: 20
I like the first notation - the arrow seems to read "go to the referenced value at index 1". YMMV.
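
A short sketch of working with the whole array through the reference. It uses the built-in ref() function to ask the reference what kind of thing it refers to; the new values 25 and 40 are my own, for illustration only.

```perl
use strict;
use warnings;

my @array = (10, 20, 30);
my $ref   = \@array;

print ref($ref), "\n";              # prints: ARRAY
print scalar(@$ref), "\n";          # prints: 3 (number of elements)

$ref->[1] = 25;                     # modifies @array itself
push @$ref, 40;                     # so does this

print "@array\n";                   # prints: 10 25 30 40
print "last ", $ref->[-1], "\n";    # prints: last 40
```

Writing @$ref (or @{$ref}) gives you the whole referenced array, so anything you can do to @array you can do through $ref.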

What about hashes ?

%hash = (a => 1, b => 2);
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				%hash ->|   a     | 1000
________________			___________
| %hash  | 1000 |			|   1     | 1001
________________			___________
...					|   b     | 1002
________________			___________
					|   2     | 1003
					___________
$ref = \%hash;
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				%hash ->|   a     | 1000
________________			___________
| %hash  | 1000 |			|   1     | 1001
________________			___________
| $ref   | 1004 |			|   b     | 1002
________________			___________
...					|   2     | 1003
________________			___________
				$ref  ->|   1000  | 1004
					___________

print "ref $ref\n";   # prints: ref HASH(0x000003e8)
print $ref->{b};      # prints: 2
print $$ref{b};       # prints: 2
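
The same idea as a runnable sketch, assuming we also want to add a key and walk the hash through the reference. The key c is hypothetical and not part of the diagram above.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2);
my $ref  = \%hash;

print ref($ref), "\n";            # prints: HASH

$ref->{c} = 3;                    # adds a key to %hash itself

for my $key (sort keys %$ref) {   # sorted, because hash order is not fixed
    print "$key => $ref->{$key}\n";
}
# prints:
# a => 1
# b => 2
# c => 3

print "hash has c\n" if exists $hash{c};   # prints: hash has c
```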
I won't go into references to other things (subs, filehandles etc) but the concept is the same.

So what do we _do_ with references? Well, they let us make more complicated data structures, for one. Arrays are only allowed to hold scalar values - if we want an entry in an array to be another array, we can't do this

$array[1] = @another;   # nope - number of entries in @another is assigned to $array[1]
But we can do this
$array[1] = \@another;
print $array[1]->[0];   # the first entry in the referenced array
Hence we can make arrays of arrays (AoAs), hashes of hashes (HoHs) or AoHoAoHoHoA....
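
A sketch of both cases, followed by a small hash of hashes. The names and data (alice, bob, langs) are made up for the example, and the { } and [ ] anonymous constructors are standard Perl shorthand for "make a hash or array and give me a reference to it", which the text above does not cover.

```perl
use strict;
use warnings;

my @another = ('x', 'y', 'z');
my @array   = (10, 20, 30);

$array[1] = @another;             # scalar context: stores the count
print "$array[1]\n";              # prints: 3

$array[1] = \@another;            # stores a reference instead
print "$array[1]->[0]\n";         # prints: x
print "$array[1][2]\n";           # prints: z (arrow is optional between subscripts)

# Anonymous constructors build the inner structures without naming them.
my %hoh = (
    alice => { age => 30, langs => ['perl', 'c'] },
    bob   => { age => 25, langs => ['perl'] },
);
print "$hoh{alice}{langs}[1]\n";             # prints: c
print scalar(@{ $hoh{bob}{langs} }), "\n";   # prints: 1
```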

Secondly they let us be more efficient about using large arrays or hashes in data structures. For example, say @array had ten million elements - something like this would be inefficient

sub func {
    my (@arr) = @_;
    if ($arr[1] eq 'command') {
        ...
    }
}
func(@ten_million);
This code copies @ten_million's contents into @arr inside func - all ten million of them. A reference saves us all that overhead - just the location in memory goes to the subroutine.
sub func {
    my ($aref) = @_;
    if ($aref->[1] eq 'command') {
        ...
    }
}
func(\@ten_million);
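
A runnable sketch of the two calling styles, with a three-element array standing in for @ten_million. The sub names (count_by_copy, count_by_ref, append_marker) are hypothetical. Note that in the first style the copying happens in the my (@arr) = @_ assignment.

```perl
use strict;
use warnings;

# Copies every element of the caller's list into @arr.
sub count_by_copy {
    my (@arr) = @_;
    return scalar @arr;
}

# Receives one scalar, the reference; the elements are not copied.
sub count_by_ref {
    my ($aref) = @_;
    return scalar @$aref;
}

# Works on the caller's array through the reference.
sub append_marker {
    my ($aref) = @_;
    push @$aref, 'done';
    return;
}

my @data = ('start', 'command', 'stop');   # small stand-in for @ten_million

print count_by_copy(@data), "\n";   # prints: 3
print count_by_ref(\@data), "\n";   # prints: 3

append_marker(\@data);
print "@data\n";                    # prints: start command stop done
```

The last call shows the flip side of passing a reference: the sub can change the caller's data, which is sometimes what you want and sometimes a surprise.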

What happens if we try to manipulate the reference ?

$ref = \@array;
print "ref $ref\n";   # prints: ref ARRAY(0x000003e8)
$ref++;
print "ref $ref\n";   # prints: ref 1001
Look at that! $ref is not a reference anymore - the word 'ARRAY' has disappeared, and now it's just an ordinary number. So you can take a reference to something, but you can't manipulate it to refer to something else. Languages like C let you do this - it's called pointer manipulation (or pointer arithmetic) and it's the cause of more core dumps and corruption than I can count. It's cool, but very very easy to get wrong...
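
A sketch that checks this with ref() rather than by eye, assuming strict mode is on. The increment is done on a second variable ($moved, my own name) so the original reference stays usable.

```perl
use strict;
use warnings;

my @array = (10, 20, 30);
my $ref   = \@array;
my $moved = $ref;        # a second reference to the same array

$moved++;                # numeric context turns the reference into a plain number

print ref($ref)   ? "ref is a reference\n"   : "ref is a plain value\n";
print ref($moved) ? "moved is a reference\n" : "moved is a plain value\n";

# Under "use strict", trying to use the number as a reference dies
# instead of reading whatever sits next to @array in memory.
my $ok = eval { my $element = $moved->[0]; 1 };
print "dereferencing moved failed\n" unless $ok;

print "@array\n";        # prints: 10 20 30 (the array itself is untouched)
```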

So there is a conceptual guide to references - HTH !

And one last point - not one single thing I have described here is what _actually_ happens in Perl - I believe the concepts are correct, but the implementation is vastly more complicated.

...it is better to be approximately right than precisely wrong. - Warren Buffet

Janitored by Arunbear - added readmore tags, as per Monastery guidelines

Re: The Concept of References
by DrHyde (Prior) on Apr 14, 2005 at 09:37 UTC
    One thing to bear in mind, and which may cause your readers some confusion, is that references are *not* pointers. While it is true that if you stringify a reference:
    $a = \100;   # $a is a reference to a scalar
    print $a;
    you get what looks like an address, that is only a convenient way of uniquely representing the thingy that $a refers to. You can't manipulate that string to make it refer to something else like you can in C - that is, you can't do pointer arithmetic.

    Sometimes I wish you could.

      Just out of curiosity, could you come up with an example of how pointer arithmetic would be applicable for perl? I've been trying to think of a possible application but failed miserably, I'd be really interested to see your perspective.

      Remember rule one...
        You don't, because doing so would violate the guarantee that you cannot buffer overflow in Perl. But, there are good reasons why C gives you the rope to buffer overflow.

        Let's say you want to iterate through the characters of a string and do something with each character. In C, that's pretty simple. Strings are just arrays of characters and arrays are just fancy pointers.

        char string[] = "Hello";
        char *ptr;
        for ( ptr = string; *ptr; ptr++ ) {
            do_something_with( *ptr ); // This is a single char
        }
        Now, let's say you want to do the same thing in Perl. There are a few ways, but none are anywhere near as efficient.
        foreach my $char ( split //, $string ) {
            ...
        }

        while ( $string =~ /(.)/g ) {
            my $char = $1;
            ...
        }

        for ( my $i = 0; $i < length $string; $i++ ) {
            my $char = substr( $string, $i, 1 );
            ...
        }
        Brother dragonchild's example is a good one.

        It's not really a question of needing them, but that they are another tool available which let you approach problems in different ways. Sometimes a pointerish solution - like dragonchild's - would be easier to understand, just like sometimes an OO solution is easier, or a functional solution is easier.

        I do take the point that with pointers comes danger. It could probably be mitigated because perl knows more about the underlying data structures that you would be pointing at than C does - for instance, if you increment a pointer beyond the end of a string, perl can know that you've done that and so could automagically extend the string.

Re: The Concept of References
by Anonymous Monk on Apr 14, 2005 at 11:07 UTC
    That looks like an explanation of pointers. It strongly suggests that you can do arithmetic on references, to get to the "next" memory location. I like this image:
                +--------+
    Alias:      | $alias | ------------+
                +--------+             |
                                       |
                                      \|/
                +--------+       +----------+       +---------+
    Variable:   | $var   | ----> | Variable | ----> | Content |
                +--------+       +----------+       +---------+
                                                /|\
                +--------+       +----------+    |
    Reference:  | $ref   | ----> | Variable | --+
                +--------+       +----------+
    But that might be too much internals for some.

      These things are often called name, container and value. Your diagram has them neatly put in columns:

                  Name             Container          Value
                  +--------+
      Alias:      | $alias | ------------+
                  +--------+             |
                                         |
                                        \|/
                  +--------+       +----------+       +---------+
      Variable:   | $var   | ----> | Variable | ----> | Content |
                  +--------+       +----------+       +---------+
                                                  /|\
                  +--------+       +----------+    |
      Reference:  | $ref   | ----> | Variable | --+
                  +--------+       +----------+
      The "variable" label isn't accurate, though, if you also call one of the blocks "variable". And, to more clearly illustrate that an alias is absolutely not a reference, you could consider making the original name and the second name more visually equal:
      +--------+ __
      | $name  |   \
      +--------+    \      +-----------+       +-------+
                     ====> | container | ----> | value |
      +--------+    /      +-----------+       +-------+
      | $alias | __/
      +--------+
      And I think the content/value part of Reference is missing in your diagram. It does have one, and that is the actual reference! Hard to draw, though.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        I know how references work internally, but those are details not really needed to understand references and aliases. As for visually equalizing $name and $alias, I usually do the diagram interactively, in front of a blackboard, starting with the middle line of my diagram. And then I won't wipe out the $name box to draw it a bit lower (or higher).
      Nice diagram - I deliberately didn't do it this way, even though yours is more accurate. And I hope to add The Concept of Alias' today.

      ...it is better to be approximately right than precisely wrong. - Warren Buffet

Re: The Concept of References
by Fletch (Chancellor) on Apr 14, 2005 at 12:55 UTC

    And if you're really interested in the hairy details underneath there's the illustrated perlguts (although it is somewhat long in the tooth).

Re: The Concept of References
by polettix (Vicar) on Apr 14, 2005 at 15:13 UTC
    Due to my C++ roots, I really find it more useful to think of Perl References as if they were pointers more than references, because of what my (limited) brain believes a reference should be. Apart from the (correct) pointer arithmetics stuff, the semantic of a Perl reference requires an indirection, while in C++ a reference is just some kind of alias for another variable/object and requires no indirection at all.

    That's also why I'll feel less comfortable with Perl 6, which I understand will jump from "->" to "." for dereferencing in objects. But this is a mined field, 'cause I know really nothing about Perl 6!

    Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

    Don't fool yourself.

      Another difference between Perl's references and pointers ala C, is that a reference knows what it refers to. It knows what kind of reference it is. It's not just a reference, but an array reference or a hash reference, etc.

      Update: my lack of C knowledge should be self-evident. Thanks to frodo72 for explaining why I'm wrong.

        One could argue that a pointer in C, e.g. to a struct tm, knows what it points to, so that you can write:
        struct tm *p = &data;
        localtime(p);
        printf("Year: %d\n", p->tm_year);
        which, assuming you have a hash filled with the very same parameters, resembles $p->{'tm_year'} very much.

        Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

        Don't fool yourself.
