The Concept of References

Some people have a hard time understanding references in Perl - I know, because I did. However, over time, and because I have ended up using references a lot in the code I write, I feel a lot more comfortable with them. Here's how I explain references to new Perl users.

Picture how your code may be laid out in memory. A statement like

$count = 10;

might get laid out as follows:

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |			|         | 1001
________________			___________
...					|         | 1002
________________			___________

Here Perl has indicated that the contents of the variable $count are stored at memory location 1000. It stores that information in the symbol table, and in memory, at the location 1000, it writes the value 10. When Perl wants to access the value associated with $count, it looks in the symbol table first, then it looks at the memory location indicated.

Now suppose we add some code like this ...

$copy = $count;

The resultant memory layout could be something like this

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |		$copy ->|   10    | 1001
________________			___________
| $copy  | 1001 |			|         | 1002
________________			___________
...					|         | 1003
________________			___________

That is, $copy is a new entry in the symbol table, and the symbol table entry indicates that the place in memory to hold $copy's value is different to the location of $count. Once the symbol table entry is created, Perl looks at the value in memory for $count (by looking in the symbol table for where to look in memory) and places the same value in $copy's memory location. So the values are the same, but the locations are different.
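A quick way to confirm that $copy has its own storage is to change it and then look at $count again. This is a minimal sketch; unlike the snippets above, it uses `use strict` and `my` declarations:

```perl
use strict;
use warnings;

my $count = 10;
my $copy  = $count;      # the value is copied into a separate location

$copy = 20;              # change only the copy...
print "count $count\n";  # ...the original is untouched: prints "count 10"
print "copy $copy\n";    # prints "copy 20"
```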

But neither of these is a reference. Let's see how some code that uses references might end up getting laid out in memory.

$ref = \$count; # initialise $ref's value to be a reference to $count

The resultant memory layout could be something like this

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |		$copy ->|   10    | 1001
________________			___________
| $copy  | 1001 |		$ref  ->|  1000   | 1002
________________			___________
| $ref   | 1002 |			|         | 1003
________________			___________
...					|         | 1004
________________			___________

So the symbol table entry for $ref looks pretty much the same - it indicates where $ref's value will be stored in memory. And what is that value? Well, because we said to initialise it as a reference to $count, it stores $count's memory address. If we printed this out we'd see something like this:

print "ref $ref\n";

ref SCALAR(0x000003e8)

This says that $ref is a reference to a scalar stored at 3e8 (which is 1000 written in hex).

So what does that get us? Well, $ref is a reference to a scalar stored at 1000, so to get the value stored at that location we need to 'dereference' $ref, which we do with the following code:

print "ref value $$ref\n";

ref value 10
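Dereferencing works for writing as well as reading. In this small sketch (again using `my` variables, which the snippets above omit), `ref` reports what kind of thing $ref refers to, and assigning to `$$ref` changes $count itself:

```perl
use strict;
use warnings;

my $count = 10;
my $ref   = \$count;     # $ref holds a reference to $count

print ref($ref), "\n";   # prints "SCALAR": the kind of thing referred to
print $$ref, "\n";       # dereference to read: prints "10"

$$ref = 11;              # dereference to write...
print $count, "\n";      # ...and $count itself has changed: prints "11"
```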

Now that may seem a hard way to access $count's value. After all, we can just use $count instead of $$ref. But we can take references to other things - arrays, hashes, subroutines etc. Let's look at how an array reference could be done.

Let's start with our symbol table and memory map layouts.

@array = (10, 20, 30);

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
...					|   30    | 1002
________________			___________
					|         | 1003
					___________

So here we see that the symbol table says that @array is stored at memory location 1000, and the memory map shows how the initial values might be laid out - in this case memory location 1000 is where the first element goes, 1001 is the second element etc.

Later we have some code like this

@copy = @array;

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
| @copy  | 1003 |			|   30    | 1002
________________			___________
...				@copy ->|   10    | 1003
________________			___________
					|   20    | 1004
					___________
					|   30    | 1005
					___________
					|         | 1006
					___________

This all seems pretty much what we would expect.
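To check that @copy really is a separate array, modify one of its elements and print both arrays. A minimal sketch with `my` declarations:

```perl
use strict;
use warnings;

my @array = (10, 20, 30);
my @copy  = @array;      # element-by-element copy into a new array

$copy[0] = 99;           # modify the copy only
print "@array\n";        # prints "10 20 30"
print "@copy\n";         # prints "99 20 30"
```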

What about a reference ?

$ref = \@array;

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
| @copy  | 1003 |			|   30    | 1002
________________			___________
| $ref   | 1006 |		@copy ->|   10    | 1003
________________			___________
...					|   20    | 1004
________________			___________
					|   30    | 1005
					___________
				$ref  ->|  1000   | 1006
					___________
					|         | 1007
					___________

So $ref is just like before - it holds where in memory @array is stored. And we can see this in a print statement:

print "ref $ref\n";

ref ARRAY(0x000003e8)

This time we see that $ref is a reference to an array - $ref knows what it is referring to. And we can dereference it too - the most common notations for this are

print $ref->[1];

20

print $$ref[1];

20

I like the first notation - the arrow seems to read "go to the referenced value at index 1". YMMV.
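A few more ways to get at the referenced array, in one small sketch: `ref` reports 'ARRAY', `${$ref}[1]` is the fully braced form of `$$ref[1]`, and `@$ref` means the whole array, so a change made through it shows up in @array:

```perl
use strict;
use warnings;

my @array = (10, 20, 30);
my $ref   = \@array;

print ref($ref), "\n";       # prints "ARRAY"
print $ref->[1], "\n";       # arrow notation: prints "20"
print $$ref[1], "\n";        # same element: prints "20"
print ${$ref}[1], "\n";      # braces make the dereference explicit: prints "20"
print scalar(@$ref), "\n";   # whole array in scalar context: prints "3"

push @$ref, 40;              # grows @array through the reference
print "@array\n";            # prints "10 20 30 40"
```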

What about hashes ?

%hash = (a => 1, b => 2);

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				%hash ->|   a     | 1000
________________			___________
| %hash  | 1000 |			|   1     | 1001
________________			___________
...					|   b     | 1002
________________			___________
					|   2     | 1003
					___________

$ref = \%hash;

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				%hash ->|   a     | 1000
________________			___________
| %hash  | 1000 |			|   1     | 1001
________________			___________
| $ref   | 1004 |			|   b     | 1002
________________			___________
...					|   2     | 1003
________________			___________
				$ref  ->|   1000  | 1004
					___________

print "ref $ref\n";

ref HASH(0x000003e8)

print $ref->{b};

2

print $$ref{b};

2
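The same ideas carry over to hashes. This sketch reads through the reference, gets at the whole hash with `%$ref`, and adds a key that then appears in %hash itself:

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2);
my $ref  = \%hash;

print ref($ref), "\n";                     # prints "HASH"
print $ref->{b}, "\n";                     # prints "2"
print join(',', sort keys %$ref), "\n";    # whole hash: prints "a,b"

$ref->{c} = 3;                             # adds a key to %hash itself
print "c added\n" if exists $hash{c};      # prints "c added"
```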

I won't go into references to other things (subs, filehandles etc.) but the concept is the same.
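As a small illustration of "the concept is the same", here is a reference to a subroutine. The sub name greet is made up for this sketch:

```perl
use strict;
use warnings;

sub greet { return "hello, $_[0]" }   # hypothetical example sub

my $code = \&greet;                   # reference to a named subroutine
print ref($code), "\n";               # prints "CODE"
print $code->('world'), "\n";         # call through the reference: prints "hello, world"
print &$code('perl'), "\n";           # older sigil form: prints "hello, perl"
```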

So what do we _do_ with references? Well, for one, they let us make more complicated data structures. Arrays are only allowed to hold scalar values - if we want an entry in an array to be another array, we can't do this:

$array[1] = @another; # nope - the number of entries in @another is assigned to $array[1]

But we can do this

$array[1] = \@another;

print $array[1]->[0]; # the first entry in the referenced array

Hence we can make arrays of arrays (AoAs), hashes of hashes (HoHs) or AoHoAoHoHoA....
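A sketch of both points: the $array[1] example above, plus a hash of hashes built with anonymous `{...}` and `[...]` constructors. Those constructors (not covered above) create a reference to a new hash or array without naming it. The keys and data are invented for illustration:

```perl
use strict;
use warnings;

my @another = ('x', 'y');
my @array;
$array[1] = \@another;            # element 1 now holds an array reference
print $array[1]->[0], "\n";       # prints "x"
print $array[1][0], "\n";         # arrow between subscripts is optional: prints "x"

# hypothetical HoH: each value is a reference to an anonymous hash,
# which in turn holds a reference to an anonymous array
my %hoh = (
    alice => { age => 30, langs => ['perl', 'c'] },
    bob   => { age => 25, langs => ['python'] },
);
print $hoh{alice}{langs}[1], "\n";   # prints "c"
```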

Secondly, they let us handle large arrays or hashes more efficiently, for instance when passing them to subroutines. Say @ten_million had ten million elements - something like this would be inefficient:


sub func {
   my (@arr) = @_;   # copies every element into @arr

   if ($arr[1] eq 'command') {
      # ...
   }
}

func(@ten_million);

This code copies the contents of @ten_million into @arr inside func - all ten million of them. A reference saves us all that overhead - only the reference (conceptually, the location in memory) goes to the subroutine.

sub func {
   my ($aref) = @_;   # just one scalar: the reference

   if ($aref->[1] eq 'command') {
      # ...
   }
}

func(\@ten_million);
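Because the subroutine receives a reference rather than a copy, it can also change the caller's array. A minimal sketch, with a small array standing in for @ten_million and a made-up sub name, tag_command:

```perl
use strict;
use warnings;

# hypothetical helper: inspects and modifies the caller's array via a reference
sub tag_command {
    my ($aref) = @_;               # only one scalar (the reference) is passed
    if ($aref->[1] eq 'command') {
        push @$aref, 'seen';       # modifies the caller's array; no copy is made
    }
    return scalar @$aref;          # number of elements in the caller's array
}

my @data = ('x', 'command');
my $n = tag_command(\@data);
print "$n: @data\n";               # prints "3: x command seen"
```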

What happens if we try to manipulate the reference ?

$ref = \@array;

print "ref $ref\n";

ref ARRAY(0x000003e8)

$ref++;

print "ref $ref\n";

ref 1001

Look at that! $ref is not a reference anymore - the 'ARRAY' word has disappeared, and now it's just an ordinary number. So you can take a reference to something, but you can't manipulate it to refer to something else. Languages like C let you do this - it's called pointer manipulation, and it's the cause of more core dumps and corruption than I can count. It's cool, but very, very easy to get wrong...
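You can watch this happen with the core Scalar::Util module: for a plain (non-overloaded) reference, its numeric value matches what `refaddr` reports, and after `++` the result is just a number that `ref` no longer recognises. A small sketch:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my @array = (10, 20, 30);
my $ref   = \@array;

my $addr = $ref + 0;                 # numeric value of the reference
print "same address\n" if $addr == refaddr($ref);   # prints "same address"

my $n = $ref;                        # a second copy of the reference
$n++;                                # arithmetic leaves a plain number behind
print "just a number\n" unless ref $n;              # prints "just a number"
```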

So there is a conceptual guide to references - HTH!

And one last point - not one single thing I have described here is what _actually_ happens in Perl - I believe the concepts are correct, but the implementation is vastly more complicated.

...it is better to be approximately right than precisely wrong. - Warren Buffett

Janitored by Arunbear - added readmore tags, as per Monastery guidelines


Re: The Concept of References by DrHyde (Prior) on Apr 14, 2005 at 09:37 UTC
One thing to bear in mind, and which may cause your readers some confusion, is that references are not pointers. While it is true that if you stringify a reference, as in `$a = \100; print $a;` (here $a is a reference to a scalar), you get what looks like an address, that is only a convenient way of uniquely representing the thingy that $a refers to. You can't manipulate that string to make it refer to something else like you can in C - that is, you can't do pointer arithmetic. Sometimes I wish you could.
Re^2: The Concept of References by Forsaken (Friar) on Apr 14, 2005 at 12:10 UTC
Just out of curiosity, could you come up with an example of how pointer arithmetic would be applicable for Perl? I've been trying to think of a possible application but failed miserably, I'd be really interested to see your perspective. Remember rule one...
Re^3: The Concept of References by dragonchild (Archbishop) on Apr 14, 2005 at 12:43 UTC
You don't, because doing so would violate the guarantee that you cannot buffer overflow in Perl. But there are good reasons why C gives you the rope to buffer overflow. Let's say you want to iterate through the characters of a string and do something with each character. In C, that's pretty simple. Strings are just arrays of characters and arrays are just fancy pointers. `char string[] = "Hello"; char *ptr; for ( ptr = string; *ptr; ptr++ ) { do_something_with( *ptr ); /* This is a single char */ }` Now, let's say you want to do the same thing in Perl. There are a few ways, but none are anywhere near as efficient. `foreach my $char ( split //, $string ) { ... }` `while ( $string =~ /(.)/g ) { my $char = $1; ... }` `for ( my $i = 0; $i < length $string; $i++ ) { my $char = substr( $string, $i, 1 ); ... }` My wife's blog
Re^3: The Concept of References by DrHyde (Prior) on Apr 15, 2005 at 08:48 UTC
Brother dragonchild's example is a good one. It's not really a question of needing them, but that they are another tool available which let you approach problems in different ways. Sometimes a pointerish solution - like dragonchild's - would be easier to understand, just like sometimes an OO solution is easier, or a functional solution is easier. I do take the point that with pointers comes danger. It could probably be mitigated because perl knows more about the underlying data structures that you would be pointing at than C does - for instance, if you increment a pointer beyond the end of a string, perl can know that you've done that and so could automagically extend the string.
Re: The Concept of References by Anonymous Monk on Apr 14, 2005 at 11:07 UTC
That looks like an explanation of pointers. It strongly suggests that you can do arithmetic on references, to get to the "next" memory location. I like this image: `+--------+ Alias: | $alias | ------------+ +--------+ | | \|/ +--------+ +----------+ +---------+ Variable: | $var | ----> | Variable | ----> | Content | +--------+ +----------+ +---------+ /|\ +--------+ +--------+ +----------+ | Reference: | $ref | ----> | Variable | --+ +--------+ +----------+` But that might be too much internals for some.
Re^2: The Concept of References by Juerd (Abbot) on Apr 14, 2005 at 12:25 UTC
These things are often called name, container and value. Your diagram has them neatly put in columns: `Name Container Value +--------+ Alias: | $alias | ------------+ +--------+ | | \|/ +--------+ +----------+ +---------+ Variable: | $var | ----> | Variable | ----> | Content | +--------+ +----------+ +---------+ /|\ +--------+ +--------+ +----------+ | Reference: | $ref | ----> | Variable | --+ +--------+ +----------+` The "variable" label isn't accurate, though, if you also call one of the blocks "variable". And, to more clearly illustrate that an alias is absolutely not a reference, you could consider making the original name and the second name more visually equal: `+--------+ __ | $name | \ +--------+ \ +-----------+ +-------+ ====> | container | ----> | value | +--------+ / +-----------+ +-------+ | $alias | __/ +--------+` And I think the content/value part of Reference is missing in your diagram. It does have one, and that is the actual reference! Hard to draw, though. Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
Re^3: The Concept of References by Anonymous Monk on Apr 14, 2005 at 13:31 UTC
I know how references work internally, but those are details not really needed to understand references and aliases. As for visually equalizing $name and $alias, I usually do the diagram interactively, in front of a blackboard, starting with the middle line of my diagram. And then I won't wipe out the $name box to draw it a bit lower (or higher).
Re^2: The Concept of References by leriksen (Curate) on Apr 15, 2005 at 02:27 UTC
Nice diagram - I deliberately didn't do it this way, even though yours is more accurate. And I hope to add The Concept of Alias' today. ...it is better to be approximately right than precisely wrong. - Warren Buffett
Re: The Concept of References by Fletch (Bishop) on Apr 14, 2005 at 12:55 UTC
And if you're really interested in the hairy details underneath there's the illustrated `perlguts` (although it is somewhat long in the tooth).
Re: The Concept of References by polettix (Vicar) on Apr 14, 2005 at 15:13 UTC
Due to my C++ roots, I really find it more useful to think of Perl references as pointers rather than references, because of what my (limited) brain believes a reference should be. Apart from the (correct) pointer arithmetic point, the semantics of a Perl reference require an indirection, while in C++ a reference is just some kind of alias for another variable/object and requires no indirection at all. That's also why I'll feel less comfortable with Perl 6, which I understand will switch from "->" to "." for dereferencing in objects. But this is a minefield, 'cause I know really nothing about Perl 6! Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))") Don't fool yourself.
Re^2: The Concept of References by revdiablo (Prior) on Apr 14, 2005 at 16:29 UTC
Another difference between Perl's references and pointers à la C is that a reference knows what it refers to. It knows what kind of reference it is. It's not just a reference, but an array reference or a hash reference, etc. Update: my lack of C knowledge should be self-evident. Thanks to frodo72 for explaining why I'm wrong.
Re^3: The Concept of References by polettix (Vicar) on Apr 14, 2005 at 17:43 UTC
One could argue that a pointer in C, e.g. to a `struct tm`, knows what it points to, so that you can write: `struct tm *p = &data; localtime(p); printf("Year: %d\n", p->tm_year);` which, assuming you have a hash filled with the very same parameters, resembles `$p->{'tm_year'}` very much. Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))") Don't fool yourself.
Re^4: The Concept of References by revdiablo (Prior) on Apr 14, 2005 at 18:06 UTC
Re^5: The Concept of References by polettix (Vicar) on Apr 14, 2005 at 18:13 UTC
Re^4: The Concept of References by adrianh (Chancellor) on Apr 15, 2005 at 14:57 UTC
Re^5: The Concept of References by Anonymous Monk on Apr 15, 2005 at 15:45 UTC