The Concept of References

Some people have a hard time understanding references in Perl - I know, because I did. However, over time, and because I have ended up using references a lot in the code I write, I feel a lot more comfortable with them. Here's how I explain references to new Perl users.

Picture how your code may be laid out in memory. A statement like

$count = 10;

might be laid out as follows:

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |			|         | 1001
________________			___________
...					|         | 1002
________________			___________

Here Perl has indicated that the contents of the variable $count are stored at memory location 1000. It stores that information in the symbol table, and in memory, at the location 1000, it writes the value 10. When Perl wants to access the value associated with $count, it looks in the symbol table first, then it looks at the memory location indicated.

Now suppose we add some code like this ...

$copy = $count;

The resultant memory layout could be something like this

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |		$copy ->|   10    | 1001
________________			___________
| $copy  | 1001 |			|         | 1002
________________			___________
...					|         | 1003
________________			___________

That is, $copy is a new entry in the symbol table, and the symbol table entry indicates that the place in memory to hold $copy's value is different to the location of $count. Once the symbol table entry is created, Perl looks at the value in memory for $count (by looking in the symbol table for where to look in memory) and places the same value in $copy's memory location. So the values are the same, but the locations are different.
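
A quick way to convince yourself that the two locations really are separate is to change one and look at the other. This is only a minimal sketch; the `use strict` and `my` declarations are my additions and are not part of the one-liners above.

```perl
use strict;
use warnings;

my $count = 10;
my $copy  = $count;       # copies the value into a separate location

$count = 99;              # change the original only

print "count $count\n";   # prints "count 99"
print "copy $copy\n";     # prints "copy 10"
```

The copy keeps its old value because nothing ties it to $count after the assignment.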

But neither of these is a reference. Let's see how some code that uses references might end up getting laid out in memory.

$ref = \$count; # initialise $ref's value to be a reference to $count

The resultant memory layout could be something like this

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |		$copy ->|   10    | 1001
________________			___________
| $copy  | 1001 |		$ref  ->|  1000   | 1002
________________			___________
| $ref   | 1002 |			|         | 1003
________________			___________
...					|         | 1004
________________			___________

So the symbol table entry for $ref looks pretty much the same - it indicates where $ref's value will be stored in memory. And what is that value? Well, because we said to initialise it to be a reference to $count, it stores $count's memory address. If we printed this out we'd see something like this

print "ref $ref\n";

ref SCALAR(0x000003e8)

This says that $ref is a reference to a scalar stored at 3e8 (which is 1000 written in hex).

So what does that get us? Well, $ref is a reference to a scalar stored at 1000, so to get the value stored at that location we need to 'dereference' $ref - and we do this with the following code

print "ref value $$ref\n";

ref value 10
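
Dereferencing works for writing as well as reading. The sketch below assumes nothing beyond core Perl; the `my` declarations and the braces form `${$ref}` are my additions for illustration.

```perl
use strict;
use warnings;

my $count = 10;
my $ref   = \$count;            # $ref holds a reference to $count

print "ref value $$ref\n";      # prints "ref value 10"
print "ref value ${$ref}\n";    # same thing, with the braces spelled out

$$ref = 11;                     # write through the reference
print "count $count\n";         # prints "count 11"
```

Because $ref points at $count's storage rather than at a copy, the assignment through $$ref shows up in $count.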

Now that may seem a hard way to access $count's value. After all, we can just use $count instead of $$ref. But we can take references to other things - arrays, hashes, subroutines etc. Let's look at how an array reference could be done.

Let's start with our symbol table and memory map layouts.

@array = (10, 20, 30);

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
...					|   30    | 1002
________________			___________
					|         | 1003
					___________

So here we see that the symbol table says that @array is stored at memory location 1000, and the memory map shows how the initial values might be laid out - in this case memory location 1000 is where the first element goes, 1001 is the second element etc.

Later we have some code like this

@copy = @array;

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
| @copy  | 1003 |			|   30    | 1002
________________			___________
...				@copy ->|   10    | 1003
________________			___________
					|   20    | 1004
					___________
					|   30    | 1005
					___________
					|         | 1006
					___________

This all seems pretty much what we would expect.
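
As with scalars, the copy is independent of the original. A minimal sketch (the `my` declarations and the changed value 99 are mine):

```perl
use strict;
use warnings;

my @array = (10, 20, 30);
my @copy  = @array;         # copies all three elements

$copy[1] = 99;              # change the copy only

print "array @array\n";     # prints "array 10 20 30"
print "copy @copy\n";       # prints "copy 10 99 30"
```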

What about a reference ?

$ref = \@array;

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
| @copy  | 1003 |			|   30    | 1002
________________			___________
| $ref   | 1006 |		@copy ->|   10    | 1003
________________			___________
...					|   20    | 1004
________________			___________
					|   30    | 1005
					___________
				$ref  ->|  1000   | 1006
					___________
					|         | 1007
					___________

So $ref is just like before - it holds where in memory @array is stored. And we can see this in a print statement

print "ref $ref\n";

ref ARRAY(0x000003e8)

This time we see that $ref is a reference to an array - $ref knows what it is referring to. And we can dereference it too - the most common notations for this are

print $ref->[1];

20

print $$ref[1];

20

I like the first notation - the arrow seems to read "go to the referenced value at index 1". YMMV.
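
Here is a small sketch that puts the array notations side by side. The built-in ref() is how you ask a reference what it refers to; the variable names ($kind, $second, $same, $n) are just ones I picked for the example.

```perl
use strict;
use warnings;

my @array = (10, 20, 30);
my $ref   = \@array;

my $kind   = ref($ref);       # 'ARRAY' - what kind of thing $ref refers to
my $second = $ref->[1];       # 20, arrow notation
my $same   = ${$ref}[1];      # 20, $$ref[1] with the braces spelled out
my $n      = scalar @$ref;    # 3, number of elements in the referenced array

print "$kind $second $same $n\n";   # prints "ARRAY 20 20 3"

push @$ref, 40;               # modifies @array itself, not a copy
print "array @array\n";       # prints "array 10 20 30 40"
```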

What about hashes ?

%hash = (a => 1, b => 2);

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				%hash ->|   a     | 1000
________________			___________
| %hash  | 1000 |			|   1     | 1001
________________			___________
...					|   b     | 1002
________________			___________
					|   2     | 1003
					___________

$ref = \%hash;

SYMBOL TABLE				MEMORY	    Address
________________			___________
...				%hash ->|   a     | 1000
________________			___________
| %hash  | 1000 |			|   1     | 1001
________________			___________
| $ref   | 1004 |			|   b     | 1002
________________			___________
...					|   2     | 1003
________________			___________
				$ref  ->|   1000  | 1004
					___________

print "ref $ref";

ref HASH(0x000003e8)

print $ref->{b};

2

print $$ref{b};

2
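
The same ideas in one runnable piece, as a sketch only. The extra key 'c' and the variable names are mine, added to show that writing through the reference changes %hash itself.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2);
my $ref  = \%hash;

my $kind  = ref($ref);            # 'HASH'
my $value = $ref->{b};            # 2
my @keys  = sort keys %$ref;      # ('a', 'b')

$ref->{c} = 3;                    # adds a key to %hash itself

print "$kind $value @keys\n";     # prints "HASH 2 a b"

# Prints a=1, b=2 and c=3, one per line
for my $key (sort keys %hash) {
    print "$key=$hash{$key}\n";
}
```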

I won't go into references to other things (subs, filehandles etc) but the concept is the same.

So what do we _do_ with references? Well, they let us make more complicated data structures for one. Arrays are only allowed to hold scalar values - if we want an entry in an array to be another array, we can't do this

$array[1] = @another; # nope - number of entries in @another is assigned to $array[1]

But we can do this

$array[1] = \@another;

print $array[1]->[0]; # the first entry in the referenced array

Hence we can make arrays of arrays (AoA's), hashes of hashes (HoH's) or AoHoAoHoHoA....
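
A small sketch of both cases follows. The anonymous hash constructor { ... } and the names alice and bob are not from the text above; they are just a common way to build a hash of hashes without naming every inner hash.

```perl
use strict;
use warnings;

my @another = ('x', 'y');
my @array   = (10, 20, 30);

# Assigning an array to a scalar slot stores the element count, not the list
my @flat = (10, 20, 30);
$flat[1] = @another;
my $count_only = $flat[1];      # 2

# Storing a reference keeps the whole array reachable
$array[1] = \@another;
my $first = $array[1]->[0];     # 'x'
my $same  = $array[1][0];       # 'x' - the arrow is optional between subscripts

# Hash of hashes: each value is a reference to an anonymous hash
my %hoh = (
    alice => { age => 30 },
    bob   => { age => 25 },
);
my $age = $hoh{bob}{age};       # 25

print "$count_only $first $same $age\n";   # prints "2 x x 25"
```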

Secondly they let us be more efficient about using large arrays or hashes in data structures. For example, say @array had ten million elements - something like this would be inefficient


sub func {
   my (@arr) = @_;   # copies every element passed in

   if ($arr[1] eq 'command') {
      ...
   }
}

func(@ten_million);

This code copies @ten_million's contents to the function func - all ten million of them. A reference saves us all that overhead - just the location in memory goes to the subroutine.

sub func {
   my ($aref) = @_;   # only the reference is copied

   if ($aref->[1] eq 'command') {
      ...
   }
}

func(\@ten_million);
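
One consequence worth seeing in action: since the subroutine works on the caller's array rather than a copy, changes it makes are visible to the caller. The helpers is_command and append_done below are hypothetical names for this sketch, and @big is a three-element stand-in for @ten_million.

```perl
use strict;
use warnings;

# Hypothetical helper: checks slot 1 through the reference, no list copy
sub is_command {
    my ($aref) = @_;
    return $aref->[1] eq 'command' ? 1 : 0;
}

# Hypothetical helper: pushes onto the caller's array via the reference
sub append_done {
    my ($aref) = @_;
    push @$aref, 'done';
    return;
}

my @big = ('run', 'command', 'now');   # stand-in for @ten_million

my $found = is_command(\@big);
append_done(\@big);

print "found $found\n";    # prints "found 1"
print "big @big\n";        # prints "big run command now done"
```

That shared access is handy, but it also means a subroutine can change your data behind your back, so it is worth being deliberate about which subs are allowed to modify what they are handed.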

What happens if we try to manipulate the reference ?

$ref = \@array;

print "ref $ref\n";

ref ARRAY(0x000003e8)

$ref++;

print "ref $ref\n";

ref 1001

Look at that! $ref is not a reference anymore - the 'ARRAY' word has disappeared, and now it's just an ordinary number. So you can take a reference to something, but you can't manipulate it to refer to something else. Languages like C let you do this - it's called pointer manipulation and it's the cause of more core dumps and corruption than I can count. It's cool, but very, very easy to get wrong...
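
You can check this for yourself with ref(), which returns an empty string for anything that is not a reference. This is a sketch only; the real addresses will differ from the 1000 used in the diagrams, so the example compares before and after instead of printing them.

```perl
use strict;
use warnings;

my @array = (10, 20, 30);
my $ref   = \@array;

my $before = ref($ref);     # 'ARRAY'
my $addr   = $ref + 0;      # a reference used as a number gives its address

$ref++;                     # arithmetic turns $ref into a plain number

my $after = ref($ref);      # '' (empty string): no longer a reference

print "before '$before' after '$after'\n";   # prints "before 'ARRAY' after ''"
print "moved by ", $ref - $addr, "\n";       # prints "moved by 1"

# Under 'use strict', dereferencing the plain number is a run-time error
my $ok = eval { my $x = $ref->[0]; 1 };
print "dereference failed\n" unless $ok;     # prints "dereference failed"
```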

So there is a conceptual guide to references - HTH !

And one last point - not one single thing I have described here is what _actually_ happens in Perl - I believe the concepts are correct, but the implementation is vastly more complicated.

...it is better to be approximately right than precisely wrong. - Warren Buffett

