Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
Some people have a hard time understanding references in Perl - I know, because I did. However, over time, and because I have ended up using references a lot in the code I write, I feel a lot more comfortable with them. Here's how I explain references to new perl users.

Picture how your code may be laid out in memory. A statement like

$count = 10;
might get laid out like follows
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |			|         | 1001
________________			___________
...					|         | 1002
________________			___________
Here Perl has indicated that the contents of the variable $count are stored at memory location 1000. It stores that information in the symbol table, and in memory, at the location 1000, it writes the value 10. When Perl wants to access the value associated with $count, it looks in the symbol table first, then it looks at the memory location indicated.

Now suppose we add some code like this ...

$copy = $count;
The resultant memory layout could be something like this
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |		$copy ->|   10    | 1001
________________			___________
| $copy  | 1001 |			|         | 1002
________________			___________
...					|         | 1003
________________			___________

That is, $copy is a new entry in the symbol table, and the symbol table entry indicates that the place in memory to hold $copy's value is different to the location of $count. Once the symbol table entry is created, Perl looks at the value in memory for $count (by looking in the symbol table for where to look in memory) and places the same value in $copy's memory location. So the values are the same, but the locations are different.

But neither of these are references. Lets see how some code that uses references might end up getting laid out in memory.

$ref = \$count; # initialise $ref's value to be a reference to $count
The resultant memory layout could be something like this
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				$count->|   10    | 1000
________________			___________
| $count | 1000 |		$copy ->|   10    | 1001
________________			___________
| $copy  | 1001 |		$ref  ->|  1000   | 1002
________________			___________
| $ref   | 1002 |			|         | 1003
________________			___________
...					|         | 1004
________________			___________
So the symbol table entry for $ref looks pretty much the same - it indicates where $ref's values will be stored in memory. And what is that value ? Well, because we said initialise it to be a reference to $count, it stores $count's memory address. If we printed this out we'd see something like this
print "ref $ref\n"; ref SCALAR(0x000003e8)
This says that $ref is a reference to a scalar stored at 3e8 (which 1000 written in hex).

So what does that get us ? Well, $ref is a reference a scalar stored at 1000, so to get the value stored at that location we need to 'dereference' $ref - and we do this by the following code

print "ref value $$ref\n"; ref value 10
Now that may seem a hard way to access $count's value. After all we can just use $count instead of $$ref. But we can take references to other things - arrays, hashes, subroutines etc. Lets look at how an array reference could be done.

Lets start with our symbol table and memory map layouts.

@array = (10, 20, 30);
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
...					|   30    | 1002
________________			___________
					|         | 1003
					___________

So here we see that the symbol table says that @array is stored at memory location 1000, and the memory map shows how the initial values might be laid out - in this case memory location 1000 is where the first element goes, 1001 is the second element etc.

Later we have some code like this

@copy = @array;
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
| @copy  | 1003 |			|   30    | 1002
________________			___________
...				@copy ->|   10    | 1003
________________			___________
					|   20    | 1004
					___________
					|   30    | 1005
					___________
					|         | 1006
					___________

This all seems pretty much what we would expect.

What about a reference ?

$ref = \@array;
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				@array->|   10    | 1000
________________			___________
| @array | 1000 |			|   20    | 1001
________________			___________
| @copy  | 1003 |			|   30    | 1002
________________			___________
| $ref   | 1006 |		@copy ->|   10    | 1003
________________			___________
...					|   20    | 1004
________________			___________
					|   30    | 1005
					___________
				$ref  ->|  1000   | 1006
					___________
					|         | 1007
					___________

So $ref is just like before - it hold where in memory @array is stored. And we can see this in a print statement
print "ref $ref\n"; ref ARRAY(0x000003e8)
This time we see that $ref is a reference to an array - $ref knows what it is referring to. And we can dereference it too - the most common notations for this are
print $ref->[1]; 20
or
print $$ref[1]; 20
I like the first notation - the arrow seems to read "go to the referenced value at index 1". YMMV.

What about hashes ?

%hash = (a => 1, b => 2);
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				%hash ->|   a     | 1000
________________			___________
| %hash  | 1000 |			|   1     | 1001
________________			___________
...					|   b     | 1002
________________			___________
					|   2     | 1003
					___________
$ref = \%hash;
SYMBOL TABLE				MEMORY	    Address
________________			___________
...				%hash ->|   a     | 1000
________________			___________
| %hash  | 1000 |			|   1     | 1001
________________			___________
| $ref   | 1004 |			|   b     | 1002
________________			___________
...					|   2     | 1003
________________			___________
				$ref  ->|   1000  | 1004
					___________

print "ref $ref"; HASH(000003e8) print $ref->{b}; 2 print $$ref{b}; 2
I wont go into references to other things (subs, filehandles etc) but the concept is the same.

So what do we _do_ with references ? Well, they let us make more complicated data structures for one. Arrays are only allowed to hold scalar values - if we want an entry in an array to be another array, we cant do this

$array[1] = @another; # nope - number of entries in @another is assign +ed to $array[1]
But we can do this
$array[1] = \@another; print $array[1]->[0]; # the first entry in the referenced array
Hence we can make arrays of arrays(AoA's), hashes of hashes (HoH's) or AoHoAoHoHoA....

Secondly they let us be more efficient about using large arrays or hashes in data structures. For example, say @array had ten million elements - something like this would be inefficient

sub func { my (@arr) = @_; if (@arr[1] eq 'command') { ... } func(@ten_million);
This code copies @ten_million's contents to the function func - all ten million of them. A reference saves us all that overhead - just the location in memory goes to the subroutine.
sub func { my ($aref) =@_; if ($aref->[1] eq 'command') { ... } func(\@ten_million);

What happens if we try to manipulate the reference ?

$ref = \@array; print "ref $ref\n"; ARRAY(000003e8) $ref++; print "ref $ref\n"; 1001
Look at that ! $ref is not a reference anymore - the 'ARRAY' word has disappeared, now its just an ordinary number. So you can take a reference to something, but you cant manipulate it to refer to something else. Languages like C let you do this - its called pointer manipulation and its the cause of more core dumps and corruption than I can count. Its cool, but very very easy to get wrong...

So there is a conceptual guide to references - HTH !

And one last point - not one single thing I have described here is what _actually_ happens in Perl - I believe the concepts are correct, but the implementation is vastly more complicated.

...it is better to be approximately right than precisely wrong. - Warren Buffet

Janitored by Arunbear - added readmore tags, as per Monastery guidelines


In reply to The Concept of References by leriksen

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others avoiding work at the Monastery: (12)
    As of 2014-07-30 18:57 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      My favorite superfluous repetitious redundant duplicative phrase is:









      Results (239 votes), past polls