PerlMonks  

Changing array by changing $_?

by johnvandam (Acolyte)
on Oct 13, 2008 at 12:02 UTC (id://716791)

johnvandam has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I came across some unexpected behavior in Perl, and I would very much like your input on what is happening in my code.

I have an array, and I want to check every element against the keys of a hash containing reference data. Because I want to be able to exclude options (genomes, in this example), I put a '-' in front of the genomes I want to exclude later on. The code below is meant to check that I have provided correct genome names and have not misspelled any of them.

In the first piece of code I manipulate $_ in a foreach loop, but somehow this ends up changing my array @species. The last element, which initially contained '-CIOINT', now contains 'CIOINT': the '-' has been removed.

In the code below that, I do the same thing but without assigning to $_, and the array is not changed.

I am not working with references, so I am very curious why the array changed in the first case. I'm running Perl 5.8.8 on a Linux machine. Could someone clarify what happens in the first piece of code? I would be very grateful, because it's racking my brain!

John

Below is the isolated code, containing both the faulty and the working version.

#!/usr/bin/perl
use warnings;
use strict;

my %refspecies = (
    HOMSAP => 'Homo sapiens',
    MUSMUS => 'Mus musculus',
    CIOINT => 'Ciona intestinalis'
);
my @species = ('HOMSAP', 'MUSMUS', '-CIOINT');
my @duplicate = @species;

# Incorrect working code
print "Before processing: @species\n";
foreach (@species) {
    /^-?(.+)$/;
    $_ = $1;
    unless (exists $refspecies{$_}) {
        # Species not in list
        die "$_ is not in the species table\n";
    }
}
print "After processing: @species\n";

# Correct working code
# Duplicate run with minor code adjustment
print "Before processing: @duplicate\n";
foreach (@duplicate) {
    /^-?(.+)$/;
    #$_ = $1;  Removed this line and changed $_ below to $1. This shouldn't
    #          matter, because I am manipulating $_ in the code above and
    #          NOT @species, but somehow it does!
    unless (exists $refspecies{$1}) {
        # Species not in list
        die "$1 is not in the species table\n";
    }
}
print "After processing: @duplicate\n";

# Output is:
# Before processing: HOMSAP MUSMUS -CIOINT
# After processing: HOMSAP MUSMUS CIOINT
# Before processing: HOMSAP MUSMUS -CIOINT
# After processing: HOMSAP MUSMUS -CIOINT
# Look at @species element 3. Why is @species changed?

Re: Changing array by changing $_?
by JavaFan (Canon) on Oct 13, 2008 at 12:06 UTC
    If you are looping over an array, each array element is aliased to the loop variable ($_ in your case). The loop variable is just another name for the array element; it isn't a copy.

    This is a feature. It's designed this way.

    If you don't want to modify the original element, make a copy first. For instance:

    foreach (@array) {
        local $_ = $_;    # Change 'local' to 'my' in 5.10 (lexical 'my $_' was later removed in Perl 5.24).
        ...
    }
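    A minimal sketch of both behaviours, using the species list from the question: the first loop rewrites the array through the alias, the second copies each element into a lexical before changing it. The lexical copy serves the same goal as the `local $_ = $_` line above and behaves the same on every Perl version; the names `@aliased`, `@names` and `$name` are illustrative additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @species = ('HOMSAP', 'MUSMUS', '-CIOINT');
my @aliased = @species;    # separate array so both loops start from the same data

# $_ is an alias: s/// on $_ rewrites the element of @aliased itself.
for (@aliased) {
    s/^-//;
}

# Copy first: the substitution only touches the lexical $name.
my @names;
for (@species) {
    my $name = $_;
    $name =~ s/^-//;
    push @names, $name;
}

print "Aliased: @aliased\n";
print "Species: @species\n";
print "Names:   @names\n";
```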

      Thank you! Now it makes perfect sense. I am actually in the habit of using

      foreach my $variable (@array) { doingStuffWith($variable); }

      I guess that also prevents the elements from being changed. Thanks again for showing me the Perl 'magic'.

      20081016 Janitored by Corion: Closed code tag, as per Writeup Formatting Tips

        Nope.

        foreach my $foo (@bar) { $foo .= 'ack' ; ... } ;
        will affect the contents of the array just as surely as:
        foreach (@bar) { $_ .= 'ack' ; ... } ;
        To avoid this you need to:
        foreach (@bar) { my $s = $_ . 'ack' ; ... } ;
        or similar.

        It's genuine Perl magic. Useful once you know it. A chunk out of your backside when you don't.
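        A runnable version of the fragments above, as a sketch: `@bar`, `$foo` and `$s` come from the reply, while `@baz` and `@made` are illustrative names added here so the copying variant can be checked separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A named loop variable is an alias too, exactly like $_.
my @bar = ('a', 'b', 'c');
foreach my $foo (@bar) {
    $foo .= 'ack';
}
print "@bar\n";    # every element of @bar now ends in 'ack'

# Building a new value from a copy leaves the array alone.
my @baz = ('a', 'b', 'c');
my @made;
foreach (@baz) {
    my $s = $_ . 'ack';
    push @made, $s;
}
print "@baz | @made\n";
```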

        That doesn't address the fact that you are modifying a global variable ($_) without localizing it first. Why use a global variable at all?
        foreach (@species) {
            ( my $species = $_ ) =~ s/^-//;
            unless (exists $refspecies{$species}) {
                die "$species is not in the species table\n";
            }
        }
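        On Perl 5.14 and later (newer than the 5.8.8 in the question), the `/r` modifier makes `s///` return the modified string and leave the original untouched, so the `( my $species = $_ ) =~ s/^-//` copy is no longer needed. A sketch assuming a 5.14+ interpreter; the hash and list are the ones from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;    # s///r needs Perl 5.14 or later

my %refspecies = (
    HOMSAP => 'Homo sapiens',
    MUSMUS => 'Mus musculus',
    CIOINT => 'Ciona intestinalis',
);
my @species = ('HOMSAP', 'MUSMUS', '-CIOINT');

foreach (@species) {
    # /r returns the substituted copy; $_ (and therefore @species) is not modified
    my $species = s/^-//r;
    unless (exists $refspecies{$species}) {
        die "$species is not in the species table\n";
    }
}
print "@species\n";
```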

        Are you aware that:

        use strict;
        use warnings;

        my @array = 1 .. 10;
        doingStuffWith ($_) for @array;
        print "@array";

        sub doingStuffWith {
            $_[0] += 10;
        }

        Prints:

        11 12 13 14 15 16 17 18 19 20

        It's aliases all the way down.
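        The usual defence against the `@_` aliasing shown above is to unpack the arguments into lexicals before changing them. A short sketch contrasting the two; `add_ten_in_place` and `plus_ten` are hypothetical names chosen for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Writes through the alias in @_, so the caller's variable changes.
sub add_ten_in_place { $_[0] += 10; }

# Unpacks into a lexical copy first, so the caller's variable is untouched.
sub plus_ten {
    my ($n) = @_;
    $n += 10;
    return $n;
}

my @array = 1 .. 3;
add_ten_in_place($_) for @array;           # @array becomes 11 12 13

my @other = 1 .. 3;
my @sums  = map { plus_ten($_) } @other;   # @other stays 1 2 3
print "@array | @other | @sums\n";
```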


        Perl reduces RSI - it saves typing
        No, it won't.

        Loop variables are always aliases for the elements of the array.
        So it doesn't matter whether you give them a name or not.

Re: Changing array by changing $_?
by wol (Hermit) on Oct 13, 2008 at 13:39 UTC
    JavaFan and others have already given a good explanation of what's going on, so I shan't repeat it.

    Instead I was going to advise that you can avoid the aliasing effect by using map and throwing away the output of the mapping, like this:

    my @species = ('HOMSAP', 'MUSMUS', '-CIOINT');
    map {
        /^-?(.+)$/;
        $_ = $1;
        unless (exists $refspecies{$_}) {
            # Species not in list
            die "$_ is not in the species table\n";
        }
    } @species;
    Grep could be used in the same way.

    Fortunately, I tried this before posting. map fails in exactly the same way as foreach!

    This means that some pretty innocuous code could be flawed. I think I might have used this form a few times in the past:

    @in = qw{a b c -d -e}; @out = map { s/-//; $_ } @in;
    I've been corrupting @in without realising!?! I should have been using this:
    @in = qw{a b c -d -e}; map { s/-//; $_ } @out = @in;
    Shame that looks so odd )-:
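    A small check of the finding above, as a sketch: `s///` inside `map` rewrites the input array, while wol's workaround works because a list assignment in list context returns the elements of its left-hand side as lvalues, so `map` walks the copy. The last lines show an arguably clearer spelling of the same workaround; the numbered array names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pitfall: $_ in map is an alias, so s/// edits @in itself.
my @in  = qw{a b c -d -e};
my @out = map { s/-//; $_ } @in;
print "in: @in | out: @out\n";

# wol's workaround: map walks the elements of @out2 (returned as lvalues
# by the list assignment), never touching @in2.
my @in2 = qw{a b c -d -e};
my @out2;
map { s/-//; $_ } @out2 = @in2;

# Same effect, written as copy-then-modify.
my @in3  = qw{a b c -d -e};
my @out3 = @in3;
s/-// for @out3;
print "in2: @in2 | out2: @out2 | in3: @in3 | out3: @out3\n";
```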

    --
    .sig : File not found.

      The reason map (and for/foreach and grep) makes an alias instead of a copy is speed. A lot of code, probably the majority, doesn't modify $_ inside a map or grep; a slightly smaller share doesn't modify $_ (or the loop variable) inside a for/foreach.

      If you don't plan to modify $_, having an alias instead of a copy means the data doesn't have to be copied, saving both memory and time. This benefits everyone; the only case where extra code is needed is when one modifies $_ without intending to modify the original array. That code has to make the copy explicitly.

      IMO, the right choice has been made.

        I would also like to add that this feature, in addition to speed, is a godsend in terms of saving (memory) space. Why make a copy of a variable when we only need to read it? Some other languages do make copies implicitly.

        When dealing with large datasets, the programmer needs to be aware of this behavior (of any programming language, for that matter) to ensure scalability.

        IMHO this is something that could usefully be made safer by strict, or something similar. When it's useful to be able to modify $_ (or a named alias), a little extra 'decoration' would not matter much, and would highlight the use of this feature. The rest of the time it would reduce follicular wear and tear!

      my @out = map { my $s = $_; $s =~ s/-//; $s } @in;
      my @out = @in; s/-// for @out;
      use List::MoreUtils qw( apply ); my @out = apply { s/-// } @in;

      I'd like to see a safe version of foreach that relates to foreach the way List::MoreUtils::apply relates to map.

      my @foo = 1..10;

      applyfor (@foo) {
          $_ *= 2;    # Does not change @foo;
          print "$_\n";
      }

      foreach ( @foo ) {
          $_ *= 2;    # Changes @foo
          print "$_\n";
      }

      applyfor() is a bad name, but I'm not sure what I'd call it. Maybe 'foreachcopy' would be good.
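      A rough sketch of that 'safe foreach' as an ordinary subroutine, assuming a `(&@)` prototype so it can take a bare block the way `map` does. `foreach_copy` is a hypothetical name taken from the suggestion above, not an existing module function: it copies the list into a private array and aliases `$_` to those copies, so the block can modify `$_` freely.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run BLOCK once per element, with $_ aliased
# to a private copy of that element instead of the element itself.
sub foreach_copy(&@) {
    my ($code, @copies) = @_;    # list assignment copies the values
    for (@copies) {
        $code->();               # $_ here is an element of @copies
    }
    return;
}

my @foo = 1 .. 5;
my @printed;
foreach_copy { $_ *= 2; push @printed, $_ } @foo;    # does not change @foo
my @after_copy = @foo;

foreach (@foo) { $_ *= 2 }                             # changes @foo
print "@after_copy | @printed | @foo\n";
```

    The design choice is simply to pay for the copy up front, which is exactly the cost the aliasing behaviour normally avoids.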


      TGI says moo

Re: Changing array by changing $_?
by ig (Vicar) on Oct 13, 2008 at 20:18 UTC
    There are several features of for/foreach loops that are not obvious. If the aliasing feature is new to you, you might review the sections on for and foreach in the perlsyn man page. Familiarity may help you avoid other troublesome surprises.
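    Two of those features, sketched with illustrative variables: the loop variable is implicitly local to the loop, so a variable reused as the loop variable gets its previous value back afterwards (perlsyn states this also holds for a variable declared with `my`), and the aliasing reaches through every array in the loop list, not just the first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 1. The loop variable is localized to the loop and restored afterwards.
my $x = 'outer';
for $x (1 .. 3) {
    # inside the loop $x is 1, then 2, then 3
}
print "after loop: $x\n";

# 2. The aliases reach through every array in the list.
my @first  = (1, 2);
my @second = (3, 4);
for (@first, @second) {
    $_ *= 10;
}
print "@first | @second\n";
```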
Re: Changing array by changing $_?
by TGI (Parson) on Oct 14, 2008 at 18:36 UTC

    As ikegami and wol hinted, apply from List::MoreUtils is a good tool to use here.

    use List::MoreUtils qw( apply );

    my @bad_species = grep {
        !exists $refspecies{$_}    # find names missing from the table
    } apply {
        s/^-//;                    # normalize species names
    } @species;

    if ( @bad_species ) {
        my $bad = join ', ', @bad_species;
        die "These values are not in the species table: $bad\n";
    }

    List::MoreUtils has many simple little functions that can make your code much easier to read and maintain.
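    If installing List::MoreUtils is not an option, the same 'normalize, then collect every bad name' approach can be sketched with core Perl alone, using the copy-inside-map idiom from earlier in the thread (`my $s = $_; $s =~ s/-//; $s`). The species data are the question's; `bad_species` is a hypothetical helper name and the misspelled 'MUSMSU' entry is made up to exercise the error path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %refspecies = (
    HOMSAP => 'Homo sapiens',
    MUSMUS => 'Mus musculus',
    CIOINT => 'Ciona intestinalis',
);

# Returns the names (leading '-' stripped) that are not keys of %refspecies.
# The caller's list is never modified: map works on lexical copies.
sub bad_species {
    my @normalized = map { my $s = $_; $s =~ s/^-//; $s } @_;
    return grep { !exists $refspecies{$_} } @normalized;
}

my @species = ('HOMSAP', 'MUSMUS', '-CIOINT');
my @typo    = ('HOMSAP', 'MUSMSU', '-CIOINT');    # deliberately misspelled

my @none = bad_species(@species);
my @some = bad_species(@typo);
print "bad in \@species: ", scalar(@none), "\n";
print "bad in \@typo: @some\n";
print "\@species is still: @species\n";
```

    In a real script the non-empty result would then be joined and passed to `die`, as in the version above.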


    TGI says moo

Node Type: perlquestion [id://716791]
Approved by wfsp
Front-paged by Corion