
A C-like brain in a Perl-like world

by cyberscribe (Acolyte)
on Sep 26, 2001 at 23:12 UTC (#114909, perlquestion)
cyberscribe has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to merge two arrays and not succeeding. I believe it has to do with how Perl references arrays and deals with functions ("subs") and how I do not understand either very well.

See the following:

#start code (such as it is)
sub merge {
    my (@array1,@array2) = @_;
    $array1_size = @array1;
    $array2_size = @array2;
    for($i=0;$i<$array1_size;$i++) {
        if(!lookup(@array1[$i],$array2)) {
            push(@array2,($array1[$i]));
        } else {
            break;
        }
    }
    return $array2;
}

sub lookup {
    my($value,$array) = @_;
    $return = FALSE;
    $array_size = @array;
    for($i=0;$i<$array_size;$i++) {
        if($value eq @array[$i]) {
            $return = TRUE;
        }
    }
    return $return;
}

@array1 = split(/ /,"a b c d e");
@array2 = split(/ /,"c d e f");
@merged_array = merge(@array1,@array2);
#end code

merged_array should now be {"a","b","c","d","e","f"}. Instead I get an infinite loop.

I've managed to squeak by coding perl scripts to do nslookups and DBI data access, enjoying how the language can make quick work of messy strings. But I obviously don't understand the "finer points" of the language. Any pointers (pun intended?) appreciated.


Re: A C-like brain in a Perl-like world
by tachyon (Chancellor) on Sep 26, 2001 at 23:23 UTC

    You do it using a hash like this - it is rather brief :-) but that's perl for you:

    sub merge {
        my %hash;
        $hash{$_}++ for @_;
        return keys %hash;
    }

    sub intersection {
        my %hash;
        $hash{$_}++ for @_;
        return grep { $hash{$_} > 1 } keys %hash
    }

    sub lookup {
        $value = shift;
        for (@_) {
            return 1 if $value eq $_;
        }
        return 0;
    }

    @ary1 = qw( a b c d e f g );
    @ary2 = qw( e f g h i j k );

    print merge ( @ary1, @ary2), "\n";
    print intersection ( @ary1, @ary2 ), "\n";

    for ( qw( j a p h ) ) {
        print lookup( $_, @ary2) ? "$_ Found\n" : "$_ Not found\n";
    }

    There is a thing called the Perl FAQ which is nine documents covering all the common stuff you will want to do like merge arrays, find intersections.....

    By the way it will be a good idea to lose the TRUE FALSE habit. In perl everything is true except for:

    • 0 (including the string equivalent "0", which evaluates to zero)
    • '' - the null string
    • undef - an undefined value
    • () - an empty list/array

    This is really handy as it lets you do stuff like print @array if @array, which will only print @array if it contains elements and is thus true, or &some_func if $flag, which will only call the sub if $flag is true. It also means that unless you have defined a sub such as sub FALSE { 0 }, you do not return false when you think you return FALSE: without strict, the bareword is just the string 'FALSE', which is true.

    sub oops { return FALSE; }
    print "Oops 'FALSE' is true in Perl" if &oops;


    CheeseLord points out an error in my understanding. Thanks!




      In perl everything is true except for:

      0 (including the string equivalents "0" "0.0" etc which all evaluate to zero)

      Sorry, tachyon, but that's not quite right. "0.0" is not false in perl:

      % perl -le 'print "" ? "True" : "False"'
      False
      % perl -le 'print "0" ? "True" : "False"'
      False
      % perl -le 'print "0.0" ? "True" : "False"'
      True

      Now with new grammatical goodness! (Thanks, blakem. ;-)

      His Royal Cheeziness

        Apparently Perl is not as seamless in its conversions between strings and floating point numbers as it is in its conversions between strings and integers. Because the following works:

        % perl -le '$foo=0.0; print $foo ? "True" : "False";'
        False

        but the following does not:

        % perl -le '$foo="0.0"; print $foo ? "True" : "False";'
        True

        Be bloody, bold, and resolute; laugh to scorn
        The power of man...

        Hey good point! "0.0" is only zero if you force an eval on it one way or another. These are interesting:

        print "0.0 is == 0\n" if "0.0" == 0;
        $string_zero = "0.0";
        print "True" if $string_zero;
        print "True" if eval $string_zero;
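To put the truth rules from this sub-thread side by side, here is a minimal sketch that tests a handful of scalars in boolean context. The helper name truth is made up for illustration; the values are the ones discussed above plus two extra strings ("00" and "0E0") that are numerically zero but still true as strings.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: report how Perl treats a scalar
# when it is used as a condition.
sub truth { return $_[0] ? 'true' : 'false' }

my @cases = (
    [ 'number 0',     0     ],
    [ 'number 0.0',   0.0   ],
    [ 'string "0"',   '0'   ],
    [ 'string ""',    ''    ],
    [ 'undef',        undef ],
    [ 'string "0.0"', '0.0' ],
    [ 'string "00"',  '00'  ],
    [ 'string "0E0"', '0E0' ],
);

# The first five cases print "false", the last three print "true".
printf "%-14s is %s\n", $_->[0], truth($_->[1]) for @cases;
```

Only the exact strings "" and "0" are false; any other string is true even if it would compare == 0.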




Re: A C-like brain in a Perl-like world
by runrig (Abbot) on Sep 27, 2001 at 00:13 UTC
    sub merge { my (@array1,@array2) = @_;
    This won't work. All arguments will go into @array1, and @array2 will be empty. You need to pass in array references if you want to preserve which argument is which (although the way your loop functions, it sort of doesn't matter). In fact, it looks like you are mixing up your array names with your scalar names in the merge function. Maybe you need to use strict and warnings??
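A small sketch of the flattening described above, with hypothetical sub names (count_flat and count_refs are not from the original post). It only counts how many elements each parameter receives, once with plain arrays and once with array references.

```perl
use strict;
use warnings;

# Both arrays are flattened into @_, and the first array in the
# my() list swallows every element.
sub count_flat {
    my (@first, @second) = @_;
    return (scalar @first, scalar @second);
}

# With references, each argument arrives as a single scalar.
sub count_refs {
    my ($first, $second) = @_;
    return (scalar @$first, scalar @$second);
}

my @array1 = qw(a b c d e);
my @array2 = qw(c d e f);

my @flat = count_flat(@array1, @array2);
my @refs = count_refs(\@array1, \@array2);

print "flat: @flat\n";   # flat: 9 0
print "refs: @refs\n";   # refs: 5 4
```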

    Besides, the subroutine can be done more simply as:

    sub merge {
        my %hash;
        @hash{@_} = ();
        sort keys %hash;
    }

    my @array1 = qw(a b c d e);
    my @array2 = qw(c d e f g);
    my @merged = merge(@array1, @array2);
    print "@merged\n";
      This is almost the approach I'd take; except to make it reference-safe, I'd put it like this:
      sub merge {
          my %hash;
          @hash{ @_ } = @_;
          @hash{ sort keys %hash };
      }

      Update: last line was originally map $hash{ $_ }, sort keys %hash;

        Sorry for the reply to an old-node, but I was wondering what you meant by "reference-safe", and thought you might mean something like the following (which I thought might emit a "Modification of a read-only value" error if the $_ in the for loop were still referring to the keys of the hash), but it seems to work just fine without error (tested on 5.6 and 5.8):
        #!/usr/bin/perl
        use strict;
        use warnings;

        sub merge {
            my %hash;
            @hash{@_} = ();
            sort keys %hash;
        }

        my @array1 = qw(a b c d e);
        my @array2 = qw(c d e f g);

        for (merge(@array1, @array2)) {
            $_++;
            print "$_\n";
        }
Re: A C-like brain in a Perl-like world
by ducky (Scribe) on Sep 26, 2001 at 23:49 UTC

    Of course TMTOWTDI (variation on tachyon's theme, really):

    sub unique_merge {
        my ( %hash ) = map {$_ => 1 } @_ ;
        return keys %hash
    }

    my @array1 = qw(a b c d e) ;
    my @array2 = qw(c d e f) ;
    my @merged_array = unique_merge(@array1, @array2) ;

    A hash may only have one value per key (keys are unique), so the task becomes how to make a hash out of an array (or two). Using map I turn the arrays into key/value pairs (the value is not so important) and then ask for the keys back.



      Using keys in this way causes stringification of values. If your arrays have objects or references in them, this will bite. Also note that that 2 for 1 map was slow until Perl 5.6.1. The following is therefore how I do that:
      sub unique_merge { my %seen; grep !$seen{$_}++, @_; }
      Side benefit? I don't lose the order of the incoming elements. :-) But, as with all hash approaches, my definition of "unique" is "stringifies the same". That is not always the appropriate definition to use.
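The difference between the two approaches can be seen with a reference in the input. This is a sketch only: merge_by_keys and merge_by_grep are hypothetical names for the map/keys version and the grep version discussed above.

```perl
use strict;
use warnings;

sub merge_by_keys { my %hash = map { $_ => 1 } @_; return keys %hash }
sub merge_by_grep { my %seen; return grep { !$seen{$_}++ } @_ }

my $pair  = [1, 2];                        # a reference among plain strings
my @input = ('b', 'a', $pair, 'a', $pair);

my @from_keys = merge_by_keys(@input);
my @from_grep = merge_by_grep(@input);

# keys returns strings such as "ARRAY(0x...)", never the reference itself.
my $refs_from_keys = grep { ref($_) } @from_keys;
# grep passes the original elements through, in first-seen order.
my $refs_from_grep = grep { ref($_) } @from_grep;

print "keys kept $refs_from_keys reference(s)\n";    # keys kept 0 reference(s)
print "grep kept $refs_from_grep reference(s)\n";    # grep kept 1 reference(s)
print "grep order: $from_grep[0] $from_grep[1]\n";   # grep order: b a
```

Both versions still decide uniqueness by the stringified value, so two distinct objects that stringify identically would be treated as duplicates.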
Re: A C-like brain in a Perl-like world
by toma (Vicar) on Sep 27, 2001 at 11:09 UTC
    I agree that hashes are the best approach for merging keyed data. If there is some other reason to continue to use arrays, you can compare different approaches for an array lookup routine.

    You will find these lines quite helpful when you put them at the top of your program:

    use strict;
    use warnings;
    use diagnostics;
    The diagnostics provide paragraph-long explanations of many common perl coding errors. For the type of mistakes that I make, the diagnostics are correct about 80% of the time. I think this is an amazingly good percentage.

    As premchai21 said, you may need to update your version of perl to get this capability.
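As a rough illustration of what strict catches, the sketch below compiles a small snippet at run time with a string eval, so that the surrounding script still runs and can show the error. The snippet mirrors the $array2 versus @array2 mix-up from the original post; wrapping it in eval is purely a device for this demonstration.

```perl
use strict;
use warnings;

# The snippet is kept in a string so that its compile-time error can be
# captured instead of stopping this script.
my $snippet = q{
    use strict;
    my @array2 = qw(c d e f);
    my $size = $array2;    # typo: the scalar $array2 was never declared
    1;
};

my $ok = eval $snippet;
print( $ok ? "compiled fine\n" : "strict complained: $@" );
```

Without strict, the same snippet would compile silently and $size would simply be undefined.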

    It should work perfectly the first time! - toma

      Please explain "strict" to me. Perl seems anything but strict. Except when it comes to if statements, which MUST involve curly braces:


      unlike shell script or other, looser constructs that know the next line after an if is %eval_true%.

      I must believe there is a good reason for this that eludes me, as I must believe that Perl is about simplicity and elegance. Yet removing curly braces and getting away with it seems like a good, intuitive thing to do to a compiler/interpreter.


      p.s. I would like to propose a new tag for posts and cb ... 'pseudocode' - for formatting code that is really shorthand and differentiating it from stuff between legitimate 'code' tags, which should execute in a Perl environment without errors, or with only errors specifically described in the post. What I wrote above was 'pseudocode'.

        In perl you can use if as a statement modifier... Therefore, this is perfectly valid code:
        #!/usr/bin/perl -wT
        use strict;

        for my $var (0..9) {
            print "$var is even\n" if $var%2 == 0; # using if as a statement modifier
        }
        For more on use strict, please see


        Forcing a BLOCK rather than an instruction after an if is because of a simple reason: the "dangling else". It is not apparent in which way if $condition if $more_condition do_something() else do_other() is to be disambiguated; many traditional languages arbitrarily choose to bind the else branch to the closest if, but that's not readily apparent from looking at the code - and if you want to bind the else to the first if, you still have to use a BLOCK around the second.

        So Perl forces you to use BLOCKs for clarity, but lets you append an if clause as a statement modifier for when you want brevity: do_something if $condition;.
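The two forms can be put next to each other in a short sketch. The sub describe is a hypothetical helper invented for this example.

```perl
use strict;
use warnings;

sub describe {
    my ($n) = @_;
    my $label;

    # Full form: braces are mandatory, so it is always visible which
    # if an else belongs to.
    if ($n % 2 == 0) {
        if ($n > 4) { $label = 'big even' }
        else        { $label = 'small even' }
    }
    else {
        $label = 'odd';
    }

    # Modifier form: a single statement, no braces, and no else is possible.
    $label .= '!' if $n == 0;

    return $label;
}

# Prints "small even!", "odd" and "big even", one per line.
print describe($_), "\n" for 0, 3, 6;
```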

        <CODE> has more functions than formatting. For pseudocode, you can use the regular <tt></tt> HTML tags.

Re: A C-like brain in a Perl-like world
by premchai21 (Curate) on Sep 27, 2001 at 06:24 UTC
    Another thing: you have
    } else { break; }
    in merge. I think you want last. break does not exist. Try using strict and either warnings (later perls) or the -w flag (earlier perls). I think the split is at 5.6.0. Correct me if I'm wrong about that.
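For comparison with C, a minimal sketch of last leaving a loop early (the data is made up for the example):

```perl
use strict;
use warnings;

my @letters = qw(a b c d e);
my @seen;

for my $letter (@letters) {
    last if $letter eq 'c';   # leave the loop entirely, like C's break
    push @seen, $letter;
}

print "@seen\n";   # a b
```

The counterpart of C's continue is next, which skips to the following iteration.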
one-liner to merge arrays (Re: A C-like brain in a Perl-like world)
by andye (Curate) on Sep 27, 2001 at 19:43 UTC
    my @merged = values %{{ map {$_ => $_ } @array1,@array2 }} ;
    The double-curlies are necessary - with single-curlies the interpreter whines and dies. If anyone can explain why, I'd be interested.


      The reason is that the inner curlies turn the list returned by your map into a hash reference, not a hash, and values takes a hash for its argument, not a hash reference. Now, if you use a temporary hash reference instead, the reason for the double curlies becomes much more apparent:
      my $hash = { map {$_ => $_ } @a,@b };
      my @merged = values %{ $hash };
      Good question, by the way :)


      Let's look at it blown out:
      = values %{{ map {$_ => $_ } @a,@b }} ;
      = values                # I want a hash and return a list
          %{ }                # I turn a hashref into a hash
            { }               # I turn a list into a hashref
              map { } @a,@b   # I return a list from a list
                  => ,        # We construct lists


      $you = new YOU;
      honk() if $you->love(perl)

        Thanks extremely and jeffa, it's all clear now.

        I should have realised, since the array equivalent would be @{[  ...  ]} , that the brackets were doing different things - but I'd managed to confuse myself.
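Both bracket pairs can be tried in one short sketch. The sort is added here only to make the output order predictable, since values returns elements in no particular order.

```perl
use strict;
use warnings;

my @array1 = qw(a b c d e);
my @array2 = qw(c d e f);

# Inner { } builds an anonymous hash and yields a reference to it;
# outer %{ } dereferences that reference back into a hash for values.
my @merged = sort values %{{ map { $_ => $_ } @array1, @array2 }};
print "@merged\n";   # a b c d e f

# Array equivalent: [ ] builds an anonymous array, @{ } dereferences it.
# This is handy for interpolating an expression into a string.
print "count: @{[ scalar @merged ]}\n";   # count: 6
```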


Pointers for a C brain
by John M. Dlugosz (Monsignor) on Sep 28, 2001 at 02:27 UTC
    Others have pointed out that this specific problem can be solved in other ways. However, for general learning:

    1) use strict (or die!). This would tell you right off that $array2 is not defined, so it's not what you think.

    2) my (@array1,@array2) = @_; looks like a mistake, even though on further reading I think you know what it really does. Use that idiom for a parameter list, not for cute tricks with the semantics.

    3) for($i=0;$i<$array1_size;$i++) if you want to do this from C, think of a foreach construct instead.

    4) look up the difference between @an and $an.
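Applied to the code in the original question, the four points above might look like the following sketch. The sub names are kept from the question, but passing array references is an assumption about what was intended. The original loops also share one global $i, which lookup resets to 0 on every call; that appears to be the source of the infinite loop, and lexical loop variables avoid it.

```perl
use strict;
use warnings;

# Return 1 if $value is in the array that $array refers to, else 0.
sub lookup {
    my ($value, $array) = @_;
    for my $item (@$array) {          # foreach instead of a C-style index loop
        return 1 if $value eq $item;
    }
    return 0;
}

# Return the second array followed by any items of the first that it lacks.
sub merge {
    my ($array1, $array2) = @_;
    my @merged = @$array2;            # copy, so the caller's array is untouched
    for my $item (@$array1) {
        push @merged, $item unless lookup($item, \@merged);
    }
    return @merged;
}

my @array1 = split / /, "a b c d e";
my @array2 = split / /, "c d e f";

my @merged_array = merge(\@array1, \@array2);
@merged_array = sort @merged_array;
print "@merged_array\n";   # a b c d e f
```

The linear lookup keeps the shape of the original; the hash-based versions elsewhere in the thread scale better for long lists.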

    That's all the time I have... I'm off to Cozumel for a long weekend. Keep plugging away!


Re: A C-like brain in a Perl-like world
by dlc (Acolyte) on Sep 27, 2001 at 21:49 UTC

    Boy, that's a lot of work. Try this:

    my %uniq;
    @merged = grep { ++$uniq{$_} == 1 } (@array1, @array2);


      What is the incrementation doing there? Is it moving through the keys of %uniq to see if $_ is a match?
        The key is that $uniq{$_} does not exist the first time a specific $_ is seen. Because it is preincremented, its value becomes 1. Then it is tested against == 1, and well, that is true. The next time that $_ is seen, $uniq{$_} already has a value of 1 or higher, so after the preincrement it becomes 2 or higher, then fails the == 1 test and is therefore dropped by grep. Thus, only the first appearance of a certain $_ makes it into the final list.
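The same logic can be unrolled into an explicit loop to watch the counter at work. This is a sketch with the arrays from the original question; the intermediate variable $count exists only for illustration.

```perl
use strict;
use warnings;

my @array1 = qw(a b c d e);
my @array2 = qw(c d e f);

my %uniq;
my @merged;
for my $item (@array1, @array2) {
    my $count = ++$uniq{$item};          # a missing entry becomes 1 on first sight
    push @merged, $item if $count == 1;  # keep only the first appearance
}

print "@merged\n";                       # a b c d e f
print "c was seen $uniq{c} time(s)\n";   # c was seen 2 time(s)
```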
