http://www.perlmonks.org?node_id=11106277

ozboomer has asked for the wisdom of the Perl Monks concerning the following question:

Folks:

I'm doing some (what I think is) more unusual things with some arrays.. and I'm looking to see if there's a more 'elegant' method than what I'm currently using...

I have a 'reference' array that is processed, producing a 'subset' array. Generally, the elements of the 'subset' array are not in the same order as they appear in the 'reference' array... but that's the order I need to do some more processing.

So, I've cobbled together some code that does the job, viz:-

use Data::Dumper; @array = ( "APPLE", "ORANGE", "PEACH", # Use this for the ORDE +R "GRAPE", "BANANA", "PINEAPPLE" ); @subset = ( "PINEAPPLE", "GRAPE", "APPLE" ); # A required subset but + NOT # in the required order @results = (); # The subset with its e +lements in the # same order as the 're +ference' array # ---- Method 1: Using arrays directly @tmp = (); foreach my $check (@subset) { my ($index) = grep { $array[$_] eq $check } 0..$#array; push(@tmp, $index); } @index_list = sort {$a <=> $b} @tmp; # Sort the index values numerica +lly @results = @array[@index_list]; # Slice the index items out of ' +@array' printf("Method 1:\n"); print Dumper(@results); printf("\n"); # ---- Method 2: Using a sub Reset_Order(\@subset, \@array, \@results); printf("Method 2:\n"); print Dumper(@results); printf("\n"); # ========== end mainline ========== # ------------------------------------- # Reset_Order - ... # Uses Globals: # ------------------------------------- sub Reset_Order { my ($small_ref, $large_ref, $res_ref) = @_; my (@tmp); @tmp = (); foreach my $check (@$small_ref) { my ($index) = grep { $$large_ref[$_] eq $check } 0..$#$large_ref +; push(@tmp, $index); } my @index_list = sort {$a <=> $b} @tmp; # Sort the index values nu +merically @$res_ref = @$large_ref[@index_list]; # Slice the index items ou +t of the large array return(1); } # end Reset_Order

...but maybe there's a better way to do it, say, using grep in conjunction with map but I can't vizualize how that might be done -- I'm still not super flash with the 'more clever' ways of using these functions(!)

I would search for something to help me out... but I don't even know how to describe what I'm trying to do(!)

I'd appreciate any suggestions on where to go next...

Replies are listed 'Best First'.
Re: How to Order an Array's Elements to Match Another Array's Element Order
by dave_the_m (Monsignor) on Sep 17, 2019 at 10:40 UTC
    Create a hash to map name to index and use that to sort.
    my %index; @index{@array} = 0..$#array; my @result = sort { $index{$a} <=> $index{$b} } @subset;

    Dave.

Re: How to Order an Array's Elements to Match Another Array's Element Order
by Eily (Monsignor) on Sep 17, 2019 at 08:22 UTC

    It looks like what you are trying to do in the end is select the elements from @array that are also in @subset, by preserving the original order. The element that let's you check that something "is in" in perl is a hash. A hash does not preserve order so that's another clue that you should use it on the data where order is not important. You can build quickly a hash from an array like this:

    my %in_subset = map { $_ => 1 } @subset; # Set the value 1 for each ke +y in @subset
    Then filtering @array is very simple:
    my @filtered = grep { exists $in_subset{$_} } @array; # grep for only +the elements of @array which are in subset my @same = grep { $in_subset{$_} } @array; # Technically also works be +cause trying to access a non existing key will return undef, which is + false #Edited (missing } on the second line) thanks to AnomalousMonk

    FYI, if what you actually needed was the indexes of the elements of @array that are in subset, you can again use a hash:

    my %index_map = map { $array[$_] => $_ } 0..$#array; # Associate each + element with its index (only works for unique values though) my @matching_indexes = map { $index_map{$_} } @subset; # Get the index + for each value in @subset

    And last note, actually your code isn't bad, or inelegant at all, mostly because it is correctly indented, and there is some thought put into variable names. But rather than just rely on _ref suffixes for references, I strongly encourage you to add use strict; and use warnings; at the top of your program, to let perl warn you about the more subtle errors.

      Just a data point, but an alternative way to generate the hash is to use slice assignment. It is faster than using map if %index_map is to be generated many times, e.g. across data sets.

      my %index_map; @index{@array} = (0..$#array);

        Good point :). Although I like it more because it short than because it is fast (I basically treat speed considerations as insignificant until proven otherwise :) )

Re: How to Order an Array's Elements to Match Another Array's Element Order
by rsFalse (Chaplain) on Sep 17, 2019 at 08:36 UTC
    Hello, ozboomer,

    Your method (same method, but with or without subroutine) is clear and good. But if you have large data, it is too slow, because time complexity of the method is N*M. For every element from subset (M) you check all elements from list (N). For better time complexity you can use hashes (search: perldata).

    #!/usr/bin/perl use warnings; use strict; use Data::Dumper; my @array = ( "APPLE", "ORANGE", "PEACH", # Use this for the O +RDER "GRAPE", "BANANA", "PINEAPPLE" ); my @subset = ( "PINEAPPLE", "GRAPE", "ORANGE", "APPLE" ); # A requir +ed subset but NOT # in the required order my @results = (); # The subset with it +s elements in the # same order as the 're +ference' array # ---- Method 2: Using hash my %indexes; my $index = 0; map { $indexes{ $_ } = $index; $index ++; } @array; printf " Index of element '$_' is: %d\n", $indexes{ $_ } for @subset; my @tmp = (); foreach my $check ( @subset ){ my( $index ) = $indexes{ $check }; push( @tmp, $index ); } my @index_list = sort {$a <=> $b} @tmp; # Sort the index values numer +ically @results = @array[@index_list]; # Slice the index items out of ' +@array' print "Method 2 (uses hash):\n"; print Dumper(@results); print "\n";
    OUTPUT:
    Index of element 'PINEAPPLE' is: 5 Index of element 'GRAPE' is: 3 Index of element 'ORANGE' is: 1 Index of element 'APPLE' is: 0 Method 2 (uses hash): $VAR1 = 'APPLE'; $VAR2 = 'ORANGE'; $VAR3 = 'GRAPE'; $VAR4 = 'PINEAPPLE';
Re: How to Order an Array's Elements to Match Another Array's Element Order
by Athanasius (Archbishop) on Sep 17, 2019 at 08:06 UTC

    Hello ozboomer,

    Perl’s built-in sort function accepts a custom subroutine to do the comparisons. Here is one using the first_index function from List::MoreUtils:

    use strict; use warnings; use Data::Dump; use List::MoreUtils qw( first_index ); my @fruit = qw( APPLE ORANGE PEACH GRAPE BANANA PINEAPPLE ); my @subset = qw( PINEAPPLE GRAPE APPLE ); #my @subset = qw( PINEAPPLE GRAPE PEACH BANANA APPLE ORANGE ); my @results = sort fruit_order @subset; dd \@results; sub fruit_order { (first_index { $a eq $_ } @fruit) <=> (first_index { $b eq $_ } @fruit); }

    Output:

    18:00 >perl 2017_SoPW.pl ["APPLE", "GRAPE", "PINEAPPLE"] 18:04 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: How to Order an Array's Elements to Match Another Array's Element Order
by tybalt89 (Monsignor) on Sep 17, 2019 at 11:02 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11106277 use warnings; my @array = ( "APPLE", "ORANGE", "PEACH", # Use this for the O +RDER "GRAPE", "BANANA", "PINEAPPLE" ); my @subset = ( "PINEAPPLE", "GRAPE", "APPLE" ); # A required subset +but NOT # in the required order my %order; @order{ @array } = 1 .. @array; my @results = sort { $order{$a} <=> $order{$b} } @subset; print "results: @results\n";

    Outputs:

    results: APPLE GRAPE PINEAPPLE
Re: How to Order an Array's Elements to Match Another Array's Element Order
by GrandFather (Saint) on Sep 17, 2019 at 11:08 UTC

    Why? What does your real data look like?

    Our knowing why and what the data looks like is important because an optimum solution may depend on things like how many items there are in the respective arrays and how big those items are. Or it may be that you've landed on a solution that can't be made to work well and knowing why you ended up here may make finding a better solution possible.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: How to Order an Array's Elements to Match Another Array's Element Order
by tybalt89 (Monsignor) on Sep 18, 2019 at 02:45 UTC

    Here's something completely different - it doesn't use a sort !

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11106277 use warnings; my @array = ( "APPLE", "ORANGE", "PEACH", # Use this for the O +RDER "GRAPE", "BANANA", "PINEAPPLE" ); my @subset = ( "PINEAPPLE", "GRAPE", "APPLE" ); # A required subset +but NOT # in the required order my %order; @order{ @array } = 0 .. $#array; my @results; @results[ @order{ @subset } ] = @subset; @results = grep defined, @results; print "results: @results\n";

    Just kidding, this is called a distribution sort.

      >

      my @array = ( "APPLE", "ORANGE", "PEACH", # Use this for the +ORDER "GRAPE", "BANANA", "PINEAPPLE" );

      shouldn't this be completely ordered ?

      BANANA < GRAPE < ORANGE

      UPDATE

      never mind, I should've read the OP first! :-)

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Re: How to Order an Array's Elements to Match Another Array's Element Order
by ozboomer (Friar) on Sep 17, 2019 at 11:43 UTC

    Some very helpful thoughts there, thanks. I'll review the discussion a bit more yet...

    The current application is only dealing with small arrays (maybe selecting 10-20 items at a time from an array of a 100-200 elements)... but I thought it was an interesting question in terms of determining a technique to use on larger arrays. It's akin to extracting a slice of an array BUT in this case, all the required elements are not contiguous, hence manipulating indexes would seem the zippier approach.

    As it happens, the arrays are comprised of hash keys already... and the use of Tie::IxHash to try and preserve the insertion order into multi-key hashes isn't working the way I expect. For example, doing something like:-

    push(@{$hash{$project}{$vitem}}, @list);

    ...and then 'traversing the hash' (say, using Data::Dumper) reveals the {$project} order is maintained but the order of {$vitem}s within each {$project} is not; hence, I need to work out another way.

    Better I build-up some example data and include my code/input/output here than to try and explain further for now, methinks...

    Again, thanks for the suggestions... More to follow later...

    Edit: Replaced Tie::Hash with Tie:IxHash

      If you want to preserve insertion order then have a look at Hash::Ordered.

      It does not allow access to the position of items, though, so might not do all you need in this case.

Re: How to Order an Array's Elements to Match Another Array's Element Order
by ozboomer (Friar) on Sep 18, 2019 at 13:42 UTC

    Well, after a bit of study, I think I'm pursuing hippo's suggestion, Fanx! (see Re^2: How to Order an Array's Elements to Match Another Array's Element Order above)... mainly because it was clear to me and directly manipulates the arrays and hashes without any sorting (which is not really an issue with maybe 200 records... but scaling may ultimately be an issue for the approach).... but sorting indexes isn't too horrible, I suppose...

    I keep forgetting about the likes of List::MoreUtils, etc, so it was a helpful reminder to always look in the core modules (and others in CPAN).

    I also appreciate everyone's efforts at producing some slick suggestions, some of which are simply too clever for 'me of little brain' to understand(!)

    Back to coding I go... and Thanks! again to everyone.

Re: How to Order an Array's Elements to Match Another Array's Element Order
by ozboomer (Friar) on Sep 17, 2019 at 12:40 UTC

    Ok... Something to contemplate...

    Firstly, some example program code:-

    use warnings; use strict; use Data::Dumper; use Tie::IxHash; my %hash; tie %hash, "Tie::IxHash"; %hash = (); while (my $buf = <DATA>) { chomp($buf); my ($vitem, $project_info) = split(/,/, $buf); my ($project_name, $segment_info) = split(/\|/, my $project_info); my (@segments) = split(/;/, $segment_info); push(@{$hash{$project_name}{$vitem}}, @segments); } print Dumper(%hash); exit(0); # ----------- __DATA__ J_071117,BM:3|12.0-25.2;32.9-88.0 J_070424,BM:3|625-920;1017.905-1178 J_021212,BB:1|123-166;409-455 070526,SWT:1|53.160-59.320;77.720-86.120;370.800-416.800 070609,SWT:1|713.760-1159.200 070616a,SWT:1|0.0-652.0 070616b,SWT:1|401.40-461.800;483.160-490.640;503.400-595.200;602.440-6 +95.400;699.200-882.400 J_071019a,SWT:1|372.925-910.385;927.620-1038.830 J_071019b,SWT:1|268.15-808.330 J_071025,SWT:1|936.215-1659.635 071123_F,SWT:1|45.4550-81.665

    ...and the output it produces:-

    $VAR1 = 'BM:3'; $VAR2 = { 'J_070424' => [ '625-920', '1017.905-1178' ], 'J_071117' => [ '12.0-25.2', '32.9-88.0' ] }; $VAR3 = 'BB:1'; $VAR4 = { 'J_021212' => [ '123-166', '409-455' ] }; $VAR5 = 'SWT:1'; $VAR6 = { '070526' => [ '53.160-59.320', '77.720-86.120', '370.800-416.800' ], '071123_F' => [ '45.4550-81.665' ], 'J_071019a' => [ '372.925-910.385', '927.620-1038.830' ], 'J_071025' => [ '936.215-1659.635' ], '070609' => [ '713.760-1159.200' ], '070616b' => [ '401.40-461.800', '483.160-490.640', '503.400-595.200', '602.440-695.400', '699.200-882.400' ], '070616a' => [ '0.0-652.0' ], 'J_071019b' => [ '268.15-808.330' ] };

    As (I) expected/understood, the 'first-level' key is maintained in insertion order. However, subsequent (hash) key items are NOT in insertion order (which I did NOT expect/understand); array items ARE in insertion order.

    Ultimately, I want to be able to refer to the hash in a form like:-

    @{$hash{$project}{$vitem}}

    ...realizing that this can be problematic, given that I'm using a couple of different Perl versions and the "keys on reference is experimental at..." issue is (not) present in some cases... but one thing at a time...(!)

    Effectively, do something like:-

    for each project output "Project: ", project for each vitem in the current project output " VItem: ", vitem for each vitem segment output " Segment: ", segment endfor endfor endfor

    ...but if you looked at the output, the "VItems" would be in the original order they were shown in the __DATA__ block (within in each project).

    Oh.. and something I should have explained earlier... Running all this with Perl v5.20.2, build 2001, under 32-bit Win8.1 OR Perl v5.18.1, under Linux Kernel 3.14.55 (Puppy Linux 32-bit v6.3.2).

    Hopefully, this helps explain the context some...

      With arrays for ordering instead of IxHashes:

      #!/usr/bin/perl use warnings; use strict; my %hash; my @projects; my %vitems; while (my $buf = <DATA>) { chomp($buf); my ($vitem, $project_info) = split(/,/, $buf); my ($project_name, $segment_info) = split(/\|/, $project_info); my (@segments) = split(/;/, $segment_info); push @projects, $project_name unless exists $hash{$project_name}; push @{$vitems{$project_name}}, $vitem; push(@{$hash{$project_name}{$vitem}}, @segments); } for my $proj (@projects) { print "Project: $proj\n"; for my $vi (@{$vitems{$proj}}) { print " VItem: $vi\n"; for my $seg (@{$hash{$proj}{$vi}}) { print " Segment: $seg\n"; } } } exit(0); # ----------- __DATA__ J_071117,BM:3|12.0-25.2;32.9-88.0 J_070424,BM:3|625-920;1017.905-1178 J_021212,BB:1|123-166;409-455 070526,SWT:1|53.160-59.320;77.720-86.120;370.800-416.800 070609,SWT:1|713.760-1159.200 070616a,SWT:1|0.0-652.0 070616b,SWT:1|401.40-461.800;483.160-490.640;503.400-595.200;602.440-6 +95.400;699.200-882.400 J_071019a,SWT:1|372.925-910.385;927.620-1038.830 J_071019b,SWT:1|268.15-808.330 J_071025,SWT:1|936.215-1659.635 071123_F,SWT:1|45.4550-81.665

      You could keep everything in the same hash :)

      #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11106277 use warnings; my $order = '*order*'; my %hash; while( <DATA> ) { my ($vitem, $project, $segments) = /(\w+),(\w+:\d+)\|(.*)/; exists $hash{$project} or push @{$hash{$order}}, $project; exists $hash{$project}{$vitem} or push @{$hash{$project}{$order}}, $ +vitem; push @{$hash{$project}{$vitem}}, split /;/, $segments; } for my $project ( @{ $hash{$order} } ) { print "Project: $project\n"; for my $vitem ( @{ $hash{$project}{$order} } ) { print " Vitem: $vitem\n"; for my $segment ( @{ $hash{$project}{$vitem} } ) { print " Segment: $segment\n"; } } } __DATA__ J_071117,BM:3|12.0-25.2;32.9-88.0 J_070424,BM:3|625-920;1017.905-1178 J_021212,BB:1|123-166;409-455 070526,SWT:1|53.160-59.320;77.720-86.120;370.800-416.800 070609,SWT:1|713.760-1159.200 070616a,SWT:1|0.0-652.0 070616b,SWT:1|401.40-461.800;483.160-490.640;503.400-595.200;602.440-6 +95.400;699.200-882.400 J_071019a,SWT:1|372.925-910.385;927.620-1038.830 J_071019b,SWT:1|268.15-808.330 J_071025,SWT:1|936.215-1659.635 071123_F,SWT:1|45.4550-81.665
Re: How to Order an Array's Elements to Match Another Array's Element Order
by Anonymous Monk on Sep 19, 2019 at 15:07 UTC
    It's just "Tim Toady the Time Waster." Get to one way that's reasonably clear and that you can verify works properly ... then, stop. If and only if subsequent profiling reveals the logic to be a "hot spot," revisit it again. We have long since passed the time when we actually needed to "bum code" to squeeze out a few more precious milliseconds. If either one of your OP points fit that criteria, select one and move along.
      Shush. The adults are talking.