Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Sorting Strings By Vowel Sequence

by NewToPerl777 (Novice)
on Nov 17, 2014 at 21:45 UTC ( #1107485=perlquestion: print w/replies, xml ) Need Help??

NewToPerl777 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am trying to sort strings taken in from a text file by the string's vowel sequence.

For example: Let vow1 be the substring of vowels appearing in str1, and vow2 the corresponding substring of str2. If the substring vow1 appears before vow2 in the ascending ASCII order,str1 appears before str2 in the output.

Here is what I have so far:

#!/usr/bin/perl -w use strict; my %hash; my @argv; my $file = $ARGV[0]; open (IN, $file) or die "Could not open file: $!"; while (my $line = <IN>) { my $vowels = my $w; $vowels =~ s/[^aeiou]//g; $hash {$vowels}{$w} = 1; } close(IN); foreach my $key (sort %hash) { foreach my $word (sort {$hash {$key}}) { print "$word\n"; } }

Example input would be: fanfare, apparate, panacea, parmesan, albatross, albacore, false, vermeil, candelabra, beans

The output should be: apparate fanfare panacea albatross albacore false parmesan candelabra beans vermeil

I am beyond stuck and any help would be appreciated.

Replies are listed 'Best First'.
Re: Sorting Strings By Vowel Sequence
by choroba (Archbishop) on Nov 17, 2014 at 21:52 UTC
    I used the Schwartzian Transform:
    #!/usr/bin/perl use warnings; use strict; use feature qw{ say }; my @words = qw( fanfare apparate panacea parmesan albatross albacore false vermeil candelabra beans ); say for map $_->[0], sort { $a->[1] cmp $b->[1] } map [ $_, join q(), /[aeiou]/g ], @words;
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      UPDATED: Second edge case added.

      I am curious how this deals with words with vowel sequences being the same? For example, running the above script with the following words:
      my @words = qw( xaxexix babebib xaxexi babebibb );
      Produces the following output:
      xaxexix babebib xaxexi babebibb
      Since the vowel positions for xaxexix and babebib are the same, I would expect a secondary alpha-sort to occur here. Another potential issue is how the algorithm deals with words where the vowel sequences are in the same order but not in the same positions. For example:
      my @words = qw( babebib baebbib );
      Produces the following output:
      babebib baebbib
      Even though babebib is alphabetically before baebbib, the e occurs one position sooner in baebbib, which should put it before babebib.

      Maybe I am reading too much into the initial question. Having done Computer Programming Contests a number of times before, these are the kinds of things that the judges are looking for to see if the teams caught all the possibilities.

      Zucan

        Well it wasn't part of the original problem statement, so the solution allows sort to pick any order it chooses for elements with the same vowel string. You can tack on the original word to the end of the vowels string to sort by consonants when vowel strings match.

        Here is that idea, but using GRT Sort instead of ST:

        use 5.014; say join ', ', map {s/.*\0//r} sort map {tr/aeiou//cdr."\0$_"} qw( xexix babebib xaxexi babebibb );
        output:
        babebib, babebibb, xaxexi, xexix
        You can combine many sorts. Here's something quick and dirty:
        #!/usr/bin/perl use warnings; use strict; use feature qw{ say }; my @words = qw( xaxexix babebib xaxexi babebibb baebbib ); sub prepare_sort { my ( $vowels, $positions ) = ( '', '' ); while (/([aeiou])/g) { $vowels .= $1; $positions .= sprintf '%03d', pos; } return [ $_, $positions, $vowels ]; } say for map $_->[0], sort { $a->[2] cmp $b->[2] } # vowels sort sort { $a->[1] cmp $b->[1] } # positional sort sort { $a->[0] cmp $b->[0] } # alphabetical sort map prepare_sort(), @words;
        Output:
        baebibb babebib babebibb xaxexi xaxexix
        Hi Zucan

        If you want to sort alphabetically the words with the vowels at the same position, you can make just a very simple change to just one line of the original ST code suggested by choroba:

        #!/usr/bin/perl use warnings; use strict; use feature qw{ say }; my @words = qw( fanfare apparate panacea parmesan albatross albacore false vermeil candelabra beans ); say for map $_->[0], sort { $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0]} map [ $_, join q(), /[aeiou]/g ], @words;
        For each time comparison step in the sorting algorithm, if the vowels compare equal, then
        $a->[1] cmp $b->[1]
        return 0. And it returns -1 or +1 in all the other cases. If you get a 0, then the logical or (||) will execute the second part of the sort statement:
        $a->[0] cmp $b->[0]
        to try to see if the whole expression is true. But if you get -1 or +1 with the first part, then the whole expression if already known to be true, so that the second part of the statement will not be executed (short-circuited).

        I do not think that the other edge case you mention is compatible with the initial question.

        That is a very good point, I never stopped to think if the words were the same. Realizing now that could be a possibility I will fix that. Thank you for pointing that out. I am new to Perl so I am just taking programming contest-esque problems and am trying to work them out to learn.

      Wow, I did not know this was a known method. Thank you for the link, insight, and example I am going to research this more. Thank you very much for your help!

        The Schwartzian Transform is a fairly common idiom in Perl and is, to a large extent, using some principles of functional programming (languages such as Lisp or Scheme) in Perl. If Choroba had not done it before, I would have suggested almost the same solution (with probably just some very minor syntactical variations).

        To understand such a construct, you have to read it from bottom to top (and sometimes, depending on the exact layout, from right to left).

        To understand it better, you can split it mentally into three separate instructions:

        my @temp_array = map [ $_, join q(), /[aeiou]/g ], @words; @temp_array = sort { $a->[1] cmp $b->[1] } @temp_array; say for map $_->[0], @temp_array;
        The first instruction creates a temporary array of array references whose content will be something like this:
        (["fanfare", "aae"], [apparate, "aaae"], etc.)
        The second instruction will sort the array references in accordance with the second field of each array ref (i.e. the vowels). At the end of this, the temporary array contains the array refs sorted according to the vowels.

        The final step is to print out the original words (stripped of the data added in the first step), which will be now sorted according to the vowels.

        This process is sometimes referred to as : "decorate, sort, undecorate", because it is adding new data to the original one to enable the sort to work efficiently, and then removes the added data.

        Now, the beauty of the Schwartzian Transform is that you don't really need the temporary array: you can just pipeline the data from one step to the next. So if you look back at the code suggested by choroba, the map at the bottom creates a list of array references which are fueled into the sort line above, which sorts in accordance with what the map at the bottom has added to the data, and the result of the sort is fed into the map of the first line, which removes what the first map had added and returns the original list duly sorted in accordance to the vowels of each word.

        I hope this helps understanding this slightly advanced construct. Once understood, it becomes very natural to use it.

Re: Sorting Strings By Vowel Sequence
by AnomalousMonk (Bishop) on Nov 17, 2014 at 23:44 UTC

    If you are dealing with lotsa data, the so-called Guttman-Rosler Transform (GRT, the second technique below; see A Fresh Look at Efficient Perl Sorting, which discusses both ST and GRT techniques) may be faster:

    c:\@Work\Perl>perl -wMstrict -le "use Test::More 'no_plan'; use Test::NoWarnings; ;; VECTOR: for my $ar_vector ( [ [ qw(fanfare apparate panacea parmesan albatross albacore false v +ermeil candelabra beans) ], [ qw(apparate fanfare panacea albatross albacore false parmesan c +andelabra beans vermeil) ], ], ) { my ($ar_unsorted, $ar_expected) = @$ar_vector; my @unsorted = @$ar_unsorted; note qq{input: @$ar_unsorted}; ;; my @sorted = map $_->[0], sort { $a->[1] cmp $b->[1] } map [ $_, join '', m{ [aeiou]+ }xmsg ], @unsorted ; is_deeply \@sorted, $ar_expected, qq{ST: @sorted}; ;; @sorted = map { (split m{ \x00 }xms)[1] } sort map join(qq{\x00}, join('', m{ [aeiou]+ }xmsg), $_), @unsorted ; is_deeply \@sorted, $ar_expected, qq{GRT: @sorted}; ;; is_deeply \@unsorted, $ar_unsorted, qq{input data unchanged}; } " # input: fanfare apparate panacea parmesan albatross albacore false ve +rmeil candelabra beans ok 1 - ST: apparate fanfare panacea albatross albacore false parmesan + candelabra beans vermeil ok 2 - GRT: apparate fanfare panacea albatross albacore false parmesan + candelabra beans vermeil ok 3 - input data unchanged ok 4 - no warnings 1..4

    Update: The
        map  join(qq{\x00}, join('', m{ [aeiou]+ }xmsg), $_),
    expression in the GRT sort above is rather messy. A neater (tested) version is
        map  join('', m{ [aeiou]+ }xmsg, qq{\x00}, $_),

Re: Sorting Strings By Vowel Sequence
by salva (Canon) on Nov 18, 2014 at 10:06 UTC
    You can use Sort::Key.

    If you have a recent perl (>= 5.14):

    use 5.14; use Sort::Key qw(keysort); my @sorted = keysort { s/[^aeiou]+//gr } @data;

    Otherwise:

    use Sort::Key qw(keysort); my @sorted = keysort { join '', /[aeiou]+/g } @data;

    Or in pure perl, just build a hash with the sorting keys:

    my %keys; $key{$_} = s/[^aeiou]+//gr for @data; my @sorted = sort { $key{$a} cmp $key{$b} } @data;

    The ST is an overrated technique, but quite popular because learning it is like some kind of initiation ritual into the realms of intermediate Perl. In practice, in only works for datasets that are not too big because it uses too much memory.

Re: Sorting Strings By Vowel Sequence
by Zucan (Beadle) on Nov 18, 2014 at 03:31 UTC
    My solution isn't as elegant as the other algorithms already suggested, however, I think mine is more straight-forward, especially if you are still new to Perl:
    #!/usr/bin/perl -w use strict; my @words = qw(fanfare apparate panacea parmesan albatross albacore false vermeil candelabra beans); my %hash; foreach my $word (@words) { my $vowels = lc $word; $vowels =~ s/[^aeiou]//g; # EXAMPLE B my @array = ($word); push @array, @{$hash{$vowels}} if defined $hash{$vowels}; $hash{$vowels} = \@array; } my @sorted; foreach my $vowels (sort keys %hash) { push @sorted, @{$hash{$vowels}}; # EXAMPLE A } print " INPUT: @words\n"; print "SORTED: @sorted\n";

    The above code provides identical output to what was suggested in the original question.

    For the edge case where words have the same vowels in the same positions, there may be a desire to sort the words in alphabetical order. The above code would take the words "xaxexix babebib" (in that order) and print "xaxexix babebib". To make the code print "babebib xaxexix" instead, change the line marked "EXAMPLE A" to the following:
    push @sorted, sort @{$hash{$vowels}};
    For the edge case where words have the same vowels in the same order but with differing positions, there may be a desire to have the a vowel that shows up sooner in a word to sort higher than a word that has the vowel show up later in a word. For example, the above code would take the words "babebib baebbib" (in that order) and print "babebib baebbib". To make the code print "baebbib babebib" instead, change the line marked "EXAMPLE B" to the following:
    $vowels =~ s/[^aeiou]/z/g;
    We essentially replace all the non-vowels with "z", preserving the position of the vowels.

    This was a great question. It is great to see the different ways a problem can be solved in Perl!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1107485]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (5)
As of 2021-05-17 20:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Perl 7 will be out ...





    Results (161 votes). Check out past polls.

    Notices?