PerlMonks  

File Sorting Question

by vbrtrmn (Pilgrim)
on Jun 18, 2001 at 21:26 UTC (id://89463)


vbrtrmn has asked for the wisdom of the Perl Monks concerning the following question:

A seeker may sort a flat file using this method:
    open(INF, '<', $filename) or die "Can't open $filename: $!";
    @indata = <INF>;
    close(INF);
    @sorted = sort { lc($a) cmp lc($b) } @indata;
Many of my records start with "The", for example "The Mortgage" or "The Bank".
How can a seeker sort them while ignoring a leading "The"?

-- paul

Re: File Sorting Question
by John M. Dlugosz (Monsignor) on Jun 18, 2001 at 21:32 UTC
    1) Make any value beginning with "The" less than any other value. Then they will sort to the beginning of the list, as a group.

    2) Filter them out with grep before sorting.
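    A minimal sketch of both suggestions, assuming the records are already in @indata without trailing newlines and that "The" should only match as a whole leading word. The sample records and the variable names @grouped and @without_the are made up for illustration.

```perl
use strict;
use warnings;

my @indata = ('The Mortgage', 'apple', 'The Bank', 'Zebra');

# 1) Comparator that ranks any value beginning with "The" ahead of all others,
#    then falls back to a case-insensitive comparison within each group.
my @grouped = sort {
    ( ($b =~ /^The\b/) ? 1 : 0 ) <=> ( ($a =~ /^The\b/) ? 1 : 0 )
        or lc($a) cmp lc($b)
} @indata;

# 2) Filter the "The" records out with grep, then sort what remains.
my @without_the = sort { lc($a) cmp lc($b) } grep { !/^The\b/ } @indata;

print "$_\n" for @grouped;
```

    Note that neither approach sorts "The Bank" among the B's; they group or drop such records, which is a different goal from the one in the question.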

Re: File Sorting Question
by dimmesdale (Friar) on Jun 18, 2001 at 21:42 UTC
    Well, a seeker could write something like this:

    @temp = map { s/^The // } @indata;
    @sorted = sort { lc($a) cmp lc($b) } @temp;
    I used a temporary array so that the changes made by the substitution would not be kept.

    Update: Actually, it *does not* alter the original values. Note that I assign the result of the map s/// to @temp; @indata remains unchanged. This is from the manpage for map (type "map" into this site's search box for easy access):

    %hash = map { getkey($_) => $_ } @array;

    is just a funny way to write

    %hash = ();
    foreach $_ (@array) {
        $hash{getkey($_)} = $_;
    }

    Update (number two): My mistake. I did test the code, but then I changed it slightly and didn't think to test it again. I'm sorry; I misread the manpage and misinterpreted what it was describing. This code should do what I meant in the first place:

    @temp = map { my $tmp; ($tmp = $_) =~ s/^The //; $tmp } @indata;
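    A self-contained run of the corrected map plus the sort, with made-up sample records, which also checks that @indata is left alone. Note that @sorted holds the stripped strings, not the original records; the Schwartzian Transform replies keep both.

```perl
use strict;
use warnings;

my @indata = ('The Mortgage', 'Apple', 'The Bank');

# Copy each element into $tmp before substituting, so @indata is untouched.
my @temp   = map { (my $tmp = $_) =~ s/^The //; $tmp } @indata;
my @sorted = sort { lc($a) cmp lc($b) } @temp;

print join(', ', @sorted), "\n";   # Apple, Bank, Mortgage
```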
      You'll find that your s/// has changed your original data. And your 'sorted' array no longer has the word 'The'.

      Update: And your map is a 'funny' way to write (thanks for pointing out another bug :-):

      for (@array) {
          my $num = s/^The //;
          push @temp, $num;
      }

      which DOES modify the original @array. And it doesn't put the desired value into @temp: $num is the number of substitutions made, which here is either 1 or false (the empty string).
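      A quick way to see what s/// actually returns, and why the loop collects counts rather than strings. The sample strings are made up.

```perl
use strict;
use warnings;

my @array = ('The Dog', 'A Bird');
my @temp;
for (@array) {
    my $num = s/^The //;   # number of substitutions made, or false ("")
    push @temp, $num;
}

# @array has been modified in place through the $_ alias.
print "array: @array\n";                  # array: Dog A Bird
print "temp:  ", join('|', @temp), "\n";  # temp:  1|
```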
      s/^The\s+// for @temp = @array;

              - tye (but my friends call me "Tye")
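      tye's one-liner works because a list assignment used in list context returns the variables on its left-hand side, so the for modifier aliases $_ to the elements of @temp rather than @array. A small check with made-up data:

```perl
use strict;
use warnings;

my @array = ('The Dog', 'The Cat', 'A Bird');
my @temp;

# Copy first, then strip "The " from the copies only.
s/^The\s+// for @temp = @array;

print "@array\n";   # The Dog The Cat A Bird
print "@temp\n";    # Dog Cat A Bird
```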
      Yes, it does modify the original values, because the assignment to @temp occurs after the map, and after $_ has been changed in place for each element of @indata.

      Furthermore, substitution returns true or false; it does not return the modified string. Therefore, @temp will be a list of empty strings and 1s, and @sorted will be a sorted list of empty strings and 1s.

      Please test your code before posting it, or at least before contradicting a helpful correction from another Perl Monk.

      #!/usr/local/bin/perl -w
      $, = "\n";
      chomp(@indata = <DATA>);
      @temp = map { s/^The // } @indata;
      @sorted = sort { lc($a) cmp lc($b) } @temp;
      print @indata, "---", @temp, "---", @sorted, "---\n";
      __DATA__
      The Dog
      A Bird
      The Cat
      Lots of Fish
      And the output:
      Dog
      A Bird
      Cat
      Lots of Fish
      ---
      1

      1

      ---


      1
      1
      ---
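      For readers on Perl 5.14 or later (newer than this thread), the /r modifier makes s/// leave its target alone and return the modified copy, which is what map { s/^The // } was reaching for. A minimal sketch using the same sample records as the script above:

```perl
use strict;
use warnings;
use 5.014;   # s///r needs Perl 5.14 or later

my @indata = ('The Dog', 'A Bird', 'The Cat', 'Lots of Fish');

# s///r returns the new string and does not touch $_ (or @indata).
my @temp   = map { s/^The //r } @indata;
my @sorted = sort { lc($a) cmp lc($b) } @temp;

print join(', ', @sorted), "\n";   # A Bird, Cat, Dog, Lots of Fish
```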
Re: File Sorting Question
by wog (Curate) on Jun 18, 2001 at 21:43 UTC
    You could probably use the Schwartzian Transform (also see this FMTYEWTK):

    @sorted = map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] }
              map  { (my $tmp = lc $_) =~ s/^the\s+//; [ $_, $tmp ] }
              @indata;

    Update: moved the parens to the right place on the my.
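    A self-contained run of the transform above with made-up records (newlines left on, as they would be straight from <INF>). The sort key is the lowercased line with a leading "the" and its whitespace removed, while the original line is carried along and returned. "theory" shows that the \s+ keeps a mere prefix from being stripped.

```perl
use strict;
use warnings;

my @indata = ("The Mortgage\n", "Apple\n", "The Bank\n", "theory\n");

my @sorted =
    map  { $_->[0] }                # 3. keep only the original line
    sort { $a->[1] cmp $b->[1] }    # 2. compare the precomputed keys
    map  { (my $tmp = lc $_) =~ s/^the\s+//; [ $_, $tmp ] }   # 1. [line, key]
    @indata;

print @sorted;   # Apple, The Bank, The Mortgage, theory (one per line)
```

    Each key is computed once per record rather than once per comparison, which is the point of the transform.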

Re: File Sorting Question
by runrig (Abbot) on Jun 18, 2001 at 21:47 UTC
    sub my_sort_key {
        my $file = lc(shift);
        chomp $file;
        (my $the_file = $file) =~ s/^the\s+//;
        $the_file;
    }
    my @files = map  { $$_[1] }
                sort { $$a[0] cmp $$b[0] }
                map  { [ my_sort_key($_), $_ ] } <INF>;
    Update: Fixed. And I figure you probably only want to ignore 'The' if it's a whole word, not a word prefix. (Ahh, wog is too quick for me :-)
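    A runnable sketch of runrig's approach. As an assumption for the demo, an in-memory filehandle (opened on a reference to a string of made-up records) stands in for INF; the returned lines keep their newlines because only the sort key is chomped.

```perl
use strict;
use warnings;

sub my_sort_key {
    my $file = lc(shift);
    chomp $file;
    (my $the_file = $file) =~ s/^the\s+//;
    return $the_file;
}

# Hypothetical data; the in-memory filehandle replaces INF for this demo.
my $records = "The Mortgage\nApple\nThe Bank\n";
open(my $fh, '<', \$records) or die "Can't open in-memory file: $!";

my @files = map  { $$_[1] }
            sort { $$a[0] cmp $$b[0] }
            map  { [ my_sort_key($_), $_ ] } <$fh>;
close($fh);

print @files;   # Apple, The Bank, The Mortgage (newlines kept)
```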
Re: File Sorting Question
by jeroenes (Priest) on Jun 18, 2001 at 22:00 UTC
    Any solution is going to cost you either time or memory.
    1. Don't eat memory:
      my $ignore = '(^The )|(^A )';
      ...
      @sorted = sort {
          my ($x, $y) = ($a, $b);   # copies: $a and $b alias elements of @indata
          s/$ignore// for $x, $y;
          lc($x) cmp lc($y);
      } @indata;
    2. Or store better keys first:
      %data = map { (my $key = $_) =~ s/$ignore//; ($key, $_) } @indata;
      @sorted = @data{ sort { lc($a) cmp lc($b) } keys %data };
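    The hash in option 2 has a catch worth seeing: records that reduce to the same key overwrite each other, so one of them silently disappears. A small demonstration with made-up records, using the same $ignore pattern:

```perl
use strict;
use warnings;

my $ignore = '(^The )|(^A )';
my @indata = ('The Bank', 'Bank', 'A Bird');

# key => original record; "The Bank" and "Bank" both produce the key "Bank",
# so the later pair overwrites the earlier one.
my %data   = map { (my $key = $_) =~ s/$ignore//; ($key, $_) } @indata;
my @sorted = @data{ sort { lc($a) cmp lc($b) } keys %data };

print scalar(@indata), " records in, ", scalar(@sorted), " out\n";   # 3 records in, 2 out
```

    If duplicate keys are possible, the Schwartzian Transform replies avoid the problem because they keep every record in its own array ref.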

    Hope this helps,

    Jeroen
    "We are not alone"(FZ)

Approved by root