PerlMonks  

Sorting a text file

by rsriram (Hermit)
on Mar 15, 2007 at 09:19 UTC (#604945=perlquestion)

rsriram has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

This is another question about the same data I sought help with before. I have a text file with fewer than 100 lines, and I need to sort it on the second column.

77876 8543 CA84985 54E
77873 8003 CA84985 54E
77875 7725 CA84985 54E
77872 8511 CA84985 54E
77873 8123 CA84985 54E
77822 9908 CA84985 54E
77819 8503 CA84985 54E
77826 8040 CA84985 54E
77822 7874 CA84985 54E
77884 8543 CA84985 54E
77809 7211 CA84985 54E

I can add each line to an array and sort it, but that sorts only on the first column. Can anyone help me?

The output needs to look similar to this:

77809 7211 CA84985 54E
77875 7725 CA84985 54E
77822 7874 CA84985 54E
77873 8003 CA84985 54E
77826 8040 CA84985 54E
77873 8123 CA84985 54E
77819 8503 CA84985 54E
77872 8511 CA84985 54E
77876 8543 CA84985 54E
77884 8543 CA84985 54E
77822 9908 CA84985 54E

Replies are listed 'Best First'.
Re: Sorting a text file
by Corion (Patriarch) on Mar 15, 2007 at 09:24 UTC

    This is a FAQ:

    Searching with Google or typing perldoc -q sort gives you How do I sort an array by (anything). You will need to split up your lines into separate items or at least extract the items for the comparison, for example by using

    sort { substr($a,8,4) cmp substr($b,8,4) }

    Please also show the code you've written. It is very hard to give you helpful advice if we cannot see your code and have to guess what you have tried and where your problem might be.
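A minimal sketch of the "split up your lines" route mentioned above, assuming the columns are separated by whitespace and the second column is numeric. The helper name sort_by_second_column and the three sample lines are only for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return the given lines ordered numerically
# by their second whitespace-separated field.
sub sort_by_second_column {
    my @lines = @_;
    return sort {
        (split ' ', $a)[1] <=> (split ' ', $b)[1]
    } @lines;
}

# A few lines in the same shape as the question's data.
my @sample = (
    "77876 8543 CA84985 54E\n",
    "77873 8003 CA84985 54E\n",
    "77809 7211 CA84985 54E\n",
);

print sort_by_second_column(@sample);
```

Splitting inside the comparison re-parses each line on every comparison, which is fine for a file of under 100 lines; for larger inputs the keys would normally be extracted once up front.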

Re: Sorting a text file
by McDarren (Abbot) on Mar 15, 2007 at 09:29 UTC
    You don't even need Perl for this. Assuming that you are on a *nix system (perhaps a poor assumption), then you could simply do:
    sort -nk2 file.txt

    Update: It turns out that MS-DOS also has a sort command (I never knew that!). So on a winders system, you can run the following from a command prompt:

    sort /+10 file.txt
    (Assumes spaces between the 1st and 2nd fields, rather than tabs)

    Cheers,
    Darren :)

Re: Sorting a text file
by GrandFather (Saint) on Mar 15, 2007 at 09:30 UTC

    There are many ways to do this trick, but they all depend on pulling out the sort key (in this case the second column). A simple way for small quantities of data is to use a hash:

    use strict;
    use warnings;

    my %data;

    while (<DATA>) {
        my $key = /^\d+\s+(\d+)/ ? $1 : next;
        $data{$key} = $_;
    }

    print $data{$_} for sort keys %data;

    __DATA__
    77876 8543 CA84985 54E
    77873 8003 CA84985 54E
    77875 7725 CA84985 54E
    77872 8511 CA84985 54E
    77873 8123 CA84985 54E
    77822 9908 CA84985 54E
    77819 8503 CA84985 54E
    77826 8040 CA84985 54E
    77822 7874 CA84985 54E
    77884 8543 CA84985 54E
    77809 7211 CA84985 54E

    Prints:

    77809 7211 CA84985 54E
    77875 7725 CA84985 54E
    77822 7874 CA84985 54E
    77873 8003 CA84985 54E
    77826 8040 CA84985 54E
    77873 8123 CA84985 54E
    77819 8503 CA84985 54E
    77872 8511 CA84985 54E
    77884 8543 CA84985 54E
    77822 9908 CA84985 54E

    DWIM is Perl's answer to Gödel

      GrandFather,

      I think we cannot use a hash directly as you did here, because the second column has duplicates such as '8543', and a later line overwrites an earlier one with the same key.

      Your input has 11 lines, but your output has only 10.

      If I am wrong, please correct me.

      Prasad

        You are quite right. However if you have duplicate keys how do you expect the sort to arrange those lines? Do you fall back to a secondary key, or does it not matter, or do you retain the original file order? You could for example rely on sort's stability in recent versions of Perl to retain the lines with identical keys in file order:

        use strict;
        use warnings;

        print sort {substr ($a, 9, 4) cmp substr ($b, 9, 4)} <DATA>;

        __DATA__
        ...

        Prints (using the original data):

        77809 7211 CA84985 54E
        77875 7725 CA84985 54E
        77822 7874 CA84985 54E
        77873 8003 CA84985 54E
        77826 8040 CA84985 54E
        77873 8123 CA84985 54E
        77819 8503 CA84985 54E
        77872 8511 CA84985 54E
        77876 8543 CA84985 54E
        77884 8543 CA84985 54E
        77822 9908 CA84985 54E
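If file order is not the tie-break you want, falling back to a secondary key is another option. The following is only a sketch: it assumes whitespace-separated numeric columns and picks the first column as the secondary key, which is a guess at what might be wanted. The helper name sort_with_tiebreak is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: order lines by column 2, and where column 2 is
# equal, by column 1. Each line is split once, up front.
sub sort_with_tiebreak {
    return map  { $_->[0] }                              # back to the line
           sort { $a->[2] <=> $b->[2] or $a->[1] <=> $b->[1] }
           map  { [ $_, (split ' ', $_)[0, 1] ] } @_;    # [line, col1, col2]
}

print sort_with_tiebreak(
    "77884 8543 CA84985 54E\n",
    "77822 9908 CA84985 54E\n",
    "77876 8543 CA84985 54E\n",
);
```

Because the comparison never returns 0 for lines that differ in either column, the result does not depend on whether the underlying sort is stable.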

        DWIM is Perl's answer to Gödel
        You could turn GrandFather's approach around and swap keys and values in the hash:
        my %data;
        ( $data{ $_}) = /^\d+\s+(\d+)/ while <DATA>;
        print for sort { $data{ $a} cmp $data{ $b} } keys %data;
        This is essentially a variant of the Schwartzian Transform, using a hash in place of an array to keep the sort field(s).

        Anno
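A different way to keep the hash approach while surviving duplicate keys is to store a list of lines under each key. This is a sketch of that variant, not the code from either post above; the helper name sort_via_hash_of_arrays is hypothetical, and it assumes the same line shape the regex in the thread expects.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines by their second column so that no
# line is lost, then emit the groups in numeric key order.
sub sort_via_hash_of_arrays {
    my %lines_for;
    for my $line (@_) {
        # Skip lines that do not start with two numeric columns.
        my ($key) = $line =~ /^\d+\s+(\d+)/ or next;
        push @{ $lines_for{$key} }, $line;
    }
    return map { @{ $lines_for{$_} } } sort { $a <=> $b } keys %lines_for;
}

print sort_via_hash_of_arrays(
    "77876 8543 CA84985 54E\n",
    "77822 9908 CA84985 54E\n",
    "77884 8543 CA84985 54E\n",
    "77809 7211 CA84985 54E\n",
);
```

Lines that share a key come out in the order they were read, since push appends to each group.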

Re: Sorting a text file
by johngg (Canon) on Mar 15, 2007 at 09:56 UTC
    You could easily adapt the answer I gave to your question yesterday. Simply take out the @wanted array and the $rxExtract regex and replace the

    grep { $_->[2] =~ $rxExtract }

    with

    sort { $a->[2] <=> $b->[2] }

    Note that I have used a numerical comparison in the sort (<=>) just in case your second column has numbers of other than four digits. If there will always be only four digits you could use string comparison (cmp).
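To illustrate why the choice of operator matters once the numbers differ in width, here is a small sketch with made-up values (they are not from the question's data):

```perl
use strict;
use warnings;

# Made-up keys of differing widths.
my @keys = (8543, 725, 10003, 8003);

my @as_strings = sort { $a cmp $b } @keys;   # character-by-character order
my @as_numbers = sort { $a <=> $b } @keys;   # numeric order

print "cmp: @as_strings\n";   # cmp: 10003 725 8003 8543
print "<=>: @as_numbers\n";   # <=>: 725 8003 8543 10003
```

With cmp, "10003" sorts first because the character "1" precedes "7" and "8"; with every key exactly four digits wide the two orders coincide.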

    I hope this is of use

    Cheers,

    JohnGG

Re: Sorting a text file
by kwaping (Priest) on Mar 15, 2007 at 14:45 UTC
    Since the file is small, this would also be a good candidate for Tie::File. Then the sorting should be easy (see other posts in this thread for that).
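A rough sketch of what that could look like, assuming whitespace-separated columns and a numeric second column. The helper name sort_file_in_place is hypothetical, and the file is rewritten in place, so try it on a copy first.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: sort the lines of the file at $path in place,
# numerically on the second whitespace-separated column.
sub sort_file_in_place {
    my ($path) = @_;

    # Each element of @lines is one line of the file, without its newline.
    tie my @lines, 'Tie::File', $path
        or die "Cannot tie $path: $!";

    # Sort into a plain array first, then write the result back.
    my @sorted = sort { (split ' ', $a)[1] <=> (split ' ', $b)[1] } @lines;
    @lines = @sorted;

    untie @lines;
    return;
}

# Usage: perl thisscript.pl file.txt
sort_file_in_place($ARGV[0]) if @ARGV;
```

Tie::File ships with Perl from 5.8 onward, so nothing needs installing on a reasonably recent system.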

    ---
    It's all fine and dandy until someone has to look at the code.
Re: Sorting a text file
by derby (Abbot) on Mar 15, 2007 at 13:52 UTC

    If you're on a *nix platform (or have cygwin on MS), I would just use the sort command for complicated sorting:

    $ sort -k 1.10,1.14 <filename>

    -derby
Re: Sorting a text file
by holli (Abbot) on Mar 16, 2007 at 12:55 UTC
    à la Schwartz...
    use warnings;
    use strict;
    use File::Slurp qw(read_file);

    print
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  { [(split /\s+/)[1], $_] }
        read_file( $ARGV[0] );


    holli, /regexed monk/
Re: Sorting a text file
by Moron (Curate) on Mar 15, 2007 at 18:30 UTC
    I suppose a perl -e would be based on...
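    %_=();    # cache of sort keys, one entry per line
    # Parentheses are needed because ||= binds less tightly than <=>.
    print sort { ( $_{ $a } ||= SortKey( $a ) ) <=> ( $_{ $b } ||= SortKey( $b ) ) } <>;
    # Returns the second field; assumes tab-separated columns.
    sub SortKey { my @a=split /\t/, shift(); $a[1]; }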
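The idea here is to cache each line's key the first time it is needed, so the extraction runs once per line rather than once per comparison. A self-contained sketch of that idea follows; it assumes whitespace-separated columns rather than tabs, and the names sort_key, cached_sort and $extractions are hypothetical. It uses //= (Perl 5.10 or later) instead of ||= so that a key of 0 is cached too.

```perl
use 5.010;
use strict;
use warnings;

my $extractions = 0;    # counts key extractions, only to show the cache at work

# Hypothetical helper: pull the second whitespace-separated field from a line.
sub sort_key {
    my ($line) = @_;
    $extractions++;
    my $key = (split ' ', $line)[1];
    return $key;
}

# Hypothetical helper: numeric sort on the cached key of each line.
sub cached_sort {
    my %key_for;        # line => key, filled in lazily
    return sort {
        ( $key_for{$a} //= sort_key($a) ) <=> ( $key_for{$b} //= sort_key($b) )
    } @_;
}

print cached_sort(
    "77876 8543 CA84985 54E\n",
    "77873 8003 CA84985 54E\n",
    "77809 7211 CA84985 54E\n",
);
print "key extractions: $extractions\n";
```

Since the cache is keyed on the whole line, identical lines share one cache entry, which is harmless because they also share a key.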

    -M

    Free your mind

Re: Sorting a text file
by jonadab (Parson) on Mar 15, 2007 at 10:48 UTC

    If you want to play amusing games with this problem, you could implement a crude Radix Sort:

    require v5.8.0;

    open INPUT, '<', '/path/to/the/file/where/the/data/are.stored';
    my @line = <INPUT>;
    close INPUT;

    for (reverse 8..13) {
        @line = sort {substr($a,$_,1) <=> substr($b,$_,1)} @line;
    }

    print @line;
    -- 
    We're working on a six-year set of freely redistributable Vacation Bible School materials.

Node Type: perlquestion [id://604945]
Approved by Corion