PerlMonks  

Sorting text files.

by Saner (Novice)
on Nov 06, 2015 at 18:00 UTC ( #1147112=perlquestion )

Saner has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that is out of order. I would like to reorder it using a second file.

File 1

1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0


And a second file

3 HD
1 HD
2 HD

I want to scan file 2 and reorder file 1 to match it, with any leftover lines appended to the end of the file, so the end result is

3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0

Any ideas? Thanks in advance.

Replies are listed 'Best First'.
Re: Sorting text files.
by BrowserUk (Pope) on Nov 06, 2015 at 19:19 UTC

    Using a hash lookup to specify the order; and a GRT to make it somewhat efficient:

    #! perl -slw
    use strict;
    use Inline::Files;
    use Data::Dump qw[ pp ];

    my $n = 0;
    my %order = map{ chomp; $_ => ++$n } <FILE2>;

    my @data = <FILE1>;

    my @sorted = map{
        unpack 'x4A*', $_;
    } sort map {
        my( $key ) = m[(^[^;]+);];
        pack 'NA*', $order{ $key } // 2**31, $_;
    } @data;

    pp \@sorted;

    __FILE1__
    1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
    2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
    3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
    ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
    __FILE2__
    3 HD
    1 HD
    2 HD

    __OUTPUT__
    C:\test>1147112.pl
    [
      "3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR\@4;2322=eng\@106:2323;2324=eng:0:21030:2:2066:0",
      "1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR\@4;2306=eng\@106:2308;2309=eng:0:21000:2:2066:0",
      "2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR\@4;2317=eng\@106:2318;2319=eng:0:21020:2:2066:0",
      "ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR\@4;2306=eng\@106:2308;2309=eng:0:21000:2:2066:0",
    ]

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Sorting text files.
by Laurent_R (Canon) on Nov 06, 2015 at 20:07 UTC
    If you can fit the first file into a hash (key: first field, value: full line), then you don't even need the sort function: load file 1 into the hash, then read file 2 and, for each line of file 2, look up the key in the hash, print the matching line, and delete that hash entry. At the end, print out the hash entries that are still there.
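    The steps described above can be sketched roughly as follows. This is a minimal sketch, not code from the thread: it assumes the key is everything before the first semicolon and that keys in file 1 are unique. The sub name `reorder_lines` and the small in-memory sample data are made up for illustration. Leftovers are emitted in their original file order, which is a choice of this sketch, since a hash has no defined order of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reorder @$data_lines by the keys in @$order_lines.
# Assumes the key is the text before the first ';' and that keys are unique
# (a later line with the same key would overwrite the earlier one).
sub reorder_lines {
    my ( $data_lines, $order_lines ) = @_;

    my ( %line_for, @seen );
    for my $line (@$data_lines) {
        my ($key) = $line =~ /^([^;]*)/;
        push @seen, $key unless exists $line_for{$key};
        $line_for{$key} = $line;
    }

    my @out;
    for my $key (@$order_lines) {
        # delete returns the stored line and removes it from the hash
        my $line = delete $line_for{$key};
        push @out, $line if defined $line;
    }

    # Leftovers: keys never named in the order list, in original file order
    for my $key (@seen) {
        push @out, delete $line_for{$key} if exists $line_for{$key};
    }
    return @out;
}

# Shortened sample records, standing in for the real channel lines
my @data  = ( "1 HD;a", "2 HD;b", "3 HD;c", "ITV HD;d" );
my @order = ( "3 HD", "1 HD", "2 HD" );
print "$_\n" for reorder_lines( \@data, \@order );
```

    With real files, the two arrays would be filled by reading and chomping each file.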

      Yeah, why sort when you don't have to :)

      #!/usr/bin/perl
      # http://perlmonks.org/?node_id=1147112
      use Inline::Files;
      use strict;
      use warnings;

      my %id;
      $id{ s/;.*//sr } .= $_ while <FILE1>;
      print delete @id{ map s/\n//r, <FILE2> }, values %id;

      __FILE1__
      1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
      2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
      3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
      ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
      __FILE2__
      3 HD
      1 HD
      2 HD

        What happens if there are duplicate keys?
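        One way to explore that question: a hash that maps each key to a single line would let a later duplicate overwrite an earlier one, whereas the `.=` in the code above concatenates lines that share a key. The sketch below is not from the thread; it assumes duplicates in file 1 should all be kept in their original order, and that a key repeated in file 2 should only be honoured the first time. The name `reorder_keep_dups` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps every line that shares a key (the text before
# the first ';') by storing an array of lines per key instead of one line.
sub reorder_keep_dups {
    my ( $data_lines, $order_lines ) = @_;

    my ( %lines_for, @key_order );
    for my $line (@$data_lines) {
        my ($key) = $line =~ /^([^;]*)/;
        push @key_order, $key unless exists $lines_for{$key};
        push @{ $lines_for{$key} }, $line;
    }

    my @out;
    # Listed keys first, then every key in file order to pick up leftovers.
    # A key that was already emitted is skipped, because delete removed it.
    for my $key ( @$order_lines, @key_order ) {
        my $lines = delete $lines_for{$key} or next;
        push @out, @$lines;
    }
    return @out;
}

# "1 HD" appears twice in the data, "2 HD" twice in the order list
my @data  = ( "1 HD;a", "2 HD;b", "1 HD;c", "ITV HD;d" );
my @order = ( "2 HD", "1 HD", "2 HD" );
print "$_\n" for reorder_keep_dups( \@data, \@order );
```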


Re: Sorting text files.
by pme (Prior) on Nov 06, 2015 at 18:12 UTC
    Hi Saner,

    Are they big files? Can they be entirely read into the memory?

      Sure, they are only a couple of thousand lines.
Re: Sorting text files.
by Tux (Abbot) on Nov 08, 2015 at 12:09 UTC

    Nice one. I could not resist doing that in Text::CSV_XS' csv function (this won't fly if your file is huge):

    use 5.20.0;
    use warnings;
    use Text::CSV_XS qw(csv);

    my $n = 1;
    open my $fh, "<", "file2.txt";
    my %sort = map { chomp; $_ => $n++ } <$fh>;
    close $fh;
    csv (sep => ";", before_out => sub { shift @{$_[1]} }, quote_space => 0,
        in => [ sort { $a->[0] <=> $b->[0] }
            @{csv (sep => ";", in => *DATA,
                after_parse => sub { unshift @{$_[1]}, $sort{$_[1][0]} // $n })}],
        );
    __END__
    1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
    2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
    3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
    ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308

    =>

    3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
    1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
    2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
    ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0

    Enjoy, Have FUN! H.Merijn
