PerlMonks  

Sorting text files.

by Saner (Novice)
on Nov 06, 2015 at 18:00 UTC (#1147112=perlquestion)

Saner has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that is out of order. I would like to reorder it using a second file.

File 1

1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0


And a second file

3 HD
1 HD
2 HD

I want to scan file 2 and reorder file 1 to match it, with any leftovers appended to the end of the file, so the end result is:

3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0

Any ideas? Thanks in advance.

Replies are listed 'Best First'.
Re: Sorting text files.
by BrowserUk (Pope) on Nov 06, 2015 at 19:19 UTC

    Using a hash lookup to specify the order; and a GRT to make it somewhat efficient:

    #! perl -slw
    use strict;
    use Inline::Files;
    use Data::Dump qw[ pp ];

    my $n = 0;
    my %order = map{ chomp; $_ => ++$n } <FILE2>;

    my @data = <FILE1>;

    my @sorted = map{
        unpack 'x4A*', $_;
    } sort map {
        my( $key ) = m[(^[^;]+);];
        pack 'NA*', $order{ $key } // 2**31, $_;
    } @data;

    pp \@sorted;

    __FILE1__
    1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
    2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
    3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
    ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
    __FILE2__
    3 HD
    1 HD
    2 HD
    __OUTPUT__
    C:\test>1147112.pl
    [
      "3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR\@4;2322=eng\@106:2323;2324=eng:0:21030:2:2066:0",
      "1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR\@4;2306=eng\@106:2308;2309=eng:0:21000:2:2066:0",
      "2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR\@4;2317=eng\@106:2318;2319=eng:0:21020:2:2066:0",
      "ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR\@4;2306=eng\@106:2308;2309=eng:0:21000:2:2066:0",
    ]

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
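For readers unfamiliar with the GRT (Guttman-Rosler Transform): it packs the sort key onto the front of each string so that Perl's default string sort can be used without a comparison block. The same ranking idea can be sketched more verbosely as a Schwartzian transform with an explicit comparator. This is only an illustration, not the code posted above: the name `sort_by_rank` is made up, and the sample lines are shortened stand-ins for the real data.

```perl
use strict;
use warnings;

# Sort lines by the rank their key has in @$order; unranked keys go last.
# (Hypothetical helper; a Schwartzian transform rather than a GRT.)
sub sort_by_rank {
    my ($data, $order) = @_;

    my $n = 0;
    my %rank_for = map { $_ => ++$n } @$order;   # key => position in file 2
    my $last     = 2**31;                        # rank for keys not in file 2

    my $i = 0;
    return map  { $_->[2] }                      # undecorate: keep the line
           sort { $a->[0] <=> $b->[0]            # primary: rank from file 2
                  or $a->[1] <=> $b->[1] }       # tie-break: original position
           map  { my ($key) = /^([^;]+);/;
                  [ $rank_for{ $key // '' } // $last, $i++, $_ ] }
           @$data;
}

# Shortened sample data (assumption: key is the text before the first ';')
my @sample_lines = ("1 HD;a\n", "2 HD;b\n", "3 HD;c\n", "ITV HD;d\n");
my @sample_order = ("3 HD", "1 HD", "2 HD");
print sort_by_rank(\@sample_lines, \@sample_order);
```

One difference worth noting: here, lines with the same rank keep their original relative order because of the index tie-break, whereas in the packed version they are ordered by the rest of the string.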
Re: Sorting text files.
by Laurent_R (Canon) on Nov 06, 2015 at 20:07 UTC
    If you can fit the first file into a hash (key: first field, value: full line), then you don't even need the sort function. Load file 1 into the hash, then read file 2 and, for each of its lines, look up the key in the hash, print the stored line, and delete the hash entry. At the end, print out whatever entries are still in the hash.
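The steps described above could be spelled out roughly as follows. This is a minimal sketch, assuming keys are unique and that the key is everything before the first `;`. The sub name `reorder_by_keys` and the shortened sample lines are invented for illustration; in practice the two arrays would come from reading file 1 and file 2 (with file 2's lines chomped).

```perl
use strict;
use warnings;

# Reorder the data lines (@$data) to follow the key list (@$order).
# Hypothetical helper; assumes each key occurs at most once in @$data.
sub reorder_by_keys {
    my ($data, $order) = @_;

    my %line_for;                            # key => full line
    for my $line (@$data) {
        my ($key) = split /;/, $line, 2;     # text before the first ';'
        $line_for{$key} = $line;
    }

    my @out;
    for my $key (@$order) {
        # delete returns the stored line and removes the entry
        my $line = delete $line_for{$key};
        push @out, $line if defined $line;   # ignore keys missing from file 1
    }

    # Leftovers: hash order is arbitrary, so sort the keys for repeatable output
    push @out, map { $line_for{$_} } sort keys %line_for;
    return @out;
}

my @file1_lines = ("1 HD;a\n", "2 HD;b\n", "3 HD;c\n", "ITV HD;d\n");
my @file2_keys  = ("3 HD", "1 HD", "2 HD");
print reorder_by_keys(\@file1_lines, \@file2_keys);
```

With the sample data this prints the `3 HD`, `1 HD` and `2 HD` lines in that order, followed by the `ITV HD` line.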

      Yeah, why sort when you don't have to :)

      #!/usr/bin/perl
      # http://perlmonks.org/?node_id=1147112
      use Inline::Files;
      use strict;
      use warnings;

      my %id;
      $id{ s/;.*//sr } .= $_ while <FILE1>;
      print delete @id{ map s/\n//r, <FILE2> }, values %id;

      __FILE1__
      1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
      2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
      3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
      ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
      __FILE2__
      3 HD
      1 HD
      2 HD

        What happens if there are duplicate keys?
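On the duplicate-key question: because the code above builds each hash value with `.=`, lines that share a key are concatenated into one value and come out together, whereas a plain assignment would keep only the last line for that key. One way to make the handling explicit is to store a list of lines per key. The following is only a sketch under the same assumption about the key (text before the first `;`); the name `reorder_keep_duplicates` and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Reorder lines by key, keeping every line of a duplicated key together.
# Leftover keys are emitted in the order they first appeared in the data.
sub reorder_keep_duplicates {
    my ($data, $order) = @_;

    my (%lines_for, @seen_order);
    for my $line (@$data) {
        my ($key) = split /;/, $line, 2;
        push @seen_order, $key unless exists $lines_for{$key};
        push @{ $lines_for{$key} }, $line;    # duplicates accumulate in file order
    }

    my @out;
    # First the keys from file 2, then whatever is left, in first-seen order
    for my $key (@$order, @seen_order) {
        my $lines = delete $lines_for{$key} or next;   # absent or already emitted
        push @out, @$lines;
    }
    return @out;
}

my @dup_lines = ("1 HD;a\n", "2 HD;b\n", "1 HD;c\n", "ITV HD;d\n");
my @dup_order = ("2 HD", "1 HD");
print reorder_keep_duplicates(\@dup_lines, \@dup_order);
```

Emitting leftovers in first-seen order also sidesteps the arbitrary ordering of `values %id`.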


Re: Sorting text files.
by pme (Prior) on Nov 06, 2015 at 18:12 UTC
    Hi Saner,

    Are they big files? Can they be entirely read into the memory?

      Sure, they are only a couple of thousand lines.
Re: Sorting text files.
by Tux (Abbot) on Nov 08, 2015 at 12:09 UTC

    Nice one. I could not resist doing that in Text::CSV_XS' csv function (this won't fly if your file is huge):

    use 5.20.0;
    use warnings;
    use Text::CSV_XS qw(csv);

    my $n = 1;
    open my $fh, "<", "file2.txt";
    my %sort = map { chomp; $_ => $n++ } <$fh>;
    close $fh;

    csv (sep => ";", before_out => sub { shift @{$_[1]} }, quote_space => 0,
        in => [ sort { $a->[0] <=> $b->[0] }
            @{csv (sep => ";", in => *DATA,
                after_parse => sub { unshift @{$_[1]}, $sort{$_[1][0]} // $n })}],
        );
    __END__
    1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
    2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
    3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
    ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308

    =>

    3 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2320=27:2321=NAR@4;2322=eng@106:2323;2324=eng:0:21030:2:2066:0
    1 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0
    2 HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2315=27:2316=NAR@4;2317=eng@106:2318;2319=eng:0:21020:2:2066:0
    ITV HD;BSkyB:11097:VC23M5O25P0S1:S28.2E:23000:2305=27:2307=NAR@4;2306=eng@106:2308;2309=eng:0:21000:2:2066:0

    Enjoy, Have FUN! H.Merijn
