Hi PMs!
I have a comma-delimited CSV file in which commas (,) and double quotes (") can be embedded inside double-quoted fields, as you can see in the rows below.
6450458,6011,"Urine - Culture & Sensitivity",1658,"Colony Count:","10^3 cfu/ml",,2016-10-26 09:55:34,0,"", ,"",,2016-10-26 09:55:34,SS0000203,6,"All Tests Done and Verified",,SCIN72669,2016-10-24 12:04:58,21,"Max Smart Super, Speciality Hospital "O"",3445,"Bansidhar Tarai ",0,"SAVITRI DEVI",False,"SAVITRI ","DEVI",SKCT,334,20905,"Anjani Kumar Agrawal",1957-01-01 00:00:00,0,NULL,"7838457000","INFO@MAXHEALTHCARE.COM",OP,NO,Verified,2016-10-24 12:04:58,2016-10-24 12:04:58,Lab,243981,0,"",F
6444885,21732,"Blood - Culture & Sensitivity",3147,"Method BacT/ALERT3D & Vitek 2","SubHead",,2016-10-26 09:00:11,1,"min", ,"",0,2016-10-26 09:00:11,PM0004746,6,"All Tests Done and Verified",,PMIN4335,2016-10-21 19:07:36,25,"PMC",3445,"Bansidhar Tarai ",0,"SUSHILA DEVI 94907",False,"SUSHILA DEVI","94907",PMCL,1861,69142,"Parkash Gera",1961-01-01 00:00:00,0,NULL,"0143000000","INFO@MAXHEALTHCARE.COM",OP,NO,Verified,2016-10-21 19:07:36,2016-10-21 19:07:36,Lab,781642,0,"",F
6444891,21732,"Blood - Culture & Sensitivity",3147,"Method BacT/ALERT3D & Vitek 2","SubHead",,2016-10-26 09:00:36,1,"min", ,"",0,2016-10-26 09:00:36,PM0004748,6,"All Tests Done and Verified",,PMIN4337,2016-10-21 19:11:24,25,"PMC",3445,"Bansidhar Tarai ",0,"TUSHAR BHATIA 94916",False,"TUSHAR BHATIA","94916",PMCL,1876,69142,"Parkash Gera",1985-01-01 00:00:00,0,NULL,"0143000000","INFO@MAXHEALTHCARE.COM",OP,NO,Verified,2016-10-21 19:11:24,2016-10-21 19:11:24,Lab,773211,0,"",M
Requirement:
Sort the file on three columns: primary key is column 31, secondary key is column 1, tertiary key is column 2.
That is, rows are ordered by the 1st key, then the 2nd, then the 3rd, as below:
1,2,3
1,2,4
1,3,4
2,1,5
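To make the required ordering concrete, here is a minimal sketch of the comparison applied to the four sample rows above, assuming all three keys are numeric. The in-memory rows and the name `by_three_keys` are illustrative only; in the real file the keys would be columns 31, 1 and 2.

```perl
use strict;
use warnings;

# The four sample rows from above, deliberately shuffled.
# Each row is [primary, secondary, tertiary].
my @rows = ([2, 1, 5], [1, 3, 4], [1, 2, 4], [1, 2, 3]);

# Hypothetical comparator: numeric compare on the primary key, falling
# back to the secondary and then the tertiary key on ties.
sub by_three_keys {
    return $a->[0] <=> $b->[0]
        || $a->[1] <=> $b->[1]
        || $a->[2] <=> $b->[2];
}

my @sorted = sort by_three_keys @rows;

# Prints the rows in the order listed above: 1,2,3 / 1,2,4 / 1,3,4 / 2,1,5
print join(',', @$_), "\n" for @sorted;
```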
What I'm doing:
i.) Converting the file from comma (,) separated to pipe (|) separated, using the Text::CSV module
ii.) Sorting the pipe-separated file using File::Sort
Here is the code snippet:
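use Text::CSV;
use File::Sort qw(sort_file);

# $maxFile, $pipeMaxFile and $sortedMaxFile are set earlier in the script.
commaToPipeDelimiter($maxFile, $pipeMaxFile);
sort_file({t => '|', k => ['31n', '1n', '2n'], I => $pipeMaxFile, o => $sortedMaxFile});

sub commaToPipeDelimiter {
    my ($file, $pfile) = @_;
    my $csv = Text::CSV->new({binary => 1, decode_utf8 => 1, auto_diag => 1, allow_loose_quotes => 1});
    open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";
    open(my $out, '>:encoding(utf8)', $pfile) or die "Could not open '$pfile' $!\n";
    while (my $line = <$data>) {
        chomp $line;
        if ($csv->parse($line)) {
            my @fields = $csv->fields();
            print {$out} join("|", @fields), "\n";
        }
        else {
            # $! is not set by Text::CSV; report the parser's own diagnostic instead
            warn "Line could not be parsed: " . $csv->error_diag() . "\n";
        }
    }
    close($out) or die "Could not close '$pfile' $!\n";
    close($data);
}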
Can someone suggest a more efficient way? I would rather not convert the file to a pipe-separated copy and then sort that, since the files could be much larger than this sample.
FYI: embedded commas need to be handled correctly.
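To show why the embedded commas matter, here is a small sketch comparing a plain `split` with a quote-aware split. The row is a shortened, made-up version of the data above, and `Text::ParseWords` (a core module) is only a stand-in for illustration: my script uses Text::CSV, and `parse_line` also treats single quotes and backslashes specially, so it is not a drop-in CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Shortened, made-up row with one embedded comma inside a quoted field.
my $line = '6450458,"Max Smart Super, Speciality Hospital",3445';

# A naive split on commas cuts the quoted field in two.
my @naive = split /,/, $line;

# A quote-aware split keeps it whole (keep = 0 drops the surrounding quotes).
my @aware = parse_line(',', 0, $line);

print scalar(@naive), " fields from split\n";       # 4 fields
print scalar(@aware), " fields from parse_line\n";  # 3 fields
```

With the naive split every column after the hospital name shifts by one, so column 31 would no longer be the intended sort key.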
-Chetan