create separate output files based on name

rruser has asked for the wisdom of the Perl Monks concerning the following question:

I am currently processing a file that contains records for multiple companies and, when it is complete, manually splitting the output by company name ($f[0]). It would be a great timesaver if I could create a separate output file for each value of the company name field $f[0], something like COMPANY-A.TXT, COMPANY-B.TXT, etc.

Thanks, Perl Monks. I appreciate your time and help.

sample of company data: COMPANY-A, COMPANY-B, COMPANY-C, etc.

while (<$file>) {
    my @f = split '\s+', $_;
#   if ($f[0] =~ (/COMPANY-A/)) {  # this is just for 1 company and not efficient; normally I remark this out
    if ( $f[0] =~ (/-C$/)) {   # this processes carryover records
        my @ymd1 = split ',',$f[4] //= $prev_mth_end; # fill empty dates with previous month end
        my @ymd2 = split ',',$f[5] //= $prev_mth_end;
        my $diff = Delta_Days(@ymd1, @ymd2) +1;  # total days
        my $prev = $prev_days{$f[2]} //= 0;
        my $amt = ($diff + $prev > 3) ? $diff + $prev - 3 : 0;  # days to be charged
        my $cost = ($amt) * 100;
        my $free = $diff - $amt;
        my $stg_chg = "N/A";  # static field
        my $sw_chg = "N/A";   # static field
        my $waive = " ";
        my $cartot = ($cost + $stg_chg + $sw_chg);
        my $comment = "Carry over from last month";
        my $pfmt = "%-12s %-5s %-8s %-5s %-15s %-15s %-6s %-6s %-8s %-12s %-12s %-12s %-8s %-12s %-40s\n";
        printf OUT $pfmt, @f[0..3], fmt_mdy($f[4]), fmt_mdy($f[5]),
            $diff, $free, $waive, fmt_curr($cost), $stg_chg, $sw_chg, $f[6],
            fmt_curr($cartot), $comment;
    }
    # process non-carryover records
    else {
        my @ymd1 = split ',',$f[4] //= $prev_mth_end; # fill empty dates with previous month end
        my @ymd2 = split ',',$f[5] //= $prev_mth_end;
        my $diff = Delta_Days(@ymd1, @ymd2);
        my $prev = $prev_days{$f[2]} //= 0;
        my $amt = ($diff + $prev > 3) ? $diff + $prev - 3 : 0;
        my $cost = ($amt * 100);
        my $free = $diff - $amt;
        my $stg_chg = "N/A";  # static field
        my $sw_chg = "N/A";   # static field
        my $waive = "";
        my $cartot = ($cost + $stg_chg + $sw_chg);
        my $comment = " ";
        my $pfmt = "%-12s %-5s %-8s %-5s %-15s %-15s %-6s %-6s %-8s %-12s %-12s %-12s %-8s %-12s %-22s\n";
        printf OUT $pfmt, @f[0..3], fmt_mdy($f[4]), fmt_mdy($f[5]),
            $diff, $free, $waive, fmt_curr($cost), $stg_chg, $sw_chg, $f[6],
            fmt_curr($cartot), $comment;
    }
#   }
}

Replies are listed 'Best First'.
Re: create separate output files based on name by roboticus (Chancellor) on Dec 18, 2013 at 18:14 UTC
rruser: Try using a hash table to hold your open file handles, using the company name as the key. For each record, check whether you already have an open file handle for that company. If you do, write the record. Otherwise open the file, store the handle in the hash, and then write the record.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
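A minimal sketch of the hash-of-filehandles idea described above, not tested against the original data. The helper name `split_by_company` and the `$dir` parameter are hypothetical, and the sketch assumes the company name in the first field is safe to use as a file name. It copies each record through unchanged; the date and cost calculations from the original loop would replace the plain `print`.

```perl
use strict;
use warnings;

# Hypothetical helper: split records read from $in into "$dir/<company>.TXT".
# Assumes the company name (first field) is safe to use as a file name.
sub split_by_company {
    my ($in, $dir) = @_;
    my %out_fh;    # company name => open output filehandle

    while (my $line = <$in>) {
        my @f = split ' ', $line;
        next unless @f;                  # skip blank lines
        my $company = $f[0];

        # Open the file the first time a company is seen; reuse it afterwards.
        unless ($out_fh{$company}) {
            my $path = "$dir/$company.TXT";
            open my $fh, '>', $path or die "Cannot open $path: $!";
            $out_fh{$company} = $fh;
        }

        # The per-record calculations and printf from the original loop
        # would go here; this sketch just copies the record through.
        print { $out_fh{$company} } $line;
    }

    # Close every handle that was opened along the way.
    for my $company (keys %out_fh) {
        close $out_fh{$company} or die "Cannot close $company.TXT: $!";
    }

    my @companies = sort keys %out_fh;
    return @companies;
}
```

With the input handle from the question already open, a call such as `split_by_company($file, '.')` would write one file per company into the current directory. The records do not need to be sorted for this to work, since every company keeps its own handle until the end.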
Re: create separate output files based on name by Laurent_R (Canon) on Dec 18, 2013 at 22:26 UTC
OK, assuming you have sorted your records by company, you can have something like this:

    my $company = "unlikelyname";
    my $OUT;
    while (<$file>) {
        my @f = split '\s+', $_;
        if ($f[0] ne $company) {    # company changed: switch output file
            close $OUT if defined $OUT;
            open $OUT, ">", "$f[0].txt" or die "blabla $!";
            $company = $f[0];
        }
        my @ymd1 = split ',',$f[4] //= $prev_mth_end;
        my @ymd2 = split ',',$f[5] //= $prev_mth_end;
        # ...
        printf $OUT $pfmt, @f[0..3], fmt_mdy($f[4]), fmt_mdy($f[5]),
            $diff, $free, $waive, fmt_curr($cost), $stg_chg, $sw_chg, $f[6],
            fmt_curr($cartot), $comment;
    }

I don't have data and can't test in detail, so there may be some errors here and there, but I am fairly sure the basic idea works, and it is very simple: when the company changes, just close the previous file and open a new file with the same filehandle but a new name. That way you can always write to $OUT, which will at any time be associated with the right file.
Re: create separate output files based on name by Laurent_R (Canon) on Dec 18, 2013 at 18:47 UTC
How many companies do you have in total? Are all the records for one company grouped together or are they mixed? If they are mixed, can you sort them on the company name prior to processing? Depending on the answers, the solution might be extremely easy or (very) slightly more complicated.
Re^2: create separate output files based on name by rruser (Acolyte) on Dec 18, 2013 at 20:29 UTC
The records are grouped randomly, but I can sort them by company. The number of companies can vary from month to month; however, it is usually fewer than 12.
Re: create separate output files based on name by sundialsvc4 (Abbot) on Dec 19, 2013 at 00:56 UTC
As I look through the collected responses to this thread so far, I would suggest that there are two general approaches being (equally seriously ...) offered:

If the total number of output files is both “truly unpredictable” and “can be counted on to be small,” then it is possible to have all of the possible output files open at the same time. As long as you are sure that the operating system won’t object (fatally ... as operating systems are wont to do when their Godly Prerogatives are crossed by Mere Mortals), then you can simply throw each incoming record into the appropriate (simultaneously ...) open bucket.

If this is not the case, then you probably are going to need to sort the incoming records first. This, by definition, will cause all records having an identical key value to be physically adjacent, so that all of the records destined for any particular destination are adjacent, and you can meaningfully react to a change in the destination with no need to remember history. The advantage of this approach is, of course, that there is never more than one destination bucket (file) open at any one time. The disadvantage is “the overhead of sorting.” (Which may, actually, be quite acceptable. There is, indeed, a reason why one of Dr. Knuth’s seminal books was titled: Sorting and Searching ...)
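The second approach (sort first, then keep only one output file open at a time) might be sketched as follows. This is an illustration under stated assumptions rather than a tested solution for the original data: the helper name `split_sorted` and the `$dir` parameter are hypothetical, the whole input is assumed to fit in memory, and the company name is assumed to be safe to use as a file name. An index is used as a tie-breaker so that records for the same company keep their original relative order.

```perl
use strict;
use warnings;

# Hypothetical helper: sort records by company, then write each group to
# "$dir/<company>.TXT" with only one output file open at a time.
sub split_sorted {
    my ($in, $dir) = @_;

    # Read all non-blank lines; assumes the input fits in memory.
    my @lines = grep { /\S/ } <$in>;

    # Sort on the first field, using the input position as a tie-breaker
    # so records within one company stay in their original order.
    my $i = 0;
    my @sorted = map  { $_->[2] }
                 sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] }
                 map  { [ (split ' ', $_)[0], $i++, $_ ] } @lines;

    my ($current, $out);
    for my $line (@sorted) {
        my ($company) = split ' ', $line;

        # A change of company means: close the old file, open the new one.
        if (!defined $current or $company ne $current) {
            if ($out) { close $out or die "Cannot close $current.TXT: $!"; }
            my $path = "$dir/$company.TXT";
            open $out, '>', $path or die "Cannot open $path: $!";
            $current = $company;
        }

        # The per-record calculations and printf would go here.
        print {$out} $line;
    }
    if ($out) { close $out or die "Cannot close $current.TXT: $!"; }
    return;
}
```

For a dozen or so companies either approach should be fine; the sorted version mainly matters when the number of distinct keys could exceed the number of files the operating system allows a process to hold open.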
Re^2: create separate output files based on name by Laurent_R (Canon) on Dec 19, 2013 at 07:12 UTC
Yes, exactly, it is for the purpose of selecting one of these two approaches that I asked the question above.
Re^3: create separate output files based on name by rruser (Acolyte) on Dec 20, 2013 at 15:00 UTC
Thanks so much for all the input.
