http://www.perlmonks.org?node_id=1067670

rruser has asked for the wisdom of the Perl Monks concerning the following question:

I am currently processing a file that contains records for multiple companies. When processing is complete, I manually split the output by company name, $f[0]. It would be a great timesaver if I could create a separate output file for each company name in $f[0], something like COMPANY-A.TXT, COMPANY-B.TXT, etc.

Thanks Perl Monks I appreciate your time and help

sample of company data: COMPANY-A, COMPANY-B, COMPANY-C, etc.
    while (<$file>) {
        my @f = split '\s+', $_;
        # if ($f[0] =~ (/COMPANY-A/)) { # this is just for 1 company and not efficient; normally I remark this out
        if ( $f[0] =~ (/-C$/)) { # this processes carryover records
            my @ymd1 = split ',',$f[4] //= $prev_mth_end; # fill empty dates with previous month end
            my @ymd2 = split ',',$f[5] //= $prev_mth_end;
            my $diff = Delta_Days(@ymd1, @ymd2) +1; # total days
            my $prev = $prev_days{$f[2]} //= 0;
            my $amt = ($diff + $prev > 3) ? $diff + $prev - 3 : 0; # days to be charged
            my $cost = ($amt) * 100;
            my $free = $diff - $amt;
            my $stg_chg = "N/A"; # static field
            my $sw_chg = "N/A"; # static field
            my $waive = " ";
            my $cartot = ($cost + $stg_chg + $sw_chg);
            my $comment = "Carry over from last month";
            my $pfmt = "%-12s %-5s %-8s %-5s %-15s %-15s %-6s %-6s %-8s %-12s %-12s %-12s %-8s %-12s %-40s\n";
            printf OUT $pfmt, @f[0..3], fmt_mdy($f[4]), fmt_mdy($f[5]), $diff, $free, $waive, fmt_curr($cost), $stg_chg, $sw_chg, $f[6], fmt_curr($cartot), $comment;
        }
        # process non-carryover records
        else {
            my @ymd1 = split ',',$f[4] //= $prev_mth_end; # fill empty dates with previous month end
            my @ymd2 = split ',',$f[5] //= $prev_mth_end;
            my $diff = Delta_Days(@ymd1, @ymd2);
            my $prev = $prev_days{$f[2]} //= 0;
            my $amt = ($diff + $prev > 3) ? $diff + $prev - 3 : 0;
            my $cost = ($amt * 100);
            my $free = $diff - $amt;
            my $stg_chg = "N/A"; # static field
            my $sw_chg = "N/A"; # static field
            my $waive = "";
            my $cartot = ($cost + $stg_chg + $sw_chg);
            my $comment = " ";
            my $pfmt = "%-12s %-5s %-8s %-5s %-15s %-15s %-6s %-6s %-8s %-12s %-12s %-12s %-8s %-12s %-22s\n";
            printf OUT $pfmt, @f[0..3], fmt_mdy($f[4]), fmt_mdy($f[5]), $diff, $free, $waive, fmt_curr($cost), $stg_chg, $sw_chg, $f[6], fmt_curr($cartot), $comment;
        }
        # }
    }

Replies are listed 'Best First'.
Re: create separate output files based on name
by roboticus (Chancellor) on Dec 18, 2013 at 18:14 UTC

    rruser:

    Try using a hash table to hold your open file handles, using the company name as the key.

    For each record, check whether you have an open file handle. If you do, then write the record. Otherwise open the file, store it in the hash, and then write the record.
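
    A minimal sketch of that idea, not tested against the real data. The helper names `write_record` and `split_by_company` and the `$dir` argument are made up for illustration, and the `.TXT` suffix follows the file names in the question. It writes the raw input line, where the original script would do its date arithmetic and `printf` instead.

```perl
use strict;
use warnings;

# Hypothetical helper: write one line to the file belonging to $company,
# opening that file only the first time the company is seen.
sub write_record {
    my ($handles, $dir, $company, $line) = @_;
    my $fh = $handles->{$company};
    unless ($fh) {
        open $fh, '>', "$dir/$company.TXT"
            or die "Cannot open $dir/$company.TXT: $!";
        $handles->{$company} = $fh;    # cache the handle under the company name
    }
    print {$fh} $line;
    return;
}

# Hypothetical driver: read records from $in and route each one by its first field.
sub split_by_company {
    my ($in, $dir) = @_;
    my %handles;                       # company name => open filehandle
    while (my $line = <$in>) {
        my @f = split ' ', $line;
        next unless @f;                # skip blank lines
        write_record(\%handles, $dir, $f[0], $line);
    }
    for my $name (keys %handles) {
        close $handles{$name} or die "Cannot close $name.TXT: $!";
    }
    my @names = sort keys %handles;
    return @names;                     # companies that received a file
}
```

    Because every handle stays open until the end, the input does not need to be sorted, at the cost of one open file per company.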

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: create separate output files based on name
by Laurent_R (Canon) on Dec 18, 2013 at 22:26 UTC
    OK, assuming you have sorted your records by company, you can have something like this:
    my $company = "unlikelyname";
    my $OUT;
    while (<$file>) {
        my @f = split '\s+', $_;
        if ($f[0] ne $company) {    # exact comparison, so COMPANY-A never matches COMPANY-AB
            close $OUT if defined $OUT;
            open $OUT, ">", "$f[0].txt" or die "cannot open $f[0].txt: $!";
            $company = $f[0];
        }
        my @ymd1 = split ',',$f[4] //= $prev_mth_end;
        my @ymd2 = split ',',$f[5] //= $prev_mth_end;
        # ...
        printf $OUT $pfmt, @f[0..3], fmt_mdy($f[4]), fmt_mdy($f[5]), $diff, $free, $waive, fmt_curr($cost), $stg_chg, $sw_chg, $f[6], fmt_curr($cartot), $comment;
    }
    I don't have your data and can't test this in detail, so there may be some errors here and there, but I am fairly sure the basic idea works, and it is very simple. When the company changes, just close the previous file and open a new one with the same filehandle but a new name. That way you can always write to $OUT, which at any time is associated with the right file.
Re: create separate output files based on name
by Laurent_R (Canon) on Dec 18, 2013 at 18:47 UTC

    How many companies do you have in total?

    Are all the records for one company grouped together or are they mixed? If they are mixed, can you sort them on the company name prior to processing?

    Depending on the answers, the solution might be extremely easy or (very) slightly more complicated.

      The records are grouped randomly, but I can sort them by company. The number of companies can vary from month to month, but it is usually fewer than 12.

Re: create separate output files based on name
by sundialsvc4 (Abbot) on Dec 19, 2013 at 00:56 UTC

    As I look through the collected responses to this thread so far, I would suggest that there are two general approaches that are being (equally seriously ...) offered:

    1. If the total number of output-files is both “truly unpredictable” and “can be counted-on to be small,” then it is possible to have all of the possible output-files open at the same time. As long as you are sure that the operating system won’t object (fatally... as operating-systems are wont to do when their Godly Prerogatives are crossed by Mere Mortals), then you can simply throw each incoming record into the appropriate (simultaneously...) open bucket.
    2. If this is not the case, then you probably are going to need to sort the incoming records first.   This, by definition, will cause all records having an identical key-value to be physically adjacent ... so that all of the records that are destined for any particular destination are adjacent ... so that you can meaningfully react to a change in the destination, with no need to remember history.   The advantage of this approach is, of course, that there is never more than one destination-bucket (file..) open at any one time.   The disadvantage is “the overhead of sorting.”   (Which may, actually, be quite acceptable.   There is, indeed, a reason why one of Dr. Knuth’s seminal books was titled:   Sorting and Searching ...)
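
    A rough sketch of the second approach, assuming the whole input fits in memory. The name `split_sorted` and the `$dir` argument are hypothetical, and the `.TXT` suffix follows the file names in the question. It sorts on the first field, using the input position as a tie-breaker so records keep their original order within a company, and it never has more than one output file open.

```perl
use strict;
use warnings;

# Hypothetical helper: sort records by company, then write them out with
# only one output filehandle open at any time.
sub split_sorted {
    my ($in, $dir) = @_;
    my $i = 0;
    # Each entry is [company, input position, original line]; blank lines are dropped.
    my @records = map { [ (split ' ', $_)[0], $i++, $_ ] } grep { /\S/ } <$in>;

    my ($current, $out);
    my $files = 0;
    for my $rec (sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] } @records) {
        my ($company, undef, $line) = @$rec;
        if (!defined $current or $company ne $current) {
            # The key changed: close the previous file and open the next one.
            if ($out) {
                close $out or die "Cannot close $dir/$current.TXT: $!";
            }
            open $out, '>', "$dir/$company.TXT"
                or die "Cannot open $dir/$company.TXT: $!";
            $current = $company;
            $files++;
        }
        print {$out} $line;
    }
    if ($out) {
        close $out or die "Cannot close $dir/$current.TXT: $!";
    }
    return $files;    # number of output files written
}
```

    For input too large to hold in memory, the same loop works on a file that was sorted beforehand by an external tool.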

      Yes, exactly. It was in order to choose between these two approaches that I asked the question above.

        Thanks so much for all the input.