
create separate output files based on name

by rruser (Acolyte)
on Dec 18, 2013 at 18:02 UTC ( #1067670=perlquestion )
rruser has asked for the wisdom of the Perl Monks concerning the following question:

I am currently processing a file that contains records for multiple companies. When processing is complete, I manually split the output by company name ($f[0]). It would be a great timesaver if the script could write a separate output file for each company name in $f[0], something like COMPANY-A.TXT, COMPANY-B.TXT, etc.

Thanks, Perl Monks. I appreciate your time and help.

sample of company data: COMPANY-A, COMPANY-B, COMPANY-C, etc.
    while (<$file>) {
        my @f = split '\s+', $_;
    # if ($f[0] =~ (/COMPANY-A/)) {   # this is just for 1 company and not efficient; normally I remark this out
        if ( $f[0] =~ (/-C$/)) {   # this processes carryover records
            my @ymd1 = split ',',$f[4] //= $prev_mth_end;   # fill empty dates with previous month end
            my @ymd2 = split ',',$f[5] //= $prev_mth_end;
            my $diff = Delta_Days(@ymd1, @ymd2) +1;   # total days
            my $prev = $prev_days{$f[2]} //= 0;
            my $amt = ($diff + $prev > 3) ? $diff + $prev - 3 : 0;   # days to be charged
            my $cost = ($amt) * 100;
            my $free = $diff - $amt;
            my $stg_chg = "N/A";   # static field
            my $sw_chg = "N/A";    # static field
            my $waive = " ";
            my $cartot = ($cost + $stg_chg + $sw_chg);
            my $comment = "Carry over from last month";
            my $pfmt = "%-12s %-5s %-8s %-5s %-15s %-15s %-6s %-6s %-8s %-12s %-12s %-12s %-8s %-12s %-40s\n";
            printf OUT $pfmt, @f[0..3], fmt_mdy($f[4]), fmt_mdy($f[5]), $diff, $free, $waive,
                fmt_curr($cost), $stg_chg, $sw_chg, $f[6], fmt_curr($cartot), $comment;
        }
        # process non-carryover records
        else {
            my @ymd1 = split ',',$f[4] //= $prev_mth_end;   # fill empty dates with previous month end
            my @ymd2 = split ',',$f[5] //= $prev_mth_end;
            my $diff = Delta_Days(@ymd1, @ymd2);
            my $prev = $prev_days{$f[2]} //= 0;
            my $amt = ($diff + $prev > 3) ? $diff + $prev - 3 : 0;
            my $cost = ($amt * 100);
            my $free = $diff - $amt;
            my $stg_chg = "N/A";   # static field
            my $sw_chg = "N/A";    # static field
            my $waive = "";
            my $cartot = ($cost + $stg_chg + $sw_chg);
            my $comment = " ";
            my $pfmt = "%-12s %-5s %-8s %-5s %-15s %-15s %-6s %-6s %-8s %-12s %-12s %-12s %-8s %-12s %-22s\n";
            printf OUT $pfmt, @f[0..3], fmt_mdy($f[4]), fmt_mdy($f[5]), $diff, $free, $waive,
                fmt_curr($cost), $stg_chg, $sw_chg, $f[6], fmt_curr($cartot), $comment;
        }
    # }
    }

Replies are listed 'Best First'.
Re: create separate output files based on name
by roboticus (Chancellor) on Dec 18, 2013 at 18:14 UTC


    Try using a hash table to hold your open file handles, using the company name as the key.

    For each record, check whether you have an open file handle. If you do, then write the record. Otherwise open the file, store it in the hash, and then write the record.
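    The idea above can be sketched roughly as follows. This is a minimal illustration, not the original poster's script: the helper name split_by_company, the $dir argument, and the "<company>.txt" naming are assumptions made for the example, and it assumes the company name is the first whitespace-separated field and is safe to use in a file name.

```perl
use strict;
use warnings;

# Hypothetical helper: route each record to "<company>.txt" inside $dir,
# keeping one open filehandle per company in a hash.
sub split_by_company {
    my ($in, $dir) = @_;
    my %out_for;    # company name => open output filehandle
    my %count;      # company name => number of records written

    while (my $line = <$in>) {
        my ($company) = split ' ', $line;    # first whitespace-separated field
        next unless defined $company;        # skip blank lines

        # Open the file the first time a company is seen, then reuse the handle.
        my $fh = $out_for{$company} //= do {
            open my $new, '>', "$dir/$company.txt"
                or die "Cannot open $dir/$company.txt: $!";
            $new;
        };
        print {$fh} $line;
        $count{$company}++;
    }

    # Close every handle explicitly so write errors are not silently lost.
    for my $company (keys %out_for) {
        close $out_for{$company} or die "Cannot close $company.txt: $!";
    }
    return \%count;
}
```

    In the original loop, the printf OUT calls would become printf {$fh} with the handle looked up this way. The records do not need to be sorted for this to work, since every company's file stays open until the input is exhausted.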


    When your only tool is a hammer, all problems look like your thumb.

Re: create separate output files based on name
by Laurent_R (Abbot) on Dec 18, 2013 at 22:26 UTC
    OK, assuming you have sorted your records by company, you can have something like this:
        my $company = "unlikelyname";
        my $OUT;
        while (<$file>) {
            my @f = split '\s+', $_;
            if ($f[0] ne $company) {    # company changed (exact string comparison)
                close $OUT if defined $OUT;
                open $OUT, ">", "$f[0].txt" or die "cannot open $f[0].txt: $!";
                $company = $f[0];
            }
            my @ymd1 = split ',',$f[4] //= $prev_mth_end;
            my @ymd2 = split ',',$f[5] //= $prev_mth_end;
            # ...
            printf $OUT $pfmt, @f[0..3], fmt_mdy($f[4]), fmt_mdy($f[5]), $diff, $free, $waive,
                fmt_curr($cost), $stg_chg, $sw_chg, $f[6], fmt_curr($cartot), $comment;
        }
    I don't have data and can't test in detail, so there may be some errors here and there, but I am fairly sure the basic idea works, and it is very simple: when the company changes, just close the previous file and open a new one with the same filehandle but a new name. That way you can always write to $OUT, which at any time is associated with the right file.
Re: create separate output files based on name
by Laurent_R (Abbot) on Dec 18, 2013 at 18:47 UTC

    How many companies do you have in total?

    Are all the records for one company grouped together or are they mixed? If they are mixed, can you sort them on the company name prior to processing?

    Depending on the answers, the solution might be extremely easy or (very) slightly more complicated.

      The records are grouped randomly, but I can sort them by company. The number of companies can vary from month to month, but it is usually fewer than 12.

Re: create separate output files based on name
by sundialsvc4 (Abbot) on Dec 19, 2013 at 00:56 UTC

    As I look through the collected responses to this thread so far, I would suggest that there are two general approaches being (equally seriously ...) offered:

    1. If the total number of output-files is both “truly unpredictable” and “can be counted-on to be small,” then it is possible to have all of the possible output-files open at the same time.   As long as you are sure that the operating system won’t object (fatally... as operating-systems are wont to do when their Godly Prerogatives are crossed by Mere Mortals), then you can simply throw each incoming record into the appropriate (simultaneously...) open bucket.
    2. If this is not the case, then you probably are going to need to sort the incoming records first.   This, by definition, will cause all records having an identical key-value to be physically adjacent ... so that all of the records that are destined for any particular destination are adjacent ... so that you can meaningfully react to a change in the destination, with no need to remember history.   The advantage of this approach is, of course, that there is never more than one destination-bucket (file..) open at any one time.   The disadvantage is “the overhead of sorting.”   (Which may, actually, be quite acceptable.   There is, indeed, a reason why one of Dr. Knuth’s seminal books was titled:   Sorting and Searching ...)
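    The second approach might look something like the sketch below. It is only an illustration under stated assumptions: the helper name split_sorted, the $dir argument, and the "<company>.txt" naming are invented for the example, the company name is taken to be the first whitespace-separated field, and the whole input is read into memory before sorting, which suits a small monthly file but not a very large one.

```perl
use strict;
use warnings;

# Hypothetical helper: sort records by company, then write them out with only
# one output file open at any time. Returns the number of files created.
sub split_sorted {
    my ($in, $dir) = @_;

    # Pair each non-blank line with its first field, then sort on that field.
    my @records = sort { $a->[0] cmp $b->[0] }
                  map  { [ (split ' ', $_)[0], $_ ] }
                  grep { /\S/ } <$in>;

    my ($current, $out, $files) = ('', undef, 0);
    for my $rec (@records) {
        my ($company, $line) = @$rec;
        if ($company ne $current) {
            # Company changed: close the previous file and open the next one.
            if (defined $out) {
                close $out or die "Cannot close $current.txt: $!";
            }
            open $out, '>', "$dir/$company.txt"
                or die "Cannot open $dir/$company.txt: $!";
            $current = $company;
            $files++;
        }
        print {$out} $line;
    }
    if (defined $out) {
        close $out or die "Cannot close $current.txt: $!";
    }
    return $files;
}
```

    Because identical keys are adjacent after the sort, a change in the first field is the only event the loop has to react to, and no table of open handles is needed.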

      Yes, exactly. It was to choose between these two approaches that I asked the question above.

        Thanks so much for all the input.

Node Type: perlquestion [id://1067670]
Approved by taint
Front-paged by toolic