Help with splitting one text file into other text files

eversuhoshin has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear monks,

It's raining and my head is dead. I have a text file that contains SEC filings information with dates.

For each year, I want to create a separate text file.

For instance, if the year is 1996, then I want a new, separate text file that contains only the SEC filings information for 1996.

Is there an efficient way to do that, instead of opening 10 file handles and writing a bunch of if statements like `if ($year == 1996) { print FILE $_ }`?

Thank you for your time and consideration.


Re: Help with splitting one text file into other text files by moritz (Cardinal) on Mar 30, 2011 at 15:30 UTC
You can store file handles in hashes, something like `use autodie; my %handles; for (1996..2011) { open $handles{$_}, '>', "$_.sec"; } ... print { $handles{$year} } $yourdata;`
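The hash-of-handles idea can also open each file lazily, the first time a year appears, so no year range has to be known in advance. This is a minimal sketch assuming the year is the first four characters of each record (as in the replies below); the `split_by_year` name and the `YEAR.sec` output names are illustrative, not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split records into one file per year, opening each output handle
# the first time that year is seen (no fixed year range needed).
# Assumption: the year is the first four characters of each record;
# records without a leading four-digit year are skipped.
sub split_by_year {
    my ($in, $dir) = @_;
    my %handles;    # year => output filehandle
    while (my $line = <$in>) {
        my ($year) = $line =~ /^(\d{4})/ or next;
        unless ($handles{$year}) {
            open $handles{$year}, '>', "$dir/$year.sec"
                or die "Cannot open $dir/$year.sec: $!";
        }
        print { $handles{$year} } $line;
    }
    close $_ or die "close failed: $!" for values %handles;
    return [ sort keys %handles ];    # years written, for reporting
}

# Usage: perl split.pl input.txt   (writes 1996.sec, 1997.sec, ... in .)
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $years = split_by_year($in, '.');
    print "Wrote: @$years\n";
}
```

The input is read only once, and each line is routed by a single hash lookup instead of a chain of if statements.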
Re^2: Help with splitting one text file into other text files by Fletch (Bishop) on Mar 30, 2011 at 15:35 UTC
Depending on the OS and the number of simultaneous handles you need, FileCache might also be of interest.
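The FileCache suggestion might look roughly like this. It is a minimal sketch assuming the year is the first four characters of each record; `split_with_filecache`, the `YEAR.sec` names and the `maxopen` value of 16 are illustrative choices. FileCache uses the path string itself as the filehandle name, hence the `no strict 'refs'`. The handles are left open for FileCache and Perl's exit to flush and close.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# FileCache limits how many output files are open at once (here 16),
# closing the least recently used ones and reopening them on demand.
use FileCache maxopen => 16;

# Sketch: split records by a leading four-digit year (an assumption about
# the input format) without hitting the per-process file handle limit.
sub split_with_filecache {
    my ($in, $dir) = @_;
    no strict 'refs';    # FileCache uses the path string as the handle name
    my %seen;
    while (my $line = <$in>) {
        my ($year) = $line =~ /^(\d{4})/ or next;
        my $path = "$dir/$year.sec";
        # One-argument cacheout: opens with '>' the first time and with
        # '>>' if the file has to be reopened later, so output is kept.
        cacheout $path;
        print $path $line;
        $seen{$path} = 1;
    }
    return [ sort keys %seen ];
}

# Usage: perl split_fc.pl input.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for @{ split_with_filecache($in, '.') };
}
```

This only matters when the number of distinct years could exceed the OS limit on open files; for ten or so years a plain hash of handles is enough.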
Re^3: Help with splitting one text file into other text files by Anonymous Monk on Mar 30, 2011 at 15:43 UTC
Or FileCache::Handle; but what is really needed is a lexically scoped pragma like autodie, so that you could write `use autofilecache; use autodie; my @Fs; for(1..1024){ open my($f),'<',$_; push @Fs, $f; }`
Re: Help with splitting one text file into other text files by BrowserUk (Patriarch) on Mar 30, 2011 at 15:33 UTC
How about: `my %handles; open $handles{ $_ }, '>', "sec.$_" for 1996 .. 2010; while( <> ) { m[^(\d{4})] and print { $handles{ $1 } } $_; } close $_ for values %handles;` That assumes that the year is the first four characters of each record, and that all the years fall within the specified range.
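The out-of-range caveat can be handled by routing unexpected records to a catch-all file instead of printing to a filehandle that was never opened. This sketch keeps the same assumptions (year in the first four characters, range 1996 .. 2010, output names `sec.YEAR`); the `split_fixed_range` name and the `sec.unmatched` file are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fixed-range split that never prints to an unopened handle: records
# whose year is missing or outside $first..$last go to "sec.unmatched"
# (a hypothetical catch-all name).
sub split_fixed_range {
    my ($in, $dir, $first, $last) = @_;
    my %handles;
    for my $year ($first .. $last, 'unmatched') {
        open $handles{$year}, '>', "$dir/sec.$year"
            or die "Cannot open $dir/sec.$year: $!";
    }
    while (my $line = <$in>) {
        # Use the leading year only if a handle exists for it.
        my $key = $line =~ /^(\d{4})/ && $handles{$1} ? $1 : 'unmatched';
        print { $handles{$key} } $line;
    }
    close $_ or die "close failed: $!" for values %handles;
}

# Usage: perl split_range.pl input.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    split_fixed_range($in, '.', 1996, 2010);
}
```

Checking `sec.unmatched` afterwards is a cheap way to find out whether the "first four characters" assumption actually holds for the real file.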
Re: Help with splitting one text file into other text files by JavaFan (Canon) on Mar 30, 2011 at 15:50 UTC
`$ grep 1996 input > output.1996`
`$ grep 1997 input > output.1997`
...
`$ grep 2005 input > output.2005`
Re^2: Help with splitting one text file into other text files by davido (Cardinal) on Mar 30, 2011 at 17:31 UTC
Since we haven't seen an example of the input file, we have to make assumptions about how it looks and then derive solutions based on those assumptions. Your solution is elegant for a simple, small data set; why look for a Perlish solution when the easiest one is plain OS grep?

But there are a couple of pitfalls I see right off the bat. First, what if '1996' appears in the commentary portion of a filing record? I don't know what his SEC filings file looks like, but what if...

`2/27/1996 - Noble Energy Corporation Based on a 1995 SEC request for information Noble Energy Corp (NE) submits the following findings on its joint venture with Transocean Research for offshore rigs to be installed in the North Seas with a projected completion date of 3Q 1997. ...`

Of course, we really don't know what the data set looks like, so your solution may be spot on. On the other hand, if general text starts looking like trigger dates, you've got a problem.

Second, your solution has to iterate over the file once for each year. If the input file isn't too large, 'who cares.' But if the input file is or may grow large, the simplest solution may turn out to be an inefficient one.

But if it turns out your take on the problem works, I actually like it the best, since it keeps simple things simple. In a way, even though it doesn't use Perl, it's the most Perlish solution.

Dave
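One way to sidestep the first pitfall is to take the year only from a date field at the start of the record. This sketch assumes records begin with an M/D/YYYY date, as in the hypothetical example above; `filing_year` is an illustrative name, and the pattern would need adjusting to the real file layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the year from a leading M/D/YYYY date, or undef if the record
# does not start with one. Years mentioned later in the commentary
# (e.g. "a 1995 SEC request", "3Q 1997") are ignored.
sub filing_year {
    my ($record) = @_;
    return $record =~ m{^\s*\d{1,2}/\d{1,2}/(\d{4})\b} ? $1 : undef;
}

my $record = '2/27/1996 - Noble Energy Corporation Based on a 1995 SEC '
           . 'request for information ... completion date of 3Q 1997.';

# Compare a plain substring search (what `grep YEAR` does) with the
# anchored date-field match.
for my $year (1995 .. 1997) {
    my $grep_hit = index($record, $year) >= 0 ? 'yes' : 'no';
    my $date_hit = (filing_year($record) // '') eq $year ? 'yes' : 'no';
    print "$year: substring match $grep_hit, date-field match $date_hit\n";
}
# Prints:
# 1995: substring match yes, date-field match no
# 1996: substring match yes, date-field match yes
# 1997: substring match yes, date-field match no
```

The same `filing_year` check could replace the `m[^(\d{4})]` test in the hash-of-handles loops, which also keeps everything to a single pass over the file.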
