http://www.perlmonks.org?node_id=896440

eversuhoshin has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear monks,

It's raining and my head is dead. I have a text file that contains SEC filing information with dates.

For each year, I want to create a separate text file.

For instance, if the year is 1996, I want a new, separate text file that contains only the SEC filing information for 1996.

Is there an efficient way to do that, instead of opening 10 file handles and using a bunch of if statements, one per year, like:

    print FILE $_ if $year == 1996;


Thank you for your time and consideration

Re: Help with splitting one text file into other text files
by moritz (Cardinal) on Mar 30, 2011 at 15:30 UTC
    You can store file handles in hashes, something like:

    use autodie;
    my %handles;
    for (1996..2011) {
        open $handles{$_}, '>', "$_.sec";    # '>' opens for writing
    }
    ...
    print { $handles{$year} } $yourdata;

      Depending on the OS limit on open files per process and the number of simultaneous handles you need, FileCache might also be of interest.

      The cake is a lie.

        Or FileCache::Handle. But what is really needed is a lexically scoped pragma like autodie, so you could write:

        use autofilecache;
        use autodie;
        my @Fs;
        for (1..1024) { open my($f), '<', $_; push @Fs, $f; }
Re: Help with splitting one text file into other text files
by BrowserUk (Patriarch) on Mar 30, 2011 at 15:33 UTC

    How about:

    my %handles;
    open $handles{ $_ }, '>', "sec.$_" or die "sec.$_: $!" for 1996 .. 2010;
    while( <> ) {
        m[^(\d{4})] and print { $handles{ $1 } } $_;
    }
    close $_ for values %handles;    # a bare 'close' would close the selected handle, not $_

    That assumes that the year is the first four characters of each record, and that all the years fall within the range specified.
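    If the years present are not known in advance, one option is to open each output file lazily, the first time its year is seen, so no range has to be hard-coded and lines without a leading year are skipped instead of hitting an undefined handle. A minimal sketch, still assuming the year is the first four characters of a record; the split_by_year sub and the "$dir/sec.YEAR" naming are illustrative, not from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: copy each record read from $in into "$dir/sec.YEAR",
# opening an output handle only the first time a year is seen.
# Assumes, as above, that the year is the first four characters of a record.
sub split_by_year {
    my ( $in, $dir ) = @_;
    my %handles;
    while ( my $line = <$in> ) {
        # Skip records that do not start with a four-digit year.
        my ($year) = $line =~ /^(\d{4})/ or next;
        unless ( $handles{$year} ) {
            open $handles{$year}, '>', "$dir/sec.$year"
                or die "Cannot open $dir/sec.$year: $!";
        }
        print { $handles{$year} } $line;
    }
    my @years = sort keys %handles;
    # close flushes buffered output; report failures instead of ignoring them.
    for my $year (@years) {
        close $handles{$year} or die "Cannot close $dir/sec.$year: $!";
    }
    return @years;
}

# Example call: split standard input into sec.YEAR files in the current directory.
# my @years = split_by_year( \*STDIN, '.' );
```

    The input is read only once, and only as many handles are open as there are distinct years in the data.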


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Help with splitting one text file into other text files
by JavaFan (Canon) on Mar 30, 2011 at 15:50 UTC
    $ grep 1996 input > output.1996
    $ grep 1997 input > output.1997
    ...
    $ grep 2005 input > output.2005

      Since we haven't seen an example of the input file, we have to make assumptions about what it looks like, and then derive solutions based on those assumptions.

      Your solution is elegant for a small, simple data set; why look for a Perlish solution when the easiest one is plain OS grep? But there are a couple of pitfalls I see right off the bat. First, what if '1996' appears in the commentary portion of a filing record? I don't know what his SEC filings file looks like, but what if...

      2/27/1996 - Noble Energy Corporation Based on a 1995 SEC request for information Noble Energy Corp (NE) submits the following findings on its joint venture with Transocean Research for offshore rigs to be installed in the North Seas with a projected completion date of 3Q 1997. ...

      Of course, we really don't know what the data set looks like, so your solution may be spot on. On the other hand, if general text starts looking like trigger dates, you've got a problem.

      Second, your solution has to read through the whole file once for each year. If the input file isn't too large, who cares? But if the input file is large, or may grow large, the simplest solution may turn out to be an inefficient one.

      But if it turns out your take on the problem works, I actually like it the best since it keeps simple things simple. In a way, even though it doesn't use Perl, it's the most Perlish solution.
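      Both pitfalls can be addressed in Perl with a single pass that only trusts a date at the start of the record. A minimal sketch, assuming (from the made-up example above, not from the original poster) one record per line beginning with an m/d/yyyy date; the split_by_leading_date sub and $date_re are illustrative names, and the output.YEAR file names follow the grep version:

```perl
use strict;
use warnings;

# Assumed record layout, borrowed from the made-up example above: one record
# per line, starting with an m/d/yyyy date. Anchoring the pattern at the start
# of the line means years mentioned later in the text ("1995", "3Q 1997")
# are never used to pick the output file.
my $date_re = qr{^\s*\d{1,2}/\d{1,2}/(\d{4})\b};

# Hypothetical helper: read the input once, writing each record to
# "$dir/output.YEAR"; returns the number of lines that had no leading date.
sub split_by_leading_date {
    my ( $in, $dir ) = @_;
    my %out;
    my $skipped = 0;
    while ( my $line = <$in> ) {
        my ($year) = $line =~ $date_re;
        unless ( defined $year ) {
            $skipped++;    # count lines without a leading date instead of guessing
            next;
        }
        # Open each year's file the first time that year is seen.
        $out{$year} //= do {
            open my $fh, '>', "$dir/output.$year"
                or die "Cannot open $dir/output.$year: $!";
            $fh;
        };
        print { $out{$year} } $line;
    }
    close $_ or die "Cannot close output file: $!" for values %out;
    return $skipped;
}

# Example call: split_by_leading_date( \*STDIN, '.' );
```

      Unlike one grep per year, the file is read once no matter how many years it spans; the skipped count is a cheap hint that the layout assumption does not hold for some lines.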


      Dave