PerlMonks  

splitting files into multiple sections to be processed as a file

by firstchance (Initiate)
on Nov 29, 2011 at 22:26 UTC ( #940712=perlquestion )
firstchance has asked for the wisdom of the Perl Monks concerning the following question:

firstchance: Is there a method that will allow me to easily split a file into multiple files by blank line as separator? Perhaps a module that will make it easy

firstchance: can perl create virtual files for this process with the help of CPAN

tye: $/, 'paragraph mode' ?

Any advances?

;>
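tye's hint refers to Perl's input record separator, $/. Setting it to the empty string puts readline into "paragraph mode", where each read returns one blank-line-separated chunk. A minimal sketch (the sample text is made up for illustration):

```perl
use strict;
use warnings;

# In paragraph mode ($/ = ""), each <$fh> read returns a chunk
# terminated by one or more blank lines.
my $text = "first chunk\nstill first\n\nsecond chunk\n\n\nthird chunk\n";

open my $fh, '<', \$text or die "open: $!";
local $/ = "";                 # paragraph mode
my @paragraphs = <$fh>;
close $fh;

chomp @paragraphs;             # in paragraph mode, strips ALL trailing newlines
print scalar(@paragraphs), " paragraphs\n";   # prints "3 paragraphs"
```

Note that paragraph mode treats any run of consecutive blank lines as a single separator, unlike the literal `$/ = "\n\n"` used elsewhere in this thread.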

Re: splitting files into multiple sections to be processed as a file
by CountZero (Bishop) on Nov 29, 2011 at 22:56 UTC
    What do you understand by "virtual files"?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      I guess I mean where subcontents of a file are held in memory and then are processed as if they were in fact files.
        You can save the subcontents in an array, that is easy.
        use Modern::Perl;
        use Data::Dump qw/dump/;

        my @subsections;
        {
            local $/ = "\n\n";
            @subsections = <DATA>;
        }
        say dump(@subsections);

        __DATA__
        subsection 1 line 1
        subsection 1 line 2
        subsection 1 line 3
        subsection 1 line 4

        subsection 2 line 1
        subsection 2 line 2
        subsection 2 line 3

        subsection 3 line 1
        subsection 3 line 2
        subsection 3 line 3

        subsection 4 line 1
        subsection 4 line 2
        But what file-specific operations do you want to perform on the subcontents?

        Update: You can use a scalar reference as a "virtual" (or in-memory) file (since Perl v5.8.0).

        open my $fh, '<', \$subsections[1];
        print while <$fh>;
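The same scalar-reference trick works in write mode as well (this addition is not from the original reply), so a "virtual file" can also be built up with ordinary print calls before being read back:

```perl
use strict;
use warnings;

# Open a scalar for writing: prints go into $buffer instead of to disk.
my $buffer = '';
open my $out, '>', \$buffer or die "open: $!";
print {$out} "line 1\n";
print {$out} "line 2\n";
close $out;

# $buffer now holds exactly what was "written to the file".
print length($buffer), " bytes captured\n";   # prints "14 bytes captured"
```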

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        You really need to clarify, since the answer to any plausible meaning I can infer from your question is "Yes!" All that's required is that you apply yourself to specifying the steps by which you'd do this if the file were a book; to searching this site (and others) for keywords growing out of that exercise (restricting the search to Perl); and to learning how to tell Perl to do what you want.

        So please, expand and be explicit: does "subcontents" equal the various fragments (elements of an array) created by splitting on blank lines? If so, how do those elements, "held in memory", fail to satisfy your requirements? What processing do you have in mind that has to occur "as if they were in fact files"? What is the desired outcome? What's the big picture on your (woefully inadequate, to date) specs?

        And what research (asking the question in the CB and repeating it as a SOPW question doesn't count) have you done to find your own solution?

Re: splitting files into multiple sections to be processed as a file
by Khen1950fx (Canon) on Nov 29, 2011 at 23:38 UTC
    Take a look at Inline::Files.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Inline::Files;

    while (<FILE>) {
        print "FILE: $_";
    }

    __FILE__
    File 1
    Hello
    __FILE__
    File 2
    World
    __OTHER_FILE__
    Other File 1
    Good-bye World!
    __FILE__
    File 3
    Hello again!
    __END__
      I really need a module called Outline::Files then!
Re: splitting files into multiple sections to be processed as a file
by GrandFather (Cardinal) on Nov 30, 2011 at 00:02 UTC

    We can probably give you better help if you tell us why you want to do this. However the following may be of use:

use strict;
use warnings;

my $file = <<FILE;
firstchance: Is there a method that will allow me to easly split a file into multiple files by blank line as seperator. Perhaps a module that will make it easy

firstchance: can perl create virtual files for this process with the help of CPAN

tye: $/, 'paragraph mode' ?

Any advances?

;>
FILE

open my $in, '<', \$file;
local $/ = "\n\n";
my @parts = <$in>;
chomp @parts;
print "----------\n$_\n" for @parts;

    Prints:

    ----------
    firstchance: Is there a method that will allow me to easly split a file into multiple files by blank line as seperator. Perhaps a module that will make it easy
    ----------
    firstchance: can perl create virtual files for this process with the help of CPAN
    ----------
    tye: 
    , 'paragraph mode' ?
    ----------
    Any advances?
    ----------
    ;>

    Note the use of an "in memory" file using $file (as suggested by CountZero) and the use of the special variable $/ to read blank line separated blocks as records (as suggested by tye).

    True laziness is hard work
Re: splitting files into multiple sections to be processed as a file
by sundialsvc4 (Monsignor) on Nov 30, 2011 at 01:03 UTC

    I believe that what you are referring to is memory-mapped files, where the contents of a file (or a window into its entirety) actually is mapped into a process’s virtual-address space such that page-faults to that area are resolved by the contents of the file (window).   A quick search of the term “mapped” at http://search.cpan.org immediately shows Win32::MMF as the top hit.

    Another thing that pops into my head is that you could split a file pretty-well just by fseek()ing into it at an approximate location, then reading forward or backward until you hit a newline \n character or what-have-you.   You know where you positioned to, and you know how many bytes you had to read to find a newline, so you therefore know the actual split-point position.   If you know the file’s contents reasonably well, this simple strategy should work just fine.
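The seek-then-scan idea above can be sketched as follows. The helper name split_point is made up for illustration; it is not code from the thread:

```perl
use strict;
use warnings;

# Jump to an approximate byte offset, then read forward to the next
# newline; the position after that read is a clean line-boundary
# split point.
sub split_point {
    my ($path, $approx) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    seek $fh, $approx, 0 or die "seek: $!";
    <$fh>;                    # discard the (probably partial) line
    my $point = tell $fh;     # first byte of the next full line
    close $fh;
    return $point;
}
```

As the reply notes, this works well when you know the file's contents: one seek plus one short read finds a split point without scanning the whole file.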

    Logic to process arbitrary files “line by line” really doesn’t have to be complicated:   just read it a chunk at a time.   You don’t need to fool with memory-mapping.   When your search for “the next newline” comes up empty-handed, move the unprocessed content to the top of the buffer and then read enough bytes to fill the buffer back up again.   And “CPAN to the rescue” again, e.g. with Text::Buffer.

Re: splitting files into multiple sections to be processed as a file
by Marshall (Prior) on Nov 30, 2011 at 05:23 UTC
    It would be very helpful if you could explain why you want to do this, i.e., what is the application?

    If you want to split some big file because it doesn't fit on a CD, just open the input file and start writing to an output file. Keep track of the number of bytes written; when it's too big, start a new file (outfile_part1, outfile_part2, etc.).
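That loop can be sketched as below, splitting on line boundaries. The helper name split_by_size and the byte-limit parameter are illustrative choices, not code from the reply; only the outfile_part naming comes from it:

```perl
use strict;
use warnings;

# Copy lines into numbered part files, starting a new part whenever
# the current one would exceed $max_bytes.
sub split_by_size {
    my ($infile, $prefix, $max_bytes) = @_;
    open my $in, '<', $infile or die "open $infile: $!";
    my ($part, $written, $out) = (0, 0, undef);
    while (my $line = <$in>) {
        if (!defined $out or $written + length($line) > $max_bytes) {
            close $out if defined $out;
            $part++;
            open $out, '>', "${prefix}_part$part" or die "open: $!";
            $written = 0;
        }
        print {$out} $line;
        $written += length $line;
    }
    close $out if defined $out;
    close $in;
    return $part;    # number of part files created
}
```

A line longer than $max_bytes still gets written whole, so parts can slightly exceed the limit rather than break mid-line.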

    I think there is a *nix utility that does file splitting, but not necessarily on line boundaries.

    I have no idea at all what you mean or intend by "virtual file".

Node Type: perlquestion [id://940712]
Approved by muba