PerlMonks  

breaking up a file by delimiter

by navalned (Acolyte)
on Oct 07, 2018 at 02:44 UTC (#1223630)
navalned has asked for the wisdom of the Perl Monks concerning the following question:

I have the following code that seems to work thus far. However, it just feels like there is a better way to do it. If so I'd love to see it.

use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 0;

my $file = "test";
my @data;
{
    local $/ = undef;
    open FILE, "$file" or die $!;
    while (my $line = <FILE>) {
        push @data, split /XXX/, $line; # I thought it should be /^XXX$/, but that didn't work
    }
}
@data = grep /\S/, @data; # I don't understand why I had to do this
print Dumper \@data;

This gives the following correct output:

$VAR1 = ['This is a test.','This is a multiline test.']

The file looks like so:

This is a test.
XXX
This is a multiline
test.
XXX

2018-10-07 Athanasius put code tags around script output and file contents

Replies are listed 'Best First'.
Re: breaking up a file by delimiter
by hippo (Canon) on Oct 07, 2018 at 09:30 UTC
    This gives the following correct output:

    $VAR1 = ['This is a test.','This is a multiline test.']

    Not on my perl. Here it gives:

    $VAR1 = ['This is a test. ',' This is a multiline test. '];

    As it's hard to know from that what you actually want, here's an SSCCE which preserves the multi-line nature of the second string but removes the trailing newlines. You can amend it to suit your actual requirements.

    use strict;
    use warnings;
    use Test::More tests => 1;

    my @data;
    my @want = ('This is a test.', "This is a multiline\ntest.");
    {
        local $/ = "\nXXX\n";
        chomp (@data = <DATA>);
    }
    is_deeply (\@data, \@want);
    __DATA__
    This is a test.
    XXX
    This is a multiline
    test.
    XXX
Re: breaking up a file by delimiter
by tybalt89 (Vicar) on Oct 07, 2018 at 11:18 UTC
    #!/usr/bin/perl

    # https://perlmonks.org/?node_id=1223630

    use strict;
    use warnings;
    use Data::Dumper;
    $Data::Dumper::Indent = 0;

    local $/ = "\nXXX\n";
    s/\nXXX\n\z//, tr/\n/ / for my @data = <DATA>;
    print Dumper \@data;

    __DATA__
    This is a test.
    XXX
    This is a multiline
    test.
    XXX

      I don't generally like statement modifiers. However, this does exactly what I need. I think my main issue was that I forgot I was slurping the file, so my /^XXX$/ couldn't match, and I should have changed it to /\nXXX\n/.
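
To make the anchoring point above concrete, here is a minimal sketch comparing the three patterns on a slurped string. The `$text` variable is a hypothetical stand-in for the contents of the "test" file, assuming the layout used in the __DATA__ sections of this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the slurped contents of the "test" file.
my $text = "This is a test.\nXXX\nThis is a multiline\ntest.\nXXX\n";

# Without /m, ^ and $ anchor only at the start and end of the whole
# string, so the pattern never matches and nothing is split.
my @no_m = split /^XXX$/, $text;

# With /m, ^ and $ also match at embedded line boundaries. The newlines
# around each XXX stay in the fields, and a lone "\n" field is left over
# at the end, which is why a grep /\S/ is needed afterwards.
my @with_m = split /^XXX$/m, $text;

# Matching the newlines explicitly consumes them, so the fields come out
# clean and the trailing empty field is dropped by split.
my @explicit = split /\nXXX\n/, $text;

printf "without /m: %d field(s)\n", scalar @no_m;
printf "with /m:    %d field(s)\n", scalar @with_m;
printf "explicit:   %d field(s)\n", scalar @explicit;
```

On this sample the counts should be 1, 3 and 2 respectively; only the last pattern yields exactly the two records with no further cleanup.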

        I don't generally like statement modifiers.
        You don't have to follow me on that, but statement modifiers can help make your code clearer and more concise.

        Consider for instance this (somewhat meaningless) example:

        while (<$IN>) {
            next if /^\s*$/;    # remove empty lines
            next if /^\s*#/;    # remove comments
            next if /ORA/i;     # remove lines with Oracle warnings and errors
            next unless some_condition($_);
            last if /File processed/;
            # ... now do the actual processing of useful lines
        }
        If you wanted to do the same with regular conditionals, you would need about three times as many code lines, and it would probably end up being less clear.

        Update: added a missing closing parenthesis to the while condition.
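
For comparison, here is a rough sketch of the same filters written both ways. The sample `@lines` and the two sub names are hypothetical, invented only for illustration, and the `some_condition` check from the loop above is left out because its definition is not given in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only to exercise each filter.
my @lines = (
    '',
    '# a comment',
    'ORA-00942: table or view does not exist',
    'useful line',
    'File processed',
    'never reached',
);

# Filters written with statement modifiers.
sub filter_with_modifiers {
    my @kept;
    for my $line (@_) {
        next if $line =~ /^\s*$/;            # skip empty lines
        next if $line =~ /^\s*#/;            # skip comments
        next if $line =~ /ORA/i;             # skip Oracle warnings and errors
        last if $line =~ /File processed/;   # stop at the end marker
        push @kept, $line;
    }
    return @kept;
}

# The same filters written as ordinary if blocks.
sub filter_with_blocks {
    my @kept;
    for my $line (@_) {
        if ($line =~ /^\s*$/) {
            next;
        }
        if ($line =~ /^\s*#/) {
            next;
        }
        if ($line =~ /ORA/i) {
            next;
        }
        if ($line =~ /File processed/) {
            last;
        }
        push @kept, $line;
    }
    return @kept;
}

my @kept_mod = filter_with_modifiers(@lines);
my @kept_blk = filter_with_blocks(@lines);
print "modifiers: @kept_mod\n";
print "blocks:    @kept_blk\n";
```

Both subs keep only 'useful line' on this input; the block form takes three lines per test where the modifier form takes one.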

Re: breaking up a file by delimiter
by jwkrahn (Monsignor) on Oct 07, 2018 at 04:55 UTC
    use strict;
    use warnings;
    use Data::Dumper;
    $Data::Dumper::Indent = 0;

    my $file = 'test';
    my @data;
    {
        local $/;
        open my $FILE, '<', $file or die "Cannot open '$file' because: $!";
        @data = grep /\S/, split /XXX\n/, <$FILE>;
    }
    print Dumper \@data;

      This works great! Thanks!!

Re: breaking up a file by delimiter
by eyepopslikeamosquito (Chancellor) on Oct 07, 2018 at 06:07 UTC

    I would write it something like this:

    use strict;
    use warnings;
    use Data::Dumper;
    $Data::Dumper::Indent = 0;

    my $file = "test";

    # Use lexical file handles and 3-argument form of open
    # See http://modernperlbooks.com/mt/2010/04/three-arg-open-migrating-to-modern-perl.html
    open my $fh, '<', $file or die "error opening '$file': $!";

    # Slurp file contents into string $contents
    # See http://modernperlbooks.com/mt/2009/08/a-one-line-slurp-in-perl-5.html
    my $contents = do { local $/; <$fh> };
    close $fh;

    # Split contents into fields by XXX on a line by itself (note the /m modifier)
    my @fields = split /^XXX$/m, $contents;

    # Tidy up newlines/spaces in multiline fields
    for (@fields) { tr/\n/ /s; s/^ +//; s/ +$//; }

    print Dumper \@fields;


Re: breaking up a file by delimiter
by Marshall (Abbot) on Oct 07, 2018 at 05:14 UTC
    When you undef the input record separator, you get the whole input file in a single read. I don't think that is the best way. Note that the line endings (\n) are included in that string, which is why you need your grep for non-whitespace.

    I am not at a computer where I can test, but is something like this what you want? Untested:

    while (my $line = <$input>) {
        if ($line =~ /^XXX$/) {
            print "\n";
        }
        else {
            chomp $line;
            print $line;
        }
    }
    print "\n"; #may need a final one of these
    In general, I prefer to process files line by line instead of creating an in memory copy of the whole thing, but with computer memory what it is nowadays, this doesn't matter so much.
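
A variant of the line-by-line idea above, sketched under a few assumptions: it collects records into an array instead of printing them, joins the lines of a record with a single space, and reads from an in-memory filehandle so that it runs without the "test" file. The `$text`, `@records` and `@current` names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "test" file, read through an in-memory
# filehandle so the sketch is self-contained.
my $text = "This is a test.\nXXX\nThis is a multiline\ntest.\nXXX\n";
open my $input, '<', \$text or die "Cannot open in-memory file: $!";

my @records;    # finished records
my @current;    # lines of the record being assembled

while (my $line = <$input>) {
    chomp $line;
    if ($line eq 'XXX') {
        # A delimiter line closes the current record, if there is one.
        push @records, join ' ', @current if @current;
        @current = ();
    }
    else {
        push @current, $line;
    }
}
# Keep a final record that was not followed by a delimiter line.
push @records, join ' ', @current if @current;
close $input;

print "$_\n" for @records;
```

Only one record is held in memory at a time, which is the main attraction of this approach for large files.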
Re: breaking up a file by delimiter
by Anonymous Monk on Oct 07, 2018 at 08:20 UTC
    Use the built-in Perl feature for input record separator splitting, perhaps?
    use warnings;
    use strict;
    use Data::Dumper;

    {
        local $/ = "\nXXX\n";
        while (my $record = <DATA>) {
            # remove the separator from the record
            $record =~ s{\Q$/\E$}{};
            print Dumper $record;
            # process the record here
        }
    };

    __DATA__
    This is a test.
    XXX
    This is a multiline
    test.
    XXX

    $VAR1 = 'This is a test.';
    $VAR1 = 'This is a multiline
    test.';

    This may be useful if your file becomes too large to fit in RAM and you only need to process it record by record.

      This works really well. Thanks!
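
As a small variation on the record-by-record loop above, chomp can strip the separator instead of a substitution, since chomp removes whatever string `$/` currently holds. This is only a sketch: the in-memory filehandle and the `$text` and `@records` names are assumptions made so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "test" file.
my $text = "This is a test.\nXXX\nThis is a multiline\ntest.\nXXX\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
{
    # Each read now returns everything up to and including "\nXXX\n".
    local $/ = "\nXXX\n";
    while (my $record = <$fh>) {
        chomp $record;           # strips the trailing "\nXXX\n", if present
        push @records, $record;  # process the record here
    }
}
close $fh;

print scalar(@records), " record(s) read\n";
```

The newline inside the second record is preserved; apply tr/\n/ / to each record if a single-line form is wanted.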

Node Type: perlquestion [id://1223630]
Approved by Athanasius
Front-paged by cavac