Extracting selected fields from file record

by Anonymous Monk
on Feb 06, 2022 at 12:42 UTC ( [id://11141167]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, is there a way in Perl I can extract with a one-line instruction some selected fields from a file record with arbitrary space separated fields (spaces before and after, e.g. " a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 \n"), like so?
(a,b,c,d,e,f) = [line.split()[i] for i in (0,1,3,5,7,9)]
Thank you

Replies are listed 'Best First'.
Re: Extracting selected fields from file record
by soonix (Canon) on Feb 06, 2022 at 13:04 UTC
      another way is filling in undef for the ignored fields

      my ($first,$second,undef,$third,undef,$fourth,undef,$fifth,undef,$sixth) = split ' ', $line;

      please also note that split operates on regexes, so /\s+/ might be what is really wanted.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        A single space character (' ') is a special case. I admit that it is a bit buried in the split documentation. Of course, at most only the OP knows whether leading whitespace should be kept…
Re: Extracting selected fields from file record
by kcott (Archbishop) on Feb 06, 2022 at 18:04 UTC
    "(a,b,c,d,e,f) = [line.split()[i] for i in (0,1,3,5,7,9)]"
    $ perl -Mstrict -Mwarnings -e '(a,b,c,d,e,f) = [line.split()[i] for i in (0,1,3,5,7,9)]'
    Bareword "line" not allowed while "strict subs" in use at -e line 1.
    syntax error at -e line 1, near ")["
    Execution of -e aborted due to compilation errors.

    So, step one would be to learn Perl. See "Perl introduction for beginners".

    "... extract with a one-line instruction ..."

    Ask yourself why you think this requirement is necessary. It rarely has any benefits. It will often reduce readability and, as such, make your code more error-prone.

    "... extract ... from a file record ..."

    For records with fixed-width fields, use unpack. See the perlpacktut tutorial; the "Packing Text" section has an example showing exactly how to do this.
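
    A minimal sketch of the unpack approach. The record layout here (three fields, each four characters wide) is a made-up assumption for illustration, not something taken from the question:

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: three fields, each 4 characters wide.
my $record = "a0  a1  a2  ";

# "A4" reads 4 characters and strips trailing spaces from the result.
my ($f1, $f2, $f3) = unpack 'A4 A4 A4', $record;
print "|$f1|$f2|$f3|\n";

# "x4" skips 4 bytes, so unwanted fields can be dropped in the template itself.
my ($first, $third) = unpack 'A4 x4 A4', $record;
print "|$first|$third|\n";
```

    Skipping with x in the template avoids extracting fields only to throw them away, but it only works when every record really has the same column widths.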

    For records with variable-width fields, use split. Do be aware of these differences (the linked documentation has details):

    $ perl -E '$_ = " a0 a1 a2 \n"; my ($x, @y) = split; say "|$x|@y|";'
    |a0|a1 a2|
    $ perl -E '$_ = " a0 a1 a2 \n"; my ($x, @y) = split " "; say "|$x|@y|";'
    |a0|a1 a2|
    $ perl -E '$_ = " a0 a1 a2 \n"; my ($x, @y) = split / /; say "|$x|@y|";'
    ||a0 a1 a2 
    |
    $ perl -E '$_ = " a0 a1 a2 \n"; my ($x, @y) = split /\s+/; say "|$x|@y|";'
    ||a0 a1 a2|

    "... extract ... some selected fields ..."

    There are a variety of ways to achieve this. The best one to choose will probably depend on how you want to subsequently process the selected fields. Here are a couple of examples:

    my @wanted = (extraction_function($string))[0, 1, 3];
    my ($f1, $f2, undef, $f3) = extraction_function($string);
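
    As a concrete sketch of the first form, here split ' ' stands in for the placeholder extraction_function (an assumption; any function returning a list would do), and the indices are the ones from the question:

```perl
use strict;
use warnings;

# Sample record from the question, with leading and trailing whitespace.
my $line = " a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 \n";

# Take a list slice of what split returns; indices are zero-based.
my @wanted = (split ' ', $line)[0, 1, 3, 5, 7, 9];

print "@wanted\n";
```

    The slice keeps the selection in one place, which tends to be easier to adjust than a long list of undef placeholders when many fields are skipped.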

    — Ken

Re: Extracting selected fields from file record (updated)
by LanX (Saint) on Feb 06, 2022 at 18:56 UTC
    ehm ...

    > > spaces before and after

    the spaces before are IMHO best dealt with by stripping them before splitting. °

    $line =~ s/^\s+//;
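
    A small sketch of the effect on a shortened sample record (the variable names are only illustrative): once the leading whitespace is gone, split /\s+/ no longer yields an empty first field.

```perl
use strict;
use warnings;

my $line = " a0 a1 a2 \n";

# Strip leading whitespace so the string no longer starts with a separator.
$line =~ s/^\s+//;

# Trailing empty fields are dropped by split, so the final " \n" adds nothing.
my @fields = split /\s+/, $line;

print scalar(@fields), " fields: @fields\n";
```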

    Even your pseudo python code can't do this in a one-liner with split (IMHO).

    But more importantly your definition of "field" is fuzzy now.

    Please clarify

    • do you allow empty fields?
    • are all whitespace characters allowed as separators (like tab...)?
    • are multiple whitespace characters allowed as a separator?

    update

    provided there are no "empty fields" and "multiple whitespaces" are allowed as separators:

    You can use a regex like /(\S+)/g ( \S is non-whitespace, the opposite of \s)

    Debugger demo:

    DB<35> $line = " a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 \n"

    DB<36> x ($line =~ /(\S+)/g)[0,1,3,5,7,9]
    0  'a0'
    1  'a1'
    2  'a3'
    3  'a5'
    4  'a7'
    5  'a9'
    DB<37>

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    update

    °) or by using the magic of ' ' soonix showed us here.

      We can omit parentheses, i.e. capture group number 1 isn't necessary.
      my( $first, $second, $third, $fourth, $fifth, $sixth ) = ( $line =~ m/\S+/g )[ 0, grep { $_ % 2 == 1 } 1 .. 9 ];
      Thanks everyone. Using ' ' is working as I need on records from an ASCII file.
      And I'm going to load the records with File::Slurp, so I don't need to worry about chomp:
      use File::Slurp qw(read_file);
      my @lines = read_file('/path/file', chomp => 1);
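
      If File::Slurp is not installed, a core-only sketch of the same idea might look like the following. The in-memory filehandle and the two sample records are assumptions made purely so the example runs on its own:

```perl
use strict;
use warnings;

# Stand-in for the file contents; pass a real path instead of \$data to read a file.
my $data = " a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 \n b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 \n";
open my $fh, '<', \$data or die "Cannot open: $!";

# Read all lines at once and strip the trailing newline from each.
chomp(my @lines = <$fh>);
close $fh;

for my $record (@lines) {
    # split ' ' ignores leading whitespace and splits on runs of whitespace.
    my @wanted = (split ' ', $record)[0, 1, 3, 5, 7, 9];
    print "@wanted\n";
}
```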
