PerlMonks  

Cool way to parse Space Separated Value and CSV files

by greengaroo (Hermit)
on Apr 09, 2013 at 18:24 UTC

As a programmer and teacher of the Perl programming language, I often get destabilizing questions. In one of the last classes I gave, while I was talking about hashes, someone asked me "What is it used for? When would I ever need that?" Of course, for me (and you too, probably) hashes are quite practical, but being asked that on the spot, I didn't know what to say, so I talked about the %ENV hash and made an example with it.

Today I found an interesting use for hashes. I wish I had thought of it during my class, but I didn't, so I would like to share it with you for the benefit of newer Perl programmers.

Imagine you have to read a Space Separated Value file or Comma Separated Value (CSV) file. It's easy because the fields are always in the same order. For example:

    # firstname lastname age
    joe builder 9
    bob plumber 66
    dora squarepants 10
    diego simpson 11

You can do this:

    open( my $l, "<", "file" ) || die "Error : $!";
    my @lines = <$l>;
    close( $l );

    foreach my $line ( @lines ) {
        # Skipping if the line is empty or a comment
        next if ( $line =~ /^\s*$/ );
        next if ( $line =~ /^\s*#/ );

        my ($firstname, $lastname, $age) = split( /\s+/, $line );

        # then do whatever you have to
    }

But then someday someone gives you a new file with the fields in a different order, plus new extra fields you don't need. Here is the new file:

    # lastname firstname age gender phone
    mcgee bobby 27 M 555-555-5555
    kincaid marl 67 M 555-666-6666
    hofhazards duke 22 M 555-696-6969

What do you do? Do you change your code with an if statement? Do you alter the file to change the order of the fields and remove the extra fields? No! You use hashes!

Here is the solution:

    open( my $l, "<", "file" ) || die "Error : $!";
    my @lines = <$l>;
    close( $l );

    my @keys = split( /\s+/, $lines[0] );
    shift( @keys ); # to remove the # as the first field

    foreach my $line ( @lines ) {
        # Skipping if the line is empty or a comment
        next if ( $line =~ /^\s*$/ );
        next if ( $line =~ /^\s*#/ );

        my %hash;
        @hash{ @keys } = split( /\s+/, $line );

        # then do whatever you have to
    }

Note that the first line of the file is important: it gives you the order of the fields. Even if it's not there when you receive the file, you can easily add it.

Note the @hash{ } syntax. This is called a hash slice. It is written with the @ sigil because it accesses a list of elements from the hash rather than a single one. The @keys array contains the list of keys in the same order as written at the top of the file. Therefore, doing @hash{ @keys } is like doing @hash{ qw(lastname firstname age gender phone) } or @hash{ 'lastname', 'firstname', 'age', 'gender', 'phone' }, except that it keeps working when the fields are not in the same order as in the previous file.

The split of the line returns a list so doing this:

@hash{ @keys } = split( /\s+/, $line );

is the same as this:

@hash{'lastname', 'firstname', 'age', 'gender', 'phone' } = split( /\s+/, $line );

or this:

($hash{'lastname'}, $hash{'firstname'}, $hash{'age'}, $hash{'gender'}, $hash{'phone'}) = split( /\s+/, $line );
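
To check that these three forms really are interchangeable, here is a small self-contained sketch. The sample line and the hash names %by_slice, %by_list and %by_scalars are made up for illustration.

```perl
use strict;
use warnings;

# Sample record, assumed to follow the "lastname firstname age gender phone" layout.
my $line = "mcgee bobby 27 M 555-555-5555\n";
my @keys = qw(lastname firstname age gender phone);

# 1. Hash slice driven by an array of keys.
my %by_slice;
@by_slice{ @keys } = split( /\s+/, $line );

# 2. Hash slice with the keys written out.
my %by_list;
@by_list{ 'lastname', 'firstname', 'age', 'gender', 'phone' } = split( /\s+/, $line );

# 3. Plain list assignment to individual hash elements.
my %by_scalars;
( $by_scalars{'lastname'}, $by_scalars{'firstname'}, $by_scalars{'age'},
  $by_scalars{'gender'},   $by_scalars{'phone'} ) = split( /\s+/, $line );

# Print each field as seen by the three hashes; the values should match.
for my $key ( @keys ) {
    print "$key: $by_slice{$key} | $by_list{$key} | $by_scalars{$key}\n";
}
```

The trailing newline does not end up in the phone field, because split drops trailing empty fields by default.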

Also, if some fields are not needed, you can simply ignore them: they end up in the hash, but nothing forces you to read them. As long as all the required fields are there, your code will always work.
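
As a rough illustration of that point, the sketch below runs the same consumer code over both file layouts shown earlier. parse_records() is a hypothetical helper name, and the file contents are passed in as strings only to keep the example self-contained.

```perl
use strict;
use warnings;

# parse_records() is a hypothetical helper: it takes the whole file as one
# string and returns one hash reference per data line.
sub parse_records {
    my ($text) = @_;
    my @lines = split /\n/, $text;

    # The first line is the header: "# field1 field2 ...".
    my @keys = split /\s+/, shift @lines;
    shift @keys;    # to remove the # as the first field

    my @records;
    foreach my $line (@lines) {
        next if $line =~ /^\s*$/;    # empty line
        next if $line =~ /^\s*#/;    # comment
        my %hash;
        @hash{@keys} = split /\s+/, $line;
        push @records, \%hash;
    }
    return @records;
}

my $old_file = "# firstname lastname age\njoe builder 9\nbob plumber 66\n";
my $new_file = "# lastname firstname age gender phone\n"
             . "mcgee bobby 27 M 555-555-5555\n";

# The same consumer code handles both layouts; gender and phone are ignored.
foreach my $record ( parse_records($old_file), parse_records($new_file) ) {
    print "$record->{firstname} $record->{lastname} is $record->{age}\n";
}
```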

I hope this will be useful for you someday! Good luck!

A for will get you from A to Z; a while will get you everywhere.

Replies are listed 'Best First'.
Re: Cool way to parse Space Separated Value and CSV files
by Anonymous Monk on Apr 10, 2013 at 07:18 UTC

      Hashes are dictionaries

      That is a damn good explanation! I will use it in my class! Never thought of it! Thank you very much!

      A for will get you from A to Z; a while will get you everywhere.
Re: Cool way to parse Space Separated Value and CSV files
by johngg (Canon) on Apr 12, 2013 at 22:53 UTC

    Since you state that the first line in the file is important, it might be as well to treat it differently by assigning it to a separate scalar variable. Also, when you split the header to get the column names, you could save having to do the shift by assigning the first value to undef, which can act as a sort of programmatic bit bucket.

        use strict;
        use warnings;
        use 5.014;

        use Data::Dumper;

        open my $inFH, q{<}, \ <<EOD or die qq{open: < HEREDOC: $!\n};
        # lastname firstname age gender phone
        mcgee bobby 27 M 555-555-5555
        kincaid marl 67 M 555-666-6666
        # comment
        hofhazards duke 22 M 555-696-6969
        EOD

        my( $header, @lines ) = <$inFH>;
        close $inFH or die qq{close: < HEREDOC: $!\n};

        my( undef, @keys ) = split m{\s+}, $header;

        foreach my $line ( @lines ) {
            next if $line =~ m{(?x) ^ \s* (?: (?-x:#) | $ )};
            my %hash;
            @hash{ @keys } = split m{\s+}, $line;
            print Data::Dumper->Dumpxs( [ \ %hash ], [ qw{ *hash } ] );
        }

    The output.

        %hash = (
                  'firstname' => 'bobby',
                  'lastname' => 'mcgee',
                  'phone' => '555-555-5555',
                  'age' => '27',
                  'gender' => 'M'
                );
        %hash = (
                  'firstname' => 'marl',
                  'lastname' => 'kincaid',
                  'phone' => '555-666-6666',
                  'age' => '67',
                  'gender' => 'M'
                );
        %hash = (
                  'firstname' => 'duke',
                  'lastname' => 'hofhazards',
                  'phone' => '555-696-6969',
                  'age' => '22',
                  'gender' => 'M'
                );

    The technique falls to pieces somewhat when your space-separated files contain fields or headers containing spaces, or a CSV file with commas dotted around. You would then reach for something like Text::CSV.
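
    To make the failure mode concrete, here is a minimal sketch with a comma inside a quoted field. It deliberately uses the core module Text::ParseWords instead of Text::CSV, only so that it runs without installing anything; it is not a full CSV parser (it also treats single quotes and backslashes specially), and Text::CSV remains the better choice for real files. The sample data is made up.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Made-up sample data: the second record has a comma inside a quoted field.
my @lines = (
    "lastname,firstname,age\n",
    "mcgee,bobby,27\n",
    "\"kincaid, jr.\",marl,67\n",
);

# The header line gives the keys, as in the space-separated version.
chomp( my $header = shift @lines );
my @keys = parse_line( ',', 0, $header );

my @records;
foreach my $line (@lines) {
    chomp $line;
    next if $line =~ /^\s*$/;

    # A plain split( /,/, $line ) would cut "kincaid, jr." in two;
    # parse_line() keeps quoted text together and strips the quotes.
    my %hash;
    @hash{@keys} = parse_line( ',', 0, $line );
    push @records, \%hash;
}

print "$_->{firstname} / $_->{lastname} / $_->{age}\n" for @records;
```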

    I hope this is of interest.

    Cheers,

    JohnGG

      How can I take input from the user rather than hard-coding it for this solution?

        I assume that you want to run the program on a user-specified file instead of the hardcoded file?

        Command line parameters are available in the @ARGV array, see perlvar.

        You can read user input from STDIN, like my $filename = <STDIN>; (chomp the result to remove the trailing newline before using it as a filename).

        Once you have the filename, modify the open statement to use a filename instead of opening a here-document.
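
        Put together, those steps might look like the sketch below. choose_filename() and read_records() are hypothetical helper names, not part of the code above; the parsing itself is the same as in the here-document version.

```perl
use strict;
use warnings;

# Use the first command line argument if there is one, otherwise
# read a line from the given input handle (normally STDIN).
sub choose_filename {
    my ( $args, $in ) = @_;
    return $args->[0] if @$args;

    my $filename = <$in>;
    die "No filename given\n" unless defined $filename;
    chomp $filename;    # drop the newline the user typed
    return $filename;
}

# Same parsing as before, but from a named file instead of a here-document.
sub read_records {
    my ($filename) = @_;
    open my $inFH, q{<}, $filename or die qq{open: < $filename: $!\n};
    my ( $header, @lines ) = <$inFH>;
    close $inFH or die qq{close: < $filename: $!\n};

    my ( undef, @keys ) = split m{\s+}, $header;

    my @records;
    foreach my $line (@lines) {
        next if $line =~ m{^\s*$};    # empty line
        next if $line =~ m{^\s*#};    # comment
        my %hash;
        @hash{@keys} = split m{\s+}, $line;
        push @records, \%hash;
    }
    return @records;
}
```

        In a script, the two would be combined as my @records = read_records( choose_filename( \@ARGV, \*STDIN ) );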
