Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re^7: Parse .csv file from FTP using Perl

by muba (Priest)
on Jun 21, 2012 at 01:53 UTC ( [id://977532]=note: print w/replies, xml ) Need Help??


in reply to Re^6: Parse .csv file from FTP using Perl
in thread Parse .csv file from FTP using Perl

Alright, here we go. For these examples I assume you have a file some_csv.csv, with the following contents:

Name,Type,Age Lily,Cat,4 Buster,Dog,5 Tweety,Bird,2

The first approach is the 'naive' one. It does exactly what you want as specified, but it looks a bit messy.

use strict; use warnings; use Text::CSV; use subs qw(hlookup); print "The animal in line 2 is ", hlookup("some_csv.csv", "Age", 2), " + years old.\n"; print "The critter in line 4 is called ", hlookup("some_csv.csv", "Nam +e", 4), ".\n"; sub hlookup { my $csv_file = shift; my $field_name = shift; my $row_num = shift; my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary + attribute. or die "Cannot use CSV: ".Text::CSV->error_diag (); open my $fh, "<:encoding(utf8)", $csv_file or die "test.csv: $!"; my $field_num; my $first_row = $csv->getline( $fh ); my @result; for (my $idx = 0; $idx <= @$first_row; $idx++) { if ($first_row->[$idx] eq $field_name) { $field_num = $idx; push @result, $first_row->[$field_num]; last; } } die "Field name '$field_name' not found on the first row of $csv_ +file" if not defined $field_num; while ( my $row = $csv->getline( $fh )) { push @result, $row->[$field_num]; } $csv->eof or $csv->error_diag(); close $fh; return $result[$row_num - 1]; # -1 since array indexes are 0-bas +ed, # and we count our lines 1-based }

The second approach used Text::CSV's ability to give fields names. What I did here was read the first row, tell Text::CSV that the values of the fields are actually the names for that field, and then I told Text::CSV to get me the field of the row I'm interested in. It's a little bit clearer, and most of the looping is delegated to Text::CSV. I like it when my modules do my heavy lifting for me.

use strict; use warnings; use Text::CSV; use subs qw(hlookup); print "The animal in line 2 is ", hlookup("some_csv.csv", "Age", 2), " + years old.\n"; print "The critter in line 4 is called ", hlookup("some_csv.csv", "Nam +e", 4), ".\n"; sub hlookup { my $csv_file = shift; my $field_name = shift; my $row_num = shift; my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary + attribute. or die "Cannot use CSV: ".Text::CSV->error_diag (); open my $fh, "<:encoding(utf8)", $csv_file or die "test.csv: $!"; my $field_num; my $first_row = $csv->getline( $fh ); $csv->column_names(@$first_row); my $all = $csv->getline_hr_all($fh, $row_num - 2, 1); # - 2 since array indexes are 0-based, # and we count our lines 1-based, # and because we've already read the first # line. return $all->[0]->{$field_name} }

Both examples give me the output I will list below. I've also tried dabbling with DBD::CSV, because I thought it'd be nice if we could extract our information with SQL! But that's a module that allows you to set up entire relational database in CSV files, which is pretty much overkill for what we're trying to do here.

Anyway, I'm sure these aren't the only approaches to go at things, but it surely should get you going. And you owe me nothing, it was my pleasure. But if you insist, I can send you my paypal account through a private message ;)

The animal in line 2 is 4 years old. The critter in line 4 is called Tweety.

Replies are listed 'Best First'.
Re^8: Parse .csv file from FTP using Perl
by Sherlock Perl (Novice) on Jun 21, 2012 at 01:58 UTC
    Thank you so much. I will try both of the methods and choose whichever I find easier to understand. Do you think it would be easy to alter the code if for example the two rows with values start not from the beginning but from some other cell. For example they are in the range "d4:n5" or any other similar range? If you are in nyc your beer or whatever you drink is on me :)

      Oh yeah, it's definitely possible. Especially the second approach, because the ->getline_hr_all method allows you pretty much to limit the rows Text::CSV should look into. As for limitting things horizontally, that will prove to be a little more tricky, but it isn't impossible at all. I'll leave that up to you, though. You'll eventually end up with something that's a mix of both of my approaches, I think.

      Just read the documentation for Text::CSV - it's pretty elaborate but written in a clear and understandable manner. FYI: I never used that module until just now, so working with Text::CSV was new to me, and I figured it out just by following the documentation.

        Thank you again, I am sure I will figure it out. I will make sure to post the final version so people with similar questions can benefit :)
        Hi I tried the code and the 2nd one is really preferable (I even understand it better than the first option). However, I have been trying to change so that for example I give the code the reference "Cat" and it returns me "Dog". What I mean is that it searches the 2nd row for the position not the first one and I cannot do it yet. I have been reading the Text::CSV_XS file but I find it a bit hard to grasp (I guess beucase I am still a little confused with the Perl way of working/thinking). I understand that I need to make the change in either getline() OR getline_hr_all but not sure where and what. I think all I have to do is make getline read from the 2nd line for example but I don't know how to do it. Any suggestions? Thanks in advance!

        EDIT: I figured it out. What i did was instead of using getline() I used getline_all and then all I have to is offset as much as I want. Thank you all, if I have more questions I will feel free to ask :)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://977532]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (3)
As of 2024-03-19 04:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found