Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

CSV Parse on filehandle

by ruoso (Curate)
on Oct 05, 2005 at 18:58 UTC ( [id://497704]=CUFP: print w/replies, xml ) Need Help??

Well...

I have a lot of gzipped csv files with the data from a legacy system I have to import. I don't want to uncompress them on disk and I don't want to read them entirely in memory. Also, I have to support multiline rows in the csv files (enclosed by a string delimiter).

After seeking for the modules, I didn't see anyone which could receive a filehandle that is a pipe from gunzip, neighter that support multiline rows... So, I end with the following code:

This is the parser...

package CSVParse; use strict; use warnings; sub new { my $self = shift; $self = { string_delim => '"', escape_char => "\\", field_delim => ',', reg_delim => "\n" }; bless $self, "CSVParse"; $self->{fh} = shift; return $self; } # supomos que ele esteja no início de uma coluna! sub fetch_column { my $self = shift; my $context = 'raw_data'; my @contexts = (); my $data = undef; while (1) { # char buffer my $buf; read($self->{fh},$buf,1) or do { $self->{EOF} = 1; last; }; $self->{last_char_read} = $buf; if ($context eq 'escape') { $data = '' unless defined $data; $data .= $buf; $context = shift @contexts; } elsif ($context eq 'string') { if ($buf eq $self->{string_delim}) { $context = shift @contexts; } else { $data = '' unless defined $data; $data .= $buf; } } else { if ($buf eq $self->{escape_char}) { push @contexts, $context; $context = 'escape'; } elsif ($buf eq $self->{string_delim}) { push @contexts, $context; $context = 'string'; } elsif ($buf eq $self->{field_delim} || $buf eq $self->{reg_delim}) { # voltar um caractere seek($self->{fh},0,tell($self->{fh})-1); # sair do loop. last; } else { $data = '' unless defined $data; $data .= $buf; } } } return $data; } sub fetch_row { my $self = shift; if ($self->{EOF}) { return undef; } my @cols = (); # supomos que ele comece numa posição OK while (1) { my $col = $self->fetch_column(); last if $self->{EOF}; push @cols, $col; if ($self->{last_char_read} eq $self->{reg_delim}) { # sair do loop. last; }# elsif ($buf eq ($self->{field_delim})) { next; } else { nex +t; } } return \@cols; } sub parse_file { my $self = shift; my @rows = (); while (1) { my $cols = $self->fetch_row(); last unless defined $cols; push @rows,$cols; } return @rows; } 1;

And this is a sample code...

open my $tabelaclientes, "gunzip -c somefile.csv.gz|" || die $!; my $csv = CSVParse->new($tabelaclientes); while (1) { my $row = $csv->fetch_row(); last unless defined $row; for (@$row) { utf8::decode($_); } print join(",",@$row)."\n"; } close $tabelaclientes;
daniel

Replies are listed 'Best First'.
Re: CSV Parse on filehandle
by idsfa (Vicar) on Oct 06, 2005 at 16:24 UTC

    Or

    use Text::xSV; use PerlIO::gzip; open my $gzin, '<:gzip', "filename" or die $!; my $csv = Text::xSV->new('fh'=>$gzin); . . .

    See Text::xSV and PerlIO::gzip for more details.


    The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. -- Cyrus H. Gordon

      Hmmmm... well.. Now it's done anyway...

      but... c'mon, Text::xSV... what a intuitive name uh?

      daniel

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://497704]
Approved by graff
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (7)
As of 2024-04-23 11:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found