PerlMonks  

Help parsing a complicated csv

by rmfin730 (Initiate)
on Apr 25, 2011 at 17:56 UTC (#901226=perlquestion)
rmfin730 has asked for the wisdom of the Perl Monks concerning the following question:

I'm somewhat of a Perl noob, so bear with me on this one. I'm trying to parse a CSV file and store the contents of the various columns in a hash. The problem is that every solution I have found assumes a CSV file where the top row holds the column headings and each record sits on one line, kind of like:
heading 1, heading 2, heading 3 data, data, data data, data, data
I have a CSV file where the different columns and headings are all over the place, such as:
heading 1, heading 2 data, data data, heading 3 data heading 4, heading 5 data, data
Not sure if this will make sense or not, but basically I want to be able to look up a heading name and get the data that is in its column. Sometimes there will be 20 lines of data and sometimes there is just one data item in the column. Once I am able to store the data and look it up, I want to create an XML file from it. All of the heading names are enclosed in <> if that helps. Any ideas on a simple way to do this?

Re: Help parsing a complicated csv
by holli (Monsignor) on Apr 25, 2011 at 18:20 UTC
    It's certainly possible. This is Perl after all :)

    However, some sample data would help.


    holli

    You can lead your users to water, but alas, you cannot drown them.
Re: Help parsing a complicated csv
by molecules (Monk) on Apr 25, 2011 at 18:21 UTC
Re: Help parsing a complicated csv
by Tux (Monsignor) on Apr 25, 2011 at 18:42 UTC

    Parsing the CSV data as such should not be too hard a problem. Just use Text::CSV_XS or Text::CSV. The only thing left to do is to come up with a criterion (or criteria) that makes the headers stand out from "normal" data.

    use Text::CSV_XS;

    # $fh is assumed to be a filehandle already opened on the CSV file
    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
    my @hdr;
    while (my $row = $csv->getline ($fh)) {
        # example: header lines start with A-Z, data doesn't
        if (!@hdr or $row->[0] =~ m/^[A-Z]/) {
            @hdr = @{$row};
            next;
        }

        # just an example: map the current headers onto this row's fields
        my %hash;
        @hash{@hdr} = @$row;
    }

    Enjoy, Have FUN! H.Merijn
Re: Help parsing a complicated csv
by linuxer (Deacon) on Apr 25, 2011 at 19:54 UTC

    Well, you could have checked the format of your text, because it probably doesn't look like you intended.

    If you had used <c></c> tags around your sample data, the format would be visible.

    Directly below the form fields, where you compose your questions, are some text and several links, which advise you how to mark up your question, code and data.

    With code tags your examples could look like this:

    your 1st example:

    heading 1, heading 2, heading 3
    data, data, data
    data, data, data

    your 2nd example:

    heading 1, heading 2
    data, data
    data, heading 3
    data
    heading 4, heading 5
    data, data

      I didn't even spot the change in field numbers in the OP! If that is a real criterion:

      my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
      my @hdr;
      while (my $row = $csv->getline ($fh)) {
          # example:
          if (
              # a change in number of columns
              @hdr != @$row or
              # first column matches header criterion
              $row->[0] =~ m/^[A-Z]/
          ) {
              @hdr = @{$row};
              next;
          }

          # just an example
          my %hash;
          @hash{@hdr} = @$row;
      }

      Enjoy, Have FUN! H.Merijn
        It worked! I think that I should be able to get the rest of what I need from here. Thanks a ton!!

        I may have spoken too soon. It appears that this is pulling my data and structuring it correctly when I output to a txt file.

        However, I need to be able to pull specific columns and I'm not sure how to do that. I want to be able to do something like:

        print $hash->{header_name} and it will give me all of the values under that column.

        So again, I have columns in a csv file that are kind of "stacked" on top of each other, meaning there is not a single header row at the top of the file, the header for each column is on different lines of the file.

        The headers are always enclosed in <>, so I want to scan through my CSV, pull out the headers, and then put the corresponding values in that particular column into a hash. I could then read it out with something like $hash->{header}, which would give me all of the values in the column.

        Sometimes there are 100 rows under a header and sometimes there is just 1. Thanks again for your help, and sorry if this doesn't make sense; it is kind of confusing.

        Let me try to explain again what the csv looks like...

        <header>, <header>, <header>
        value, value, value
        value, value
        value, value, <header>, <header>
        value, value
        value, value
        value,

        This goes on like this randomly throughout the CSV file. I hope that this little picture makes some more sense. I think that we are on the right track but not there yet! Thanks again for everyone's help!
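
The lookup described above (ask for a header name, get back every value stacked under it) can be sketched as a hash of arrays. This is only an illustration, not something posted in the thread, and it rests on a few assumptions: a field counts as a header when it is wrapped in <>, each column position keeps its most recent header until a new one appears in that position, and fields never contain quoted commas, so a plain split stands in for Text::CSV_XS. The sub name collect_columns and the sample headers are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "stacked" columns into a hash of arrays.
# ASSUMPTION: no quoted commas in the data, so split is good enough here.
# For real CSV, take the fields from Text::CSV_XS->getline instead.
sub collect_columns {
    my (@lines) = @_;
    my %column;     # header name     => [ values seen under it ]
    my @current;    # column position => header currently in effect
    for my $line (@lines) {
        my @fields = split /,/, $line;
        for my $i (0 .. $#fields) {
            my $field = $fields[$i];
            $field =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
            next if $field eq '';           # skip empty padding fields
            if ($field =~ /^<(.+)>$/) {     # ASSUMPTION: <...> marks a header
                $current[$i] = $1;
                $column{$1} ||= [];
            }
            elsif (defined $current[$i]) {  # ordinary value under a known header
                push @{ $column{ $current[$i] } }, $field;
            }
        }
    }
    return \%column;
}

# Made-up sample data in the shape described in the post.
my $cols = collect_columns(
    '<name>, <size>',
    'alpha, 1',
    'beta, 2',
    'gamma,',
    '<color>, <shape>',
    'red, round',
);

print join(', ', @{ $cols->{name} }), "\n";
```

With the data in this shape, $cols->{name} holds the three values under <name> and $cols->{color} holds the single value under <color>, so a column can be 1 row or 100 rows long without any special handling. If a header shows up again later in the file, its values are appended to the same list.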
