
read CSV file line by line and create hash table

by lshokri02 (Novice)
on Sep 19, 2014 at 14:53 UTC (#1101217=perlquestion)

lshokri02 has asked for the wisdom of the Perl Monks concerning the following question:

This is my first time writing Perl and I want to take a CSV file which looks similar to this:

    mask No. , block, size, base addr (hex), end addr (hex)--about 20 header names
    fpg23, b002, 16384, 60000, 63FFF
    fpg23, b002, 16384, 800000, 803FFF
    fpg23, b003, 0, F00000, F00000
    fpg23, b003, 4, F00000, F00004
    fpg23, b003, 8, F00000, F00008
    --so on for a variable amount of lines

Each line is represented as its own block, but after reading the file I want to be able to sort it by base address and still access the other information in each line, such as the end addr. My coworker said to use hashes, so I'm thinking of a multidimensional hash. I'm thinking that the column names are the keys of the hash, so can I sort on a specific key (i.e. the Base Addr. (Hex) key)? This is my current code. From what I understand, it makes the column names the keys and reads line by line, but are those values private to that while loop? I believe I have to incorporate them into my hash table, which I'm calling %reghash; I'm just unclear on how to do that. I'd appreciate any help/feedback. Thank you!

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use diagnostics;
    use Text::CSV_XS;

    my $filename = 'test1.csv';
    my %reghash;

    open(my $fh, '<', $filename) or die "Can't open $filename: $!";
    my $csv = Text::CSV_XS->new ({
        binary    => 1,  # allow special characters; always set this
        auto_diag => 1,  # report irregularities immediately
    });
    $csv->column_names ($csv->getline ($fh));  # use header
    while (my $row = $csv->getline_hr ($fh)) {
        printf "Base Address: %s\n", $row->{"Base Addr. (Hex)"};
    }
    close $fh;

Replies are listed 'Best First'.
Re: read CSV file line by line and create hash table
by choroba (Archbishop) on Sep 19, 2014 at 15:04 UTC
    You're on the right track. The $row won't survive the loop, but the hash will, as it is declared outside the loop.
        my $base = $row->{'Base Addr. (Hex)'};
        $reghash{$base}{'End Addr.'} = $row->{'End Addr.'};
        # etc.

    Your sample data seem to hint, though, that the Base Address is not always unique (F00000 appears 3 times). There can't be a duplicate key in a hash.
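To make the duplicate-key point concrete, here is a minimal sketch using base/end pairs copied from the sample data. The @pairs list is only a stand-in for rows read from the file; it is not part of the original code.

```perl
use strict;
use warnings;

my %reghash;

# Base/end pairs from the sample data; F00000 appears as a base three times.
my @pairs = (
    [ '60000',  '63FFF'  ],
    [ 'F00000', 'F00000' ],
    [ 'F00000', 'F00004' ],
    [ 'F00000', 'F00008' ],
);

for my $pair (@pairs) {
    my ($base, $end) = @$pair;
    # Assigning to an existing key replaces the value stored earlier
    $reghash{$base}{'End Addr.'} = $end;
}

printf "%d rows in, %d keys out\n", scalar @pairs, scalar keys %reghash;
print "F00000 ends at $reghash{'F00000'}{'End Addr.'}\n";
```

Four rows go in but only two keys come out, and only the last F00000 row is still reachable.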

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Ah I see, I do have a column of numbers that are unique to each line, so I can use that as my key?

          my $base = $row->{'Base Addr. (Hex)'};
          $reghash{$base}{'End Addr.'} = $row->{'End Addr.'};

      By this do you mean that I should repeat the process so it gets the values for all 20 columns? Thank you for your input!

        Well, you could do a line for each column. But this is Perl, and There Is More Than One Way To Do It. The documentation for getline_hr() is silent on whether it returns a different hash for each read, but Text::CSV does in fact do this. If you're not nervous about relying on undocumented behavior, and if Text::CSV_XS behaves the same way (Text::CSV is the slower of the two, but does not require a compiler to install), you can simply assign $row to your hash element. But the devil is in the details. When I run what is more or less your code, I find that the data tend to have leading (and sometimes trailing) spaces. So the snippet that assigns the input to the hash becomes

            my $base = $row->{' base addr. (hex)'};
            $reghash{$base} = $row;

        Note that the hard-coded key you look things up in the hash with MUST match the key actually in the hash, both in case and in white space. A lookup on 'Base Addr. (hex)' will find nothing if your file actually contains ' base addr (hex)'. If at some point you are not getting the results you expect, print your data and see if it's what you expect. For example:

            use Data::Dumper;
            print Dumper( $row );

        For good form the 'use' should be at the top of your code with the other 'use' statements, but if you're just debugging I like to put it with the debugging code, so that when I delete it I delete everything. If you do this in more than one place, you can 'use Data::Dumper' in more than one place, since module loading is (usually) idempotent. There's probably a slight performance hit, but you are going to remove all the debugging code anyway.
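On the whitespace and case mismatch described above, one option is to clean up each row before storing it. The following is only a sketch: normalize_row is a hypothetical helper, not part of Text::CSV or Text::CSV_XS, and the literal $row stands in for what getline_hr might return for one line of the sample file. (Text::CSV_XS also has an allow_whitespace option that may make this unnecessary; check its documentation before relying on it.)

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of a row hash with keys and values
# trimmed of leading/trailing whitespace, and keys lower-cased as well.
sub normalize_row {
    my ($row) = @_;
    my %clean;
    for my $key (keys %$row) {
        my $value = $row->{$key};
        (my $new_key = lc $key) =~ s/^\s+|\s+$//g;
        $value =~ s/^\s+|\s+$//g if defined $value;
        $clean{$new_key} = $value;
    }
    return \%clean;
}

# Stand-in for one row as read from the sample data, spaces included.
my $row = {
    'mask No. '        => 'fpg23',
    ' block'           => ' b002',
    ' size'            => ' 16384',
    ' base addr (hex)' => ' 60000',
    ' end addr (hex)'  => ' 63FFF',
};

my $clean = normalize_row($row);
print "$_ => $clean->{$_}\n" for sort keys %$clean;
```

After this, lookups can use one predictable spelling such as 'base addr (hex)', whatever spacing or capitalization the file happens to use.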

        Also, this STILL does not address the problem of non-uniqueness of base addresses. You need to key by base address because that's what you want to sort by. If all you want is a sorted output you could do something like

            my $base = "$row->{' base addr. (hex)'} $row->{' end addr (hex)'}";
            $reghash{$base} = $row;

        If you actually want to find and manipulate the individual data, you may have to go to nested hashes -- that is, something like

            my $base = $row->{' base addr. (hex)'};
            my $end  = $row->{' end addr (hex)'};
            $reghash{$base}{$end} = $row;

        Doing this requires you to know both base address and end address to access a specific row.
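Another way to cope with repeated base addresses, offered here as a sketch and not as something proposed in the thread, is to keep a list of rows under each base address and sort the keys numerically with hex(). The @rows list stands in for rows returned by getline_hr, and the key names assume the whitespace has already been trimmed.

```perl
use strict;
use warnings;

# Stand-ins for rows as getline_hr might return them (keys already trimmed).
my @rows = (
    { 'block' => 'b003', 'base addr (hex)' => 'F00000', 'end addr (hex)' => 'F00004' },
    { 'block' => 'b002', 'base addr (hex)' => '800000', 'end addr (hex)' => '803FFF' },
    { 'block' => 'b002', 'base addr (hex)' => '60000',  'end addr (hex)' => '63FFF'  },
    { 'block' => 'b003', 'base addr (hex)' => 'F00000', 'end addr (hex)' => 'F00000' },
);

my %reghash;
for my $row (@rows) {
    # Rows sharing a base address are appended to the same list
    push @{ $reghash{ $row->{'base addr (hex)'} } }, $row;
}

# Compare as numbers: hex strings of different lengths do not sort
# reliably as plain text.
my @sorted_bases = sort { hex($a) <=> hex($b) } keys %reghash;

for my $base (@sorted_bases) {
    for my $row (@{ $reghash{$base} }) {
        printf "%s: base %s, end %s\n",
            $row->{'block'}, $base, $row->{'end addr (hex)'};
    }
}
```

With this layout, every row stays reachable from its base address alone, at the cost of one more level of looping when reading the data back.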

        You're doing a lot right in your code (three-argument open(), lexical file handles, checking status, 'use strict', 'use warnings'). Good going! These may already have saved you time and grief, and will save you more when you write more Perl.

Re: read CSV file line by line and create hash table [IGNORE]
by Bloodnok (Vicar) on Sep 19, 2014 at 15:10 UTC
    Personally, unless it's a 'how do I do that' sort of a question and having realised/learned that CPAN truly _is_ my friend - I'd use Text::CSV ...

    Hmmm, very many thanx to choroba for pointing out what I self-evidently failed to spot - that the OP already uses it - doh!!

    A user level that continues to overstate my experience :-))
      How is Text::CSV better than Text::CSV_XS that the OP already uses? (Unless it's a subjective preference with no possible explanation).
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: read CSV file line by line and create hash table
by james28909 (Deacon) on Sep 20, 2014 at 01:18 UTC
    might i add my opinion here? you could manually read the file $bytes at a time and then store the data in a variable... it's what i do with some files. take for instance some files i work with: you read 8 bytes and store them in a variable, read 8 more bytes and store them in a different variable, then read 16 bytes and store those in a separate variable. then you can use each variable any time you need it. i do not know if that would be best for this scenario, but if the file is aligned exactly the same throughout, you can just set it up to read bytes at a time. hope this helps :)

    if you need an example script i'd be more than glad to give one.

    hope this is helpful and not off topic :)

    i do a lot of work with header information, particularly ps3 flash information. the header describes the file size, file location and file name, and i loop through that info to extract the actual data further in the file. if you need me to post some examples let me know :)
    if you're trying to extract data, just do it on each pass of the loop. in other words, get the file size, name etc. and then extract the data, then loop again. if you have a count of some sort it works a treat. otherwise there is the until loop which works as well, but a count works better. actually you could probably look through my older posts to get an idea of what i mean.
      check this out, works great if you're extracting files referenced by the header ;)
          my $fileLocation = '';
          my $fileSize     = '';
          my $fileName     = '';
          my $chunk        = '';

          #my $entry_Count = seek and read the count inside the file;
          $entry_Count =~ s/(.)/sprintf("%02x",ord($1))/egs;
          #print "There are $entry_Count entries in this file";
          seek( $infile, 0x10, 0 ) || die "cannot seek file: $!";
          for ( my $i = 1 ; $i <= $entry_Count ; $i++ ) {
              read( $infile, $fileLocation, 0x08 );
              read( $infile, $fileSize,     0x08 );
              read( $infile, $fileName,     0x20 );
              $fileLocation =~ s/(.)/sprintf("%02x",ord($1))/egs;
              $fileSize     =~ s/(.)/sprintf("%02x",ord($1))/egs;
              $fileName     =~ s/\0+$//;
              print ("Found $fileName");
              open( my $file, '>', "extracted/$fileName" ) || die "Cannot open $fileName $!";
              binmode($file);
              seek( $infile, hex($fileLocation), 0 );
              read( $infile, $chunk, hex($fileSize) );
              syswrite( $file, $chunk );
              #print ($file $chunk);
              close($file);
          }
          print "Files/Data extracted";
      the above will read a file in and, depending on an entry count (if one is defined in the file you are using), will extract the data that the header references, with the header being $entry_Count entries long of course.
      it also looks as if you're working with a file 16 MB in size. could you upload one of your files and let me take a look? :)
      what was i downvoted for? it really disheartens me to get downvoted for just trying to help someone, and also not to even get any feedback at all as to why i got the downvote :l

      i know i offer very limited help, but when it comes to something i think i can help with i don't hesitate and i give my opinion/advice wholeheartedly. at least tell me why you gave me a downvote <.<

      also it was already said once in this thread that.. and i quote... "But this is Perl, and There Is More Than One Way To Do It". and all i did was offer an alternative: a way to display all info from the file header OR you could even extract it based off header references, and with a little reconstruction of the example script i posted, you could accommodate the needed data quite well.

      also the example script has a little code in it that other monks helped me with as well.

        First, relax, downvotes happen, you've got to wait a few days to see the final tally. Also see Why did I get downvoted?

        However, while it's true that TIMTOWTDI, your code isn't actually "one way to do it", since lshokri02 wants to read a CSV file, and your code does something very different (plus, it doesn't compile). Off-topic posts are not generally a bad thing, but this one should probably be more clearly described as such, otherwise it may send the person looking for answers down the wrong path. This is especially true for your first post, since it begins by describing a method that would not be particularly useful in reading a CSV file.

        (Also, posting 5 nodes all in reply to each other probably isn't necessary, as you could have edited your node. It makes it look like you didn't take the time to think about your post before submitting it.)

        you know what lol, i don't even care anymore. it is merely your opinion that the post i made deserves a downvote. unless you can give me actual physical proof as to why you downvoted me, i could care less. downvote all my posts :)
        p.s. have fun

Node Type: perlquestion [id://1101217]
Front-paged by GotToBTru