
[Resolved] Parse a file into a hash

by kazak (Beadle)
on Apr 05, 2012 at 09:07 UTC (#963617)
kazak has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone. I need to parse a file into a hash, but the file is a list, similar to this:

key1 value1

key2 value2



key3 value5

key4 value6




As you can see, the number of keys and the number of values don't match. On top of this, I need the following:

- If the key for a certain line doesn't exist, the previous key must be used

I mean:

key1 value1

key2 value2

key2 value3

key2 vasue4

If the value for a certain line doesn't exist, it should be replaced with an empty value:

key5 <empty>

key6 <empty>

So the resulting hash should be:

key1 value1

key2 value2

key2 value3

key2 vasue4

key3 value5

key4 value6

key4 value7

key5 empty

key6 empty

I hope someone can help me with this.

Thanks in advance, kazak.

Replies are listed 'Best First'.
Re: Parse a file into a hash
by moritz (Cardinal) on Apr 05, 2012 at 09:13 UTC
Re: Parse a file into a hash
by raybies (Chaplain) on Apr 05, 2012 at 13:01 UTC

    kazak, we're going to need a whole lot more information than this in order to help you. Sometimes in the process of giving out that information, the answers become obvious, so don't be afraid to share.

    My first question would be, how do I determine whether a data entry in a file is a key or a value?

    Then I would think about how to write a regex that could detect a key, a value, and a key/value pair.

    From your example data, it's not clear why key5 and key6 are empty when you put value5 and value6 in your example. Shouldn't they go together? If not, how do you determine that a data value goes with a key?

    Is the data somehow identifiable?

    Are key5 and key6 empty because they appear alone? Is that what makes them empty?

    Here's a thought:

    Suppose you could detect key or value, would this pseudocodish outer loop help?

    my %hashnew;
    my $current_key   = undef;
    my $current_value = undef;
    while (<DATA>) {
        my ($key, $value) = assign_key_value_from_current_line($_);
        $current_key = $key if defined $key;
        $hashnew{$current_key} = $value;
    }

    You'd then have to create the sub assign_key_value_from_current_line such that if only a key appears on the line, the value is set to undef and returned, and if the value appears then undef is stuffed into the first returned value for the key, and the system would use the first key. You may want an additional check in the main loop for the case where there's no key yet, but a bunch of values that would be tossed on the ground. In the case where both appear, the current key is reassigned and the new value likewise is recorded.
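A minimal sketch of such a sub follows. It is not from the thread: it assumes each line has the form `key;value`, with either field possibly left empty, which is only a guess at the real file layout. A missing field comes back as undef, as described above.

```perl
use strict;
use warnings;

# Hypothetical helper. ASSUMPTION: each line is "key;value" and either
# field may be empty. A missing or empty field is returned as undef.
sub assign_key_value_from_current_line {
    my ($line) = @_;
    chomp $line;

    # Limit of 2 keeps any further semicolons inside the value.
    my ($key, $value) = split /;/, $line, 2;

    for ($key, $value) {
        next unless defined;
        s/^\s+|\s+$//g;           # trim surrounding whitespace
        $_ = undef if $_ eq '';   # treat an empty field as missing
    }
    return ($key, $value);
}
```

With this in place, a line such as `;0jUfh4631` yields an undef key, so the outer loop keeps whatever key it saw last.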

    If you can't figure out the difference between a key and a value, then it may be impossible, but from what you've given I don't know. Also, start small: if you can't get the whole solution at once, try a few simple steps. Perhaps you can organize your data, or get it "part way" completed. In such a case, you might discover that it's good enough for what you're trying to do.

    And as moritz notes, it'd be nice to see what you've tried already.

      Thank you for your reply; yes, it helped a bit. The file I need to parse is a .csv file with fields that must be converted to ACLs. There are two fields: user name and asset name. One user can have either one or multiple asset names, but the user name is unique. So I'm trying to use the unique user name as a key and the asset names as the values of that key. One user, John Doe, may have either one asset name: John Doe = { "KJhkh23"} or multiple: John Doe = { "KJhkh23", "0jUfh4631",....."N"}. The file was populated manually and irregularly: the "User Name" field may hold a single name, "John Doe", while the "Asset name" field holds a column of values. The problem is to treat the first field as a key (John Doe) and the second as a value (KJhkh23), and, if no key can be detected on the next line, to assume that the asset belongs to the last available key (John Doe). I'm new to Perl, so this code may look like a total mess to experienced coders.

      #!/usr/bin/perl -w
      use warnings;
      use strict;
      my @tmp;
      ### Start Configuration
      my $src_file = "Servers.csv";
      ### End Configuration
      my %stack = ();
      my $current_key = undef;
      my $current_value = undef;
      open( SRC, "<", $src_file );
      while (<SRC>) {
          chomp;
          s/#.*//;
          s/\"//g;
          s/\;/\-/;
          push @tmp, $_ if $_ =~ m/^\;/ and next;
          my ($key,$value) = split /\-/, $_;
          push @tmp, $value;
          if (defined $key) {
              $current_key = $key;
              push @tmp, $value;
              $stack{$current_key} = @tmp;
          }
      close(SRC);

        You should probably show us some sample data, so we can tell what "can't detect a key" means. But in general, you're talking about creating a hash of arrays. So for every key/value pair, you'll want to do something like this:

        push @{$stack{$key}}, $value;

        That pushes the value onto the array referenced by the key within the hash. Later, you'll be able to go through them with:

        for my $key (keys %stack) {
            for my $value (@{$stack{$key}}) {
                # do stuff with $key and $value
            }
        }

        Aaron B.
        My Woefully Neglected Blog, where I occasionally mention Perl.
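
Putting the pieces of the thread together, the following is one possible sketch of the whole loop: a hash of arrays, with the last seen user name carried forward to lines that have no name. The sample lines and the `user;asset` layout are assumptions modelled on the description above, not the real Servers.csv, and the `//=` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# ASSUMPTION: sample data in "user;asset" form, with the user field left
# blank on continuation lines. The real file may look different.
my @lines = (
    'John Doe;KJhkh23',
    ';0jUfh4631',
    'Jane Roe;Zx81',
    'Sam Poe;',
);

my %stack;         # user name => array ref of asset names
my $current_key;   # last user name seen

for my $line (@lines) {
    my ($key, $value) = split /;/, $line, 2;

    # Only replace the current key when this line actually carries one.
    $current_key = $key if defined $key && length $key;

    # An asset that appears before any user name has no owner: skip it.
    next unless defined $current_key;

    # A user with no assets still gets an (empty) entry.
    $stack{$current_key} //= [];
    push @{ $stack{$current_key} }, $value
        if defined $value && length $value;
}

for my $key (sort keys %stack) {
    print "$key: ", join(', ', @{ $stack{$key} }), "\n";
}
```

To read from the actual file instead, the `for` loop over `@lines` could be swapped for a `while` loop over a lexical filehandle, with a check that `open` succeeded.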
