PerlMonks  

Consolidating data from lots of different sources

by blackadder (Hermit)
on Jun 08, 2005 at 21:42 UTC (#464850)
blackadder has asked for the wisdom of the Perl Monks concerning the following question:

Dear Gods

I wonder if you have a moment.

I have a data sheet containing the following column headers:

    Srv_Name, SAN_Info, Criticality_rating, Commission Date

The data inside the table looks like this:

    Srv_0A,xxxxxxxxx,,xxxxxxxxxxx
    Srv_0A,,,
    Srv_0G,,,
    Srv_0B,xxxxxxxxxx,,xxxxxxxxx
    Srv_0A,,,xxxxxxxxxxxxxx
    Srv_0G,xxxxxxxxxxx,xxxxxxxxxxxxxx,
    Srv_0B,,xxxxxxxxxxxxxx,

I need to consolidate the information pertaining to each entry into a single distinct record, producing output such as:

    Srv_0A,xxxxxxxxxxxxx,xxxxxxxxxxxx,xxxxxxxxxxxxxxxxxx
    Srv_0G,xxxxxxxxxxxxx,xxxxxxxxxxxx,
    Srv_0B, xxxxxxxxxxxxx,xxxxxxxxxxxx,xxxxxxxxxxxxxxxxxx

All the entries that were not matched with anything will be stored elsewhere. I have spent a lot of time thinking but haven't managed to come up with any ideas.

Your divine intervention is highly regarded....Thanks.
Blackadder

Re: Consolidating data from lots of different sources
by Transient (Hermit) on Jun 08, 2005 at 21:47 UTC
    Is this a file or a database? What have you actually tried? split? push? Hashes of Arrays?
Re: Consolidating data from lots of different sources
by tlm (Prior) on Jun 08, 2005 at 21:52 UTC

    The following code assumes that the different data lines are consistent (i.e. for a given ID there is a unique non-empty value for each field):

        use strict;
        use warnings;

        my %data;
        while ( <DATA> ) {
          my ( $id, @fields ) = split /\s*,\s*/, $_, -1;
          for my $i ( 0..$#fields ) {
            $data{ $id }[ $i ] = $fields[ $i ] if length $fields[ $i ];
          }
        }
        __END__

    Update: I've attempted to fix one of the problems pointed out by ikegami below (specifically, added the -1 argument to split). I still focus only on the problem of consolidating from various lines, which I think is what the OP was having problems with, and ignore the issue of output.

    the lowliest monk

      Your solution doesn't output anything.

      It leaves (any trailing whitespace including the) newlines in the data.

      And only creates two fields for Srv_0G, causing the trailing comma to be omitted from the output (which I suppose could be fixed at print-time).
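
      The three points above can be addressed together. What follows is only a minimal sketch, not code from either poster: `consolidate` and its `$width` parameter are hypothetical names, and the sample lines are borrowed from the test data posted elsewhere in this thread. It trims each line, keeps trailing empty fields with the -1 argument to split, and pads every record to a fixed number of columns so that empty trailing columns still print.

```perl
use strict;
use warnings;

# Hypothetical helper: merge CSV-like lines into one record per server.
# $width is the assumed number of data columns after the server name.
sub consolidate {
    my ($width, @lines) = @_;
    my (%data, @order);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;               # drop the newline and stray whitespace
        next unless length $line;
        my ($id, @fields) = split /\s*,\s*/, $line, -1;   # -1 keeps trailing empty fields
        push @order, $id unless exists $data{$id};        # remember first-seen order
        $data{$id} ||= [ ('') x $width ];      # fixed width, so empty columns still print
        for my $i (0 .. $#fields) {
            $data{$id}[$i] = $fields[$i] if length $fields[$i];
        }
    }
    return map { join ',', $_, @{ $data{$_} } } @order;
}

my @input = (
    'Srv_0A,Axxxxxxxx,,Bxxxxxxxxxx',
    'Srv_0A,,,',
    'Srv_0G,,,',
    'Srv_0G,Fxxxxxxxxxx,Gxxxxxxxxxxxxx,',
);
print "$_\n" for consolidate(3, @input);
```

      Records come out in the order each server first appears, and the Srv_0G line keeps its trailing comma because the third slot is present but empty.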

Re: Consolidating data from lots of different sources
by ikegami (Pope) on Jun 08, 2005 at 21:54 UTC

    Your data contains

        Srv_0A,xxxxxxxxx,,xxxxxxxxxxx
        Srv_0A,,,
        Srv_0A,,,xxxxxxxxxxxxxx

    Should that be

        Srv_0A,xxxxxxxxx,,xxxxxxxxxxx
        Srv_0A,,,
        Srv_0A,,xxxxxxxxxxxxxx,

    If so,

        use strict;
        use warnings;

        my %data;

        while (<DATA>) {
           s/^\s+//;
           s/\s+$//;
           my @fields = split(/\s*,\s*/);
           my $server = shift(@fields);
           $data{$server} ||= ['', '', ''];

           # Merge data.
           foreach (0..$#fields) {
              $data{$server}[$_] = $fields[$_]
                 if length $fields[$_];
           }
        }

        print(join(',', $_, @{$data{$_}}), "\n")
           foreach keys(%data);

        __DATA__
        Srv_0A,Axxxxxxxx,,Bxxxxxxxxxx
        Srv_0A,,,
        Srv_0G,,,
        Srv_0B,Cxxxxxxxxx,,Dxxxxxxxx
        Srv_0A,,Exxxxxxxxxxxxx,
        Srv_0G,Fxxxxxxxxxx,Gxxxxxxxxxxxxx,
        Srv_0B,,Hxxxxxxxxxxxxx,

    If not, how do you merge the two Commission Dates?
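
    The thread does not settle how two different Commission Dates should be merged. One possible policy, shown purely as an assumption, is to keep the first non-empty value and report any later, different value rather than silently overwriting it. The name `merge_with_conflicts` and the wording of the report are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first non-empty value per column and
# collect a message for every later value that disagrees with it.
sub merge_with_conflicts {
    my (@lines) = @_;
    my (%data, @conflicts);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;                # trim newline and whitespace
        next unless length $line;
        my ($server, @fields) = split /\s*,\s*/, $line, -1;
        $data{$server} ||= ['', '', ''];        # three data columns, as in the question
        for my $i (0 .. $#fields) {
            my $new = $fields[$i];
            next unless length $new;
            my $old = $data{$server}[$i];
            if (!defined $old or !length $old) {
                $data{$server}[$i] = $new;      # slot was empty: fill it
            }
            elsif ($old ne $new) {
                my $col = $i + 1;               # 1-based column number for the report
                push @conflicts, "$server column $col: kept '$old', ignored '$new'";
            }
        }
    }
    return (\%data, \@conflicts);
}

my ($data, $conflicts) = merge_with_conflicts(
    'Srv_0A,xxxxxxxxx,,xxxxxxxxxxx',
    'Srv_0A,,,',
    'Srv_0A,,,xxxxxxxxxxxxxx',
);
print join(',', $_, @{ $data->{$_} }), "\n" for sort keys %$data;
print "CONFLICT $_\n" for @$conflicts;
```

    With the three Srv_0A lines quoted above, the two differing values in the last column would be reported as one conflict instead of the later one winning.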

Re: Consolidating data from lots of different sources
by Zaxo (Archbishop) on Jun 08, 2005 at 21:57 UTC

    A hash of hashes will do that nicely,

        my %hoh;
        my @labels = qw/SAN_Info Criticality_rating Commission_Date/;
        while (<>) {
            chomp;    # otherwise the newline ends up in the last field
            my ($key, @dat) = split ',';
            $hoh{$key}{$labels[$_]} ||= $dat[$_] for 0..$#labels;
        }

    The ||= assignment keeps a filled slot from being overwritten by an empty or undefined value from a later line.

    After Compline,
    Zaxo
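
The hash-of-hashes answer stops short of output. As a hedged sketch (the helper name `format_record` is hypothetical, and the sample record is taken from test data posted earlier in the thread), each record can be printed by walking the labels in their fixed order and substituting an empty string for any slot that was never filled:

```perl
use strict;
use warnings;

my @labels = qw/SAN_Info Criticality_rating Commission_Date/;

# Hypothetical helper: turn one server's label => value hash into an output line.
sub format_record {
    my ($key, $rec) = @_;
    my @values = map { defined($rec->{$_}) ? $rec->{$_} : '' } @labels;
    return join ',', $key, @values;
}

# Example record with one slot that was never filled.
my %hoh = (
    Srv_0G => {
        SAN_Info           => 'Fxxxxxxxxxx',
        Criticality_rating => 'Gxxxxxxxxxxxxx',
    },
);
print format_record($_, $hoh{$_}), "\n" for sort keys %hoh;
```

Iterating over the label list rather than over the inner hash keeps the columns in a stable order, which a plain `keys` on the inner hash would not guarantee.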
