PerlMonks  

Space delimited to CSV, Index and data extraction loop

by SixShot (Novice)
on Jul 29, 2011 at 22:54 UTC (#917569, perlquestion)
SixShot has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I must first apologise, as I am new to Perl and have been thrown headlong into the world of Perl to generate scripts for a new job. I hope you will be able to offer some help. I am desperately trying to gather experience rapidly and have embarked over a 10 day period to learn Perl, and I hope you will forgive my request for help, but I am struggling to grasp everything about this language.

I have data in whitespace-delimited (\s+) format:

__________ __________ Header1 Header2 Header3 Header4 Header5 Header6
__________ __________ Header1 Header2 Header3 Header4 Header5 Header6
Time Date Header1 Header2 Header3 Header4 Header5 Header6
Days MM/DD/YYYY UNIT UNIT UNIT UNIT UNIT UNIT
Name AA-AA Name1 BAABAB
0.000000 1/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
31.000000 2/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
59.000000 3/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
90.000000 4/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
120.00000 5/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
151.00000 6/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
181.00000 7/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
212.00000 8/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
274.00000 9/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
305.00000 10/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
336.00000 11/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
367.00000 12/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
Name AB_AB Name1 ABABAB
0.000000 1/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
31.000000 2/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
59.000000 3/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
90.000000 4/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
120.00000 5/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
151.00000 6/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
181.00000 7/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
212.00000 8/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
274.00000 9/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
305.00000 10/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
336.00000 11/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
367.00000 12/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
Name AC_AC Name1 BBAABB
0.000000 1/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
31.000000 2/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
59.000000 3/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
90.000000 4/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
120.00000 5/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
151.00000 6/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
181.00000 7/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
212.00000 8/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
274.00000 9/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
305.00000 10/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
336.00000 11/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
367.00000 12/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000

This file varies: the Name and Name1 data block headers come in varying formats, and the number of data blocks varies. I've limited the data blocks here to one year, but the data extends for 50 years in each block. I have dealt with file input and with the headers (see below). I am now at the loop that extracts the blocks of data and applies an integer index to the block headers 'Name' and 'Name1'... and I am stuck. I need the data to be in two CSV files: a data file ($outfile):

Name_ID,Name1_ID,Header1,Header2,Header3,Header4,Header5,Header6,Date_EN
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,01/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,02/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,03/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,04/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,05/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,06/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,07/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,08/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,09/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,10/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,11/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,12/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,01/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,02/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,03/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,04/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,05/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,06/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,07/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,08/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,09/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,10/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,11/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,12/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,01/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,02/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,03/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,04/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,05/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,06/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,07/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,08/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,09/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,10/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,11/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,12/01/2007

And a names file ($outnamefile), which I want as a log of the integer index IDs applied to the block headers:

Name,Name_ID,Name1,Name1_ID
AA-AA,1,BAABAB,1
AB-AB,2,ABABAB,2
AC-AC,3,BBAABB,3

I am stuck on the logic of this loop in Perl and desperately need help from some more experienced Perl mongers. Please help; it would be most appreciated and would also help me become more familiar with Perl syntax.

Command-line arguments: -n 254 -o "." -r "file" -f ".+" -g ".+"
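For reference, a minimal sketch of how Getopt::Std maps those switches onto a hash. It assumes the same option string that the script passes to getopts; the `local @ARGV` assignment only simulates the command line for the demonstration, and `%opts` is an illustrative name.

```perl
use strict;
use warnings;
use Getopt::Std;

# Simulate the command line from the post (assumption: same switches and values).
local @ARGV = ( '-n', '254', '-o', '.', '-r', 'file', '-f', '.+', '-g', '.+' );

# A letter followed by ':' takes a value; 'd' is a plain boolean flag.
my %opts;
getopts( 'o:r:p:t:n:f:g:d', \%opts ) or die "Bad options\n";

# Only the switches that were actually supplied appear as keys.
print "$_ => $opts{$_}\n" for sort keys %opts;
```

Switches that are not given (here -p, -t and -d) are simply absent from the hash, which is why the script sets defaults before copying values across.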

#!/usr/bin/perl
use File::Spec;
use Getopt::Std;
#use strict;
use IO::File;
use Switch;
use Time::HiRes qw(gettimeofday tv_interval); #get better than 1 second resolution

$| = 1; #Force output
my $VERSION = 2.00;
my $timerStart = gettimeofday();

# ----------------------------------------------------------------------
# Process Inputs
# ----------------------------------------------------------------------
my %inputs = ();
my %params = ();
getopts('o:r:p:t:n:f:g:d', \%inputs);
&get_params(\%inputs, \%params) || die "Error Getting Parameters\n";
&check_params(\%params);

# ----------------------------------------------------------------------
# Find and check input files - extract sss type
# ----------------------------------------------------------------------
my $infile = $params{filename};
# ----------------------------------------------------------------------
my $outfile = File::Spec->catfile($params{outfolder},$params{rootname});
my $outnamefile = File::Spec->catfile( $params{outfolder}, $params{rootname} );
my $outLineCount = $outfile . '_db_proc_linecount.txt';
$outfile = $outfile . '_db_proc_data.csv';
$outnamefile = $outnamefile . "_db_proc_names.csv";
# ----------------------------------------------------------------------
print "Input SSS File: [$infile]\n";
print "Output SSS File: [$outfile]\n";
print "Output Names File: [$outnamefile]\n";
# ----------------------------------------------------------------------
my $hdrmap;
my %hdrTypes = (
    "fields"=>\&get_fields_area,
    #"flow"=>\&get_fields_flow,
    #"gather"=>\&get_fields_gather,
    "regions"=>\&get_fields_region,
    "plan"=>\&get_fields_plan);
die "Unable to match sss type. " unless defined($hdrTypes{$params{ssstype}});
$hdrmap = $hdrTypes{$params{ssstype}}->();
my $namemap = &get_fields_names($params{ssstype});

# -----------------------------------------------
# Load and Process Headers
# -----------------------------------------------
open (I,"<$infile") or die "Unable to open file $infile.\n";
open (O,">$outfile") or die "Unable to open file $outfile.\n";
open (ON,">$outnamefile") or die "Unable to open file $outnamefile.\n";
open (OLCT, ">$outLineCount") or die "Unable to open file $outLineCount.\n";
# -----------------------------------------------

#-----------------------------------------------------------------------
# Concatinate headers
#-----------------------------------------------------------------------
for ($i=0; $i<3; $i++) {
    $line = <I>;
    chomp ($line);
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    @sp = split (/\s+/,$line);
    for ($j=0; $j<scalar(@sp); $j++) {
        @hdrs[$j] = $hdrs[$j]." ".$sp[$j];
    }
}
print "[" . join(",", @hdrs) . "]\n";
#-----------------------------------------------------------------------
foreach my $hdr (@hdrs){trim(\$hdr)}

# -----------------------------------------------
# Associate Header with matching index
# -----------------------------------------------
print "CHECKING HEADERS ------------\n";
foreach my $key (sort keys %{$hdrmap}) {
    my ( $index )= grep { $hdrs[$_] eq $hdrmap->{$key}[0] } 0..$#hdrs;
    if (defined ($index)) {
        $hdrmap->{$key}[1] = $index;
        print "HEADER MATCH: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\t[" . $hdrmap->{$key}[1]. "]\n";
    } else {
        print "HEADER NOT FOUND: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\n";
    }
}
print "-----------------------------------------------\n";
my @flist = ();
my @fnamelist = ();
&assign_names($hdrmap, \@flist);
&assign_names($namemap, \@fnamelist);
print "[" . join(",", @flist) . "]\n";
# -----------------------------------------------
print O join(",", @flist) . ",RUN_ID\n";
print ON join(",", @fnamelist) . ",RUN_ID\n";
exit();

# -----------------------------------------------
# Map Unit Conversions if needed
# -----------------------------------------------
my $l_units = <I>;
chomp ($l_units);
print "UNITS -- > [$l_units]\n";

sub get_params($$) {
    # -------------------------------------------
    # Input Hash from getopts and Run parameter hash
    # -------------------------------------------
    my $inr = shift(@_);
    my $pr = shift(@_);
    # -------------------------------------------
    # Setup default parameters
    # -------------------------------------------
    $pr->{'debug'} = 0; #Extra output
    $pr->{'runid'} = -1000; #Dummy ID
    $pr->{'filter'} = ".*";
    $pr->{'datefilter'} = "";
    $pr->{'filename'} = "NA";
    #$pr->{'runfolder'} = ".";
    $pr->{'outfolder'} = ".";
    $pr->{'ssstype'} = ""; # interpreted from filename
    # -------------------------------------------
    # Get Values from inputs
    # -------------------------------------------
    foreach my $key (keys %{$inr}) {
        $pr->{'debug'} = $inr->{$key} if $key eq 'd';
        $pr->{'runid'} = $inr->{$key} if $key eq 'n';
        $pr->{'filter'} = $inr->{$key} if $key eq 'f';
        $pr->{'datefilter'} = $inr->{$key} if $key eq 'g';
        $pr->{'filename'} = $inr->{$key} if $key eq 'r';
        #$pr->{'runfolder'} = $inr->{$key} if $key eq 'p';
        $pr->{'outfolder'} = $inr->{$key} if $key eq 'o';
        $pr->{'ssstype'} = $inr->{$key} if $key eq 't';
    }
    return 1;
}

sub check_params() {
    my $pr = shift;
    die "Require RunID if not in debug mode\n" if ($pr->{debug} == 0 and $pr->{runid} == -1000);
    (-e $pr->{filename}) or die "Unable to find input filename: $pr->{filename}\n";
    (-e $pr->{outfolder}) or die "Unable to find outfolder: $pr->{outfolder}\n";
    $pr->{filter} = ".*" if $pr->{filter} eq "";
    (my $volume,my $dirs,my $rootname) = File::Spec->splitpath($params{filename});
    $rootname =~ s/\.sss$//;
    my @sp = split(/_/,$rootname);
    $params{rootname} = $rootname;
    $params{ssstype} = $sp[-1] if $params{ssstype} eq "";
    # -----------------------------------------------
    print "Run Parameters: \n";
    foreach my $key (keys %{$pr}) {
        print "$key -> [$pr->{$key}]\n";
    }
    # -----------------------------------------------
    return 1;
}

sub get_fields_plan() {
    my $tblFields = shift;
    $tblFields->{'DATE_EN'} = ["__________ __________ Date",-1];
    $tblFields->{'HEADER1'} = ["Header1 Header1 Header1",-1];
    $tblFields->{'HEADER2'} = ["Header2 Header2 Header2",-1];
    $tblFields->{'HEADER3'} = ["Header3 Header3 Header3",-1];
    $tblFields->{'HEADER4'} = ["Header4 Header4 Header4",-1];
    $tblFields->{'HEADER5'} = ["Header4 Header5 Header5",-1];
    $tblFields->{'HEADER6'} = ["Header6 Header6 Header6",-1];
    return $tblFields;
}

sub assign_headers() {
    my $hdrmap = shift;
    my $headline = shift;
    my $sptagref = shift; # Default for tab separation
    $$sptagref = '\s*,\s*' if $$headline=~ m/,/;
    my @hdrs = split(/$$sptagref/, $$headline);
    foreach my $hdr (@hdrs){trim(\$hdr)}
    # -----------------------------------------------
    # Associate Header with matching index
    # -----------------------------------------------
    print "CHECKING HEADERS ------------\n";
    foreach my $key (sort keys %{$hdrmap}) {
        my ( $index )= grep { $hdrs[$_] eq $hdrmap->{$key}[0] } 0..$#hdrs;
        if (defined ($index)) {
            $hdrmap->{$key}[1] = $index;
            print "HEADER MATCH: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\t[" . $hdrmap->{$key}[1]. "]\n";
        } else {
            print "HEADER NOT FOUND: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\n";
        }
    }
    print "-----------------------------------------------\n";
}

sub assign_names() {
    my $hdrmap = shift;
    my $flist = shift;
    @$flist = ();
    foreach my $key (sort keys %{$hdrmap}) {
        my $i = $hdrmap->{$key}[1];
        push(@{$flist}, $key) if ($i != -1);
    }
}

# --------------------------------------------------------------
# Perl trim function to remove whitespace from the start and end of the string
# --------------------------------------------------------------
sub trim() {
    my $sref = shift;
    $$sref =~ s/^\s+//;
    $$sref =~ s/\s+$//;
}

Re: Space delimited to CSV, Index and data extraction loop
by Khen1950fx (Canon) on Jul 30, 2011 at 10:27 UTC
    I concentrated on just getting your script in bounds. So far, so good. The first thing that you want to do is to check and double-check for errors and warnings. Look for simple things such as:

    Have you declared all your variables?
    Are all your subroutines defined?
    Are your variables properly scoped?
    Have you checked for anything redundant?
    Is any part of your code unreachable?
    (Can it be removed?)
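As a small illustration of the first question in that list, the sketch below shows the kind of slip that strict catches at compile time. The snippet string and the variable names in it are made up for the demonstration; it is compiled with a string eval only so the error can be captured and printed rather than stopping the program.

```perl
use strict;
use warnings;

# Hypothetical snippet: @hdrs is declared, but the assignment targets @hdr.
my $snippet = 'my @hdrs = ("a"); $hdr[0] = "b"; 1;';

# The string eval inherits "use strict" from this scope, so the undeclared
# array is a compile-time error and eval returns undef.
my $ok    = eval $snippet;
my $error = $@;

print $ok ? "compiled cleanly\n" : "strict rejected it: $error";
```

With `use strict` commented out, as in the original script, the same assignment would quietly create a second, unrelated array.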

    Second, you must know exactly which variables, subroutines, packages, and modules you are using. module_info will tell you exactly that. For example, I ran module_info -a on your script:
    Name: /root/Desktop/rework.pl
    Version: v2.0.0
    Directory:
    File: /root/Desktop/rework.pl
    Core module: no
    Modules used: English File::Spec Getopt::Std IO::File Switch Time::HiRes autodie strict version warnings
    Packages created: main 1312019949.17447
    Subroutines defined: main assign_headers assign_names check_params get_fields_plan get_params trim
    Third, run some checks to see how bad the damage is :).
    perl -c script.pl
    perl -w script.pl
    perl -MO=Lint script.pl
    perltidy script.pl
    Here's your script as I fixed it, minus the file part:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Spec;
    use Getopt::Std;
    use IO::File;
    use Switch;
    use Time::HiRes qw(gettimeofday tv_interval);
    use version 0.77; our $VERSION = qv("v2.0.0");
    use English qw(-no_match_vars);
    local $OUTPUT_AUTOFLUSH = 1;
    print my $timer_start = gettimeofday(), "\n";
    my(%inputs) = ();
    my(%params) = ();
    getopts('o:r:p:t:n:f:g:d', \%inputs);
    die "Error Getting Parameters\n" unless get_params(\%inputs, \%params);
    check_params(\%params);
    my $outfile = File::Spec->catfile($params{'outfolder'},$params{'rootname'});
    my $outnamefile = File::Spec->catfile($params{'outfolder'}, $params{'rootname'} );
    my $outlinecount = $outfile . '_db_proc_linecount.txt';
    $outfile = $outfile . '_db_proc_data.csv';
    $outnamefile = $outnamefile . '_db_proc_names.csv';
    print "Output SSS File: $outfile\n";
    print "Output Names File: $outnamefile\n";
    my $hdrmap;
    my(%hdr_types) = (
        fields  =>\&get_fields_area,
        regions =>\&get_fields_region,
        plan    =>\&get_fields_plan
    );
    die 'Unable to match sss type. ' unless defined($hdr_types{$params{ssstype}});
    $hdrmap = $hdr_types{$params{ssstype}}();
    my $namemap = get_fields_plan($params{ssstype});
    #open I, '<', $infile or die "Unable to open file $infile.\n";
    open my $O, '>', $outfile or die "Unable to open file $outfile.\n";
    open my $ON, '>', $outnamefile or die "Unable to open file $outnamefile.\n";
    #open OLCT, '<', $outlinecount or die "Unable to open file $outlinecount.\n";
    foreach (my $i=0; $i<3; $i++) {
        my $line = <$O>;
        chomp ($line);
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        my @sp = split (/\s+/,$line);
        foreach (my $j=0; $j<scalar(@sp); $j++) {
            my @hdrs;
            @hdrs = $hdrs[$j] . q{} . $sp[$j];
        }
    }
    print "[" . join(",", my @hdrs) . "]\n";
    foreach my $hdr (@hdrs){
        trim(\$hdr);
        print "CHECKING HEADERS ------------\n";
    }
    foreach my $key (sort keys %{$hdrmap}) {
        my ( $index )= grep { $hdrs[$_] eq $hdrmap->{$key}[0] } 0..$#hdrs;
        if (defined ($index)) {
            $hdrmap->{$key}[1] = $index;
            print "HEADER MATCH: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\t[" . $hdrmap->{$key}[1]. "]\n";
        }
        else {
            print "HEADER NOT FOUND: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\n";
        }
    }
    print "-----------------------------------------------\n";
    my @flist = ();
    my @fnamelist = ();
    assign_names($hdrmap, \@flist);
    assign_names($namemap, \@fnamelist);
    print "[" . join(",", @flist) . "]\n";
    print $O join(",", @flist) . ",RUN_ID\n";
    print $ON join(",", @fnamelist) . ",RUN_ID\n";
    exit();
    my $l_units = <$O>;
    chomp ($l_units);
    print "UNITS -- > [$l_units]\n";
    use autodie qw(:close);
    close($O);
    close($ON);

    sub get_params {
        my $inr = shift;
        my $pr = shift;
        $pr->{'debug'} = 1;
        $pr->{'runid'} = -1000;
        $pr->{'filter'} = ".*";
        $pr->{'datefilter'} = "";
        $pr->{'filename'} = "NA";
        $pr->{'outfolder'} = ".";
        $pr->{'ssstype'} = "";
        foreach my $key (keys %{$inr}) {
            $pr->{'debug'} = $inr->{$key} if $key eq 'd';
            $pr->{'runid'} = $inr->{$key} if $key eq 'n';
            $pr->{'filter'} = $inr->{$key} if $key eq 'f';
            $pr->{'datefilter'} = $inr->{$key} if $key eq 'g';
            $pr->{'filename'} = $inr->{$key} if $key eq 'r';
            $pr->{'outfolder'} = $inr->{$key} if $key eq 'o';
            $pr->{'ssstype'} = $inr->{$key} if $key eq 't';
        }
        return 1;
    }

    sub check_params {
        my $pr = shift;
        die "Require RunID if not in debug mode\n" if ($pr->{debug} == 1 and $pr->{runid} == -1000);
        (-e $pr->{filename}) or die "Unable to find input filename: $pr->{filename}\n";
        (-e $pr->{outfolder}) or die "Unable to find outfolder: $pr->{outfolder}\n";
        $pr->{filter} = ".*" if $pr->{filter} eq "";
        my ($volume, $dirs, $rootname) = File::Spec->splitpath($params{filename});
        $rootname =~ s/\.sss$//;
        my @sp = split(/_/,$rootname);
        $params{rootname} = $rootname;
        $params{ssstype} = $sp[-1] if $params{ssstype} eq "";
        print "Run Parameters: \n";
        foreach my $key (keys %{$pr}) {
            print "$key -> [$pr->{$key}]\n";
        }
        return 1;
    }

    sub get_fields_plan {
        my $tblFields = shift;
        $tblFields->{'DATE_EN'} = ["__________ __________ Date",-1];
        $tblFields->{'HEADER1'} = ["Header1 Header1 Header1",-1];
        $tblFields->{'HEADER2'} = ["Header2 Header2 Header2",-1];
        $tblFields->{'HEADER3'} = ["Header3 Header3 Header3",-1];
        $tblFields->{'HEADER4'} = ["Header4 Header4 Header4",-1];
        $tblFields->{'HEADER5'} = ["Header4 Header5 Header5",-1];
        $tblFields->{'HEADER6'} = ["Header6 Header6 Header6",-1];
        return $tblFields;
    }

    sub assign_headers {
        my $hdrmap = shift;
        my $headline = shift;
        my $sptagref = shift;
        $$sptagref = '\s*,\s*' if $$headline=~ m/,/;
        my @hdrs = split(/$$sptagref/, $$headline);
        foreach my $hdr (@hdrs){
            trim(\$hdr);
            print "CHECKING HEADERS ------------\n";
        }
        foreach my $key (sort keys %{$hdrmap}) {
            my ( $index )= grep { $hdrs[$_] eq $hdrmap->{$key}[0] } 0..$#hdrs;
            if (defined ($index)) {
                $hdrmap->{$key}[1] = $index;
                print "HEADER MATCH: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\t[" . $hdrmap->{$key}[1]. "]\n";
            }
            else {
                print "HEADER NOT FOUND: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\n";
            }
        }
        print "-----------------------------------------------\n";
    }

    sub assign_names {
        my $hdrmap = shift;
        my $flist = shift;
        @$flist = ();
        foreach my $key (sort keys %{$hdrmap}) {
            my $i = $hdrmap->{$key}[1];
            push(@{$flist}, $key) if ($i != -1);
        }
    }

    sub trim {
        my $sref = shift;
        $$sref =~ s/^\s+//;
        $$sref =~ s/\s+$//;
    }
Re: Space delimited to CSV, Index and data extraction loop
by Not_a_Number (Parson) on Jul 30, 2011 at 11:15 UTC

    Here's one way of doing the actual parsing bit (ignoring headers, reading from __DATA__ and printing to the screen for simplicity):

    use strict;
    use warnings;

    my $name_id = 0;
    my %seen_name;

    <DATA> for 1 .. 5;    # Remove headers

    while ( my $line = <DATA> ) {
        next unless $line =~/\w/;
        if ( $line =~ /Name/ ) {
            $name_id = parse_name( $line );
        }
        else {
            my @tmp = split ' ', $line;
            shift @tmp;
            my $date = shift @tmp;
            print join ',', $name_id, $name_id, @tmp, $date;
            print "\n";
        }
    }

    print "\n________Names file________\n\n";

    for my $key ( sort { $seen_name{$a} <=> $seen_name{$b} } keys %seen_name ) {
        my @names = split /,/, $key;
        my $val = $seen_name{$key};
        print join ",$val,", @names, "\n";
    }

    sub parse_name {
        my $line = shift;
        my $name = join ',', ( split ' ', $line )[1, 3];
        if ( $seen_name{$name} ) {
            $name_id = $seen_name{$name};
        }
        else {
            $name_id += 1;
            $seen_name{$name} = $name_id;
        }
        return $name_id;
    }
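A variation on the same idea, offered only as a sketch: it writes to two filehandles instead of the screen, keeps separate ID counters for Name and Name1, and zero-pads the dates the way the desired output shows. The helpers `pad_date`, `id_for` and `convert` are hypothetical names, not part of the original script, and the code assumes block headers always look like `Name X Name1 Y` with no whitespace inside X or Y. The demonstration uses in-memory filehandles and a shortened sample with two value columns.

```perl
use strict;
use warnings;

# Hypothetical helper: pad "1/01/2007" to "01/01/2007".
sub pad_date {
    my ($date) = @_;
    my ( $m, $d, $y ) = split m{/}, $date;
    return sprintf '%02d/%02d/%04d', $m, $d, $y;
}

# Hypothetical helper: return the ID already given to $value,
# or allocate the next integer (1, 2, 3, ...) on first sight.
sub id_for {
    my ( $ids, $value ) = @_;
    if ( !exists $ids->{$value} ) {
        $ids->{$value} = 1 + scalar keys %{$ids};
    }
    return $ids->{$value};
}

# Read blocks from $in; write data rows to $data_out and one
# line per distinct Name/Name1 pair to $names_out.
sub convert {
    my ( $in, $data_out, $names_out ) = @_;
    my ( %name_ids, %name1_ids, %logged );
    my ( $name_id, $name1_id );

    while ( my $line = <$in> ) {
        my @f = split ' ', $line;
        next unless @f;
        if ( @f == 4 && $f[0] eq 'Name' && $f[2] eq 'Name1' ) {
            # Block header: look up (or assign) both IDs.
            $name_id  = id_for( \%name_ids,  $f[1] );
            $name1_id = id_for( \%name1_ids, $f[3] );
            print {$names_out} join( ',', $f[1], $name_id, $f[3], $name1_id ), "\n"
                unless $logged{"$f[1],$f[3]"}++;
        }
        elsif ( defined $name_id && @f > 2 && $f[1] =~ m{^\d+/\d+/\d+$} ) {
            # Data row: drop the Days column, move the date to the end.
            my ( undef, $date, @values ) = @f;
            print {$data_out} join( ',', $name_id, $name1_id, @values, pad_date($date) ), "\n";
        }
        # Anything else (column headers, units line) is skipped.
    }
    return;
}

# Demonstration on a cut-down sample, using in-memory filehandles.
my $sample = <<'END';
Time Date Header1 Header2
Days MM/DD/YYYY UNIT UNIT
Name AA-AA Name1 BAABAB
0.000000 1/01/2007 0.00000 0.00000
31.000000 2/01/2007 0.00000 0.00000
Name AB_AB Name1 ABABAB
0.000000 1/01/2007 0.00000 0.00000
END

my ( $data_csv, $names_csv ) = ( '', '' );
open my $in,        '<', \$sample    or die "Cannot read sample: $!";
open my $data_out,  '>', \$data_csv  or die "Cannot open data buffer: $!";
open my $names_out, '>', \$names_csv or die "Cannot open names buffer: $!";
convert( $in, $data_out, $names_out );
close $_ for $in, $data_out, $names_out;

print $data_csv, "\n", $names_csv;
```

In a real run the three handles would be opened on the input file, $outfile and $outnamefile, and the CSV header lines would be printed before calling `convert`.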
Re: Space delimited to CSV, Index and data extraction loop
by planetscape (Canon) on Jul 30, 2011 at 12:47 UTC
    have embarked over a 10 day period to learn Perl

    You'll need more than 10 days. ;-)

    After 10 years, I'm still learning... But welcome, make yourself at home, and don't forget to check out our very fine Tutorials, esp. Getting Started with Perl

    HTH,

    planetscape
Re: Space delimited to CSV, Index and data extraction loop
by SixShot (Novice) on Jul 30, 2011 at 13:31 UTC

    From a fav song of mine... "You get by with a little help from your friends". I cannot express my appreciation enough for your quick and most helpful replies. I am off and running again, and I will update or ask for more advice as I move on. Hmmm, 10 days, I know, is asking a lot. I'm like Luke Skywalker when he first uses the Force.

Node Type: perlquestion [id://917569]
Approved by Tanktalus