PerlMonks  

Calculate jackknife error from of each column of a multi-column file

by pyari_billi (Initiate)
on Dec 15, 2020 at 08:03 UTC (#11125219=perlquestion)

pyari_billi has asked for the wisdom of the Perl Monks concerning the following question:

Hello Dear Monks, I am a Perl newbie. I am trying to calculate the jackknife average and error of each column in a multi-column file. My example data file looks like this:
    $ cat data.HW2
    1.1 2.1 3.1 4.1
    1.2 2.2 3.2 4.2
    1.3 2.3 3.3 4.3
    1.4 2.4 3.4 4.4
My attempted solution is to define arrays that will eventually be the same size as the number of columns (in this case 4) and iterate over them line by line:
    $ cat jackknife.pl
    #! /usr/bin/perl
    use warnings;
    use strict;

    my @n=0;
    my @x;
    my $j;
    my $i;
    my $dg;
    my @x_jack;
    my @x_tot=0;
    my $cols;
    my $col_start=0;
    # read in the data
    while(<>) {
        my @column = split();
        $cols=@column;
        foreach my $j ($col_start .. $#column) {
            $x[$n[$j]][$j] = $column[$j];
            $x_tot[$j] += $x[$n[$j]][$j];
            $n[$j]++;
        }
    }
    # Do the jackknife estimates
    for ($j=$col_start; $j<$cols; $j++) {
        for ($i = 0; $i < $n[$j]; $i++) {
            $x_jack[$i][$j] = ($x_tot[$j] - $x[$i][$j]) / ($n[$j] - 1);
        }
        # Do the final jackknife estimate
        my @g_jack_av=0;
        my @g_jack_err=0;
        for ($i = 0; $i < $n[$j]; $i++) {
            $dg = $x_jack[$i][$j];
            $g_jack_av[$j] += $dg;
            $g_jack_err[$j] += $dg**2;
        }
        $g_jack_av[$j] /= $n[$j];
        $g_jack_err[$j] /= $n[$j];
        $g_jack_err[$j] = sqrt(($n[$j] - 1) * abs($g_jack_err[$j] - $g_jack_av[$j]**2));
        printf "%e %e ", $g_jack_av[$j], $g_jack_err[$j];
    }
    printf "\n";
It gives me the following two warnings:
    $ cat data.HW2 | perl jackknife.pl
    Use of uninitialized value within @n in array element at cols_jacknife.pl line 19, <> line 1.
    Use of uninitialized value within @n in array element at cols_jacknife.pl line 20, <> line 1.
It is complaining at the following two lines:
    $x[$n[$j]][$j] = $column[$j];
    $x_tot[$j] += $x[$n[$j]][$j];
But I want to set the size of @n dynamically depending on the size of the data file. How do I remove this warning? Any other suggestions on my perl usage are also welcome and much appreciated since I am trying to learn the best practices.
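For reference, the per-column quantities the script above computes can be written out as follows. This is only a transcription of the posted code, not an independent derivation; the symbols $\theta_i$, $\bar\theta$ and $\sigma_{\text{jack}}$ are names chosen here for `$x_jack`, `$g_jack_av` and the final `$g_jack_err`, and $x_1,\dots,x_n$ are the $n$ values of one column.

```latex
\[
\theta_i = \frac{1}{n-1}\Bigl(\sum_{k=1}^{n} x_k - x_i\Bigr),
\qquad
\bar\theta = \frac{1}{n}\sum_{i=1}^{n}\theta_i,
\qquad
\sigma_{\text{jack}} = \sqrt{(n-1)\,\Bigl|\frac{1}{n}\sum_{i=1}^{n}\theta_i^{2} - \bar\theta^{2}\Bigr|}
\]
```

The absolute value mirrors the `abs(...)` in the code, which guards against a slightly negative difference caused by floating-point rounding.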

Replies are listed 'Best First'.
Re: Calculate jackknife error from of each column of a multi-column file
by hippo (Chancellor) on Dec 15, 2020 at 09:59 UTC

    Hello, pyari_billi and welcome to the monastery.

    The warning is alerting you to the fact that by the time your code gets to line 19 there is no value in $n[$j] because the only thing you have put in @n is 0. You must store a value in a variable (scalar/array/hash) before you can extract it.
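To see concretely what `my @n=0;` leaves in the array, here is a minimal sketch. The helper name `describe` is made up for illustration and is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: render each element as its value or the word 'undef'.
sub describe {
    my @array = @_;
    return join ' ', map { defined $_ ? $_ : 'undef' } @array;
}

my @n = 0;                                # list assignment: exactly one element, 0
my $size_before = @n;                     # array in scalar context gives the element count
my @first_four  = map { $n[$_] } 0 .. 3;  # reading past the end yields undef

print "elements: $size_before\n";
print "first four slots: ", describe(@first_four), "\n";
```

Only `$n[0]` is defined, so with four columns the lookups for columns 1 to 3 hit undefined values on the first input line. That appears to be consistent with the six warnings (three columns, two statements each) in Marshall's output further down.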

    But I want to set the size of @n dynamically depending on the size of the data file.

    You can either do that as you go along (arrays can be expanded or shrunk dynamically in perl) or you can perform 2 passes of the file. The former is usually preferable. Something like:

        foreach my $j ($col_start .. $#column) {
            $n[$j] //= 1; # Choose whatever suitable value you want here - it could even be an expression
            $x[$n[$j]][$j] = $column[$j];
            # ... etc.

    The value you choose to set will depend on your algorithm, of course. Perhaps you want to use $.? Anyway, I hope this makes the problem more clear to you.
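One way the `$.` idea might look, as a rough sketch rather than a drop-in replacement: the row index is taken from the input line number, so no per-column counter is needed. The helper `read_columns` and the in-memory sample data are assumptions made for this example, and the sketch assumes the file has no blank lines (a blank line would leave a gap in the rows).

```perl
use strict;
use warnings;

# Hypothetical helper: store the data row by row, using $. as the row index.
sub read_columns {
    my ($fh) = @_;
    my (@x, @x_tot);
    while (my $line = <$fh>) {
        my @column = split ' ', $line;
        my $row = $. - 1;                # $. counts lines from 1 on the last-read handle
        for my $j (0 .. $#column) {
            $x[$row][$j] = $column[$j];
            $x_tot[$j]  += $column[$j];  # += on an undefined element does not warn
        }
    }
    return (\@x, \@x_tot);
}

# Sample data held in a string so the sketch is self-contained.
my $text = "1.1 2.1\n1.2 2.2\n1.3 2.3\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my ($x, $x_tot) = read_columns($fh);
close $fh;

printf "rows: %d, column totals: %s\n",
    scalar @$x, join(' ', map { sprintf('%.1f', $_) } @$x_tot);
```

The number of rows is then simply `scalar @$x`, which plays the role of `$n[$j]` in the original code.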


    🦛

Re: Calculate jackknife error from of each column of a multi-column file
by BillKSmith (Prior) on Dec 15, 2020 at 19:31 UTC
    Your program would be much simpler if you reversed the order of your subscripts. Each column of your data is a set of data. It is convenient to have each set stored as its own array.
        use strict;
        use warnings;
        use List::Util qw(sum);
        my @raw;
        my $n;
        while (my $line = <DATA>) {
            my @cols = split / /, $line;
            foreach my $j (0..$#cols) {
                push @{$raw[$j]}, $cols[$j];
            }
        }
        {
            local $, = ' ';
            print process_column($_), "\n" foreach @raw;
        }

        sub process_column {
            my @x = @{$_[0]};
            my $n = @x;
            my $x_tot = sum( @x );
            my @x_jack = map { ($x_tot - $_) / ($n - 1) } @x;
            my $g_jack_av;
            my $g_jack_err;
            #foreach my $dg (@x) {
            foreach my $dg (@x_jack) { #UPDATE
                $g_jack_av += $dg;
                $g_jack_err += $dg**2;
            }
            $g_jack_av /= $n;
            $g_jack_err /= $n;
            $g_jack_err = sqrt(($n-1)*abs($g_jack_err-$g_jack_av**2));
            return ($g_jack_av, $g_jack_err);
        }
        __DATA__
        1.1 2.1 3.1 4.1
        1.2 2.2 3.2 4.2
        1.3 2.3 3.3 4.3
        1.4 2.4 3.4 4.4

    OUTPUT

        1.25 0.193649167310372
        2.25 0.193649167310365
        3.25 0.193649167310338
        4.25 0.193649167310365

    OUTPUT (with correction)

        1.25 0.064549722436785
        2.25 0.0645497224367747
        3.25 0.0645497224367334
        4.25 0.064549722436816

    UPDATE: Corrected error reported by AnomalousMonk in Re^2: Calculate jackknife error from of each column of a multi-column file

    Bill

      Your code produces the same average values as the OPed code, but the error values differ: 0.193649167310365 versus 6.454972e-002 for the OPed code.

      I'm not familiar with jackknife averages/errors (update: in fact, I don't even know if pyari_billi's code yields the correct error values), so I don't know if this is significant.

      However, I certainly agree that the OPed code can be considerably simplified.
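One possible sanity check on which error value is plausible: for a plain mean, the jackknife error estimate should coincide with the usual standard error of the mean (sample standard deviation divided by sqrt(n)). The sketch below compares the two for the first data column. The sub names `jackknife_error` and `standard_error` are invented for this example, and the jackknife part follows the formula used in the thread.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Jackknife error of the mean, following the formula used in the thread.
sub jackknife_error {
    my @x = @_;
    my $n = @x;
    my $total   = sum(@x);
    my @jack    = map { ($total - $_) / ($n - 1) } @x;   # leave-one-out means
    my $mean    = sum(@jack) / $n;
    my $mean_sq = sum(map { $_**2 } @jack) / $n;
    return sqrt(($n - 1) * abs($mean_sq - $mean**2));
}

# Standard error of the mean: sample standard deviation / sqrt(n).
sub standard_error {
    my @x = @_;
    my $n = @x;
    my $mean = sum(@x) / $n;
    my $var  = sum(map { ($_ - $mean)**2 } @x) / ($n - 1);
    return sqrt($var / $n);
}

my @column = (1.1, 1.2, 1.3, 1.4);
printf "jackknife: %.6f  standard error: %.6f\n",
    jackknife_error(@column), standard_error(@column);
```

Both come out at about 0.064550 for this column, which agrees with the OPed value of 6.454972e-002 and with the corrected output above.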


      Give a man a fish:  <%-{-{-{-<

Re: Calculate jackknife error from of each column of a multi-column file
by jwkrahn (Monsignor) on Dec 15, 2020 at 09:46 UTC
    How do I remove this warning?
        foreach my $j ($col_start .. $#column) {
            $n[$j] += 0; # add this to remove warning msg
            $x[$n[$j]][$j] = $column[$j];
            $x_tot[$j] += $x[$n[$j]][$j];
            $n[$j]++;
        }
    Any other suggestions on my perl usage are also welcome

    The lines:

        my @n=0;
        my @x_tot=0;
        my @g_jack_av=0;
        my @g_jack_err=0;

    Should be:

        my @n;
        my @x_tot;
        my @g_jack_av;
        my @g_jack_err;

    As you don't have to assign a value to the first array element.
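A small sketch of why empty declarations are enough for the accumulators but not for the index lookup: `+=` and `++` are exempt from the "uninitialized" warning when applied to an undefined element, whereas using an undefined value as an array index is not. The `$SIG{__WARN__}` handler is only there to count warnings for the demonstration.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so the effect is easy to inspect.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my (@n, @x_tot, @x);          # declared empty, as suggested above

$x_tot[0] += 1.1;             # += on an undefined element: no warning
$n[0]++;                      # ++ on an undefined element: no warning
my $after_exempt_ops = @warnings;

$x[ $n[1] ][0] = 2.1;         # undefined value used as an array index: warns
my $after_index = @warnings;

print "warnings after += and ++: $after_exempt_ops\n";
print "warnings after undef index: $after_index\n";
```

This is why the `$n[$j] += 0;` line above silences the message: it defines the element before it is used as an index.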

Re: Calculate jackknife error from of each column of a multi-column file
by Marshall (Canon) on Dec 16, 2020 at 01:58 UTC
    I am very confused as to what this thing is supposed to do!
    I ran just the first part of the code and this is what I get.
    The n array is not multi-dimensional. There is no data in that array anyway in this first loop. $x[$n[$j]][$j] doesn't make sense to me.
    Update: I guess $x[$n[$j]//=0][$j] = $column[$j]; "works", but I still don't quite get it.
        #! /usr/bin/perl
        use warnings;
        use strict;

        use Data::Dumper;

        my @n=0;
        my @x;
        my $j;
        my $i;
        my $dg;
        my @x_jack;
        my @x_tot=0;
        my $cols;
        my $col_start=0;

        # read in the data
        while(<DATA>) {
            my @column = split();
            $cols=@column;
            foreach my $j ($col_start .. $#column) {
                $x[$n[$j]][$j] = $column[$j];
                $x_tot[$j] += $x[$n[$j]][$j];
                $n[$j]++;
            }
        }
        print "\n";
        print "THE X ARRAY IS\n";
        foreach my $rowref (@x)
        {
            print "@$rowref\n";
        }
        print "\n";
        print "THE N ARRAY IS:\n";
        print "@n";

        =PRINTS:
        Use of uninitialized value within @n in array element at line 22, <DATA> line 1.
        Use of uninitialized value within @n in array element at line 23, <DATA> line 1.
        Use of uninitialized value within @n in array element at line 22, <DATA> line 1.
        Use of uninitialized value within @n in array element at line 23, <DATA> line 1.
        Use of uninitialized value within @n in array element at line 22, <DATA> line 1.
        Use of uninitialized value within @n in array element at line 23, <DATA> line 1.

        THE X ARRAY IS
        1.1 2.1 3.1 4.1
        1.2 2.2 3.2 4.2
        1.3 2.3 3.3 4.3
        1.4 2.4 3.4 4.4

        THE N ARRAY IS:
        4 4 4 4
        =cut

        __DATA__
        1.1 2.1 3.1 4.1
        1.2 2.2 3.2 4.2
        1.3 2.3 3.3 4.3
        1.4 2.4 3.4 4.4

      Let's do $x[$n[$j]][$j] in bits:

          foreach my $j ($col_start .. $#column) {
              my $nIndex = $n[$j];
              $x[$nIndex][$j] = $column[$j];
              $x_tot[$j] += $x[$nIndex][$j];
              $n[$j]++;
          }

      The 'uninitialized value' warning can then be fixed by:

          foreach my $j ($col_start .. $#column) {
              my $nIndex = $n[$j] // 0;
              $x[$nIndex][$j] = $column[$j];
              $x_tot[$j] += $x[$nIndex][$j];
              $n[$j]++;
          }
      Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
        Yes, I see now that $x[$n[$j]//=0][$j] = $column[$j]; is the answer. Thanks! The nested brackets confused my brain. I strongly suspect that there is a simpler formulation of this part of the algorithm. For example, transforming a square matrix is not that hard and then you go row by row to get the column sums.
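One simpler formulation along these lines, offered as a sketch only: keep each input line as one row (an array reference) and walk the rows to accumulate the column sums, so the nested `$x[$n[$j]][$j]` lookup disappears. The helper name `column_sums` and the inline sample lines are assumptions for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: add up each column across a list of row references.
sub column_sums {
    my @rows = @_;                 # one array reference per input line
    my @sum;
    for my $row (@rows) {
        $sum[$_] += $row->[$_] for 0 .. $#$row;
    }
    return @sum;
}

# Sample data inline; in the real script these would be the lines read from the file.
my @lines = (
    "1.1 2.1 3.1 4.1",
    "1.2 2.2 3.2 4.2",
    "1.3 2.3 3.3 4.3",
    "1.4 2.4 3.4 4.4",
);
my @rows = map { [ split ' ', $_ ] } @lines;   # no per-column counter needed

my @sum = column_sums(@rows);
printf "%d rows; column sums: %s\n",
    scalar @rows, join(' ', map { sprintf('%.1f', $_) } @sum);
```

With the rows stored this way, the row count is `scalar @rows` and the leave-one-out mean for row `$i`, column `$j` would be `($sum[$j] - $rows[$i][$j]) / (@rows - 1)`.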
Re: Calculate jackknife error from of each column of a multi-column file
by karlgoethebier (Abbot) on Dec 16, 2020 at 21:33 UTC

    Ah, I see

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: Calculate jackknife error from of each column of a multi-column file
by thechartist (Monk) on Dec 18, 2020 at 01:40 UTC

    You might have an easier time using the PDL module for the matrix operations you need to do. It will be easier to name which element from the column vector you need to omit in order to average the other elements.
