<?xml version="1.0" encoding="windows-1252"?>
<node id="625532" title="Computing Covariance Matrices with PDL" created="2007-07-08 16:45:25" updated="2007-07-08 12:45:25">
<type id="1980">
snippet</type>
<author id="565709">
lin0</author>
<data>
<field name="doctext">
</field>
<field name="snippetdesc">
&lt;p&gt;A [http://en.wikipedia.org/wiki/Covariance_matrix|covariance matrix] is a matrix whose entries are the [http://en.wikipedia.org/wiki/Covariance|covariances] (a measure of how much two random variables vary together) between the elements of a random vector.&lt;/p&gt;
&lt;p&gt;In this snippet, I show [http://en.wikipedia.org/wiki/Estimation_of_covariance_matrices|how to compute a sample covariance matrix] using the [http://pdl.perl.org/|Perl Data Language]. The input is a piddle (see the note below for a definition) in which each row represents an input vector and each column represents a dimension of the input vector. The output is a piddle that holds the covariance matrix, normalized by n - 1, where n is the number of input vectors.&lt;/p&gt;
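&lt;p&gt;To make the input layout concrete, here is a minimal sketch. The sample values and the variable name $X are made up for illustration. Note that PDL reports the column count first, so the number of input vectors is dimension 1:&lt;/p&gt;
&lt;CODE&gt;
use strict;
use warnings;
use PDL;

# Three 2-dimensional input vectors, one per row (illustrative data)
my $X = pdl( [ [ 2, 1 ], [ 4, 3 ], [ 6, 8 ] ] );

# dim 0 counts columns (vector dimensions), dim 1 counts rows (vectors)
print join( ' x ', $X-&gt;dims ), "\n";   # 2 x 3
print $X-&gt;getdim(1), "\n";             # 3 input vectors
&lt;/CODE&gt;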

&lt;p&gt;What are Piddles?&lt;/p&gt;
&lt;p&gt;They are the numerical array data structure defined by the [http://pdl.perl.org/|Perl Data Language]. As described in [id://598007]:&lt;/p&gt;
&lt;blockquote&gt;&lt;i&gt;Piddles are numerical arrays stored in column major order (meaning that the fastest varying dimension represents the columns following computational convention rather than the rows as mathematicians prefer). Even though piddles look like Perl arrays, they are not. Unlike Perl arrays, piddles are stored in consecutive memory locations facilitating the passing of piddles to the C and FORTRAN code that handles the element by element arithmetic. One more thing to note about piddles is that they are referenced with a leading $.&lt;/i&gt;&lt;/blockquote&gt;
&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;[lin0]&lt;/p&gt;</field>
<field name="snippetcode">
&lt;CODE&gt;
#!/usr/bin/perl
use warnings;
use strict;
use PDL;

# ================================
# covariance:
#
#   $Sigma = covariance( $X )
#
#   computes the sample covariance matrix of
#   a sample X1...Xn of p-dimensional vectors,
#   stored one vector per row of $X (dims p, n);
#   returns a p x p piddle
# ================================
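sub covariance {
    my ( $X ) = @_;

    # Center the data: average() reduces over dim 0, so swapping the
    # dims first yields the mean of each column of $X
    my $Diff = $X - average( $X-&gt;xchg(0,1) );

    # Unbiased estimate: divide by n - 1, where n is the number of rows
    my $Sigma = ( 1 / ( $X-&gt;getdim(1) - 1 ) )
                * transpose( $Diff ) x $Diff;

    return $Sigma;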
}
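
# --------------------------------
# Usage sketch (not part of the original snippet): the sample
# data below is made up purely for illustration.
# --------------------------------
# Three 2-dimensional input vectors, one per row
my $X = pdl( [ [ 2, 1 ], [ 4, 3 ], [ 6, 8 ] ] );

my $Sigma = covariance( $X );

# The column means are (4, 4), so the variances are 4 and 13
# and the covariance between the two columns is 7
print $Sigma;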
&lt;/CODE&gt;</field>
</data>
</node>
