Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

RFC: Getting Started with PDL (the Perl Data Language)

by lin0 (Curate)
on Feb 02, 2007 at 20:06 UTC ( [id://598007]=perlmeditation: print w/replies, xml ) Need Help??

Greetings Fellow Monks,

I have written several posts related to the Perl Data Language (PDL) but I have not provided an introduction to PDL. I thought it was time to fix that. I started writing a tutorial on PDL. As I was writing, I notice that it would become too long for one post. So, I decided to split it up into three posts. This one will be an introduction to PDL and operations with Piddles. The second one will be on data visualization. And the third one will show you an example on data analysis. I hope you enjoy the tutorials and please help me improve them by sharing your insightful comments.

Cheers,

lin0

Update:

Fixed a typo. Fixed some problems with spoiler tags. Broke up long code lines


Table of Contents

Introduction

Imagine you, Just Another Perl Hacker, were assigned to this new project that involves heavy numerical computation. Most of your peers recommend you to use C or C++. Some others recommend you the language of the Snake -sorry I forgot the name ;-). What can you do if you really want to keep using Perl? Using PDL is the answer. The Perl Data Language (PDL) is a package that gives Perl the ability to compactly store and speedily manipulate the large N-dimensional data sets that are common in scientific and other data intensive programming tasks. To achieve such a great performance, PDL uses C (and sometimes Fortran) to efficiently handle multidimensional data sets. For the rest of the tutorial, I will assume you have a working copy of PDL already installed. Once you have PDL installed, you can use it in Perl scripts by simply declaring: use PDL;. If you have not yet installed PDL, I will recommend you to have a look at the Appendix for some of the pre-requisites for having PDL installed without problems. And if you get into troubles, I recommend you to ask in one of the PDL mailing lists or here at the Monastery. Now, we are ready to start.

The Interactive Shell

A very useful interactive shell (perldl) is provided with your copy of PDL. To start perldl, you just have to type perldl in a terminal window (or a command prompt window if you are using a Windows system). perldl allows you to directly type PDL commands and see their results. One key feature of perldl is that it gives you access to online help. By typing help followed by a function name you get the online documentation for that function (note that you could alternatively type ? followed by the function name). With the online help and the PDL Cheat Sheet you can access most of the PDL documentation. But what happens when you don't know the name of the function and you cannot figure it out from the PDL Cheat Sheet? The command apropos comes to your help. By typing apropos followed by a concept will give you a list of functions that have that concept in the first line of their description (note that you could alternatively type ?? followed by the concept). Let us use those two commands:

Type:

perldl>? inv

What did you get?

Now, type:

perldl>?? inverse

What did you get?

For more information on perldl, you can have a look at the perldl page of the PDL Documentation. If you are interested in configuring perldl to load modules, set shell variables, etc. during start-up, you might use the perldlrc configuration file as described in the thread of the PDL Cheat Sheet

Before moving to the next section, I have to recommend you an invaluable functionality of perldl: the online demos. Just type:

perldl>demo

and you will see all the different demos that are available. For instance, let's try:

perldl>demo PDL

What did you get?

Playing with Piddles

One thing you need to know about PDL is that it introduces a new data structure usually called a “piddle”. Piddles are numerical arrays stored in column major order (meaning that the fastest varying dimension represent the columns following computational convention rather than the rows as mathematicians prefer). Even though, piddles look like Perl arrays, they are not. Unlike Perl arrays, piddles are stored in consecutive memory locations facilitating the passing of piddles to the C and FORTRAN code that handles the element by element arithmetic. One more thing to note about piddles is that they are referenced with a leading $. In the rest of this section I will use the interactive shell.

Creating Piddles

The easiest way to create piddles is by using the function pdl followed by the piddle. For example:

perldl> $scalar_piddle = pdl 42 perldl> $one_dimensional_piddle = pdl(1,2,3) perldl> $two_dimensional_piddle = pdl([1,2,3],[4,5,6])

To print information about the piddles you just created, you can simply type:

perldl> ? vars

The corresponding output is:

PDL variables in package main:: Name Type Dimension Flow State Mem ---------------------------------------------------------------- $scalar_piddle Double D [] P 0.01Kb $one_dimensional_piddle Double D [3] P 0.02Kb $two_dimensional_piddle Double D [3,2] P 0.05Kb

Another way of creating piddles is using functions such as zeroes (create a piddle with all the elements equal to zero), ones (create a piddle with all the elements equal to one), and identity (create a piddle with the elements in the main diagonal equal to one).

perldl> $pdl_of_zeroes = zeroes(10,2) perldl> p $pdl_of_zeroes [ [0 0 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 0 0 0] ] perldl> $pdl_of_ones = ones(5,3) perldl> p $pdl_of_ones [ [1 1 1 1 1] [1 1 1 1 1] [1 1 1 1 1] ] perldl> $pdl_identity = identity(4,4) perldl> p $pdl_identity [ [1 0 0 0] [0 1 0 0] [0 0 1 0] [0 0 0 1] ]

There are many more functions for creating piddles. Some of them are: random, grandom, randsym, sequence, xvals, yvals, zvals, xlinvals, ylinvals, zlinvals, rvals, axisvals, allaxisvals.

Piddles by default are of type double. But, they could be byte, ushort, short, float, long. You could also create piddles by doing a type conversion. For example,

perldl>$new_pdl_of_zeroes = zeroes(byte, 10, 2)
This piddle requires less memory than $pdl_of_zeroes. What would its memory requirement be?

Doing Arithmetic Operations

The following Perl operators work in PDL as they do in Perl. The only difference is that in PDL, they act element-by-element on the whole piddle

+ - * / ** > < >= <= == != << >> & | ^ += -= *= /= %= **= >>= <<= &= |= ^= <=> ! % ++ --

Similarly, standard Perl mathematical functions also act element-by-element on the whole piddle.

abs acos acosh asin asinh atan atan2 atanh cos cosh sin sinh tan tanh sqrt ceil floor rint exp log log1

Note that to perform a matrix multiplication, you should use the operator: x

Emptying Piddles

Simply assign a null piddle:

perldl>$pdl_of_zeroes = null;

a 'null' piddle is a kind of empty piddle, which can grow to appropriate dimensions to store a result.

Getting Piddles' Information

To get the number of elements in a piddle, you can use:

perldl>$n = nelem($new_pdl_of_zeroes);

or alternatively:

perldl>$n = $new_pdl_of_zeroes->nelem();

This last notation is called the method notation. Piddles are implemented as Perl objects. Objects can have internal functions called methods. Methods can only be used on the class of object they belong to. Many of PDL's functions are available as methods too. The method notation is very common when you have to gather information from piddles or when you have to access values in piddles as it makes more visual sense to have the method call after the variable.

To get the piddle dimensions as a Perl list, you can use:

perldl>@dims = dims($new_pdl_of_zeroes);

or alternatively:

perldl>@dims = $new_pdl_of_zeroes->dims;

To get formatted string with information about a piddle, you can use:

perldl>$info = $new_pdl_of_zeroes->info;

info allows you to specify the type of information you want to extract by using an optional argument with the format: "%<width><letter>". The width is optional and the letter is one of:

T Type</li> D Formatted Dimensions F Dataflow status S Some internal flags (P=physical,V=Vaffine,C=changed) C Class of this piddle, i.e. "ref $pdl" A Address of the piddle struct as a unique identifier M Calculated memory consumption of this piddle's data area

For example, to get the approximate memory consumption of $new_pdl_of_zeroes, you could use:

perldl> p $new_pdl_of_zeroes->info("Mem : %M");

the output is:

Mem : 0.02Kb

Note the use of p. In perldl, p is a shortcut for print.

Accessing Values in Piddles

Before showing you how to access values in a piddle, let's create a new piddle:

perldl> p $piddle = sequence(10,12);

output:

[ [ 0 1 2 3 4 5 6 7 8 9] [ 10 11 12 13 14 15 16 17 18 19] [ 20 21 22 23 24 25 26 27 28 29] [ 30 31 32 33 34 35 36 37 38 39] [ 40 41 42 43 44 45 46 47 48 49] [ 50 51 52 53 54 55 56 57 58 59] [ 60 61 62 63 64 65 66 67 68 69] [ 70 71 72 73 74 75 76 77 78 79] [ 80 81 82 83 84 85 86 87 88 89] [ 90 91 92 93 94 95 96 97 98 99] [100 101 102 103 104 105 106 107 108 109] [110 111 112 113 114 115 116 117 118 119] ]

To get the element (5,3) , as a scalar, you could use:

perldl> p at($piddle, 5,3);

or alternatively:

perldl> p $piddle->at(5,3);

Now, let's access one or more element (or what is called a slice) at the time. One way of getting rectangular slices of piddles is using the slice function. For example, to access the last row and first two columns of $piddle, you could type:

perldl> p $slice_1a = slice($piddle, "0:2,-1");

output:

[ [110 111 112] ]

or alternatively:

perldl> p $slice_1b = $piddle->slice("0:2,-1");

output:

[ [110 111 112] ]

To access all the rows and the last two columns of $piddle, you could type:

perldl> p $slice_2a = slice($piddle,"-1:-2,:");

output:

[ [ 9 8] [ 19 18] [ 29 28] [ 39 38] [ 49 48] [ 59 58] [ 69 68] [ 79 78] [ 89 88] [ 99 98] [109 108] [119 118] ]

or alternatively:

perldl> p $slice_2b = $piddle->slice("-1:-2,:");

output:

[ [ 9 8] [ 19 18] [ 29 28] [ 39 38] [ 49 48] [ 59 58] [ 69 68] [ 79 78] [ 89 88] [ 99 98] [109 108] [119 118] ]
You can also access rectangular slices in a piddle by using NiceSlices (to use NiceSclices in a Perl script, you need to declare: use PDL::NiceSlice;). Here are some examples:
perldl> p $second_column = $piddle(1,:);

output:

[ [ 1] [ 11] [ 21] [ 31] [ 41] [ 51] [ 61] [ 71] [ 81] [ 91] [101] [111] ]
perldl> p $second_row = $piddle(:,1);

output:

[ [10 11 12 13 14 15 16 17 18 19] ]
perldl> p $first_to_third_columns = $piddle(0:2,:);

output:

[ [ 0 1 2] [ 10 11 12] [ 20 21 22] [ 30 31 32] [ 40 41 42] [ 50 51 52] [ 60 61 62] [ 70 71 72] [ 80 81 82] [ 90 91 92] [100 101 102] [110 111 112] ]
perldl> p $even_columns = $piddle(0:-1:2,:);

output:

[ [ 0 2 4 6 8] [ 10 12 14 16 18] [ 20 22 24 26 28] [ 30 32 34 36 38] [ 40 42 44 46 48] [ 50 52 54 56 58] [ 60 62 64 66 68] [ 70 72 74 76 78] [ 80 82 84 86 88] [ 90 92 94 96 98] [100 102 104 106 108] [110 112 114 116 118] ]
perldl> p $odd_columns = $piddle(1:-1:2,:);

output:

[ [ 1 3 5 7 9] [ 11 13 15 17 19] [ 21 23 25 27 29] [ 31 33 35 37 39] [ 41 43 45 47 49] [ 51 53 55 57 59] [ 61 63 65 67 69] [ 71 73 75 77 79] [ 81 83 85 87 89] [ 91 93 95 97 99] [101 103 105 107 109] [111 113 115 117 119] ]
perldl> p $even_rows = $piddle(:,0:-1:2);

output:

[ [ 0 1 2 3 4 5 6 7 8 9] [ 20 21 22 23 24 25 26 27 28 29] [ 40 41 42 43 44 45 46 47 48 49] [ 60 61 62 63 64 65 66 67 68 69] [ 80 81 82 83 84 85 86 87 88 89] [100 101 102 103 104 105 106 107 108 109] ]
perldl> p $odd_rows = $piddle(:,1:-1:2);

output:

[ [ 10 11 12 13 14 15 16 17 18 19] [ 30 31 32 33 34 35 36 37 38 39] [ 50 51 52 53 54 55 56 57 58 59] [ 70 71 72 73 74 75 76 77 78 79] [ 90 91 92 93 94 95 96 97 98 99] [110 111 112 113 114 115 116 117 118 119] ]
perldl> p $one_element_slice = $piddle(5,5);

output:

[ [55] ]

What can you do to access irregular slices? You can us the function dice. Here are some examples:

To get the elements that are simultaneously in columns 1 and 3 and rows 0 and 7, we type:

perldl> p $dice1 = dice($piddle, [1,3],[0,7]);

output:

[ [ 1 3] [71 73] ]

To get the elements that are simultaneously in all columns and rows 0, 1 and 8, we type:

perldl> p $dice2 = $piddle->dice(X,[0,1,8]);

output:

[ [ 0 1 2 3 4 5 6 7 8 9] [10 11 12 13 14 15 16 17 18 19] [80 81 82 83 84 85 86 87 88 89] ]

One thing about slices is that they do not occupy additional memory space. They are just representations of portions of the original piddle. Modifying a slice will modify the original piddle. Note: to assign values to slices, you must use the operator: .=. For example:

perldl> p $dice2 .= -99;

output:

[ [-99 -99 -99 -99 -99 -99 -99 -99 -99 -99] [-99 -99 -99 -99 -99 -99 -99 -99 -99 -99] [-99 -99 -99 -99 -99 -99 -99 -99 -99 -99] ]

The original piddle ($piddle) now has the following values:

perldl> p $piddle [ [-99 -99 -99 -99 -99 -99 -99 -99 -99 -99] [-99 -99 -99 -99 -99 -99 -99 -99 -99 -99] [ 20 21 22 23 24 25 26 27 28 29] [ 30 31 32 33 34 35 36 37 38 39] [ 40 41 42 43 44 45 46 47 48 49] [ 50 51 52 53 54 55 56 57 58 59] [ 60 61 62 63 64 65 66 67 68 69] [ 70 71 72 73 74 75 76 77 78 79] [-99 -99 -99 -99 -99 -99 -99 -99 -99 -99] [ 90 91 92 93 94 95 96 97 98 99] [100 101 102 103 104 105 106 107 108 109] [110 111 112 113 114 115 116 117 118 119] ]

To access a more general subset of a piddle, you could use the functions which and index. For example, if you wanted to assign a value of 255 to those elements of $piddle that originally had a value greater than 43 and less than 72, you could proceed as follows:

perldl> p $indx = intersect( which($piddle>43), which($piddle<72) );

output:

[44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71]

Note that which will return a one dimensional piddle with the indices of non-zero values of the mask you passed. Because our piddle is two dimensional, which flattens the piddle and then find the indices. which($piddle>43) will find the indices of the elements with a value greater than 43. which($piddle<72) will find the indices of the elements with a value less than 72. The function intersect finds the intersection of the two piddles.

p $piddle->flat->index( $indx ) .= 255; #assigning the new values

output:

[255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255]

Note that because which returns a one dimensional piddle, you have to flatten $piddle (by using the function flat) before using the function index. The new value of $piddle is:

perldl> p $piddle [ [-99 -99 -99 -99 -99 -99 -99 -99 -99 -99] [-99 -99 -99 -99 -99 -99 -99 -99 -99 -99] [ 20 21 22 23 24 25 26 27 28 29] [ 30 31 32 33 34 35 36 37 38 39] [ 40 41 42 43 255 255 255 255 255 255] [255 255 255 255 255 255 255 255 255 255] [255 255 255 255 255 255 255 255 255 255] [255 255 72 73 74 75 76 77 78 79] [-99 -99 -99 -99 -99 -99 -99 -99 -99 -99] [ 90 91 92 93 94 95 96 97 98 99] [100 101 102 103 104 105 106 107 108 109] [110 111 112 113 114 115 116 117 118 119] ]

Sample Perl Script

In this section, I present you a more elaborate example on how to access general subsets of a piddle. To have more fun, let's get one image from the web, read the image into a piddle and substitute the brightest pixels with very dark pixels (I chose to get the pixels with a value greater than 220 and to assign them a value of zero. However, you can certainly play with those numbers). Without further ado, here is the script.

#!/usr/bin/perl use warnings; use strict; use LWP::Simple; use PDL; use PDL::Graphics::PGPLOT; # getting a brain image from the ITK (http://itk.org) CVS website my $img_url = "http://www.itk.org/cgi-bin/viewcvs.cgi/*checkout*/Examp +les/Data/BrainProtonDensitySliceBorder20.png?rev=1.2&root=Insight"; my $savefilename = "brain.png"; my $status = getstore($img_url,$savefilename); print $status."\n" if is_success($status); # reading the image and creating a copy of it $brain = rim( “brain.png” ); $brain2 = $brain->copy; #plotting the two images in two different windows $dev = '/XSERVE'; $win = PDL::Graphics::PGPLOT::Window->new( { Dev => $dev } ); $win->imag( $brain ); $win2 = PDL::Graphics::PGPLOT::Window->new( { Dev => $dev } ); $win2->imag( $brain2 ); # The pixel values in the images are in the range [0, 255] # (with zero representing black and 255 representing white). # # Let's change all the pixel values greater than 220 to be # equal to zero. This will create some black spots in the # image $indx = which( $brain > 220 ); $brain2->flat->index( $indx ) .= 0; # Let's plot the new image $win2->imag( $brain2 ); # Now, we free up the windows so that you can close them. $win->close(); $win2->close();

Appendices

To learn more:

PDL Module Dependencies

PDL is a Perl interface to a number of useful libraries for scientific programming. If you want a smooth installation (either using tools as CPAN.pm or PPM or doing a manual installation), it is a good idea to install all the required libraries before installing PDL. To help you identify what you would need, here is a list of pre-requisites:

  • Perl 5.6.1+ (5.8.x recommended)
  • ExtUtils::MakeMaker (latest version)
  • C-compiler & Fortran compiler
  • PGPLOT libraries & C-binding (For PGPLOT)
  • OpenGL or Mesa (For TriD)
  • netpbm package (For PDL::IO::Pic)

You can find a more detailed list of pre-requisites at the PDL Project dependencies web page. For people using Debian or a Linux Distribution based on Debian, there is a list of dependencies available at Debian's site.

Replies are listed 'Best First'.
Re: RFC: Getting Started with PDL (the Perl Data Language)
by zentara (Archbishop) on Feb 03, 2007 at 11:34 UTC
    I think this is great. I do have a question/suggestion though.

    I have just been looking at the computational accuracy of the c language, specifically how accurate an angle can you precisely work with. It turns out that there is a 15 decimal point limit to PI, even if you declare it as a long double. So I was wondering, wouldn't it be good to mention the computational accuracy of PDL. For instance, can it use PI to 50 decimal places and still be accurate? Can a PDL data type hold huge numbers, or can piddles be a Math::BigInt, etc.

    I've been reading that Fortran (on which PDL is based) is superior to c in this regard, but is PDL?


    I'm not really a human, but I play one on earth. Cogito ergo sum a bum

      Hi zentara

      The core of PDL is written in the C programming language (see PDL::Internals). Fortran is only used in some libraries such as PGPLOT (used for plotting) and Slatec (used for manipulating matrices, calculate Fast Fourier Transforms, fit data using polynomials, interpolate data, etc.). About the piddles' data types, here is a table that will show you the data types already defined (in the file Basic/Core/Types.pm.PL):

      PDL Type Real C Type ---------- -------------- byte unsigned char short short ushort unsigned short long long int longlong long long int float float double double

      Note that you can add new types as explained in the PDL documentation. You might also want to read the documentation on PDL::PP to learn how to add your own routines.

      In short, I believe the computational accuracy of PDL to be closer to that of C than to the computational accuracy of Fortran.

      Cheers,

      lin0

        Dare i ask why they decided to change the data type names? I'm assuming it has to do something with how the data type is represented in perl?

        meh.
      It turns out that there is a 15 decimal point limit to PI, even if you declare it as a long double

      How does that come about ? 8 bytes are allocated for a double, and 12 bytes for a long double - I would therefore have envisaged that extra precision could be attained using the "long double".

      Cheers,
      Rob
        I'm not a c compiler expert, but have a look at double precision pi It may be the difference between computational precision and "%Lf" display precision. I started looking into it, when I questioned the way GLib defined pi as a constant. They defined it out to 50 decimal places, but any use of it in it's long double is limited to 15 decimal places.

        One of the c gurus in that thread said that c's precision is only gauranteed to 10 decimal places, but with IEEE standards, it's common to see 15. I see 15.

        Eventually, I found mpfr which lets you set how many digits of precision you want to use.

        If you can show me a simple c program that computes and displays values out to 50 decimal places, with normal c, I would be greatful. Everything I see truncates it (pi) to 3.(15 decimal places).

        For example, in this code, the value of pi is correctly printed as a string on the first line of output, but the subsequent lines all have garbage after the 15th decimal place.

        #include <gtk/gtk.h> /* gcc -o test test.c `pkg-config --cflags --libs gtk+-2.0` */ int main (){ /* how the headers define G_PI */ /* #define G_PI 3.1415926535897932384626433832795028841971693993751 + */ long double PI = G_PI; g_print("3.1415926535897932384626433832795028841971693993751\n"); g_print("%0.50e\n",G_PI); g_print("%0.50Lf\n",G_PI); g_print("%0.50Lf\n",PI); g_print("%0.50Lg\n",PI); g_print("%0.50Le\n",PI); return 0; }
        Output:
        ^^^^^^^^^^^ -> unstable after 15 dec +imal 3.1415926535897932384626433832795028841971693993751 3.14159265358979311599796346854418516159057617187500e+00 -0.00000000000000000000000000000000000000000000000000 3.14159265358979311599796346854418516159057617187500 3.141592653589793115997963468544185161590576171875 3.14159265358979311599796346854418516159057617187500e+00 ^^^^^^^^^^^^ -> unstable after 15 dec +imal

        I'm not really a human, but I play one on earth. Cogito ergo sum a bum

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://598007]
Approved by liverpole
Front-paged by liverpole
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (6)
As of 2024-03-19 05:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found