Dataflow programming on CPU and GPU using AI::MXNet

by bliako (Monsignor)
on Dec 18, 2019 at 12:04 UTC

Computational pipelines, often called Dataflows, are just Graphs describing the interaction of Data with Operators and other Data to produce more Data.

a*b + c*d is a Dataflow.

And so is a (feed forward) Neural Network: read the input, distribute the input matrix to the first layer, pass it through activations, sum the outputs and distribute them to the second layer, and so on until the output layer. Deep Neural Networks are so complex, deep and convoluted that researchers thought dataflow programming (which is not a new field) would aid their training and use. And so TensorFlow, MXNet and others were created.
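
As a taste of what such a graph looks like, here is a minimal, hypothetical sketch of a tiny feed-forward network described purely as a symbolic dataflow, using the AI::MXNet module introduced below (the operator names come from MXNet's symbol API; the layer sizes here are arbitrary, my own illustration):

use strict; use warnings;
use AI::MXNet qw(mx);

# a tiny two-layer feed-forward network, described purely as a graph;
# no data flows through it yet - this is just the dataflow description
my $data = mx->symbol->Variable('data');
my $fc1  = mx->symbol->FullyConnected(data => $data, num_hidden => 16);
my $act1 = mx->symbol->Activation(data => $fc1, act_type => 'relu');
my $fc2  = mx->symbol->FullyConnected(data => $act1, num_hidden => 2);
my $net  = mx->symbol->SoftmaxOutput(data => $fc2, name => 'softmax');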

The really unfortunate thing is that most of these frameworks are investing heavily in Python interfaces, even though internally they use C/C++, for obvious reasons :) . The fact that Python is a TIOOWTDI (There Is Only One Way To Do It) regime will lead this field to serious cargo-culting (e.g. "In a sparse network, it's more likely that neurons are actually processing meaningful aspects of the problem.", which I had already seen when working with Neural Networks in the mid-90's), script-kiddie practices and eventual stagnation. Of course the field will recover. And faster and sooner, if and when Python is replaced or other high-level (script) languages are equally supported. Nothing stops the machine taking over ...

In Perl, there is an excellent set of modules written by Sergey Kolychev under AI::MXNet, based on Apache's MXNet. Note that it is actively maintained and very recently updated: Feb 23, 2019 !!! That's a good sign.

My CPU and I have spent a lot of cycles trying to install the prerequisite libraries of Apache's MXNet, which are written in C/C++ and also offer CUDA capabilities. Two hints: do not use the github repo version, and build with the Makefile instead of cmake.

Now that I have it all installed (the MXNet libraries, the Perl modules, and also the R package - unrelated to this post), I would like to share with you just what kind of doors this package opens, by introducing a basic dataflow operating on scalars and matrices, both on CPU and GPU! The very fact that this package offers, on the side, GPU capabilities within Perl makes it, in my opinion, very very promising. I can't see much on CPAN offering GPU access at the moment, and here we have one package which opens both the GPU and the Deep Learning worlds to Perl hackers. (I have no affiliation with S.Kolychev whatsoever)

So, here is some code to get you started on implementing a pipeline to calculate e = a*b + c*d. At first the AI::MXNet::Symbol objects (a, b, c, d, e) will stand for scalars, represented as 1-dimensional AI::MXNet::NDArray objects (which are somewhat similar to PDL's arrays):

use strict;
use warnings;

use AI::MXNet qw(mx);

# specify which context we want this dataflow to be executed in
# for CPU use mx->cpu(0); (note: (0) does not mean a cpu id or similar, it is not used)
# for GPU use mx->gpu(0); (0 denotes the gpu device id)
my $ctx = mx->cpu(0);

# create 1D arrays with values 1, 2, 3, 4 for a, b, c, d respectively
# these are DATA
my $a_data = mx->nd->array([1], ctx => $ctx);
my $b_data = mx->nd->array([2], ctx => $ctx);
my $c_data = mx->nd->array([3], ctx => $ctx);
my $d_data = mx->nd->array([4], ctx => $ctx);

# these are SYMBOLS
my $a = mx->symbol->Variable('A');
my $b = mx->symbol->Variable('B');
my $c = mx->symbol->Variable('C');
my $d = mx->symbol->Variable('D');

# this is the EXPRESSION to evaluate,
# basically our dataflow graph (still no data on it, just a description)
my $e = ($a*$b) + ($c*$d);
print "e=".$e."\n";

# this is how we associate data with symbols and specify
# whether we want to run this on CPU or GPU
my $exe = $e->bind(
    ctx  => $ctx,
    # this is how we bind data to symbols so that our dataflow graph
    # can be "executed" and a result comes out.
    # Note: create the arrays on the same device as you execute them on,
    # i.e. the context, ctx, must be the same here and above when
    # creating the arrays.
    args => {'A'=>$a_data, 'B'=>$b_data, 'C'=>$c_data, 'D'=>$d_data}
);

# propagate inputs to the output(s)
$exe->forward(1);

# we need the first (and, in this case, only) output as a PDL array
print "output: ".$exe->outputs->[0]->aspdl."\n";

And this is the result (indeed, 1*2 + 3*4 = 14):

output: [14]

Now, let's replace the 1-dimensional data with 2x2 arrays! Just create your data like this:

# 2d array data, initialised to these rows (edit: updated to specify ctx too)
my $a_data = mx->nd->array([[1,2],[3,4]], ctx => $ctx);
my $b_data = mx->nd->array([[5,6],[7,8]], ctx => $ctx);
my $c_data = mx->nd->array([[9,10],[11,12]], ctx => $ctx);
my $d_data = mx->nd->array([[13,14],[15,16]], ctx => $ctx);

But wait: the * operator on matrices denotes element-wise multiplication, not actual matrix multiplication. For the latter we use:

my $e = $a->dot($b) + $c->dot($d);

And the result is:

[
 [286 308]
 [366 396]
]
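
You can sanity-check this result with plain PDL on the CPU, since PDL's x operator performs matrix multiplication (a minimal cross-check, independent of MXNet):

use strict; use warnings;
use PDL;

# the same matrices as above, in plain PDL
my $A = pdl [[1,2],[3,4]];
my $B = pdl [[5,6],[7,8]];
my $C = pdl [[9,10],[11,12]];
my $D = pdl [[13,14],[15,16]];

# 'x' is PDL's matrix multiplication operator (same precedence as '*')
print $A x $B + $C x $D;

which prints the same 2x2 matrix.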

As if evaluating ginormous matrix expressions using a dataflow framework (which will also parallelise when parallelisation is possible) were not enough, we have here another huge door opening for you number crunchers and Perl hackers: the ability to do all this on the GPU via CUDA! Just replace the context above with the GPU-related code. And there is light! In this way, CUDA can be used from Perl not only for neural network stuff, but for any kind of computation which you can express as a dataflow. For example, load two images as AI::MXNet::NDArray objects and multiply them on the GPU! Or do some signal processing by loading an mp3 voice file as an NDArray! (this paragraph was a later addition)
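
For instance, a minimal sketch of multiplying two matrices element-wise on the GPU might look like this (this assumes an MXNet build with CUDA support and a visible GPU device 0; NDArray arithmetic operators are overloaded just like the symbolic ones):

use strict; use warnings;
use AI::MXNet qw(mx);

# run on GPU device 0 (requires MXNet built with CUDA)
my $ctx = mx->gpu(0);

# two matrices created directly in GPU memory
my $x = mx->nd->array([[1,2],[3,4]], ctx => $ctx);
my $y = mx->nd->array([[5,6],[7,8]], ctx => $ctx);

# element-wise product, computed on the GPU
my $z = $x * $y;

# copy the result back to the CPU and print it as a PDL array
print $z->aspdl, "\n";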

One thing that does not add up: the AI::MXNet::NDArray documentation says "However, NDArray is row-major, unlike the PDL that is column-major." But PDL is row-major too! So keep that in mind.

Sources+more reading:

If that does not keep you excited over the Festivus, I don't know what will ...

Thank you Sergey Kolychev ++

bw, bliako

ps. some last 5 min additions and cosmetic changes. Also note the comments in the code provided for more information.

Replies are listed 'Best First'.
Re: Dataflow programming on CPU and GPU using AI::MXNet
by Ea (Chaplain) on Dec 19, 2019 at 13:45 UTC
    Do you think you will be able to make a series of posts about how to use AI::MXNet? It's something that I would appreciate reading.

    I've been meaning to dip my toes into PDL as a way of working my way into Data Science-y things without needing to do everything according to Python recipes. I think Ovid's post has stirred up some interest in exploring and promoting tools in Perl. There's a lot out there in CPAN already. It just needs some love and blogging, by which I mean taking the time to explain how to do fun things with numbers, etc.

    Which is just what we do.

    such as 100 PDL exercises and 101 PDL exercises

    Ea

    Sometimes I can think of 6 impossible attributes before breakfast.

      Do you think you will be able to make a series of posts about how to use AI::MXNet?

      Yes, that's my intention. I will post here when I make more progress. The above was my first visit to MXNet-land. And I wanted to stress the fact that 1) you get GPU access on the side, and 2) the underlying dataflow "engine" of MXNet does parallelisation and lazy evaluation as it sees fit, without the user having to be concerned, so this sits quite a bit higher up than more traditional ways of doing such computations. And applications using these features do not have to be neural networks...

      If anyone has any real applications outside neural networks involving tensor/matrix/vector operations, then we can work out a demo of the above two features. This (I mean non-neural-network apps) will expand the user base of AI::MXNet.

      Next stop neural networks. Possibly sooner than a month.

      bw, bliako

        PDL natively has forward dataflow (if you opt in by using $pdl->doflow - see the next PDL CPAN version for an updated PDL::Dataflow doc that lays all this out, or check out the GitHub version in Basic/Pod), and has for decades.

        Lazy evaluation currently only happens with slice-like operations, but there are plans to switch to lazy evaluation everywhere, in order to have not only loop-fusion (creating custom operations that would e.g. compute a*b + c*d with only one round-trip from RAM through the CPU and back) but also GPU processing. See https://github.com/PDLPorters/pdl/issues/349 for discussion and/or to participate!
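
        For the curious, a minimal sketch of that opt-in forward dataflow, along the lines of the PDL::Dataflow synopsis:

        use strict; use warnings;
        use PDL;

        my $x = sequence(5);   # [0 1 2 3 4]
        $x->doflow;            # opt in to forward dataflow on $x
        my $y = $x * 2;
        print "$y\n";          # [0 2 4 6 8]

        $x .= 5;               # in-place update of $x ...
        print "$y\n";          # ... flows forward into $y: [10 10 10 10 10]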
