Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

The Ovidian Transform

by Ovid (Cardinal)
on Dec 31, 2001 at 13:21 UTC ( #135353=perlmeditation: print w/ replies, xml ) Need Help??

How to gain a 50% performance increase over the Schwartzian Transform

Note: Where else could I get away with a title as cheeky as this one? :) If anyone can make suggestions to improve the C code, that would be welcome. I'd also like to point out that the title is a joke. There is nothing revolutionary about using C to resolve Perl performance bottlenecks. Also, this mode was more an exploration of Inline::C and not an attempt to speed up the Schwartzian Transform.

I hate forgetting things that I once knew. I returned to college when I was in my mid-twenties because I hated the fact that I had forgotten how to speak French¹. You can understand, therefore, why it bugs me that I no longer know how to program in C. I started programming in the early 80s, writing games in BASIC. BASIC, as most of you know, runs as fast and elegantly as a dead gazelle. In order to speed up the critical portions of my programs, I taught myself assembler. It was tedious, but I had to do it. You can imagine how delighted I was when I learned C in college.

That was 15 years ago. I haven't programmed in C since. I've been trying to improve my resume and started Java classes. I'm going to keep taking them because knowing Java will tremendously expand my marketability, but I really don't care for Java². So, I was browsing through Powell's technical bookstore here in Portland, Oregon, and picked up a copy of O'Reilly's "Practical C". It was a godsend.

Relearning C is fine and dandy, but putting it to some practical use is finer and, um, dandier. So, I went to the CPAN and downloaded Inline. The Inline module allows you to embed other programming languages, such as C, Java, or Python, in Perl. If you have extensive libraries written in another language, it's great to be able to simply reuse them rather than rewrite them in Perl. In the case of Inline::C (which is bundled with Inline), you can also gain significant performance benefits.

If you've ever tried to embed C in Perl without Inline, I feel sorry for you. I never considered it due to the daunting nature of XS and SWIG. With Inline, it's almost trivial. I installed Inline using Cygwin on a Windows 98 machine. The install was flawless, though it took longer to install than any previous module. One of the sections of make test took 504 wallclock seconds! However, having such a large module install without problems on a Cygwin box told me that someone had done an excellent job of things.

After the install, I ran pod2html on Inline.pod, C.pod, and C-Cookbook.pod. Reading through them was fairly straightforward. Inline.pod tells you to go ahead and skip to the Cookbook if you want to get started right away, but I wouldn't. The documentation can be a bit dry at times, but it's very useful and the first time some of your embedded C fails to compile, you'll be grateful for reading the documentation because that is chock full 'o tips and tricks of which you should be aware.

After working through many of the examples in the Cookbook, I decided it was time to try writing something myself. My first inclination, to be fair, was to try writing CGI::Inline. Many people balk at the performance of the CGI module, but since I need to walk before I can run (and I don't have as much free time as I would like), I figured I would would start with something small. The Schwartzian Transform (ST) seemed like a perfect candidate for this. The ST is one of the fastest sorting methods available, so trying to speed it up a bit with C seemed like a nice challenge. The basic structure of the ST is this:

@new_list = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, expensive_function( $_ ) ] } @old_list;

Basically, you take an array of data and turn every element into a 2 item array reference with first item in the reference being the original data and the second item in the array being the data you want to sort on (in a sortable form, of course). Then, you sort the array by the second item and map the original data back to an array.

Hmm... what should I optmize? Well, if &expensive_function isn't crying out for some help, I shouldn't be programming - of course, some would argue that I shouldn't be programming anyway, but I try not to listen to the voices coming from the toaster.

I decided to tackle the traditional problem of sorting strings with an embedded date. I created an arbitrary data format like the following:

a_bunch_of_random_letters|19670620|and_some_more_random_letters

I need to grab that date as efficiently as possible. I used the following Perl snippet:

sub get_date_perl { my $date = substr( $_[0], index( $_[0], '|' ) + 1, 8 ); return join '', unpack( "A4A2A2",$date ); }

Using substr, index, and unpack is about as efficient as you can get. Now, I need to figure out how to do that with Inline. Passing a scalar to an Inline function and getting back a scalar is fairly simple. I merely define the the correct return type of the function (SV*), pass it a string and return a new scalar value. I do this with this:

SV* get_date( char* str ) { /* some stuff goes here */ return newSVpv(date,8); }

I merely call the function with a char* foo argument and padding it with a NUL is handled for me. Getting the date from this is trivial. Here's the full function as I would embed it in Perl.

#!/usr/bin/perl -w use strict; use Inline C =><<'END_OF_C_CODE'; #include <string.h> SV* get_date( char* str ) { char date[9]; /* the date to return (with an extra byte for th +e NUL) */ int index = 0; /* index of character in string + */ /* note the lack of error checking. I have a guaranteed date forma +t (in generated test data). You wouldn't do this in the real world. +*/ while ( str[index++] != '|' ); strncpy( date, &str[index], 8); return newSVpv( date, 8 ); } END_OF_C_CODE

If you prefer, you can also add the C code after the __DATA__ token or in another module or file. Now, the first time this is run, it will take quite some time as Inline has to go through all of the steps necessary to compile the C and make it accessible to Perl. On subsequent runs, Inline generates a checksum for the C code and, if the checksum is the same, Inline will not recompile, as it assumes that the C has not changed. You can override this behavior if you wish.

Once this has been done, you call the C function like you would call any other function:

my $date = get_date( $string );

Now that I have my Perl code and C code to test against, I run some benchmarks. There's no point in repeating them here as the two ran neck and neck. Repeatedly. I had taken too simple of a task and Perl was too efficient. I decided to complicate things a bit. Many Americans use a MMDDYYYY date format. Now, I not only have to grab the date, I need to rearrange the digits into a sortable YYYYMMDD format. Here's the Perl:

sub get_date_perl { my $date = substr( $_[0], index( $_[0], '|' ) + 1, 8 ); # skipping the scalars and doing return join '', unpack... # did not result in a performance improvement. my ( $month, $day, $year ) = unpack( "A2A2A4",$date ); return $year.$month.$day; }

Here's the C code I whipped up:

SV* get_date (char* str) { char date[9]; /* the date to return */ char new_date[9]; /* this will be the date after reordering */ int index = 0; /* index of character in string */ while ( str[index++] != '|' ); strncpy( date, &str[index], 8); new_date[0] = date[4]; new_date[1] = date[5]; new_date[2] = date[6]; new_date[3] = date[7]; new_date[4] = date[0]; new_date[5] = date[1]; new_date[6] = date[2]; new_date[7] = date[3]; return newSVpv(new_date,8); }

Now we have something just a tad more interesting. It turns out that we also have something that's over 20% faster in C:

Benchmark: timing 50 iterations of Inline, Perl... Inline: 41 wallclock secs (40.72 usr + 0.00 sys = 40.72 CPU) @ 1 +.23/s (n=50) Perl: 53 wallclock secs (53.21 usr + 0.00 sys = 53.21 CPU) @ 0 +.94/s (n=50)

Now, that's nice and all, but I was hoping for something a bit more dramatic. Taking a closer look at the ST, we see that the last two steps are a sort, which returns a list, and a map, which returns a list. Wouldn't it be nice if the sort just returned what we wanted? Imagine doing this:

@new_list = map_sort { $a->[1] <=> $b->[1] } map { [ $_, expensive_function( $_ ) ] } @old_list;

Of course, that won't work, but I started thinking a bit more about this. My initial idea was to write a bit of C code that would read the array off the stack, create a struct with the original data and the date, sort on the date, and just return the original data. Then, I had a weird inspiration. Now, my original idea is probably better, as the following is filled with dangers and will not always be appropriate for your task, but it should work in many cases.

I remembered that bikeNomad had asked How to make bi-modal variables like $! ?. After some discussion, and taking an idea from tye, he came up with this code snippet. It occurred to me that I could strip the date, make the original value bi-modal, sort on the numeric value and return the string value all in one go! I call this (pardon the hubris) "The Ovidian Transform" :) It just has one map and one sort. Here are the benchmarks:

Benchmark: timing 50 iterations of Inline, Ovidian, Perl... Inline: 41 wallclock secs (40.72 usr + 0.00 sys = 40.72 CPU) @ 1 +.23/s (n=50) Ovidian: 34 wallclock secs (34.71 usr + 0.00 sys = 34.71 CPU) @ 1 +.44/s (n=50) Perl: 53 wallclock secs (53.21 usr + 0.00 sys = 53.21 CPU) @ 0 +.94/s (n=50)

I ran them several times and got consistent results. A performance improvement of about 50% over the Schwartzian Transform. Full code is below. Enjoy!

#!/usr/bin/perl -w use strict; use Inline 'C'; use Benchmark; use vars qw/ $test_data @perl_data @inline_data @ovid_data /; my $date = 'abc|03152001|'; my $c_date = get_date( $date ); my $p_date = get_date_perl( $date ); if ( $c_date eq $p_date ) { print "Dates match: '$c_date'\n"; } else { print "Dates do not match.\nC: '$c_date'\nPerl: '$p_date'\nTermina +ting program.\n"; exit; } print "Generating test data.\n"; $test_data = rand_data(); timethese(50, { 'Perl' => ' @main::perl_data = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, main::get_date_perl( $_ ) ] } @$main::test_data; ', 'Inline' => ' @main::inline_data = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, main::get_date( $_ ) ] } @$main::test_data; ', 'Ovidian' => ' @main::ovid_data = sort { $a+0 <=> $b+0 } map { main::get_date_c( $_ ) } @$main::test_data; ' }); # change to 1 if you want to print this to verify results if ( 0 ) { open PERL, "> perl.txt" or die $!; open INLINE, "> inline.txt" or die $!; open OVID, "> ovid.txt" or die $!; print PERL join "\n", @perl_data; print INLINE join "\n", @inline_data; print OVID join "\n", @ovid_data; close OVID; close INLINE; close PERL; } sub get_date_c { my $data = shift; my $date = get_date( $data ); my $dual_var; set_both( $dual_var, $data, $date+0 ); return $dual_var; } sub get_date_perl { my $date = substr( $_[0], index( $_[0], '|' ) + 1, 8 ); my ( $month, $day, $year ) = unpack( "A2A2A4",$date ); return $year.$month.$day; } sub rand_data { my @years = qw/ 1999 2000 2001 /; my @months = ( '01' .. '12' ); my @days = ( '01' .. '28' ); # since this is just sample data, # I'm not too worried about getting # this perfect my @sample; for ( 1 .. 10000 ) { my $garbage = rand_letters().'|'; $garbage .= $months [ random( 12 ) ]; $garbage .= $days [ random( 28 ) ]; $garbage .= $years [ random( 3 ) ]; $garbage .= '|'.rand_letters(); push @sample, $garbage; } return \@sample; } sub rand_letters { my $foo = ''; for ( 1 .. int( 100 * rand ) + 1 ) { $foo .= ( 'a'..'z' )[ random( 26 ) ]; } return $foo; } sub random { my $int = shift; return int ( $int * rand ); } __END__ __C__ void set_both(SV* variable, SV* string, SV* numeric) { SvPV(string, PL_na); if(!SvPOKp(string) || (!SvNOKp(numeric) && !SvIOKp(numeric)) ) { croak("Usage: set_both variable,string,numeric"); } sv_setsv(variable,string); if(SvNOKp(numeric)) { sv_setnv(variable,SvNV(numeric)); } else { sv_setiv(variable,SvIV(numeric)); } SvPOK_on(variable); } SV* get_date (char* str) { char date[9]; /* the date to return */ char new_date[9]; /* this will be the date after reordering */ int index = 0; /* index of character in string */ while ( str[index++] != '|' ); strncpy( date, &str[index], 8); new_date[0] = date[4]; new_date[1] = date[5]; new_date[2] = date[6]; new_date[3] = date[7]; new_date[4] = date[0]; new_date[5] = date[1]; new_date[6] = date[2]; new_date[7] = date[3]; return newSVpv(new_date,8); }

1.Perversely, I have forgotten French again. I was in Paris buying a pack of cigarettes and asked the cashier if she had any matches (les allumettes). I said "Pardon, avez-vous des alouttes?" (Pardon me, do you you have any larks?). It was most confusing.

2.It's bloated, slow, has irritating limitations like not being able to subclass String to add methods, etc. Very annoying. On the other hand, it has taught me quite a bit about OO programming and has a tighter type-checking system than Perl. And when was the last time you wrote an applet in Perl?

Cheers,
Ovid

Update: I'm trying to remember which monks helped me with some C questions. wog was one. /msg Ovid ... if you also helped. I'd like to give credit where credit is due!

Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Comment on The Ovidian Transform
Select or Download Code
(tye)Re: The Ovidian Transform
by tye (Cardinal) on Dec 31, 2001 at 21:28 UTC

    There are lots of ways to get about a 50% speed increase over an ST. Creating all of those tiny anonymous arrays isn't a speedy task. It does lead to rather simple and non-error-prone code, however.

    One alternative that is as general purpose is to build a separate array of keys to sort on and then sort a list of indices from that:

    my @sorted= do { my @sorton= map { get_date($_) } @data; my @indices= sort { $sorton[$a] cmp $sorton[$b] } 0..@data; @data[@indices]; };
    Another alternative that can be slightly faster but is a bit harder to customize is to combine the key to sort on with the original data into a single string in a reversible manner and use:
    my @sorted= map { restore_orig_data($_) } sort map { prepend_sort_key($_) } @data;
    See also http://www.sysarch.com/perl/sort_paper.html or even http://www.perl.com/doc/FMTEYEWTK/sort.html.

            - tye (but my friends call me "Tye")

      Good one, tye. I've had great success with your second alternative, prepending the sort key to the original data, making a single string and sorting that.

      It has been at *least* 50% faster than the "standard" ST technique, in the instances where I've used it (your mileage may vary, of course :). Mostly, those instances were cases where the data was complex enough that sorting was slow to begin with...

      Update: While this approach probably cannot compete with the raw speed of C, the cost-benefit ratio is far more favorable (to me), given the size of the learning curve I would require in order to be effective using XS (which is essentially what using C in this context boils down to).

      dmm

      You can give a man a fish and feed him for a day ...
      Or, you can
      teach him to fish and feed him for a lifetime
Re: The Ovidian Transform
by hsmyers (Canon) on Dec 31, 2001 at 21:34 UTC
    You might try:
    SV* get_date (char* str) { char date[9]; /* the date to return */ char new_date[9]; /* this will be the date after reordering */ while ( *str++ != '|' ); strncpy( date, str, 8); new_date[0] = date[4]; new_date[1] = date[5]; new_date[2] = date[6]; new_date[3] = date[7]; new_date[4] = date[0]; new_date[5] = date[1]; new_date[6] = date[2]; new_date[7] = date[3]; return newSVpv(new_date,8); }
    Might make a small difference…

    –hsm

    "Never try to teach a pig to sing…it wastes your time and it annoys the pig."
      // ... while ( *str++ != '|' ); strncpy( date, str, 8); // ...

      Given typical current C optimizers, this is unlikely to produce a measurable improvement, although it is definitely more succinct and therefore more elegant (IMHO).

      dmm

      You can give a man a fish and feed him for a day ...
      Or, you can
      teach him to fish and feed him for a lifetime

      While the lack of error checking would certainly be foolish, you can micro-optimize this even more.

      The while() loop isn't smart enough to compile down to a 'scasb' instruction, whereas if you use 'memchr', the compiler is smart enough to play some games with it. While I haven't checked the code strncpy() uses, I know for a fact that memcpy() will reduce to either a 'rep movsb' sequence, or more likely for only 8 bytes, 2 loads and 2 stores (4 instructions). If you're on an Alpha, it'll be 2 instructions, a single load and a single store.

      strncpy() is portable and is good in that respect. memcpy() is less so, since it presumes certain things about the processor architecture.

      /me is glad to see Ovid start hacking Inline::C. It's a lot of fun. Perl is good, but not always perfect. Perl combined with Inline::C can let you do some truly scarey things.

      --Chris

      e-mail jcwren

      Why try to speed up something that you don't know is responsible for your speed problems? Even more, why try to speed up something if you don't know you HAVE speed problems?

      More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason -- including blind stupidity.
          -- W. A. Wulf, "A Case Against the GOTO"

      Words to live by.

          -- Chip Salzenberg, Free-Floating Agent of Chaos

        Well the snap answer is “Why not?” But more to the point why write code that is less efficient than need be? In fact, you can't even get away with the canonical 'lazy' as a reason, since what I wrote takes less typing. And the declaration of an un-needed variable doesn't do a whole lot for clarity—particularly in a routine which already has the potential for confusion, i.e. the two container arrays, where the second is used for the cross-wired transfer, something that might at casual glance seem a puzzle. So since it is clearer, faster, and shorter, I'd argue a win! Re: Mr. Wulf, nice quote, but in this context it somehow reminds me of
        A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do. – Ralph Waldo Emerson “Self Reliance”

        –hsm

        "Never try to teach a pig to sing…unless of couse if it's New Year's Eve! Happy New Year to All!."

        I really have to go with chip on this one. I played around with optimization for about 1/2 an hour and found that nothing I could do affected the performance by more than 1% or so.

        I did not have as much time as I would have liked to spend on it, so I did not run profiling tools on that section, nor look at the actual code generation. I focused on that small section because I have a lot of experience with C, and frequently have to do optimizations of that nature because of the microcontrollers I work with (microcontroller compilers typically produce far worse code than GCC, since GCC benefits from a wide developer base, and most microcontroller compilers are 1 to 5 person shops).

        My other interest in speeding it up was a trade off with making it fewer lines. Those 12 or so lines just seemed nasty. If it can be reduced to fewer lines, it often becomes more maintainable. Given that, I have the following replacement:

        SV* get_date (char* str) { char new_date[9]; while (*str++ != '|'); memcpy (new_date, str+4, 4); memcpy (new_date+4, str, 4); return newSVpv(new_date,8); }

        Normally, Inline::C uses -O2 optimization, which is fairly agressive about fragments like that. It's smart enough to recognize that memcpy()s can be replaced with several instructions if the length to move reduces to a compile time constant.

        One would question if the above is actually more maintainable, since the byte copy of new_date[0] = date[4]; is more indicative of what the operation is intending to do. If one is aiming for speed and readability, the new_date[0] = date[4]; relies more on the working knowledge of the compiler.

        No matter what the case, this isn't really the area that needs work to speed this puppy up.

        --Chris

        e-mail jcwren

      If you're not counting on some sort of optimization based on alignment why do a string copy at all?
      while ( *str++ != '|' ); date[0] = str[4]; date[1] = str[5]; date[2] = str[6]; date[3] = str[7]; date[4] = str[0]; date[5] = str[1]; date[6] = str[2]; date[7] = str[3];
      Or am I not remembering C correctly?

        p

        You are correct the strncpy is superfluous. The string terminator is missing but newSVpv() has been called with an explicit length so it doesn't matter.

        All in all I would prefer jcwren's methodology.

        --
        John (wishes there was a C-monks).

Re: The Ovidian Transform
by jmcnamara (Monsignor) on Jan 02, 2002 at 21:47 UTC

    I played around with various ways of speeding up the Perl function. This was the fastest:
    sub get_date_perl2 { $_[0] =~ m/[|](\d{4})(\d{4})[|]/; return $2.$1; }
    I added this to the benchmarks as Perl2:
    Benchmark: timing 50 iterations of Inline, Ovidian, Perl, Perl2... Inline: 19 wallclock secs (18.72 usr + 0.01 sys = 18.73 CPU) @ 2 +.67/s Ovidian: 17 wallclock secs (17.08 usr + 0.08 sys = 17.16 CPU) @ 2 +.91/s Perl: 25 wallclock secs (24.48 usr + 0.08 sys = 24.56 CPU) @ 2 +.04/s Perl2: 24 wallclock secs (23.36 usr + 0.14 sys = 23.50 CPU) @ 2 +.13/s Rate Perl Perl2 Inline Ovidian Perl 2.04/s -- -4% -24% -30% Perl2 2.13/s 5% -- -20% -27% Inline 2.67/s 31% 25% -- -8% Ovidian 2.91/s 43% 37% 9% --
    However, I wouldn't put this in production code for a 5% saving. I would probably use split and unpack for clarity.

    --
    John.

Re: The Ovidian Transform
by BrowserUk (Pope) on Aug 15, 2002 at 05:22 UTC

    I know its a long time since, but tye's latest super-search threw this node at me earlier today whilst I was looking at sorting.

    Silly comment and all, but if you do (have?) ever make this a module, you should consider changing the name to

    • "The Ovidian Order".

    In tribute to the sneakiest organisation that Garak never belonged to. He would have approved of your tailored approach!


    What's this about a "crooked mitre"? I'm good at woodwork!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://135353]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (5)
As of 2014-10-02 02:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    What is your favourite meta-syntactic variable name?














    Results (43 votes), past polls