I've finally got around to extending to my long-running Perl vs C++ Performance series
by timing some Roman to Decimal Rosetta PGA-TRAM code on Ubuntu.
Generating the Test Data
You'll need to install the Roman module from CPAN
(or simply copy Roman.pm locally)
to generate the test data by running:
# gen-roman.pl
use strict;
use warnings;
use Roman;
for my $n (1..1000) {
for my $i (1..3999) {
my $r = int(rand(2)) ? uc(roman($i)) : lc(roman($i));
print "$r\n";
}
}
with:
perl gen-roman.pl >t1.txt
which will generate a test file t1.txt containing 3,999,000 Roman Numerals.
Running the Benchmark
With that done, you can run rtoa-pgatram.pl below (derived from Rosetta PGA-TRAM) with:
$ perl rtoa-pgatram.pl t1.txt >pgatram.tmp
which produced on my laptop:
rtoa pgatram start
read_input_files : 1 secs
roman_to_arabic : 7 secs
output : 0 secs
total : 8 secs
rtoa-pgatram.pl
# rtoa-pgatram.pl
# Example run: perl rtoa-pgatram.pl t1.txt >pgatram.tmp
#
# Convert a "modern" Roman Numeral to its arabic (decimal) equivalent.
# The alpabetic input string may be assumed to always contain a valid
+Roman Numeral in the range 1-3999.
# Roman numerals may be upper or lower case.
# Error handling is not required.
# For example:
# input "XLII" should produce the arabic (decimal) value 42
# input "mi" should produce the arabic (decimal) value 1001
use 5.010; # Needed for state
use strict;
use warnings;
use List::Util qw(reduce);
sub read_input_files {
my $files = shift; # in: reference to a list of files containin
+g Roman Numerals (one per line)
my @list_ret; # out: reference to a list of the Roman Numer
+als in the files
for my $fname ( @{$files} ) {
open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
while (<$fh>) {
chomp;
push @list_ret, uc($_);
}
close($fh) or die "error: close '$fname': $!";
}
return \@list_ret;
}
# Function roman_to_arabic
# Input: reference to a list of valid Roman Numerals in the range 1..
+3999
# Output: reference to a list of their arabic (decimal) values
sub roman_to_arabic {
my $list_in = shift; # in: reference to a list of valid Roman Nu
+merals
my @list_ret; # out: a list of their integer values
state %rtoa = ( M=>1000, D=>500, C=>100, L=>50, X=>10, V=>5, I=>1 )
+;
for (@{$list_in}) {
push @list_ret, reduce { $a+$b-$a%$b*2 } map { $rtoa{$_} } split
+//, uc($_);
}
return \@list_ret;
}
@ARGV or die "usage: $0 file...\n";
my @rtoa_files = @ARGV;
warn "rtoa pgatram start\n";
my $tstart1 = time;
my $aref1 = read_input_files( \@rtoa_files );
my $tend1 = time;
my $taken1 = $tend1 - $tstart1;
warn "read_input_files : $taken1 secs\n";
my $tstart2 = time;
my $aref2 = roman_to_arabic($aref1);
my $tend2 = time;
my $taken2 = $tend2 - $tstart2;
warn "roman_to_arabic : $taken2 secs\n";
my $tstart3 = time;
for my $n ( @{$aref2} ) { print "$n\n" }
my $tend3 = time;
my $taken3 = $tend3 - $tstart3;
my $taken = $taken1 + $taken2 + $taken3;
warn "output : $taken3 secs\n";
warn "total : $taken secs\n";
I was relieved that this ran a little faster than rtoa-roman.pl, which is just a copy of rtoa-pgatram.pl above
that uses Roman's arabic function instead of rtoa-pgatram.pl's pgatram algorithm; that is with:
push @list_ret, reduce { $a+$b-$a%$b*2 } map { $rtoa{$_} } split//,
+ uc($_);
above replaced with:
use Roman;
...
push @list_ret, arabic($_);
$ perl rtoa-roman.pl t1.txt >roman.tmp
rtoa roman start
read_input_files : 1 secs
roman_to_arabic : 11 secs
output : 1 secs
total : 13 secs
$ diff roman.tmp pgatram.tmp
Please feel free to reply with alternative Perl roman_to_arabic subroutines,
especially if they are faster.
Roman to Arabic subroutines in other languages are also welcome.
C++ Version
// rtoa-pgatram.cpp
// Compile with:
// g++ -o rtoa-pgatram -std=c++20 -Wall -O3 rtoa-pgatram.cpp
// or:
// clang++ -o rtoa-pgatram -std=c++20 -Wall -O3 rtoa-pgatram.cpp
// or:
// g++ -o rtoa-pgatram -std=c++20 -Wall -O3 -I "$HOME/local-fast_io/
+fast_io/include" rtoa-pgatram.cpp
// to use the locally installed fast_io header-only library
#include <cctype>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
#include <numeric>
#include <chrono>
#include <iomanip>
// See [id://11149504] for more info on the fast_io C++ library
// #include <fast_io.h>
// -------------------------------------------------------------------
+---------
typedef std::chrono::high_resolution_clock high_resolution_clock;
typedef std::chrono::high_resolution_clock::time_point time_point;
typedef std::chrono::milliseconds milliseconds;
double elaspe_time(
time_point cend,
time_point cstart)
{
return double (
std::chrono::duration_cast<milliseconds>(cend - cstart).count()
) * 1e-3;
}
// -------------------------------------------------------------------
+---------
using vec_str_type = std::vector<std::string>;
using vec_int_type = std::vector<int>;
#define MAX_LINE_LEN_L 63
// Read an input file of Roman Numerals and append them to a list
// Return the number of Roman Numerals appended
static int read_input_file(
const char* fname, // in: file name containing a list of
+Roman Numerals
vec_str_type& vec_ret) // out: a vector of Roman Numeral strin
+gs
{
FILE* fh;
char line[MAX_LINE_LEN_L + 1];
int cnt = 0;
fh = ::fopen(fname, "r");
if ( fh == NULL ) {
std::cerr << "Error opening '" << fname << "' : " << strerror(er
+rno) << "\n";
return 0;
}
while ( ::fgets( line, MAX_LINE_LEN_L, fh ) != NULL ) {
size_t len = ::strlen(line);
if (len > 0 && line[len-1] == '\n') line[--len] = '\0'; // remov
+e newline
vec_ret.emplace_back(line);
++cnt;
}
::fclose(fh);
return cnt;
}
// ---------------------------------------------------------------
// Though there are less than 256 initializers in this ascii table,
// the others are guaranteed by ANSI C to be initialized to zero.
static const int romtab[256] = {
0,0,0,0,0,0, 0, 0, 0, 0, // 00- 09
0,0,0,0,0,0, 0, 0, 0, 0, // 10- 19
0,0,0,0,0,0, 0, 0, 0, 0, // 20- 29
0,0,0,0,0,0, 0, 0, 0, 0, // 30- 39
0,0,0,0,0,0, 0, 0, 0, 0, // 40- 49
0,0,0,0,0,0, 0, 0, 0, 0, // 50- 59
0,0,0,0,0,0, 0, 100, 500, 0, // 60- 69
0,0,0,1,0,0, 50,1000, 0, 0, // 70- 79
0,0,0,0,0,0, 5, 0, 10, 0, // 80- 89
0,0,0,0,0,0, 0, 0, 0, 100, // 90- 99
500,0,0,0,0,1, 0, 0, 50,1000, // 100-109
0,0,0,0,0,0, 0, 0, 5, 0, // 110-119
10,0,0,0,0,0, 0, 0, 0, 0 // 120-129
};
// Return the arabic number for a roman letter c.
// Return zero if the roman letter c is invalid.
inline int urtoa(int c) { return romtab[c]; }
inline int accfn(int t, char c) {
return t + urtoa(c) - t % urtoa(c) * 2;
}
inline int roman_to_dec(const std::string& s) {
return std::accumulate( s.begin(), s.end(), 0, accfn );
}
int main(int argc, char* argv[])
{
if (argc < 2) {
std::cerr << "usage: rtoa-pgatram file...\n";
return 1;
}
// Get the list of input files from the command line
int nfiles = argc - 1;
char** fname = &argv[1];
std::cerr << std::setprecision(3) << std::setiosflags(std::ios::fix
+ed);
time_point cstart1, cend1, cstart2, cend2, cstart3, cend3;
// Read the input files into roman_list
vec_str_type roman_list;
cstart1 = high_resolution_clock::now();
int cnt = 0;
for (int i = 0; i < nfiles; ++i) {
cnt += read_input_file( fname[i], roman_list );
}
cend1 = high_resolution_clock::now();
double ctaken1 = elaspe_time(cend1, cstart1);
std::cerr << "read_input_files : " << cnt << " items\n";
std::cerr << "read file time : " << std::setw(8) << ctaken1 << "
+ secs\n";
// Convert roman to decimal
vec_int_type arabic_list;
cstart2 = high_resolution_clock::now();
for ( auto const& r : roman_list ) {
arabic_list.emplace_back( roman_to_dec(r) );
}
cend2 = high_resolution_clock::now();
double ctaken2 = elaspe_time(cend2, cstart2);
std::cerr << "roman_to_dec time : " << std::setw(8) << ctaken2 << "
+ secs\n";
// Output to stdout
cstart3 = high_resolution_clock::now();
for ( auto const& i : arabic_list ) {
std::cout << i << '\n';
// fast_io::io::println(i);
}
cend3 = high_resolution_clock::now();
double ctaken3 = elaspe_time(cend3, cstart3);
std::cerr << "output time : " << std::setw(8) << ctaken3 << "
+ secs\n";
return 0;
}
$ g++ -o rtoa-pgatram -std=c++20 -Wall -O3 rtoa-pgatram.cpp
$ ./rtoa-pgatram t1.txt >cpp.tmp
read_input_file : 3999000 items
read file time : 0.264 secs
roman_to_dec time : 0.095 secs
output time : 0.247 secs
$ diff roman.tmp cpp.tmp
Update: when built with the fast_io C++ library, the output time is reduced:
output time : 0.098 secs
For details on building C++ with the header-only fast_io library, search for fast_io in this node.
References
See Also
- Math::Roman - arbitrary sized Roman numbers and conversion from and to Arabic
Updated rtoa-pgatram.cpp to allow a list of files in argc/argv[] rather than just one (thanks marioroy).
Also made minor improvements to rtoa-pgatram.cpp (including how to build it with the fast_io library).
Removed rtoa-roman.pl since it is just rtoa-pgatram.pl with its roman_to_arabic function replaced by Roman's arabic function.
Re: Risque Romantic Rosetta Roman Race
by tybalt89 (Monsignor) on May 10, 2023 at 22:39 UTC
|
Maybe hash operations are slow, so let's do away with them :)
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11152065
use warnings;
{
my @r2d;
@r2d[qw(1 12 2 13 3 34 4 35 5 56 6 57 7)] =
qw(1 4 5 9 10 40 50 90 100 400 500 900 1000);
use List::Util qw( sum );
sub roman_to_dec3
{
sum @r2d[ shift =~ tr/ivxlcdmIVXLCDM/1-71-7/r =~ /5?[67]|3?[45]|1?[2
+3]|1/g ]
}
}
my @testdata = ( qw( XLII LXIX mi cd xcix mdclxvi cdxliv
i ii iii iv v vi vii viii ix x xi xii xiii xiv xv) );
for my $r (@testdata) {
print "$r: ", roman_to_dec3($r), "\n";
}
| [reply] [d/l] |
|
Now that, ladies and gentlemen, is a proper hack. Bravo!
I am so glad you are here, coming up with approaches like this. They inspire me to think more laterally and I'm sure that even pondering this problem for a week I would not have come up with such a solution.
| [reply] |
Re: Risque Romantic Rosetta Roman Race - MCE Hash Reduce
by marioroy (Prior) on May 10, 2023 at 10:24 UTC
|
I combined reading, processing, and output into a small chunking routine utilizing MCE. The parallel Perl code may run faster than C, depending on the number of logical CPU cores. This includes tybalt89's optimization.
I updated rtoa-pgatram.pl to include the same optimization. What I find interesting is the time difference compared to real time for the C demonstration. The delta time is likely global cleanup. Parallelization involved running on physical and logical CPU cores via the taskset utility. Hence, the reason not seeing 8x comparing one thread and eight threads. For reference, running on eight physical cores completed in 0.814 seconds (7.2x).
$ time ./rtoa-pgatram t1.txt >cpp.tmp
read_input_file : 3999000 items
read file time : 0.136 secs
roman_to_dec time : 0.116 secs
output time : 0.187 secs
real 0m0.450s
user 0m0.416s
sys 0m0.034s
$ time perl rtoa-pgatram.pl t1.txt >pgatram.tmp
rtoa pgatram start
read_input_files : 1 secs
roman_to_arabic : 4 secs
output : 1 secs
total : 6 secs
real 0m6.259s
user 0m6.131s
sys 0m0.128s
$ time perl rtoa-pgatram-mce.pl t1.txt >mce.tmp
rtoa pgatram start
time 1 thread : 5.808 secs
time 8 threads : 1.151 secs ( 5.05x)
time 16 threads : 0.643 secs ( 9.03x)
time 32 threads : 0.347 secs (16.74x)
time 64 threads : 0.225 secs (25.81x)
1 thd 8 thds 16 thds 32 thds 64 thds
real 0m5.835s 0m1.178s 0m0.671s 0m0.375s 0m0.252s
user 0m5.832s 0m9.064s 0m8.874s 0m8.935s 0m9.698s
sys 0m0.008s 0m0.024s 0m0.030s 0m0.073s 0m0.136s
Thank you, for the interesting series. This allowed me to practice using MCE.
$ cksum cpp.tmp pgatram.tmp mce.tmp
1397892467 18888000 cpp.tmp
1397892467 18888000 pgatram.tmp
1397892467 18888000 mce.tmp
| [reply] [d/l] [select] |
|
marioroy, I remain in awe of your MCE masterwork, the most impressive contribution to CPAN in the past ten years IMHO.
> I updated rtoa-pgatram.pl to include the same optimization
Is this the tybalt89 optimization? ... or is there another optimization I missed?
It takes 32 logical cores for your Perl/MCE version to catch up to my C++ version 1.0. Is that right?
If so, you might need to work a little harder, because I just updated my C++ version in the root node
with an option to apply a one-line change utilizing the quirky C++ fast_io library (which I think you know about ;-) ...
no surprise the output time dropped from 0.247 secs to 0.098 secs.
After the long running Rosetta Code: Long List is Long saga (which I think you know about ;-),
there are many more C++ tricks yet to be tried in this node,
such as OpenMP, Boost,
abseil, Judy Arrays ...
though I honestly don't feel inclined to try for a repeat saga. :)
| [reply] |
|
Is this the tybalt89 optimization? ... or is there another optimization I missed?
Yes.
It takes 32 logical cores for your Perl/MCE version to catch up to my C++ version 1.0. Is that right?
That was done using 16 physical and 16 logical CPU cores via taskset -c 0-15,32-47. BTW, I captured the UNIX time to include any global cleanup. It now takes the entire CPU (64 logical threads) for Perl MCE 1.0 to run faster. :) The Perl time includes launching Perl, loading modules, spawning and reaping workers (~ 0.06 secs).
# captured UNIX time
C++ 1.0 : 0.450s
C++ fast_io : 0.291s
Perl MCE 64 thds : 0.252s
I tried also, an ARRAY for indexed-based lookups. But, that runs slower. Edit: Tried unpack, tip by tybalt89. ARRAY lookup is now faster.
# HASH
my %rtoa = ( M=>1000, D=>500, C=>100, L=>50, X=>10, V=>5, I=>1 );
# ARRAY, characters M D C L X V I
my @rtoa; @rtoa[qw( 77 68 67 76 88 86 73 )] = qw( 1000 500 100 50 10 5
+ 1 );
Perl MCE 64 thds : 0.252s @rtoa{ split //, uc($_) };
Perl MCE 64 thds : 0.297s @rtoa[ map ord, split //, uc($_) ];
Perl MCE 64 thds : 0.192s @rtoa[ unpack 'c*', uc($_) ];
| [reply] [d/l] [select] |
|
|
|
|
|
|
Re: Risque Romantic Rosetta Roman Race
by choroba (Cardinal) on May 10, 2023 at 14:59 UTC
|
my %from_roman = (
I => 1,
V => 5,
X => 10,
L => 50,
C => 100,
D => 500,
M => 1000,
);
sub from_roman (_) {
my ($roman) = @_;
my $n = 0;
while ($roman =~ s/(I[VXLCDM]|X[LCDM]|C[DM])//) {
my ($minus, $plus) = split //, $1;
$n += $from_roman{$plus} - $from_roman{$minus};
}
$n += $from_roman{$_} for split //, $roman;
$n
}
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [d/l] [select] |
Re: Risque Romantic Rosetta Roman Race
by tybalt89 (Monsignor) on May 10, 2023 at 08:44 UTC
|
Since you already have timing code set up, could you please replace
map { $rtoa{$_} } split//, uc(shift)
with
@rtoa{ split//, uc(shift) }
and see if it is any faster?
| [reply] [d/l] [select] |
|
@rtoa{ split//, uc(shift) }
to:
@rtoa{ split//, uc($_) }
and it ran significantly faster:
$ perl rtoa-pgatram-tybalt.pl t1.txt >tybalt.tmp
rtoa pgatram start
read_input_files : 1 secs
roman_to_arabic : 4 secs
output : 1 secs
total : 6 secs
$ diff tybalt.tmp pgatram.tmp
| [reply] [d/l] [select] |
|
{
my %rtoa = ( M=>1000, D=>500, C=>100, L=>50, X=>10, V=>5, I=>1 );
sub roman_to_dec {
reduce { $a+$b-$a%$b*2 } @rtoa{ split//, uc(shift) }
}
}
| [reply] [d/l] |
|
@rtoa{ split//, uc } # save uc from having to process an argument by
+letting it just default to $_
| [reply] [d/l] |
|
|
|
| [reply] |
Re: Risque Romantic Rosetta Roman Race
by tybalt89 (Monsignor) on May 10, 2023 at 17:01 UTC
|
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11152065
use warnings;
{
my %r2d = qw(m 1000 cm 900 d 500 cd 400 c 100 xc 90 l 50 xl 40 x 10 ix
+ 9 v 5 iv 4 i 1);
use List::Util qw( sum );
sub roman_to_dec2
{
sum @r2d{ lc(shift) =~ /c?[md]|x?[cl]|i?[xv]|i/g }
}
}
my @testdata = ( "XLII", "LXIX", "mi" );
for my $r (@testdata) {
print "$r: ", roman_to_dec2($r), "\n";
}
Speed unknown, however the cuteness IS known.
EDIT: Added missing 'cd 400' to hash setup.
| [reply] [d/l] |
Re: Risque Romantic Rosetta Roman Race
by tybalt89 (Monsignor) on May 13, 2023 at 16:42 UTC
|
Grumble... grumble... I hate doing speed optimization...
Golf is at least more logical, with speed opts strange things make a difference.
{
my @rtoa;
@rtoa[unpack 'c*', 'MDCLXVImdclxvi'] = (qw(1000 500 100 50 10 5 1))
+ x 2;
sub roman_to_dec {
my $n = 0;
$n += $_ - $n % $_ * 2 for @rtoa[ unpack 'c*', shift ];
$n;
}
}
Faster than the reduce version, and faster than the uc(shift) version...
| [reply] [d/l] |
|
So fast, tybalt89! Faster than Hash Reduce, Hash Regex, and Array Reduce. I added Array ForLoop to the results.
| [reply] [d/l] |
Re: Risque Romantic Rosetta Roman Race
by tybalt89 (Monsignor) on May 12, 2023 at 16:27 UTC
|
{
my %r2d;
@r2d{qw(
i ii iii iv v vi vii viii ix
x xx xxx xl l lx lxx lxxx xc
c cc ccc cd d dc dcc dccc cm
m mm mmm
)} = map { my $n = $_; map $n * $_, 1 .. 9 } 1, 10, 100, 1000;
use List::Util qw( sum );
sub roman_to_dec3
{
sum @r2d{ (lc shift) =~ /
ix|iv|iii|ii|i|
viii|vii|vi|v|
xxx|xx|xl|xc|x|
lxxx|lxx|lx|l|
cm|cd|ccc|cc|c|
dccc|dcc|dc|d|
mmm|mm|m|
./gx }
}
}
Tests faster than your 'reduce' version on my machine...
| [reply] [d/l] |
|
# max_workers => 1
time reduce : 4.581 secs
time regex : 4.658 secs
| [reply] [d/l] [select] |
|
| [reply] |
|
|
|