Nowadays AIs tend to use TensorFlow.
Sadly Perl's AI::TensorFlow module can't create new models. It can only load pre-trained models. (Well, it probably can if you know all the deepest magic of Tensorflow, C, MX and Libtensorflow.pm. I don't)
AI::MXNetwas a perfectly good solution until cuda 12 broke the entire build- and tool chains and Apache retired the MXNet project.
So visit this node for a Dockerfile that builds and runs AI::MXNet and get Perl to create your next AI!
Last week i released a simple graphics demo for the Linux terminal (Fun with terminal color).
The low framerate and the mostly static graphics bothered me a bit. So, i , uhm did it yet again. Another demo, this time using Inline::CPP and massaged versions of tinyraytracer and tinyraycaster to provide some actual graphical content. As a matter of fact, Inline::CPP didn't work for my borrowed(*) code, and my understanding of CPP is a 20 years out of date. So i override the Inline::CPP RecDescent module to ignore my bugs. Hey, it's not production code, just a demo...
As in the last demo, your Terminal needs to support full RGB colors and have a size of at least 270x60 in size (characters, not pixels). SDL is sort-of-optional this time; the demo will run without sound if it can't load SDL. And as said above, you'll need to install Inline::CPP as well.
And the YouTube video: https://www.youtube.com/watch?v=MWcuI2SXA-A. OBS and YT compression did munge the quality a bit, though. Probably my fault for not understanding the OBS settings...
use warnings::colored;
warn "warning"; # yellow
system "non-existant-command"; # red
say "test"; # none
eval { die "caught" }; # none
say $@; # none
die "died"; # red
Yesterday i thought i might fix some bugs in my imagecat routine (printing images to the terminal). But somehow i, uhm, improved things a little too far. If you're now thinking "Oh no, he did it again!", i'm afraid, yes, i did.
This time, i made a 3 minute animated demo with sound, running at (some) FPS in a Linux terminal.
You can find a download of the current version as a tarball or you could clone my public mercurial repo if you prefer that. (Edit: Link to mercurial has changed/updated)
The demo needs quite a few CPAN modules, i'm afraid, including the somewhat hard-to-install SDL bindings. (For SDL, you might have to change a checksum in the Alien::SDL Build.PL and ignore some test errors in SDL as well.) I'm using SDL for the sound output.
Also, your Terminal needs to support full RGB colors /see the Imagecat - show color images in a terminal thread) and have a size of at least 270x60 in size (characters, not pixels).
If you want to avoid the hassle of getting it to work on your system, you can watch the YouTube video instead.
Was looking for a decent algorithm for determining the ETA of a long-running process, but everything on CPAN uses simplistic and inaccurate algorithms. Found this great article Benchmarking I/O ETA algorithms and converted the Acceleration algorithm to perl. And yes, it would be better to extract the state components into an object.
use Time::HiRes qw(time);
sub eta {
my ($cur, $total, $time) = @_;
return unless $cur and $time;
state ($last_progress, $last_time, $last_v, $last_eta);
state (@v, @eta, $window_size, $window_idx);
state $init = do {
($last_progress, $last_time, $last_v, $last_eta) = (0, 0, 0, -
+1);
($window_size, $window_idx) = (10, 0);
};
state $sub_v_weight = sub { 1 + $_[0] };
state $sub_eta_weight = sub { $_[0] ? 2 * $_[1] : 1 };
state $sub_weighted_avg = sub {
my ($sub_weight, $avg, $total_weight, $w) = (shift, 0, 0, 0);
for my $i (0 .. $#_) {
# first version messed up the index.
my $j = ($i + @_ - $window_idx - 1) % @_;
$w = $sub_weight->($j, $w);
$avg += $w * $_[$i];
$total_weight += $w;
}
return $avg / $total_weight;
};
my $v = ($cur - $last_progress) / (($time - $last_time) || 1);
$v[$window_idx] = $v;
$v = $sub_weighted_avg->($sub_v_weight, @v);
if ($v and $last_v) {
my ($min_v, $max_v) = $v < $last_v ? ($v, $last_v) : ($last_v,
+ $v);
$v = $last_v + ($v - $last_v) * $min_v / $max_v;
}
my $a = ($v - $last_v) / ($last_time ? ($time - $last_time) : 1);
my $r = $total - $cur;
my $eta = $last_eta;
if ($a and 0 < (my $d = ($v * $v + 2 * $a * $r))) {
$eta = (sqrt($d) - $v) / $a;
}
elsif ($v) { $eta = $r / $v }
$eta[$window_idx] = $eta;
$eta = $sub_weighted_avg->($sub_eta_weight, @eta);
($last_progress, $last_time, $last_v, $last_eta, $window_idx)
= ($cur, $time, $v, $eta, ($window_idx + 1) % $window_size);
return $eta > 0 ? $eta : 0;
}
A program written in a hurry some time ago to munge file paths generally for file systems for Unix(-like) OSen & specifically for FAT32.
Learned the hard way that NTFS would allow file names to be written to FAT32 even if some characters are outside of FAT32 specification. Problematic characters seemed to be en- & em-dash, fancy quotes, pipe, Unicode "?", & possibly few others (web pages saved with title as the file name). Mounting FAT32 file system on FreeBSD with specific codepage(s), or "nowin95" or "shortnames" mount options did not help (mount_msdosfs(8)). Munging it was then🤷🏽♂️
Many emoji have embedded characters which are difficult, or impossible, to see;
for example, zero-width joiners, variation selectors, skin tone modifiers.
In some cases, glyphs are so similar that it's difficult to tell them apart; e.g. 🧑 & 👨.
I wrote uparse to split emoji, strings containing emoji, and in fact any strings with Unicode characters,
into their component characters.
#!/usr/bin/env perl
BEGIN {
if ($] < 5.007003) {
warn "$0 requires Perl v5.7.3 or later.\n";
exit;
}
unless (@ARGV) {
warn "Usage: $0 string [string ...]\n";
exit;
}
}
use 5.007003;
use strict;
use warnings;
use open IO => qw{:encoding(UTF-8) :std};
use constant {
SEP1 => '=' x 60 . "\n",
SEP2 => '-' x 60 . "\n",
FMT => "%s\tU+%-6X %s\n",
NO_PRINT => "\N{REPLACEMENT CHARACTER}",
};
use Encode 'decode';
use Unicode::UCD 'charinfo';
for my $raw_str (@ARGV) {
my $str = decode('UTF-8', $raw_str);
print "\n", SEP1;
print "String: '$str'\n";
print SEP1;
for my $char (split //, $str) {
my $code_point = ord $char;
my $char_info = charinfo($code_point);
if (! defined $char_info) {
$char_info->{name} = "<unknown> Perl $^V supports Unicode
+"
. Unicode::UCD::UnicodeVersion();
}
printf FMT, ($char =~ /^\p{Print}$/ ? $char : NO_PRINT),
$code_point, $char_info->{name};
}
print SEP2;
}
Here's a number of example runs.
All use <pre> blocks;
a very few didn't need this but I chose to go with consistency.
Works with ASCII (aka Unicode: C0 Controls and Basic Latin)
$ uparse X XY "X Z"
============================================================
String: 'X'
============================================================
X U+58 LATIN CAPITAL LETTER X
------------------------------------------------------------
============================================================
String: 'XY'
============================================================
X U+58 LATIN CAPITAL LETTER X
Y U+59 LATIN CAPITAL LETTER Y
------------------------------------------------------------
============================================================
String: 'X Z'
============================================================
X U+58 LATIN CAPITAL LETTER X
� U+9 <control>
Z U+5A LATIN CAPITAL LETTER Z
------------------------------------------------------------
$ uparse 🇨🇭
============================================================
String: '🇨🇭'
============================================================
🇨 U+1F1E8 REGIONAL INDICATOR SYMBOL LETTER C
🇭 U+1F1ED REGIONAL INDICATOR SYMBOL LETTER H
------------------------------------------------------------
Handles codepoints not yet assigned; or not supported with certain Perl versions
$ uparse `perl -C -e 'print "X\x{1fa7c}X"'`
============================================================
String: 'X🩼X'
============================================================
X U+58 LATIN CAPITAL LETTER X
🩼 U+1FA7C CRUTCH
X U+58 LATIN CAPITAL LETTER X
------------------------------------------------------------
$ uparse `perl -C -e 'print "X\x{1fa7c}X"'`
============================================================
String: 'X🩼X'
============================================================
X U+58 LATIN CAPITAL LETTER X
� U+1FA7C <unknown> Perl v5.30.0 supports Unicode 12.1.0
X U+58 LATIN CAPITAL LETTER X
------------------------------------------------------------
$ uparse `perl -C -e 'print "X\x{1fa7d}X"'`
============================================================
String: 'XX'
============================================================
X U+58 LATIN CAPITAL LETTER X
� U+1FA7D <unknown> Perl v5.39.3 supports Unicode 15.0.0
X U+58 LATIN CAPITAL LETTER X
------------------------------------------------------------
This is my somewhat generic framework to process mails in specific folders in Outlook. The concrete use case here is to find and save PDFs that haven't been processed yet.
The script could also move mails or even reply to them, but the intention is to co-exist with human users of this shared mailbox, so the script scans several mail folders for files with an unknown name.
For more information on the object model (and especially the MailItem and Folder class), see the MS Outlook object model.
Many epubs come with unprofessional CSS that will not display correctly on some ebook readers. For instance, the font size may be illegibly small on a mobile device, or the user may have dark mode turned on, but the CSS specifies element foreground colors according to an assumed (but not specified) white background, so there is little or no contrast with the actual black background. I recently wrote a script to detect epubs with those problems, then one to detect and fix them.
My first attempt at this used EPUB::Parser, but I soon found that it didn't (as far as I could tell) have the functionality I needed to get at the internal CSS files and edit them. So I fell back on Archive::Zip (which EPUB::Parser uses) -- an epub is a zip file containing css, html, and xml files (and sometimes jpg's, etc.).
Here, I present two of the trickier functions; inverse_color() is passed a CSS color value of some kind (which can be a wide array of formats), calculates a complementary color, and returns it. It makes use of functions from Graphics::ColorUtils to map CSS color names to rgb values. It is called by fix_css_colors() when it finds a CSS block containing a color: attribute but no background-color: attribute.
sub inverse_color {
my $color = shift;
die "Missing argument to inverse_color()" unless $color;
state $color_names;
if ( not $color_names ) {
#set_default_namespace("www");
$color_names = available_names();
}
$color =~ s/^\s+//;
$color =~ s/\s+$//;
if ( $color =~ /^#[[:xdigit:]]{3}$/ ) {
$color =~ s/#//;
my $n = hex $color;
my $i = 0xFFF - $n;
my $inverse = sprintf "#%03x", $i;
return $inverse;
} elsif ( $color =~ /^#[[:xdigit:]]{6}$/ ) {
$color =~ s/#//;
my $n = hex $color;
my $i = 0xFFFFFF - $n;
my $inverse = sprintf "#%06x", $i;
return $inverse;
} elsif ( $color =~ /rgb \s* \( \s* ([0-9]+) \s* , \s* ([0-9]+) ,
+\s* ([0-9]+) \s* \) /x ) {
my ($r, $g, $b) = ($1, $2, $3);
my $n = $r * 65536 + $g * 256 + $b;
printf "converted %s to %06x\n", $color, $n if $verbose;
my $i = 0xFFFFFF - $n;
my $inverse = sprintf "#%06x", $i;
return $inverse;
} elsif ( $color =~ /rgba \s* \( \s* ([0-9]+) \s* , \s* ([0-9]+) ,
+ \s* ([0-9]+) \s* , \s* ([0-9.]+) \s* \) /x ) {
my ($r, $g, $b, $alpha) = ($1, $2, $3, $4);
my $inverse = sprintf "rgba( %d, %d, %d, %0.2f )", 255 - $r, 255 -
+ $g, 255 - $b, 1 - $alpha;
return $inverse;
} elsif ( $color =~ /hsl \s* \( \s* ([0-9]+) \s* , \s* ([0-9]+)%
+, \s* ([0-9]+)% \s* \) /x ) {
my ( $hue, $saturation, $lightness ) = ($1, $2, $3);
my $hue2 = ($hue + 180) % 360;
my $sat2 = 100 - $saturation;
my $light2 = 100 - $lightness;
my $inverse = sprintf "hsl( %d, %d%%, %d%% )", $hue2, $sat2, $ligh
+t2;
return $inverse;
} elsif ( $color =~ /hsla \s* \( \s* ([0-9]+) \s* , \s* ([0-9]+)%
+ , \s* ([0-9]+)% \s* , \s* ([0-9.]+) \s* \) /x ) {
my ( $hue, $saturation, $lightness, $alpha ) = ($1, $2, $3, $4);
my $hue2 = ($hue + 180) % 360;
my $sat2 = 100 - $saturation;
my $light2 = 100 - $lightness;
my $alpha2 = 1 - $alpha;
my $inverse = sprintf "hsl( %d, %d%%, %d%%, %0.2f )", $hue2, $sat2
+, $light2, $alpha2;
return $inverse;
} elsif ( $color =~ /currentcolor/i ) {
warn "Should have removed currentcolor in fix_css_colors()";
} elsif ( $color =~ /inherit/i ) {
return "inherit";
} elsif ( $color_names->{ "www:". $color} or $color_names->{ $colo
+r} ) {
my $hexcolor = name2rgb( $color );
if ( not $hexcolor ) {
$hexcolor = name2rgb( "www:" . $color );
if ( not $hexcolor ) {
die "Can't resolve color name $color";
}
}
$hexcolor =~ s/#//;
my $i = 0xFFFFFF - hex($hexcolor);
my $inverse = sprintf "#%06x", $i;
return $inverse;
} else {
die "Color format not implemented: $color";
}
}
sub fix_css_colors {
my ($csstext, $css_fn, $epub_fn) = @_;
return if not $csstext;
my $errors = 0;
my $corrections = 0;
my $printed_filename = 0;
say "Checking $epub_fn:$css_fn for bad colors\n" if $verbose;
# this might be a good use of negative lookbehind?
my @css_blocks = split /(})/, $csstext;
for my $block ( @css_blocks ) {
if ( $block =~ m/color: \s* ( [^;]+ ) \s* (?:;|$) /x ) {
my $fgcolor = $1;
print "found color: $fgcolor\n" if $verbose;
if ( $fgcolor =~ m/currentcolor/i ) {
$block =~ s/(color: \s* currentcolor \s* ;? \s* ) \n* //xi;
print "Stripping out $1 as it is a pleonasm\n" if $verbose;
$corrections++;
next;
}
if ( $block !~ m/background-color:/ ) {
my $bgcolor = inverse_color( $fgcolor );
$block =~ s/(color: \s* [^;}]+ \s* (?:;|$) )/background-color:
+ $bgcolor;\n$1/x;
print "corrected block:\n$block\n}\n" if $verbose;
$corrections++;
}
}
}
if ( $corrections ) {
my $new_css_text = join "", @css_blocks;
return $new_css_text;
} else {
return undef;
}
}
A while ago I wrote a podcatcher in Perl. In the last few days I've finally gotten around to cleaning it up a bit, finishing the documentation, and getting it out where people can use it (on my website for now -- maybe I'll try to submit it to CPAN at some point).
The full code (and associated files) can be found at http://jimhenry.conlang.org/software/podcatcher.zip and the documentation (including per-function summaries) at http://jimhenry.conlang.org/software/podcatcher.html
Here, I'll just briefly discuss one of the functions that gave me some trouble, given the variety of podcast RSS feeds out there and how weirdly (sometimes invalidly) formatted some of them are.
This function is passed an RSS feed as a single string and attempts to extract the podcast episode URLs from it. First it tries to parse the RSS using XML::RSS::LibXML. Then, if that worked, it tries to find episodes in <enclosure> tags, then if that fails, it tries looking in <media:content> tags. If it failed to parse the RSS file, or if it parsed and failed to find any podcasts in the appropriate tags, it does a brute force regular expression match on the whole RSS file to find anything that starts with http and ends with one of the file extensions we're looking for (which is configurable).
sub get_mp3_links_from_string {
my $pagecontent = shift;
my @episodes;
my $parser = XML::RSS::LibXML->new;
# for some bizarre reason, putting curly brackets around this eval
+ generates
# syntax errors. use q// instead.
eval q/ $parser->parse($pagecontent) /;
if ( $@ ) {
writelog "Could not parse page as XML/RSS: $@\n";
$parser = undef;
}
if ( $parser ) {
foreach my $item (@{ $parser->{items} }) {
my $ep;
if ( defined $item->{enclosure} ) {
if ( $ep = $item->{enclosure}{url} and $ep =~ m!$extension_reg
+ex$! ) {
push @episodes, { url => $ep };
} elsif ( $ep = $item->{media}{content}{url} and $ep =~ m!$ext
+ension_regex$! ) {
push @episodes, { url => $ep };
}
next if not $ep;
} else {
next;
}
if ( $config{description} ) {
$episodes[ $#episodes ]->{title} = $item->{title};
$episodes[ $#episodes ]->{description} = $item->{description};
}
} # end for each <item>
} # end if we have a valid parse
unless ( @episodes ) {
writelog "Found no $config{extensions} files by parsing XML, check
+ing via regex for any $config{extensions} links in any context\n";
my @mp3s = uniq ( $pagecontent =~ m/(http[^\s>]+$extension_re
+gex)/gi );
return undef unless ( @mp3s );
foreach ( @mp3s ) {
push @episodes, { url => $_ };
}
}
return \@episodes; # @mp3s;
}
The MCE Sandbox repository is where I try writing fast code using Perl MCE + Inline::C, Math::Prime::Util, and the C/C++ libprimesieve library. The demos and examples folders are new for the 2023 update. I learned Codon, a Python-like language that compiles to native code.
.Inline/ Where Inline::C is configured to cache C object file
+s.
bin/
algorithm3.pl Practical sieve based on Algorithm3 from Xuedong Luo
+ [1].
primesieve.pl Calls the primesieve.org C API for generating primes
+.
primeutil.pl Utilizes the Math::Prime::Util module for primes.
demos/
primes1.c Algorithm3 in C with OpenMP directives.
primes2.codon Algorithm3 in Codon, a Python-like language.
primes3.c Using libprimesieve C API in C
primes4.codon Using libprimesieve C API in Codon
examples/ Progressive demonstrations.
practicalsieve.c single big loop
segmentsieve.c segmented variant, faster
rangesieve.c process range; start stop
prangesieve.c parallel rangesieve in C
cpusieve.codon parallel rangesieve in Codon (CPU)
gpusieve.codon parallel rangesieve in Codon (GPU)
pgpusieve.codon using Codon @par(gpu=True) syntax
cudasieve.cu using NVIDIA CUDA Toolkit
lib/
Sandbox.pm Common code for the bin scripts.
CpuAffinity.pm CPU Affinity support on Linux.
src/
algorithm3.c Inline::C code for algorithm3.pl.
bits.h Utility functions for byte array.
output.h Fast printing of primes to a file descriptor.
primesieve.c Inline::C code for primesieve.pl.
sandbox.h Header file, includes bits.h, output.h, sprintull.h.
sprintull.h Fast base10 to string conversion.
typemap Type-map file for Inline::C.
Snippets of code should be wrapped in
<code> tags not<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).