PAR
by rinceWind
on Jul 06, 2004 at 12:15
PAR has saved my day! It will go a long way towards helping me in my crusade of Perl advocacy in the present $CLIENT (who shall remain nameless) environment. [2]
The situation is that we have three Unix (Solaris) environments: dev, test and live. Although Solaris comes with a version of perl pre-installed, the admins have restricted access to it on all but the dev machine (under some historical edicts from senior management).
Now, as part of my development activities, I have been producing a nice set of Perl tools (as is my wont) to get my job done. Many of these would benefit production, especially production support, which is closer to my current role.
PAR for the course
+----+
foo.pl===>| pp |====>foo
+----+
Just like cc, pp turns your Perl file (and all the modules it calls) into an executable. It isn't actually compiling them, just packaging them up (and compressing them).
My colleagues feel a lot happier about the (potentially) tamper-proof nature of executables, and I can deploy my Perl 'scripts' into the test and live environments. Funnily enough, they haven't had a problem with deploying .ksh or .awk scripts.
Caveat removed (thanks are due to PodMaster for helping me sort this one out).
Considered: MUBA move to meditations: not actually a review: it's more like giving compliments but that's it
Unconsidered: ysth - Keep/Edit/Delete = 6/9/0 - moving from reviews isn't supported
Update:
I do feel that bibo and MUBA have a point, and this node should be more of a review (see also this node). Here goes...
Peeping under the hood
Although PAR delivers applications that work out of the box [1], it is worth examining what the output of pp actually is.
The resulting executable is a self extracting zip file, and tools such as winzip or unzip -l can reveal the contents of the file. Besides the various perl modules called, you will find script/foo.pl (this being your source script) and script/main.pl. Note that you don't see Perl itself or any of the magic glue that makes your application run.
script/main.pl is a small bootstrap script that calls your script, and it looks like this:
my $zip = $PAR::LibCache{$ENV{PAR_PROGNAME}} || Archive::Zip->new(__FILE__);
my $member = eval { $zip->memberNamed('script/foo.pl') }
    or die qq(Can't open perl script "script/foo.pl": No such file or directory ($zip));
PAR::_run_member($member, 1);
The first time a PAR application (or a new version of it) is run, it is unzipped into a temporary directory: $TEMP/par_$USER/cache_nnnnnnnnnnn
The nnnnnnnnnnn here is an MD5 checksum, hence a new version of the PAR application will generate a new cache directory.
The first time the application is run, there will be a significant startup time, as the zip kit is unpacked. It is worthwhile explaining this to users.
Run time requires
Unfortunately, PAR (and Module::ScanDeps) will not detect all modules, especially those whose name is determined at run time. If you are missing a module or two, you need to include the modules in the build - either by adding explicit use or require statements in the script (useful for Tk widget modules), or by specifying -M to pp.
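To see why, here is the kind of construct that defeats static scanning: the module name only exists at run time. (File::Basename is purely an illustrative stand-in here; it happens to be a core module.)

```perl
# Module::ScanDeps reads the source text; it cannot follow a module
# name that is assembled at run time, as below:
my $mod = 'File::' . 'Basename';    # name only known when this runs
eval "require $mod; 1" or die $@;   # invisible to pp's dependency scan
print File::Basename::basename('/usr/bin/perl'), "\n";   # prints "perl"
```

For a case like this you would add an explicit use, or build with -M, so the module gets bundled anyway.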
Parlez vous?
There are circumstances where you don't want to bundle everything into a single executable. You may have several scripts (CGI scripts, for example) calling the same bunch of modules, and you want to distribute the modules separately from the scripts.
The PAR install delivers another utility called parl (PAR loader), which takes mostly the same parameters as perl. This provides the ability to run a perl script and call in one or more PAR libraries (built with pp, but called *.par).
Again, a Perl install is not needed on the target machine. You need to deliver parl, the scripts and the PAR libraries. On the Windows platform, you could package this up with InstallShield.
tkpp
For those averse to command lines, there is a script provided called tkpp, which provides a very thin Tk GUI wrapper around pp. Far from being a responsive GUI, it freezes while the application is building. I find no benefit in using this over the command line, other than not having to remember the command-line options.
Notes:
[1] I like things that work straight out of the box.
I am a great fan of Knoppix - a Linux distro that works straight off the CD.
[2] This was the subject of a lightning talk I gave at YAPC::Europe 2004, the slides of which are here
Archive::Tar
by graff
on Jun 01, 2004 at 00:06
Requires IO::Zlib in order to handle compressed tar files.
For just about any operating system you'll ever use, you should be able
to find some command-line utility or GUI application that knows how to
read unix tar files (even those that have been compressed), list the
names and sizes of directories and files that they contain, and extract
their contents. Many of these utilities also know how to create tar
files from a chosen directory tree or list of files. So why does anyone
need a Perl module that can do this?
Well what if you're on a system where the available tools don't support
the creation of a tar file? In this case, Archive::Tar will allow you
to create such a tool yourself, so when some remote colleague says "just
send me a tar file with the stuff you're working on", you can do that --
rather than embarrass yourself by replying with something like "I only
have pkzip / WinZIP / Stuff-it / Toys-R-Us Archiver / ... Will that
format be okay?"
(To be honest, that's not really much of a problem these days -- most
archiving file formats have supporting utilities ported to most OS's,
and the definitive command-line "tar" utility is available and fully
functional for every version of Microsoft OS and MacOSX, as well as
being intrinsic to every type of unix system. There must be one for
Amiga as well...)
But there is one feature of the definitive "tar" tool that can be a bit
limiting at times: whether creating or extracting tar file contents, it
is completely, inescapably faithful to its source. In creation,
whatever is on disk goes directly into the tar set, sticking to the path
and file names provided; on extraction, whatever is in the tar set goes
right into directories and files on disk, again, sticking to the names
provided. There are ways to (de)select particular paths/files, but
that's about all the flexibility you get. Usually, this is exactly what
everyone wants, but sometimes you might just wish you could do something
a little different when you create or extract from a tar file, like:
- rename files and/or directories
- simplify an overly complicated directory layout
- sort files into a more elaborate directory layout
- modify file content, or skip certain files
- do any (combination) of the above based on any available information, including path/name, date, size, ownership, permissions and/or content, or even some other source, like a database or log file.
There's also a situation not envisioned when "tar" was first conceived
decades ago, but common today: you may want to accumulate a set of
resources from the web and save them all in one tar file, without ever
bothering to write each one to a separate local disk file first -- tar
is just a really handy format for this sort of thing.
Unfortunately, as of this writing (Archive::Tar v1.08), the flexibility
you get with this module is limited by one major design issue: the
entire content of a tar set (all the uncompressed data files contained
in the tar file) must reside in memory. Presumably, this approach was
chosen so that both compressed and uncompressed tar files would be
handled in the same way.
If people only dealt with uncompressed tar files, then the module could
be designed to scan a tar image and get the path names, sizes, other
metadata, and the byte offsets to each individual data file in the set,
so that only the indexing info would be memory resident. But you can't
do that very well when the tar file is compressed, and tar files tend to
be compressed more often than not. And since there is no inherent upper
bound on the size of tar files -- and tar is often used to package really
big data sets -- users of Archive::Tar need to be careful not to use this
module on the really big ones. (This can be awkward when a compressed
tar file contains stuff that "compresses really well", like XML data
with overly-verbose tagging -- I've seen compression ratios of 10:1 on
such data.)
When you install Archive::Tar, you also get Archive::Tar::File, which is
the object used to contain each individual data file in a tar set. When
you do:
my $tar_obj = Archive::Tar->new;
$tar_obj->read( "my.tar" );
# or just: my $tar_obj = Archive::Tar->new( "my.tar" );
this creates an array of Archive::Tar::File objects. Called in a list
context, "read" returns the list of these objects (and as you would
expect, when called in a scalar context, it returns the number of File
objects). When you are creating a tar set, each call to the "add_data"
or "add_files" method will append to the list of File objects currently
in memory. When you call the "write" method, all the File objects
currently in the set are written as a tar stream to a file name that you
provide (or the stream is returned to the caller as a scalar, if you do
not provide an output file name).
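A minimal round trip shows the add_data / write / read cycle (the file names here are invented for the example):

```perl
use Archive::Tar;

# Build a tar set in memory (no disk file needed for the member),
# write it out, then read it back.
my $tar = Archive::Tar->new;
$tar->add_data( 'hello.txt', "Hello, tar!\n" );
$tar->write( 'demo.tar' );            # plain, uncompressed tar on disk

my $tar2  = Archive::Tar->new( 'demo.tar' );
my @files = $tar2->get_files;         # Archive::Tar::File objects
print $files[0]->name, "\n";          # prints "hello.txt"
```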
There are also three class methods that provide just the basic operations
of listing and extracting tar-set contents, and creating a tar set from
a list of data file names. These methods avoid the memory load, because
they don't bother holding the data files in memory as objects. This
also means that you give up a lot of detailed control on individual data
files.
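For the record, a sketch of those three class methods (the file names are invented for the example):

```perl
use Archive::Tar;

# Make a small file to archive, then exercise the three class
# methods: create, list and extract.
open my $fh, '>', 'note.txt' or die $!;
print $fh "just a note\n";
close $fh;

Archive::Tar->create_archive( 'notes.tar', 0, 'note.txt' );  # 0 = no compression
my @names = Archive::Tar->list_archive( 'notes.tar' );       # names only
print "@names\n";                                            # prints "note.txt"
Archive::Tar->extract_archive( 'notes.tar' );                # back onto disk
```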
The Archive::Tar::File objects provide a lot of handy features,
including accessors for file name, mode, linkname, user-name/uid,
group-name/gid, size, mtime, cksum, type, etc. You can rename a file or
replace its data content, check for empty files using "has_content", and
choose between getting a copy of the file content or a reference to the
content ("get_content" or "get_content_by_ref").
Getting back to the use of the Archive::Tar object itself, I did come
across one potential trap when trying to do a "controlled" extraction of
data from an input tar file. Most of the object methods related to
reading/extracting are presented in terms of using one or more
individual file names as the means for specifying which data file to
handle -- in fact, after the "new" and "read" methods, the next three methods
described are "contains_file( $filename )", "extract( [@filenames]
)" and "list_files()". Most of the remaining methods also need or allow
file names as args, which leads one to assume that the "easiest" way to
use the object is to do everything in terms of file names.
But the problem is that each time you specify a file name for one of
these object methods, the Archive::Tar module needs to search through
its list of Archive::Tar::File objects to find the file with that name.
This gets painfully slow when you're dealing with a set of many files,
and doing something with each of them.
Obviously, there are bound to be situations where you will need to
do things by specifying particular data file names in a tar set, but
more often, you'll want to work directly with the File objects. A
couple of examples will suffice to show the difference:
use Archive::Tar;

# here's the slow approach to mucking with files in a tar set:
my $tar = Archive::Tar->new( "some.tar" );
my @filenames = $tar->list_files;
for my $filename ( @filenames ) {
    my $filedata = $tar->get_content( $filename );
    # do other stuff...
}
The problem there is that each call to "$tar->get_content( $filename )"
invokes a linear search through the set of data files, in order to
locate the File object having that name. The following approach goes
much faster, because it just iterates through the list of File objects
directly:
use Archive::Tar;

my $tar = Archive::Tar->new( "some.tar" );
my @files = $tar->get_files;  # returns list of Archive::Tar::File objects
for my $file ( @files ) {
    my $data = $file->get_content;  # same as above, but no searching involved
    # do other stuff...
}
And of course, given the list of File objects, you have much better
control over the selection and handling of files -- here are a few
examples:
my %cksums;
for my $file ( grep { $_->has_content } @files )  # only do non-empty files
{
    next if $file->uname eq 'root';  # let's leave these alone
    if ( $file->name =~ /\.[ch]$/ ) {
        # do things with source code files here...
    }
    elsif ( exists( $cksums{ $file->cksum } )) {
        # file's content duplicates another file we've already seen...
    }
    else {
        $cksums{ $file->cksum } = undef;  # keep track of cksums
    }
}
To conclude: on the whole, if fine-grained control of tar file i/o is
something that would be helpful to you, and if you can limit yourself to
dealing with tar files that fit in available memory, then you really
should be using this module. It's good.
(updated to fix some typos and simple mistakes.)
Params::Validate
by rinceWind
on May 16, 2004 at 18:08
Whereas C++ and Java go to great lengths to ensure that the data types of any arguments passed to a function match what is expected, Perl is often criticised for not enforcing anything.
Perl 6 will address this issue, but in the meantime, any subroutine you write is passed an array @_ containing a potential hotch-potch of scalars and references, blessed or otherwise.
Although it's nice to write code that can DWIM given parameters of varying data types, this is always extra effort to implement. Also, what will your sub do when passed a SCALAR where it's expecting an ARRAYREF? At best, it will die with a run-time error and a message which, likely as not, will not be immediately obvious.
Enter Params::Validate
By default, this module exports the functions validate and validate_pos. Params::Validate caters for two calling conventions: named parameters and positional parameters, which are validated by validate and validate_pos respectively. Unfortunately, you can't mix them; any positional parameters before your list of key/value pairs need to be shifted off first. In fact, this is the way to use Params::Validate to handle method calls.
Examples:
use Params::Validate qw(:all);

sub mymethod {
    my $self = shift;
    my %args = validate( @_, {
        foo => 1,                   # mandatory
        bar => { default => 99 },   # optional, with a default
    } );
    ...
}

sub mymethod2 {
    my $self = shift;
    my ($meenie, $minie, $mo) = validate_pos( @_,
        { type => SCALAR },
        { type => SCALAR | UNDEF },
        { type => ARRAYREF, default => [] },
    );
    ...
}
As is apparent, the call to validate or validate_pos is quite straightforward and edifying to someone else looking at the code.
It's also quite easy to add such validation to existing code, for an immediate gain in robustness without too much cognitive effort. The module provides a whole host of tools for validating your argument list - I have just scratched the surface.
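To give a flavour of those deeper spec options, here is a sketch combining a type check, a regex and a callback on a single named parameter (the sub and its limits are my own invention):

```perl
use Params::Validate qw(:all);

# Validate a 'port' argument three ways: it must be a plain scalar,
# all digits, and inside the TCP port range (limits are illustrative).
sub set_port {
    my %args = validate( @_, {
        port => {
            type      => SCALAR,
            regex     => qr/^\d+$/,
            callbacks => {
                'in range' => sub { $_[0] >= 1 && $_[0] <= 65535 },
            },
        },
    } );
    return $args{port};
}

print set_port( port => 8080 ), "\n";   # prints "8080"
```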
Error handling
When parameter validation fails, the default action is to croak, with quite a helpful message about which parameter is invalid and why. You can elect to use a callback to catch validation errors instead.
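Here is a sketch of that callback route, using validation_options to install a global on_fail handler (the handler and its message are my own invention):

```perl
use Params::Validate qw(:all);

# Route all validation failures through our own handler instead of
# the default croak.
validation_options( on_fail => sub { die "BAD ARGS: $_[0]" } );

sub greet {
    my %args = validate( @_, { name => { type => SCALAR } } );
    return "hello, $args{name}";
}

print greet( name => 'world' ), "\n";   # prints "hello, world"
eval { greet( name => [] ) };           # an ARRAYREF where a SCALAR is due
print "caught: $@" if $@;               # our handler fired
```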
Conclusion
I am a convert to using this module. I recommend it for CPAN modules and for corporate coding standards.
Update
I have had some interesting dependency issues with modules of mine that use Params::Validate. I have seen CPAN pull in Ponie via Attribute::Handlers. Why would I want the Parrot/Ponie stuff coming into my production environment?! I asked Arthur Bergman about this; apparently it is a spurious dependency picked up by CPAN.pm, sorted in a later release of Attribute::Handlers.
I also noticed that ActiveState's module status page was showing any of my modules that use Params::Validate as having a dependency on Ponie.
DBD::AnyData
by idsfa
on May 14, 2004 at 14:04
Module Author: Jeff Zucker <jeff@vpservices.com>
Documentation
Abstract
Excellent tool for developing programs with limited "database" needs, prototyping full-on RDBMS applications, and pulling in common data interchange formats. If you don't need/want the SQL baggage, try AnyData instead.
Pre-Requisites
Overview
DBD::AnyData is a DBI/SQL wrapper around AnyData which
allows the author to use many SQL constructs on traditionally non-SQL data
sources. Descendant from DBD::RAM, DBD::AnyData also implements
that module's ability to load data from multiple formats and treat them as
if they were SQL tables. This table can be held entirely in memory or tied
to the underlying data file. Tables can also be exported in any format which
the module supports.
Review
The variety and number of file formats in use
is staggeringly large and continues to grow. Perl hackers are often faced
with the job of being syntactic glue between applications, translating output
from one program into the necessary input for another. Abstracting the exact
format of these data allows the programmer to rise above mere hacking and
actually craft something (re)usable. Separating the logic from the
presentation improves the clarity of both.
DBD::AnyData attempts to provide this abstraction by presenting a DBI/SQL
interface. It layers over the required/companion AnyData module, which
presents a tied hash interface. The perl purist will most likely prefer
to stick with AnyData, minus the DBD. The extra layer of abstraction will
be most useful if you are more comfortable with SQL or your application
design requires it. To my mind, the niftiest use of this module is the
ability to prototype your code as if you had a whole relational database,
but have the ease of a few simple CSVs actually holding the data.
The list of supported formats is impressive, and continues to expand. CPAN
currently lists:
- perl data structures and __DATA__ segments
- Delimited text (Comma/Pipe/Tab/Colon/whatever separated)
- Fixed length records
- HTML Tables
- INI Files
- passwd Files
- MP3 Files (specifically, their ID3 tags)
- Paragraph Files
- Web Server Logs
- XML Files
- DBI Connections (to leverage existing modules)
With more on the way.
DBD::AnyData has three basic modes of operation: file access, in-memory
access and format conversion. These modes are implemented as five extension
methods over a standard DBD.
In file access mode, the data file is read on each request and written
on each change. The entire file is never read into memory (unless requested)
and so this method is suitable for large data files. Be aware that these
are not atomic commits, so your database could end up in an
inconsistent state. This mode is not supported for remote files or
certain formats (DBI, XML, HTMLtable, MP3 or perl ARRAYs).
In-Memory mode loads the entire data source into memory. Obviously a
problem for huge data sets, but then you probably have those in a relational
database already. This method is ideal for querying a remote data source,
handled in the background by good old LWP.
Conversion mode takes data from an input (which can be local or remote,
and in any supported format) and writes it to a local file, perl string or
perl array. This function alone would be reason enough for the module to
exist, yet it's really more of an afterthought.
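As a sketch of in-memory use, here is roughly what an ad_import call looks like, based on my reading of the docs; the table name and data are invented, so treat the exact signature as an assumption to check against CPAN:

```perl
use DBI;

# Import an in-memory table from a perl array (first row = column
# names), then query it with SQL. Table and data are illustrative.
my $dbh = DBI->connect( 'dbi:AnyData:' );
$dbh->func( 'trees', 'ARRAY',
    [ [ 'id', 'species' ], [ 1, 'oak' ], [ 2, 'ash' ] ],
    'ad_import' );
my $row = $dbh->selectrow_arrayref(
    'SELECT species FROM trees WHERE id = 2'
);
print $row->[0], "\n";
```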
Caveats
- Again, if you don't need SQL, use AnyData instead
- Currently, DBD::AnyData will not allow SQL against multiple tables in the same SQL statement (no JOINs). Update: per jZed, this feature is now available
- It isn't a real RDBMS. Don't expect atomicity, journals, etc etc
- Not all formats are fully featured, and most require more modules
Summary
DBD::AnyData is one of those fun modules that lets you shove the crud
work off on someone else (the author of the AnyData::Format:: module) and
get on with crafting good code. I've found it especially helpful when
putting together tiny web apps that might end up getting huge (and thus require moving to a true database). Anything that lets me stop writing file format converters is worth checking out, in my book.
List::Compare
by McMahon
on Mar 25, 2004 at 12:34
I had to create a report of the differences between two files each containing thousands of unsorted records. I *thought* that I needed some form of Diff. I tried Algorithm::Diff, but discovered that it only works line-by-line. For instance, Algorithm::Diff reports that (1,2,3) and (2,1,3) are different lists.
Furthermore, some commercial tools I tried did the same thing.
I accidentally stumbled across List::Compare, which was lucky for two reasons.
Most importantly, List::Compare solved my problem, and more: it shows intersections and unions of sets; it shows elements unique to either list (that was my particular problem); it shows all unique elements of both lists, and even all elements of both lists. The interface is elegant and intuitive. I'll be dealing with large inventory lists well into the future, and List::Compare is shaping up to be my tool of first resort for all my list comparison needs.
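For the curious, the calls I mean look roughly like this (the lists are shortened for the example):

```perl
use List::Compare;

# Two unsorted record sets; order doesn't matter to List::Compare.
my @file1 = qw(1 2 3 5);
my @file2 = qw(2 1 3 4);
my $lc = List::Compare->new( \@file1, \@file2 );

print "only in file1: @{[ $lc->get_unique ]}\n";        # 5
print "only in file2: @{[ $lc->get_complement ]}\n";    # 4
print "in both:       @{[ $lc->get_intersection ]}\n";  # 1 2 3
```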
I was also lucky because James Keenan gives a detailed history of the source (Perl Cookbook) and the circumstances (introductory Perl course) that inspired him to write the module, as well as pointing out a number of similar modules, just in case List::Compare doesn't solve your particular problem. I am relatively new to Perl, and I don't entirely grok the power of hashes yet. List::Compare not only allowed me to solve my immediate problem quickly and elegantly, but it also showed me how to understand the code that underlies the List::Compare module itself.
Elegant, intuitive, well-documented, and with great hints about the magic behind the module. I'm glad I found List::Compare, and you probably will be, too.
Proc::Background
by flyingmoose
on Mar 24, 2004 at 17:40
Ah, the subject of processes and forking. It brings up many questions in the monastery, so I decided to add this to the Module Reviews section the next time it came up. Yes, folks, spawning processes and waiting on them doesn't have to be so complicated. Let's look at the problem space a bit and we will understand why Proc::Background is so cool to have.
The Problem
unix: There are several ways to background a process on Unix: one can fork and exec, one can make a system call with a trailing &, and so on. There are also things like Proc::ProcessTable, but again, this can get complicated.
windows: There are also several ways to background a process on Windows, but fork is only emulated there, often imperfectly depending on your particular version, and should be avoided. There is Win32::Process, but it requires an absolute path to the executable you are trying to run, which can be hard to come by when you don't know where that program lies in your path.
The Solution
The above problems can get ugly in a hurry, especially if you are new to Unix (and don't understand fork well), don't want to deal with Win32 modules, or if you want code that works in something other than Unixy environments. This is where Proc::Background comes in. It allows one to not worry about Unix or Windows and to (effectively) manage processes without all of the gory details. In addition, it allows waiting on arbitrary processes and checking on the status of each. Very cool, and even cooler because it is cross platform.
Example Code (borrowed from CPAN)
use Proc::Background;

my $proc1 = Proc::Background->new($command, $arg1, $arg2);
my $proc2 = Proc::Background->new("$command $arg1 1>&2");
$proc1->alive;   # still running?
$proc1->die;     # kill it
$proc1->wait;    # block until it exits
See CPAN for the full list of methods, but those are the basics. Easy, no?
When To Use It
- When you want clean code that is very short and understandable
- When you must execute processes in cross-platform code for Windows and Unix
- When you don't have enough tylenol/advil/beer to deal with Win32::Process
- When you have to inquire about the status of arbitrary processes or must act upon their states (is process A up? How about B? Now wait for C to finish!)
When Not To Use It
- When you have Unix buddies you are trying to impress
- When you have Win32 buddies you are trying to impress
- When you are trying to keep up your job security by keeping code hard to read :)
- When you need to execute arbitrary Perl code and not separate executables
- If you are using Solaris or Cygwin. It appears (per CPAN test reports) that this may not work there. Your mileage may vary. (I use Linux and Win32 most of the time.)
- Foreground processes? No, wait! It works there too: just call the wait method on your process after you invoke it, and you have something a little more spiffy than a stock system call. At your discretion, of course; this isn't really required.
The Final Word
This module is very useful and is currently in my top 10. It efficiently allows management of any number of processes and allows me to forget (when I feel like it) how fork, exec, and Win32::Process work in Perl -- saving me pain and frustration. It also makes code much more legible due to a clean API, and that is always a good thing in a module.
You can remember how to use it without looking things up, since the API is so basic -- this is goodness. Try it out: unless you are a Unix purist who must always write your own fork code to spawn processes, this should work great for you.
Convert::Morse
by PERLscienceman
on Feb 22, 2004 at 22:30
CPAN LINK: Convert::Morse
CPAN ABSTRACT: Package to convert between ASCII text and MORSE alphabet.
Introduction:
This module caught my eye as yet another 'Cool Use For Perl', appealing to both the avid Perl Programmer and Amateur Radio Hobbyist inside me.
Functionality:
In a nutshell, Convert::Morse converts an ASCII string to the equivalent International Morse Code dots and dashes, and vice versa. In addition, the module contains a function to check whether a particular ASCII string "is morsable", i.e. convertible to a valid Morse code string.
Demo Code:
#!/usr/bin/perl
use strict;
use Convert::Morse qw(as_ascii as_morse is_morsable);
print as_ascii('.... . .-.. .-.. --- -- --- .-. ... .'), "\n";  # 'Hello Morse'
print as_morse('Perl?'), "\n";                                  # '.--. . .-. .-.. ..--..'
print "Yes!\n" if is_morsable('Hello Perl.');                   # print "Yes!"
Bug(s) Found:
None immediately found with preliminary testing.
Module Author Noted Limitation:
Can not yet do Japanese code nor German Umlaute.
Final Thoughts:
With further tinkering I found this module to be quite useful in converting English text to valid dot-dash-space International Morse Code; indeed, another Cool Use For Perl for die-hard Perl programmers and radio enthusiasts alike.
UPDATE: There is indeed only one International Morse Code, recognized by international treaty, and it includes neither the Japanese nor the German umlaut character sets; thanks to theorbtwo for pointing that out.
Number::Spell
by PERLscienceman
on Jan 27, 2004 at 20:07
CPAN LINK: Number::Spell
CPAN ABSTRACT: Number::Spell provides functionality for spelling out numbers. Currently only integers are supported.
Introduction:
Every so often I get the urge to troll around the CPAN Module Repository to see what is interesting. This module caught my eye as a 'Cool Use for Perl', so I thought I would download it and give it a try.
Functionality:
Number::Spell, upon initial tinkering, does what it says: it spells out integers into English words. I tested it on some 'smaller' numbers (see demo below) and it seemed to work fine. So... I thought to myself, let's see if we can break it; after all, the documentation says it can go into the 'vigintillions'. Unfortunately, I was successful at breaking it. I hit the ceiling at 100 trillion: after adding one more zero (which would then be one quadrillion), it only returned 'one'.
Demo Code:
#!/usr/bin/perl -w
use strict;
use Number::Spell;
my $string=spell_number(8597);
print "$string\n";
-----
Result:
eight thousand five hundred ninety seven
Bug(s) Found:
Stops working properly after 100 Trillion.
Final Thoughts:
I think this would be a really cool module if it worked properly past 100 trillion. Has anyone else tried it and had the same result? Another possibility would be interpreting reals, although that would probably be pretty dicey to implement, especially with a number like 1.09873532. The author did mention this possibility in the original version.
UPDATE:
I did a little digging into the module and determined that when you go one place above 100 Trillon (1000000000000000),
the number then get interpreted or changed into exponential
format by the interpreter so instead of 1 Quadrillion as 10000000000000000, it is 1e16, which is not recognized/proper split by the regex inside the if statement on line 83 of spell.pm .
if($data=~/(\-?)\s*(\d+)/){
With a little more tinkering/experimentation with the demo script I have determined that such a large number will work with the module if you send it as a quoted string.
my $string = spell_number('1000000000000000'); # this works
my $string = spell_number(1000000000000000);   # this does NOT
The ultimate solution would be to fix the regex on line 83 of spell.pm to deal with the exponential format. Unfortunately, a regex wizard I am not. So... I would be inclined to leave that in the hands of the author or someone who knows more about regexes than I do. :)
Time::Piece::MySQL
by jeffa
on Jan 04, 2004 at 14:02
Time::Piece::MySQL is a very useful module for MySQL users. It is simply an
extension to Time::Piece that provides a handful of methods for converting back and
forth between Time::Piece objects and the MySQL date/time types: date, time, datetime, and
timestamp. (The year type is available from Time::Piece, so it doesn't need to be here.)
As an example, say i had a table of events that contained an id and a datetime field:
+---------+------------------+
| Field | Type |
+---------+------------------+
| id | int(10) unsigned |
| date | datetime |
+---------+------------------+
and i wanted to add 50 days to each date. The following snippet would do just that:
use strict;
use warnings;

use DBI;
use Time::Seconds;
use Time::Piece::MySQL;

my $dbh = DBI->connect( ... );
my $sth = $dbh->prepare('update events set date = ? where id = ?');
my $dates = $dbh->selectall_arrayref(
    'select id,date from events', {Slice => {}}
);

for (@$dates) {
    my $date = localtime->from_mysql_datetime( $_->{date} );
    $date += ONE_DAY * 50;
    $sth->execute( $date->mysql_datetime, $_->{id} );
}
A very trivial example, but i think it demonstrates how it can make someone's Perl/MySQL
script easier to work with.
desift
by princepawn
on Nov 27, 2003 at 19:07
desift is a Perl program for data munging. What is data munging? According to Dave Cross, author of "Data Munging in Perl", data munging has three phases:
- Read in the data
- Transform the data
- Output the data
Since desift is a data munging program, we can describe it via this
framework.
desift reads in your data from STDIN or from files specified on the
command-line after the option switches. It splits your data into a Perl
array for you. You control the split via the -d switch:
-d REGEX Field delimiter in input-file(s). Default is "\t" (tab)
A common phase of reading in data is filtering out what you don't want. To
specify input lines that you want to skip, supply the
-s option to desift:
-s REGEX Skip rows in input-file(s) matching REGEX.
In desift, the input data is transformed via a template string which may
be supplied on the command-line or in a file. There are two elements to the
template string: plain text and positional tags indexing into the array
built from splitting your input data.
If your template string is in a file, use the -t option. If you want to
supply the template string on the command-line, then use the -T option.
Here is a sample desift command using flags we have seen so far:
ls -l | desift -d"\s+" -T"File: %9 Permissions: %1" -s"^total"
It's not a completely perfect example: filenames containing spaces will be
truncated at the first space. Here is some sample output:
File: chessgames-dotcom Permissions: -rwxr-xr-x
File: desift Permissions: -rwxrwxrwx
File: desift.pod Permissions: -rw-r--r--
File: gerbold Permissions: drwxr-xr-x+
File: upload-cpan.pl Permissions: -rwxr-xr-x
File: xemacs.bat Permissions: -rwxr-xr-x
We can learn some things from looking at this output. First of all, the word
Permissions does not always start at the same column. My first attempt to
fix this was to put a tab in the template string. To do so, you must insert
a literal tab character in the string; neither \t nor \\t works, and changing
the string from single to double quotes does not help. Even so, the output is still not lined up:
File: chessgames-dotcom Permissions: -rwxr-xr-x
File: desift Permissions: -rwxrwxrwx
File: desift.pod Permissions: -rw-r--r--
File: gerbold Permissions: drwxr-xr-x+
File: upload-cpan.pl Permissions: -rwxr-xr-x
And this makes the output hard to read. I envision two possible solutions to this
problem.
One possible fix is to have a template flag which takes a numeric argument
indicating at which column the output should be written.
Another fix is more time- and compute-consuming. desift could read in all the
lines and then output them with just enough space for the columns to line
up... sort of like a database does when you SELECT data.
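The first fix is, in effect, fixed-width formatting, and plain Perl's sprintf already illustrates the idea with a field-width specifier (made-up data, hypothetical 20-column width):

```perl
use strict;
use warnings;

# Pad the filename field to a fixed width so that "Permissions:"
# always starts at the same column.
my @rows = (
    [ 'chessgames-dotcom', '-rwxr-xr-x' ],
    [ 'desift',            '-rwxrwxrwx' ],
);
my @out = map { sprintf 'File: %-20s Permissions: %s', @$_ } @rows;
print "$_\n" for @out;
```

A template flag like the one proposed could simply pass such a width through to sprintf.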
In a sense, transformation and output are one step in desift. Once a line of
data is transformed, it is then output.
desift is a cleanly written module which simplifies and abstracts the
split, array-slice, print-and-join cycle of programming, leading to one
succinct command line instead of a series of function calls.
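To make that abstraction concrete, here is roughly what the earlier `ls -l` one-liner boils down to when written out by hand (an illustrative sketch, assuming desift's %N tags are 1-based field indices; the input lines are canned rather than read from a pipe):

```perl
use strict;
use warnings;

# Split each line, slice out the wanted fields, print them into a template.
my @lines = (
    "total 42",
    "-rwxr-xr-x 1 user group 1024 Nov 27 19:07 desift",
);
my @out;
for my $line (@lines) {
    next if $line =~ /^total/;          # the -s "^total" skip filter
    my @field = split /\s+/, $line;     # the -d "\s+" delimiter
    # %9 and %1 in the template become indices 8 and 0 here:
    push @out, "File: $field[8] Permissions: $field[0]";
}
print "$_\n" for @out;
```

One desift invocation replaces this whole loop.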
In looking back at a recent project of mine, I find desift to be inadequate
for what I had to do. I had a CSV-file with name, email, phone, etc. I had to
filter this file for profanity and invalid email addresses and then output the
new file in a tab separated format for import into a database.
First of all, the input phase. Parsing CSV is not easy. I
could not pass desift a regexp that would do such a split and field-massage
properly. Also, some of the data
was in Unicode format, and only Text::CSV_XS with its binary
option was robust enough for this task. And I was dealing
with 4 files of 25 million lines each, so using a C-based module such as
Text::CSV_XS was desirable for speed reasons as well.
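The core difficulty is that a delimiter regex alone cannot parse CSV, because a quoted field may itself contain the delimiter. A small illustration using the core module Text::ParseWords (this does not replace Text::CSV_XS, which is still needed for the Unicode and speed requirements above):

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = '"Smith, John",john@example.com,555-1234';

# A naive split breaks the quoted field in two:
my @naive = split /,/, $line;            # 4 pieces

# A quote-aware parser keeps it whole:
my @fields = parse_line(',', 0, $line);  # 3 fields, quotes stripped

print scalar(@naive), " vs ", scalar(@fields), "\n";  # 4 vs 3
```

This is exactly the "field-massage" a -d REGEX switch cannot express.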
For the transformation phase *of this project*, desift was
adequate. However, what if I wanted to apply the Perl lc function to a field
instead of just writing it? That is a very likely operation, and impossible with
desift. Now, what if the template for desift were passed off to sprintf
instead of its own custom sprintf-like formatter? And what if access to the
split array were via a localized variable such as @_split? Then we could use
lc if we wished:
ls -l | desift -s"^total" -d"\s+" \
    --sprintf-string="File: %s Permissions: %s" \
    --sprintf-args="lc($_split[8]), $_split[0]"
Also, why is there no possibility of filtering data after it is split? Perhaps
we can only determine whether data should be transformed or output after a test of
some sort: a test such as adding
two fields together, doing an SQL SELECT on a database, grepping a file, or
seeing whether certain (or all) fields were present in the line.
Thus we can conclude that filtering callbacks should be available at each stage
as executable Perl subroutines, not limited to regular expressions.
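The callback style argued for here is easy to sketch in plain Perl: once rows are split into arrays, any subroutine can decide what passes (the rows and the sum-of-fields test below are made up for illustration):

```perl
use strict;
use warnings;

# Keep only rows where the sum of the first two fields exceeds a threshold,
# a test no delimiter regex could express.
my @rows = ( [ 3, 4 ], [ 10, 20 ], [ 1, 1 ] );

my $filter = sub { my ($row) = @_; $row->[0] + $row->[1] > 5 };
my @kept   = grep { $filter->($_) } @rows;

print scalar(@kept), " rows kept\n";  # 2 rows kept
```

A --filter-sub option taking such a code string would cover the database-lookup and grep-a-file cases equally well.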
As mentioned earlier, the output phase is wed to the transformation phase. Thus
it is up to you to use I/O redirection to capture desift output. For example,
it is not (and
perhaps should not be?) possible to commit desift results directly to a
database on a line-by-line basis. But in my experience, the C-based SQL loaders
supplied with databases are 8-fold faster than isolated inserts via
Perl/DBI. So, complete control over channeling output may or may not be a
Bad Thing.
There are a number of splitting and filtering and formatting modules available
on CPAN:
Parse::FixedLength, Parse::FixedDelimiter,
Text::xSV, Text::CSV_XS, DBD::AnyData,
Spreadsheet::ParseExcel.
Core Perl provides grep, which is extremely adequate in a large number of
cases. Common
CPAN modules for filtering are Regexp::Common::profanity_us and
Email::Valid.
Core Perl provides here-documents and sprintf. And then on CPAN, there are very
few options for formatting a data model (hah!). Need I list any?
Template, Data::Table, Data::Presenter,
HTML::Template, Text::MagicTemplate
desift is available at http://desift.sourceforge.net
It was written by James Shimada.
|
Acme::Apache::Werewolf
5 direct replies — Read more / Contribute
|
by rob_au
on Nov 17, 2003 at 05:45
|
|
Whilst I read through the list of newly uploaded modules onto CPAN on a daily basis, it is rare that I see new modules which:
- Employ modules from the Astro:: namespace,
- Employ Apache access handlers in an interesting manner,
- Protect your web directories from werewolves, and,
- Inspire me to write module reviews.
Yet in my daily sojourn through the CPAN uploads today, I found a module which incorporated all of the above: Acme::Apache::Werewolf. This module implements an Apache access handler which can be used to deny access to web directories based upon the phase of the moon, or more specifically, during the full moon, thereby protecting your web directories from marauding werewolves.
Using this module, protecting files from werewolves is relatively straightforward:
<Directory /fullmoon>
PerlAccessHandler Acme::Apache::Werewolf
PerlSetVar MoonLength 4
</Directory>
The only configurable parameter associated with this module is MoonLength which determines the length in days over which the moon is considered to be in full. In the above configuration, the full moon is 4 days, which would be from day 12 through day 16 of the lunar cycle. And in the words of the module author, it is wise to err on the side of caution and make this too large, rather than too small and risk the wrath of werewolves.
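Under the hood, all the handler really needs is the moon's age in days, which the module obtains from the Astro:: namespace. As a crude, self-contained sketch of the idea (not the module's actual code, and a rougher approximation than a proper Astro:: module would give):

```perl
use strict;
use warnings;

use constant SYNODIC_MONTH  => 29.530588;  # mean days per lunation
use constant KNOWN_NEW_MOON => 947182440;  # epoch secs: 2000-01-06 18:14 UTC

# Age of the moon in days (0 = new, ~14.77 = full), for times after
# the reference new moon.
sub moon_age {
    my ($time) = @_;
    my $days = ($time - KNOWN_NEW_MOON) / 86400;
    return $days - int($days / SYNODIC_MONTH) * SYNODIC_MONTH;
}

# True when we are within MoonLength days centred on the full moon.
sub werewolf_hours {
    my ($time, $moon_length) = @_;
    my $mid = SYNODIC_MONTH / 2;
    return abs(moon_age($time) - $mid) <= $moon_length / 2;
}
```

An access handler would call something like werewolf_hours(time, $moon_length) and return FORBIDDEN when it is true.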
Now if only I could similarly protect my web directories from other supernatural beings ...
|
Acme::Comment
2 direct replies — Read more / Contribute
|
by PERLscienceman
on Nov 05, 2003 at 22:07
|
|
CPAN LINK: Acme::Comment
CPAN ABSTRACT: This module allows for multi-line comments which are filtered out. Unlike the pseudo multi-line comment if (0) {}, the code being commented out does not even need to compile.
Introduction:
Roaming about the monastery I came upon the following node:
Block Commenting, a fellow monk essentially searching for advice on the possibility of multi-line commenting. A few replies down, the module Acme::Comment was mentioned in passing; its claim is to allow multi-line comments which are filtered out. I became curious, so I downloaded the module and checked it out for myself.
Functionality:
Acme::Comment, in a nutshell, allows for multi-line and single-line commenting in several different language styles. Some of the multi-line comment styles are C++, HTML, Pascal and Java, to name a few; in all, 43
different programming languages are represented. I tested it out in "HTML" mode
using multi-line comments with ActivePerl 5.8.0 on WinXP,
and it proved to be quite easy to use. The distribution itself contains fairly straightforward documentation.
Generic Example:
#!/usr/bin/perl -w
use strict;
use Acme::Comment type=>'HTML';
<!--
Multi-line comments here.
Everything enclosed in the html
style comment braces is ignored.
-->
my $a=1;
my $b=2;
my $c=$a + $b;
print "$a + $b = $c\n";
Final Thoughts:
For implementation of multi-line comments in various language formats I found this module to be quite useful.
The only drawback that I can immediately see is that this module is not yet widely known. I think multi-line commenting
(in a single format) would make a great standard
feature in a future version of Perl; perhaps in Perl 6? (It can't hurt to hope.)
|
Acme::DNS::Correct
3 direct replies — Read more / Contribute
|
by antirice
on Sep 22, 2003 at 12:59
|
|
If you've been following the news as of late, VeriSign has decided to resolve all non-existent domain names to a service they set up called Site Finder, in a scheme to take advantage of their monopoly as controllers of the .com and .net TLDs. This is particularly annoying for individuals who enjoy checking the validity of links (I send a header identical to what IE 6.0 would send). As the following link shows, even when a domain name doesn't exist (expired, or entered by someone who just wanted to dump trash into your database), it will return a valid page after a 302 redirect: http://www.lsadjflj.com/alksdjf/aldhgjh.
Enter Acme::DNS::Correct to correct this problem. It is designed as a drop-in replacement for Net::DNS::Resolver. If the IP for the Site Finder site is detected, the response is cleansed of the offending IP. The only bug in this module arises when you actually wish to resolve sitefinder-idn.verisign.com.
|
Tree::DAG_Node
2 direct replies — Read more / Contribute
|
by bm
on Sep 02, 2003 at 11:13
|
|
I work in release management, and am constantly dealing with trees of many different types: projects, file systems, releases, versions of a file, class diagrams, etc. I have grown used to managing these in a variety of ways, such as hashes, or rolling my own "tree structure" by adding parent/child or predecessor/successor properties and methods to my classes.
But Perl being Perl (well, CPAN being CPAN), you can always take advantage of others' experience. A few searches later, I found Tree::DAG_Node.
This class represents tree structures in OO Perl. Specifically, it manages the relationships between a set of "nodes". There is only one type of relationship you can create: mother node to daughter node. The daughter list of a node is ordered, but this is of course ignorable. A node can contain whatever data you would like it to (through the 'attributes' property), but not every relationship can be created: for example, a node may have only one mother.
The author, prolific CPAN contributor Sean M. Burke, encourages inheriting from Tree::DAG_Node. This exposes a seriously large number of tree-related methods to your class, such as:
- $node->daughters or $node->mother
- $node->add_daughter or $node->add_daughter_left
- $node->attributes->{'version'} = '1.1.3.4' (defines the attribute 'version' in the node)
- $node->ancestors (returns a list of nodes)
- $node->walk_down ({ callback => \&foo, callbackback => \&foo, ... }) (a depth first traversal of the tree, executes the passed callback)
- various list of list to tree conversion methods
- $node->draw_ascii_tree (ASCII pretty print of the tree)
And that is just the summary! To quote the doco: "In fact, I'd be very surprised if any one user ever had use for more than even a third of the methods in this class. And remember: an atomic sledgehammer will kill that fly."
This is a very large class, perhaps too over the top for some solutions (hence the atomic sledgehammer analogy!). Autoloader is not implemented, so some might find it a little slow for their needs, and therein lies the biggest problem with this class. The other thing I don't get is where the DAG in DAG_Node comes from: a DAG (directed acyclic graph) is more general than a tree, yet the class only builds trees!
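For a feel of what inheriting from Tree::DAG_Node saves you, here is the mother/daughter bookkeeping rolled by hand, as described at the start of this review (an illustrative sketch, not the module's own code; node names are made up):

```perl
use strict;
use warnings;

# A node is a hash with a name, a mother link, and an ordered daughter list.
sub new_node {
    my (%arg) = @_;
    return { name => $arg{name}, mother => undef, daughters => [] };
}

# Enforce the one-mother rule and keep the daughter list ordered,
# as Tree::DAG_Node's add_daughter does.
sub add_daughter {
    my ($mother, $daughter) = @_;
    die "a node may only have one mother\n" if $daughter->{mother};
    $daughter->{mother} = $mother;
    push @{ $mother->{daughters} }, $daughter;
}

my $release = new_node(name => 'release');
my $file    = new_node(name => 'foo.c');
add_daughter($release, $file);

print $file->{mother}{name}, "\n";  # release
```

Every one of the methods listed above (ancestors, walk_down, draw_ascii_tree...) is this kind of bookkeeping, already written and tested for you.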
In summary though, an excellent class that brings a vast array of sophistication to a tree structure. Adding:
use Tree::DAG_Node;
@ISA = qw(Tree::DAG_Node);
to the top of your class can open many doors that would not otherwise have existed (just look at the walk_down method alone). I highly recommend this class for implementing tree structures.
|
Semi::Semicolons
2 direct replies — Read more / Contribute
|
by ailie
on Aug 31, 2003 at 18:44
|
|
Authors: David H. Adler and Michael G. Schwern, from an idea by Adam Turoff
Version: 0.03
Description: The Semi::Semicolons module allows you to use 'Peterbilt' rather than a semicolon as your statement terminator.
use Semi::Semicolons;
print "Why on earth would anyone use this?"Peterbilt
You can also customize your statement terminator.
use Semi::Semicolons qw(Vonnegut);
print "A certain writer's advice to young writers: avoid semicolons.\n"Vonnegut
(Of course, using 'Vonnegut' rather than the name of an actual semi may be considered, by some, to detract from the humor of the module's name.)
Why should you use it? You probably shouldn't, unless you're easily amused (like me).
Why should you not use it? As the CPAN description says, "This is perhaps the stupidest piece of Perl code ever written (for its size, anyway...)"
Verdict: Two thumbs up. Way up!
|
|