PerlMonks Reviews
PAR
2 direct replies — Read more / Contribute
by rinceWind
on Jul 06, 2004 at 12:15

    PAR has saved my day! It will go a long way towards helping me in my crusade of Perl advocacy in the present $CLIENT (who shall remain nameless) environment.[2]

    The situation is that we have three Unix (Solaris) environments: dev, test and live. Although Solaris comes with a version of perl pre-installed, the admins have restricted access to it on all but the dev machine (under some historical edicts from senior management).

    Now, as part of my activities in development, I have been producing a nice set of perl tools (as is my wont), to get my job done. Many of these would benefit production, especially for production support, which is closer to what my current role is.

    PAR for the course

    
              +----+
    foo.pl===>| pp |====>foo
              +----+
    
    
    Just like cc, pp turns your Perl file (and all called modules) into an executable. It's not actually compiling them, just packaging them up (and compressing).

    My colleagues feel a lot happier about the (potentially) tamper proof nature of executables, and I can deploy my perl 'scripts' into test and live environments. Funnily enough, they haven't had a problem with deploying .ksh or .awk scripts.

    Caveat removed (thanks are due to PodMaster for helping me sort this one out).

    Considered: MUBA move to meditations: not actually a review: it's more like giving compliments but that's it
    Unconsidered: ysth - Keep/Edit/Delete = 6/9/0 - moving from reviews isn't supported

    Update:

    I do feel that bibo and MUBA have a point, and this node should be more of a review (see also this node). Here goes...

    Peeping under the hood

    Although PAR delivers applications that work out of the box[1], it is worth examining what the output of pp actually is.

    The resulting executable is a self extracting zip file, and tools such as winzip or unzip -l can reveal the contents of the file. Besides the various perl modules called, you will find script/foo.pl (this being your source script) and script/main.pl. Note that you don't see Perl itself or any of the magic glue that makes your application run.

    script/main.pl is a small bootstrap script that calls your script, and it looks like this:

    my $zip = $PAR::LibCache{$ENV{PAR_PROGNAME}}
           || Archive::Zip->new(__FILE__);
    my $member = eval { $zip->memberNamed('script/foo.pl') }
        or die qq(Can't open perl script "script/foo.pl": No such file or directory ($zip));
    PAR::_run_member($member, 1);
    The first time a PAR application is run (or a new version of the application), it is unzipped into temporary directories: $TEMP/par_$USER/cache_nnnnnnnnnnn

    The nnnnnnnnnnn here is an MD5 checksum, hence a new version of the PAR application will generate a new cache directory.

    The first time the application is run, there will be a significant startup time, as the zip kit is unpacked. It is worthwhile explaining this to users.

    Run time requires

    Unfortunately, PAR (and Module::ScanDeps) will not detect all modules, especially those whose name is determined at run time. If you are missing a module or two, you need to include the modules in the build - either by adding explicit use or require statements in the script (useful for Tk widget modules), or by specifying -M to pp.
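    For instance, a minimal sketch of the explicit-use approach (the widget module named here is only an illustration; pp's -M switch achieves the same thing from the command line):

    use strict;
    use Tk;
    # Tk::BrowseEntry stands in for any module that is loaded dynamically at
    # run time; naming it explicitly lets pp/Module::ScanDeps pick it up.
    use Tk::BrowseEntry;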

    Parlez vous?

    There are circumstances where you don't want to bundle everything into a single executable. You may have several scripts (CGI scripts for example) calling the same bunch of modules, and you want to distribute this.

    The PAR install delivers another utility called parl (PAR loader), which takes mostly the same parameters as perl. This provides the ability to run a perl script and call in one or more PAR libraries (built with pp, but called *.par).

    Again, a Perl install is not needed on the target machine. You need to deliver parl, the scripts and the PAR libraries. On the Windows platform, you could package this up with Install Shield.

    tkpp

    For those averse to command lines, there is a script provided called tkpp, which provides a very thin Tk GUI wrapper around pp. Far from being a responsive GUI, it freezes while building the application. I find no benefit in using this over the command line, other than saving having to remember the command line options.

    Notes:

    [1] I like things that work straight out of the box. I am a great fan of Knoppix - a Linux distro that works straight off the CD.

    [2] This was the subject of a lightning talk I gave at YAPC::Europe 2004, the slides of which are here

Archive::Tar
3 direct replies — Read more / Contribute
by graff
on Jun 01, 2004 at 00:06
    Requires IO::Zlib in order to handle compressed tar files.

    For just about any operating system you'll ever use, you should be able to find some command-line utility or GUI application that knows how to read unix tar files (even those that have been compressed), list the names and sizes of directories and files that they contain, and extract their contents. Many of these utilities also know how to create tar files from a chosen directory tree or list of files. So why does anyone need a Perl module that can do this?

    Well what if you're on a system where the available tools don't support the creation of a tar file? In this case, Archive::Tar will allow you to create such a tool yourself, so when some remote colleague says "just send me a tar file with the stuff you're working on", you can do that -- rather than embarrass yourself by replying with something like "I only have pkzip / WinZIP / Stuff-it / Toys-R-Us Archiver / ... Will that format be okay?"

    (To be honest, that's not really much of a problem these days -- most archiving file formats have supporting utilities ported to most OS's, and the definitive command-line "tar" utility is available and fully functional for every version of Microsoft OS and MacOSX, as well as being intrinsic to every type of unix system. There must be one for Amiga as well...)

    But there is one feature of the definitive "tar" tool that can be a bit limiting at times: whether creating or extracting tar file contents, it is completely, inescapably faithful to its source. In creation, whatever is on disk goes directly into the tar set, sticking to the path and file names provided; on extraction, whatever is in the tar set goes right into directories and files on disk, again, sticking to the names provided. There are ways to (de)select particular paths/files, but that's about all the flexibility you get. Usually, this is exactly what everyone wants, but sometimes you might just wish you could do something a little different when you create or extract from a tar file, like:

    • rename files and/or directories
    • simplify an overly complicated directory layout
    • sort files into a more elaborate directory layout
    • modify file content, or skip certain files
    • do any (combination) of the above based on any available information, including path/name, date, size, ownership, permissions and/or content, or even some other source, like a database or log file.

    There's also a situation not envisioned when "tar" was first conceived decades ago, but common today: you may want to accumulate a set of resources from the web and save them all in one tar file, without ever bothering to write each one to a separate local disk file first -- tar is just a really handy format for this sort of thing.

    Unfortunately, as of this writing (Archive::Tar v1.08), the flexibility you get with this module is limited by one major design issue: the entire content of a tar set (all the uncompressed data files contained in the tar file) must reside in memory. Presumably, this approach was chosen so that both compressed and uncompressed tar files would be handled in the same way.

    If people only dealt with uncompressed tar files, then the module could be designed to scan a tar image and get the path names, sizes, other metadata, and the byte offsets to each individual data file in the set, so that only the indexing info would be memory resident. But you can't do that very well when the tar file is compressed, and tar files tend to be compressed more often than not. And since there is no inherent upper bound on the size of tar files -- and tar is often used to package really big data sets -- users of Archive::Tar need to be careful not to use this module on the really big ones. (This can be awkward when a compressed tar file contains stuff that "compresses really well", like XML data with overly-verbose tagging -- I've seen compression ratios of 10:1 on such data.)

    When you install Archive::Tar, you also get Archive::Tar::File, which is the object used to contain each individual data file in a tar set. When you do:

    my $tar_obj = Archive::Tar->new;
    $tar_obj->read( "my.tar" );

    # or just:
    my $tar_obj = Archive::Tar->new( "my.tar" );
    this creates an array of Archive::Tar::File objects. Called in a list context, "read" returns the list of these objects (and as you would expect, when called in a scalar context, it returns the number of File objects). When you are creating a tar set, each call to the "add_data" or "add_files" method will append to the list of File objects currently in memory. When you call the "write" method, all the File objects currently in the set are written as a tar stream to a file name that you provide (or the stream is returned to the caller as a scalar, if you do not provide an output file name).

    There are also three class methods that provide just the basic operations of listing and extracting tar-set contents, and creating a tar set from a list of data file names. These methods avoid the memory load, because they don't bother holding the data files in memory as objects. This also means that you give up a lot of detailed control on individual data files.
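    A rough sketch of those class methods (file names invented here):

    use Archive::Tar;

    # list the contents without holding the data files in memory as objects
    my @names = Archive::Tar->list_archive( "big.tar.gz" );

    # extract everything into the current directory
    Archive::Tar->extract_archive( "big.tar.gz" );

    # create an archive (second argument is the compression level, or undef)
    Archive::Tar->create_archive( "new.tar.gz", 9, "foo.txt", "bar/baz.txt" );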

    The Archive::Tar::File objects provide a lot of handy features, including accessors for file name, mode, linkname, user-name/uid, group-name/gid, size, mtime, cksum, type, etc. You can rename a file or replace its data content, check for empty files using "has_content", and choose between getting a copy of the file content or a reference to the content ("get_content" or "get_content_by_ref").

    Getting back to the use of the Archive::Tar object itself, I did come across one potential trap when trying to do a "controlled" extraction of data from an input tar file. Most of the object methods related to reading/extracting are presented in terms of using one or more individual file names as the means for specifying which data file to handle -- in fact, after the "new" and "read" methods, the next three methods described are "contains_file( $filename )", "extract( [@filenames] )" and "list_files()". Most of the remaining methods also need or allow file names as args, which leads one to assume that the "easiest" way to use the object is to do everything in terms of file names.

    But the problem is that each time you specify a file name for one of these object methods, the Archive::Tar module needs to search through its list of Archive::Tar::File objects to find the file with that name. This gets painfully slow when you're dealing with a set of many files, and doing something with each of them.

    Obviously, there are bound to be situations where you will need to do things by specifying particular data file names in a tar set, but more often, you'll want to work directly with the File objects. A couple of examples will suffice to show the difference:

    use Archive::Tar;

    # here's the slow approach to mucking with files in a tar set:
    my $tar = Archive::Tar->new( "some.tar" );
    my @filenames = $tar->list_files;

    for my $filename ( @filenames ) {
        my $filedata = $tar->get_content( $filename );
        # do other stuff...
    }
    The problem there is that each call to "$tar->get_content( $filename )" invokes a linear search through the set of data files, in order to locate the File object having that name. The following approach goes much faster, because it just iterates through the list of File objects directly:
    use Archive::Tar;

    my $tar   = Archive::Tar->new( "some.tar" );
    my @files = $tar->get_files;   # returns list of Archive::Tar::File objects

    for my $file ( @files ) {
        my $data = $file->get_content;   # same as above, but no searching involved
        # do other stuff...
    }
    And of course, given the list of File objects, you have much better control over the selection and handling of files -- here are a few examples:
    my %cksums;

    for my $file ( grep { $_->has_content } @files )   # only do non-empty files
    {
        next if $file->uname eq 'root';   # let's leave these alone

        if ( $file->name =~ /\.[ch]$/ ) {
            # do things with source code files here...
        }
        elsif ( exists( $cksums{ $file->cksum } ) ) {
            # file's content duplicates another file we've already seen...
        }
        else {
            $cksums{ $file->cksum } = undef;   # keep track of cksums
        }
    }
    To conclude: on the whole, if fine-grained control of tar file i/o is something that would be helpful to you, and if you can limit yourself to dealing with tar files that fit in available memory, then you really should be using this module. It's good.

    (updated to fix some typos and simple mistakes.)

Params::Validate
1 direct reply — Read more / Contribute
by rinceWind
on May 16, 2004 at 18:08

    Whereas C++ and Java go to great lengths to ensure that the data types of any arguments passed to a function match what is expected, perl is often criticised for not enforcing anything.

    Perl 6 will address this issue, but in the meantime, any subroutines you write are passed an array @_ containing a potential hotch-potch of scalars and references, blessed or otherwise.

    Although it's nice to write code that can DWIM given a parameter of varying data types, this is always an extra effort to implement. Also, what will your sub do when passed a SCALAR when it's expecting an ARRAYREF? At best, it will die with a run-time error, and a message which, likely as not, will not be immediately obvious.

    Enter Params::Validate

    By default, this module exports the functions validate and validate_pos. Params::Validate caters for two calling conventions: named parameters and positional parameters, which are validated by validate and validate_pos respectively. Unfortunately you can't mix them; any positional parameters before your list of key/value pairs need to be removed first. In fact, this is the way to use Params::Validate to handle method calls. Examples:
    sub mymethod {
        my $self = shift;
        my %args = validate( @_,
            { foo => 1,
              bar => { default => 99 } }
        );
        ...
    }

    sub mymethod2 {
        my $self = shift;
        my ($meenie, $minie, $mo) = validate_pos( @_,
            { type => SCALAR },
            { type => SCALAR | UNDEF },
            { type => ARRAYREF, default => [] }
        );
        ...
    }
    As is apparent, the call to validate or validate_pos is quite straightforward and edifying to someone else looking at the code.

    It's also quite easy to add such validation to existing code, for an immediate gain in robustness without too much cognitive effort. The module provides a whole host of tools for validating your argument list - I have just scratched the surface.
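    As a sketch of what else the spec can express (the parameter names are invented; see the module's docs for the full list of options):

    use Params::Validate qw(:all);

    sub add_user {
        my %args = validate( @_, {
            name  => { type => SCALAR, regex => qr/^\w+$/ },
            age   => { type => SCALAR,
                       callbacks => { 'is positive' => sub { $_[0] > 0 } } },
            group => { isa => 'My::Group', optional => 1 },   # My::Group is hypothetical
        } );
        return \%args;
    }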

    Error handling

    When the parameter validation fails, the default action is to croak, with quite a helpful message about which parameter is invalid and why. You can elect to use a callback to catch validation errors instead.
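    A minimal sketch of the callback route, via validation_options (the exception class is made up):

    use Params::Validate;

    # route validation failures to our own handler instead of croak
    Params::Validate::validation_options(
        on_fail => sub {
            my $message = shift;
            My::Exception->throw( error => $message );   # hypothetical exception class
        },
    );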

    Conclusion

    I am a convert to using this module. I recommend it for CPAN modules and for corporate coding standards.

    Update

    I have had some interesting dependency issues with modules of mine that use Params::Validate. I have seen CPAN pull in Ponie via Attribute::Handlers. Why would I want the Parrot/Ponie stuff coming into my production environment?! I asked Arthur Bergman about this; apparently it is a spurious dependency picked up by CPAN.pm, sorted out in a later release of Attribute::Handlers.

    I did notice also that ActiveState's module status page was showing any of my modules that use Params::Validate as having a dependency on Ponie.

DBD::AnyData
4 direct replies — Read more / Contribute
by idsfa
on May 14, 2004 at 14:04

    Module Author: Jeff Zucker <jeff@vpservices.com>
    Documentation

    Abstract

    Excellent tool for developing programs with limited "database" needs, prototyping full-on RDBMS applications and pulling in common data interchange formats. If you don't need/want the SQL baggage, try AnyData instead.

    Pre-Requisites

    Overview

    DBD::AnyData is a DBI/SQL wrapper around AnyData which allows the author to use many SQL constructs on traditionally non-SQL data sources. A descendant of DBD::RAM, DBD::AnyData also implements that module's ability to load data from multiple formats and treat them as if they were SQL tables. These tables can be held entirely in memory or tied to the underlying data file. Tables can also be exported in any format which the module supports.

    Review

    The variety and number of file formats in use is staggeringly large and continues to grow. Perl hackers are often faced with the job of being syntactic glue between applications, translating output from one program into the necessary input for another. Abstracting the exact format of these data allows the programmer to rise above mere hacking and actually craft something (re)usable. Separating the logic from the presentation improves the clarity of both.

    DBD::AnyData attempts to provide this abstraction by presenting a DBI/SQL interface. It layers over the required/companion AnyData module, which presents a tied hash interface. The perl purist will most likely prefer to stick with AnyData, minus the DBD. The extra layer of abstraction will be most useful if you are more comfortable with SQL or your application design requires it. To my mind, the niftiest use of this module is the ability to prototype your code as if you had a whole relational database, but have the ease of a few simple CSVs actually holding the data.

    The list of supported formats is impressive, and continues to expand. CPAN currently lists:

    • perl data structures and __DATA__ segments
    • Delimited text (Comma/Pipe/Tab/Colon/whatever separated)
    • Fixed length records
    • HTML Tables
    • INI Files
    • passwd Files
    • MP3 Files (specifically, their ID3 tags)
    • Paragraph Files
    • Web Server Logs
    • XML Files
    • DBI Connections (to leverage existing modules)

    With more on the way.

    DBD::AnyData has three basic modes of operation: file access, in-memory access and format conversion. These modes are implemented as five extension methods over a standard DBD.

    In file access mode, the data file is read on each request and written on each change. The entire file is never read into memory (unless requested) and so this method is suitable for large data files. Be aware that these are not atomic commits, so your database could end up in an inconsistent state. This mode is not supported for remote files or certain formats (DBI, XML, HTMLtable, MP3 or perl ARRAYs).

    In-Memory mode loads the entire data source into memory. Obviously a problem for huge data sets, but then you probably have those in a relational database already. This method is ideal for querying a remote data source, handled in the background by good old LWP.

    Conversion mode takes data from an input (which can be local or remote, and in any supported format) and writes it to a local file, perl string or perl array. This function alone would be reason enough for the module to exist, even though it's really more of an afterthought.
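    A rough sketch of the first two modes (table, file and URL names are invented; this assumes the ad_catalog/ad_import func calls take table name, format, then source, as documented at the time):

    use DBI;

    my $dbh = DBI->connect( 'dbi:AnyData:', undef, undef, { RaiseError => 1 } );

    # file access mode: trans.csv is read/written on each statement
    $dbh->func( 'trans', 'CSV', 'trans.csv', 'ad_catalog' );

    # in-memory mode: slurp a remote web log into a table named 'hits'
    $dbh->func( 'hits', 'Weblog', 'http://example.com/access_log', 'ad_import' );

    my $rows = $dbh->selectall_arrayref( 'SELECT * FROM trans' );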

    Caveats

    • Again, if you don't need SQL, use AnyData instead
    • Currently, DBD::AnyData will not allow SQL against multiple tables in the same SQL statement (no JOINs). Updated: per jZed, this feature is now available
    • It isn't a real RDBMS. Don't expect atomicity, journals, etc etc
    • Not all formats are fully featured, and most require more modules

    Summary

    DBD::AnyData is one of those fun modules that lets you shove the crud work off on someone else (the author of the AnyData::Format:: module) and get on with crafting good code. I've found it especially helpful when putting together tiny web apps that might end up getting huge (and thus require moving to a true database). Anything that lets me stop writing file format converters is worth checking out in my book.

List::Compare
2 direct replies — Read more / Contribute
by McMahon
on Mar 25, 2004 at 12:34
    I had to create a report of the differences between two files each containing thousands of unsorted records. I *thought* that I needed some form of Diff. I tried Algorithm::Diff, but discovered that it only works line-by-line. For instance, Algorithm::Diff reports that (1,2,3) and (2,1,3) are different lists.

    Furthermore, some commercial tools I tried did the same thing.

    I accidentally stumbled across List::Compare, which was lucky for two reasons.

    Most importantly, List::Compare solved my problem, and more: it shows intersections and unions of sets; it shows elements unique to either list (that was my particular problem); it shows all unique elements of both lists, and even all elements of both lists. The interface is elegant and intuitive. I'll be dealing with large inventory lists well into the future, and List::Compare is shaping up to be my tool of first resort for all my list comparison needs.
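    A minimal sketch of the calls in question (list contents invented):

    use List::Compare;

    my @old = qw(widget-a widget-b widget-c);
    my @new = qw(widget-b widget-c widget-d);

    my $lc = List::Compare->new( \@old, \@new );

    my @only_old  = $lc->get_unique;        # elements only in the first list
    my @only_new  = $lc->get_complement;    # elements only in the second list
    my @in_both   = $lc->get_intersection;
    my @in_either = $lc->get_union;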

    I was also lucky because James Keenan gives a detailed history of the source (Perl Cookbook) and the circumstances (introductory Perl course) that inspired him to write the module, as well as pointing out a number of similar modules, just in case List::Compare doesn't solve your particular problem. I am relatively new to Perl, and I don't entirely grok the power of hashes yet. List::Compare not only allowed me to solve my immediate problem quickly and elegantly, but it also showed me how to understand the code that underlies the List::Compare module itself.

    Elegant, intuitive, well-documented, and with great hints about the magic behind the module. I'm glad I found List::Compare, and you probably will be, too.
Proc::Background
1 direct reply — Read more / Contribute
by flyingmoose
on Mar 24, 2004 at 17:40

    Review of Proc::Background

    Ah, the subject of processes and forking. This brings up many questions in the monastery, so I decided to add this to the module reviews section so there is something to point to each time it comes up. Yes, folks, spawning processes and waiting on them doesn't have to be so complicated. Let's look at the problem space a bit and we will understand why Proc::Background is so cool to have.


    The Problem

    Unix: There are several ways to background a process on Unix: one can fork and exec, one can make a system call with an & appended, and so on. There are also things like Proc::ProcessTable, but again, this can get complicated.

    Windows: There are also several ways to background a process on Windows, but forking is often implemented incorrectly on your particular version and should be avoided. There is Win32::Process, but it requires an absolute path to the executable you are trying to run, which can sometimes be hard to acquire when you don't know where that program lives in your path.


    The Solution

    The above problems can get ugly in a hurry, especially if you are new to Unix (and don't understand fork well), don't want to deal with Win32 modules, or if you want code that works in something other than Unixy environments. This is where Proc::Background comes in. It allows one to not worry about Unix or Windows and to (effectively) manage processes without all of the gory details. In addition, it allows waiting on arbitrary processes and checking on the status of each. Very cool, and even cooler because it is cross platform.


    Example Code (borrowed from CPAN)

    use Proc::Background;

    my $proc1 = Proc::Background->new($command, $arg1, $arg2);
    my $proc2 = Proc::Background->new("$command $arg1 1>&2");

    $proc1->alive;
    $proc1->die;
    $proc1->wait;

    See CPAN for the full list of functions, but those are the basics. Easy, no?
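    And a small sketch of the "wait on several arbitrary processes" case (the commands are placeholders):

    use Proc::Background;

    # launch two background jobs
    my @jobs = map { Proc::Background->new($_) } ( 'backup_db', 'rotate_logs' );

    # poll until both have finished, then report their exit status
    sleep 1 while grep { $_->alive } @jobs;
    printf "pid %d exited with status %d\n", $_->pid, $_->wait for @jobs;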


    When To Use It

    • When you want clean code that is very short and understandable
    • When you must execute processes in cross-platform code for Windows and Unix
    • When you don't have enough tylenol/advil/beer to deal with Win32::Process
    • When you have to inquire about the status of arbitrary processes or must act upon their states (is process A up? How about B? Now wait for C to finish!)

    When Not To Use It

    • When you have Unix buddies you are trying to impress
    • When you have Win32 buddies you are trying to impress
    • When you are trying to keep up your job security by keeping code hard to read :)
    • When you need to execute arbitrary Perl code and not separate executables
    • If you are using Solaris or Cygwin: it appears (per CPAN reports) that this may not work there. Your mileage may vary. (I use Linux and Win32 most of the time.)
    • Foreground processes? NO, WAIT! It works there too: just use the 'wait' method on your process after you invoke it and you have something a little more spiffy than the stock system call. This is at your discretion, of course; it isn't really required.

    The Final Word

    This module is very useful and is currently in my top 10. It efficiently allows management of any number of processes and allows me to forget (when I feel like it) how fork, exec, and Win32::Process work in Perl -- saving me pain and frustration. It also makes code much more legible due to a clean API, and that is always a good thing in a module.

    You can remember how to use it without looking things up, since the API is so basic - this is goodness. Try it out; unless you are a Unix purist who must always write your own fork code to spawn processes, this should work great for you.



Convert::Morse
2 direct replies — Read more / Contribute
by PERLscienceman
on Feb 22, 2004 at 22:30
    CPAN LINK: Convert::Morse
    CPAN ABSTRACT: Package to convert between ASCII text and MORSE alphabet.
    Introduction:
    This module caught my eye as yet another 'Cool Use For Perl', appealing to both the avid Perl Programmer and Amateur Radio Hobbyist inside me.
    Functionality:
    In a nutshell, Convert::Morse converts an ASCII string to the equivalent International Morse Code dots and dashes and vice versa. In addition, the module contains a function to check whether a particular ASCII string "is morsable", i.e. convertible to a valid Morse code string.
    Demo Code:
    #!/usr/bin/perl
    use strict;
    use Convert::Morse qw(as_ascii as_morse is_morsable);

    print as_ascii('.... . .-.. .-.. --- -- --- .-. ... .'), "\n";   # 'Hello Morse'
    print as_morse('Perl?'), "\n";                                   # '.--. . .-. .-.. ..--..'
    print "Yes!\n" if is_morsable('Hello Perl.');                    # prints "Yes!"
    Bug(s) Found:
    None immediately found with preliminary testing.
    Module Author Noted Limitation:
    Can not yet do Japanese code nor German Umlaute.

    Final Thoughts:
    With further tinkering I found this module to be quite useful in converting English text to valid dot-dash-space International Morse Code; indeed another Cool Use For Perl for die-hard Perl programmers and radio enthusiasts alike.
    UPDATE: There is indeed only one International Morse Code recognized by international treaty, and it does not include Japanese or umlaut character sets; thanks to theorbtwo for pointing that out.
Number::Spell
2 direct replies — Read more / Contribute
by PERLscienceman
on Jan 27, 2004 at 20:07
    CPAN LINK: Number::Spell
    CPAN ABSTRACT: Number::Spell provides functionality for spelling out numbers. Currently only integers are supported.
    Introduction:
    Every so often I get the urge to troll around the CPAN Module Repository to see what is interesting. This module caught my eye as a 'Cool Use for Perl', so I thought I would download it and give it a try.
    Functionality:
    Number::Spell, upon initial tinkering, does what it says: it spells out integers in English words. I tested it out on some 'smaller' numbers (see demo below) and it seemed to work fine. So... I thought to myself, let's see if we can break it; after all, the documentation says it can go into the 'vigintillions'. Unfortunately, I was successful at breaking it. I hit the ceiling at 100 trillion: after adding one more zero, which would make one quadrillion, it only returned 'one'.
    Demo Code:
    #!/usr/bin/perl -w
    use strict;
    use Number::Spell;

    my $string = spell_number(8597);
    print "$string\n";

    Result: eight thousand five hundred ninety seven
    Bug(s) Found: Stops working properly after 100 Trillion.
    Final Thoughts:
    I think this would be a really cool module if it worked properly past 100 trillion. Has anyone else tried it and had the same result? Another possibility would be interpreting reals, although that would probably be pretty dicey to implement, especially with a number like 1.09873532. The author did mention this possibility in the original version.
    UPDATE:
    I did a little digging into the module and determined that when you go one place above 100 trillion, the number (1000000000000000, one quadrillion) gets interpreted in exponential format by the interpreter, so instead of 1000000000000000 the module sees something like 1e+15, which is not recognized or properly split by the regex inside the if statement on line 83 of spell.pm.
    if($data=~/(\-?)\s*(\d+)/){
    With a little more tinkering/experimentation with the demo script I have determined that such a large number will work with the module if you send it as a quoted string.
    my $string = spell_number('1000000000000000');   # this works
    my $string = spell_number(1000000000000000);     # this does NOT
    The ultimate solution would be to fix the regex on line 83 in spell.pm to deal with the exponential format. Unfortunately a regex wizard I am not. So.... I would be inclined to leave that in the hands of the author or someone who knows more of regexes than I do. :)
Time::Piece::MySQL
1 direct reply — Read more / Contribute
by jeffa
on Jan 04, 2004 at 14:02

    Time::Piece::MySQL is a very useful module for MySQL users. It is simply an extension to Time::Piece that provides a handful of methods for converting back and forth between Time::Piece objects and the MySQL date/time types: date, time, datetime, and timestamp. (The year type is available from Time::Piece, so it doesn't need to be here.)

    As an example, say i had a table of events that contained an id and a datetime field:

    +---------+------------------+
    | Field   | Type             |
    +---------+------------------+
    | id      | int(10) unsigned |
    | date    | datetime         |
    +---------+------------------+
    
    and i wanted to add 50 days to each date. The following snippet would do just that:
    use strict;
    use warnings;
    use DBI;
    use Time::Seconds;
    use Time::Piece::MySQL;

    my $dbh = DBI->connect( ... );
    my $sth = $dbh->prepare('update events set date = ? where id = ?');

    my $dates = $dbh->selectall_arrayref(
        'select id,date from events',
        { Slice => {} },
    );

    for (@$dates) {
        my $date = localtime->from_mysql_datetime( $_->{date} );
        $date += ONE_DAY * 50;
        $sth->execute( $date->mysql_datetime, $_->{id} );
    }
    A very trivial example, but i think it demonstrates how it can make someone's Perl/MySQL script easier to work with.

desift
1 direct reply — Read more / Contribute
by princepawn
on Nov 27, 2003 at 19:07


    Overview

    desift is a Perl program for data munging. What is data munging? According to Dave Cross, author of "Data Munging in Perl", data munging has 3 phases:

    1. Read in the data
    2. Transform the data
    3. Output the data

    Since desift is a data munging program, we can describe it via this framework.


    Read in the data

    desift reads in your data from STDIN or from files specified on the command-line after the option switches. It splits your data into a Perl array for you. You control the split via the -d switch:

      -d REGEX      Field delimiter in input-file(s). Default is "\t" (tab)

    Filtering

    A common phase of reading in data is filtering out what you don't want. To specify input lines that you want to skip, supply the -s option to desift:

      -s REGEX      Skip rows in input-file(s) matching REGEX.


    Transform the data

    In desift, the input data is transformed via a template string which may be supplied on the command-line or in a file. There are two elements to the template string: plain text and positional tags indexing into the array built from splitting your input data.

    If your template string is in a file, use the -t option. If you want to supply the template string on the command-line, then use the -T option.

    Here is a sample desift command using flags we have seen so far:

       ls -l | desift -d"\s+" -T"File: %9        Permissions: %1" -s"^total"

    It's not a completely perfect example, because filenames with spaces will only have the part listed before the space in the filename. Here is some sample output:

       File: chessgames-dotcom        Permissions: -rwxr-xr-x
       File: desift        Permissions: -rwxrwxrwx
       File: desift.pod        Permissions: -rw-r--r--
       File: gerbold        Permissions: drwxr-xr-x+
       File: upload-cpan.pl        Permissions: -rwxr-xr-x
       File: xemacs.bat        Permissions: -rwxr-xr-x

    We can learn some things from looking at this output. First of all, the word Permissions does not always start at the same column. My first attempt to fix this was to put a tab in the template string. To do so, you must manually insert a literal tab character in the string: \t or \\t, or changing the string from single to double quotes, does not work. Even so, the output is still not lined up:

       File: chessgames-dotcom      Permissions: -rwxr-xr-x
       File: desift Permissions: -rwxrwxrwx
       File: desift.pod     Permissions: -rw-r--r--
       File: gerbold        Permissions: drwxr-xr-x+
       File: upload-cpan.pl Permissions: -rwxr-xr-x

    And this makes the output hard to read. I envision two possible solutions to this problem. One possible fix is to have a template flag which takes a numeric argument indicating at which column the output should be written. Another fix is more time- and compute-consuming: desift could read in all the lines and then output them with just enough space for the columns to line up... sort of like a database does when you SELECT data.


    Outputting the data

    In a sense, transformation and output are one step in desift. Once a line of data is transformed, it is then output.


    Assessment

    desift is a cleanly written program which simplifies and abstracts the split, array-slice, print-and-join cycle of programming, leading to one succinct command line instead of a series of function calls.

    In looking back at a recent project of mine, I find desift to be inadequate for what I had to do. I had a CSV-file with name, email, phone, etc. I had to filter this file for profanity and invalid email addresses and then output the new file in a tab separated format for import into a database.

    desift's limitations

    First of all, the input phase. Parsing CSV is not easy; I could not pass desift a regexp that would do such a split and massage the fields properly. Also, some of the data was in Unicode format, and only Text::CSV_XS with its binary option was robust enough for this task. Finally, I was dealing with 4 files of 25 million lines each, so using a C-based module such as Text::CSV_XS was desirable for speed reasons as well.

    For the transformation phase *of this project*, desift was adequate. However, what if I wanted to apply the Perl lc function to a field instead of just writing it? That is a very likely operation and impossible with desift. Now, what if the template for desift were passed off to sprintf instead of its own custom sprintf-like formatter? And what if access to the split array were via a localized variable such as @_split? Then we could do lc if we wished:

    ls -l | desift -s"^total" -d"\s+" \
        --sprintf-string="File: %s    Permissions: %s" \
        --sprintf-args="lc($_split[8]), $_split[0]"

    Also, why is there no possibility of filtering data after it is split? Perhaps we can only determine whether data should be transformed or output after a test of some sort - a test such as adding two fields together, doing a SQL SELECT on a database, grepping a file, or seeing if certain fields (or all of them) were in the line. Thus we can conclude that filtering callbacks should be available at each stage as executable Perl subroutines, not limited to regular expressions.

    As mentioned earlier, the output phase is wed to the transformation phase. Thus it is up to you to use I/O redirection to capture desift output. For example, it is not (and perhaps should not be?) possible to commit desift results directly to a database on a line-by-line basis. But in my experience, C-based SQL loaders supplied with databases are 8-fold faster than isolated inserts via Perl/DBI. So the lack of complete control over channeling output may or may not be a Bad Thing.

    desift and the Future

    There are a number of splitting and filtering and formatting modules available on CPAN:

    splitting (a common form of input transformation):

    Parse::FixedLength, Parse::FixedDelimiter, Text::xSV, Text::CSV_XS, DBD::AnyData, Spreadsheet::ParseExcel.

    filtering

    Core Perl provides grep, which is more than adequate in a large number of cases. Common CPAN modules for doing so are Regexp::Common::profanity_us and Email::Valid.
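    A tiny sketch of that kind of core-Perl filter, using Email::Valid (the column layout is invented):

    use Email::Valid;

    # assume tab-separated lines with the email address in the third column
    while ( my $line = <> ) {
        chomp $line;
        my @fields = split /\t/, $line;
        next unless Email::Valid->address( $fields[2] );
        print join( "\t", @fields ), "\n";
    }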

    formatting

    Core Perl provides HERE documents and sprintf. And then on CPAN, there are very few options for formatting a data model (hah!). Need I list any? Template, Data::Table, Data::Presenter, HTML::Template, Text::MagicTemplate


    Obtaining desift

    desift is available at http://desift.sourceforge.net

    It was written by James Shimada.

Acme::Apache::Werewolf
5 direct replies — Read more / Contribute
by rob_au
on Nov 17, 2003 at 05:45

    Whilst I read through the list of newly uploaded modules onto CPAN on a daily basis, it is rare that I see new modules which:

    • Employ modules from the Astro:: namespace,
    • Employ Apache access handlers in an interesting manner,
    • Protect your web directories from werewolves, and,
    • Inspire me to write module reviews.

    Yet in my daily sojourn through the CPAN uploads today, I found a module which incorporated all of the above - Acme::Apache::Werewolf. This module implements an Apache access handler which can be used to deny access to web directories based upon the phase of the moon, or more specifically, during the full moon, thereby protecting your web directories from marauding werewolves.

    Using this module, protecting files from werewolves is relatively straight-forward:

    <Directory /fullmoon>
        PerlAccessHandler Acme::Apache::Werewolf
        PerlSetVar MoonLength 4
    </Directory>

    The only configurable parameter associated with this module is MoonLength which determines the length in days over which the moon is considered to be in full. In the above configuration, the full moon is 4 days, which would be from day 12 through day 16 of the lunar cycle. And in the words of the module author, it is wise to err on the side of caution and make this too large, rather than too small and risk the wrath of werewolves.

    Now if only I could similarly protect my web directories from other supernatural beings ...
Acme::Comment
2 direct replies — Read more / Contribute
by PERLscienceman
on Nov 05, 2003 at 22:07
    CPAN LINK: Acme::Comment
    CPAN ABSTRACT: This module allows for multi-line comments which are filtered out. Unlike the pseudo multi-line comment if (0) {}, the code being commented.
    Introduction:
    Roaming about the monastery I came upon the following node: Block Commenting, a fellow monk essentially searching for advice on the possibility of multi-line commenting. A few replies down, the module Acme::Comment was mentioned in passing; its claim was to allow multi-line comments which are filtered out. I became curious, so I downloaded the module and checked it out for myself.
    Functionality:
    Acme::Comment, in a nutshell, allows for multi-line and single line commenting in several different language styles. Some of the multi-line comment language styles are: C++, HTML, Pascal and Java to name a few. In total, both single and multi line together there were a total of 43 different programming languages represented. I tested it out in "HTML" mode using multi-line comments with ActivePerl 5.8.0 on WinXP, and it proved to be quite easy to use. The distribution itself contains fairly straight forward documentation.
    Generic Example:
    #!/usr/bin/perl -w
    use strict;
    use Acme::Comment type => 'HTML';

    <!--
       Multi-line comments here. Everything enclosed in the
       html style comment braces is ignored.
    -->

    my $a = 1;
    my $b = 2;
    my $c = $a + $b;
    print "$a + $b = $c\n";
    Final Thoughts:
    For implementation of multi-line comments in various language formats I found this module to be quite useful. The only drawback that I can immediately see is that this module is not yet widely known. I think multi-line commenting (a single format of it) would be great as a standard feature in a future version of Perl; perhaps in Perl 6? (It can't hurt to hope.)
Acme::DNS::Correct
3 direct replies — Read more / Contribute
by antirice
on Sep 22, 2003 at 12:59

    If you've been following the news as of late, VeriSign has decided to resolve all non-existent domain names to a service they set up called Site Finder, in a scheme to take advantage of their monopoly as controllers of the .com and .net TLDs. This is particularly annoying for individuals who enjoy checking the validity of links (I send a header that is the same as what IE 6.0 would send). As shown by the following link, if the domain name doesn't exist (expired, or entered by someone who just wanted to dump trash into your database), it will still return a valid page (after a 302 moved header): http://www.lsadjflj.com/alksdjf/aldhgjh.

    Enter Acme::DNS::Correct to correct this problem. It is designed as a drop-in replacement for Net::DNS::Resolver. If the IP for the Site Finder site is detected, the response will be cleansed of the offending IP. The only bug in this module is in the case where you actually wish to resolve sitefinder-idn.verisign.com.
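    A minimal sketch of what "drop-in" means in practice, assuming the module mirrors Net::DNS::Resolver's new/search interface (the domain is the bogus one linked above):

    use Acme::DNS::Correct;

    my $res   = Acme::DNS::Correct->new;
    my $query = $res->search('www.lsadjflj.com');

    # with Site Finder's wildcard answer stripped, a bogus name really fails
    print "no such host\n" unless $query;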

Tree::DAG_Node
2 direct replies — Read more / Contribute
by bm
on Sep 02, 2003 at 11:13
    I work in release management, and am constantly dealing with trees of many different types, such as projects, file systems, releases, versions of a file, class diagrams, etc. I have grown used to managing these in a variety of ways, such as hashes, or rolling my own "tree structure" by adding parent/child or predecessor/successor properties and methods to my classes.

    But Perl being Perl, well, CPAN being CPAN, you can always take advantage of others' experience. A few searches later, I found Tree::DAG_Node.

    This class represents tree structures in OO Perl. Specifically, it manages the relationships between a set of "nodes". There is only one type of relationship you can create: the mother node and the daughter node. The daughter list of a node is ordered, but this is of course ignorable.

    While a node can contain whatever data you would like it to (through the 'attributes' property), not every relationship can be created - for example, a node may only have one mother.

    The author, prolific CPAN contributor Sean M. Burke, encourages inheriting off Tree::DAG_Node. This exposes a seriously large number of tree related methods to your class, such as:

  • $node->daughters or $node->mother
  • $node->add_daughter or $node->add_daughter_left
  • $node->attributes->{'version'} = '1.1.3.4' (defines the attribute 'version' in the node)
  • $node->ancestors (returns a list of nodes)
  • $node->walk_down ({ callback => \&foo, callbackback => \&foo, ... }) (a depth first traversal of the tree, executes the passed callback)
  • various list of list to tree conversion methods
  • $node->draw_ascii_tree (ASCII pretty print of the tree)
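    A quick sketch exercising a few of these directly (node names invented):

    use Tree::DAG_Node;

    my $root = Tree::DAG_Node->new;
    $root->name('release');

    for my $part (qw(core gui docs)) {
        my $daughter = $root->new_daughter;
        $daughter->name($part);
        $daughter->attributes->{'version'} = '1.0';
    }

    print map { "$_\n" } @{ $root->draw_ascii_tree };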

    And that is just the summary! To quote the doco: "In fact, I'd be very surprised if any one user ever had use for more that even a third of the methods in this class. And remember: an atomic sledgehammer will kill that fly." This is a very large class - perhaps too over the top for some solutions (hence the atomic sledgehammer analogy!). Autoloader is not implemented, so some might find it a little slow for their needs, and therein lies the biggest problem with this class.

    The other thing I don't get is where he gets the DAG in DAG_Node from!

    In summary though, an excellent class that provides a vast array of sophistication to a tree structure. Adding:

    use Tree::DAG_Node;
    @ISA = qw(Tree::DAG_Node);
    to the top of your class can open many doors that would not otherwise have existed (just look at the walk_down method alone). I highly recommend this class for implementing tree structures.
Semi::Semicolons
2 direct replies — Read more / Contribute
by ailie
on Aug 31, 2003 at 18:44

    Authors: David H. Adler and Michael G. Schwern, from an idea by Adam Turoff
    Version: 0.03

    Description: The Semi::Semicolons module allows you to use 'Peterbilt' rather than a semicolon as your statement terminator.

    use Semi::Semicolons;
    print "Why on earth would anyone use this?"Peterbilt

    You can also customize your statement terminator.

    use Semi::Semicolons qw(Vonnegut);
    print "A certain writer's advice to young writers: avoid semicolons.\n"Vonnegut

    (Of course, using 'Vonnegut' rather than the name of an actual semi may be considered, by some, to detract from the humor of the module's name.)

    Why should you use it? You probably shouldn't, unless you're easily amused (like me).

    Why should you not use it? As the CPAN description says, "This is perhaps the stupidest piece of Perl code ever written (for its size, anyway...)"

    Verdict: Two thumbs up. Way up!
