legLess has asked for the wisdom of the Perl Monks concerning the following question:
Monks ~
I'm building a module, one of whose jobs is parsing input. It's been pointed out to me that input parsing has been done before :) and I'm looking for a way to be more constructively lazy about it. At the same time, I want to avoid a disease that has plagued me lately: multiple nested dependencies.
Perl folks I know here in Portland dot the spectrum from "Never use dependencies other than what comes with the default install," to "Who cares if it's a 3-year-old .01?" My own position on that scale varies depending on how obscure the dependency is, how much work it saves me, how much value it adds, and other such things. I've seen other discussions of this issue here, so feel free to skip that part of my question if you're bored with it.
Right now the module depends on CGI and nothing else; the module I'm thinking of using is Damian's (now Abigail's) Regexp::Common.
So my questions:
- How common is Regexp::Common? Perlmonk's own host doesn't have it installed by default; Red Hat and Debian don't have packages for it. I don't know about other sites. (see NOTE)
- I have an urge to roll this myself: am I in a state of sin?
Here's a simplified snip of the code I have now:
sub valex {
my $self = shift;
$_ = shift;
/^integer$/ and return qr/\d+/;
/^word$/ and return qr/\w+/;
# etc.
return undef;
}
Using Regexp::Common would replace the qr//s above. The benefits are a richer and more robust set of regexen: Damian and Abigail have wicked Perl-fu. The drawbacks are added complexity and a dependency of perhaps questionable value (for this application: notice I have no heartache using CGI :).
(NOTE) I don't care how easy or difficult it is for me to install dependencies, but one of the design goals is to avoid interface complexity in common situations. I very much want users of the module to be able to specify plain words as validators: 'integer', 'email', etc. So I'd be wrapping the common regexen with the valex sub anyway, and only using Regexp::Common internally. I have confidence (and so far my test suite agrees with me) that I can write or find these myself.
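The plain-word interface described in the note could be sketched roughly as a lookup table plus a whole-string check. This is a minimal sketch under stated assumptions, not the module's code: `%VALIDATOR` and `is_valid` are hypothetical names, `valex` is written as a plain function rather than a method for brevity, and the hand-rolled patterns are placeholders that could be swapped for Regexp::Common entries such as `$RE{num}{int}` if that dependency is accepted.

```perl
use strict;
use warnings;

# Hypothetical table mapping plain-word validator names to patterns.
# These hand-rolled patterns are placeholders; with Regexp::Common loaded,
# an entry could instead be, e.g., integer => $RE{num}{int}.
my %VALIDATOR = (
    integer => qr/\d+/,
    word    => qr/\w+/,
);

# Look up the pattern for a validator name; undef if the name is unknown.
sub valex {
    my ($name) = @_;
    return $VALIDATOR{$name};
}

# Accept a value only if the WHOLE string matches the named pattern.
sub is_valid {
    my ($name, $value) = @_;
    my $re = valex($name);
    return 0 unless defined $re && defined $value;
    return $value =~ /\A(?:$re)\z/ ? 1 : 0;
}

for my $input ('42', 'foo123abc') {
    print "integer '$input': ",
        (is_valid(integer => $input) ? 'valid' : 'invalid'), "\n";
}
```

Keeping the anchoring in `is_valid` rather than in each table entry means every pattern, hand-rolled or borrowed, gets the same whole-string treatment.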
Thanks for your time and replies.
Re: Dependencies, or, How Common is Regexp::Common?
by Abigail-II (Bishop) on Sep 17, 2003 at 13:06 UTC
"Never use dependencies other than what comes with the default install,"
The drawbacks are added complexity and a dependency of perhaps
questionable value (for this application: notice I have no heartache
using CGI :).
I think such attitudes totally miss the point of Open Source in
general, and CPAN in particular. What is the point of sharing
software if people balk at the slightest inconvenience and don't
want to use what's available? Does everything have to be delivered
to your doorstep?
It seems that certain people would prefer that Perl comes with
everything that's available on CPAN - and then some. Besides, that
would make it impossible to ever release a version of Perl again
(just look at how hard it is to release 5.8.1, which is partly due
to the bloat and the wish to serve everyone).
It's far better for packages to live on CPAN. Then at least there
is the potential that they will be updated soon after a bug is revealed.
Suppose Regexp::Common came with 5.8.0, and it had a bug. The earliest
release that would fix the bug would be 5.8.1, which, if it came out
today, would be 14 months after 5.8.0. And if you have a hard time
convincing people to install a module, think how hard it's going to be
to convince them to install a new version of Perl! And if there were
a bug in the Regexp::Common released with 5.8.1, do you have any idea
how long you would have to wait for a new release? The track to 5.10 was
started in July 2002. It's now September 2003, and there isn't even any
sign of a 5.9.0. You might have to wait *years* for a bugfix.
Having said that, Regexp::Common is easy to install. It's a pure Perl
module, and I have no intention of ever turning it into something
that isn't pure Perl. All you need to do is (recursively) copy the files
in the 'lib' directory of the distribution. How hard can that be?
But even if you don't want to install Regexp::Common, there is always
the option to copy the code. Of course, your own license may prevent
that, and you do have to do more work in case the code you copied gets
upgraded, but the license of Regexp::Common allows you to go this way.
Abigail
I wonder about Regexp::Common. I often have tasks that are regular-expression related, and when I look at what the module offers, it usually doesn't have what I need.
By "what I need" I mean two things: (a) it lacks a certain common regular expression, or (b) it lacks a certain task related to regular expressions.
By (a), I mean that sometimes a regular expression is common but not in that distro. For example, I was told to write something to make sure an address was valid. So, I simply made sure that the string had a number and a letter in it... and it did get a little bit of filtering done. Is there a better solution? Aren't many people having to validate addresses? How are you doing it? Also, I am not sure how open Abigail-II is to new additions to the module, and I am not sure whether I should use rt.cpan.org or email him. He is certainly very present here, so I could msg him.
But also by (a) I mean that Abigail and Damian are both non-American, and so their profanity regular expressions were way off the mark. I had never even heard of some of the terms they thought were bad, and others are completely normal in an American context (e.g., "bl**dy").
So I coded Regexp::US::Profanity to do filtering with
Regarding (b), the regexp to count the number of a certain character in a string is very simple, and the task to count is also simple, but neither was readily available in the distro. And again, I was afraid to contact the author about it, so I just whipped up some lines of code to do it.
Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality.
For example, I was told to write something to make sure an address was
valid. So, I simply made sure that the string had a number and a letter
in it... and it did get a little bit of filtering done. Is there a better
solution? Aren't many people having to validate addresses? How are you
doing it?
Personally, I don't think such a thing belongs in Regexp::Common, because
there are no clear rules on what is a valid address. You could make some
heuristics, but they will give many false positives, and false negatives.
And the heuristics will differ from country to country.
Also, I am not sure how open Abigail-II is to new additions to the module
The POD has always said there are not enough regexes and has
asked people to send them. In the year and a half that I've taken
care of this module, I haven't had enough regexes sent in to need a
second hand to count them.
As for contacting me, email is preferred (regexp-common@abigail.nl).
I don't do the chatterbox, so don't waste your time messaging me.
As for the profanity regex, that's entirely Damian's work, including the
nifty encoding. Had it not been there when I started maintaining it, I
would not have added it. The problem I have with it is that it's so subjective.
Who am I to decide what's profanity, and what isn't? You can never be
complete on this one, and where do you stop?
Regarding (b), the regexp to count the number of a certain character
in a string is very simple, and the task to count is also simple, but
neither was readily available in the distro.
The regexp is simple? You'd have to write something like this (assuming you
want to count the occurrences of the character c):
/^(?{$count = 0})[^c]*(?:c(?{$count ++})[^c]*)*/
which I don't think is simple. I wouldn't use a regex for that; I'd use
tr/c/c/
and if you want to count the number of non-overlapping matches of a
pattern, I'd use:
$count = () = /$pat/g;
To catch that inside a single regex is really awkward. Remember that
Regexp::Common gives you patterns, that can be interpolated in a regexp.
For instance, if you want to count the number of HTTP URIs in a string,
Regexp::Common doesn't give you a function to do that directly, but it
does do the hard work for you by giving you the pattern:
use Regexp::Common qw /URI/;
$count = () = $str =~ /$RE{URI}{HTTP}/g;
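The `tr` and list-assignment counting idioms above can be checked with a small self-contained script. The string 'banana cabana' and the pattern /an/ are arbitrary examples chosen here for illustration, not taken from the thread.

```perl
use strict;
use warnings;

my $str = 'banana cabana';

# tr/// returns how many characters it matched. With an identical
# replacement list (tr/a/a/) the string itself is left unchanged.
my $a_count = ($str =~ tr/a/a/);

# A list assignment in scalar context evaluates to the number of elements
# on its right-hand side, so this counts non-overlapping matches of $pat.
# The /g modifier matters: without it the match returns at most one element.
my $pat = qr/an/;
my $match_count = () = $str =~ /$pat/g;

print "a: $a_count, matches of /an/: $match_count\n";   # a: 6, matches of /an/: 3
```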
Patches are more than welcome, as are suggestions of what to include.
The next version of Regexp::Common is planned to be released shortly
after 5.8.1 comes out. The major addition will be ISBN numbers, checking
against the latest country/publisher lists.
Abigail
Re: Dependencies, or, How Common is Regexp::Common?
by Zaxo (Archbishop) on Sep 17, 2003 at 03:50 UTC
I'll vote with requiring it. It is easy enough to install on any platform supporting CPAN. If you package CPAN-style with ExtUtils::MakeMaker, you can provide a make rule to go get the distribution, or else say,
PREREQ_PM => { 'Regexp::Common' => '2.00' },
in the attribute list of WriteMakefile() in Makefile.PL.
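Put together, a minimal Makefile.PL along these lines might look like the sketch below. The module name My::Validator and the version number are placeholders, and CGI is listed only because the original post says the module already depends on it.

```perl
# Makefile.PL: a minimal sketch. NAME and VERSION are placeholders.
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'My::Validator',    # hypothetical module name
    VERSION   => '0.01',             # placeholder version
    PREREQ_PM => {
        # Package names are quoted, as is conventional for hash keys with ::
        'CGI'            => 0,       # any version
        'Regexp::Common' => '2.00',  # minimum version, as suggested above
    },
);
```

Running `perl Makefile.PL` warns about any missing prerequisite, and the CPAN shell can use the PREREQ_PM list to fetch it.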
If you don't expect your users to have system privileges, you can make the installation go into some private library and pepper the code with use lib '/my/private/lib';.
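One way to avoid hard-coding '/my/private/lib' is to locate the private library relative to the script with the core FindBin module. This is a sketch assuming a layout, chosen here only for illustration, where the copied modules live in a lib directory next to the script.

```perl
use strict;
use warnings;

# FindBin (core) sets $Bin to the directory of the running script.
use FindBin qw($Bin);

# Assumed layout for illustration: private modules live in ./lib next to
# the script, e.g. lib/Regexp/Common.pm copied from the distribution.
use lib "$Bin/lib";

print "module search starts at $Bin/lib\n";
```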
There are, of course, many admins who will refuse all modules beyond core, and even delete as many core ones as they can. There are others who will not install anything their vendor's package mechanism doesn't provide. Those aside, a well-stocked perl installation has a good chance of having it.
After Compline, Zaxo
Re: Dependencies, or, How Common is Regexp::Common?
by bart (Canon) on Sep 17, 2003 at 03:57 UTC
How reliable do you want it to be? Because right now, in this simple case, you have some major errors. For example, with this snippet:
/^integer$/ and return qr/\d+/;
your routine would validate "foo123abc" as a valid value for an integer.
There's more to it than just being too lazy to install a module, you know. On the other side of the spectrum, there's the laziness of being pretty sure your module will do what you want it to do. A lot of work has gone into the construction of these modules. At least, borrow some of that work by copying some of the code into your scripts, instead of reinventing a likely majorly flawed wheel yourself.
You may think that you have just made a minor error, and that this won't happen to you again. Think twice. These kinds of errors are all too common.
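A quick way to see the problem is to compare the unanchored pattern from the snippet with an anchored one. The choice of \A and \z here is an editorial one: $ would also accept a string with a trailing newline.

```perl
use strict;
use warnings;

my $unanchored = qr/\d+/;      # as in the valex snippet: digits anywhere
my $anchored   = qr/\A\d+\z/;  # the whole string must be digits

for my $input ('123', 'foo123abc', "123\n") {
    (my $shown = $input) =~ s/\n/\\n/g;   # make the newline visible
    printf "%-13s unanchored: %-3s anchored: %s\n", "'$shown'",
        ($input =~ $unanchored ? 'yes' : 'no'),
        ($input =~ $anchored   ? 'yes' : 'no');
}
```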
sub is_integer {
return 0 unless $_[0];
return $_[0] =~ m/^\d$/ ? 1 : 0;
}
In development this routine was never required to deal with a TWO digit integer, as all the developers used accounts with client ID numbers below 10. Oh, and the test code... the guy who wrote it tested all these args: undef, '', 'I am not an integer 42!', 0,1,2,3,4,5,6,7,8,9. Why ten tests for single-digit integers and no tests for 16, 256 or 65535? GOK. It just goes to show that the volume of tests is not the most important thing; testing all the possible cases is. Even covering most of the probable cases would have been fine here.
So you can guess what happens. During a live demo client 10 gets created, but client ID 10 is not an integer according to the sub. End of that demo. Much egg on developer and manager faces. And the bug (besides the inadequate test suite)? A single missing +.
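A sketch of a corrected check together with a table-driven test that includes multi-digit values. It assumes 0 should count as an integer; note that the original `return 0 unless $_[0];` guard also rejects 0, since 0 is false in Perl.

```perl
use strict;
use warnings;

# Sketch of a corrected check (assumption: 0 is a valid integer here).
sub is_integer {
    my ($value) = @_;
    return 0 unless defined $value;
    # \d+ accepts one or more digits; \A...\z anchors the whole string
    # (unlike $, \z does not also match before a trailing newline).
    return $value =~ /\A\d+\z/ ? 1 : 0;
}

# Table-driven cases, including multi-digit values the original suite missed.
my @cases = (
    [ undef, 0 ], [ '', 0 ], [ 'I am not an integer 42!', 0 ],
    [ 0, 1 ], [ 9, 1 ], [ 10, 1 ], [ 16, 1 ], [ 256, 1 ], [ 65535, 1 ],
);

for my $case (@cases) {
    my ($input, $expected) = @$case;
    my $label = defined $input ? "'$input'" : 'undef';
    printf "%-27s %s\n", $label,
        is_integer($input) == $expected ? 'ok' : 'NOT OK';
}
```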
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
# $value set to incoming parameter earlier
# $param object initialized earlier
my $valex = $param->valex;
$value =~ /($valex)/;
$param->errors( "Parameter '" . $param->name
. "' contained invalid data." )
unless(( defined $1 ) and ( $value eq $1 ));
$param->value( $1 );
Just for the heck of it I added another test, passing a parameter called 'bart' with a value of 'foo123abc'. I defined 'bart' as an integer and watched this test pass:
is( $valop->value( 'bart' ), 123, 'foo123abc validated correctly' );
Thanks for the reply, though.
UPDATE: I should note that one of the simplifications I made (not expecting the Spanish Inquisition, as they say) to the code in my original post was changing this line in the module:
$_ = shift || $self->validator;
to:
$_ = shift;
In the module, &valex serves a dual purpose which I didn't think pertinent to the question I was asking.
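One Perl detail worth knowing about the `$value =~ /($valex)/` check in the code above: a failed match does not reset $1, which keeps whatever the last successful match in scope captured, so `defined $1` alone does not prove that this particular match succeeded. Below is a sketch of a variant that captures in list context instead. `validate` is a hypothetical helper name, and the explicit \A...\z anchors are a swapped-in alternative to the `$value eq $1` comparison.

```perl
use strict;
use warnings;

# Pitfall: a failed match leaves $1 unchanged.
my $first  = 'abc';
my $second = 'xyz';
$first  =~ /(\w+)/;    # succeeds: $1 is now 'abc'
$second =~ /(\d+)/;    # fails: $1 is NOT reset
my $stale = $1;        # still 'abc', even though the last match failed
print "stale \$1 after a failed match: $stale\n";

# Hypothetical helper: return the validated value, or undef on failure.
sub validate {
    my ($value, $valex) = @_;
    return unless defined $value;
    # In list context the match returns its captures, or an empty list on
    # failure, so $match is undef whenever this particular match fails.
    my ($match) = $value =~ /\A($valex)\z/;
    return $match;
}

for my $value ('123', 'foo123abc') {
    my $got = validate($value, qr/\d+/);
    print "$value => ", (defined $got ? $got : 'invalid'), "\n";
}
```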
A point about the regexes that Regexp::Common doesn't supply
by TheDamian (Vicar) on Sep 17, 2003 at 18:51 UTC
Regexp::Common was originally conceived as a framework; one that would allow the Perl community to share and reuse commonly needed (and frequently poorly implemented) regular expressions.
And, indeed, it has been successful in that sense. My original version of the module had very few regexes. Others (principally Abigail-II) have contributed most of the current set it offers.
But because Regexp::Common is contribution-driven, it's entirely possible the module doesn't have the regex you need. Or that you could improve on one of the regexes it does offer (e.g. $RE{profanity}).
In that case, I would strongly encourage you not just to roll your own, but to integrate it with Regexp::Common (I tried to make that trivial to do). Then send Abigail-II the code, so that everyone can benefit.
Re: Dependencies, or, How Common is Regexp::Common?
by DrHyde (Prior) on Sep 17, 2003 at 15:09 UTC
I think it's reasonable to depend on other modules - no matter how unusual - without bundling them with your code. It's not as if it's hard to grab modules and their dependencies using the CPAN module. And if the machine you want it on has no direct access to the outside world it's still not hard. I install modules on just such a machine quite often.
In fact, I would go so far as to say that, in almost all cases, including some random module in your tarball is a BAD thing. Try searching CPAN for Test::More. It appears in several packages. It's at different versions in those packages too. Someone who's bundled it with their code may very well have bundled a buggy version. Better in my opinion to put it in the list of prerequisites for your module so that the CPAN or CPANPLUS module can fetch it for you.
Re: Dependencies, or, How Common is Regexp::Common?
by Anonymous Monk on Sep 17, 2003 at 03:48 UTC
Not very common I'd say. Besides, its Perl comment matcher is so
simplistic it simply matches *any* # character to
a newline.
print " # This is not a comment\n";
Or that insane Acme::Comment module, but people who use that in a real program deserve what they get :)
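To see the failure mode, here is a sketch of a naive pattern that, like the matcher described above, takes everything from any # to the end of the line. It is an illustration only, not Regexp::Common's actual pattern.

```perl
use strict;
use warnings;

# Illustrative naive pattern: from any # up to (not including) the newline.
my $naive_comment = qr/#[^\n]*/;

# One line of Perl whose string literal contains a # character.
my $code = qq{print " # This is not a comment\\n";   # but this is\n};

# The match starts inside the string literal and swallows the real comment.
my @found = $code =~ /($naive_comment)/g;
print "naive match: $_\n" for @found;
```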
---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer
Note: All code is untested, unless otherwise stated