http://www.perlmonks.org?node_id=369420

CassJ has asked for the wisdom of the Perl Monks concerning the following question:

Afternoon all,

Sorry if this is a bit long-winded. In over my head, I think...

I'm trying to write some code to parse some cgi-uploaded data. At the moment, data is uploaded and unpacked, if required, into a tmp directory and then parsed. The problem is this data could be of various (and ever increasing) types so I want to make it easy to install new parsers.

The GUI is generated using XML and XSLT and it generates a seperate tab for each possible type of data (by some magic that I didn't write and don't understand!). So, when adding a new parser it's just a case of adding a section to the XML, which is easy enough to automate. The cgi-query then contains a param 'upload_data_type' whose value depends on the tab used for upload.

So, I had a ParserFactory which would just take the value of 'upload_data_type' and try to load a class something like:

sub get_a_parser{ my $type = shift; my $parser = "EP::Parser::$type"; eval "require $parser" or die "can't find a parser. Are you sure it's +installed?"; return new $parser; }

The idea being that now all you would need to do to install a new parser is stick the appropriate .pm file in EP/Parser/ and add a section to the GUI XML which sets the 'upload_data_type' param to the name of the .pm file and it'll work.

A colleague says that the eval 'require $parser' stuff makes him uncomfortable cos you only get an error at runtime, but I can't see that it makes much difference as the main script isn't run until the user submits the page so it's not as though we could warn them of the problem earlier even if we caught it at compile time. I guess there could be security issues with loading a module based on a name from a cgi query, but I could strip it of dodgy characters first.

So: is this a bad plan? Is there a better way of doing it?

Thanks for any suggestions,

Cassxx

update:

Thanks for all your help! Cxx

Replies are listed 'Best First'.
Re: Is dynamic loading of pm's a bad thing?
by broquaint (Abbot) on Jun 25, 2004 at 02:44 UTC
    With a little help from you friendly neighbour CPAN modules this is a pretty standard plugin loader
    use Carp 'croak'; use Module::Locate 'locate'; sub get_a_parser { my $mod = "EP::Parser::$_[0]"; require locate $mod or croak "Couldn't locate module '$mod'"; return $mod->new( @_[ 1 .. $#_ ] ); }
    See. Module::Locate and Carp for more info.
    HTH

    _________
    broquaint

      If anyone's still interested, I did it like this in the end...
      sub create_data_parser{ my $class = shift; my %params = @_; my $parser = "EP::Core::Data::Parser::".$params{data_type}; require (locate $parser) ? return $parser->new(%params) : die "Cou +ldn't load parser"; }

      Cxx

      Hi,

      Thanks for your suggestion. I had a look at Module::Locate, but I'm not sure I quite understand how it works.

      Does locate $mod do what was suggested in one of the above posts and change a blah::blah string into a blah/blah.pm string then see if that pm exists? If so, does this mean that I don't have to worry about the fact that $mod is user defined cos unless the path actually corresponds to a pm file then locate'll fail?

      Thanks again,

      Cxx

        What it does is searches for a given module in the library path i.e it will find the path to the blah::blah module, or return false. Under the covers it will munge blah::blah into the appropriate format. And yes, you don't have to worry that $mod is user defined, if it exists in the path, you will get that path returned, otherwise, false.
        HTH

        _________
        broquaint

Re: Is dynamic loading of pm's a bad thing?
by matija (Priest) on Jun 24, 2004 at 17:24 UTC
    Don't "strip dodgy characters" - you can never be sure that you got them all.

    Your best bet would be to read the directory where the plugins are stored, and make a list of available plugins. Then compare the elements of that list with the parameter you got. That way, the user input only gets processed by the string comparison operator, which is much harder to fool than the directory listing functions.

Re: Is dynamic loading of pm's a bad thing?
by Zaxo (Archbishop) on Jun 24, 2004 at 20:12 UTC

    A further comment along the lines of matija's suggestion. You have a small finite set of valid EP::Parser:: $types. You can validate the user input by registering all the valid parsers to a hash keyed by the .pm files' names. If the client-supplied $type is a key, the input is validated and may be untainted.

    use constant PARSER => { Foo => undef, Bar => undef, Baz => undef }; # . . . $type = ( exists PARSER->{$type} and $type =~ /^(.*)$/ ) ? $1 : '';

    After Compline,
    Zaxo

Re: Is dynamic loading of pm's a bad thing?
by lachoy (Parson) on Jun 24, 2004 at 17:57 UTC

    Here's a pattern I've used many timess successfully in the past:

    • Force all parsers to use a specified namespace (e.g., EP::Parser::Soemthing)
    • In your main class (e.g., EP::Parser) have a routine that runs at 'use' or 'require' time that use something like Module::Find to find all modules in the specified namespace.
    • Iterate through the modules and 'require' each of them

    Since this runs whenever the module is loaded it will complain at startup if one of the implementations is incorrect.

    For another layer of abstraction you can combine the above with use of Class::Factory as the parent class and associating each of the implementations with a simple name rather than a class name.

    Chris
    M-x auto-bs-mode

Re: Is dynamic loading of pm's a bad thing?
by eserte (Deacon) on Jun 24, 2004 at 18:19 UTC
    You could also take a look at UNIVERSAL::require which would let you to write $parser->require instead of the eval, but this is actually a cosmetic change.
Re: Is dynamic loading of pm's a bad thing?
by Roy Johnson (Monsignor) on Jun 24, 2004 at 20:49 UTC
    According to the documentation, the difference between require $parser and, something like require Module::Parser is that in the latter case, it replaces '::' with '/' and tacks a '.pm' on the end. Seems like you could do that yourself, and avoid the eval:
    $parser =~ s#::#/#g; $parser .= '.pm' unless $parser =~ /\.pm$/; require $parser;
    This answer is too easy to be right. But I'd appreciate it if someone would explain why it's wrong.

    We're not really tightening our belts, it just feels that way because we're getting fatter.
      A more portable solution would be $parser = File::Spec->catfile(split /::/, $parser) . ".pm";

      Why bother doing this at all? All you've actually done is written more lines of code and made it less portable than it was before.

        All you've actually done is written more lines of code and made it less portable than it was before.
        What was it before? How is it less portable now?

        Consider the problem: dynamically loading pm's. That means that the module to be loaded might just be input by a user. You can't just use a hardwired name. The only other solution offered was eval, which comes with security issues.


        We're not really tightening our belts, it just feels that way because we're getting fatter.