PerlMonks  


Oh wise and varied monks, I come seeking the existence of yet unknown modules.

I have to clean up some gross, ancient code at $work, and before I try to make a new module I'd love to use an existing one if anyone knows of something appropriate.

At runtime I am parsing a file to determine what processing I need to do on a set of data.

If I were to write a module I would try to do it more generically (non-DBI-specific), but my exact use case is this:

I read a SQL file to determine the query to run against the database. I parse comments at the top and determine that
  • column A needs to have a s/// applied,
  • column B needs to be transformed to look like a date of given format,
  • column C gets a sort of tr///.
  • Additionally, transformations can be chained, so that column D might get a s/// and then, if the result isn't 1 or 2, be set to 3.

So when fetching from the db the program applies the various (possibly stacked) transformations before returning the data.
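To make the stacking idea concrete, here is a minimal sketch using an ordered list of code references per column. It is only an illustration: the `%stack` layout, the `apply_stack` and `clean_row` helpers, and the sample patterns are assumptions of mine, not the existing code at $work or any CPAN module.

```perl
use strict;
use warnings;

# Hypothetical layout: each column name maps to an ordered list of
# code references; each takes a value and returns the new value.
my %stack = (
    A => [ sub { (my $v = shift) =~ s/foo/bar/g; $v } ],
    D => [
        sub { (my $v = shift) =~ s/\D//g; $v },            # s///: keep digits only
        sub { my $v = shift; $v =~ /\A[12]\z/ ? $v : 3 },  # not 1 or 2? set it to 3
    ],
);

# Run every transformation stacked for a column, in order.
sub apply_stack {
    my ($column, $value) = @_;
    $value = $_->($value) for @{ $stack{$column} || [] };
    return $value;
}

# Apply the stacks to one fetched row (a hash reference, such as the
# one DBI's fetchrow_hashref returns). Columns without a stack pass through.
sub clean_row {
    my ($row) = @_;
    return { map { $_ => apply_stack($_, $row->{$_}) } keys %$row };
}

my $cleaned = clean_row({ A => 'foo fighters', D => 'x7', E => 'untouched' });
print "$_ => $cleaned->{$_}\n" for sort keys %$cleaned;
```

Keeping each step as a small closure means the "series of if clauses" collapses into one loop, and chaining is just pushing another code reference onto a column's list.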

Currently the code is a disgustingly large and convoluted series of if clauses that process arrays of instructions which are hideously difficult to read or maintain.

So what I'm imagining is perhaps an object that will parse those lines (and additionally expose a functional interface), stack up the list of processors to apply, then be able to execute it on a passed piece of data.

Optionally there could be a name/category option, so that one object could be used dynamically to stack processors only for the given name/category/column.
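A rough sketch of what such an object might look like, with one stack per name. Everything here is hypothetical: the package name `My::Munger`, the assumed line format `-- name:action: arguments`, and the convention that each action is a factory returning a code reference are my own choices for illustration, not an existing module's interface.

```perl
use strict;
use warnings;

package My::Munger;    # hypothetical name, not a CPAN module

sub new {
    my ($class) = @_;
    return bless { stacks => {}, actions => {} }, $class;
}

# Register a processor factory under an action name. The factory gets
# the raw argument string and returns a code ref that transforms one value.
sub define {
    my ($self, $action, $factory) = @_;
    $self->{actions}{$action} = $factory;
    return $self;
}

# Functional interface: push one processor onto the stack for a name.
sub stack {
    my ($self, %arg) = @_;
    my $factory = $self->{actions}{ $arg{action} }
        or die "unknown action '$arg{action}'";
    push @{ $self->{stacks}{ $arg{name} } }, $factory->($arg{format});
    return $self;
}

# Parse a line shaped like "-- name:action: arguments" (assumed format).
sub parse {
    my ($self, $line) = @_;
    my ($name, $action, $format) = $line =~ /\A--\s*(\w+):(\w+):\s*(.*)\z/
        or return;    # not an instruction line
    return $self->stack(name => $name, action => $action, format => $format);
}

# Apply each name's stack to the matching key of a hash ref.
sub apply {
    my ($self, $data) = @_;
    my %out = %$data;
    for my $name (keys %out) {
        $out{$name} = $_->($out{$name}) for @{ $self->{stacks}{$name} || [] };
    }
    return \%out;
}

package main;

my $obj = My::Munger->new;
$obj->define(upper  => sub { sub { uc $_[0] } });
$obj->define(prefix => sub { my $p = shift; sub { $p . $_[0] } });

$obj->parse('-- greeting:upper:');
$obj->parse('-- greeting:prefix: >> ');
my $cleaned = $obj->apply({ greeting => 'hello', other => 'as is' });
print "$cleaned->{greeting}\n";
```

Because `parse` only delegates to `stack`, the text format and the functional interface stay in sync, and keys without a stack are returned unchanged.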

A traditionally contrived example:

$obj = $module->new();
$obj->parse("-- greeting:gsub: /hi/hello");  # don't say "hi"
$obj->parse("-- numbers:gsub: /\D//");  # digits only
$obj->parse("-- numbers:exchange: 1,2,3 one,two,three");  # then spell out the numbers
$obj->parse("-- when:date: %Y-%m-%d 08:00:00");  # format like a date, force to 8am
$obj->stack(action => 'gsub', name => 'when', format => '/1995/1996/');  # my company does not recognize the year 1995.
$cleaned = $obj->apply({greeting => "good morning", numbers => "t2", when => "2010116"});

Each processor (gsub, date, exchange) would be a separate subroutine. Plugins could be defined to add more by name.

$obj->define("chew", \&CookieMonster::chew);
$obj->parse("column:chew: 3x");  # chew the column 3 times
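One possible way to keep each processor in its own subroutine is a dispatch table keyed by action name, with plugins added as extra entries. The sketch below is an assumption-laden illustration: the `%processor` table, the standalone `define` function, the argument formats, and especially the meaning of "chew" (here: chop the last character N times) are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: action name => factory that turns the
# argument string into a code ref operating on a single value.
my %processor = (
    # "/pattern/replacement/" (trailing slash optional), applied globally.
    # The replacement is a literal string; the pattern may not contain "/".
    gsub => sub {
        my ($spec) = @_;
        my (undef, $pattern, $replacement) = split m{/}, $spec, -1;
        $replacement = '' unless defined $replacement;
        my $re = qr/$pattern/;
        return sub { (my $v = shift) =~ s/$re/$replacement/g; $v };
    },
    # "1,2,3 one,two,three": swap a whole value found in the first list
    # for the item at the same position in the second list.
    exchange => sub {
        my ($from, $to) = split ' ', $_[0];
        my %map;
        @map{ split /,/, $from } = split /,/, $to;
        return sub { exists $map{ $_[0] } ? $map{ $_[0] } : $_[0] };
    },
);

# Plugins are just more entries in the table.
sub define {
    my ($name, $factory) = @_;
    $processor{$name} = $factory;
}

# Made-up stand-in for a chew plugin: "3x" chops the last character 3 times.
define(chew => sub {
    my ($times) = $_[0] =~ /\A(\d+)x\z/ or die "bad chew spec '$_[0]'";
    return sub { my $v = shift; chop $v for 1 .. $times; $v };
});

# Stack two processors for the "numbers" column and run them in order.
my @numbers = (
    $processor{gsub}->('/\D//'),
    $processor{exchange}->('1,2,3 one,two,three'),
);
my $value = 't2';
$value = $_->($value) for @numbers;
print "$value\n";    # prints "two"
```

Having each factory parse its own argument string keeps the line parser generic: it only needs to split out the name, the action, and the rest of the line.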

So the obvious first question is, does anybody know of a module out there that I could use? About the only thing I was able to find so far is Hash::Transform, but since I would be determining which processing to do dynamically at runtime I would always end up using the "complex" option and I'd still have to build the parser/stacker.

Is anybody aware of any similar modules or even a mildly related module that I might want to utilize/wrap?

If there's nothing generic out there for public consumption (surely mine is not the only one in the darkpan), does anybody have any advice for things to keep in mind or interface suggestions or even other possible uses besides munging the return of data from DBI, Text::CSV, etc?

If I end up writing a new module, does anybody have namespace suggestions? I think something under Data:: is probably appropriate... the word "pluggable" keeps coming to mind because my use case reminds me of PAM, but I really don't have any good ideas...

  • Data::Processor::Pluggable ?
  • Data::Munging::Configurable ?
  • I::Chew::Data ?

In reply to pluggable/dynamic data processing/munging/transforming module? by rwstauner
