PerlMonks  


monks,

I often find myself needing to keep chunks of text along with a module or script - not enough to warrant extra files, but enough to make quotes and heredocs quite ugly. After reinventing my favourite simple solution (again), I found myself wanting to encapsulate the functionality in a module - but was put off by the simplicity of the implementation. I realise modules don't have to be magic, but still...

Cut to the chase - I wrote the module. I have included the start of some POD covering the general idea. It's all a bit rough'n'ready, but I would appreciate any opinions at this stage (or just tales of weird uses of __DATA__ you like to employ ;-).
I have not come across a similar solution on CPAN - but I am not really sure how such a module would be categorized (Tie::*, Text::* or Data::*?).

My meditations are these:

  • Is there already a CPAN solution to this?
  • Would the module below be something you would consider useful? If not, why not?
  • Any ideas for a half-decent name?
  • I've never written a module like this before. What is the best way to make modules that work around import() flexible/extensible?

cheers

edit: updated POD to match code posted later


NAME

Tie::DATA - access named data segments in __DATA__ handle via the package variable %DATA

SYNOPSIS

use Tie::DATA [[sub|scalar|regex], [sub|scalar]];

Simple Usage:

    use Tie::DATA;

    foreach (keys %DATA) {
        print "$_ = $DATA{$_}\n";
    }

    __DATA__
    __foo__
    yadda yadda yadda...
    __bar__
    ee-aye ee-aye oh
    __baz__
    woof woof

Intermediate Usage:

    use Tie::DATA(':xml');    # predefined format

    foreach (keys %DATA) {
        print "$_ = $DATA{$_}\n";
    }

    __DATA__
    <foo>yadda yadda yadda...</foo>
    <bar>
    ee-aye ee-aye oh
    </bar>
    <baz>
    woof woof
    </baz>

Custom Usage:

    use Tie::DATA (
        sub { ... },    # parse key/values from DATA
        sub { ... }     # process pairs
    );
    ...
    __DATA__
    ...

DESCRIPTION

Tie::DATA provides a means to break a module's or script's __DATA__ handle into named segments, accessible via the read-only package variable %DATA. Tie::DATA is not intended for configuration variables, but for medium-sized bodies of text that should be kept with the code (without being embedded in variable declarations).

%DATA's entries are created lazily; that is, the __DATA__ handle is not parsed until %DATA is first used.
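A minimal sketch of that lazy behaviour, assuming a tied hash whose FETCH parses the text on first access. The package name Lazy::Segments, its fields and its regex are hypothetical and only illustrate the idea; they are not the module's actual code.

```perl
use strict;
use warnings;

package Lazy::Segments;    # hypothetical name, for illustration only

sub TIEHASH {
    my ($class, $text) = @_;
    # Keep the raw text; nothing is parsed yet.
    return bless { text => $text, parsed => undef, parse_count => 0 }, $class;
}

# Parse once, on demand, using the default __name__ separator style.
sub _segments {
    my ($self) = @_;
    unless ($self->{parsed}) {
        my %seg;
        while ($self->{text} =~ /^__(\w+)__\n(.*?)(?=^__\w+__\n|\z)/msg) {
            $seg{$1} = $2;
        }
        $self->{parsed} = \%seg;
        $self->{parse_count}++;
    }
    return $self->{parsed};
}

sub FETCH    { $_[0]->_segments->{ $_[1] } }
sub EXISTS   { exists $_[0]->_segments->{ $_[1] } }
sub FIRSTKEY { my $h = $_[0]->_segments; my $reset = scalar keys %$h; each %$h }
sub NEXTKEY  { each %{ $_[0]->_segments } }

package main;

tie my %segments, 'Lazy::Segments', "__foo__\nyadda\n__bar__\nee-aye\n";

# No parsing has happened yet; the first lookup triggers it.
print $segments{foo};
```

Later lookups reuse the cached hash, so the text is only walked once however often the hash is read.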

There are two stages to execution, both of which can be customized by arguments to use Tie::DATA.

parsing

By default, Tie::DATA uses a syntax similar to the __DATA__ token to separate segments. Of course, what makes a suitable separator depends on the text being stored, so several likely predefined formats are provided:
:ini
    [foo] baz bar etc
:xml
    <foo>baz bar etc</foo>
:define
    #define foo baz bar etc
:cdata
    <![foo[ baz bar etc ]]>

It is important to remember that by default, segments cannot be nested, and that :xml tags cannot carry attributes.
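To make the predefined formats concrete, here is a rough sketch of one regex per format. The patterns, the %format table and parse_segments() are my own guesses at how such formats could be matched, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical patterns, one per predefined format. Each captures the
# segment name in $1 and the segment body in $2.
my %format = (
    ':ini'    => qr/^\[([^\]\n]+)\]\s*(.*?)(?=^\[[^\]\n]+\]|\z)/ms,
    ':xml'    => qr/<(\w+)>(.*?)<\/\1>/s,
    ':define' => qr/^#define\s+(\w+)\s*(.*?)(?=^#define\s|\z)/ms,
    ':cdata'  => qr/<!\[(\w+)\[(.*?)\]\]>/s,
);

# Return a flat list of key/value pairs found in $text.
sub parse_segments {
    my ($name, $text) = @_;
    my $re = $format{$name} or die "unknown format $name";
    my @pairs;
    while ($text =~ /$re/g) {
        push @pairs, $1, $2;
    }
    return @pairs;
}

my %ini = parse_segments(':ini', "[foo]\nbaz bar\n[bar]\netc\n");
my %xml = parse_segments(':xml', "<foo>baz bar etc</foo>\n<bar>woof</bar>\n");
```

With patterns like these the limits mentioned above fall out naturally: the :xml pattern only accepts a bare tag name, so a tag with attributes is not recognised, and a segment containing another segment of the same name ends at the first closing tag.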

Full customization of parsing can be gained by passing either a regex or sub reference as the first argument:

    use Tie::DATA qr(<<<<<<<<(\w*?)>>>>>>>>);

    use Tie::DATA sub{split(/\s*:SEGMENT\s+(\w+)\s*/, shift);};

    use Some::Mad::Parser;
    use Tie::DATA \&Some::Mad::Parser::parse;

The subroutine reference should return a list of key/value pairs.
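One thing to watch with a split-based parser: when the text starts with a separator, split returns an empty leading field, so the list has an odd number of elements unless that field is dropped. A small sketch, where parse_by_marker() is a hypothetical name and whether Tie::DATA discards the leading field itself is not something this post states:

```perl
use strict;
use warnings;

# Hypothetical custom parser: segments are introduced by ":SEGMENT name".
sub parse_by_marker {
    my ($text) = @_;
    # The capturing group makes split return the names along with the bodies.
    my @fields = split /\s*:SEGMENT\s+(\w+)\s*/, $text;
    # Whatever precedes the first marker (usually an empty string) is not a segment.
    shift @fields;
    return @fields;
}

my %pairs = parse_by_marker(":SEGMENT foo\nyadda yadda\n:SEGMENT bar\nwoof woof\n");
```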

processing

After parsing, if a callback has been registered as the second argument, each key/value pair is passed to it for further processing. The callback is expected to return the key/value pair that will actually be stored in %DATA.

For example, if you wanted to control how whitespace was treated for each segment individually, you might use something like:

    use Tie::DATA(':ini', 'proc_kv');

    foreach (keys %DATA) {
        print "$_ = $DATA{$_}\n";
    }

    # our processing function, checks for
    # and removes processing hints in our keys
    # (see __DATA__)
    sub proc_kv {
        my ($k, $v) = @_;
        if ($k =~ /:/) {
            my ($tag, $hint) = split(/:/, $k);
            $k = $tag;
            if ($hint eq 'nowhitespace') {
                $v = ...
            }
            else {
                $v = ...
            }
        }
        return ($k, $v);
    }

    __DATA__
    [foo:nowhitespace]
    yadda yadda yadda...
    [bar]
    ee-aye ee-aye oh
    [baz]
    woof woof

There is no reason why the processing subroutine need be in the current module:

    use My::Big::Routine;
    use Tie::DATA(':ini', 'My::Big::Routine::go');

    foreach (keys %DATA) {
        print "$_ = $DATA{$_}\n";
    }
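Since the callback can be given as a plain sub name, something has to turn that string into a code reference and run each pair through it. A hedged sketch of one way to do that; resolve_callback() and process_pairs() are hypothetical helpers, and the proc_kv() here is a simplified stand-in for the one above:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a sub name or code ref into a code ref.
# Unqualified names are looked up in $pkg (the package that said "use").
sub resolve_callback {
    my ($cb, $pkg) = @_;
    return $cb if ref $cb eq 'CODE';
    $cb = "${pkg}::$cb" unless $cb =~ /::/;
    no strict 'refs';
    defined &{$cb} or die "no such sub: $cb";
    return \&{$cb};
}

# Pass each key/value pair through the callback and collect the results.
sub process_pairs {
    my ($cb, @pairs) = @_;
    my @out;
    while (my ($k, $v) = splice @pairs, 0, 2) {
        push @out, $cb->($k, $v);
    }
    return @out;
}

# Example callback: strip a ":hint" suffix from the key and trim the value.
sub proc_kv {
    my ($k, $v) = @_;
    $k =~ s/:.*//;
    $v =~ s/^\s+|\s+$//g;
    return ($k, $v);
}

my $cb   = resolve_callback('proc_kv', 'main');
my %data = process_pairs($cb, 'foo:nowhitespace', "  yadda  \n", 'bar', "woof\n");
```

Note that a use line runs at compile time, before subs further down the file have been compiled, so a name like 'proc_kv' could only be resolved later, for instance when %DATA is first read.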

CAVEATS

%DATA is read-only. Any attempt to modify it after the processing stage will cause the program to croak.
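For illustration, a read-only tied hash can be sketched by making every mutating method croak. ReadOnly::Hash is a hypothetical name and the error text is made up; the real module may word things differently.

```perl
use strict;
use warnings;

package ReadOnly::Hash;    # hypothetical name, for illustration only
use Carp qw(croak);

sub TIEHASH  { my ($class, %init) = @_; return bless {%init}, $class }
sub FETCH    { $_[0]{ $_[1] } }
sub EXISTS   { exists $_[0]{ $_[1] } }
sub FIRSTKEY { my $reset = scalar keys %{ $_[0] }; each %{ $_[0] } }
sub NEXTKEY  { each %{ $_[0] } }

# Every mutating operation croaks, reporting the caller's line.
sub STORE  { croak "hash is read-only" }
sub DELETE { croak "hash is read-only" }
sub CLEAR  { croak "hash is read-only" }

package main;

tie my %data, 'ReadOnly::Hash', foo => 'yadda', bar => 'woof';
print "$data{foo}\n";               # reading works
eval { $data{foo} = 'changed' };    # writing croaks
print "caught: $@" if $@;
```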




time was, I could move my arms like a bird and...

In reply to getting more from __DATA__ by Ctrl-z
