Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

getting more from __DATA__

by Ctrl-z (Friar)
on Mar 24, 2005 at 23:07 UTC ( [id://442223]=perlmeditation: print w/replies, xml ) Need Help??

monks,

It is quite often I find myself with the need to keep chunks of text along with a module or script - not enough to warrant extra files, but enough to make using quotes and heredocs quite ugly. After reinventing my favourite simple solution (again), I found myself wanting to encapsulate the functionality in a module - but being put off by the simplicity of the implementation. I realise modules dont have to be magic, but still...

Cut to the chase - I wrote the module. I have included the start of some POD covering the general idea. Its all a bit rough'n'ready, but I would appreciate any opinions at this stage (or just tales of weird uses of __DATA__ you like to employ ;-).
I have not come across a similar solution on CPAN - but I am not really sure how such a module would be categorized (Tie::*, Text::* and Data::*)?.

My meditations is this:

  • Is there already a CPAN solution to this?
  • Would the module below be something you would consider useful? If not, why not?
  • Any ideas for a half-decent name?
  • Ive never written a module like this before. What is the best way to make modules that work around import() flexible/extensible?

cheers

edit: updated POD to match code posted later


NAME

Tie::DATA - access named data segments in __DATA__ handle via the package variable %DATA

SYNOPSIS

use Tie::DATA [[sub|scalar|regex], [sub|scalar]];

Simple Usage:

use Tie::DATA; foreach(keys %DATA) { print "$_ = $DATA{$_}\n"; } __DATA__ __foo__ yadda yadda yadda... __bar__ ee-aye ee-aye oh __baz__ woof woof

Intermediate Usage:

use Tie::DATA(':xml'); # predefined format foreach(keys %DATA) { print "$_ = $DATA{$_}\n"; } __DATA__ <foo>yadda yadda yadda...</foo> <bar> ee-aye ee-aye oh </bar> <baz> woof woof </baz>

Custom Usage:

use Tie::DATA ( sub{ ... }, # parse key/values from DATA sub{ ... } # process pairs ); ... __DATA__ ...

DESCRIPTION

Tie::DATA provides a means to break a module or scripts' __DATA__ handle into named segments, accessible via the read-only package variable %DATA. Tie::DATA is not intended for configuration variables, but for medium-sized bodies of text that should be kept with the code (without being embedded in variable declarations).

%DATA's entries are created lazily; that is, when it is first used.

There are two stages to execution, both of which can be customized by arguments to use Tie::DATA

parsing

By default, Tie::DATA uses similar syntax as the __DATA__ token to seperate segments. Of course, what is a suitable seperator depends on the text being stored, so several likely defaults are provided:
:ini
[foo] baz bar etc
:xml
<foo>baz bar etc</foo>
:define
#define foo baz bar etc
:cdata
<![foo[ baz bar etc ]]>

It is important to remember that by default, segments cannot be nested - in particular, :xml cannot have attributes.

Full customization of parsing can be gained by passing either a regex or sub reference as the first argument:

use Tie::DATA qr(<<<<<<<<(\w*?)>>>>>>>>); use Tie::DATA sub{split(/\s*:SEGMENT\s+(\w+)\s*/, shift);} use Some::Mad::Parser; use Tie::DATA \&Some::Mad::Parser::parse;
The subroutine reference should return a list of key value pairs.

processing

After parsing, if a callback has been registered as the second argument, then each Key-Value pair is passed to the callback function for further processing. This function is expected to return the actual Key-Value pair that will be used in %DATA.

For example, if you wanted to control how whitespace was treated for each segment individually, you might use something like:

use Tie::DATA(':ini', 'proc_kv'); foreach(keys %DATA) { print "$_ = $DATA{$_}\n"; } # our processing function, checks for # and removes processing hints in our keys # (see __DATA__) sub proc_kv { my ($k, $v) = @_; if($k =~ /:/) { my ($tag, $hint) = split(/:/, $k); $k = $tag; if($hint eq 'nowhitespace') { $v = ... } else { $v = ... } } return($k,$v); } __DATA__ [foo:nowhitespace] yadda yadda yadda... [bar] ee-aye ee-aye oh [baz] woof woof
There is no reason why the processing subroutine need be in the current module:
use My::Big::Routine; use Tie::DATA(':ini', 'My::Big::Routine::go'); foreach(keys %DATA) { print "$_ = $DATA{$_}\n"; }

CAVEATS

%DATA is read-only. Any attempt to modify it after the processing stage will cause the program to croak.




time was, I could move my arms like a bird and...

Replies are listed 'Best First'.
Re: getting more from __DATA__
by merlyn (Sage) on Mar 24, 2005 at 23:27 UTC

      yup, thats pretty like it.

      With no disrespect to unwitting pawns or evil masterminds, I was going for more user-friendly than "It is possible that this module may overwrite the source code in files that use it". ;-)

      Providing full-on writable handles is neat, but a little heavy - at least for my needs.




      time was, I could move my arms like a bird and...
Re: getting more from __DATA__
by jdporter (Paladin) on Mar 25, 2005 at 14:45 UTC
    Well, trying to shove it all in __DATA__ seems like unnecessary work to me. Have you considered a POD-based option? Modules such as Pod::Parser, Pod::Simple, or Pod::POM could be useful.

      I had definitely not considered a POD based option! This is interesting, but I would have thought even more work?
      Ill look into it.

      cheers




      time was, I could move my arms like a bird and...
Re: getting more from __DATA__
by Ctrl-z (Friar) on Mar 25, 2005 at 17:54 UTC

    ok, heres the code as it stands. Some outstanding issues:

    • Would supporting segments as handles be worthwhile?
    • I removed the :subs option and replaced with the probably more useful :cdata. Any other ideas for likely defaults?
    • Currently the code croak()s if any attempt at modifying %DATA is made. Too strict?
    • The regexes are fairly simple-minded, any advice is welcome

    cheers,

    package Tie::DATA; use strict; use warnings; use Carp; my %modules = (); my %regexen = (); my %callbacks = (); my %handles = (); my $reserved = undef; ## # Default handlers for parsing DATA segments # NB: these can be code refs that return a list of key/value pairs, # default is to use as regex in split() call. # my %defaults = ( ':ini' => '(?:\r?\n)*[\[\]](?:\r?\n)*', ':underscore' => '\s*__(\S+)__\s*', ':define' => '\s*#define\s+(\w+)\s+', ':cdata' => sub{$_ = shift or return; return(m#\s*<!\[(\w+?)\[(.*?)\]\]>\s*#sgoi); }, ':xml' => sub{$_ = shift or return; return(m#\s*<(\w+)>([^<]*?)</\1>\s*#sgoi); }, ); $defaults{':default'} = $defaults{':underscore'}; $reserved = join('|', keys %defaults); ## # parse arguments and tie callers %DATA # sub import { my $package = shift; my $regex = shift; my $cback = shift; my $caller = caller; $regex = $defaults{$regex} if($regex && $regex =~ /^$reserved$ +/); $regex = $defaults{':default'} unless $regex; if(!exists $modules{$caller}) { no strict 'refs'; # fix stringy code ref...allow relative or absolute naming if($cback && !ref($cback)) { $cback = ($cback =~ /\:\:/go) ? \&{$cback} : \&{$caller."\::".$cback}; } *{"$caller\::DATA"} = {}; tie %{"$caller\::DATA"}, $package, $caller; $handles{$caller} = \*{$caller."::DATA"}; $modules{$caller} = undef; $regexen{$caller} = (ref $regex) ? $regex : qr($regex); $callbacks{$caller} = $cback; } } ## # read DATA handle # cant do during import as perl hasnt parsed that far by then # sub _read_data { my $self = shift; if(! defined $modules{$$self}) { my (@data, $data, $tell, $rex, $code); $rex = delete $regexen{$$self}; $code = delete $callbacks{$$self}; $data = delete $handles{$$self}; { # slurp and split... no warnings; local $/ = undef; $tell = tell($data); Carp::croak("Error: $$self has no __DATA__ section") if ($tell < 0); @data = (ref($rex) eq "CODE") ? $rex->(<$data>) : split(/$rex/, <$data>); $modules{$$self} = {} and return unless @data; } # remove empty elements...depends on syntax used shift @data if $data[0] =~ /^\s*$/o; pop @data if $data[-1] =~ /^\s*$/o; Carp::croak("Error: \%$$self\::DATA - bad key/value pairs\n") if (@data% 2); # trim keys and invoke any callbacks... for(my $i=0; $i<@data; $i+=2) { $data[$i] =~ s#^\s*(.*?)\s*$#$1#sgoi; next unless $data[$i]; if($code) { ($data[$i], $data[$i+1]) = $code->($data[$i], $data[$i ++1]); } } $modules{$$self} = {@data}; # coerce into hashref seek($data, $tell,0); # cover our tracks } } ## # TIE HASH interface (read-only) # not much to see here... # sub TIEHASH { my $class = shift; my $caller = shift; return bless \$caller, $class; } sub FETCH { my $self = shift; my $key = shift; $self->_read_data if(! defined $modules{$$self}); return $modules{$$self}{$key}; } sub EXISTS { my $self = shift; my $key = shift; $self->_read_data if(! defined $modules{$$self}); return exists $modules{$$self}{$key}; } sub FIRSTKEY { my $self = shift; $self->_read_data if(! defined $modules{$$self}); my $a = keys %{$modules{$$self}}; return each %{$modules{$$self}}; } sub NEXTKEY { my $self = shift; $self->_read_data if(! defined $modules{$$self}); return each %{ $modules{$$self} } } sub DESTROY { my $self = shift; $modules{$$self} = undef; } sub STORE { my $self = shift; my $k = shift; my $v = shift; #$self->_read_data if(! defined $modules{$$self}); Carp::croak("Attempt to store key ($k) in read-only hash %".$$self +."::DATA"); } sub DELETE { my $self = shift; my $k = shift; #$self->_read_data if(! defined $modules{$$self}); Carp::croak("Attempt to delete key ($k) from read-only hash %".$$s +elf."::DATA"); } sub CLEAR { my $self = shift; #$self->_read_data if(! defined $modules{$$self}); Carp::croak("Attempt to clear read-only hash %".$$self."::DATA"); } 1;



    time was, I could move my arms like a bird and...
Re: getting more from __DATA__
by holli (Abbot) on Mar 25, 2005 at 06:26 UTC
    ++, but I'd like to see the source.


    holli, /regexed monk/

      Of course. I wanted to keep the size of the OP down to the essentials for the meditation - but will post later today.

      cheers




      time was, I could move my arms like a bird and...
Re: getting more from __DATA__
by hsmyers (Canon) on Mar 26, 2005 at 18:28 UTC
    I'd suggest a tweak to allow comments in the __DATA__ section.

    --hsm

    "Never try to teach a pig to sing...it wastes your time and it annoys the pig."

      like it. nice one ;-)

      Its kind of tricky as there is no guarantee what the content of each segment is... I suppose there could be a reserved hashkey that automatically gets ignored:

      __rem__ this is the foo part... __foo__ etc etc
      is that what you were thinking?




      time was, I could move my arms like a bird and...
        No, I was just thinking of something along the lines of a 'if first character of the line is semi-colon, then ignore' style of comment. Easy to parse and allows bread crumbs in the forest...

        --hsm

        "Never try to teach a pig to sing...it wastes your time and it annoys the pig."

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://442223]
Approved by kvale
Front-paged by hsmyers
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2025-06-16 22:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.