Unwrapping values in a template

by hacker (Priest)
on Nov 14, 2003 at 13:44 UTC

hacker has asked for the wisdom of the Perl Monks concerning the following question:

I have a hand-rolled template of key=value pairs that I'm parsing out of the body of an email, and checking each value for validity. One of these keys takes a URL as a value. Many mail readers will wrap long URLs to the next line, so my original template, which looks like this...

<template>
   foo      = Yes
   bar      = 0
   blort    =
   home_url = http://www.foo.com/some/really/long/url
</template>

...will end up looking like this, after I receive it in an email:

<template>
   foo      = Yes
   bar      = 0
   blort    =
   home_url =
http://www.foo.com/some/really/long/url
</template>

What I'm trying to do is "unwrap" that wrapped field before I process the template with Config::General to parse out the keys. Once the keys are properly unwrapped, Config::General has no problem with it. I also looked into Config::General's '-SplitDelimiter' option, but it doesn't consider a newline as whitespace, and my attempts at defining a proper delimiter there failed.

I've been working with a few monks on the CB (thanks diotalevi) to try to work this out, and we've come up with two possibilities, each with its own flaw. The problem is that some keys can be left blank, with no value, while others can have values.

The code below is one example of the unwrap code. It treats blank keys properly, but doesn't unwrap the 'home_url' field back up to the previous line:

use Data::Dumper;

# This is actually in @body in the larger code
open(TMPL, "<my.template") or die $!;

# Slurp the file, capture the <template> section, split it on
# unescaped newlines, and pull a key/value pair out of each line.
my %template =
    map /(\S+)\s+=\s+(.*)/,
    split m((?<!\\)\n),
    ( do { local $/; readline *TMPL; }
        =~ m(<template>((?s:.*?)</template>)) )[0];

$Data::Dumper::Sortkeys = 1;
print Dumper(\%template);

This next bit of code actually unwraps the 'home_url' field properly, but also unwraps keys with blank values into the value field of the previous key:

use Data::Dumper;

open(TMPL, "<my.template") or die $!;
my $tmpl;
for (<TMPL>) {
    next if /^#/;
    if (m!<template>!) {
        $tmpl = "";                 # start collecting a new template
    }
    elsif (m!</template>!) {
        my %hash = ($tmpl =~ /^\s+(\w+)\s+=\s+(.*)$/mg);
        if (%hash) {
            $Data::Dumper::Sortkeys = 1;
            print Dumper(\%hash);
        }
        else {
            print "Bad template: $tmpl\n";
        }
    }
    elsif (defined $tmpl) {
        s/^\s+/ /;
        s/[ \t]$//;
        s/=\s*\z/= /;               # a line ending in "=" swallows its newline
        $tmpl .= $_;
    }
}
close TMPL;

This results in the output that looks like this:

$VAR1 = {
          'bar' => '0',
          'blort' => 'home_url = http://www.foo.com/some/really/long/url',
          'foo' => 'Yes'
        };

Note how the 'home_url' key wraps up into the value side of the previous key.

What I'm trying to do is keep all keys intact, including the 'home_url' one (which is the only key long enough to wrap; the others don't wrap).

Any helpful pointers here?

Update: I think jeffa is the winner on this one. The working code is now as follows:

use Config::General;
use Data::Dumper;

my $conf = Config::General->new(
    -ConfigFile      => "pler.template",
    -ExtendedAccess  => 1,
    -InterPolateVars => 1,
    -AutoTrue        => 1,
    -StrictObjects   => 0,
);

my %config   = $conf->getall;
my $template = $conf->obj('template');

# The wrapped URL is parsed as a key of its own; move it back
# into home_url and drop the stray key.
for (keys %{$config{template}}) {
    if (/^http/) {
        $config{template}{home_url} = $_;
        delete $config{template}{$_};
        last;
    }
}

# Copy only after the fix-up, so that %params sees home_url
my %params = %{$config{'template'}};

$Data::Dumper::Sortkeys = 1;
print Dumper(\%config);

Replies are listed 'Best First'.
Re: Unwrapping values in a template
by jeffa (Bishop) on Nov 14, 2003 at 14:16 UTC
    How about munging the resulting data structure instead of trying to massage the incoming data? (Note: this works for the data you gave, but not necessarily for all data you will have to deal with. Mileage will vary.)

    use strict;
    use warnings;
    use Data::Dumper;
    use Config::General qw(ParseConfig);

    my %config = ParseConfig(\*DATA);

    # The wrapped URL shows up as a key of its own, so move it
    # into home_url and delete the stray key.
    for (keys %{$config{template}}) {
        if (/^http/) {
            $config{template}{home_url} = $_;
            delete $config{template}{$_};
            last;
        }
    }

    print Dumper \%config;

    __DATA__
    <template>
       foo      = Yes
       bar      = 0
       blort    =
       home_url =
    http://www.foo.com/some/really/long/url
    </template>

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      Ooooo, so close!

      I was all ready to hand you the gold star for this solution, until I realized, it drops the key entirely. Try this somewhere in the code you've suggested, and then try it again after putting the url back up into place in the template, manually:

      my %params = %{$config{'template'}};
      print $params{'home_url'};

      However, this works, but isn't as elegant:

      print $config{'template'}->{'home_url'};

      Update: After screwing my head on straight, I realized that my initializer for %params was before the modification to the hash, and should have been after. This works properly, as it should:

      my %config = ParseConfig(\*DATA);

      for (keys %{$config{template}}) {
          if (/^http/) {
              $config{template}{home_url} = $_;
              delete $config{template}{$_};
              last;
          }
      }

      # Initialize %params only after the hash has been fixed up
      my %params = %{$config{'template'}};

      $Data::Dumper::Sortkeys = 1;
      print Dumper(\%config);
      print $params{'home_url'};

      Please ignore that man behind the curtain, it's just me making silly mistakes again.

        Just so you realize that you've hardcoded a solution for a single data set, whereas some of the other posted solutions are more generalized.

Re: Unwrapping values in a template
by bart (Canon) on Nov 14, 2003 at 15:49 UTC
    It's a matter of reliably recognizing keys from values. There are two things that complicate matters:
    • Values are optional;
    • Values can contain "=" signs, in particular in URLs.

    Now what distinguishes a key from a value is that it starts at the start of a line, after some whitespace; it contains only word characters, and it is followed by (optional?) whitespace and a "=" sign. If the whitespace isn't optional, it's easier to prevent false positives for recognized keys, because a URL can't contain whitespace. Even so, a URL can't contain a "=" character without also containing some non-word character to the left of it, in particular a colon, slash or question mark.

    My preference would go to a solution that involves split. If split gets a regexp that contains capturing parens, the captured values will be inserted between the split data. So I split the values (possibly empty), and capture the keys. You'll get a leading dummy "value", without an associated key, so we'll have to get rid of that first.

    This is code that works with the two versions of the input, but the regexp might be tweaked to minimize the chance of both false positives and false negatives.

    foreach (<<'-1-', <<'-2-') {
    <template>
       foo      = Yes
       bar      = 0
       blort    =
       home_url = http://www.foo.com/some/really/long/url
    </template>
    -1-
    <template>
       foo      = Yes
       bar      = 0
       blort    =
       home_url =
    http://www.foo.com/some/really/long/url
    </template>
    -2-
        if (m[^<template>(.*?)\s*^</template>]sm) {
            (undef, my %data) = split /\s*^\s*(\w+)\s+=[^\S\n]*\n?/m, $1, -1;
            use Data::Dumper;
            print Dumper \%data;
        } else {
            print "No match\n";
        }
    }

    Result:

    $VAR1 = {
              'blort' => '',
              'foo' => 'Yes',
              'bar' => '0',
              'home_url' => 'http://www.foo.com/some/really/long/url'
            };
    $VAR1 = {
              'blort' => '',
              'foo' => 'Yes',
              'bar' => '0',
              'home_url' => 'http://www.foo.com/some/really/long/url'
            };

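
The split behaviour this reply relies on can be seen in isolation. The following is only a small sketch on a made-up input string ("a=1 b=2"), not the template from the question, and the pattern is simplified for illustration:

```perl
use strict;
use warnings;

# The capture group around the key makes split return each key
# alongside the text that follows it.
my @fields = split /\s*(\w+)=/, "a=1 b=2";
# @fields is now ("", "a", "1", "b", "2"); the leading "" is the
# dummy value that has no key in front of it.

# Throw the dummy away and pour the rest into a hash.
(undef, my %data) = @fields;

print "$_ => $data{$_}\n" for sort keys %data;
```

The -1 limit in the code above matters when the last key has an empty value: without a negative limit, split drops trailing empty fields, and the final key would be left without a value.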
Re: Unwrapping values in a template
by valdez (Monsignor) on Nov 14, 2003 at 14:06 UTC

    I think that you should add an explicit line terminator or an explicit null value; without these markers, you can't parse your files correctly. If you really want to add some heuristic to your parser, try to apply your unwrap code only to lines that will probably contain long content (i.e. *_url lines).
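
A minimal sketch of that heuristic, under two assumptions that are not in the original post: the wrapped value always lands on the very next line, and it looks like a URL with a scheme (something://...). The sub name unwrap_url_lines is made up for illustration and does not come from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: rejoin a value that was wrapped onto the next
# line, but only for keys ending in "_url". Other empty keys, such as
# "blort =", are left alone.
sub unwrap_url_lines {
    my @lines = split /\n/, shift;
    my @out;
    while (defined(my $line = shift @lines)) {
        # A "*_url =" line with no value, followed by a line that
        # holds nothing but a URL (assumption: scheme://...).
        if (   $line =~ /^\s*\w*_url\s*=\s*$/
            && @lines
            && $lines[0] =~ m{^\s*\w+://\S+\s*$})
        {
            (my $value = shift @lines) =~ s/^\s+|\s+$//g;
            $line =~ s/\s*$//;
            $line .= " $value";
        }
        push @out, $line;
    }
    return join("\n", @out) . "\n";
}

my $wrapped = <<'END';
<template>
   foo      = Yes
   bar      = 0
   blort    =
   home_url =
http://www.foo.com/some/really/long/url
</template>
END

print unwrap_url_lines($wrapped);
```

Because the next line must look like a URL, an empty *_url key followed by another key or by </template> is not touched.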

    HTH, Valerio

Re: Unwrapping values in a template
by Chmrr (Vicar) on Nov 14, 2003 at 14:35 UTC

    As long as all of your lines that are definitions start with whitespace, the following will work; this is because when the template wraps, it won't prefix the (wrapped) line with anything. This allows us to know what's a wrap, and what's not. No special handling of '=' signs needed.

    use Config::General;
    use Data::Dumper;

    undef $/;
    my $file = <DATA>;
    $file =~ s/\n(\S)/ $1/g;    # a line that starts without whitespace is a wrapped value
    my $config = Config::General->new(-String => $file);
    print Dumper {$config->getall};

    __DATA__
      foo = Yes
      bar = 0
      blort =
      home_url =
    http://www.foo.com/some/really/long/url?foo=bar

    Produces:

    $VAR1 = {
              'bar' => '0',
              'boo' => '',
              'blort' => '',
              'foo' => 'Yes',
              'home_url' => 'http://www.foo.com/some/really/long/url?foo=bar',
            };

    perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^`+*^e v^#$&V"+@( NO CARRIER'

Re: Unwrapping values in a template
by synistar (Pilgrim) on Nov 14, 2003 at 16:03 UTC

    Here is a brute force approach (multiple regexes):

    #! perl -w
    use Data::Dumper;

    open(TMPL, "<my.template") or die $!;
    my $tmpl = 0;
    my $last;
    my %hash;
    for (<TMPL>) {
        next if /^#/;
        if (m!<template>!) {
            $tmpl = 1;
            next;
        }
        elsif (m!</template>!) {
            if (%hash) {
                $Data::Dumper::Sortkeys = 1;
                print "Result: ", Dumper(\%hash);
            }
            else {
                print "Bad template: $tmpl\n";
            }
        }
        elsif ($tmpl) {
            if (/^\s+(\w+)\s*=\s*$/) {              # key with no value (yet)
                $hash{$1} = '';
                $last = $1;
            }
            elsif (/^\s+(\w+)\s*=\s*(\S+)\s*$/) {   # key with a value
                if ($2 or $2 == 0) {
                    $hash{$1} = $2;
                }
            }
            elsif (/^\s*(\S+[^=])\s*$/) {           # wrapped value for the last empty key
                $hash{$last} = $1;
            }
        }
    }
    close TMPL;

    Not too elegant but it is fairly clear and works. Needs more code to handle all edge cases though.

Re: Unwrapping values in a template
by EvdB (Deacon) on Nov 14, 2003 at 16:34 UTC
    Whenever it comes to storing values in files or sending them around, I now tend to use YAML, which is breathtakingly easy to use and very effective. It does the line wrapping automatically, so you should not get the problem in the first place.

    From the YAML perldoc page:

    NAME
        YAML - YAML Ain't Markup Language (tm)

    SYNOPSIS
        use YAML;

        # Load a YAML stream of 3 YAML documents into Perl data structures.
        my ($hashref, $arrayref, $string) = Load(<<'...');
        ---
        name: ingy
        age: old
        weight: heavy
        # I should comment that I also like pink, but don't tell anybody.
        favorite colors:
            - red
            - white
            - blue
        ---
        - Clark Evans
        - Oren Ben-Kiki
        - Brian Ingerson
        --- >
        You probably think YAML stands for "Yet Another Markup Language". It
        ain't! YAML is really a data serialization language. But if you want
        to think of it as a markup, that's OK with me. A lot of people try
        to use XML as a serialization format.

        "YAML" is catchy and fun to say. Try it. "YAML, YAML, YAML!!!"
        ...

        # Dump the Perl data structures back into YAML.
        print Dump($string, $arrayref, $hashref);

        # YAML::Dump is used the same way you'd use Data::Dumper::Dumper
        use Data::Dumper;
        print Dumper($string, $arrayref, $hashref);

    DESCRIPTION
        The YAML.pm module implements a YAML Loader and Dumper based on the
        YAML 1.0 specification. <http://www.yaml.org/spec/>

    ...continues...

    Link for the entire spec (which you almost never use): http://www.yaml.org/spec/

    --tidiness is the memory loss of environmental mnemonics

Re: Unwrapping values in a template
by eric256 (Parson) on Nov 14, 2003 at 14:10 UTC

    You could go line by line and look ahead: if the next line has an "=" in it, then this line is complete as is; if the next line doesn't have an "=", then this line becomes this line, minus the newline, plus the next line. You'd need to watch out for the case where the next line doesn't exist, and for a wrapped URL that itself contains an equals sign.
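
A rough sketch of that look-ahead, with one deliberate change: instead of testing the next line for a bare "=", it tests whether the next line looks like a "key =" line, so that a wrapped URL with a query string is still folded back. The sub names unwrap_by_lookahead and is_continuation are invented for this example, and tag lines such as </template> are assumed never to be continuations.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a line is the wrapped tail of
# the line before it.
sub is_continuation {
    my ($line) = @_;
    return 0 if $line =~ /^\s*$/;          # blank line
    return 0 if $line =~ /^\s*</;          # <template> or </template>
    return 0 if $line =~ /^\s*\w+\s*=/;    # looks like "key = ..."
    return 1;
}

# Hypothetical helper: walk the lines, and while the next line is a
# continuation, fold it into the current one.
sub unwrap_by_lookahead {
    my @lines = split /\n/, shift;
    my @out;
    while (@lines) {
        my $line = shift @lines;
        # @lines may be empty here, which covers "next line doesn't exist"
        while (@lines && is_continuation($lines[0])) {
            (my $rest = shift @lines) =~ s/^\s+//;
            $line =~ s/\s+$//;
            $line .= " $rest";
        }
        push @out, $line;
    }
    return join("\n", @out) . "\n";
}

print unwrap_by_lookahead(<<'END');
<template>
   blort    =
   home_url =
http://www.foo.com/some/really/long/url?foo=bar
</template>
END
```

This still misreads a wrapped fragment that happens to start with word characters followed by "=", so it is a heuristic rather than a complete parser.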


    ___________
    Eric Hodges
Re: Unwrapping values in a template
by Anonymous Monk on Nov 14, 2003 at 14:25 UTC
    $/ = "";
    while (<DATA>) {
        s#=[^\n\S]*\n(?!.*=)(?!</template>)#= #g;
        print;
    }
    __END__
    <template>
       foo = Yes
       bar = 0
       blort =
       home_url =
    http://www.foo.com/some/really/long/url
       zoo = xxx
       _url =
    http://www.foo.com/some/really/long/url
       boo =
    </template>
