PerlMonks  

Parse C-like define statements

by Cirollo (Friar)
on May 29, 2002 at 16:39 UTC (#170144)
Cirollo has asked for the wisdom of the Perl Monks concerning the following question:

I need to parse a file that contains C-like #DEFINE statements, but I don't think I can use something like C::Scan because the file I'm parsing is not actually in C (plus any module I use has to be in the standard Perl distribution). My problem arises when many of the #DEFINEs include things that were previously defined. Here is an example of what my file might look like:
------------------
#DEFINE <PATH> /path/to/something
#DEFINE <VERSION> v12<REV>
#DEFINE <REV> 3
#DEFINE <FILE> <PATH>/foo_<VERSION>.txt
------------------
Suppose I want to know what the parameter "FILE" is; parseDefines("definefile.txt", "FILE") should return "/path/to/something/foo_v123.txt". Here is the code I have so far:
sub parseDefines {
    my ($filename, $option);
    open(FILE, $filename) or die "Couldn't open file $filename";
    my %defines;
    while(<FILE>) {
        chomp;
        if (/^\#DEFINE/) {
            /^\#DEFINE\s+<(\w+)>\s+(.*$)/;
            # I think its ok to use .* because I really
            # want to match EVERYTHING to the end of line
            # (yes, I did read Ovid's "Death to Dot Star" :)
            $defines{$1} = $2;
        }
    }
    for (sort keys %defines) {
        $defines{$_} =~ s/<(\w+)>/$defines{$1}/g;
        print $defines{$_} . "\t=>\t" . $defines{$1} . "\n";
    }
    close FILE;
    return $defines{$option};
}
My concern is that this doesn't make sure every single DEFINE is fully expanded. Since I did a "sort keys", everything gets evaluated in alphabetical order, so FILE will turn out to be "/path/to/something/foo_v12<REV>.txt" (since VERSION wasn't evaluated before it got put into FILE).

I can wrap another for(0..10) around the existing for loop so that it just goes through and evaluates everything a bunch of times, but these are large files and I'm thinking there is a better, neater way to do things.

Any ideas?

Re: Parse C-like define statements
by Cirollo (Friar) on May 29, 2002 at 16:45 UTC
    And of course the first line of that sub should read
    my ($filename, $option) = @_;
    That's what I get for posting code that was tested in a slightly different form :)
(jeffa) Re: Parse C-like define statements
by jeffa (Chancellor) on May 29, 2002 at 17:23 UTC
    Tough problem. You could try a queue - the idea is to store those keys whose values are not complete, and keep substituting until they are complete:
    use strict;
    use Data::Dumper;

    my (%hash,@queue);
    while (<DATA>) {
        chomp;
        if (/^#DEFINE\s+<([^>]+)>\s+(.*)/) {
            $hash{$1} = $2;
        }
    }

    do_it($_) for keys %hash;
    do_it(shift @queue) while @queue;

    sub do_it {
        my $key = shift;
        $hash{$key} =~ s/<([^>]+)>/$hash{$1}/g;
        unshift @queue, $key if $hash{$key} =~ /</;
    }

    print Dumper \%hash;

    __DATA__
    ------------------
    #DEFINE <PATH> /path/to/something
    #DEFINE <VERSION> v12<REV>
    #DEFINE <REV> 3
    #DEFINE <FILE> <PATH>/foo_<VERSION>.txt
    BUT!!! This is going to loop infinitely if you have one single circular reference in your #DEFINE statements. For example, change <VERSION> to:
    #DEFINE <VERSION> v12<REV><FILE>
    and this will not work. Hopefully someone else will have a better answer, but if you are 110% certain that this will not be the case, then this code will prevent unnecessary iterations.
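
One way to guard against the circular case before running the queue is a depth-first walk over the references. The following is only a sketch under assumptions: `find_cycle` is a hypothetical helper name, it takes a reference to a hash shaped like `%hash` above, and it treats any `<NAME>` that has no definition as harmless.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the code above: returns the names forming
# one circular chain (e.g. FILE, VERSION, FILE), or an empty list if the
# definitions contain no cycle.
sub find_cycle {
    my ($defs) = @_;
    my %state;    # name => 1 while being explored, 2 once fully explored
    my @path;     # chain of names currently being explored

    my $visit;
    $visit = sub {
        my ($name) = @_;
        $state{$name} = 1;
        push @path, $name;
        for my $ref ($defs->{$name} =~ /<([^>]+)>/g) {
            next unless exists $defs->{$ref};       # undefined names cannot loop
            if (($state{$ref} || 0) == 1) {         # back edge: this closes a cycle
                my ($start) = grep { $path[$_] eq $ref } 0 .. $#path;
                return (@path[$start .. $#path], $ref);
            }
            next if $state{$ref};                   # already explored, no cycle there
            my @cycle = $visit->($ref);
            return @cycle if @cycle;
        }
        pop @path;
        $state{$name} = 2;
        return;
    };

    for my $name (sort keys %$defs) {
        next if $state{$name};
        my @cycle = $visit->($name);
        return @cycle if @cycle;
    }
    return;
}
```

It could be called right after the file is read, for example `if (my @cycle = find_cycle(\%hash)) { die "circular #DEFINE: @cycle\n" }`, so the queue loop only runs on input that is known to terminate.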

    p.s. this might be a job for Parse::RecDescent ...

    UPDATE: danger++ and sfink++

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Parse C-like define statements
by danger (Priest) on May 29, 2002 at 18:29 UTC

    Your method was almost there --- you just need to make the following change:

    # change this:
    $defines{$_} =~ s/<(\w+)>/$defines{$1}/g;

    # to this:
    1 while $defines{$_} =~ s/<(\w+)>/$defines{$1}/g;

    Doing just a single s///g can still leave you with unresolved defines (when the substituted text contains unresolved defines). Using the '1 while...' version continues to reapply the regex until all of the defines are resolved (or forever in the case of circular relationships). Here is an alternate version of your routine that returns a reference to the resolved %defines hash rather than a single option:

    #!/usr/bin/perl -w
    use strict;

    sub parse_defines {
        my $file = shift;
        open(FILE, $file) || die "can't open $file: $!";
        my %defines;
        while(<FILE>){
            next unless /^#DEFINE\s+<(\w+)>\s+(.*)/;
            $defines{$1} = $2;
        }
        for (values %defines) {
            1 while s/<(\w+)>/$defines{$1}/;
        }
        return \%defines;
    }

    my $opts = parse_defines('defs.txt');
    my $file = $opts->{FILE};
    print "$file\n";

    Note: it also loops through and modifies the hash values in place rather than accessing each one via a key --- IIRC, having values return the actual values instead of copies is a 5.6ism, so you may want to stick with your method of looping over the keys for portability.
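
To make the difference concrete, here is a small sketch that applies both forms to the sample data from the question, using the key-based loop mentioned in the note. The hash is typed in by hand rather than read from a file, and the variable `$single` is only there for illustration.

```perl
use strict;
use warnings;

my %defines = (
    PATH    => '/path/to/something',
    VERSION => 'v12<REV>',
    REV     => '3',
    FILE    => '<PATH>/foo_<VERSION>.txt',
);

# One s///g pass: text inserted by the substitution is not rescanned,
# so the <REV> that arrives inside <VERSION> survives.
my $single = $defines{FILE};
$single =~ s/<(\w+)>/$defines{$1}/g;

# Loop over keys (rather than aliased values) and repeat the
# substitution until nothing is left to replace.
for my $key (keys %defines) {
    1 while $defines{$key} =~ s/<(\w+)>/$defines{$1}/g;
}

print "single pass: $single\n";         # /path/to/something/foo_v12<REV>.txt
print "repeated:    $defines{FILE}\n";  # /path/to/something/foo_v123.txt
```

The result does not depend on the order in which the keys come back, but the loop still never finishes if the definitions are circular.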

Re: Parse C-like define statements
by sfink (Deacon) on May 29, 2002 at 18:31 UTC
    What about forward references? Can you use definitions before they occur? If not, it seems like you just need to do things in one pass instead of two (untested):
    . . .
    my %defines;
    while(<FILE>) {
        chomp;
        # Under /x an unescaped '#' starts a comment, so it is written as \#
        if (/^\#DEFINE \s+ <(\w+)> \s+ (.*) $/x) {
            my ($def, $text) = ($1, $2);
            1 while $text =~ s/<(\w+)>/$defines{$1}/;
            $defines{$def} = $text;
        }
    }
    . . .
    And I hope you're only calling this once per file. If you pass in the same filename multiple times with different $options, then you should be caching the definition maps in a global variable or a passed-in parameter, like:
    sub parseDefines {
        my ($filename, $option, $defines) = @_;
        return $defines->{$option} if $defines;
        $defines = {};
        ...
    }
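
The global-variable variant of that caching idea might look roughly like the sketch below. The names `%cache`, `lookup_define` and `load_defines` are made up for illustration, and the loader assumes the one-pass case, where every name is defined before it is used.

```perl
use strict;
use warnings;

my %cache;    # filename => hashref of resolved defines (hypothetical name)

# Hypothetical wrapper: parse each file at most once, then answer
# later lookups from the cached hash.
sub lookup_define {
    my ($filename, $option) = @_;
    $cache{$filename} ||= load_defines($filename);
    return $cache{$filename}{$option};
}

# One-pass loader in the style of the snippet above; it assumes there
# are no forward references.
sub load_defines {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "can't open $filename: $!";
    my %defines;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^\#DEFINE \s+ <(\w+)> \s+ (.*) $/x;
        my ($def, $text) = ($1, $2);
        # Names not seen yet expand to the empty string (a choice made here).
        1 while $text =~ s/<(\w+)>/defined $defines{$1} ? $defines{$1} : ''/e;
        $defines{$def} = $text;
    }
    close $fh;
    return \%defines;
}
```

With this shape, calling `lookup_define("definefile.txt", "FILE")` and then `lookup_define("definefile.txt", "PATH")` reads the file only once. The cache is never invalidated, so it would go stale if the file changed while the program was running.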
Re: Parse C-like define statements
by Abigail-II (Bishop) on May 30, 2002 at 13:05 UTC
    This is certainly not an easy problem. It's easy to write a piece of code that will only terminate because it has exhausted all memory, or run out of stack space - because you can have loops. Furthermore, even if there aren't loops, a naive approach might lead to a program that runs in time quadratic in the number of defines.

    I suggest approaching the problem by viewing the file as a graph. Each #define is a node, and its outgoing edges point to the nodes that its definition refers to.

    Now that you have made a graph, first you need to find out whether there are any loops - if there are, determine what you are going to do with them. Throw them out, die(), whatever. Second, do a topological sort, then you can process the defines like you are doing now:

    $defines{$_} =~ s/<(\w+)>/$defines{$1}/g;

    Luckily, there's a graph module on CPAN that could help you.
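
Since the original question rules out non-core modules, the graph approach can also be sketched by hand. The version below uses an in-degree based topological sort (Kahn's algorithm) instead of a CPAN module. `resolve_in_order` is a hypothetical name, dying on a loop is just one of the possible policies, and references to undefined names are left in the text untouched, which is a choice made here.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve every define exactly once, in dependency
# order, and die if the definitions contain a loop.
sub resolve_in_order {
    my ($defs) = @_;
    my (%pending, %used_by);

    # Build the graph: $pending{$name} counts the distinct defined names
    # that $name refers to; $used_by{$dep} lists the defines that use $dep.
    for my $name (keys %$defs) {
        my %deps = map { ($_ => 1) }
                   grep { exists $defs->{$_} } $defs->{$name} =~ /<(\w+)>/g;
        $pending{$name} = keys %deps;
        push @{ $used_by{$_} }, $name for keys %deps;
    }

    my %out;
    my @ready = grep { $pending{$_} == 0 } keys %pending;
    while (defined(my $name = shift @ready)) {
        # Everything $name refers to is already in %out, so one pass suffices.
        (my $text = $defs->{$name}) =~
            s/<(\w+)>/exists $out{$1} ? $out{$1} : "<$1>"/ge;
        $out{$name} = $text;
        for my $user (@{ $used_by{$name} || [] }) {
            push @ready, $user if --$pending{$user} == 0;
        }
    }

    # Anything never released is on a loop or depends on one.
    my @stuck = sort grep { !exists $out{$_} } keys %$defs;
    die "circular #DEFINE involving: @stuck\n" if @stuck;
    return \%out;
}
```

Each define is substituted once, so the graph bookkeeping is linear in the number of defines and references. The substitution work still depends on how long the expanded values get.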

    Abigail
