some regex help

by emilford (Friar)
on Apr 05, 2004 at 21:36 UTC (#342759=perlquestion)

emilford has asked for the wisdom of the Perl Monks concerning the following question:

I have a configuration file that I need to parse through. The lines will follow one of, say, three formats:
Managed Node XYZ123 = MN
Type = Combo
Rim = Planner
What I need to be able to do is grab any line that starts with "Managed Node" and store the values on either side of the "=" sign in a hash. Simple enough, but I'd like to double check the regex I came up with to see if there is anything better out there for catching differences in user formatting (e.g. "a=b" vs "a = b"). I'd like to make this as forgiving as possible. On the left side of the "=" sign, I'd like to allow numbers, letters, underscores, and dashes. On the right side, I'd like to limit the value to a set of possibilities, stored in a "|"-delimited global variable.
my $options = 'A|B|C|D';
if ($line =~ /^Managed Node ([a-zA-Z0-9_-])\s*=\s*($options)/i) {
    my ($node, $value) = ($1, $2);
}
The second thing I need to be able to do is this: if a line matches ABC = XYZ, I need to grab both sides of the "=" sign and, in this case, create a variable called $ABC and set its value to "XYZ". I'm not sure how to go about dynamically creating variables like this, but here is my regex:
if ($line =~ /^(\w*)\s*=\s*(\w*)/);
or would this be better
if ($line =~ /%(.*)\s*=\s*(.*)/);
Thanks for the help.

Replies are listed 'Best First'.
Re: some regex help
by matija (Priest) on Apr 05, 2004 at 21:52 UTC
    First of all, creating variables like that is quite dangerous. You could quite easily find yourself setting a value that overwrites something in your program. Having a hash that has the variable name as its key and the variable's value as its value is much safer.

    Second, the /%(.*)\s*=\s*(.*)/ regexp is not exactly the same as the other one: the .* is greedy, so it will consume all the blanks before the equals sign. Your $1 will have trailing blanks if there are any trailing blanks to be had. (And what is the % doing there?)

    Third, you do know that the regex as you posted it here will not match the "Managed Node XYZ123 = MN" line, don't you? For one thing, $options doesn't contain MN. For another, the character class on the left accepts only one character; you need a + or a count like {n,m} after it to match the whole name.

      Yes, that makes sense. I'll use a hash to store the dynamic variables. Duh, I should have thought of that. :)

      The % sign in my second regex was meant to be a ^ sign. Typo.

      I think I need to rethink the $options part. Say the $options variable was set to "A|B|C". In the configuration file that I need to parse, the line could be "Foo = A", or "Bar = A|B" to signify that it is either A or B. I don't think what I have will give me the desired results.

      I figured using (.*) would be bad. You know, because of the whole greedy thing....:-P. Thanks.
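
One way to cover both points above (a multi-character node name, and values such as "A|B" that list several options) is to capture the right-hand side loosely and then check each part against a lookup hash. This is only a sketch under assumptions: the allowed list, the sub name parse_node_line, and the choice to fold values to upper case are all made up for illustration and are not from the original post. It also does not accept blanks around the "|".

```perl
use strict;
use warnings;

# Hypothetical list of permitted values; MN is included so the sample line matches.
my @allowed    = qw(MN A B C);
my %is_allowed = map { $_ => 1 } @allowed;

# Returns (node, [values]) for a valid "Managed Node" line, or an empty list.
sub parse_node_line {
    my ($line) = @_;

    # The "+" lets the node name span several characters; \S+ takes the raw value.
    my ($node, $raw) = $line =~ /^\s*Managed\s+Node\s+([\w-]+)\s*=\s*(\S+)\s*$/i
        or return;

    # Split "A|B" into its parts and reject the line if any part is unknown.
    my @values = split /\|/, $raw;
    return unless @values;
    for my $v (@values) {
        return unless $is_allowed{ uc $v };
    }
    return ($node, [ map { uc $_ } @values ]);
}

my @single = parse_node_line('Managed Node XYZ123 = MN');
my @multi  = parse_node_line('managed node Foo-1=A|B');
my @bad    = parse_node_line('Managed Node Bar = Z');
```

Here @single holds the node name and a one-element list, @multi holds "Foo-1" with both A and B, and @bad is empty because Z is not in the allowed list.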
Re: some regex help
by Roy Johnson (Monsignor) on Apr 05, 2004 at 21:47 UTC
    I'm not sure how to go about dynamically creating variables like this
    The recommended alternative is to use a hash, where ABC is a key, rather than a variable name.

    Your regexen are fine. For the last example, you should use \w* instead of .*, as the latter will capture whitespace. If your expression should accept embedded whitespace (or other non-\w chars), it gets a little trickier.

    The PerlMonk tr/// Advocate
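
To make the whitespace point concrete, the sketch below runs the two patterns from the question against a line with embedded blanks, then shows one possible pattern that allows embedded whitespace while trimming the ends. The sample line and the %config hash are invented for illustration, and the third pattern is just one option among several (splitting on "=" and trimming each side would also work).

```perl
use strict;
use warnings;

# Hypothetical input with embedded and trailing blanks.
my $line = "Site Name  =  North Wing  ";

# Greedy .* on the left keeps the blanks that precede the "=".
my ($greedy_key) = $line =~ /^(.*)\s*=\s*(.*)/;

# \w* cannot cross the embedded space, so this pattern does not match at all.
my $word_ok = $line =~ /^(\w*)\s*=\s*(\w*)/ ? 1 : 0;

# One option: require each side to start and end with a non-blank character.
my %config;
if ($line =~ /^\s*([^=\s](?:[^=]*[^=\s])?)\s*=\s*(\S(?:.*\S)?)\s*$/) {
    $config{$1} = $2;    # stored in a hash instead of a dynamically named variable
}
```

After this runs, $greedy_key still carries its two trailing blanks, $word_ok is 0, and %config maps "Site Name" to "North Wing".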
Re: some regex help
by DamnDirtyApe (Curate) on Apr 05, 2004 at 22:17 UTC

    Would this do the trick?

    #!/usr/bin/perl
    use strict;
    use Data::Dumper;

    my %hash;
    for (<DATA>) {
        # Store the node name and its value for each "Managed Node" line.
        $hash{$1} = $2 if /Managed Node\s+(\S+)\s*=\s*(\S+)/;
    }
    print Dumper \%hash;

    __DATA__
    Managed Node XYZ123 = MN
    Type = Combo
    Rim = Planner

    Those who know that they are profound strive for clarity. Those who
    would like to seem profound to the crowd strive for obscurity.
                --Friedrich Nietzsche
      I think your solution might be the best approach. There shouldn't be any spaces in the variables, so \S+ should catch everything I would want. Great.
Re: some regex help
by Elijah (Hermit) on Apr 05, 2004 at 21:59 UTC
    Well, yes, I believe using (.*) would be better than the alphanumeric check of (\w*), simply because these values may contain non-alphanumeric characters at some point, and then the \w* pattern would fail to match.

    Ex: Untested

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        if (/^(Managed Node)(.*)/) {
            # Split the rest of the line on "=" and print each side trimmed.
            my @value = split(/\=/, $2);
            print de_space($value[0]), "\n";
            print de_space($value[1]), "\n";
        }
        elsif (/^(.*\=.*)/) {
            # Any other "key = value" line: print it with normalized spacing.
            my @data = split(/\=/, $1);
            print de_space($data[0]) . " \= " . de_space($data[1]), "\n";
        }
    }

    # Strip leading and trailing spaces from a string.
    sub de_space {
        my $object = shift;
        $object =~ s/ *$//;
        $object =~ s/^ *//;
        return $object;
    }

    __DATA__
    Managed Node XYZ123 = MN
    Type = Combo
    Rim = Planner
    That extracts and isolates everything you wanted, but I do not know how you want to use the info once you have it, so I just printed it to STDOUT.
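
If the goal is to keep the parsed values rather than print them, the same split-and-trim idea can fill two hashes, as the earlier replies suggest. The following is a sketch only: the names trim and parse_config are hypothetical, and the "Path = a=b" line is an invented example showing that a split limit of 2 keeps any later "=" inside the value.

```perl
use strict;
use warnings;

# Trim leading and trailing whitespace from a copy of the string.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Parse config lines into two hashes: managed nodes and plain settings.
sub parse_config {
    my @lines = @_;
    my (%nodes, %settings);
    for my $line (@lines) {
        next unless $line =~ /=/;    # skip lines that have no "="

        # Limit of 2: split only on the first "=".
        my ($left, $right) = map { trim($_) } split /=/, $line, 2;

        if ($left =~ /^Managed\s+Node\s+(\S+)$/i) {
            $nodes{$1} = $right;
        }
        else {
            $settings{$left} = $right;
        }
    }
    return (\%nodes, \%settings);
}

my ($nodes, $settings) = parse_config(
    "Managed Node XYZ123 = MN\n",
    "Type=Combo\n",
    "Rim = Planner\n",
    "Path = a=b\n",
);
```

Values can then be looked up as $nodes->{XYZ123} or $settings->{Type}, with no dynamically named variables involved.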

Node Type: perlquestion [id://342759]
Approved by talexb