PerlMonks  

Stuck with RegEx (Parsing whois output)

by marsch (Initiate)
on Sep 28, 2004 at 08:40 UTC ( #394500=perlquestion )
marsch has asked for the wisdom of the Perl Monks concerning the following question:

Hello folks,

I have a little problem:

A file contains several sections, and each section contains key-value pairs. Any key can occur an arbitrary number of times. To complicate things, a section can have several names. Sections are separated by one or more empty lines. An example:

{section1}
key1: value1_1
key2: value2_1
key2: value2_2
key3: value3_1
...

{section2}{section3}
key1: value1_1
key2: value2_1
key2: value2_2
key2: value2_3
key3: value3_1
...

My problem is to find all lines with a certain key in a specific section, e.g. all values of key2 in section2. I tried with and without look-aheads and look-behinds, but all I get is either the first or the last value, never all of them (there should be 3 elements):

# first attempt
my @data = $text =~ /(?<=\{section2\})(?:[^\[\]]+\n)*(?:key2:\s+([^\n]+?)\s*?\n)/sg;
# second attempt
my @data = $text =~ /(?:\{section2\}(?:\{section[\d]+\})*\n)(?:\w+?:\s*(?:.+?)\s*?\n)*?(?:key2:\s+([^\n]+?)\s*?\n)/sg;
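For comparison, here is a minimal two-step sketch of the task described above: first isolate the body of the wanted section, then collect every value of the key with a second global match. The helper name `values_for` and the sample values are hypothetical, and the sketch assumes every line ends with a newline. It does not fit a one-regex engine, so it only illustrates the expected result.

```perl
use strict;
use warnings;

# Sample input in the format described above (values are made up).
my $text = <<'END';
{section1}
key1: a1
key2: a2
key2: a3

{section2}{section3}
key1: b1
key2: b2
key2: b3
key2: b4
key3: b5

{section4}
key2: c1
END

# Hypothetical helper: return all values of $key in the section named $name.
sub values_for {
    my ($text, $name, $key) = @_;
    my @values;
    # Step 1: grab the body of every block whose header line contains {$name}.
    # A body is the run of non-empty lines that follows the header.
    while ($text =~ /^(?:\{[^{}\n]+\})*\{\Q$name\E\}(?:\{[^{}\n]+\})*\n((?:[^\n]+\n)*)/mg) {
        my $body = $1;
        # Step 2: collect every "key: value" line inside that body.
        push @values, $body =~ /^\Q$key\E:[ \t]+(.*?)[ \t]*$/mg;
    }
    return @values;
}

my @section2_key2 = values_for($text, 'section2', 'key2');
print "$_\n" for @section2_key2;
```

Because the header pattern accepts other names before and after the wanted one, asking for 'section3' returns the same three values as 'section2'.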

UPDATE: I am trying to tune Net::XWhois (the DENIC parser), which uses an engine like

@matches = $self->{ Response } =~ /${$self->{Parser}}{$key}/sg;

to get the entries, so my task is to set up a new parser.

Is it possible at all to solve this problem?

Thanks for any hint

Marco

Re: Stuck with RegEx
by rupesh (Hermit) on Sep 28, 2004 at 09:05 UTC

    A quick script that came to my mind.
    However, I always believe that TIMTOWTDI
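    use strict;

    my $flag = "n";
    open FH, '<', 'filename' or die "Cannot open file: $!";
    foreach (<FH>) {
        $flag = "n" if /\{section\d+\}/;   # any section header ends the previous section
        $flag = "y" if /\{section2\}/;     # ... unless the header names section2
        if ($flag =~ /[yY]/) {
            print "$1\n" if /key2: (.*)$/;
        }
    }
    close FH;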
    Update: $flag reset. Thanks to si_lence

      Hi,

      thanks for your help. Definitely, this would be fine. But my aim is to tune the Net::XWhois module, especially the DENIC parser. The engine parses the input with one regex:

      @matches = $self->{ Response } =~ /${$self->{Parser}}{$key}/sg;

      This is why I feel bound to that =~ m/.../sg structure. Otherwise I would have to rewrite the complete code in order to capture the distinct whois responses...
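One technique that may fit this constraint is a pattern that either starts at the section header or resumes where the previous match ended, using the \G assertion. The following is only a sketch: it assumes the engine interpolates the parser entry into a list-context match with /sg as in the line quoted above, that every line ends with a newline, and that the sample data (made up here) resembles the real response. perlre cautions that \G is only fully supported at the very beginning of a pattern, so this should be verified against the Perl version in use.

```perl
use strict;
use warnings;

# Made-up response in the format from the question.
my $response = <<'END';
{section1}
key1: a1
key2: a2

{section2}{section3}
key1: b1
key2: b2
key2: b3
key2: b4
key3: b5

{section4}
key2: c1
END

# Pattern kept as a plain string, on the assumption that the engine
# interpolates it into /.../sg.
my $pattern =
      '(?:\{section2\}(?:\{[^{}\n]+\})*\n'   # start at the section header ...
    . '|\G(?!\A))'                           # ... or resume where the last match ended
    . '(?:(?!key2:)[^\n]+\n)*?'              # skip other non-empty lines of the section
    . 'key2:[ \t]+([^\n]+?)[ \t]*\n';        # capture one key2 value

my @matches = $response =~ /$pattern/sg;
print "$_\n" for @matches;
```

The (?!\A) guard keeps the \G branch from matching at the very start of the text, and the empty line after a section stops the skipping part, so key2 lines from other sections are not picked up.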

      Does this not print all the values for key2 even if they are in a different section that comes after {section2}?
      So you need to reset the $flag variable as soon as you come to the next section.
      si_lence
Re: Stuck with RegEx
by Jasper (Chaplain) on Sep 28, 2004 at 09:43 UTC
    Something like this might work:
    my $section = 'section1'; # or whichever you're looking in.
    my $key = 'key2';         # ditto
    while (<FH>) {
        if (/$section/ ... /section/) {
            print "$1\n" if /$key: (\w+)/;
        }
    }
    Slightly tested (it was originally untested), and I had to change it to the ... range operator instead of the .. range operator, because .. checks the end condition on the same line that starts the range. (probably wasting my time, though)
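The difference between the two range operators can be seen in a small sketch. The sample lines are made up, and the array names are only for illustration.

```perl
use strict;
use warnings;

my @lines = (
    '{section1}', 'key2: a1', 'key2: a2', '',
    '{section2}', 'key2: b1',
);

my (@two_dots, @three_dots);
for (@lines) {
    # ".." tests the end condition on the same line that opened the range,
    # so the range closes immediately on the header line.
    if (/section1/ .. /section/) {
        push @two_dots, $1 if /key2: (\w+)/;
    }
    # "..." waits until the next line before testing the end condition.
    if (/section1/ ... /section/) {
        push @three_dots, $1 if /key2: (\w+)/;
    }
}

print "two dots:   @two_dots\n";
print "three dots: @three_dots\n";
```

With these lines, the two-dot version collects nothing, while the three-dot version collects the key2 values of section1 only.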
Re: Stuck with RegEx
by TedPride (Priest) on Sep 28, 2004 at 09:57 UTC
    # where $inp = your file contents...
    my ($sections, @lines, $key, $value, $p, %hash);
    foreach (split(/\n\n+/, $inp)) {
        ($sections, @lines) = split(/\n/, $_);
        my %thash;
        foreach (@lines) {
            # limit of 2 keeps values that themselves contain ": " intact
            ($key, $value) = split(/: /, $_, 2);
            if ($thash{$key}) {
                $p = $thash{$key};
                push(@$p, $value);
            }
            else {
                $p = [$value];
                $thash{$key} = $p;
            }
        }
        # every name of the section refers to the same hash
        while ($sections =~ /\{(.*?)\}/g) {
            $hash{$1} = \%thash;
        }
    }
    print $hash{'section2'}->{'key2'}[0] . "\n";
    print $hash{'section3'}->{'key2'}[0] . "\n";
    $hash{'section2'}->{'key2'}[0] = 'value4_9';
    print $hash{'section2'}->{'key2'}[0] . "\n";
    print $hash{'section3'}->{'key2'}[0] . "\n";
    As you can see, the sections are just pointers to the actual data, so you can have unlimited section aliases without taking up a lot of space. To print all the contents of key2 in section2:
    $key = $hash{'section2'}->{'key2'};
    foreach (@$key) {
        print "$_\n";
    }
Re: Stuck with RegEx (Parsing whois output)
by cfreak (Chaplain) on Sep 28, 2004 at 13:40 UTC

      Hello,

      it is similar, indeed, but duplicate key values within one section are overwritten. Additionally, it resembles the examples already provided and needs line-by-line processing.

      Thanks for your help

      Marco

Re: Stuck with RegEx (Parsing whois output)
by marsch (Initiate) on Sep 29, 2004 at 08:28 UTC

    SOLVED

    Hello folks,

    I finally got it working; it is not elegant, but it does the job. I don't think there is a solution for this problem with one single match as I tried, so I'll use this workaround, since I can assume that the same key is repeated only a limited number of times:

    @address = $whois =~ m/(?:\{section1\}(?:\{section[\d]+\})*)\n(?:\S*(?<!key9):\s+[\S]+[^\n]+\n)+(?:key9:\s+([\S]+[^\n]+)\n)(?:key9:\s+([\S]+[^\n]+)\n)?(?:key9:\s+([\S]+[^\n]+)\n)?(?:key9:\s+([\S]+[^\n]+)\n)?(?:(?<!key9):[^\n]+\n)*?/gs;
    # drop the trailing undefined entries left by unused optional groups
    for (my $c = scalar @address; $c > 0; --$c) {
        pop @address unless defined $address[$c - 1];
    }
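The reason for the clean-up loop is that optional capture groups that take no part in a match still occupy a slot in the returned list, as undef. A shortened sketch of the same idea, with made-up input and a simplified pattern (not the one above), shows this and uses grep as an alternative way to drop the empty slots:

```perl
use strict;
use warnings;

# Made-up input: only two of the four key9 slots are filled.
my $whois = "key9: first\nkey9: second\nkey10: other\n";

# One required capture followed by three optional ones.
my @address = $whois =~ /key9:\s+(\S[^\n]*)\n(?:key9:\s+(\S[^\n]*)\n)?(?:key9:\s+(\S[^\n]*)\n)?(?:key9:\s+(\S[^\n]*)\n)?/;

# All four groups are returned; the unused ones are undef.
my $slots = scalar @address;

# Keep only the captures that actually matched.
@address = grep { defined } @address;
print "$_\n" for @address;
```

Unlike popping from the end, grep also removes undefined entries in the middle of the list, which could matter if the pattern matched more than once under /g.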

    This is the matched file format:

    ...
    key6: key6_1
    
    {section1}
    key7: key7_1_section1
    key8: key8_1_section1
    key9: key9_1_section1
    key9: key9_2_section1
    key9: key9_3_section1
    key10: key10_1_section1
    key11: key11_1_section1
    key12: key12_1_section1
    key16: key16_1_section1
    key17: key17_1_section1
    
    {section2}{section3}
    key7: key7_1_section2_section3
    ...
