PerlMonks  

Stuck with RegEx (Parsing whois output)

by marsch (Initiate)
on Sep 28, 2004 at 08:40 UTC (#394500=perlquestion)
marsch has asked for the wisdom of the Perl Monks concerning the following question:

Hello folks,

I have the following little problem:

A file contains several sections, and each section holds key-value pairs. Any key can occur an arbitrary number of times within a section. To complicate things, a section can have several names. Sections are separated by one or more empty lines. An example:

{section1}
key1: value1_1
key2: value2_1
key2: value2_2
key3: value3_1
...

{section2}{section3}
key1: value1_1
key2: value2_1
key2: value2_2
key2: value2_3
key3: value3_1
...

My problem is to find all lines for a given key in a specific section, e.g. all values of key2 in section2. I have tried with and without look-aheads and look-behinds, but all I get is either the first or the last value, never all of them (there should be 3 elements):

my @data = $text =~ /(?<=\{section2\})(?:[^\[\]]+\n)*(?:key2:\s+([^\n]+?)\s*?\n)/sg;
my @data = $text =~ /(?:\{section2\}(?:\{section[\d]+\})*\n)(?:\w+?:\s*(?:.+?)\s*?\n)*?(?:key2:\s+([^\n]+?)\s*?\n)/sg;
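A minimal illustration of why both attempts return only one value. It uses a hypothetical three-line sample, not the real whois data. A capturing group inside a quantifier keeps only its last repetition. Under /g, every match has to find the {section2} header again, and that header occurs only once.

```perl
use strict;
use warnings;

# Hypothetical sample: one header followed by three key2 lines.
my $text = "{section2}\nkey2: value2_1\nkey2: value2_2\nkey2: value2_3\n";

# The group repeats three times, but $1 is overwritten on each repetition,
# so only the last value survives.
my @last = $text =~ /\{section2\}\n(?:key2:\s+([^\n]+)\n)+/;

# Without the quantifier the group matches once: only the first value.
my @first = $text =~ /\{section2\}\n(?:key2:\s+([^\n]+)\n)/;

# Adding /g does not help: a second match would need another "{section2}".
my @with_g = $text =~ /\{section2\}\n(?:key2:\s+([^\n]+)\n)+/g;

print "last:   @last\n";      # value2_3
print "first:  @first\n";     # value2_1
print "with g: @with_g\n";    # value2_3
```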

UPDATE: I am trying to tune Net::XWhois (the DENIC parser), which uses an engine like this:

@matches = $self->{ Response } =~ /${$self->{Parser}}{$key}/sg;

to extract the entries, so my task is to set up a new parser, i.e. a regex per key that this line can apply.

Is it possible at all to solve this problem?

Thanks for any hints

Marco

Re: Stuck with RegEx
by rupesh (Hermit) on Sep 28, 2004 at 09:05 UTC

    A quick script that came to mind.
    As always, TIMTOWTDI.
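    use strict;
    use warnings;

    my $flag = 0;    # true while we are inside {section2}
    open my $fh, '<', 'filename' or die "Cannot open filename: $!";
    while (<$fh>) {
        $flag = 0 if /\{section\d+\}/;    # any section header ends the current section...
        $flag = 1 if /\{section2\}/;      # ...unless it also names section2
        print "$1\n" if $flag && /key2: (.*)$/;
    }
    close $fh;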
    Update: $flag reset. Thanks to si_lence

      Hi,

      thanks for your help. This would certainly work, but my aim is to tune the Net::XWhois module, specifically its DENIC parser. The engine parses the input with a single regex:

      @matches = $self->{ Response } =~ /${$self->{Parser}}{$key}/sg;

      This is why I feel bound to the =~ m/.../sg structure; otherwise I would have to rewrite the complete code to capture the individual whois responses.
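For reference, here is a sketch of what the engine line quoted above does. m//g in list context reruns the pattern from where the previous match ended and returns the captures of every match. So with this engine, collecting all values means each value line has to be its own match of the same pattern. $response and $pattern are hypothetical stand-ins, not the real Net::XWhois structures.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: $response plays $self->{Response}, $pattern plays
# ${$self->{Parser}}{$key}. Neither is the real Net::XWhois data.
my $response = "key1: a\nkey2: b\nkey2: c\nkey3: d\n";
my $pattern  = 'key2:\s+([^\n]+)\n';

# Same shape as the engine line: one pattern, /sg, list context.
# The match is repeated from where the previous one ended, and each
# successful match appends its capture to the list.
my @matches = $response =~ /$pattern/sg;

print join(', ', @matches), "\n";    # b, c
```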

      Doesn't this print all the values for key2, even those in sections that come after {section2}?
      You need to reset the $flag variable as soon as you reach the next section.
      si_lence
Re: Stuck with RegEx
by Jasper (Chaplain) on Sep 28, 2004 at 09:43 UTC
    Something like this might work:
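    my $section = 'section1';    # or whichever section you're looking in
    my $key     = 'key2';        # ditto
    open my $fh, '<', 'filename' or die "Cannot open filename: $!";
    while (<$fh>) {
        # scalar-context flip-flop: true from the $section header line
        # up to and including the next header line
        if (/\{\Q$section\E\}/ ... /\{section/) {
            print "$1\n" if /^\Q$key\E: (\w+)/;
        }
    }
    close $fh;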
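    Slightly tested (it was totally untested at first). I had to use the three-dot ... form of the range (flip-flop) operator instead of the two-dot .. form: with .., the closing test is also applied to the header line that opens the range, and since that line contains "section" as well, the range closes immediately. (Probably wasting my time, though.)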
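A small in-memory check of the .. versus ... difference described above. It uses hypothetical lines, with /section/ as the closing pattern. Both flip-flops sit in boolean context (an if), which is where .. and ... act as flip-flops rather than as list ranges.

```perl
use strict;
use warnings;

# Hypothetical lines in the question's format, as read one at a time from a file.
my @lines = (
    "{section1}\n",
    "key2: value2_1\n",
    "key2: value2_2\n",
    "\n",
    "{section2}{section3}\n",
    "key2: value2_3\n",
);

my (@two_dots, @three_dots);

for (@lines) {
    # '..' also tests the right operand on the line that made the left one
    # true; "{section1}" matches /section/ too, so the range ends right there.
    if (/\{section1\}/ .. /section/) {
        push @two_dots, $1 if /^key2: (\w+)/;
    }
}

for (@lines) {
    # '...' waits until the next line before testing the right operand, so the
    # range stays open until the next section header.
    if (/\{section1\}/ ... /section/) {
        push @three_dots, $1 if /^key2: (\w+)/;
    }
}

print "..  : @two_dots\n";      # nothing collected
print "... : @three_dots\n";    # value2_1 value2_2
```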
Re: Stuck with RegEx
by TedPride (Priest) on Sep 28, 2004 at 09:57 UTC
    # $inp holds the whole file contents
    my %hash;
    foreach my $block (split /\n\n+/, $inp) {
        # first line holds the section name(s), the rest are "key: value" lines
        my ($sections, @lines) = split /\n/, $block;
        my %thash;    # key => array ref of all values in this block
        foreach my $line (@lines) {
            my ($key, $value) = split /: /, $line, 2;
            push @{ $thash{$key} }, $value;    # autovivifies the array ref
        }
        # every name in the header line points at the same %thash
        while ($sections =~ /\{(.*?)\}/g) {
            $hash{$1} = \%thash;
        }
    }
    print $hash{'section2'}->{'key2'}[0] . "\n";
    print $hash{'section3'}->{'key2'}[0] . "\n";
    $hash{'section2'}->{'key2'}[0] = 'value4_9';
    print $hash{'section2'}->{'key2'}[0] . "\n";
    print $hash{'section3'}->{'key2'}[0] . "\n";
    As you can see, the section names are just references to the same data, so a section can have any number of aliases without duplicating it (changing key2 in section2 also changes it in section3). To print all values of key2 in section2:
    foreach my $value (@{ $hash{'section2'}->{'key2'} }) {
        print "$value\n";
    }
Re: Stuck with RegEx (Parsing whois output)
by cfreak (Chaplain) on Sep 28, 2004 at 13:40 UTC

      Hello,

      it is indeed similar, but duplicate keys within one section overwrite each other. Also, like the examples already posted, it needs line-by-line processing.

      Thanks for your help

      Marco

Re: Stuck with RegEx (Parsing whois output)
by marsch (Initiate) on Sep 29, 2004 at 08:28 UTC

    SOLVED

    Hello folks,

    I finally made it work; not elegant, but correct. I don't think this problem can be solved with a single match of the kind I tried, so I'll use this workaround, since I can assume that a key repeats only a limited number of times (the pattern below captures up to four key9 lines):

    @address = $whois =~ m/(?:\{section1\}(?:\{section[\d]+\})*)\n(?:\S*(?<!key9):\s+[\S]+[^\n]+\n)+(?:key9:\s+([\S]+[^\n]+)\n)(?:key9:\s+([\S]+[^\n]+)\n)?(?:key9:\s+([\S]+[^\n]+)\n)?(?:key9:\s+([\S]+[^\n]+)\n)?(?:(?<!key9):[^\n]+\n)*?/gs;
    for (my $c = scalar @address; $c > 0; --$c) {
        pop @address unless defined $address[$c - 1];
    }

    This is the matched file format:

    ...
    key6: key6_1
    
    {section1}
    key7: key7_1_section1
    key8: key8_1_section1
    key9: key9_1_section1
    key9: key9_2_section1
    key9: key9_3_section1
    key10: key10_1_section1
    key11: key11_1_section1
    key12: key12_1_section1
    key16: key16_1_section1
    key17: key17_1_section1
    
    {section2}{section3}
    key7: key7_1_section2_section3
    ...
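A possible single-pattern alternative that avoids the fixed limit of four key9 lines in the pattern above. This is a sketch, and it assumes that a parser entry can be any regex string that the engine interpolates into m/.../sg, as in the quoted engine line. The header branch matches the first key line. After that, the \G(?!\A) branch resumes exactly where the previous match ended, so each further key line in the same section becomes its own match. A blank line stops the scan, because the line skipper only accepts non-blank lines. section_key_pattern is a hypothetical helper, not part of Net::XWhois, and the sample is an abbreviated copy of the format above. The pattern relies on \G inside an alternation, and perlre cautions that \G is best supported at the very start of a pattern, so please test it on your perl before relying on it.

```perl
use strict;
use warnings;

# Abbreviated copy of the file format above (the "..." parts left out).
my $whois = <<'END';
key6: key6_1

{section1}
key7: key7_1_section1
key8: key8_1_section1
key9: key9_1_section1
key9: key9_2_section1
key9: key9_3_section1
key10: key10_1_section1

{section2}{section3}
key7: key7_1_section2_section3
END

# Hypothetical helper: builds one regex string for a section name and a key.
sub section_key_pattern {
    my ($section, $key) = @_;
    my ($s, $k) = (quotemeta($section), quotemeta($key));
    return '(?m)'                                                  # let ^ match at line starts
         . '(?:^(?:\{[^}\n]*\})*\{' . $s . '\}(?:\{[^}\n]*\})*\n'  # a header line naming the section
         . '|\G(?!\A))'                                            # or: resume where the last match ended
         . '(?:(?!' . $k . ':)[^\n]+\n)*?'                         # skip other non-blank lines of the section
         . $k . ':[ \t]*([^\n]*)(?:\n|\z)';                        # capture the value of the next $key line
}

# Applied the same way as the Net::XWhois engine line: m/.../sg in list context.
my $pattern = section_key_pattern('section1', 'key9');
my @address = $whois =~ /$pattern/sg;
print "$_\n" for @address;
```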

Node Type: perlquestion [id://394500]
Approved by Corion