Stuck with RegEx (Parsing whois output)

by marsch (Initiate)
on Sep 28, 2004 at 08:40 UTC
marsch has asked for the wisdom of the Perl Monks concerning the following question:

Hello folks,

I have the following little problem:

A file contains several sections, and each section contains key-value pairs. Each key can occur any number of times within a section. To make things more complicated, a section can have several names. Sections are separated by one or more empty lines. An example:

{section1}
key1: value1_1
key2: value2_1
key2: value2_2
key3: value3_1

{section2}{section3}
key1: value1_1
key2: value2_1
key2: value2_2
key2: value2_3
key3: value3_1

My problem is to find all values of a given key within a specific section, e.g. all values of key2 in section 2. I tried with and without look-aheads and look-behinds, but I only ever get either the first or the last value, never all of them (there should be 3 elements):

# first attempt
my @data = $text =~ /(?<=\{section2\})(?:[^\[\]]+\n)*(?:key2:\s+([^\n]+?)\s*?\n)/sg;
# second attempt
my @data = $text =~ /(?:\{section2\}(?:\{section[\d]+\})*\n)(?:\w+?:\s*(?:.+?)\s*?\n)*?(?:key2:\s+([^\n]+?)\s*?\n)/sg;
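A two-step alternative sidesteps the problem: one match isolates the body of the wanted section, and a second /mg match collects every line of the key inside it. This is a minimal sketch, assuming header lines such as {section2}{section3} and blank-line-separated sections; the sample text is the example above with those assumed headers.

```perl
use strict;
use warnings;

# Sample input: the example above, with assumed {sectionN} header lines.
my $text = <<'END';
{section1}
key1: value1_1
key2: value2_1
key2: value2_2
key3: value3_1

{section2}{section3}
key1: value1_1
key2: value2_1
key2: value2_2
key2: value2_3
key3: value3_1
END

my ($section, $key) = ('section2', 'key2');

# Step 1: capture the body of the section whose header line names $section,
# i.e. every non-empty line after the header up to the first blank line.
my ($body) = $text =~ /^[^\n]*\{\Q$section\E\}[^\n]*\n((?:[^\n]+\n?)*)/m;

# Step 2: collect every value of $key inside that body.
my @data = defined $body ? $body =~ /^\Q$key\E:[ \t]*([^\n]*)/mg : ();

print "$_\n" for @data;
```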

UPDATE: I am trying to tune Net::XWhois (its DENIC parser), which gets the entries with a single match like

@matches = $self->{ Response } =~ /${$self->{Parser}}{$key}/sg;

so my task is to set up a new parser.

Is it possible to solve this problem with a single regex at all?

Thanks for any hints.


Re: Stuck with RegEx
by rupesh (Hermit) on Sep 28, 2004 at 09:05 UTC

    Here is a quick script that came to mind.
    Of course, TIMTOWTDI.
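    use strict;
    use warnings;
    my $flag = "n";
    open my $fh, '<', 'filename' or die "Cannot open filename: $!";
    while (<$fh>) {
        $flag = "n" if /\{section\d+\}/;   # any section header: leave the target section
        $flag = "y" if /\{section2\}/;     # header names section2: start printing
        if ($flag eq "y") {
            print "$1\n" if /^key2: (.*)$/;
        }
    }
    close $fh;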
    Update: $flag is now reset at every section header. Thanks to si_lence.
      Doesn't this print all values of key2, even those in sections that come after {section2}?
      You need to reset the $flag variable as soon as you reach the next section.
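Another way to avoid the $flag bookkeeping entirely is Perl's paragraph mode: with $/ set to the empty string, each read returns one blank-line-separated section. A minimal sketch, assuming the header line is the first line of each section; the in-memory sample text is hypothetical and stands in for the real file.

```perl
use strict;
use warnings;

my $section = 'section2';   # section name to look for
my $key     = 'key2';       # key whose values we want

# Hypothetical sample data standing in for the real file.
my $text = "{section1}\nkey2: value2_1\n\n\n"
         . "{section2}{section3}\nkey1: value1_1\nkey2: value2_1\nkey2: value2_2\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @values;
{
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    while (my $record = <$fh>) {
        my ($header, @lines) = split /\n/, $record;
        next unless $header =~ /\{\Q$section\E\}/;
        for my $line (@lines) {
            push @values, $1 if $line =~ /^\Q$key\E:\s*(.*)$/;
        }
    }
}
close $fh;

print "$_\n" for @values;
```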


      Thanks for your help. This would certainly work, but my aim is to tune the Net::XWhois module, especially its DENIC parser. The engine parses the input with one regex:

      @matches = $self->{ Response } =~ /${$self->{Parser}}{$key}/sg;

      This is why I feel bound to the =~ m/.../sg structure. Otherwise I would have to rewrite the complete code that captures the distinct whois responses.
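Under that constraint, one pattern can still return every value, provided each match resumes where the previous one ended. The sketch below starts the pattern with \G: the first match skips ahead past the header naming the section, and later matches continue line by line until a blank line ends the section. The helper section_key_re and the %parser table are hypothetical stand-ins, not part of Net::XWhois, and the {section2}{section3} header format is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Net::XWhois: builds a pattern that, in a
# list-context //g match, returns every value of $key in the section whose
# header line names $section. Assumes headers like {section2}{section3}
# and sections separated by blank lines.
sub section_key_re {
    my ($section, $key) = @_;
    my ($s, $k) = (quotemeta $section, quotemeta $key);
    my $re = '\G(?:(?!\A)'                      # resume right after the previous match,
           . '|(?s:.*?)\{' . $s . '\}[^\n]*\n)' # or skip ahead past the header naming $section
           . '(?:(?!' . $k . ':)[^\n]+\n)*'     # skip non-empty lines holding other keys
           . $k . ':[ \t]*([^\n]*)'             # the wanted key: capture its value
           . '(?:\n|\z)';                        # consume the end of that line
    return qr/$re/;
}

# The example data from the question, with the assumed header lines.
my $response = <<'END';
{section1}
key1: value1_1
key2: value2_1
key2: value2_2
key3: value3_1

{section2}{section3}
key1: value1_1
key2: value2_1
key2: value2_2
key2: value2_3
key3: value3_1
END

# Hypothetical per-key parser table, used the way the engine uses its own.
my %parser  = (key2 => section_key_re('section2', 'key2'));
my $key     = 'key2';
my $pattern = $parser{$key};
my @matches = $response =~ /$pattern/sg;

print "$_\n" for @matches;
```

Since every match after the first must begin exactly where the previous one stopped, the skip step cannot cross the blank line that closes the section, so key2 values from other sections are not collected. perlre notes that \G is only properly supported at the very beginning of a pattern, which is why it comes first here; it is still worth testing on your Perl version.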

Re: Stuck with RegEx (Parsing whois output)
by cfreak (Chaplain) on Sep 28, 2004 at 13:40 UTC


      It is similar, indeed, but duplicate keys within one section overwrite each other. Additionally, it resembles the examples already provided and needs line-by-line processing.

      Thanks for your help
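The overwriting comes from storing one scalar per key; storing an array reference per key keeps every value. A small comparison on hypothetical lines from a single section:

```perl
use strict;
use warnings;

# Hypothetical lines from one section.
my @lines = ('key1: value1_1', 'key2: value2_1', 'key2: value2_2', 'key2: value2_3');

my (%last, %all);
for my $line (@lines) {
    my ($key, $value) = split /:\s*/, $line, 2;
    $last{$key} = $value;            # plain hash: each key keeps only its last value
    push @{ $all{$key} }, $value;    # hash of arrays: every value is kept
}

print "last: $last{key2}\n";
print "all:  @{ $all{key2} }\n";
```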


Re: Stuck with RegEx
by TedPride (Priest) on Sep 28, 2004 at 09:57 UTC
    # where $inp = your file contents...
    my %hash;
    foreach my $block (split /\n\n+/, $inp) {
        my ($sections, @lines) = split /\n/, $block;
        my %thash;
        foreach my $line (@lines) {
            my ($key, $value) = split /:\s*/, $line, 2;
            push @{ $thash{$key} }, $value;   # autovivifies the array ref on first use
        }
        # every name in the header line points at the same %thash
        while ($sections =~ /\{(.*?)\}/g) {
            $hash{$1} = \%thash;
        }
    }
    print $hash{'section2'}->{'key2'}[0] . "\n";
    print $hash{'section3'}->{'key2'}[0] . "\n";
    # both names share one hash, so the change shows up under section3 as well
    $hash{'section2'}->{'key2'}[0] = 'value4_9';
    print $hash{'section2'}->{'key2'}[0] . "\n";
    print $hash{'section3'}->{'key2'}[0] . "\n";
    As you can see, the section names are just references to the same data, so you can have any number of section aliases without using much extra space. To print all the values of key2 in section2:
    print "$_\n" foreach @{ $hash{'section2'}->{'key2'} };
Re: Stuck with RegEx
by Jasper (Chaplain) on Sep 28, 2004 at 09:43 UTC
    Something like this might work:
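    my $section = 'section1';   # or whichever you're looking in
    my $key     = 'key2';       # ditto
    while (<FH>) {
        # scalar-context flip-flop: true from the header naming $section to the next header
        if (/\{\Q$section\E\}/ ... /\{section\d+\}/) {
            print "$1\n" if /^\Q$key\E:\s*(.+)$/;
        }
    }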
    Update: slightly tested. I had to use the ... range operator rather than the .. range operator, since .. also tests the end pattern on the line that starts the range, and the header line matches both patterns. (Probably wasting my time, though.)
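A small demonstration of that difference, on hypothetical lines: with .. the range opens and closes on the header line itself, while ... keeps it open until the next header.

```perl
use strict;
use warnings;

# Hypothetical input lines.
my @lines = ("{section2}\n", "key2: a\n", "key2: b\n", "{section3}\n", "key2: c\n");

my (@two, @three);
for (@lines) {
    # '..' tests the end pattern on the same line that started the range,
    # so a header matching both patterns opens and closes it at once.
    push @two,   $_ if /\{section2\}/ ..  /\{section\d+\}/;
    # '...' waits until the following line before testing the end pattern.
    push @three, $_ if /\{section2\}/ ... /\{section\d+\}/;
}

print "..  kept ", scalar(@two),   " line(s)\n";
print "... kept ", scalar(@three), " line(s)\n";
```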
Re: Stuck with RegEx (Parsing whois output)
by marsch (Initiate) on Sep 29, 2004 at 08:28 UTC


    Hello folks,

    I finally made it work; not pretty, but correct. I don't think this problem can be solved with a single match the way I tried, so I'll use this workaround, since I can assume there is a finite number of repeated keys:

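    @address = $whois =~ m/
        (?:\{section1\}(?:\{section[\d]+\})*)\n   # header line naming section1
        (?:\S*(?<!key9):\s+[\S]+[^\n]+\n)+        # one or more lines whose key is not key9
        (?:key9:\s+([\S]+[^\n]+)\n)               # first key9 value (required)
        (?:key9:\s+([\S]+[^\n]+)\n)?              # up to three further key9 values
        (?:key9:\s+([\S]+[^\n]+)\n)?
        (?:key9:\s+([\S]+[^\n]+)\n)?
        (?:(?<!key9):[^\n]+\n)*?                  # lazy and last in the pattern, so it matches zero lines
    /gsx;
    # optional groups that did not match return undef; drop them
    @address = grep { defined } @address;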

    This is the matched file format:

    key6: key6_1

    {section1}
    key7: key7_1_section1
    key8: key8_1_section1
    key9: key9_1_section1
    key9: key9_2_section1
    key9: key9_3_section1
    key10: key10_1_section1
    key11: key11_1_section1
    key12: key12_1_section1
    key16: key16_1_section1
    key17: key17_1_section1

    {section2}{section3}
    key7: key7_1_section2_section3

Node Type: perlquestion [id://394500]
Approved by Corion