
More complicated regular expression?

by jaldama (Acolyte)
on Jan 26, 2012 at 15:03 UTC (#950109, perlquestion)
jaldama has asked for the wisdom of the Perl Monks concerning the following question:

I'd like some advice on pulling the information I want out of the output my script gets. I am basically running `host` (or something to that effect). Run from the command line, the output looks something like this:
Using domain server:
Name:
Address:
Aliases:

has address
What would be the best way to extract the IP address in a general way, so it works for whatever website is checked? I previously matched an IP with something like this:
if ($found_addr =~ m/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) { . . .
But that won't work here, because the output also shows the domain server we're using (in this case Google's). Can I pull something by line number?

Update 1/27: Ended up using
if ($found_addr =~ m/(\d.*)/) {
on the Linux boxes, since they pull up the host info a little differently. Overall, this was pretty successful, though. Thanks for the advice.

Replies are listed 'Best First'.
Re: More complicated regular expression?
by JavaFan (Canon) on Jan 26, 2012 at 15:09 UTC
    /has address (\S+)/ and do_something($1);
      I'm trying
      if ($found_addr =~ m/address(.*)\d/){ print "$1\n"; }
      But it's cutting off the last digit of the address for some reason.

        And I bet you're getting a space in front of the 74. Your "\d" at the end matches a single digit, and since it sits after your capturing parentheses, it takes the last character of the address and excludes it from your capture.

        Use JavaFan's regex above and all will be well.

        use YAPE::Regex::Explain;
        print YAPE::Regex::Explain->new( qr/address(.*)\d/ )->explain;
        __END__
        The regular expression:

        (?-imsx:address(.*)\d)

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          address                  'address'
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            .*                       any character except \n (0 or more times
                                     (matching the most amount possible))
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
          \d                       digits (0-9)
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------

        So .* (matching the most amount possible) followed by \d, a digit. If you want the digit to be captured, move it inside the parens.
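A minimal sketch of what the explanation above describes. The input line is a hypothetical stand-in for real `host` output (the hostname and the documentation-only address 192.0.2.10 are assumptions, not values from this thread). It prints both captures in brackets so the stray leading space is visible.

```perl
use strict;
use warnings;

# Hypothetical sample line; 192.0.2.10 is a documentation-only address.
my $found_addr = 'www.example.com has address 192.0.2.10';

# Original attempt: the greedy .* first grabs the rest of the line, then
# backtracks by one character so the trailing \d can match the final digit.
my ($cut) = $found_addr =~ m/address(.*)\d/;

# JavaFan's pattern: the literal space stays outside the group, and \S+
# captures the run of non-whitespace characters that follows.
my ($full) = $found_addr =~ m/has address (\S+)/;

print "[$cut]\n";     # keeps the leading space, loses the last digit
print "[$full]\n";    # the whole address, no leading space
```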

      Sweet deal javafan
        Last question. What if the output instead of
        Using domain server: Name: Address: Aliases: has address
        Comes out on one line like this:
        $ host A
        My previous matching idea still cuts off the last digit. I thought using JavaFan's approach and searching for A
        /A (\S+)/ and do_something($1);
        Might work but it doesn't seem to. Is there an issue because of the spacing?
Re: More complicated regular expression?
by TomDLux (Vicar) on Jan 26, 2012 at 20:01 UTC

    Is the question about doing DNS lookups in Perl? In that case, use an appropriate module, such as Net::DNS::Lookup, or one of the others you'll find by searching CPAN for 'DNS Lookup'.

    If your actual concern is achieving a greater understanding of regular expressions, try:

    • Match the constant part of the string and capture the IPv4 address that follows. I'm actually capturing the longest possible string of digits and dots, but you can generally rely on DNS servers to send values that make sense as IP addresses.

          my ($ipnum) = ($msg =~ m{has address ([\d\.]+)});

    • Break the regex processing into simple components:

          LINE:
          for my $line ( split "\n", $dns_data ) {
              # index returns -1 (which is true) when the text is absent
              next LINE if index( $line, 'has address' ) < 0;
              # anchor at end of line so dots in the hostname don't match first
              my ($ipnum) = ( $line =~ m{([\d\.]+)$} );
          }
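The two suggestions above can be combined into one short runnable sketch. The `$dns_data` contents below are hypothetical (documentation-only names and addresses made up for illustration, laid out roughly like the output in the question); real `host` output may differ between systems.

```perl
use strict;
use warnings;

# Hypothetical host output; every name and address here is an assumption.
my $dns_data = join "\n",
    'Using domain server:',
    'Name: 192.0.2.53',
    'Address: 192.0.2.53#53',
    'Aliases: ',
    '',
    'www.example.com has address 192.0.2.10';

my @addresses;
for my $line ( split /\n/, $dns_data ) {
    # Capture only on lines containing the constant text, so the name
    # server's own address on the "Address:" line is ignored.
    next unless $line =~ m{has address ([\d.]+)};
    push @addresses, $1;
}

print "$_\n" for @addresses;
```

Matching on the constant text avoids having to count lines, which matters because the number of header lines can vary from one machine to another.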


    You wrote:

        if ($found_addr =~ m/address(.*)\d/){ print "$1\n"; }
        But it's cutting off the last digit of the address for some reason.

    Let's go back to basics and discuss this block: IF (the stuff between parentheses is true) THEN DO {the stuff between curly braces}.

    The stuff between parentheses applies a regular expression to the value stored in the variable $found_addr. The regular expression looks for the sequence of characters 'a', 'd', 'd', 'r', 'e', 's', 's', followed by a wildcard sequence which is captured, followed by a digit which is not captured.

    From our knowledge of what was being returned, we know you are capturing a space character, ' ', followed by some digits, a dot, some digits, .... Since the captured sequence is followed by a non-captured digit, the last digit will not be captured. When you look at your printout, you'll notice the IP address is one character to the right, because of the captured space.

    If you want to capture the digit, put it within the capturing parentheses. If you want to exclude the space, explicitly specify it outside the parentheses.
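As a sketch of that advice, assuming a hypothetical input line (the hostname and address are documentation values, not data from this thread): the space moves outside the parentheses and the final digit moves inside. The second pattern uses `\s+` instead of a single literal space, which is one possible way to cope with output that pads the columns differently.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the second has extra padding before the address.
my $found_addr = 'www.example.com has address 192.0.2.10';
my $padded     = 'www.example.com has address    192.0.2.10';

# Space matched outside the group, final digit captured inside it.
my ($ip) = $found_addr =~ m/address (.*\d)/;

# \s+ absorbs any run of whitespace before the captured address.
my ($ip_padded) = $padded =~ m/address\s+(.*\d)/;

print "[$ip]\n";
print "[$ip_padded]\n";
```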

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.
