PerlMonks  

Perl regex and word boundaries

by MeatLips (Novice)
on Feb 13, 2013 at 19:46 UTC (#1018613)
MeatLips has asked for the wisdom of the Perl Monks concerning the following question:

So, interesting problem I've run into. I have a script for pulling hostnames in a two-node cluster. I ghettoed it a bit by using a bash egrep expression to determine whether the hostnames have been added to /etc/hosts on both nodes.

So I may have an /etc/hosts that looks like this:

192.168.1.199 hostname62a.domain.com hostname62a
192.168.1.200 hostname62b.domain.com hostname62b
192.168.1.201 hostname62.domain.com hostname62
192.168.2.144 hostname62amgt.domain.com hostname62amgt
192.168.2.145 hostname62bmgt.domain.com hostname62bmgt

Here's a snippet of my code:
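
use strict;
use warnings;

my $ha1 = "hostname62a";
my $ha2 = "hostname62b";
my @hosts_ha1;

# Shell out to egrep, matching either name between word boundaries (\b)
my $cmd1 = "egrep -i \"\\b$ha1\\b|\\b$ha2\\b\" /etc/hosts";
open(my $hosts1, '-|', $cmd1) or die "Cannot run '$cmd1': $!";
while (<$hosts1>) {
    chomp;
    push @hosts_ha1, $_;
}
close($hosts1);
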
I used word boundaries (\b) to make sure I only find what I'm looking for. Normally, this returns something like the following:

192.168.1.199 hostname62a.domain.com hostname62a
192.168.1.200 hostname62b.domain.com hostname62b

This is what I want. Just the two hostnames.

The hostnames themselves follow whatever standard the customer sets, so we have little control over what they name their stuff. But usually the above code works well for just pulling out the hostnames. We do control how they format the names in /etc/hosts by providing a script interface, so how stuff is laid out in /etc/hosts is pretty constant.
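Because the column layout of /etc/hosts is fixed by your script interface (address, FQDN, short name), a lookup does not have to depend on regex boundaries at all: each line can be split on whitespace and its fields compared for exact equality. A minimal sketch, assuming that layout; the helper name `find_host_lines` and the inline sample lines (taken from the example file above) are hypothetical, for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines in which one of the wanted short
# names appears as a complete whitespace-separated field (exact, not
# substring, comparison; case-insensitive like egrep -i).
sub find_host_lines {
    my ($wanted, @lines) = @_;                  # $wanted: array ref of names
    my %want = map { lc($_) => 1 } @$wanted;
    my @found;
    for my $line (@lines) {
        my ($ip, @names) = split ' ', $line;    # ' ' splits on runs of whitespace
        next unless defined $ip;                # blank line
        next if $ip =~ /^#/;                    # comment line
        push @found, $line if grep { $want{ lc $_ } } @names;
    }
    return @found;
}

my @hosts = (
    "192.168.1.199 hostname62a.domain.com hostname62a",
    "192.168.1.200 hostname62b.domain.com hostname62b",
    "192.168.1.201 hostname62.domain.com hostname62",
    "192.168.2.144 hostname62amgt.domain.com hostname62amgt",
    "192.168.2.145 hostname62bmgt.domain.com hostname62bmgt",
);

print "$_\n" for find_host_lines([qw(hostname62a hostname62b)], @hosts);
```

To use it on the real file, read /etc/hosts with a three-argument open, chomp the lines and pass them in. Since every name field on a line is checked, an entry that puts the short name before the FQDN would still be found.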

Now here's the problem: \b boundaries work well most of the time, but one customer named his stuff like this:

192.168.1.199 hostname62a.domain.com hostname62a
192.168.1.200 hostname62b.domain.com hostname62b
192.168.1.201 hostname62.domain.com hostname62
192.168.2.144 hostname62a-r.domain.com hostname62a-r
192.168.2.145 hostname62b-r.domain.com hostname62b-r

So the above egrep statement finds these:

192.168.1.199 hostname62a.domain.com hostname62a
192.168.1.200 hostname62b.domain.com hostname62b
192.168.2.144 hostname62a-r.domain.com hostname62a-r
192.168.2.145 hostname62b-r.domain.com hostname62b-r

This is because "-" is not a word character, so there is a word boundary between "hostname62a" and "-r", and \b matches there just as it does before ".domain.com". I've got no idea how to craft the right expression to match just the hostnames I want. I do have customers that name their stuff like this:

192.168.1.2 hostname-node1.domain.com hostname-node1
192.168.1.3 hostname-node2.domain.com hostname-node2
192.168.1.4 hostname-node1mgt.domain.com hostname-node1mgt
192.168.1.5 hostname-node2mgt.domain.com hostname-node2mgt

Which will return:

192.168.1.2 hostname-node1.domain.com hostname-node1
192.168.1.3 hostname-node2.domain.com hostname-node2

So I can't just split on the "-". Ugh, even now my head hurts thinking about this issue. Does anyone have an idea for a nifty Perl regex that could solve my problem?
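One way to say "the whole hostname" in a regex is to swap \b for lookarounds that demand whitespace (or the start or end of the line) on both sides, so a "-" or "." next to the name no longer counts as a boundary. A minimal sketch, assuming the short name sits in its own whitespace-separated field as in the layouts above; the helper names `boundary_re` and `field_re` are hypothetical, and the first one only reproduces what the egrep command does, for comparison:

```perl
use strict;
use warnings;

# Reproduces the egrep behaviour: \b sits between a word character
# ([A-Za-z0-9_]) and a non-word character, so "-" or "." after the
# name still counts as a boundary.
sub boundary_re {
    my $alt = join '|', map { quotemeta } @_;
    return qr/\b(?:$alt)\b/i;
}

# Alternative: the name must be a complete whitespace-delimited field.
# (?<!\S) = start of string or preceded by whitespace;
# (?!\S)  = end of string or followed by whitespace.
sub field_re {
    my $alt = join '|', map { quotemeta } @_;   # quotemeta keeps "." etc. literal
    return qr/(?<!\S)(?:$alt)(?!\S)/i;          # /i mirrors egrep -i
}

my @lines = (
    "192.168.1.199 hostname62a.domain.com hostname62a",
    "192.168.1.200 hostname62b.domain.com hostname62b",
    "192.168.1.201 hostname62.domain.com hostname62",
    "192.168.2.144 hostname62a-r.domain.com hostname62a-r",
    "192.168.2.145 hostname62b-r.domain.com hostname62b-r",
);

my @names = qw(hostname62a hostname62b);
my $b_re  = boundary_re(@names);
my $f_re  = field_re(@names);

print "\\b matches:\n";
print "  $_\n" for grep { $_ =~ $b_re } @lines;    # includes the -r lines
print "field matches:\n";
print "  $_\n" for grep { $_ =~ $f_re } @lines;    # only the two wanted lines
```

The same `field_re` also separates "hostname-node1" from "hostname-node1mgt", since the character after "hostname-node1" in the longer name is "m", not whitespace.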

Re: Perl regex and word boundaries
by Kenosis (Priest) on Feb 13, 2013 at 19:59 UTC

    Would the following be helpful?

    use strict;
    use warnings;

    my $ha1 = "hostname62a";
    my $ha2 = "hostname62b";

    while (<DATA>) {
        print if /\s(?:$ha1|$ha2)\./;
    }

    __DATA__
    192.168.1.199 hostname62a.domain.com hostname62a
    192.168.1.200 hostname62b.domain.com hostname62b
    192.168.1.201 hostname62.domain.com hostname62
    192.168.2.144 hostname62a-r.domain.com hostname62a-r
    192.168.2.145 hostname62b-r.domain.com hostname62b-r

    Output:

    192.168.1.199 hostname62a.domain.com hostname62a
    192.168.1.200 hostname62b.domain.com hostname62b
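Kenosis's pattern requires whitespace before the name and a literal "." right after it, so it effectively matches the name as the first label of the FQDN. Here is a quick check of the same pattern against the hostname-node1 layout from the question. Wrapping the names in \Q...\E is an addition here (not in the reply above), so that any regex metacharacters in a customer's hostname are matched literally:

```perl
use strict;
use warnings;

my $ha1 = "hostname-node1";
my $ha2 = "hostname-node2";

my @lines = (
    "192.168.1.2 hostname-node1.domain.com hostname-node1",
    "192.168.1.3 hostname-node2.domain.com hostname-node2",
    "192.168.1.4 hostname-node1mgt.domain.com hostname-node1mgt",
    "192.168.1.5 hostname-node2mgt.domain.com hostname-node2mgt",
);

# Whitespace, then one of the names, then a literal "." (start of the domain).
# "hostname-node1mgt." fails because "m", not ".", follows "hostname-node1".
my @matched = grep { /\s(?:\Q$ha1\E|\Q$ha2\E)\./ } @lines;
print "$_\n" for @matched;
```

Because the pattern needs the FQDN column, a line that lists only a short name would not be matched; that is fine as long as every entry has the FQDN column, as in the layouts shown in the question.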
Re: Perl regex and word boundaries
by 7stud (Deacon) on Feb 14, 2013 at 00:46 UTC

    So interesting problem I've run into. I have a script for pulling hostnames in a two-node cluster. I ghettoed a bit using a bash egrep expression to determine hostnames have been added to /etc/hosts on both nodes. So I may have an /etc/hosts...

    Thanks for the irrelevant back story. All you need to post is: "I have this text ..., and I want to match these lines .... This is what I tried ... How make worky?"

Node Type: perlquestion [id://1018613]
Approved by ww