Perl regex and word boundaries
by MeatLips (Novice)
on Feb 13, 2013 at 19:46 UTC
MeatLips has asked for the
wisdom of the Perl Monks concerning the following question:
So, here's an interesting problem I've run into. I have a script for pulling hostnames in a two-node cluster. I hacked it together a bit, using a bash egrep expression to check that the hostnames have been added to /etc/hosts on both nodes.
So I may have an /etc/hosts that looks like this:
Here's a snippet of my code:
I used word boundaries (\b) to make sure I only find what I'm looking for. Normally, this would return something like below:
This is what I want. Just the two hostnames.
The hostnames themselves follow whatever standard the customer sets, so we have little control over what they name their stuff. But usually the above code works well for just pulling out the hostnames. We do control how they format the names in /etc/hosts by providing a script interface, so how stuff is laid out in /etc/hosts is pretty constant.
Now here's the problem: (\b) boundaries work pretty well most of the time. But we have one customer that named his stuff like this:
So the above egrep statement finds these:
This is because "-" isn't a word character, so "\b" matches right between the end of the name and the "-" that follows it. I have no idea how to craft an expression that matches just the hostnames I want. I also have customers that name their stuff like below:
Which will return:
So I can't split on the "-". Ugh, even now my head hurts thinking about this issue. Does anyone have an idea for some nifty Perl regex that could solve my problem?
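To make the problem concrete, here is a minimal sketch of the behaviour described above, together with one possible way around it. The hostnames, addresses, and the helper names `find_host` and `find_host_wordb` are made up for illustration; they are not the real entries or code from my script. The idea is to replace "\b" with lookarounds that require whitespace (or the start/end of the string) on both sides, which assumes each name in /etc/hosts is a whitespace-separated token.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical /etc/hosts content (illustrative names only).
my $hosts = <<'END';
10.0.0.1   node1
10.0.0.2   node2
10.0.0.11  node1-bak
10.0.0.12  node2-bak
END

# Whole-token match: the name must be preceded and followed by
# whitespace or the string boundary, so "node1" does not match
# inside "node1-bak". \Q...\E quotes any "-" or "." in the name.
sub find_host {
    my ($name, $text) = @_;
    return $text =~ /(?<!\S)\Q$name\E(?!\S)/g;
}

# \b version for comparison: "-" is a non-word character, so \b
# matches between "node1" and "-bak" and the longer name is hit too.
sub find_host_wordb {
    my ($name, $text) = @_;
    return $text =~ /\b\Q$name\E\b/g;
}

my @strict = find_host('node1', $hosts);
my @loose  = find_host_wordb('node1', $hosts);

printf "whitespace-delimited: %d match(es)\n", scalar @strict;   # 1
printf "\\b-delimited: %d match(es)\n",        scalar @loose;    # 2
```

Because the lookarounds test for whitespace rather than word characters, names that legitimately contain a "-" in the middle should still match as a whole. If the file can contain trailing comments or other punctuation glued to a name, the character class in the lookarounds would need adjusting.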