PerlMonks  

Slurping file using regular expressions for array or variables

by firefli (Initiate)
on Apr 03, 2014 at 21:03 UTC (#1081007)
firefli has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a text file with information containing computer ip addresses, mac addresses, os details etc that looks like this for each node:
Nmap scan report for somenode.somedomain.com (192.x.x.x)
Host is up (0.032s latency).
Not shown: 974 closed ports
PORT STATE SERVICE
53/tcp open domain
...
49160/tcp open unknown
MAC Address: 24:34:E4:57:aB:BC (some company)
Device type: general purpose
Running: Microsoft Windows 7|2008
OS CPE: cpe:/o:microsoft:windows_7::-
cpe:/o:microsoft:windows_7::sp1
cpe:/o:microsoft:windows_server_2008::sp1
cpe:/o:microsoft:windows_8
OS details: Microsoft Windows 7 SP0 - SP1, Windows Server 2008 SP1, or Windows 8
Network Distance: 1 hop

I want to get the information reorganised so it looks like this:

$IPADDRESS, $MACADDRESS, $OSCPE, $OSDETAILS
$IPADDRESS, $MACADDRESS, $OSCPE, $OSDETAILS
etc
I slurped a file and tried to get this done various ways - one of which is

<code>($IPADDR,$MACADDR,$OS,$OSDETL) = $TEXT =~ /(192\.168\.1\.\d+).*?(\d2:\d2:\d2:\d2:\d2:\d2)/g;<code>

but it doesn't work. It will work if I only use

<code> @matches = ( $TEXT =~ /(192\.168\.1\.\d+)/g);
foreach my $val (@matches) {
print "$val\n";
<code>

But if I try to match another string in the same regular expression it fails. For example, this doesn't work:

<code> @matches = ( $TEXT =~ /(192\.168\.1\.\d+).*?(\d2:\d2:\d2:\d2:\d2:\d2)/g);
<code>


Thanks

Re: Slurping file using regular expressions for array or variables
by kennethk (Monsignor) on Apr 03, 2014 at 22:03 UTC
    You failed to close your code tags. This has caused some rather dramatic mangling. Please correct this, and wrap your input in code tags as well, since white space gets mangled in HTML. See How do I post a question effectively?.

    Your MAC address code should read

    /([\dA-Fa-f]{2}:[\dA-Fa-f]{2}:[\dA-Fa-f]{2}:[\dA-Fa-f]{2}:[\dA-Fa-f]{2}:[\dA-Fa-f]{2})/g);
    or more concisely,
    /([\dA-F]{2}(?::[\dA-F]{2}){5})/ig);
    The code you posted (I think) was looking for literal '2's instead of a count ({2}). See Quantifiers in perlre.
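
To make the difference concrete, here is a minimal sketch. The sample line is copied from the question; the variable names `$wrong` and `$mac` are only illustrative.

```perl
use strict;
use warnings;

# Sample line from the question's nmap output.
my $line = 'MAC Address: 24:34:E4:57:aB:BC (some company)';

# \d2 means "one digit followed by a literal 2", so this finds nothing here.
my ($wrong) = $line =~ /(\d2:\d2:\d2:\d2:\d2:\d2)/;

# {2} is a quantifier: exactly two hex digits, then five more ":xx" groups.
my ($mac) = $line =~ /([\dA-F]{2}(?::[\dA-F]{2}){5})/i;

print defined $wrong ? "literal-2 pattern matched: $wrong\n"
                     : "literal-2 pattern did not match\n";
print "quantifier pattern matched: $mac\n";
```

The first pattern would only match something like 12:32:42:52:62:72, which is why it silently returns nothing on real MAC addresses.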

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Slurping file using regular expressions for array or variables
by ww (Bishop) on Apr 03, 2014 at 22:44 UTC
    Even viewing the xml version of your post (ie, the way you posted it), the code makes no sense... but first, let's check: Is this what you intended:
    ($IPADDR,$MACADDR,$OS,$OSDETL) = $TEXT =~ /(192\.168\.1\.[\d]+).*?([\d]2:[\d]2:[\d]2:[\d]2:[\d]2:[\d]2)/g;

    but it doesn't work. It will work if I only use

    @matches = ( $TEXT =~ /(192\.168\.1\.[\d]+)/g); foreach my $val (@matches) { print "$val\n";

    But if I try to match another string in the same regular expression it fails. For example, this doesn't work:

    @matches = ( $TEXT =~ /(192\.168\.1\.[\d]+).*?([\d]2:[\d]2:[\d]2:[\d]2:[\d]2:[\d]2)/g);

    If that's what you tried to write, amended to close the code tags as kennethk suggested, why are you using character classes when \d would work and \d{2} might do what you want?

    And not just BTW: where you specify \d+, the regex engine will match as many digits as it can, until some other character intervenes or it hits the end of the string... and .*? merely limits what the dot (anything) matches to as few characters as possible before taking up with your [\d]2:s again.
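
A small sketch of that behaviour, using an invented two-line string shaped like the nmap output. It also illustrates a related point not spelled out in the reply: without the /s modifier the dot does not match a newline, so .*? cannot reach a MAC address on a later line.

```perl
use strict;
use warnings;

# Two lines shaped like the nmap output in the question (values invented).
my $text = "report for node (192.168.1.17)\nMAC Address: 24:34:E4:57:AB:BC\n";

# \d+ is greedy: it takes every digit up to the first non-digit.
my ($ip) = $text =~ /(192\.168\.1\.\d+)/;

# Without /s the dot cannot cross the newline, so .*? never reaches the MAC.
my @no_s = $text =~ /(192\.168\.1\.\d+).*?((?:[\dA-F]{2}:){5}[\dA-F]{2})/i;

# With /s the dot also matches "\n"; .*? then skips as little as possible.
my @with_s = $text =~ /(192\.168\.1\.\d+).*?((?:[\dA-F]{2}:){5}[\dA-F]{2})/is;

print "ip: $ip\n";
print "without /s: ", scalar(@no_s), " captures\n";
print "with /s: @with_s\n";
```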


    Questions containing the words "doesn't work" (or their moral equivalent) will usually get a downvote from me unless accompanied by:
    1. code
    2. verbatim error and/or warning messages
    3. a coherent explanation of what "doesn't work" actually means.

      Thank you for your replies. Sorry about the munging of the previous post.

      I'm trying to populate the two arrays @IPADR and @MACADR by binding a slurped file to a regular expression with two captures, shown below. In between the two captures I put "..." since I wasn't sure what to put there. I had "*." on the assumption that the second capture would limit the dot's greediness.

      If I remove one of the captures and just use a single related array, I can print the values. As soon as I put the second capture into the regular expression, I get no output and no error message (running perl -w script.pl). I thought two captures could be used to populate separate arrays, but there is something I am unaware of. I'm hoping it's not something you told me previously.

      The eginv.txt file that the regular expression is bound to has around 100 nodes in it, each with its own rows for MAC address, IP address etc. I'd like to get the values into their respective arrays so I can print them with each node described on one line, like this: "NODENAME, IPADDRESS, MACADDRESS, SOMETHINGELSE...." Can I get that with some modifications to what I have so far? Thanks

      This is the source of the information the script is running against, showing one node of about 100.

      Nmap scan report for somenode.somedomain.com (192.x.x.x)
      Host is up (0.032s latency).
      Not shown: 974 closed ports
      PORT STATE SERVICE
      53/tcp open domain
      ...
      49160/tcp open unknown
      MAC Address: 24:34:E4:57:aB:BC (some company)
      Device type: general purpose
      Running: Microsoft Windows 7|2008
      OS CPE: cpe:/o:microsoft:windows_7::-
      cpe:/o:microsoft:windows_7::sp1
      cpe:/o:microsoft:windows_server_2008::sp1
      cpe:/o:microsoft:windows_8
      OS details: Microsoft Windows 7 SP0 - SP1, Windows Server 2008 SP1, or Windows 8
      Network Distance: 1 hop

      my $file = 'eginv.txt';
      {
          local( $/ ) ;
          open( my $fh, $file ) or die "Oops file dead\n";
          my $TEXT = <$fh>;
          my $IPADDR=();
          my $MACADDR=();
          my $RUN=();
          my $OSDETL=();
          my $HOP=();
          my @MACADR=();
          my @IPADR=();
          (@IPADR,@MACADR) = $TEXT =~ /(192\.168\.1\.[\d]+)...([A-Fa-f0-9]{2}:[A-Fa-f0-9]{2}:[A-Fa-f0-9]{2}:[A-Fa-f0-9]{2}:[A-Fa-f0-9]{2}:[A-Fa-f0-9]{2})/gs;
          foreach my $ial (@IPADR) {
              print "$ial\n";
          }
          # foreach my $mal (@MACADR) {
          #     print "$mal\n";
          # }
      }
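
A note on the assignment `(@IPADR,@MACADR) = ...`: in a Perl list assignment the first array on the left absorbs every value, so the second array stays empty even when the regex matches. One way to fill two arrays is to loop over the matches with /g in scalar context. The sketch below uses an invented two-node sample and hypothetical names (`@ips`, `@macs`, `$re`); it is not the poster's script.

```perl
use strict;
use warnings;

# Invented two-node sample in the shape of the question's data.
my $text = <<'END';
Nmap scan report for a.example (192.168.1.10)
MAC Address: 24:34:E4:57:AB:BC (some company)
Nmap scan report for b.example (192.168.1.11)
MAC Address: 00:1A:2B:3C:4D:5E (other company)
END

my $re = qr/(192\.168\.1\.\d+).*?((?:[\dA-F]{2}:){5}[\dA-F]{2})/is;

# List assignment: the first array swallows every captured value.
my (@all, @none);
(@all, @none) = $text =~ /$re/g;
print scalar(@all), " values in first array, ", scalar(@none), " in second\n";

# Scalar-context /g in a while loop yields one match per iteration.
my (@ips, @macs);
while ($text =~ /$re/g) {
    push @ips,  $1;
    push @macs, $2;
}
print "$ips[$_], $macs[$_]\n" for 0 .. $#ips;
```
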
        As a wise man once said, 'you can't just make s**t up, and expect the computer to understand.'

        Your three dots between the captures mean the second capture will fail unless there are exactly 3 anythings (characters) between the first and the second item you're trying to capture.
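
A short sketch of that point, with two invented strings: one where exactly three characters separate the items, and one where several lines do. The names `$near`, `$far` and `$mac_re` are illustrative only.

```perl
use strict;
use warnings;

my $mac_re = qr/(?:[\dA-F]{2}:){5}[\dA-F]{2}/i;

# Exactly three characters separate the two items here...
my $near = "192.168.1.5 - 24:34:E4:57:AB:BC";
# ...but many characters (including newlines) separate them here.
my $far  = "192.168.1.5)\nHost is up\nMAC Address: 24:34:E4:57:AB:BC";

for my $s ($near, $far) {
    my $dots = $s =~ /(192\.168\.1\.\d+)...($mac_re)/s ? "match" : "no match";
    my $lazy = $s =~ /(192\.168\.1\.\d+).*?($mac_re)/s ? "match" : "no match";
    print "three dots: $dots; lazy .*?: $lazy\n";
}
```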

        Please, do as others have suggested: use perldoc to read the basic regex docs ( perlrequick, perlretut and perlre ) until you truly understand at least those elements you're trying to use.
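
For the one-line-per-node output asked for in the question, one possible approach (a sketch, not something proposed in the thread) is to split the slurped text into one chunk per node and pull each field out with its own small regex. The sample data is invented, the addresses are assumed to be unmasked dotted quads, and the CPE entries are assumed to appear as whitespace-separated `cpe:/...` tokens.

```perl
use strict;
use warnings;

# Invented sample: two records shaped like the nmap output in the question.
my $text = <<'END';
Nmap scan report for a.example (192.168.1.10)
Host is up (0.032s latency).
MAC Address: 24:34:E4:57:AB:BC (some company)
OS CPE: cpe:/o:microsoft:windows_7::sp1
cpe:/o:microsoft:windows_8
OS details: Microsoft Windows 7 SP0 - SP1 or Windows 8
Network Distance: 1 hop
Nmap scan report for b.example (192.168.1.11)
Host is up (0.010s latency).
MAC Address: 00:1A:2B:3C:4D:5E (other company)
END

# Split before each "Nmap scan report" so every chunk is one node.
my @records = grep { /\S/ } split /^(?=Nmap scan report for )/m, $text;

my @rows;
for my $rec (@records) {
    # One small regex per field; a missing field becomes an empty string.
    my ($ip)      = $rec =~ /^Nmap scan report for .*?\((\d+(?:\.\d+){3})\)/m;
    my ($mac)     = $rec =~ /^MAC Address:\s*((?:[\dA-F]{2}:){5}[\dA-F]{2})/mi;
    my ($details) = $rec =~ /^OS details:\s*(.+)$/m;
    my @cpe       = $rec =~ m{(cpe:/\S+)}g;

    push @rows, join ', ', map { defined($_) ? $_ : '' }
        $ip, $mac, join(' ', @cpe), $details;
}
print "$_\n" for @rows;
```

Working record by record keeps a field missing from one node from being paired with the next node's data, which a single regex over the whole file can do. Real "OS details" lines contain commas, so a different separator or quoting may be needed if the output is meant to be parsed as CSV.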


