imrags has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
My input is something like this:
    2345 IP NAME.com STATUS IPADDRESS1
    2345/1243 IP Name-interface.com STATUS IPADDRESS2
    2345/3213 IP NAME-interfce2.com STATUS IPADDRESS1
    2345/1212 IP Name-interface3.com STATUS IPADDRESS3
    4321 IP CNAME.com STATUS IPADDRESS_1
    4321/1643 IP CName-interface.com STATUS IPADDRESS_1
    4321/3673 IP CNAME-interfce2.com STATUS IPADDRESS_2
Here, the IP addresses of 2345 and 2345/3213 are the same, and
the IP addresses of 4321 and 4321/1643 are the same.
I looked at some multi-line matching, but I'm not able to figure out
how it should be done. I was thinking of using a hash
with the IP address as the key; however, since the number
of entries is very high, I don't think that is feasible, as
it will consume a lot of memory (that's what I think).
What should I use for this kind of input? Tie, hashes, or normal arrays?
Update:
The output I was looking for was:
    2345/3213 IP Name-interface2.com STATUS IPADDRESS2
    4321/1643 IP CName-interface.com STATUS IPADDRESS_1
I'm sorry for the delay in replying (outage et al.).
The output should contain only those lines that have a duplicate IP address,
and of those only the duplicate (second) one.
Raghu

Replies are listed 'Best First'.
Re: Multi line matching
by targetsmart (Curate) on May 15, 2009 at 10:42 UTC
    Good input, but what is the desired output? That will give us some clarity on what you are trying to match and store. What code have you tried?

    Vivek
    -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
Re: Multi line matching
by McDarren (Abbot) on May 15, 2009 at 12:01 UTC

    So you have:

        2345 IP NAME.com STATUS IPADDRESS1
        2345/3213 IP NAME-interfce2.com STATUS IPADDRESS1
    Which you say have the same "ip address".

    Okay, so what do you want to do with them?
    Do you want to keep one line and throw the other line away?
    If so, which one do you want to keep, and why? What are the rules to this game?

    If you are indeed wanting to remove "duplicates", then a hash is almost certainly what you need.
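As a rough illustration of the hash idea applied to the output requested in the original post's update (print only the later line of each pair that shares an IP address), here is a minimal sketch. It assumes the IP address is always the last whitespace-separated field and that duplicates are judged across the whole input; `later_duplicates` and `@input` are names made up for this example, not something from the thread. Only the IP strings are kept as hash keys, not the full lines, which keeps memory use modest.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the records whose IP address has already appeared on an
# earlier record. Assumption: the IP is the last whitespace-separated field.
sub later_duplicates {
    my @records = @_;
    my %seen;    # IP address => number of times seen so far
    my @dups;
    for my $record (@records) {
        my @fields = split ' ', $record;
        next unless @fields;    # skip blank lines
        my $ip = $fields[-1];
        # Post-increment is false the first time an IP shows up, true afterwards
        push @dups, $record if $seen{$ip}++;
    }
    return @dups;
}

# Sample records copied from the original post
my @input = (
    '2345 IP NAME.com STATUS IPADDRESS1',
    '2345/1243 IP Name-interface.com STATUS IPADDRESS2',
    '2345/3213 IP NAME-interfce2.com STATUS IPADDRESS1',
    '2345/1212 IP Name-interface3.com STATUS IPADDRESS3',
    '4321 IP CNAME.com STATUS IPADDRESS_1',
    '4321/1643 IP CName-interface.com STATUS IPADDRESS_1',
    '4321/3673 IP CNAME-interfce2.com STATUS IPADDRESS_2',
);

print "$_\n" for later_duplicates(@input);
```

With the sample data this prints the 2345/3213 and 4321/1643 lines. When reading from a large file, the same `%seen` test can be applied line by line inside a `while (<$fh>)` loop so that the records themselves are never held in memory.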

    Update: I'll take a punt and guess that you want to remove those lines that describe hosts, but not interfaces. If that's the case, then it could be as simple as:

        #!/usr/bin/perl
        use strict;
        use warnings;

        while (<DATA>) {
            # A million ways to do this, but it depends what the rest of your data looks like
            next if /^\d{4}\s/;
            print;
        }
        __DATA__
        2345 IP NAME.com STATUS IPADDRESS1
        2345/1243 IP Name-interface.com STATUS IPADDRESS2
        2345/3213 IP NAME-interfce2.com STATUS IPADDRESS1
        2345/1212 IP Name-interface3.com STATUS IPADDRESS3
        4321 IP CNAME.com STATUS IPADDRESS_1
        4321/1643 IP CName-interface.com STATUS IPADDRESS_1
        4321/3673 IP CNAME-interfce2.com STATUS IPADDRESS_2

    Which would give you:
        2345/1243 IP Name-interface.com STATUS IPADDRESS2
        2345/3213 IP NAME-interfce2.com STATUS IPADDRESS1
        2345/1212 IP Name-interface3.com STATUS IPADDRESS3
        4321/1643 IP CName-interface.com STATUS IPADDRESS_1
        4321/3673 IP CNAME-interfce2.com STATUS IPADDRESS_2

    Is that what you are looking for?

    Cheers,
    Darren

Re: Multi line matching
by roboticus (Chancellor) on May 15, 2009 at 12:30 UTC
    raghu:

    It may consume a good bit of memory, but why worry about it until you *know* that it's a problem? If you know it is, then put the data in a database and let it do the heavy lifting for you. And just because I'm bored:

        #!/usr/bin/perl -w
        use strict;
        use warnings;

        my %IPs;
        while (<DATA>) {
            if (m#^(\d+)([\s\d/]+)IP\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)#) {
                push @{$IPs{$1}}, sprintf("%-10s %-24s %s", $1 . $2, $3, $4);
            }
        }

        print "KEY   \tIP         NAME                     STATUS\n"
            . "------\t---------- ------------------------ --------------\n";
        for my $IP (sort keys %IPs) {
            print sprintf("%-6s\t", $IP), join("\n\t", sort @{$IPs{$IP}}), "\n";
        }
        __DATA__
        2345 IP NAME.com online IPADDRESS1
        2345/1243 IP Name-interface.com inactive IPADDRESS2
        2345/3213 IP NAME-interfce2.com online IPADDRESS1
        2345/1212 IP Name-interface3.com online IPADDRESS3
        4321 IP CNAME.com dead IPADDRESS_1
        4321/1643 IP CName-interface.com online IPADDRESS_1
        4321/3673 IP CNAME-interfce2.com online IPADDRESS_2

    gives us

        KEY     IP         NAME                     STATUS
        ------  ---------- ------------------------ --------------
        2345    2345       NAME.com                 online
                2345/1212  Name-interface3.com      online
                2345/1243  Name-interface.com       inactive
                2345/3213  NAME-interfce2.com       online
        4321    4321       CNAME.com                dead
                4321/1643  CName-interface.com      online
                4321/3673  CNAME-interfce2.com      online

    ...roboticus
Re: Multi line matching
by dwm042 (Priest) on May 15, 2009 at 18:16 UTC
    A hash seems ideal here. But since the OP is worried about space, I was going to suggest a tied hash of some kind. After looking at this thread, however, I'm more inclined to agree with roboticus and suggest that, if you really have a size issue with your data, you try putting it into a database (MySQL, Postgres, etc.).
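
For completeness, the tied-hash idea mentioned above might look something like the following sketch. It uses the core `SDBM_File` module so that the "seen" hash lives in a file on disk rather than in memory. The function name `later_duplicates_with`, the file name `seen_ips`, and the assumption that the IP address is the last whitespace-separated field are all choices made for this example, not anything stated in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

# Report records whose IP (assumed to be the last field) was seen before.
# The caller supplies the "seen" hash, so it can be an ordinary hash
# or one tied to a file on disk.
sub later_duplicates_with {
    my ($seen, @records) = @_;
    my @dups;
    for my $record (@records) {
        my @fields = split ' ', $record;
        next unless @fields;    # skip blank lines
        push @dups, $record if $seen->{ $fields[-1] }++;
    }
    return @dups;
}

# Keep the DBM files in a temporary directory that is removed on exit
my $dir = tempdir(CLEANUP => 1);
tie(my %on_disk, 'SDBM_File', "$dir/seen_ips", O_RDWR | O_CREAT, 0666)
    or die "Cannot tie SDBM file: $!";

# Sample records copied from the original post
my @sample = (
    '2345 IP NAME.com STATUS IPADDRESS1',
    '2345/1243 IP Name-interface.com STATUS IPADDRESS2',
    '2345/3213 IP NAME-interfce2.com STATUS IPADDRESS1',
    '2345/1212 IP Name-interface3.com STATUS IPADDRESS3',
    '4321 IP CNAME.com STATUS IPADDRESS_1',
    '4321/1643 IP CName-interface.com STATUS IPADDRESS_1',
    '4321/3673 IP CNAME-interfce2.com STATUS IPADDRESS_2',
);

my @tied_dups = later_duplicates_with(\%on_disk, @sample);
print "$_\n" for @tied_dups;

untie %on_disk;
```

The duplicate test is the same as with a plain hash; only the storage behind the hash changes. SDBM limits how large a single key/value pair can be, which is not a concern for short IP strings, and every lookup now touches the disk, so this is likely to be slower than an in-memory hash. As the replies suggest, it is probably worth measuring the plain hash first.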