PerlMonks  

Extract Data Between Lines

by namelkcip (Initiate)
on Apr 04, 2014 at 02:26 UTC (#1081059, perlquestion)
namelkcip has asked for the wisdom of the Perl Monks concerning the following question:

I have sample data such as:
    object network Microsoft.Lync.Host.3
     host 138.108.25.111
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.4
     host 138.108.25.112
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.5
     host 138.108.25.113
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.6
I need to extract each object into its own hash, which I'll then roll up into an AoH (array of hashes).

I have tried changing the input record separator $/, but I think it can only be a static string, and each 'object network' line ends with a unique name. I couldn't find any way to make the delimiter regex-based (which I think would solve my problem, unless the delimiter is removed/consumed when parsing records).

I next thought about reading the whole file into an array and processing records in batches of lines. However, some objects have a description, making a batch of 3 lines, whereas others don't, making a batch of 2. I suppose I could process the array to check whether each record has a description line, but I feel like I am way overthinking this.

I also tried to extract the text between lines, but I don't have unique START/END markers like the examples that use the range operator, /START/ .. /END/. I tried anyway, using a regex match for the text between occurrences of /object network/, but that match was exclusive/destructive and I began to lose records. I can post some code if necessary.

I'm sure one of the Monks here has a solution! Thanks in advance.
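
On the question of a regex-based delimiter that is not consumed: $/ itself is always a literal string, but split accepts a regex, and a zero-width lookahead leaves the matched text in place. The following is only a minimal sketch under the assumption that the whole input fits in memory; the variable names $text and @records are illustrative and not taken from the thread.

```perl
use strict;
use warnings;

# The sample data from the question, standing in for the slurped file.
my $text = <<'END';
object network Microsoft.Lync.Host.3
 host 138.108.25.111
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.4
 host 138.108.25.112
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.5
 host 138.108.25.113
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.6
END

# (?=...) is a zero-width lookahead: split cuts just before every line
# that starts with "object network " without removing that text.
my @records = split /^(?=object network )/m, $text;

printf "%d records\n", scalar @records;
print "[$_]" for @records;
```

Because the lookahead matches without consuming anything, every element of @records still begins with its own 'object network' line.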

Re: Extract Data Between Lines
by Athanasius (Monsignor) on Apr 04, 2014 at 02:45 UTC

    Hello namelkcip, and welcome to the Monastery!

    Here is one approach:

        #! perl
        use strict;
        use warnings;

        my $record;

        while (<DATA>)
        {
            if (/^object network/)      # new record
            {
                process_record($record) if $record;
                $record = $_;
            }
            else
            {
                $record .= $_;
            }
        }

        process_record($record) if $record;

        sub process_record
        {
            my ($record) = @_;
            print $record, '-' x 36, "\n";
        }

        __DATA__
        object network Microsoft.Lync.Host.3
         host 138.108.25.111
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.4
         host 138.108.25.112
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.5
         host 138.108.25.113
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.6

    Output:

        12:41 >perl 891_SoPW.pl
        object network Microsoft.Lync.Host.3
         host 138.108.25.111
         description Help Desk Ticket #476739
        ------------------------------------
        object network Microsoft.Lync.Host.4
         host 138.108.25.112
         description Help Desk Ticket #476739
        ------------------------------------
        object network Microsoft.Lync.Host.5
         host 138.108.25.113
         description Help Desk Ticket #476739
        ------------------------------------
        object network Microsoft.Lync.Host.6
        ------------------------------------

        12:41 >

    This handles records with a variable number of lines, while still reading the data only line by line.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,
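
The line-by-line approach above prints each record; to get the array of hashes the question asks for, the record handler could return a hash instead. This is a rough sketch only: record_to_hash is a hypothetical helper, the "first word is the key" rule is an assumption about the data, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one multi-line record into a hash reference.
# Assumes the first word of each line is the key and the rest is the value.
sub record_to_hash {
    my ($record) = @_;
    my %object;
    for my $line (split /\n/, $record) {
        my ($key, $value) = $line =~ /^\s*(\w+)\s+(.*)/ or next;
        $object{$key} = $value;
    }
    return \%object;
}

# The sample data from the question, read through an in-memory filehandle.
my $data = <<'END';
object network Microsoft.Lync.Host.3
 host 138.108.25.111
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.4
 host 138.108.25.112
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.5
 host 138.108.25.113
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.6
END
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my (@objects, $record);
while (my $line = <$fh>) {
    if ($line =~ /^object network/) {    # a new record starts here
        push @objects, record_to_hash($record) if defined $record;
        $record = $line;
    }
    else {
        $record .= $line;
    }
}
push @objects, record_to_hash($record) if defined $record;    # last record
close $fh;

print scalar(@objects), " objects parsed\n";
```

Records without a description line simply end up with fewer keys, so no special case is needed for the 2-line versus 3-line batches.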

Re: Extract Data Between Lines
by LanX (Canon) on Apr 04, 2014 at 02:52 UTC
    TIMTOWTDI :)
        use warnings;
        use strict;
        use Data::Dump;

        my $h;
        my @a;

        while(<DATA>) {
            if (/^object (.*)/) {
                if ($h) {
                    push @a,$h;
                    undef $h;
                }
                $h->{object}=$1;
            }
            $h->{host}        = $1 if /^\s+host (.*)/;
            $h->{description} = $1 if /^\s+description (.*)/;
        }
        push @a,$h if $h;

        dd \@a;

        __DATA__
        object network Microsoft.Lync.Host.3
         host 138.108.25.111
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.4
         host 138.108.25.112
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.5
         host 138.108.25.113
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.6

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: Extract Data Between Lines
by LanX (Canon) on Apr 04, 2014 at 03:19 UTC
    maybe easier to read: :)

        use Data::Dump;

        my $h;
        my @a;

        while(<DATA>) {
            if (/^object (.*)/) {
                $h = {};
                push @a,$h;
                $h->{object} = $1;
            }
            $h->{$1} = $2 if /^\s+(\w+) (.*)/;
        }

        dd \@a;

        __DATA__
        object network Microsoft.Lync.Host.3
         host 138.108.25.111
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.4
         host 138.108.25.112
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.5
         host 138.108.25.113
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.6

        [
          {
            description => "Help Desk Ticket #476739",
            host => "138.108.25.111",
            object => "network Microsoft.Lync.Host.3",
          },
          {
            description => "Help Desk Ticket #476739",
            host => "138.108.25.112",
            object => "network Microsoft.Lync.Host.4",
          },
          {
            description => "Help Desk Ticket #476739",
            host => "138.108.25.113",
            object => "network Microsoft.Lync.Host.5",
          },
          { object => "network Microsoft.Lync.Host.6" },
        ]

    Cheers Rolf

    ( addicted to the Perl Programming Language)
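
Once the array of hashes exists, the missing keys on some records are the main thing to watch when reading it back. The sketch below assumes the structure shown in the dumped output above; @objects mirrors @a from the post, and summarize is a hypothetical helper introduced only for illustration.

```perl
use strict;
use warnings;

# The array of hashes, copied from the dumped output above.
my @objects = (
    {
        object      => "network Microsoft.Lync.Host.3",
        host        => "138.108.25.111",
        description => "Help Desk Ticket #476739",
    },
    {
        object      => "network Microsoft.Lync.Host.4",
        host        => "138.108.25.112",
        description => "Help Desk Ticket #476739",
    },
    {
        object      => "network Microsoft.Lync.Host.5",
        host        => "138.108.25.113",
        description => "Help Desk Ticket #476739",
    },
    { object => "network Microsoft.Lync.Host.6" },
);

# Hypothetical helper: one summary line per object, with placeholders
# for keys that a given record did not contain.
sub summarize {
    my ($obj) = @_;
    my $host = exists $obj->{host}        ? $obj->{host}        : '(no host)';
    my $desc = exists $obj->{description} ? $obj->{description} : '(no description)';
    return "$obj->{object}: $host, $desc";
}

print summarize($_), "\n" for @objects;
```

Checking with exists avoids "uninitialized value" warnings for records such as Host.6 that carry only an object line.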

Re: Extract Data Between Lines
by wazat (Beadle) on Apr 04, 2014 at 03:38 UTC

    While the other suggestions are superior, you don't need a regex for $/; a fixed string is enough here:

        use strict;
        use warnings;

        $/ = "\nobject ";

        while (<DATA>) {
            chomp;
            print "'$_'\n"
        }

        __DATA__
        object network Microsoft.Lync.Host.3
         host 138.108.25.111
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.4
         host 138.108.25.112
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.5
         host 138.108.25.113
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.6

    output

        'object network Microsoft.Lync.Host.3
         host 138.108.25.111
         description Help Desk Ticket #476739'
        'network Microsoft.Lync.Host.4
         host 138.108.25.112
         description Help Desk Ticket #476739'
        'network Microsoft.Lync.Host.5
         host 138.108.25.113
         description Help Desk Ticket #476739'
        'network Microsoft.Lync.Host.6
        '

    The first record needs to be massaged: its leading "object" isn't preceded by a newline, so it is not treated as a separator and stays in the record.
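
One way to do the massaging mentioned above is sketched here. It is an assumption about what "massaged" should mean, namely stripping the leftover "object " from the first record and the trailing newline from the last; the in-memory filehandle and the @records array are illustrative only.

```perl
use strict;
use warnings;

# The sample data from the question, read through an in-memory filehandle.
my $data = <<'END';
object network Microsoft.Lync.Host.3
 host 138.108.25.111
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.4
 host 138.108.25.112
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.5
 host 138.108.25.113
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.6
END
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "\nobject ";          # fixed-string separator, as in the post
    while (my $rec = <$fh>) {
        chomp $rec;                  # removes a trailing "\nobject ", if present
        $rec =~ s/\Aobject //;       # only the first record still has the keyword
        $rec =~ s/\n\z//;            # the last record ends in a plain newline
        push @records, $rec;
    }
}
close $fh;

print "'$_'\n" for @records;
```

After these two substitutions every record starts with "network ..." and none carries a stray trailing newline, so they can all be parsed the same way.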

Re: Extract Data Between Lines
by kcott (Abbot) on Apr 04, 2014 at 06:17 UTC

    G'day namelkcip,

    Welcome to the monastery.

    Here's how you can use $/ to get each "object ..." as a record; specify which fields you want; and ignore fields (e.g. description) which don't exist in any given object.

    Note how $/ is localised within an anonymous block. This makes $/ = "\nobject" a temporary value which only affects the while (<DATA>) {...} loop; elsewhere in your script, $/ will have its default value. See "Temporary Values via local()" for more details.

        #!/usr/bin/env perl

        use strict;
        use warnings;

        my @objects;
        my @fields = qw{network host description};

        {
            local $/ = "\nobject";

            while (<DATA>) {
                my %object;

                for my $field (@fields) {
                    / $field \s ( .*? ) $ /mx and $object{$field} = $1;
                }

                push @objects, \%object if keys %object;
            }
        }

        use Data::Dump;
        dd \@objects;

        __DATA__
        object network Microsoft.Lync.Host.3
         host 138.108.25.111
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.4
         host 138.108.25.112
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.5
         host 138.108.25.113
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.6

    Output:

        [
          {
            description => "Help Desk Ticket #476739",
            host => "138.108.25.111",
            network => "Microsoft.Lync.Host.3",
          },
          {
            description => "Help Desk Ticket #476739",
            host => "138.108.25.112",
            network => "Microsoft.Lync.Host.4",
          },
          {
            description => "Help Desk Ticket #476739",
            host => "138.108.25.113",
            network => "Microsoft.Lync.Host.5",
          },
          { network => "Microsoft.Lync.Host.6" },
        ]

    -- Ken
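
The point about local can be checked directly. The following small sketch, with illustrative variable names that are not part of the post, records the value of $/ before, inside, and after the block, and counts the records read while the temporary separator is in effect.

```perl
use strict;
use warnings;

# The sample data from the question, read through an in-memory filehandle.
my $data = <<'END';
object network Microsoft.Lync.Host.3
 host 138.108.25.111
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.4
 host 138.108.25.112
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.5
 host 138.108.25.113
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.6
END

my $before = $/;                     # the default separator, a newline
my ($inside, $count) = ('', 0);

{
    local $/ = "\nobject";           # temporary value, visible only in this block
    $inside = $/;

    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (my $rec = <$fh>) {
        $count++;                    # one iteration per "object" record
    }
    close $fh;
}

my $after = $/;                      # restored automatically at block exit

print "records read: $count\n";
print "separator restored\n" if $after eq $before;
```

Without local, the assignment would stay in effect for every later read in the script, including reads inside modules that expect line-by-line input.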

Re: Extract Data Between Lines
by hdb (Parson) on Apr 04, 2014 at 12:59 UTC

    ...and here is my favorite way of doing it...

        use strict;
        use warnings;
        use Data::Dumper;

        my @data;

        while(<DATA>){
            push @data, { object => $1 } if /^object (.*)/;
            $data[-1]->{$1} = $2 if /^ (.*?) (.*)/;
        }

        print Dumper \@data;

        __DATA__
        object network Microsoft.Lync.Host.3
         host 138.108.25.111
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.4
         host 138.108.25.112
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.5
         host 138.108.25.113
         description Help Desk Ticket #476739
        object network Microsoft.Lync.Host.6

Re: Extract Data Between Lines
by namelkcip (Initiate) on Apr 04, 2014 at 14:31 UTC
    This is why I love Perl Monks! Thank you all! I knew it could be done more easily than what I was struggling with. Some of your responses also show me that I was on the right track, just missing a bit. Thanks again!
