PerlMonks  

Extract Data Between Lines

by namelkcip (Initiate)
on Apr 04, 2014 at 02:26 UTC
namelkcip has asked for the wisdom of the Perl Monks concerning the following question:

I have sample data such as:

    object network Microsoft.Lync.Host.3
     host 138.108.25.111
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.4
     host 138.108.25.112
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.5
     host 138.108.25.113
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.6

I need to extract each object into its own hash, which I'll then roll up into an AoH (array of hashes).

I have tried changing the input record separator $/, but I think it can only be a static string, and each 'object network' line ends with a unique name. I couldn't find any way to make the separator regex-based (which I think would solve my problem, unless the separator is removed/consumed when the records are read).

Next I thought about reading the whole file into an array and processing the records in batches of lines. However, some objects have a description, making a batch of 3 lines, while others don't, making a batch of 2. I suppose I could process the array and check whether or not each line is a description, but I feel like I am way over-thinking this.

I also tried to extract the text between lines, but I don't have unique START/END markers like the examples that use the range operator as /START/ .. /END/. I tried anyway, using a regex match that looked only for the text between occurrences of /object network/, but that was exclusive/destructive and I began to lose records. I can post some code if necessary.

I'm sure one of the Monks here has a solution! Thanks in advance.
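The regex-separator idea above cannot go through $/ itself: perlvar notes that the value of $/ is a string, not a regex. A common alternative, not taken from any of the replies below, is to slurp the whole file and split it with a zero-width lookahead, which finds each record boundary without consuming the "object" text. This is a minimal sketch assuming the layout of the sample data (one "object" line followed by indented "key value" lines); an inline heredoc stands in for the real file.

```perl
use strict;
use warnings;

# The whole file as one string. In real code this could come from
# slurping a filehandle, e.g. my $text = do { local $/; <$fh> };
my $text = <<'END';
object network Microsoft.Lync.Host.3
 host 138.108.25.111
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.4
 host 138.108.25.112
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.5
 host 138.108.25.113
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.6
END

# Split in front of every line that starts with "object ". (?=...) is a
# zero-width lookahead, so the delimiter text stays at the start of each
# chunk instead of being consumed by split.
my @records = split /^(?=object )/m, $text;

my @objects;    # the array of hashes (AoH)
for my $record (@records) {
    # A list-context //g match with two capture groups returns
    # (key, value, key, value, ...), which fills the hash directly.
    my %object = $record =~ /^\s*(\w+) (.*)$/mg;
    push @objects, \%object;
}

printf "%d objects, first host %s\n", scalar @objects, $objects[0]{host};
```

Because the split is zero-width, no record loses its "object" line, and records with or without a description are handled the same way.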

Re: Extract Data Between Lines
by Athanasius (Monsignor) on Apr 04, 2014 at 02:45 UTC

    Hello namelkcip, and welcome to the Monastery!

    Here is one approach:

    #! perl

    use strict;
    use warnings;

    my $record;

    while (<DATA>)
    {
        if (/^object network/)      # new record
        {
            process_record($record) if $record;
            $record = $_;
        }
        else
        {
            $record .= $_;
        }
    }

    process_record($record) if $record;

    sub process_record
    {
        my ($record) = @_;

        print $record, '-' x 36, "\n";
    }

    __DATA__
    object network Microsoft.Lync.Host.3
     host 138.108.25.111
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.4
     host 138.108.25.112
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.5
     host 138.108.25.113
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.6

    Output:

    12:41 >perl 891_SoPW.pl
    object network Microsoft.Lync.Host.3
     host 138.108.25.111
     description Help Desk Ticket #476739
    ------------------------------------
    object network Microsoft.Lync.Host.4
     host 138.108.25.112
     description Help Desk Ticket #476739
    ------------------------------------
    object network Microsoft.Lync.Host.5
     host 138.108.25.113
     description Help Desk Ticket #476739
    ------------------------------------
    object network Microsoft.Lync.Host.6
    ------------------------------------
    12:41 >

    This handles records with a varying number of lines, while still reading the data only one line at a time.
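    To get the array of hashes (AoH) the question asks for, process_record can parse each record instead of printing it. A minimal sketch, assuming every line after the "object" line is an indented "key value" pair; this hash-building process_record is a hypothetical variant, and an in-memory filehandle opened on a heredoc (an excerpt of the sample data) stands in for __DATA__ so the example is self-contained:

```perl
use strict;
use warnings;

my @objects;    # the array of hashes (AoH) the question asks for

# Hypothetical variant of process_record: parse the record into a hash
# and store a reference to it, instead of printing the record.
sub process_record {
    my ($record) = @_;
    my %object;
    for my $line (split /\n/, $record) {
        if ($line =~ /^object (.*)/) {
            $object{object} = $1;       # e.g. "network Microsoft.Lync.Host.5"
        }
        elsif ($line =~ /^\s+(\w+)\s+(.*)/) {
            $object{$1} = $2;           # "host", "description", ...
        }
    }
    push @objects, \%object;
}

# In-memory excerpt of the sample data, read through a filehandle opened
# on a scalar reference so the loop below keeps the same shape.
my $input = <<'END';
object network Microsoft.Lync.Host.5
 host 138.108.25.113
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.6
END
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my $record;
while (<$fh>) {
    if (/^object network/) {            # new record
        process_record($record) if $record;
        $record = $_;
    }
    else {
        $record .= $_;
    }
}
process_record($record) if $record;     # flush the last record
close $fh;

print scalar(@objects), " objects\n";
```

    The reading loop is unchanged from the reply above; only the per-record work differs, so the same skeleton can feed any record handler.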

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Extract Data Between Lines
by LanX (Canon) on Apr 04, 2014 at 02:52 UTC
    TIMTOWTDI :)
    use warnings;
    use strict;
    use Data::Dump;

    my $h;
    my @a;

    while(<DATA>) {
        if (/^object (.*)/) {
            if ($h) {
                push @a,$h;
                undef $h;
            }
            $h->{object}=$1;
        }
        $h->{host} = $1 if /^\s+host (.*)/;
        $h->{description} = $1 if /^\s+description (.*)/;
    }
    push @a,$h if $h;

    dd \@a;

    __DATA__
    object network Microsoft.Lync.Host.3
     host 138.108.25.111
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.4
     host 138.108.25.112
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.5
     host 138.108.25.113
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.6

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: Extract Data Between Lines
by LanX (Canon) on Apr 04, 2014 at 03:19 UTC
    maybe easier to read: :)

    use Data::Dump;

    my $h;
    my @a;

    while(<DATA>) {
        if (/^object (.*)/) {
            $h = {};
            push @a,$h;
            $h->{object} = $1;
        }
        $h->{$1} = $2 if /^\s+(\w+) (.*)/;
    }

    dd \@a;

    __DATA__
    object network Microsoft.Lync.Host.3
     host 138.108.25.111
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.4
     host 138.108.25.112
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.5
     host 138.108.25.113
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.6

    [
      {
        description => "Help Desk Ticket #476739",
        host => "138.108.25.111",
        object => "network Microsoft.Lync.Host.3",
      },
      {
        description => "Help Desk Ticket #476739",
        host => "138.108.25.112",
        object => "network Microsoft.Lync.Host.4",
      },
      {
        description => "Help Desk Ticket #476739",
        host => "138.108.25.113",
        object => "network Microsoft.Lync.Host.5",
      },
      { object => "network Microsoft.Lync.Host.6" },
    ]

    Cheers Rolf

Re: Extract Data Between Lines
by wazat (Beadle) on Apr 04, 2014 at 03:38 UTC

    While the other suggestions are superior, you don't need a regex in $/: a fixed multi-character string such as "\nobject " works as the record separator.

    use strict;
    use warnings;

    $/ = "\nobject ";

    while (<DATA>) {
        chomp;
        print "'$_'\n"
    }

    __DATA__
    object network Microsoft.Lync.Host.3
     host 138.108.25.111
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.4
     host 138.108.25.112
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.5
     host 138.108.25.113
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.6

    output

    'object network Microsoft.Lync.Host.3
     host 138.108.25.111
     description Help Desk Ticket #476739'
    'network Microsoft.Lync.Host.4
     host 138.108.25.112
     description Help Desk Ticket #476739'
    'network Microsoft.Lync.Host.5
     host 138.108.25.113
     description Help Desk Ticket #476739'
    'network Microsoft.Lync.Host.6
    '

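    The first record needs to be massaged: because the first "object" is not preceded by a newline, it is never matched as part of the separator, so that record alone still starts with "object ".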
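    One way to do that massaging, as a sketch assuming the file begins directly with "object " as in the sample: strip a leading "object " from each record (only the first one has it) and drop the trailing newline that the last record keeps, since no separator follows it. Here $/ is localised to a block rather than set globally, and an in-memory filehandle on an excerpt of the sample data replaces __DATA__:

```perl
use strict;
use warnings;

# In-memory excerpt of the sample data, standing in for the input file.
my $input = <<'END';
object network Microsoft.Lync.Host.4
 host 138.108.25.112
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.5
 host 138.108.25.113
 description Help Desk Ticket #476739
object network Microsoft.Lync.Host.6
END
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "\nobject ";        # separator is restored when the block ends
    while (my $record = <$fh>) {
        chomp $record;             # strips a trailing "\nobject ", if any
        $record =~ s/^object //;   # only the first record still starts with it
        $record =~ s/\n\z//;       # the last record keeps its final newline
        push @records, $record;
    }
}
close $fh;

print "'$_'\n" for @records;
```

    After this, every record starts with "network ..." and none ends with a stray newline, so all records can be parsed the same way.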

Re: Extract Data Between Lines
by kcott (Abbot) on Apr 04, 2014 at 06:17 UTC

    G'day namelkcip,

    Welcome to the monastery.

    Here's how you can use $/ to get each "object ..." as a record; specify which fields you want; and ignore fields (e.g. description) which don't exist in any given object.

    Note how $/ is localised within a bare block. This makes $/ = "\nobject" a temporary value which only affects the while (<DATA>) {...} loop; elsewhere in your script, $/ keeps its default value. See "Temporary Values via local()" in perlsub for more details.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my @objects;
    my @fields = qw{network host description};

    {
        local $/ = "\nobject";

        while (<DATA>) {
            my %object;

            for my $field (@fields) {
                / $field \s ( .*? ) $ /mx and $object{$field} = $1;
            }

            push @objects, \%object if keys %object;
        }
    }

    use Data::Dump;
    dd \@objects;

    __DATA__
    object network Microsoft.Lync.Host.3
     host 138.108.25.111
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.4
     host 138.108.25.112
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.5
     host 138.108.25.113
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.6

    Output:

    [
      {
        description => "Help Desk Ticket #476739",
        host => "138.108.25.111",
        network => "Microsoft.Lync.Host.3",
      },
      {
        description => "Help Desk Ticket #476739",
        host => "138.108.25.112",
        network => "Microsoft.Lync.Host.4",
      },
      {
        description => "Help Desk Ticket #476739",
        host => "138.108.25.113",
        network => "Microsoft.Lync.Host.5",
      },
      { network => "Microsoft.Lync.Host.6" },
    ]
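    A quick way to confirm the scoping behaviour of local described above is to read records with a localised $/ and then inspect $/ after the block. A minimal sketch using a two-record excerpt of the sample data through an in-memory filehandle, with list-context <$fh> reading every record at once:

```perl
use strict;
use warnings;

# Two-record excerpt of the sample data, read through an in-memory filehandle.
my $input = "object network Microsoft.Lync.Host.5\n"
          . " host 138.108.25.113\n"
          . " description Help Desk Ticket #476739\n"
          . "object network Microsoft.Lync.Host.6\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @chunks;
{
    local $/ = "\nobject";    # temporary value, visible only inside this block
    @chunks = <$fh>;          # list context: read every record at once
}
close $fh;

# After the block, $/ has its previous value again (the default "\n").
print scalar(@chunks), " chunks\n";
print $/ eq "\n" ? "\$/ restored\n" : "\$/ still changed\n";
```

    Note that the second chunk starts with " network ...", because the separator consumed "\nobject"; the field regexes in the reply above do not depend on what precedes the field name, which is why this does no harm there.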

    -- Ken

Re: Extract Data Between Lines
by hdb (Parson) on Apr 04, 2014 at 12:59 UTC

    ...and here is my favorite way of doing it...

    use strict;
    use warnings;
    use Data::Dumper;

    my @data;

    while(<DATA>){
        push @data, { object => $1 } if /^object (.*)/;
        $data[-1]->{$1} = $2 if /^ (.*?) (.*)/;
    }

    print Dumper \@data;

    __DATA__
    object network Microsoft.Lync.Host.3
     host 138.108.25.111
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.4
     host 138.108.25.112
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.5
     host 138.108.25.113
     description Help Desk Ticket #476739
    object network Microsoft.Lync.Host.6

Re: Extract Data Between Lines
by namelkcip (Initiate) on Apr 04, 2014 at 14:31 UTC
    This is why I love PerlMonks! Thank you all! I knew it could be done more easily than what I was struggling with. Some of your responses also show me that I was on the right track, just missing a bit. Thanks again!

Node Type: perlquestion [id://1081059]
Approved by Athanasius