PerlMonks  

Adaptive parser for tab delimited text file

by reaper9187 (Scribe)
on Nov 01, 2012 at 05:30 UTC (#1001772)
reaper9187 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I need to read certain portions of a text file like the one shown below, and then extract values from the hashes of arrays built from each portion. This would be easy if the portion had a fixed size (say, from line 50 to line 100). As it turns out, the size of the portion grows or shrinks as the user adds new cells. Is it possible to write an adaptive parser for such a case? Please help.
NEIGHBOUR RELATION DATA

CELL
LUC325C
CELLR DIR CAND CS
LUC325B MUTUAL BOTH NO
KHYST KOFFSETP KOFFSETN LHYST LOFFSETP LOFFSETN
 3 0 3 0
TRHYST TROFFSETP TROFFSETN AWOFFSET BQOFFSET
 2 0 5 3
HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR
 5 3 0 3
CELLR DIR CAND CS
LUC116A MUTUAL BOTH NO
KHYST KOFFSETP KOFFSETN LHYST LOFFSETP LOFFSETN
 3 0 3 0
TRHYST TROFFSETP TROFFSETN AWOFFSET BQOFFSET
 2 0 5 3
HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR
 5 3 0 3
CELLR DIR CAND CS
LUC204A MUTUAL BOTH NO
KHYST KOFFSETP KOFFSETN LHYST LOFFSETP LOFFSETN
 3 0 3 0
TRHYST TROFFSETP TROFFSETN AWOFFSET BQOFFSET
 2 0 5 3
HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR
 5 3 0 3
CELL
LUC082B

Re: Adaptive parser for tab delimited text file
by Anonymous Monk on Nov 01, 2012 at 07:16 UTC
Re: Adaptive parser for tab delimited text file
by BrowserUk (Pope) on Nov 01, 2012 at 08:25 UTC

    Is this the same problem?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      No, it's different. The earlier issue was resolved and it works fine now. My current problem is that I need to read a particular section of text from the file and then iteratively perform further checks to find the start and end delimiters, and hence the size, of the block I need to work on (the size is variable and changes whenever the user makes changes).
      To make it more understandable, the code needs to follow these steps:
      1. Identify the keyword "CELL" to determine the start of the block.
      2. Match the cell name (the line after CELL, e.g. LUC123A in this case).
      3. Identify the block size (in number of lines) between the two delimiters, i.e. the first and second occurrences of the word "CELL".

      I keep getting confused for some reason. I'd really like some insight. Sorry for the trouble.
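
The three steps listed above can be sketched roughly as follows. This is only a minimal illustration, not a complete solution: the helper name `split_cell_blocks` is hypothetical, and it assumes that a delimiter line contains nothing but the word CELL and that the cell name sits on the very next line.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines into blocks keyed by cell name.
# Assumption: a block starts at a line holding only "CELL", the next line
# is the cell name, and the block ends just before the next "CELL" line
# (or at the end of the input).
sub split_cell_blocks {
    my @lines = @_;
    my (%blocks, $name);
    while (@lines) {
        my $line = shift @lines;
        if ($line =~ /^CELL\s*$/) {
            # Steps 1 and 2: delimiter found, the next line names the cell.
            $name = shift @lines;
            last unless defined $name;
            $name =~ s/^\s+|\s+$//g;
            $blocks{$name} = [];
            next;
        }
        # Step 3: every line up to the next delimiter belongs to the block.
        push @{ $blocks{$name} }, $line if defined $name;
    }
    return \%blocks;
}

# Shortened sample, just enough to show two delimiters.
my @sample = (
    'NEIGHBOUR RELATION DATA',
    'CELL',
    'LUC325C',
    'CELLR DIR CAND CS',
    'LUC325B MUTUAL BOTH NO',
    'CELL',
    'LUC082B',
);
my $blocks = split_cell_blocks(@sample);
printf "%s: %d line(s)\n", $_, scalar @{ $blocks->{$_} } for sort keys %$blocks;
```

For the shortened sample in the sketch this reports two lines for LUC325C and none for LUC082B. Because the block size falls out of the delimiters, nothing has to be known about line numbers in advance. A cell name that appears twice would overwrite the earlier block in this sketch.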

        Are you searching the file for just this block (or blocks)? Or do you 'encounter' this block as you process the other parts of the file?

        Are the last two lines of your (extremely minimalist) sample an indication that another matching section follows?

        Oooh. The sample data just changed out of all recognition... Oh well. Ignore my questions, they are no longer meaningful.

        Best of luck.



Re: Adaptive parser for tab delimited text file
by choroba (Abbot) on Nov 01, 2012 at 12:35 UTC
    This works for me. There might be some glitches in the input format not shown in the sample, though.
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    <>;    # Skip the first line.
    my %hash;
    my ($cell, $cellr);
    while (my $header = <>) {
        if ($header =~ /\b/) {
            if ($header =~ /^CELL\s*$/) {
                $cell = <>;
                chomp $cell;
                next;
            }

            my @pos;
            my $flip;
            while ($header =~ /(\b|$)/g) {
                push @pos, $-[0] if ++$flip % 2;    # Remember the position where a field starts.
            }

            chomp (my $value_line = <>);
            if (length $header <= length $value_line) {
                $pos[-1] = length $value_line;    # Do not clip the value line if longer than header.
            } else {
                $value_line .= ' ' x (length($header));    # Do not die if the last fields are empty.
            }

            my @values;
            my @fields = split /\s+/, $header;
            for my $i (0 .. $#fields) {
                push @values, substr $value_line, $pos[$i], $pos[$i+1] - $pos[$i];
            }
            s/^ +| +$//g for @values;
            if ($fields[0] eq 'CELLR') {
                $cellr = shift @values;
                shift @fields;
            }
            $hash{$cell}{$cellr}{$_} = shift @values for @fields;
        }
    }
    print Dumper \%hash;
    Updated: Keep the whole structure in one hash.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Adaptive parser for tab delimited text file
by roboticus (Canon) on Nov 01, 2012 at 12:51 UTC

    reaper9187:

    If you don't mind storing your data in a hash, it doesn't have to be adaptive. Instead you can parse the line pairs as hash keys & values respectively. Something like this:

    $ cat t.pl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %CELLS;
    my $tmp;
    my $curCELL;
    my $curCELLR;

    while (<DATA>) {
        # skip blank lines
        next if /^\s*$/;

        # Line isn't empty, so current line is a list of field names
        # and the next line is a list of the values
        my @names = split /\s+/, $_;
        my @values = split /\s+/, <DATA>;

        $curCELL = $values[0] if $names[0] eq 'CELL';
        $curCELLR = $values[0] if $names[0] eq 'CELLR';

        # Store the name/value pairs (if we have both keys)
        next unless defined $curCELL and defined $curCELLR;
        @{$CELLS{$curCELL}{$curCELLR}}{@names} = @values;
    }

    print Dumper(\%CELLS);

    __DATA__
    NEIGHBOUR RELATION DATA

    CELL
    LUC325C
    CELLR DIR CAND CS
    LUC325B MUTUAL BOTH NO
    KHYST KOFFSETP KOFFSETN LHYST LOFFSETP LOFFSETN
     3 0 3 0
    TRHYST TROFFSETP TROFFSETN AWOFFSET BQOFFSET
     2 0 5 3
    HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR
     5 3 0 3
    CELLR DIR CAND CS
    LUC116A MUTUAL BOTH NO
    KHYST KOFFSETP KOFFSETN LHYST LOFFSETP LOFFSETN
     3 0 3 0
    TRHYST TROFFSETP TROFFSETN AWOFFSET BQOFFSET
     2 0 5 3
    HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR
     5 3 0 3
    CELLR DIR CAND CS
    LUC204A MUTUAL BOTH NO
    KHYST KOFFSETP KOFFSETN LHYST LOFFSETP LOFFSETN
     3 0 3 0
    TRHYST TROFFSETP TROFFSETN AWOFFSET BQOFFSET
     2 0 5 3
    HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR
     5 3 0 3
    CELL
    LUC082B

    When I run it, I get the following:

    $ perl t.pl
    $VAR1 = {
      'LUC325C' => {
        'LUC204A' => {
          'KHYST' => '',
          'OFFSETN' => '0',
          'TROFFSETN' => '0',
          'OFFSETP' => '3',
          'LOFFSETN' => undef,
          'DIR' => 'MUTUAL',
          'LOHYST' => '5',
          'LHYST' => '3',
          'CAND' => 'BOTH',
          'BQOFFSET' => '3',
          'LOFFSETP' => '0',
          'CELLR' => 'LUC204A',
          'KOFFSETN' => '0',
          'KOFFSETP' => '3',
          'TRHYST' => '',
          'HIHYST' => '',
          'CS' => 'NO',
          'AWOFFSET' => '5',
          'BQOFFSETAFR' => '3',
          'TROFFSETP' => '2'
        },
        'LUC116A' => {
          'KHYST' => '',
          'OFFSETN' => '0',
          'TROFFSETN' => '0',
          'OFFSETP' => '3',
          'LOFFSETN' => undef,
          'DIR' => 'MUTUAL',
          'LOHYST' => '5',
          'LHYST' => '3',
          'CAND' => 'BOTH',
          'BQOFFSET' => '3',
          'LOFFSETP' => '0',
          'CELLR' => 'LUC116A',
          'KOFFSETN' => '0',
          'KOFFSETP' => '3',
          'TRHYST' => '',
          'HIHYST' => '',
          'CS' => 'NO',
          'AWOFFSET' => '5',
          'BQOFFSETAFR' => '3',
          'TROFFSETP' => '2'
        },
        'LUC325B' => {
          'KHYST' => '',
          'OFFSETN' => '0',
          'TROFFSETN' => '0',
          'OFFSETP' => '3',
          'LOFFSETN' => undef,
          'DIR' => 'MUTUAL',
          'LOHYST' => '5',
          'LHYST' => '3',
          'CAND' => 'BOTH',
          'BQOFFSET' => '3',
          'LOFFSETP' => '0',
          'CELLR' => 'LUC325B',
          'KOFFSETN' => '0',
          'KOFFSETP' => '3',
          'TRHYST' => '',
          'HIHYST' => '',
          'CS' => 'NO',
          'AWOFFSET' => '5',
          'BQOFFSETAFR' => '3',
          'TROFFSETP' => '2'
        }
      },
      'LUC082B' => {
        'LUC204A' => {
          'CELL' => 'LUC082B'
        }
      }
    };
    $

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Your code does not handle the empty fields correctly. KOFFSETP is always 0 in the data, but is 3 in your output.

        choroba:

        Thanks, I didn't notice. I think I'll leave it alone, though, as you've already posted a working one, and it's only a proof-of-concept anyway.

        ...roboticus
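
The empty-field problem discussed above comes from splitting on whitespace: an empty column leaves no token behind, so the remaining values get paired with the wrong names. A small sketch of the column-offset idea may make the difference concrete. The helper name `parse_fixed_width` is hypothetical, and the sketch assumes that each value is left-aligned under its field name; the alignment of the sample line is likewise an assumption made for illustration, so right-aligned data would need different column boundaries.

```perl
use strict;
use warnings;

# Hypothetical helper: pair a header line with its value line by column
# position, so an empty column yields '' instead of shifting later values.
# Assumption: each value starts at the same offset as its field name.
sub parse_fixed_width {
    my ($header, $value_line) = @_;
    my (@names, @starts);
    while ($header =~ /(\S+)/g) {
        push @names,  $1;
        push @starts, $-[1];    # offset where this field name begins
    }
    my %record;
    for my $i (0 .. $#names) {
        my $value = '';
        # Only slice when the value line actually reaches this column.
        if ($starts[$i] < length $value_line) {
            # A column runs up to the start of the next one;
            # the last column runs to the end of the line.
            $value = $i < $#names
                ? substr($value_line, $starts[$i], $starts[$i + 1] - $starts[$i])
                : substr($value_line, $starts[$i]);
        }
        $value =~ s/^\s+|\s+$//g;
        $record{ $names[$i] } = $value;
    }
    return \%record;
}

my $header = 'KHYST KOFFSETP KOFFSETN LHYST';
# Build an aligned value line in which the KOFFSETN column is empty.
my $values = sprintf '%-6s%-9s%-9s%s', '3', '0', '', '3';

my $rec = parse_fixed_width($header, $values);
print "$_ = '$rec->{$_}'\n" for qw(KHYST KOFFSETP KOFFSETN LHYST);

# For comparison: whitespace splitting finds only three tokens for four columns.
my @naive = split ' ', $values;
print "split sees ", scalar(@naive), " values\n";
```

With the aligned sample line, KOFFSETN comes back as an empty string and LHYST keeps its 3, whereas the whitespace split returns three values and has no way to tell which column was empty.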
