PerlMonks  

Reading tab/whitespace delimited text file

by reaper9187 (Scribe)
on Oct 21, 2012 at 18:34 UTC (#1000233=perlquestion)
reaper9187 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,
There's a particular problem that's been bugging me for quite a while now. I have to read a text file whose fields are separated by a mix of tabs and spaces. I need to read the strings individually and check their values.
How do I remove the whitespace/tabs, and is it possible to compare the strings/values in each line individually? E.g., in the sample below, I have to check whether the value of LOHYST is >= 3. If it is, print it; otherwise move on to the next string. Is that possible? Seeking your wisdom.
I've attached a section of the file below:
TRHYST  TROFFSETP  TROFFSETN  AWOFFSET  BQOFFSET
2       0                     5         3

HIHYST  LOHYST  OFFSETP  OFFSETN  BQOFFSETAFR
5       3       0                 3

CELLR    DIR     CAND   CS
LUC083A  MUTUAL  BOTH   NO

Re: Reading tab/whitespace delimited text file
by LanX (Abbot) on Oct 21, 2012 at 19:36 UTC
    So what did you try?

    What about always reading 3 lines, and building a hash which assigns the values in the second line to the keys in the first?

    What failed?

    If it's just about decomposing the line, did you try split /\t/, $line ?
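A minimal sketch of the split suggestion, assuming the fields really are tab separated (an assumption; the pasted sample may not be). The two strings are a hypothetical tab-separated rendering of one header/values pair from the question, with the absent OFFSETN value showing up as an empty field between two tabs.

```perl
use strict;
use warnings;

# Hypothetical tab-separated version of one header/values pair.
# The empty string between two tabs is the absent OFFSETN value.
my $header = "HIHYST\tLOHYST\tOFFSETP\tOFFSETN\tBQOFFSETAFR";
my $values = "5\t3\t0\t\t3";

# A limit of -1 keeps trailing empty fields, so columns stay aligned.
my @keys = split /\t/, $header, -1;
my @vals = split /\t/, $values, -1;

# Hash slice: each value is stored under the key in the same column.
my %row;
@row{@keys} = @vals;

# The check from the question: print LOHYST only when it is >= 3.
if ( defined $row{LOHYST} && $row{LOHYST} =~ /^\d+$/ && $row{LOHYST} >= 3 ) {
    print "LOHYST = $row{LOHYST}\n";
}
```

If the separators turn out to be runs of spaces rather than tabs, this approach cannot tell which column an absent value belongs to.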

    Cheers Rolf

      Hi, thank you for the quick reply.
      I'm just starting to write a script for it. I'm new to Perl and have limited knowledge of hashes and keys and how to use them. Can you suggest something?
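As a starting point, here is a rough sketch of the "read three lines, build a hash" idea. It assumes each group is a header line, a values line, and a blank line, and that no value is absent. The in-memory file and the %record name are hypothetical stand-ins for the real file and data.

```perl
use strict;
use warnings;

# In-memory stand-in for the real file: one complete group from the
# sample (header line, values line, blank separator line).
my $text = "CELLR DIR CAND CS\nLUC083A MUTUAL BOTH NO\n\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %record;
while ( defined( my $header = <$fh> ) ) {
    my $values = <$fh>;              # second line of the group
    last unless defined $values;
    scalar <$fh>;                    # third line: blank separator, discarded

    # split ' ' splits on runs of whitespace and drops leading whitespace.
    my @keys = split ' ', $header;
    my @vals = split ' ', $values;

    # Only safe when no value is absent; otherwise columns shift left.
    next unless @keys == @vals;

    # Hash slice: store each value under the key from the same position.
    @record{@keys} = @vals;
}
close $fh;

print "$_ => $record{$_}\n" for sort keys %record;
```

A hash is just a lookup table from names to values, so afterwards $record{DIR} gives the value that sat under the DIR heading.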
Re: Reading tab/whitespace delimited text file
by BrowserUk (Pope) on Oct 21, 2012 at 20:46 UTC

    A space delimited file with spaces as fillers and absent fields? If so, you've got a nasty problem on your hands.

    If on the other hand, the fields are tab separated and space filled, that is a much simpler proposition. (But that's not what I see when I c&p your sample.)

    Is that the entire file or just one section? If the latter, you really need to show us at least 2 or 3 sections so we can see what separates them.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      Thanks a lot for helping me.
      As I said earlier, the above sample is only one section of the entire file. There are multiple such sections:

      SCTYPE SSDESDL QDESDL LCOMPDL QCOMPDL UL 90 30 5 55 BSPWRMINP BSPWRMINN 20
      . .
      CELL SCTYPE LWACH1A ACTIVE CHTYPE CHRATE SPV LVA ACL NCH YES BCCH 1 A3 1 SDCCH 0 A3 15 TCH FR 1 0 A3 13 TCH FR 2 0 A3 13 TCH FR 3 0 A3 13 TCH HR 1 0 A3 26 TCH HR 3 0 A3 26 CBCH 0 A3 1
      . .
      CELL LOL LOLHYST TAOL TAOLHYST LUC082A 120 3 61 0 DTCBP DTCBN DTCBHYST NDIST NNCELLS 4 2 10 1
      . .
      ACTIVE CHTYPE CHRATE SPV LVA ACL NCH YES BCCH 0 A3 0 SDCCH 0 A3 0 TCH FR 1 16 A3 32 TCH FR 2 0 A3 32


      Again, the file is pretty large and I cannot list all of the formats. I just need to get an idea of how to do it; I can then extend it over the entire file.

        Yuck! I thought (hoped) that this type of file format -- mixed, fixed-format records -- had died long ago; but they seem to keep reinventing it :)

        For your first example, the trick is to define a regex that will match the fields in the header line:

        my $reHeader = '(\b\w+\s*)?' x 10; ## Adjust the repeat value to cover the maximum no of fields

        and use that to construct an unpack template to parse the following values line.

        This is not 'nice code', but it demonstrates the technique:

        #! perl -slw
        use strict;
        use Data::Dump qw[ pp ];

        my $reHeader = '(\b\w+\s*)?' x 10;

        my %data;
        until( eof( DATA ) ) {
            ## Read the header line and remove the newline
            chomp( my $header = <DATA> );

            ## parse the fields using the regex, ignoring undefined fields
            my @keys = grep defined, $header =~ $reHeader;

            ## trim the trailing whitespace from the keys
            s[\s*$][] for @keys;

            ## Use the capture position arrays (@- & @+)
            ## to work out the field widths and construct a template
            my $tmpl = join ' ', map{
                defined( $-[$_] ) ? do{ my $n = $+[$_] - $-[$_]; "a$n" } : ()
            } 1 .. $#+;

            ## read and chomp the values line
            chomp( my $vals = <DATA> );

            ## Extract the value fields using the template
            my @vals = unpack $tmpl, $vals;

            ## trim leading & trailing whitespace
            s[^\s*][],s[\s*$][] for @vals;

            ## Add the key/value pairs to the hash
            @data{ @keys } = @vals;

            ## discard the blank line between the grouped pairs of lines.
            <DATA>;
        }

        pp \%data; ## display the hash constructed

        __DATA__
        TRHYST  TROFFSETP  TROFFSETN  AWOFFSET  BQOFFSET
        2       0                     5         3

        HIHYST  LOHYST  OFFSETP  OFFSETN  BQOFFSETAFR
        5       3       0                 3

        CELLR    DIR     CAND   CS
        LUC083A  MUTUAL  BOTH   NO

        Outputs:

        C:\test>junk79
        {
          AWOFFSET    => 5,
          BQOFFSET    => 3,
          BQOFFSETAFR => 3,
          CAND        => "BOTH",
          CELLR       => "LUC083A",
          CS          => "NO",
          DIR         => "MUTUAL",
          HIHYST      => 5,
          LOHYST      => 3,
          OFFSETN     => "",
          OFFSETP     => 0,
          TRHYST      => 2,
          TROFFSETN   => "",
          TROFFSETP   => 0,
        }
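Once the values are in a hash like the one above, the check from the original question becomes a lookup. A small sketch, with a subset of that output written out literally so it runs on its own; the $threshold and @hits names are illustrative, and looks_like_number comes from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# A subset of the parsed hash, copied from the output shown above.
my %data = (
    HIHYST  => 5,
    LOHYST  => 3,
    OFFSETP => 0,
    OFFSETN => "",
    CELLR   => "LUC083A",
);

my $threshold = 3;    # the limit named in the question

# The single check the question asks about.
if ( looks_like_number( $data{LOHYST} ) && $data{LOHYST} >= $threshold ) {
    print "LOHYST = $data{LOHYST}\n";
}

# The same test applied to every field; empty and non-numeric
# values (OFFSETN, CELLR) are skipped rather than compared.
my @hits = grep {
    looks_like_number( $data{$_} ) && $data{$_} >= $threshold
} sort keys %data;

print "fields >= $threshold: @hits\n";
```

Guarding with looks_like_number avoids "isn't numeric" warnings when a field is empty or holds a name such as a cell identifier.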

        Extending that to apply it to all your other sections will require a little ingenuity and a lot of painstaking testing.

        I do hope for your sake that the number and ordering of the different sections is well-defined, else you've got an even worse task on your hands.

        Note: This assumes that field names do not contain spaces. If they do, you are in shit street.
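For comparison, a sketch of a different way to get the same column mapping under the same assumption (no spaces in field names): record where each header word starts and cut the values line at those offsets with substr. This is a swapped-in alternative, not the unpack approach used above, and the column widths passed to sprintf are assumptions chosen to line up with this particular header.

```perl
use strict;
use warnings;

my $header = 'HIHYST  LOHYST  OFFSETP  OFFSETN  BQOFFSETAFR';

# Build the values line with explicit widths (8, 8, 9, 9) so each value
# starts under its header; the OFFSETN column is left blank on purpose.
my $values = sprintf '%-8s%-8s%-9s%-9s%s', 5, 3, 0, '', 3;

# Collect [name, start offset] for every header word.
my @cols;
while ( $header =~ /(\S+)/g ) {
    push @cols, [ $1, $-[1] ];
}

my %row;
for my $i ( 0 .. $#cols ) {
    my ( $name, $start ) = @{ $cols[$i] };

    # A column runs up to the next header's start, or to end of line.
    my $end = $i < $#cols ? $cols[ $i + 1 ][1] : length($values);

    # A values line shorter than the header yields empty fields.
    my $field = $start < length($values)
        ? substr( $values, $start, $end - $start )
        : '';

    $field =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    $row{$name} = $field;
}

print "$_ => '$row{$_}'\n" for map { $_->[0] } @cols;
```

Like the unpack version, this only works if each value starts at or after its own header's column and before the next one.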


Node Type: perlquestion [id://1000233]
Approved by Old_Gray_Bear