http://www.perlmonks.org?node_id=1000233

reaper9187 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,
There's a particular problem that's been bugging me for quite a while now. I have to read a text file whose fields are separated by all sorts of tabs and whitespace, and I need to read the strings individually and check their values.
How do I remove the whitespace/tabs, and is it possible to compare the strings/values in each line individually? E.g., in the sample below, I have to check whether the value of LOHYST is >= 3. If it is, print it; otherwise move on to the next string. Is that possible? Seeking your wisdom.
I've attached a section of the file below:

TRHYST TROFFSETP TROFFSETN AWOFFSET BQOFFSET
2      0                   5        3

HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR
5      3      0               3

CELLR    DIR     CAND  CS
LUC083A  MUTUAL  BOTH  NO

Re: Reading tab/whitespace delimited text file
by BrowserUk (Patriarch) on Oct 21, 2012 at 20:46 UTC

    A space-delimited file with spaces as fillers and absent fields? If so, you've got a nasty problem on your hands.

    If, on the other hand, the fields are tab-separated and space-filled, that is a much simpler proposition. (But that's not what I see when I copy and paste your sample.)
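To see why blank, space-filled fields are the nasty case, here is a minimal sketch. The values line is an assumption reconstructed from the sample (OFFSETN left blank, values left-aligned under their names, padded with sprintf only to keep the widths exact). Splitting on runs of whitespace silently drops the empty field, so every later value lands on the wrong key:

```perl
use strict;
use warnings;

# Assumed layout: OFFSETN is blank, so its column holds only spaces.
my @keys   = split ' ', 'HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR';
my $values = sprintf '%-7s%-7s%-8s%-8s%s', 5, 3, 0, '', 3;

# split ' ' treats any run of whitespace as one separator, so the
# empty OFFSETN column simply disappears.
my @vals = split ' ', $values;
printf "%d keys, %d values\n", scalar @keys, scalar @vals;    # 5 keys, 4 values

# Pairing them up by position now shifts everything after the gap.
my %h;
@h{@keys} = @vals;
print "OFFSETN => $h{OFFSETN}\n";    # OFFSETN => 3 (really BQOFFSETAFR's value)
print defined $h{BQOFFSETAFR} ? "BQOFFSETAFR is defined\n" : "BQOFFSETAFR is undef\n";
```

With tabs as the separators, the empty field would still sit between two tabs and keep its position, which is what makes that case simpler.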

    Is that the entire file or just one section? If the latter, you really need to show us at least 2 or 3 sections so we can see what separates them.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      Thanks a lot for helping me.
      As I said earlier, the sample above is only one section of the entire file. There are multiple such sections:

      SCTYPE SSDESDL QDESDL LCOMPDL QCOMPDL
      UL 90 30 5 55
      BSPWRMINP BSPWRMINN
      20
      . .
      CELL SCTYPE
      LWACH1A
      ACTIVE CHTYPE CHRATE SPV LVA ACL NCH
      YES BCCH 1 A3 1
      SDCCH 0 A3 15
      TCH FR 1 0 A3 13
      TCH FR 2 0 A3 13
      TCH FR 3 0 A3 13
      TCH HR 1 0 A3 26
      TCH HR 3 0 A3 26
      CBCH 0 A3 1
      . .
      CELL LOL LOLHYST TAOL TAOLHYST
      LUC082A 120 3 61 0
      DTCBP DTCBN DTCBHYST NDIST NNCELLS
      4 2 10 1
      . .
      ACTIVE CHTYPE CHRATE SPV LVA ACL NCH
      YES BCCH 0 A3 0
      SDCCH 0 A3 0
      TCH FR 1 16 A3 32
      TCH FR 2 0 A3 32


      Again, the file is pretty large and I cannot list all of the formats. I just need an idea of how to do it; I can then extend it over the entire file.

        Yuck! I thought (hoped) that this type of file format -- mixed, fixed-format records -- had died long ago; but they seem to keep reinventing it :)

        For your first example, the trick is to define a regex that will match the fields in the header line:

        my $reHeader = '(\b\w+\s*)?' x 10; ## Adjust the repeat value to cover the maximum no of fields

        and use that to construct an unpack template to parse the following values line.
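A stripped-down sketch of just the template-building step, applied to one header/values pair from the sample. It assumes single spaces between header names and values left-aligned under their names; the values line is padded with sprintf only to make the assumed column widths explicit:

```perl
use strict;
use warnings;

my $header = 'HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR';
# Assumed fixed-width values line, OFFSETN left blank.
my $values = sprintf '%-7s%-7s%-8s%-8s%s', 5, 3, 0, '', 3;

my $reHeader = '(\b\w+\s*)?' x 10;    # room for up to 10 fields
$header =~ $reHeader or die "Header did not match\n";

# $-[$n] and $+[$n] are the start and end offsets of capture group $n;
# they are undef for optional groups that matched nothing.
my @widths;
for my $n ( 1 .. $#+ ) {
    push @widths, $+[$n] - $-[$n] if defined $-[$n];
}

my $tmpl = join ' ', map { "a$_" } @widths;
my @vals = unpack $tmpl, $values;
s/^\s+|\s+$//g for @vals;             # strip the space fillers

print "$tmpl\n";                      # a7 a7 a8 a8 a11
print join( '|', @vals ), "\n";       # 5|3|0||3
```

Each width includes the padding after a header name, so a blank field still gets its own slot instead of vanishing.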

        This is not 'nice code', but it demonstrates the technique:

        #! perl -slw
        use strict;
        use Data::Dump qw[ pp ];

        my $reHeader = '(\b\w+\s*)?' x 10;

        my %data;

        until( eof( DATA ) ) {
            ## Read the header line and remove the newline
            chomp( my $header = <DATA> );

            ## parse the fields using the regex, ignoring undefined fields
            my @keys = grep defined, $header =~ $reHeader;

            ## trim the trailing whitespace from the keys
            s[\s*$][] for @keys;

            ## Use the capture position arrays (@- & @+)
            ## to work out the field widths and construct a template
            my $tmpl = join ' ', map{
                defined( $-[$_] ) ? do{ my $n = $+[$_] - $-[$_]; "a$n" } : ()
            } 1 .. $#+;

            ## read and chomp the values line
            chomp( my $vals = <DATA> );

            ## Extract the value fields using the template
            my @vals = unpack $tmpl, $vals;

            ## trim leading & trailing whitespace
            s[^\s*][],s[\s*$][] for @vals;

            ## Add the key/value pairs to the hash
            @data{ @keys } = @vals;

            ## discard the blank line between the grouped pairs of lines.
            <DATA>;
        }

        pp \%data; ## display the hash constructed

        __DATA__
        TRHYST TROFFSETP TROFFSETN AWOFFSET BQOFFSET
        2      0                   5        3

        HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR
        5      3      0               3

        CELLR    DIR     CAND  CS
        LUC083A  MUTUAL  BOTH  NO

        Outputs:

        C:\test>junk79
        {
          AWOFFSET => 5,
          BQOFFSET => 3,
          BQOFFSETAFR => 3,
          CAND => "BOTH",
          CELLR => "LUC083A",
          CS => "NO",
          DIR => "MUTUAL",
          HIHYST => 5,
          LOHYST => 3,
          OFFSETN => "",
          OFFSETP => 0,
          TRHYST => 2,
          TROFFSETN => "",
          TROFFSETP => 0,
        }

        Extending that to apply it to all your other sections will require a little ingenuity and a lot of painstaking testing.
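One possible way to extend this to several sections, sketched under assumptions: each header/values pair is separated from the next by a blank line, so Perl's paragraph mode (`$/ = ''`) can read one pair per record; `parse_pair` is a hypothetical helper name; the sample text is rebuilt from the thread with an assumed column layout. The final `if` is the LOHYST >= 3 check from the original question:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one fixed-width header/values pair into
# key/value pairs with the regex + @-/@+ + unpack technique.
sub parse_pair {
    my ( $header, $values ) = @_;
    my $reHeader = '(\b\w+\s*)?' x 10;    # assumes at most 10 fields per header
    $header =~ $reHeader or return;

    # Copy the offsets first: a later successful match in this block
    # (including s///) would replace @- and @+.
    my ( @starts, @widths );
    for my $n ( 1 .. $#+ ) {
        next unless defined $-[$n];
        push @starts, $-[$n];
        push @widths, $+[$n] - $-[$n];
    }

    my @keys = map { substr( $header, $starts[$_], $widths[$_] ) } 0 .. $#starts;
    s/\s+$// for @keys;                   # drop the padding after each name

    my @vals = unpack join( ' ', map { "a$_" } @widths ), $values;
    s/^\s+|\s+$//g for @vals;             # trim the space fillers

    my %pair;
    @pair{@keys} = @vals;
    return %pair;
}

# Assumed layout: sprintf pads each value to its header column's width.
my $text = join "\n",
    'TRHYST TROFFSETP TROFFSETN AWOFFSET BQOFFSET',
    sprintf( '%-7s%-10s%-10s%-9s%s', 2, 0, '', 5, 3 ),
    '',
    'HIHYST LOHYST OFFSETP OFFSETN BQOFFSETAFR',
    sprintf( '%-7s%-7s%-8s%-8s%s', 5, 3, 0, '', 3 ),
    '',
    sprintf( '%-9s%-8s%-6s%s', qw(CELLR DIR CAND CS) ),
    sprintf( '%-9s%-8s%-6s%s', qw(LUC083A MUTUAL BOTH NO) ),
    '';

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %data;
{
    local $/ = '';    # paragraph mode: one record per blank-line-separated block
    while ( defined( my $record = <$fh> ) ) {
        my ( $header, $values ) = split /\n/, $record;
        next unless defined $values;
        %data = ( %data, parse_pair( $header, $values ) );
    }
}
close $fh;

# The check from the original question.
if ( defined $data{LOHYST} && $data{LOHYST} >= 3 ) {
    print "LOHYST = $data{LOHYST}\n";
}
```

Sections with several value rows per header, like the channel tables in the later sample, would need their own handling on top of this.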

        I do hope for your sake that the number and ordering of the different sections is well-defined, else you've got an even worse task on your hands.

        Note: This assumes that field names do not contain spaces. If they do, you are in shit street.



Re: Reading tab/whitespace delimited text file
by LanX (Saint) on Oct 21, 2012 at 19:36 UTC
    So what did you try?

    What about always reading 3 lines, and building a hash which assigns the values in the second line to the keys in the first?
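A minimal sketch of the read-three-lines-and-hash idea, assuming (hypothetically) that fields are separated by single tabs and that each header/values pair is followed by a blank line. The sample is held in a string and opened as an in-memory filehandle so the example is self-contained; with a real file you would `open` its path instead:

```perl
use strict;
use warnings;

# Hypothetical tab-separated version of the sample.
my $text = "TRHYST\tTROFFSETP\tTROFFSETN\tAWOFFSET\tBQOFFSET\n"
         . "2\t0\t\t5\t3\n"
         . "\n"
         . "HIHYST\tLOHYST\tOFFSETP\tOFFSETN\tBQOFFSETAFR\n"
         . "5\t3\t0\t\t3\n"
         . "\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %data;
while ( defined( my $header = <$fh> ) ) {
    my $values = <$fh>;
    my $blank  = <$fh>;                    # third line: the blank separator
    last unless defined $values;

    chomp( $header, $values );
    my @keys = split /\t/, $header, -1;    # -1 keeps trailing empty fields
    my @vals = split /\t/, $values, -1;
    @data{@keys} = @vals;                  # hash slice: $data{ $keys[i] } = $vals[i]
}
close $fh;

print "LOHYST = $data{LOHYST}\n" if $data{LOHYST} >= 3;
```

The hash slice `@data{@keys} = @vals` is what pairs the nth value with the nth key; after that, any field can be looked up by name.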

    What failed?

    If it's just about decomposing, did you try split /\t/, $line ?
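Two details matter when using `split /\t/` on a tab-separated, space-filled line: by default `split` drops trailing empty fields, and the space fillers stay attached to each value. A small sketch on a made-up line (hypothetical data, not from the file):

```perl
use strict;
use warnings;

# Hypothetical tab-separated, space-filled line; the last two fields are empty.
my $line = "  5\t 3 \t0\t\t";

my @default = split /\t/, $line;        # trailing empty fields are removed
my @all     = split /\t/, $line, -1;    # a negative limit keeps them

s/^\s+|\s+$//g for @all;                # trim the space fillers from each field

printf "%d vs %d fields\n", scalar @default, scalar @all;    # 3 vs 5 fields
print join( '|', @all ), "\n";                              # 5|3|0||
```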

    Cheers Rolf

      Hi, thank you for the quick reply.
      Just trying to write a script for it. I'm new to Perl and have limited knowledge of hashes and keys and how to use them. Can you suggest something?