Reading tab/whitespace delimited text file

reaper9187 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,
There's a particular problem that's been bugging me for quite a while now. I have to read a text file whose fields are separated by a mix of tabs and spaces. I need to read the strings individually and check their values.
How do I remove the whitespace/tabs, and is it possible to compare the strings/values in each line individually? E.g., in the sample below, I have to check whether the value of LOHYST is >= 3. If yes, print it; otherwise move on to the next string. Is that possible? Seeking your wisdom.
I've attached a section of the file below:

TRHYST  TROFFSETP  TROFFSETN  AWOFFSET  BQOFFSET
 2       0                     5         3

HIHYST  LOHYST  OFFSETP  OFFSETN  BQOFFSETAFR
 5       3       0                 3

CELLR     DIR     CAND   CS
LUC083A   MUTUAL  BOTH   NO


Replies are listed 'Best First'.
Re: Reading tab/whitespace delimited text file by BrowserUk (Patriarch) on Oct 21, 2012 at 20:46 UTC
A space-delimited file with spaces as fillers and absent fields? If so, you've got a nasty problem on your hands. If, on the other hand, the fields are tab separated and space filled, that is a much simpler proposition. (But that's not what I see when I c&p your sample.) Is that the entire file or just one section? If the latter, you really need to show us at least 2 or 3 sections so we can see what separates them.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP Neil Armstrong
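For the simpler case mentioned above (tab separated, space filled), something along these lines might do. It is only a sketch: `split_tabbed` is a hypothetical helper, and the two sample lines are made up with real tab characters between the fields, which is not what the posted sample appears to contain.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-separated line into trimmed fields.
# The -1 limit keeps trailing empty fields, so absent values stay in place.
sub split_tabbed {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;
    s/^\s+|\s+$//g for @fields;    # strip the space filler around each field
    return @fields;
}

# Made-up sample lines (assumption: real tabs separate the columns).
my $header = "HIHYST\tLOHYST\tOFFSETP\tOFFSETN\tBQOFFSETAFR\n";
my $values = " 5\t 3\t 0\t\t 3\n";

# Hash slice: assign the values to the header names in one step.
my %row;
@row{ split_tabbed($header) } = split_tabbed($values);

print "LOHYST = $row{LOHYST}\n" if $row{LOHYST} >= 3;
```

Because every tab marks exactly one column boundary, the absent OFFSETN value simply comes out as an empty string instead of shifting the later values to the left.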
Re^2: Reading tab/whitespace delimited text file by reaper9187 (Scribe) on Oct 22, 2012 at 05:31 UTC
Thanks a lot for helping me. As I said earlier, the above is only a section of the entire file. There are multiple such sections:
`SCTYPE SSDESDL QDESDL LCOMPDL QCOMPDL UL 90 30 5 55 BSPWRMINP BSPWRMINN 20`
. .
`CELL SCTYPE LWACH1A ACTIVE CHTYPE CHRATE SPV LVA ACL NCH YES BCCH 1 A3 1 SDCCH 0 A3 15 TCH FR 1 0 A3 13 TCH FR 2 0 A3 13 TCH FR 3 0 A3 13 TCH HR 1 0 A3 26 TCH HR 3 0 A3 26 CBCH 0 A3 1`
. .
`CELL LOL LOLHYST TAOL TAOLHYST LUC082A 120 3 61 0 DTCBP DTCBN DTCBHYST NDIST NNCELLS 4 2 10 1`
. .
`ACTIVE CHTYPE CHRATE SPV LVA ACL NCH YES BCCH 0 A3 0 SDCCH 0 A3 0 TCH FR 1 16 A3 32 TCH FR 2 0 A3 32`
Again, the file is pretty large and I cannot list all of the formats. I just need to get an idea of how to do it; I can then extend it over the entire file.
Re^3: Reading tab/whitespace delimited text file by BrowserUk (Patriarch) on Oct 22, 2012 at 06:31 UTC
Yuck! I thought (hoped) that this type of file format -- mixed, fixed-format records -- had died long ago; but they seem to keep reinventing it :)

For your first example, the trick is to define a regex that will match the fields in the header line:

    my $reHeader = '(\b\w+\s*)?' x 10; ## Adjust the repeat value to cover the maximum no of fields

and use that to construct an unpack template to parse the following values line. This is not 'nice code', but it demonstrates the technique:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my $reHeader = '(\b\w+\s*)?' x 10;

    my %data;
    until( eof( DATA ) ) {
        ## Read the header line and remove the newline
        chomp( my $header = <DATA> );

        ## parse the fields using the regex, ignoring undefined fields
        my @keys = grep defined, $header =~ $reHeader;

        ## trim the trailing whitespace from the keys
        s[\s*$][] for @keys;

        ## Use the capture position arrays (@- & @+)
        ## to work out the field widths and construct a template
        my $tmpl = join ' ', map{
            defined( $-[$_] ) ? do{ my $n = $+[$_] - $-[$_]; "a$n" } : ()
        } 1 .. $#+;

        ## read and chomp the values line
        chomp( my $vals = <DATA> );

        ## Extract the value fields using the template
        my @vals = unpack $tmpl, $vals;

        ## trim leading & trailing whitespace
        s[^\s*][],s[\s*$][] for @vals;

        ## Add the key/value pairs to the hash
        @data{ @keys } = @vals;

        ## discard the blank line between the grouped pairs of lines.
        <DATA>;
    }

    pp \%data; ## display the hash constructed

    __DATA__
    TRHYST  TROFFSETP  TROFFSETN  AWOFFSET  BQOFFSET
     2       0                     5         3

    HIHYST  LOHYST  OFFSETP  OFFSETN  BQOFFSETAFR
     5       3       0                 3

    CELLR     DIR     CAND   CS
    LUC083A   MUTUAL  BOTH   NO

Outputs:

    C:\test>junk79
    {
      AWOFFSET => 5,
      BQOFFSET => 3,
      BQOFFSETAFR => 3,
      CAND => "BOTH",
      CELLR => "LUC083A",
      CS => "NO",
      DIR => "MUTUAL",
      HIHYST => 5,
      LOHYST => 3,
      OFFSETN => "",
      OFFSETP => 0,
      TRHYST => 2,
      TROFFSETN => "",
      TROFFSETP => 0,
    }

Extending that to apply it to all your other sections will require a little ingenuity and a lot of painstaking testing.
I do hope for your sake that the number and ordering of the different sections is well-defined, else you've got an even worse task on your hands. Note: This assumes that field names do not contain spaces. If they do, you are in shit street.
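To tie the fixed-width idea back to the original LOHYST question, here is a minimal sketch of a variation on it. Instead of an unpack template it records the column at which each header name starts and cuts the values line with `substr`. The helper name `parse_pair` is hypothetical, and the sketch assumes the file is padded with spaces only (no tabs) and that each value sits somewhere under its own header name.

```perl
use strict;
use warnings;

# Hypothetical helper: pair a header line with the values line below it,
# cutting each value out of the column span that its header name occupies.
sub parse_pair {
    my ($header, $values) = @_;

    # Record every field name and the column at which it starts.
    my (@names, @starts);
    while ($header =~ /(\S+)/g) {
        push @names,  $1;
        push @starts, $-[1];
    }

    my %row;
    for my $i (0 .. $#names) {
        my $from = $starts[$i];
        # A column runs up to the next header name; the last runs to end of line.
        my $to   = $i < $#names ? $starts[$i + 1] : length($values);
        my $cell = $from < length($values)
                 ? substr($values, $from, $to - $from)
                 : '';
        $cell =~ s/^\s+|\s+$//g;    # trim the space padding
        $row{ $names[$i] } = $cell;
    }
    return %row;
}

# The second section of the posted sample, copied with its spacing.
my $header = 'HIHYST  LOHYST  OFFSETP  OFFSETN  BQOFFSETAFR';
my $values = ' 5       3       0                 3';

my %row = parse_pair($header, $values);

# Only compare numerically when the field really holds a number.
print "LOHYST is $row{LOHYST}\n"
    if $row{LOHYST} =~ /^\d+$/ && $row{LOHYST} >= 3;
```

The numeric guard matters because an absent field comes back as an empty string, and comparing that with `>=` would raise a warning under `use warnings`.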
Re^4: Reading tab/whitespace delimited text file by reaper9187 (Scribe) on Oct 22, 2012 at 06:55 UTC
Re^4: Reading tab/whitespace delimited text file by reaper9187 (Scribe) on Nov 01, 2012 at 12:38 UTC
Re^5: Reading tab/whitespace delimited text file by BrowserUk (Patriarch) on Nov 01, 2012 at 13:08 UTC
Re: Reading tab/whitespace delimited text file by LanX (Saint) on Oct 21, 2012 at 19:36 UTC
So what did you try? What about always reading 3 lines and building a hash that assigns the values in the second line to the keys in the first? What failed? If it's just about decomposing, did you try `split /\t/, $line`? Cheers Rolf
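The three-lines-at-a-time idea could look roughly like the sketch below. The input text is made up from the posted sample with no absent fields, it is read from an in-memory filehandle to keep the example self-contained, and it splits on any whitespace rather than on tabs only. A section with a missing value would be mispaired by a plain split, so the sketch skips such sections instead of guessing.

```perl
use strict;
use warnings;

# Made-up input: header line, values line, blank line, repeated.
my $text = <<'END';
HIHYST  LOHYST  OFFSETP
 5       3       0

CELLR     DIR     CAND   CS
LUC083A   MUTUAL  BOTH   NO
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %data;
while (defined(my $header = <$fh>)) {
    my $values = <$fh>;
    last unless defined $values;
    <$fh>;    # read and discard the blank separator line

    # split ' ' ignores leading whitespace and splits on runs of spaces or tabs
    my @keys = split ' ', $header;
    my @vals = split ' ', $values;

    # With an absent field the counts differ and the pairing would be wrong.
    if (@keys != @vals) {
        warn "Skipping section with missing values: @keys\n";
        next;
    }

    # Hash slice: first line gives the keys, second line the values.
    @data{@keys} = @vals;
}
close $fh;

print "LOHYST = $data{LOHYST}\n"
    if exists $data{LOHYST} && $data{LOHYST} >= 3;
```

After the loop, any field can be looked up by name, e.g. `$data{DIR}` holds `MUTUAL`.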
Re^2: Reading tab/whitespace delimited text file by reaper9187 (Scribe) on Oct 21, 2012 at 19:57 UTC
Hi, thank you for the quick reply. I'm just trying to write a script for it. I'm new to Perl and have limited knowledge of hashes and keys and how to use them. Can you suggest something?
