PerlMonks  

The best way to split tab delimited file

by Ratna_Ranjan (Novice)
on Nov 16, 2009 at 15:17 UTC (#807478, perlquestion)
Ratna_Ranjan has asked for the wisdom of the Perl Monks concerning the following question:

I have a tab-delimited text file, which I split on tabs, store in an array, and display. One column's data contains a tab, so instead of that column being taken as a whole, it gets split as well. Here is my data:
data name aliasname, alsoknown class type

Output:

data
name
aliasname,
alsoknown
class
type

What I want is:

data
name
aliasname, alsoknown
class
type

Here is the code that I have tried:

#!/usr/bin/perl
use strict;
my $var;
open(FH,"sample.txt")or die("can't open file:$!");
while($var=<FH>)
{
    my @vareach=split("\t",$var);
    for my $each(@vareach)
    {
        print "$each\n";
    }
}

Any suggestions on how to split this kind of data?

Re: The best way to split tab delimited file
by BioLion (Curate) on Nov 16, 2009 at 15:27 UTC

    You should check out Text::Delimited for parsing delimited text!

    It lets you use column headers, extract columns, extract the data into various data structures, etc., taking away a lot of the pain!

    For getting all the columns back as an array ref (columns in original order) see the read() method and its __DATA__ key, or you can ask for a particular column.

    Updated: Last para.

    Just a something something...
Re: The best way to split tab delimited file
by Utilitarian (Vicar) on Nov 16, 2009 at 15:33 UTC
    If the comma is a fixed aspect of this file, then
    @fields = split (/[^,]\t/,$record)
    would probably do what you need, but BioLion's suggestion above might make your life easier further down the road, as more "little quirks" emerge in the data.

      By placing the comma in a negated character class you lose the character preceding each tab (whenever it is not a comma) from the resulting array, because that character becomes part of the split pattern. If you place the comma in a negative zero-width look-behind instead, as gmargo does here, those characters are retained.

      $ perl -le '
      > $txt = qq{abc\tdef,\tghi\tjkl\tmno};
      > print for split m{[^,]\t}, $txt;
      > print q{-} x 20;
      > print for split m{(?<!,)\t}, $txt;'
      ab
      def,    gh
      jk
      mno
      --------------------
      abc
      def,    ghi
      jkl
      mno
      $

      I hope this is of interest.

      Cheers,

      JohnGG

Re: The best way to split tab delimited file
by gmargo (Hermit) on Nov 16, 2009 at 15:35 UTC

    Perfect opportunity for a negative lookbehind.

    my @vareach=split(/(?<!,)\t/,$var);
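    For context, here is a minimal sketch of how that one-line change might fit into the original loop. The sample record and the in-memory filehandle are assumptions made so the example is self-contained; the real script would open sample.txt instead. It also adds chomp, which the original code did not have, so the last field does not carry a trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample record: the tab after "aliasname," belongs to the
# field and should not act as a separator.
my $data = "data\tname\taliasname,\talsoknown\tclass\ttype\n";

# Read from an in-memory filehandle to keep the sketch self-contained.
# In the real script this would be: open(my $fh, '<', 'sample.txt')
open(my $fh, '<', \$data) or die "can't open in-memory data: $!";

my @rows;
while (my $line = <$fh>) {
    chomp $line;                            # drop the trailing newline
    my @fields = split /(?<!,)\t/, $line;   # tabs NOT preceded by a comma
    push @rows, \@fields;                   # keep the fields for later use
    print "$_\n" for @fields;               # one field per line
}
close $fh;
```

    With this input the record yields five fields, and the third one still contains its embedded tab. Note that this only works as long as a comma never legitimately ends a field.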
      Below is the data where the regex is not working:
      my $var='474627 asidase ta sidase ala,"lpha-D- ctoside gtohydrolase +","razyme","arazyme (enz Corp)","Melie","lagal","idase bta", + rug 00103';

        There are no tab characters in that line, presumably due to a cut/paste issue. Can you try again to get the tabs in there? And can you also show the output you expect vs. the output you are getting?
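        One way to check whether the tabs survived, and to post data with the tabs visible, is to count them and print them as a literal \t. This is only a sketch; the record below is a hypothetical stand-in, not the poster's real data.

```perl
use strict;
use warnings;

# Hypothetical record; the real value would come from the input file.
my $var = "474627\tasidase\tala,\tsidase";

# tr/// with an empty replacement list counts matches without
# modifying the string.
my $tab_count = ($var =~ tr/\t//);

# Copy the string, then replace each tab in the copy with a visible
# backslash-t so it survives cut and paste.
(my $visible = $var) =~ s/\t/\\t/g;

print "tabs: $tab_count\n";
print "$visible\n";
```

        If the count comes back as zero, the data never contained tabs at that point, and the split pattern is not the problem.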

Re: The best way to split tab delimited file
by keszler (Priest) on Nov 16, 2009 at 15:41 UTC
    split uses a regex, not a string. (It is accepting your tab-in-a-string as a regex, but IMHO it shouldn't.)

    Given that you can use a regex, it looks like in this particular case you want to split on tabs, unless the tab is preceded by a comma. There are several ways to construct such a regex. A couple off the top of my head:

    • Negated character class (not a comma) preceding the tab
    • Negative lookbehind for comma

    See perlretut for more ideas.

    update - too slow. Those two ideas were already mentioned.
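    The point about split taking a pattern rather than a string can be illustrated with a short sketch. The record and variable names here are made up for the example. A string argument is compiled as a regex, so "\t" happens to behave like /\t/, but a string containing a regex metacharacter such as "|" does not split on that literal character.

```perl
use strict;
use warnings;

# Hypothetical record containing both tabs and a pipe.
my $record = "a\tb|c\td";

# A string first argument is treated as a pattern, so these two agree.
my @by_string = split "\t", $record;
my @by_regex  = split /\t/, $record;

# "|" as a pattern is an empty alternation, which matches the empty
# string, so the record is split between every character.
my @by_bare_pipe = split "|", $record;

# Escaping the pipe splits on the literal character.
my @by_escaped_pipe = split /\|/, $record;

print scalar(@by_string), " ", scalar(@by_regex), "\n";
print scalar(@by_bare_pipe), " ", scalar(@by_escaped_pipe), "\n";
```

    Writing the separator as a regex literal from the start makes it harder to be surprised by this.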

Re: The best way to split tab delimited file
by Anonymous Monk on Nov 17, 2009 at 23:37 UTC
