

by Marshall (Abbot)
on Sep 26, 2012 at 11:40 UTC (#995749)


A tab delimited file is one of the most horrific file formats that I could imagine. I cannot think of anything worse. It is awkward enough that I won't even try to generate a tab delimited file here, because my text editor just doesn't like to do that. But if you absolutely have to read one, the idea is shown below...

NEVER, NEVER EVER use a tab delimited file yourself - this is nasty stuff!
Even with a fixed-width font, you cannot tell just by looking at the lines whether a gap is all spaces or whether there is a tab character in it!

#!/usr/bin/perl -w
use strict;

open (IN, '<', "tab_file.txt") or die "$!";

while (<IN>)
{
    my ($first_token) = split (/\s/, $_); #should be(/\t/, $_)
    print $first_token,"\n";
}
__END__
123
876
37897654

tab_file.txt: (not really tabs)...
123 aBVXC SAOMEWTRINOGN ABC
876 AsrdaDS some_bs 564
37897654 aofruafdouf abc

When confronted with a tab delimited file, I would think about s/\t/|/g; or the tr equivalent! The '|' character is just a FAR, FAR better field delimiter than a tab. Many database export formats use it. Second choice would be a CSV format. A tab delimited file just has all things bad going for it - sorry if you have to deal with one of these things. Don't make one yourself!
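
A minimal sketch of that substitution, assuming none of the fields already contains a '|' (the sub name tabs_to_pipes is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace every tab in a line with '|'.
# Assumption: no field already contains a '|' character.
sub tabs_to_pipes {
    my ($line) = @_;
    (my $copy = $line) =~ tr/\t/|/;    # same effect as s/\t/|/g
    return $copy;
}

print tabs_to_pipes("123\taBVXC\tABC\n");    # prints 123|aBVXC|ABC
```

If a field can contain '|', this conversion loses information, so it is worth checking the data first.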

by tobyink (Abbot) on Sep 26, 2012 at 12:31 UTC

    Huh? Tab, the character which was included in ASCII specifically for aligning tabular data is a "horrific" way of representing tabular data?

    Personally I find tab-delimited files to be very easy to deal with. With CSV data, the fields will often contain commas (address fields; some date formats and numeric formats), necessitating ways of "escaping" commas which vary between software packages. It is quite common to be in situations where you know that the fields themselves cannot contain \t or \n; and in those cases tab delimited data is a joy to work with. Want to slurp your data into a multi-dimensional array?

    my @data = map { chomp; [ split /\t/ ] } <$fh>;
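
    For a two-dimensional result, each row has to be stored as its own array reference. A slightly longer sketch of the same idea, assuming no field contains a tab or a newline; the sub name read_tsv and the in-memory filehandle are only there for illustration:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: read tab-delimited rows from a filehandle
    # into a list of array references (one reference per row).
    sub read_tsv {
        my ($fh) = @_;
        my @rows;
        while ( my $line = <$fh> ) {
            chomp $line;
            # a limit of -1 keeps trailing empty fields
            push @rows, [ split /\t/, $line, -1 ];
        }
        return @rows;
    }

    # In-memory filehandle standing in for a real file.
    my $text = "123\taBVXC\tABC\n876\tAsrdaDS\t564\n";
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @data = read_tsv($fh);
    close $fh;

    print $data[1][0], "\n";    # first field of the second row: 876
    ```

    Without the -1 limit, split silently drops empty fields at the end of a row, which matters when the last column is optional.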

    I'd suggest that if your text editor makes distinguishing between tabs and spaces difficult, then you should investigate other text editors.

    (Aside: yes, there are some very good CSV parsers for Perl which abstract away the nits when dealing with CSV. When you have to work with CSV in other programming languages you appreciate what a good job they do.)

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      The problem with a tab delimited file is that the tabs are hard to see in a normal text editor. Is that gap a space followed by a tab, just a tab, or whatever?

      So the basic problem is that tabs are not easily "visible". My programming editor also converts "tabs" to "spaces" when I write a program file. No program file that I work with has tab characters in it. When I "save it" all the tabs disappear.

      There is not a "standard" for the number of spaces for a tab character. In the "olden days", this made a difference because it saved disk space. This makes no difference now. Or in a practical sense, the space saving makes no difference. And it is "hard to read" the output.

      Many of the DB output formats that I work with use "|" as the field separator. That is not a valid character for a name or an address. This works well for many types of DB fields that you might want to import/export and you can just use a simple split() for input. Perl has a number of .CSV parsers and they do work very, very well. That is another option.
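
      One detail to watch with the simple split(): '|' is a regex metacharacter, so it has to be escaped. A sketch with made-up data (the sub name split_pipe_record is hypothetical):

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: split one pipe-delimited record into fields.
      sub split_pipe_record {
          my ($line) = @_;
          chomp $line;
          # '|' means alternation in a regex, so it is escaped here;
          # a limit of -1 keeps trailing empty fields
          return split /\|/, $line, -1;
      }

      my @fields = split_pipe_record("Smith|12 Main St, Apt 4|\n");
      print scalar(@fields), " fields\n";    # prints "3 fields"
      ```

      An unescaped split /|/ matches the empty string and breaks the record into single characters instead.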

      This tab idea is a problem because it is hard to see! Yes, I can deal with it and I can set editor settings to allow me to see the difference between 2 spaces versus one space and tab, but this is a hassle.

        In many situations it's hard to tell the difference between lower-case L and the number one, or upper-case O and naught (the proximity of the latter pair on the keyboard makes this a particularly dangerous issue). But I don't eschew those characters; I choose fonts that make it easier to distinguish between them, and my text-editor's syntax highlighting will often (though not always) catch the difference. The tools can save you if you let them.

        Similarly my text editor has an easy toggle (Ctrl+Shift+A) which can be done with one hand (almost with one finger on this laptop keyboard!) to show or hide whitespace characters (and Ctrl+Shift+D does line break characters) when I need to do a quick visual check.
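
        When no such toggle is at hand, a few lines of Perl can do the same visual check. A sketch only; the sub name show_whitespace and the choice of markers are arbitrary:

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: make tabs and spaces visible in one line.
        sub show_whitespace {
            my ($line) = @_;
            chomp( my $copy = $line );
            $copy =~ s/\t/\\t/g;    # a tab becomes the two characters \t
            $copy =~ s/ /./g;       # a space becomes a dot
            return $copy;
        }

        print show_whitespace("123 \taBVXC\n"), "\n";    # prints 123.\taBVXC
        ```

        The markers are ambiguous if the data itself contains dots or a literal backslash followed by t, so this is for eyeballing a file, not for round-tripping it.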

        But for the most part, when working on files that I've authored, I don't need to visually check because I know which characters will be tabs, and which will be spaces. In source code, the indents will all be tabs, and everywhere else will be spaces.

        'There is not a "standard" for the number of spaces for a tab character.'

        Indeed; that's kind of the point of them. You can set tab stops to whatever is most convenient for you. I like to use 3 column tab stops; other people might prefer 2, 4 or 8. If we all use a single tab character to indent source code, then we can all work on the same source code and see it with our preferred indentation.

        "Or in a practical sense, the space saving makes no difference."

        Indeed; if I were using tabs as a compression mechanism, I'd be an idiot. (Bzip2 works much better.) But that's not what I use them for; I use them because they make more sense in certain contexts (delimiting fields; indenting source code) than space characters. If there were a filesize penalty for using tabs, I'd continue to use them.

