by tobyink (Abbot)
on Sep 26, 2012 at 12:31 UTC


Huh? Tab, the character which was included in ASCII specifically for aligning tabular data, is a "horrific" way of representing tabular data?

Personally I find tab-delimited files to be very easy to deal with. With CSV data, the fields will often contain commas (address fields; some date formats and numeric formats), necessitating ways of "escaping" commas which vary between software packages. It is quite common to be in situations where you know that the fields themselves cannot contain \t or \n; and in those cases tab-delimited data is a joy to work with. Want to slurp your data into a multi-dimensional array?

my @data = map { chomp; [ split /\t/, $_, -1 ] } <$fh>;
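
Spelled out with an explicit loop, the same slurp might look like the sketch below. It assumes, as above, that no field contains a tab or a newline. The `parse_tsv` name and the sample data are made up for illustration. Each row is wrapped in an array reference so that the result is genuinely two-dimensional, and the -1 limit keeps `split` from discarding trailing empty fields.

```perl
use strict;
use warnings;

# Hypothetical helper: parse tab-delimited text from a filehandle
# into a list of array references, one per row.
sub parse_tsv {
    my ($fh) = @_;
    my @rows;
    while ( my $line = <$fh> ) {
        chomp $line;
        # The -1 limit stops split from dropping trailing empty fields.
        push @rows, [ split /\t/, $line, -1 ];
    }
    return @rows;
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = "name\tcity\tnote\nAlice\tLeeds\t\nBob\tYork\thas, comma\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @data = parse_tsv($fh);
close $fh;

print scalar(@data), " rows\n";
print "row 2, field 3: '$data[2][2]'\n";
```

Note that the comma inside the last field needs no escaping at all, which is the convenience being argued for here.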

I'd suggest that if your text editor makes distinguishing between tabs and spaces difficult, then you should investigate other text editors.

(Aside: yes, there are some very good CSV parsers for Perl which abstract away the nits when dealing with CSV. When you have to work with CSV in other programming languages you appreciate what a good job they do.)

perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

by Marshall (Abbot) on Sep 28, 2012 at 05:49 UTC
    The problem with a tab-delimited file is that the tabs are hard to see in a normal text editor. Is that a space followed by a tab, two spaces, or something else?

    So the basic problem is that tabs are not easily "visible". My programming editor also converts "tabs" to "spaces" when I write a program file. No program file that I work with has tab characters in it. When I "save it" all the tabs disappear.

    There is not a "standard" for the number of spaces for a tab character. In the "olden days", this made a difference because it saved disk space. This makes no difference now. Or in a practical sense, the space saving makes no difference. And it is "hard to read" the output.

    Many of the DB output formats that I work with use "|" as the field separator. That is not a valid character for a name or an address. This works well for many types of DB fields that you might want to import/export and you can just use a simple split() for input. Perl has a number of .CSV parsers and they do work very, very well. That is another option.
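
A minimal sketch of the pipe-separated approach, assuming (as stated above) that no field can itself contain a "|". The record is made up for illustration. One detail worth showing: the pipe has to be escaped in the pattern, because an unescaped `|` is regex alternation.

```perl
use strict;
use warnings;

# Made-up record; assumes no field can itself contain a "|".
my $line = 'Smith|12 High St, Leeds||UK';

# The pipe must be escaped: unescaped, /|/ is an empty alternation
# that matches the empty string and splits the line into single characters.
# The -1 limit keeps trailing empty fields.
my @fields = split /\|/, $line, -1;

print scalar(@fields), " fields\n";
print "address: $fields[1]\n";
```

The empty third field survives as an empty string, and the comma in the address passes through untouched.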

    This tab idea is a problem because it is hard to see! Yes, I can deal with it and I can set editor settings to allow me to see the difference between 2 spaces versus one space and tab, but this is a hassle.

      In many situations it's hard to tell the difference between lower-case L and the number one, or upper-case O and naught (the proximity of the latter pair on the keyboard makes this a particularly dangerous issue). But I don't eschew those characters; I choose fonts that make it easier to distinguish between them, and my text-editor's syntax highlighting will often (though not always) catch the difference. The tools can save you if you let them.

      Similarly my text editor has an easy toggle (Ctrl+Shift+A) which can be done with one hand (almost with one finger on this laptop keyboard!) to show or hide whitespace characters (and Ctrl+Shift+D does line break characters) when I need to do a quick visual check.

      But for the most part, when working on files that I've authored, I don't need to visually check because I know which characters will be tabs, and which will be spaces. In source code, the indents will all be tabs, and everywhere else will be spaces.

      'There is not a "standard" for the number of spaces for a tab character.'

      Indeed; that's kind of the point of them. You can set tab stops to whatever is most convenient for you. I like to use 3-column tab stops; other people might prefer 2, 4 or 8. If we all use a single tab character per indent level in source code, then we can all work on the same source code and see it with our preferred indentation.

      "Or in a practical sense, the space saving makes no difference."

      Indeed; if I were using tabs as a compression mechanism, I'd be an idiot. (Bzip2 works much better.) But that's not what I use them for; I use them because they make more sense in certain contexts (delimiting fields; indenting source code) than space characters. If there were a filesize penalty for using tabs, I'd continue to use them.

        There is a disagreement of opinion here - not any disagreement on the facts of the situation.

        OK, there is more than one way to do it. I think that's fine.

        I personally prefer a fixed-width font and no tabs within code. My normal program editor actually converts tabs to the appropriate number of spaces when I save the code to a file. I indent the code the way I want. When I work in MS Visual Studio, it doesn't do that and I find it annoying - sometimes I want to take an MS .C file and use it on a Unix system, and then we get into this "how many spaces does a tab mean?" thing. You see it as a plus. I see it as a hassle.

        So I guess mileage varies. I have personally found the "|" (pipe character) to be a good field separator in many circumstances. When that doesn't work, I go to full-blown CSV with all the complications that involves. But there are some very good Perl modules that can parse it, albeit more slowly than a simple split() or a global match.
