PerlMonks  

good delimiter in data for Perl

by sathiya.sw (Monk)
on Jan 07, 2009 at 13:11 UTC ( [id://734629]=perlquestion )

sathiya.sw has asked for the wisdom of the Perl Monks concerning the following question:

What is a good delimiter for Perl?

In other languages, when parsing data we can assume that a sequence such as $#$ will not appear in the data, and use it as the delimiter. But in Perl, $ marks a scalar variable, so what is a good delimiter here?

Sathiyamoorthy

Replies are listed 'Best First'.
Re: good delimiter in data for Perl
by JavaFan (Canon) on Jan 07, 2009 at 13:28 UTC
    What is a good delimiter for Perl?
    That's not an interesting question. You shouldn't choose your delimiter based on the language of the program(s) that process the data, but on the data itself.

    When I have to pick a delimiter, the questions I ask are:

    1. Do I want to look at the data myself?
    2. Is the data generated or edited by humans?
    3. Are there characters (or short sequences of characters) that cannot occur in the data?
    4. If the answer to the previous question is 'no', what delimiter is expected to be rare? (I also have to start thinking about an escape sequence.)
    5. Do I want the delimiter to be surrounded by optional whitespace? (If the answer to either of the first two questions is 'yes', the answer to this one tends towards 'yes'; if the data can have fields with leading or trailing whitespace, the answer tends towards 'no'.)
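
To make questions 4 and 5 concrete, here is a minimal sketch of one possible answer: a rare delimiter with an escape sequence and optional surrounding whitespace. The choice of '|' as delimiter, backslash as escape character, and the helper name split_escaped are assumptions for illustration, not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split on '|', treating '\' as an escape character
# so that '\|' yields a literal pipe inside a field.
sub split_escaped {
    my ($line) = @_;
    my @fields  = ('');
    my $escaped = 0;
    for my $ch (split //, $line) {
        if    ($escaped)      { $fields[-1] .= $ch; $escaped = 0 }
        elsif ($ch eq '\\')   { $escaped = 1 }
        elsif ($ch eq '|')    { push @fields, '' }
        else                  { $fields[-1] .= $ch }
    }
    # Optional whitespace around the delimiter is dropped. This also trims
    # whitespace that belongs to a field, which is why question 5 matters.
    s/^\s+|\s+$//g for @fields;
    return @fields;
}

my @parts = split_escaped('alice | a\|b | 42');
print join(' / ', @parts), "\n";    # alice / a|b / 42
```

If fields may legitimately start or end with whitespace, the trimming step would have to go, which is the trade-off question 5 points at.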
Re: good delimiter in data for Perl
by Corion (Patriarch) on Jan 07, 2009 at 13:16 UTC

    You seem to be confused about the distinction between data and code. Perl has no problem with using a record or field separator of $#$. Personally, I prefer the tab character (chr 9) as a universal field separator. Why do you think that Perl has a problem with using $#$ as a separator?

      Perl has no problem doing it. But when I use $#$ as the delimiter, I need to escape it so that it is not interpreted as a variable.

      My data can also contain tabs, so my question was: what can I use as a delimiter?
      Sathiyamoorthy

        Why not store the delimiter in a variable and then use that variable?

        my $delimiter = '$#$';
        my @columns = split /\Q$delimiter\E/, $data;

        So, again, I'm not sure why you see this as a problem, as Perl provides many convenient escaping mechanisms, like single quotes and quotemeta.
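
A minimal runnable sketch of the two escaping mechanisms mentioned above, single quotes and quotemeta. The sample record in $data is made up for illustration.

```perl
use strict;
use warnings;

# Single quotes keep '$#$' literal: nothing is interpolated here.
my $delimiter = '$#$';

# Hypothetical sample record, not taken from the original post.
my $data = 'alpha$#$beta$#$gamma';

# \Q...\E escapes regex metacharacters in the interpolated value.
my @columns = split /\Q$delimiter\E/, $data;

# quotemeta performs the same escaping as an explicit function call.
my $pattern = quotemeta $delimiter;
my @same    = split /$pattern/, $data;

print join(' | ', @columns), "\n";    # alpha | beta | gamma
```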

Re: good delimiter in data for Perl
by zwon (Abbot) on Jan 07, 2009 at 13:16 UTC

    Why not use a standard format like CSV?

      No, because my data itself can contain the separators that CSV uses, so the separators may clash and the data will be parsed wrongly!
      Sathiyamoorthy
        Do you really think CSV could be so popular if that weren't a solved problem?

        In CSV, there's a way to protect separators occurring inside data fields.

        CSV has standard ways to handle special characters in data:
        1,2,3,4 is (1, 2, 3, 4)
        1,"2,2,2",3,4 is (1, '2,2,2', 3, 4)
        1,"2,""2"",2",3,4 is (1, '2,"2",2', 3, 4)
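
The quoting convention described here (wrap a field in double quotes when it contains the separator, and double any embedded quote) can be sketched on the writing side as follows. The helper name csv_quote is hypothetical, and this is only an illustration of the rule, not a replacement for a real CSV module.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one field using the common CSV convention.
sub csv_quote {
    my ($field) = @_;
    # Fields without quotes, commas, or line breaks are written as-is.
    return $field unless $field =~ /[",\r\n]/;
    # Double each embedded quote, then wrap the field in quotes.
    (my $quoted = $field) =~ s/"/""/g;
    return qq{"$quoted"};
}

my @record = (1, '2,"2",2', 3, 4);
my $line   = join ',', map { csv_quote($_) } @record;
print "$line\n";    # 1,"2,""2"",2",3,4
```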
Re: good delimiter in data for Perl
by Tanktalus (Canon) on Jan 07, 2009 at 16:22 UTC

    Just use Text::CSV_XS and be done with it. Parsing strings with arbitrary delimiters which may or may not be present in the actual data isn't exactly a hard problem, but there are too many special cases to concern yourself with that end up making a simple split on a regex painful. Seeing as this is a common problem, someone has taken the time to put it in a module so that we don't need to continue to worry about those special cases and, instead, concentrate on the real problem at hand.
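
As a rough sketch of what that looks like in practice, assuming Text::CSV_XS is installed from CPAN (it is not part of core Perl), a line with an embedded comma and embedded quotes parses like this. The sample line is made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV_XS;    # CPAN module, must be installed separately

# binary => 1 lets fields contain arbitrary bytes, including newlines.
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot use Text::CSV_XS: " . Text::CSV_XS->error_diag;

# Hypothetical input: the second field contains commas and quotes.
my $line = '1,"2,""2"",2",3,4';

$csv->parse($line) or die "Parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

print scalar(@fields), " fields; second is: $fields[1]\n";
# 4 fields; second is: 2,"2",2
```

A different separator can be selected with the sep_char attribute of the constructor if commas are not wanted.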

    In fact, I would suggest using DBD::CSV instead, as it takes care of yet another layer or three of abstraction so that the data in your table really gets treated like data, again, to allow you to concentrate more on business logic than data logic.

    Of course, from there, it's a relatively small step to put the data in a real database (whether SQLite, MySQL, DB2, Oracle, or whatever) and gain further abstraction, speed, and utility. This is my goal for nearly all my data. I can't always get there, but I do try to design my data so that it COULD get to a database, which includes using DBD::* whenever possible.

Re: good delimiter in data for Perl
by Lawliet (Curate) on Jan 07, 2009 at 19:44 UTC

    The delimiter should depend on the data. If you know you are not using tabs or commas, use them for delimiters. If you know you are not going to use the string "KJBLDGL45W89Y0SDFAHKSDAFPOIU3WE8Y9", use that as a delimiter. Or maybe use '<(-^.^-)>' x 100; -- you get the idea.
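
One way to act on "the delimiter should depend on the data" is to check candidates against the actual fields before writing them out. The following is a minimal sketch; the helper name pick_delimiter, the candidate list, and the sample fields are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first candidate that occurs in none
# of the fields, or nothing if every candidate clashes with the data.
sub pick_delimiter {
    my ($fields, @candidates) = @_;
    for my $candidate (@candidates) {
        return $candidate
            unless grep { index($_, $candidate) >= 0 } @$fields;
    }
    return;
}

my @fields     = ("tab\there", 'a,b', 'plain');
my @candidates = ("\t", ',', '|', '$#$');

my $delimiter = pick_delimiter(\@fields, @candidates);
die "No safe delimiter among the candidates\n" unless defined $delimiter;

# Join with the chosen delimiter and split again to confirm the round trip.
my $record = join $delimiter, @fields;
my @back   = split /\Q$delimiter\E/, $record, -1;

print "Chose '$delimiter'; got back ", scalar(@back), " fields\n";
# Chose '|'; got back 3 fields
```

This only works when all the data is known before the delimiter is chosen; for data that arrives later, an escaping or quoting scheme is still needed.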

    I think we would need more information about the data you are trying to delimit in order to suggest the best delimiter.

    Oh, and, just to cover some more bases, if you are searching for a delimiter for 'Perl', I would go with '' to be safe ;) (assuming you want each character).

    And you didn't even know bears could type.
