PerlMonks  

Re: CSV file

by Moron (Curate)
on Oct 31, 2005 at 15:13 UTC ( [id://504285]=note )


in reply to CSV file

Or say it in regexp:
while ( <FILE> ) {
    if ( /^([^\,]*\,\s*[^\,]*)\,\s*(.*)$/ ) {
        print "Name: $1\n";
        print "Address: $2\n";
    }
}
To explain the regexp: the leading ^ matches the beginning of the line. [^\,]* matches the next contiguous run of non-comma characters, and \,\s* steps over the comma that follows and any immediately trailing whitespace. (Update: the first capture group now spans two such runs with the comma between them, to allow for the fact that there is a comma in the name field, so $1 holds everything up to the second comma.) A further \,\s* steps over the second comma, the (.*) picks up everything thereafter and places it in $2, and the final $ matches the end of the line.
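As a minimal sketch of what the regexp captures, the snippet below wraps it in a helper and runs it on one line. The helper name split_name_address and the sample line are made up for illustration; the original poster's data is not reproduced here. The backslashes before the commas are dropped because a comma needs no escaping in a Perl regexp.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the regexp from the post above.
# Returns (name, address) on a match, or an empty list otherwise.
sub split_name_address {
    my ($line) = @_;
    chomp $line;
    if ( $line =~ /^([^,]*,\s*[^,]*),\s*(.*)$/ ) {
        return ( $1, $2 );
    }
    return;
}

# Made-up sample line, assumed to have a comma inside the name field.
my ( $name, $address ) = split_name_address("Smith, John, 12 High Street\n");
print "Name: $name\n";          # Name: Smith, John
print "Address: $address\n";    # Address: 12 High Street
```

A line with fewer than two commas does not match at all, so the helper returns an empty list for it.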

-M

Free your mind

Replies are listed 'Best First'.
Re^2: CSV file
by integral (Hermit) on Oct 31, 2005 at 15:20 UTC

    ([^\,]) only matches a single non-comma character. It is missing a quantifier such as + or *.

    Every time I see (.*) I feel very uneasy, and usually I can think of a much simpler way to accomplish the same thing. In this case, all the regular expression is really doing is finding the first comma! Now, split normally splits on every delimiter it finds in a string, but it takes an optional third argument that limits how many fields it returns; in this case we want just two. As a bonus we can give the split fields names so we don't have to use $1 and $2.

    while (<FILE>) {
        chomp;
        my ($name, $address) = split /,\s+/, $_, 2;
        print "$name lives at $address\n";
    }
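To see what the third argument to split changes, here is a small sketch comparing an unlimited split with a limit of 2. The sample line is invented for illustration and is not the original poster's data.

```perl
use strict;
use warnings;

# Made-up sample line, for illustration only.
my $line = "Smith, John, 12 High Street";

# No limit: split on every comma-plus-whitespace delimiter.
my @all = split /,\s+/, $line;

# Limit of 2: stop after the first delimiter; the rest stays in field two.
my @two = split /,\s+/, $line, 2;

print scalar(@all), " fields: ", join( ' | ', @all ), "\n";
print scalar(@two), " fields: ", join( ' | ', @two ), "\n";
```

With a limit of 2 the line is cut at the first comma only, so if the name field itself contains a comma, part of the name ends up in the second field.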

    Another thing that would make me think twice about your solution is that you've got an if statement, but there's no sensible behaviour defined for when the if is false! Although I have to say that using a proper CSV module (Text::CSV, etc) is always preferable.

    --
    integral, resident of freenode's #perl
    
      I spotted the lack of * shortly after posting.

      re .*: having eliminated the first column and its delimiter, .* must be what is left over as the second and last column, so why make the engine work harder? It really is quite common that the remainder of a string is defined by eliminating specific features of the preceding characters and needs no further analysis. A .* only causes a problem when it is attempted before that elimination is complete.

      re "if" - I opted for this rather than allowing any exceptions to just fall through the code. No else is defined because the requirement stated no provision for such exceptions. Not seeing the benefit of this turns out to be a trap for habitual negative thinking: if there were a blank line in the data, your own solution, which apart from not filtering exceptions is fine, would accept the blank data and, for example, enter it into the database!
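One way to combine the two positions is to keep the split but reject lines that do not yield both fields. This is only a sketch: the helper name parse_pair and the sample lines are hypothetical, and what to do with a rejected line is left as a warning because the thread does not state a requirement for it.

```perl
use strict;
use warnings;

# Hypothetical helper: split into two fields and return an empty list
# for lines that do not yield both a name and an address.
sub parse_pair {
    my ($line) = @_;
    chomp $line;
    my ( $name, $address ) = split /,\s+/, $line, 2;
    return unless defined $name && defined $address;
    return ( $name, $address );
}

# Made-up input: one good line, one blank line, one line with no delimiter.
for my $line ( "Smith, 12 High Street\n", "\n", "no delimiter here\n" ) {
    if ( my ( $name, $address ) = parse_pair($line) ) {
        print "$name lives at $address\n";
    }
    else {
        warn "skipping malformed line\n";
    }
}
```

Splitting an empty string returns an empty list, so the blank line leaves both variables undefined and is skipped rather than passed on.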

      -M

      Free your mind

Re^2: CSV file
by tirwhan (Abbot) on Oct 31, 2005 at 15:50 UTC

    That regex doesn't even work on the OP's data, and neither will it correctly split strings which contain delimiters within quotes. Read the top node again and you'll probably realise that a regular expression is not the right solution here.


    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
      I missed the fact that there is a comma in the name field too, and modified the regular expression accordingly. Someone more handy with regexps could probably factorise this more succinctly, however.

      But if we analyse the logic behind your argument we get "an error in analysing the requirement implies that all technical solutions resembling it must be wrong" - which I would hope reveals why I disagree with the argument, while remaining open-minded about variety of possible solutions.

      -M

      Free your mind

        Your regex still doesn't do what the OP is asking for, namely to split the line into the comma-delimited fields, taking care not to split on the comma if it is within quotes. Nor will it handle the given data correctly (tip: there's a telephone number which is not part of the address). Do try to read what other people have written more carefully, especially if it has been explicitly pointed out to you.

        As for your other remark, this is an area where it's unwise to use regular expressions because it's rather complicated to come up with a robust solution (take a look at Mastering Regular Expressions, it handles this subject in some detail) and there are perfectly well-developed solutions for parsing CSV files in CPAN, which have already been pointed out by others.
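For comparison, a minimal sketch of the module-based approach, assuming the Text::CSV module from CPAN is installed. The helper name parse_csv_line and the sample line (a quoted name containing a comma, an address, and a telephone number) are invented for illustration and are not the original poster's data.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, assumed to be installed

my $csv = Text::CSV->new( { binary => 1 } )
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# Hypothetical helper: return the fields of one CSV line,
# or an empty list if the line cannot be parsed.
sub parse_csv_line {
    my ($line) = @_;
    return $csv->parse($line) ? $csv->fields() : ();
}

# Made-up sample line with a comma inside a quoted field.
my $line = '"Smith, John","12 High Street",555-0100';

my @fields = parse_csv_line($line);
if (@fields) {
    print "$_\n" for @fields;
}
else {
    warn "Could not parse line: " . $csv->error_diag() . "\n";
}
```

The comma inside the quoted name stays part of the first field, which is the case the regexp and the plain split do not cover.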


        Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
