PerlMonks  

Split function

by Anonymous Monk
on Dec 03, 2012 at 11:32 UTC ( [id://1006854]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am reading a text file, and my code looks like this:

    open FH, "$INPUT_DIR/$input_file" or die "Couldn't Open File: $!";
    while ( <FH> ) {
        chomp;
        my ($s, $a, $c, $r) = (split /[, \t]/, $_);
        # ... rest of the loop body
    }

The split function now processes comma- and tab-delimited input. Input file:

    process:
    clientserver,00001,AIT,SOURCE
    clientserver 00001 AIT SOURCE
    error:
    clientserve|00001|AIT|SOURCE

split should die if it finds a pipe, and it should process the line if it is comma- or tab-delimited.

Re: Split function
by rjt (Curate) on Dec 03, 2012 at 12:07 UTC

    I believe this will do what you are looking for:

        while (<DATA>) {
            chomp;
            my ($s, $a, $c, $r) = split /[,\t]/;
            die "Invalid string: $_" if !defined $r;
            print "Processing: $_\n";
        }
        __DATA__
        clientserver,00001,AIT,SOURCE
        clientserve|00001|AIT|SOURCE

    Output:

        Processing: clientserver,00001,AIT,SOURCE
        Invalid string: clientserve|00001|AIT|SOURCE at 1006854.pl line 6, <DATA> line 2.

    Just change the die and print lines to do what you actually need.

      Hi all, my requirement is: I need to process a text file which is comma/tab delimited. Example input file: ABC,DEF,GHI,JKL. The code my ($a,$b,$c,$d) = split(/[,\t]/, $_); will process this text file. If the input file instead contains: ABC|DEF|GHU|IJK, the same code my ($a,$b,$c,$d) = split(/[,\t]/, $_); should die.

        This evil goes against my sense of sane coding, but it seems to be what you're asking for:

            use strict;
            use warnings;

            while( my $line = <DATA> ) {
                chomp $line;
                my( $a, $b, $c, $d ) = split /(?(?=^[^|]*\|)(?{die "Pipe [|] detected in input."})|)[,\t]/, $line;
                print "[($a)($b)($c)($d)]\n";
            }

            __DATA__
            ABC,DEF,GHI,JKL
            ABC|DEF|GHI|JKL

        This throws an exception from within the regex passed to split if the input string contains a pipe character. I wouldn't recommend bringing that to a code review, but given that none of the other solutions already provided seem to satisfy you, I am thinking that you'll only be happy when an exception is thrown as part of the split line. Despite the hackish nature of the code, it produces what you're requesting. Here's the output:

            [(ABC)(DEF)(GHI)(JKL)]
            Pipe [|] detected in input. at (re_eval 1) line 1, <DATA> line 2.

        It would be a lot better to just follow the advice of bart's post, or Colonel_Panic's post, in this same thread. And if neither of those posts does what you need, rather than just repeating your question, explain exactly how their code fails to meet your needs. I find it hard to believe that your requirement is for the exact line containing the split to throw an exception. It seems a lot more reasonable to just ensure that an exception is thrown once split fails to produce reasonable output, or possibly to pre-screen the line of text and throw before you split if a pipe character is found.

        Update: Just for fun, an explanation of the regex:

        (?(condition)true_regex|false_regex) creates a conditional. For our condition, we use a zero-width lookahead assertion, (?=^[^|]*\|), that detects whether a pipe character is found anywhere in the string. If that condition is satisfied, the "true_regex" gets tested. The "true_regex" that we use is a (?{code}) construct, which is used (or abused) to execute Perl code from within a regular expression. The code (abuse) we execute is the die statement. For our "false_regex", we use an empty expression, which will not affect the rest of the split match. The remainder of the regex is just what we would normally pass to split.
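        To see the conditional construct on its own, without the die, here is a small sketch. leading_token is a hypothetical helper made up for illustration: the lookahead (?=\d) is the condition, \d+ is the "true_regex" and [a-z]+ is the "false_regex".

```perl
use strict;
use warnings;

# Hypothetical helper demonstrating (?(condition)yes|no) with a lookahead
# condition: if the next character is a digit, the "yes" branch (\d+) must
# match; otherwise the "no" branch ([a-z]+) must match.
sub leading_token {
    my ($str) = @_;
    return $str =~ /^((?(?=\d)\d+|[a-z]+))/ ? $1 : undef;
}

for my $s ('123abc', 'abc123', '!oops') {
    my $tok = leading_token($s);
    print "$s => ", (defined $tok ? $tok : '(no match)'), "\n";
}
# prints:
# 123abc => 123
# abc123 => abc
# !oops => (no match)
```

        Swapping the "yes" branch for a (?{ die ... }) block gives the trick used above; the construct itself is ordinary Perl regex syntax.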


        Dave

Re: Split function
by afoken (Chancellor) on Dec 03, 2012 at 17:14 UTC
    i am reading a text file

    Hmm, I think you are reading a CSV file, not just a text file. Unless it's for learning Perl, consider using Text::CSV_XS (or the slightly slower pure-Perl version, Text::CSV) instead. Text::CSV_XS handles all of the ugly edge cases that a simple split can't: embedded quotes, embedded separator characters, and quoted values, to name just a few.
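    A minimal core-Perl sketch of one such edge case, assuming a record whose second field is quoted because it contains a comma. A CSV-aware parser such as Text::CSV_XS would return three fields (ABC, DEF,GHI and JKL); a plain split returns four.

```perl
use strict;
use warnings;

# A CSV record whose second field is quoted because it contains a comma.
my $record = 'ABC,"DEF,GHI",JKL';

# A plain split knows nothing about quoting, so it cuts inside the quotes.
my @naive = split /,/, $record;

print scalar(@naive), " fields: ", join(' | ', @naive), "\n";
# prints: 4 fields: ABC | "DEF | GHI" | JKL
```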

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Text::CSV uses Text::CSV_XS if it is available for the platform. It seems to me that using CSV_XS directly means the code won't work on a platform without XS capabilities whereas it would have worked if Text::CSV had been used. From reading the pod I don't see any advantage to using Text::CSV_XS directly. Have I missed something?


        Probably not. I think it's just a problem of the timeline, or an old habit.

        According to CPAN, Text::CSV 0.01 was released on 1997-Jul-31, followed by 1.00 on 2007-Nov-27, more than 10 years later. Text::CSV_XS 0.16 was released 1999-Feb-11, followed by several releases up to 0.23 released 2001-Oct-09. During that time, Text::CSV did not change at all. In 2007, both Text::CSV and Text::CSV_XS saw a maintainer change and have been updated since then. During that maintainer change, Text::CSV was "rewritten to make a wrapper to Text::CSV_XS and Text::CSV_PP".

        I learned about Text::CSV_XS between 2001 and 2007. During that time, Text::CSV seemed to be an unmaintained and incompatible "first shot" version, and most other modules of that time, including DBD::CSV, used Text::CSV_XS. DBD::CSV still depends on Text::CSV_XS.

        Installing Text::CSV should be sufficient, and works without requiring a compiler, but it is slower than the XS version. The Makefile.PL from Text::CSV hints that installing a sufficiently recent XS version makes Text::CSV faster, but it does not attempt to install the XS version, even if a C compiler is available.

        Text::CSV_XS, on the other hand, does not depend on Text::CSV, and does not require it to be installed. It requires a working C compiler, but then, it is faster than Text::CSV.

        It would be nice if Text::CSV attempted to install the XS module when that is possible. This way, there would be no need to install Text::CSV_XS manually.

        Alexander

Re: Split function
by bart (Canon) on Dec 03, 2012 at 13:10 UTC
    split itself won't die, but you can make your program die if split splits the line into only one part:
    (my ($s, $a, $c, $r) = split /[,\t]/) == 1 and die;
    or, alternatively, you can do
    die if not defined $a;
    Edit: code fixed, thanks ColonelPanic.
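    The trick above relies on the fact that a list assignment in scalar context yields the number of elements on the right-hand side (the fields split produced), not the number of variables on the left. A minimal sketch; field_count is a hypothetical helper, and $x replaces $a only to avoid shadowing the sort variable.

```perl
use strict;
use warnings;

# Hypothetical helper: a list assignment in scalar context returns the
# number of RIGHT-hand-side elements, i.e. how many fields split produced.
# (When assigning to 4 variables, split gets an implicit limit of 5.)
sub field_count {
    my ($line) = @_;
    my $count = (my ($s, $x, $c, $r) = split /[,\t]/, $line);
    return $count;
}

print field_count("clientserver,00001,AIT,SOURCE"), "\n";    # 4
print field_count("clientserver\t00001\tAIT\tSOURCE"), "\n"; # 4
print field_count("clientserve|00001|AIT|SOURCE"), "\n";     # 1
```

    A pipe-delimited line has no comma or tab, so split returns the whole line as a single field and the count is 1.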

      If you are going to do error checking, why not do it properly? Check that all fields are present:

      (my ($s, $a, $c, $r) = split /[,\t]/) < 4 and die;
      or:
      die if not defined $r;

      (Also, you had a typo in your code: extra paren)



      When's the last time you used duct tape on a duct? --Larry Wall
Re: Split function
by Anonymous Monk on Dec 03, 2012 at 11:40 UTC

    split should die if it finds the pipe and it should process if it finds comma or tab delimited.

    No, split shouldn't and won't die; that is not how it works.

    If you want your program to die on a pipe, use the match operator to match a pipe. There are examples in perlintro; read it.
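    A minimal sketch of that pre-screening approach, assuming each record is one line. parse_line is a hypothetical helper; the eval around it exists only so the demo loop keeps running after the error.

```perl
use strict;
use warnings;

# Hypothetical helper: reject a line containing a pipe *before* splitting,
# using the match operator, then split on comma or tab as before.
sub parse_line {
    my ($line) = @_;
    die "Pipe-delimited input not allowed: $line\n" if $line =~ /\|/;
    my ($s, $x, $c, $r) = split /[,\t]/, $line;
    return ($s, $x, $c, $r);
}

for my $line ("clientserver,00001,AIT,SOURCE", "clientserve|00001|AIT|SOURCE") {
    # eval catches the die so both lines are reported in this demo.
    my @fields = eval { parse_line($line) };
    if ($@) { print "Error: $@" }
    else    { print "Processing: @fields\n" }
}
```

    In a real script you would normally drop the eval and let the die end the program.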

      Perhaps there's a bit of a language issue here, but I'm pretty sure the OP meant that the processing should work if split found four comma or space delimited columns, and die otherwise, not that split itself should raise an exception, even though it may have literally read that way.

      See my reply, below. I know it might not exactly hit the mark, as the specification was a little vague, but it should be easy to modify for different inputs or failure conditions.

Re: Split function
by pvaldes (Chaplain) on Dec 03, 2012 at 19:03 UTC

    The command whose name is written in this script shall die

    This note will not take effect unless the writer has the command’s face in their mind when writing his/her name.

    If the cause of death is written within the next 40 seconds of writing the command’s name, it will happen. If not specified, split will simply die of a heart attack

    mmmh... could I suggest simply ignoring the lines with "|", or maybe emitting a warning (with warn)?
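    A minimal sketch of the warn-and-skip alternative, assuming the lines are already in memory. process_lines is a hypothetical helper; the warning goes to STDERR and processing continues.

```perl
use strict;
use warnings;

# Hypothetical helper: split each line on comma or tab, but warn about and
# skip any line containing a pipe instead of aborting the whole run.
sub process_lines {
    my @records;
    for my $line (@_) {
        if ($line =~ /\|/) {
            warn "Skipping pipe-delimited line: $line\n";
            next;
        }
        push @records, [ split /[,\t]/, $line ];
    }
    return @records;
}

my @records = process_lines(
    "ABC,DEF,GHI,JKL",
    "ABC|DEF|GHI|JKL",
    "MNO\tPQR\tSTU\tVWX",
);
print scalar(@records), " records kept\n";    # 2 records kept
```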

    TOGETHER WE CAN SAVE A LIFE! (or at least several hours of strained compilation)

Node Type: perlquestion [id://1006854]
Approved by marto
Front-paged by 2teez