Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hi,
I am reading a text file, and my code looks like this:
open FH, "$INPUT_DIR/$input_file" or die "Couldn't Open File: $!";
while ( <FH> ) {
chomp;
my ($s, $a, $c, $r) = split /[, \t]/, $_;
}
The split function currently handles comma- and tab-delimited lines.
Input file:
process:
clientserver,00001,AIT,SOURCE
clientserver 00001 AIT SOURCE
error:
clientserve|00001|AIT|SOURCE
split should die if it finds the pipe and it should process if it finds comma or tab delimited.
Re: Split function
by rjt (Curate) on Dec 03, 2012 at 12:07 UTC
while (<DATA>) {
chomp;
my ($s, $a, $c, $r) = split /[,\t]/;
die "Invalid string: $_" if !defined $r;
print "Processing: $_\n";
}
__DATA__
clientserver,00001,AIT,SOURCE
clientserve|00001|AIT|SOURCE
Output:
Processing: clientserver,00001,AIT,SOURCE
Invalid string: clientserve|00001|AIT|SOURCE at 1006854.pl line 6, <DATA> line 2.
Just change the die and print lines to do what you actually need.
Hi all,
My requirement is:
I need to process a text file which is comma/tab delimited.
Example: INPUT File
ABC,DEF,GHI,JKL
code:
my ($a,$b,$c,$d) = split(/[,\t]/, $_);
will process this text file.
If a text file contains input like the following:
ABC|DEF|GHU|IJK
the same code: my ($a,$b,$c,$d) = split(/[,\t]/, $_);
should die.
use strict;
use warnings;
while( my $line = <DATA> ) {
chomp $line;
my( $a, $b, $c, $d )
= split /(?(?=^[^|]*\|)(?{die "Pipe [|] detected in input."})|)[,\t]/,
$line;
print "[($a)($b)($c)($d)]\n";
}
__DATA__
ABC,DEF,GHI,JKL
ABC|DEF|GHI|JKL
This throws an exception from within the regex passed to split if the input string contains a pipe character. I wouldn't recommend bringing that to a code review, but given that none of the other solutions already provided seem to satisfy you, I am thinking that you'll only be happy when an exception is thrown as part of the split line. Despite the hackish nature of the code, it produces what you're requesting. Here's the output:
[(ABC)(DEF)(GHI)(JKL)]
Pipe [|] detected in input. at (re_eval 1) line 1, <DATA> line 2.
It would be a lot better to just follow the advice of bart's post, or Colonel_Panic's post, in this same thread. And if neither of those posts does what you need, rather than just repeating your question again, explain exactly how their code fails to meet your needs. I find it hard to believe that your requirement is for the exact line containing the split to throw an exception. It seems a lot more reasonable to just ensure that an exception is thrown once split fails to produce reasonable output, or possibly to pre-screen the line of text and throw before you split, if a pipe character is found.
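The "throw once split fails to produce reasonable output" idea might be sketched roughly as follows. The sub name parse_fields and the rule that a valid record has exactly four fields are assumptions made for illustration; adjust both to the real data.

```perl
use strict;
use warnings;

# Hypothetical helper: split on comma or tab, then validate the result.
sub parse_fields {
    my ($line) = @_;

    # A limit of -1 keeps trailing empty fields, so the count is honest.
    my @fields = split /[,\t]/, $line, -1;

    # Assumption: every valid record has exactly four fields.
    # A pipe-delimited line yields a single field and is rejected here.
    die "Expected 4 fields, got " . scalar(@fields) . ": $line\n"
        unless @fields == 4;

    return @fields;
}

for my $line ("ABC,DEF,GHI,JKL", "ABC|DEF|GHI|JKL") {
    my @fields = eval { parse_fields($line) };
    if ($@) {
        print "Rejected: $@";
    }
    else {
        print "[(", join(")(", @fields), ")]\n";
    }
}
```

Checking the field count catches any unexpected delimiter, not only the pipe, which is probably closer to what the data check is really for.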
Update: Just for fun, an explanation of the regex:
(?(condition)true_regex|false_regex) creates a conditional. For our condition, we use a zero-width lookahead assertion, (?=^[^|]*\|), that detects whether a pipe character is found anywhere in the string. If that condition is satisfied, the "true_regex" gets tested. The "true_regex" that we use is a (?{code}) construct, which is used (or abused) to execute Perl code from within a regular expression. The code we execute is the die statement. For our "false_regex", we use an empty expression, which will not affect the rest of the split match. The remainder of the regex is just what we would normally pass to 'split'.
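For anyone who has not met the conditional construct before, here is a small standalone sketch of it, without the (?{code}) part. The pattern and the test strings are made up for illustration and have nothing to do with the OP's data.

```perl
use strict;
use warnings;

# (?(?=\d)\d+|[a-z]+): if the next character is a digit, require digits;
# otherwise require lowercase letters.
my $re = qr/^(?(?=\d)\d+|[a-z]+)$/;

for my $s (qw(123 abc 12a)) {
    printf "%-3s %s\n", $s, ($s =~ $re ? "matches" : "does not match");
}
```

Here "123" and "abc" match, while "12a" does not, because the lookahead commits it to the digits-only branch.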
Re: Split function
by afoken (Chancellor) on Dec 03, 2012 at 17:14 UTC
i am reading a text file
Hmm, I think you are reading a CSV file, not just a text file. And unless it's for learning Perl, consider using Text::CSV_XS (or the slightly slower pure-perl version Text::CSV) instead. Text::CSV_XS handles all of those ugly edge cases that a simple split can't handle - embedded quotes, embedded separation character, quoted values, to name just a few.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Text::CSV uses Text::CSV_XS if it is available for the platform. It seems to me that using CSV_XS directly means the code won't work on a platform without XS capabilities whereas it would have worked if Text::CSV had been used. From reading the pod I don't see any advantage to using Text::CSV_XS directly. Have I missed something?
Probably not. I think it's just a problem of the timeline, or an old habit.
According to CPAN, Text::CSV 0.01 was released on 1997-Jul-31, followed by 1.00 on 2007-Nov-27, more than 10 years later. Text::CSV_XS 0.16 was released 1999-Feb-11, followed by several releases up to 0.23 released 2001-Oct-09. During that time, Text::CSV did not change at all. In 2007, both Text::CSV and Text::CSV_XS saw a maintainer change and have been updated since then. During that maintainer change, Text::CSV was "rewritten to make a wrapper to Text::CSV_XS and Text::CSV_PP".
I learned about Text::CSV_XS between 2001 and 2007. During that time, Text::CSV seemed to be an unmaintained and incompatible "first shot" version, and most other modules of that time, including DBD::CSV, used Text::CSV_XS. DBD::CSV still depends on Text::CSV_XS.
Installing Text::CSV should be sufficient, and works without requiring a compiler, but it is slower than the XS version. The Makefile.PL from Text::CSV hints that installing a sufficiently recent XS version makes Text::CSV faster, but it does not attempt to install the XS version, even if a C compiler is available.
Text::CSV_XS, on the other hand, does not depend on Text::CSV, and does not require it to be installed. It requires a working C compiler, but then, it is faster than Text::CSV.
It would be nice if Text::CSV would attempt to install the XS module if that is possible. This way, there would be no need to install Text::CSV_XS manually.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Split function
by bart (Canon) on Dec 03, 2012 at 13:10 UTC
split itself won't die, but you can make the program die if split produces only one part.
(my ($s, $a, $c, $r) = split /[,\t]/) == 1 and die;
or, alternatively, you can do
die if not defined $a;
Edit: code fixed, thanks ColonelPanic
(my ($s, $a, $c, $r) = split /[,\t]/) < 4 and die;
or:
die if not defined $r;
(Also, you had a typo in your code: extra paren)
When's the last time you used duct tape on a duct? --Larry Wall
Re: Split function
by Anonymous Monk on Dec 03, 2012 at 11:40 UTC
split should die if it finds the pipe and it should process if it finds comma or tab delimited.
No, split shouldn't/won't die; that is not how it works.
If you want your program to die on a pipe, use the match operator to match a pipe. There are examples in perlintro; read it.
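A minimal sketch of that, using the OP's sample lines. The helper name has_pipe is made up for illustration, and the sketch prints a message where the real program would die, so that both lines can be shown in one run.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line contains a pipe character.
sub has_pipe {
    my ($line) = @_;
    return $line =~ /\|/ ? 1 : 0;
}

for my $line ("clientserver,00001,AIT,SOURCE", "clientserve|00001|AIT|SOURCE") {
    if (has_pipe($line)) {
        # In the real program this would be: die "Pipe found: $line\n";
        print "Would die on: $line\n";
        next;
    }
    my @fields = split /[,\t]/, $line;
    print "Processing: @fields\n";
}
```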
Perhaps there's a bit of a language issue here, but I'm pretty sure the OP meant that the processing should work if split found four comma or space delimited columns, and die otherwise, not that split itself should raise an exception, even though it may have literally read that way.
See my reply, below. I know it might not exactly hit the mark, as the specification was a little vague, but it should be easy to modify for different inputs or failure conditions.
Re: Split function
by pvaldes (Chaplain) on Dec 03, 2012 at 19:03 UTC
The command whose name is written in this script shall die
This note will not take effect unless the writer has the command’s face in their mind when writing his/her name.
If the cause of death is written within the next 40 seconds of writing the command’s name, it will happen. If not specified, split will simply die of a heart attack.
Mmmh... could I suggest that you simply ignore the lines with "|", or maybe emit a warning (with warn)?
TOGETHER WE CAN SAVE A LIFE!, (or at least several hours of strained compilation)
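A rough sketch of the warn-and-skip approach. The sub name keep_valid_lines and the sample lines are made up for illustration; in a real script the same test would sit inside the while loop that reads the file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines without a pipe, warning about the rest.
sub keep_valid_lines {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        if ($line =~ /\|/) {
            # warn writes to STDERR and lets the program carry on.
            warn "Skipping pipe-delimited line: $line\n";
            next;
        }
        push @kept, $line;
    }
    return @kept;
}

my @kept = keep_valid_lines("ABC,DEF,GHI,JKL", "ABC|DEF|GHI|JKL", "MNO\tPQR\tSTU\tVWX");
print "Kept: $_\n" for @kept;
```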