Problem with split using a | separator

by Grey Fox (Chaplain)
on Jan 23, 2009 at 16:26 UTC (#738516)

Grey Fox has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks;
I am having an issue trying to split a file whose fields are separated by the '|' character. It seems to be treating the special character as undefined and splitting my file on each character.

#!/pw/prod/svr4/bin/perl
use warnings;
use strict;
use Data::Dumper;
#
# Purpose: Remove fields from emtoc file to facilitate title compare.
#
# I/O:
#   Input:  Complete Emtoc text file.
#   Output: Emtoc file with index info missing.
#
# History:
#   01/23/09 - Created
#
my $FALSE = 0;
my $TRUE  = 1;
my $debug = $FALSE;

if ( $#ARGV < 0 ) {
    print "Usage: $0 [In File][Out File]\n";
    exit(1);
}
my $emtocin  = $ARGV[0];
my $emtocout = $ARGV[1] || 'cmpemtocout.txt';

# begin processing
open( FDIN, $emtocin ) || die "Could not open $emtocin\n";
open( FDOUT, $emtocout ) || die "Could not open $emtocout\n";
while ( my $record = <FDIN> ) {
    print "Record is $record\n" if $debug;
    #
    # separate fields according to the template
    my @fld = split("|", $record );
    #
    # open output file and overwrite file
    my $outrecord = join( '|', $fld[0], $fld[3], $fld[4], $fld[5], $fld[6], $fld[7], $fld[8] );
    print FDOUT "$outrecord\n";
}
close FDIN;
close FDOUT;
print "End of $0\n";

Input Data

file-101.pdf|BOOKMARK||71-00-03 Testing/Operating Limits|Goto_View_External|FIT_WIDTH|1|N/A
file-102.pdf|BOOKMARK||71-00-05 Storage/Transport|Goto_View_External|FIT_WIDTH|1|N/A
file-103.pdf|BOOKMARK||71-00-10 Component Replacement|Goto_View_External|FIT_WIDTH|1|N/A
file-104.pdf|BOOKMARK||LIST OF EFFECTIVE PAGES|Goto_View_External|FIT_WIDTH|1|N/A
file-105.pdf|BOOKMARK||HIGHLIGHTS|Goto_View_External|FIT_WIDTH|1|N/A
file-106.pdf|BOOKMARK||TABLE OF CONTENTS|Goto_View_External|FIT_WIDTH|1|N/A

Output Results

f|e|-|1|0|1|.
f|e|-|1|0|2|.
f|e|-|1|0|3|.
f|e|-|1|0|4|.
f|e|-|1|0|5|.
f|e|-|1|0|6|.

Any help would be greatly appreciated.

-- Grey Fox
"We are grey. We stand between the darkness and the light" B5

Replies are listed 'Best First'.
Re: Problem with split using a | separator
by toolic (Bishop) on Jan 23, 2009 at 16:53 UTC
    A style note: you could use an array slice to save yourself some typing. Instead of:

    my $outrecord = join( '|', $fld[0], $fld[3], $fld[4], $fld[5], $fld[6], $fld[7], $fld[8] );

    you could write:

    my $outrecord = join '|', @fld[0, 3..8];

    Update: Added link to docs.
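A minimal sketch of the slice suggestion above, assuming a hypothetical nine-element array standing in for @fld (the letters are placeholders, not data from the question). It only shows that the slice and the element-by-element form produce the same string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @fld with indexes 0..8.
my @fld = ('a' .. 'i');

# Element-by-element form, as in the original script.
my $long = join( '|', $fld[0], $fld[3], $fld[4], $fld[5], $fld[6], $fld[7], $fld[8] );

# Array slice form: index 0, then the range 3 through 8.
my $short = join '|', @fld[0, 3 .. 8];

print "$long\n$short\n";
```

Both lines print a|d|e|f|g|h|i, since 3 .. 8 expands to the same indexes that were typed out by hand.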

      Thanks toolic;

      Using the style you suggested made my code simpler in a few of my programs.

      -- Grey Fox
        if ( $#ARGV < 0 ) {
            print "Usage: $0 [In File][Out File]\n";
            exit(1);
        }
        my $emtocin  = $ARGV[0];
        my $emtocout = $ARGV[1] || 'cmpemtocout.txt';

        Since you are open to style suggestions, the above could be written:

        die "Usage: $0 [In File][Out File]\n" if @ARGV < 1;
        my ($emtocin, $emtocout) = @ARGV;
        $emtocout ||= 'cmpemtocout.txt';

        In Perl you can assign to multiple lvalues at once! In general, using a subscript is not a good idea in Perl. @ARGV is clearer than $#ARGV, and you will use the @array syntax LOTS! An "off by one" error is one of the most common errors in programming, if not the most common. A big advantage of Perl is that it greatly reduces the chance of making one. Think in terms of the number of things in the list, not in terms of the last index in the list.
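To make the count-versus-last-index point concrete, here is a small sketch. It assumes a hypothetical array @args standing in for @ARGV so that it runs without command-line arguments; the file name is a placeholder.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @ARGV with a single argument supplied.
my @args = ('in.txt');

my $count      = @args;     # array in scalar context: number of elements
my $last_index = $#args;    # index of the last element, always count - 1

# List assignment fills both variables in one statement.
# $out stays undef because only one argument was supplied.
my ( $in, $out ) = @args;

print "count=$count last_index=$last_index in=$in out=",
    ( defined $out ? $out : 'undef' ), "\n";
```

With one argument the count is 1 while the last index is 0, which is exactly the gap that invites off-by-one mistakes.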

        Perl has a ||= operator that is often overlooked because it doesn't exist in most languages. Here an undef evaluates to "false", so this sets $emtocout to the default value if it is not already defined (or if it holds another false value, such as an empty string or "0"), which evidently is what you want.
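A short sketch of how ||= behaves, using hypothetical variable names. One detail worth knowing: ||= tests truth, not definedness, so a defined but false value such as '0' is also replaced. The last lines use //=, which is available from Perl 5.10 and tests definedness only; it is shown for contrast and is not something the thread itself proposes.

```perl
use strict;
use warnings;

my $missing;             # undef, as when no second argument is given
my $zero = '0';          # defined, but false in boolean context
my $name = 'out.txt';    # defined and true

$missing ||= 'cmpemtocout.txt';    # replaced: undef is false
$zero    ||= 'cmpemtocout.txt';    # replaced too: '0' is false
$name    ||= 'cmpemtocout.txt';    # kept: 'out.txt' is true

# Defined-or assignment (Perl 5.10+): only undef triggers the default.
my $zero_kept = '0';
$zero_kept //= 'cmpemtocout.txt';

print "$missing $zero $name $zero_kept\n";
```

For a file name argument the difference rarely matters, but it does if "0" or an empty string is ever a legitimate value.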

        Also be aware that "die" prints different things depending upon whether you put a "\n" at the end of the string or not. If you leave off the "\n", the script path and line number are appended to the message. In your opens, you would then be able to tell which line had the problem.
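The difference can be seen by trapping both forms with eval. This is only an illustrative sketch; the message text is a placeholder.

```perl
use strict;
use warnings;

# eval traps each die so that both messages can be inspected afterwards.
eval { die "Could not open file\n" };
my $with_newline = $@;

eval { die "Could not open file" };
my $without_newline = $@;

# The first message is printed exactly as written.
print "with newline:    $with_newline";

# The second has " at <script> line <N>." appended by Perl.
print "without newline: $without_newline";
```

Dropping the trailing "\n" from the two open failures in the original script would therefore show which open failed.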

        You can also apply the list slice operator to the list to begin with instead of waiting until the print (why save something you don't need?).

        my @fld = ( split(/\|/, $record ) )[0, 3..8];
        # and print could look like this:
        print FDOUT join("\n",@fld),"\n"; # you don't need $outrecord

        Not trying to be hyper-critical, just helpful.

        EDIT: print FDOUT join("\n",@fld),"\n"; should of course have been:

        print FDOUT join('|',@fld),"\n";
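Here is a runnable sketch of slicing the list returned by split, using the first record from the sample input. One assumption to flag: the sample records as posted have eight fields (indexes 0 to 7), so this sketch slices 3 .. 7. With 3 .. 8 the last element would be undef and join would emit an "uninitialized value" warning under use warnings.

```perl
use strict;
use warnings;

# First record from the sample input in the question.
my $record = 'file-101.pdf|BOOKMARK||71-00-03 Testing/Operating Limits|Goto_View_External|FIT_WIDTH|1|N/A';

# Slice the list returned by split directly; fields 1 and 2 are never stored.
my @fld = ( split /\|/, $record )[ 0, 3 .. 7 ];

print join( '|', @fld ), "\n";
```

This prints the record with the BOOKMARK field and the empty field after it removed.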
Re: Problem with split using a | separator
by ikegami (Patriarch) on Jan 23, 2009 at 16:30 UTC
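    split's first argument is a regexp pattern, even when it is written as a string. "|" is special in regexp patterns (it is the alternation operator), so it has to be escaped: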
    split(/\|/, $record );

    Wouldn't hurt to use Text::CSV_XS...
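The following sketch reproduces the symptom from the question and the fix, using a shortened, hypothetical record. An unescaped "|" is an alternation between two empty alternatives, so it matches the empty string and split breaks the record between every character.

```perl
use strict;
use warnings;

# Shortened, hypothetical record in the same format as the sample input.
my $record = 'file-101.pdf|BOOKMARK||N/A';

# "|" is compiled as a pattern that matches the empty string,
# so every character becomes its own element.
my @chars = split "|", $record;

# Escaping the pipe makes it a literal separator.
my @fields = split /\|/, $record;

print scalar(@chars), " pieces vs ", scalar(@fields), " fields\n";
```

The unescaped version yields one element per character, which is why the original script printed f|e|-|1|0|1|. for each line. The escaped version yields four fields, including the empty one between the two adjacent pipes.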


      It's good to know that the split argument is a regex pattern. I changed it and it works great.

      -- Grey Fox
        There's also quotemeta, as in: split quotemeta '|', $record;
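A brief sketch of the quotemeta approach, which is handy when the separator is held in a variable. The names $sep and $record are hypothetical. The \Q...\E form inside a regex is the built-in shorthand for the same escaping.

```perl
use strict;
use warnings;

my $sep    = '|';        # hypothetical separator held in a variable
my $record = 'a|b|c';    # hypothetical record

# quotemeta returns the string with regex metacharacters backslashed.
my @via_quotemeta = split quotemeta($sep), $record;

# \Q...\E applies the same escaping inside a regex literal.
my @via_q = split /\Q$sep\E/, $record;

print join( ',', @via_quotemeta ), "\n";
print join( ',', @via_q ), "\n";
```

Both calls print a,b,c.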
