PerlMonks  

pipe delimited file problem

by lakeTrout (Scribe)
on Jan 22, 2007 at 02:10 UTC ( [id://595836]=perlquestion )

lakeTrout has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I'm a novice and I've been out of the Perl game for a LONG time, so bear with me here. I have a flat file (pipe delimited), and here's the problem:

I need a regex or something that will allow me to add a "zero" or "empty" in instances where there is no data between pipes.

For example || would become |empty| -- I can handle that, but what about the instance where I have multiple nulls, like ||||? If I match on ||, won't that miss |||, so I would have to run it twice?

Thanks for any advice!

Replies are listed 'Best First'.
Re: pipe delimited file problem
by bobf (Monsignor) on Jan 22, 2007 at 03:49 UTC

    split has already been recommended, but you need to be aware of the default behavior of split with respect to empty leading or trailing fields.

    From the docs,

    By default, empty leading fields are preserved, and empty trailing ones are deleted. (If all fields are empty, they are considered to be trailing.) ... If LIMIT is specified and positive, it represents the maximum number of fields the EXPR will be split into, though the actual number of fields returned depends on the number of times PATTERN matches within EXPR. If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop would do well to remember).

    This behavior is illustrated in the following example:

    use strict;
    use warnings;

    while( my $str = <DATA> )
    {
        chomp $str;
        print "\nsplitting $str with these LIMITs:\n";
        foreach my $limit ( 0, 5, 7, -1 )
        {
            my $result = munge( $str, $limit );
            printf " %2d => [$result]\n", $limit;
        }
    }

    sub munge
    {
        my ( $str, $limit ) = @_;

        my @fields = split( '\|', $str, $limit );
        @fields = map { $_ || 'empty' } @fields;

        # using a different delimiter to illustrate non-split fields
        return join( '!', @fields );
    }

    __DATA__
    1|2|3|4|5|6
    1|2||4||
    |2|3|||
    |||||

    This produces the following output:

    splitting 1|2|3|4|5|6 with these LIMITs:
      0 => [1!2!3!4!5!6]
      5 => [1!2!3!4!5|6]
      7 => [1!2!3!4!5!6]
     -1 => [1!2!3!4!5!6]

    splitting 1|2||4|| with these LIMITs:
      0 => [1!2!empty!4]
      5 => [1!2!empty!4!|]
      7 => [1!2!empty!4!empty!empty]
     -1 => [1!2!empty!4!empty!empty]

    splitting |2|3||| with these LIMITs:
      0 => [empty!2!3]
      5 => [empty!2!3!empty!|]
      7 => [empty!2!3!empty!empty!empty]
     -1 => [empty!2!3!empty!empty!empty]

    splitting ||||| with these LIMITs:
      0 => []
      5 => [empty!empty!empty!empty!|]
      7 => [empty!empty!empty!empty!empty!empty]
     -1 => [empty!empty!empty!empty!empty!empty]

    Note the effect on the empty leading and trailing fields when using a LIMIT of 0, a positive LIMIT less than the expected number of fields, a positive LIMIT greater than the expected number of fields, and a negative LIMIT.

    For your application, I think you either want to specify a LIMIT that corresponds to the number of expected fields (if you know this in advance), or a negative LIMIT.
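
As a rough sketch of combining the two options above: split with a negative LIMIT so nothing is dropped, then compare the field count with the width you expect. The helper name `split_record` and the width of 6 are assumptions made for illustration, not something taken from the original data.

```perl
use strict;
use warnings;

# Hypothetical record width; adjust to the real file layout.
my $EXPECTED_FIELDS = 6;

# Split on literal pipes, keeping trailing empty fields (LIMIT -1),
# and report whether the record has the expected number of fields.
sub split_record {
    my ($line) = @_;
    my @fields = split /\|/, $line, -1;
    return ( scalar(@fields) == $EXPECTED_FIELDS, @fields );
}

for my $line ( '1|2||4||', '1|2|3' ) {
    my ( $ok, @fields ) = split_record($line);
    printf "%-10s fields=%d %s\n",
        $line, scalar(@fields), $ok ? 'ok' : 'unexpected width';
}
```

Checking the count after a `-1` split catches short or long records early, instead of letting them shift columns further downstream.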

    HTH

    Update: graff++; # I suspected I was overdoing the example :-)

      Thank you guys for the feedback!
Re: pipe delimited file problem
by imp (Priest) on Jan 22, 2007 at 02:27 UTC
    You can use a zero-width positive look behind, like this:
    use strict;
    use warnings;

    while (<DATA>) {
        s/(?<=\|)\|/empty|/g;
        print;
    }

    __DATA__
    |a|||
    ||||
    |||a|

    I usually find it easier to maintain code that uses split though.
Re: pipe delimited file problem
by rodion (Chaplain) on Jan 22, 2007 at 02:56 UTC
    For what it's worth, I second imp's suggestion to use split. Even though the split version
    $_ = join '|', map {$_ ||= 'empty' } split('\|',$_,-1);
    is a little longer than the regex
    s/(?<=\|)\|/empty|/g;
    it's closer to how one thinks of a delimited record, and so it's easier to change if you need to.

    One difference is that the split version puts the 'empty' string in the first field if that field is empty, while the regex version doesn't, so that may be the most significant difference for you.

    Update: Corrected missing negative limit in split. See graff's comment.

      But if you're going to use split in a situation like this, be sure you tell it not to drop "trailing nulls":
      $_ = join '|', map { $_ ||= 'empty' } split( /\|/, $_, -1 );
      The third arg to split() (set to "-1") is important here. Without it, empty fields at the end of the line are silently dropped, so the output could have a variable number of fields per line (which might cause trouble downstream).
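
One more caveat with the one-liner above: `||=` tests for truth, so a field containing a literal "0" is also replaced with 'empty'. If zeros can appear as real data, testing the field's length avoids that. A minimal sketch, with `fill_empty` as a made-up helper name:

```perl
use strict;
use warnings;

# Replace only truly empty fields; a literal "0" is kept as data.
sub fill_empty {
    my ($line) = @_;
    return join '|',
        map { length($_) ? $_ : 'empty' }
        split /\|/, $line, -1;
}

print fill_empty('a|0||'), "\n";    # a|0|empty|empty
```

With `$_ ||= 'empty'` the same input would come out as `a|empty|empty|empty`.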
Re: pipe delimited file problem
by quester (Vicar) on Jan 22, 2007 at 06:47 UTC
    Another way to do it is to repeat the substitution until it fails; this may be less efficient but it often conserves brain time:
    while (s/\|\|/|empty|/g) {};
    or the semi-golf version
    1 while s/\|\|/|empty|/g;
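
To see why the repetition is needed, here is a small sketch on the `||||` case from the question. It also shows a lookahead variant, which is an alternative of my own and not quester's code: it leaves the second pipe unconsumed, so a single pass is enough. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = '||||';

# One pass of the plain substitution: matches cannot overlap, so
# every second empty field is missed.
my $once = $line;
$once =~ s/\|\|/|empty|/g;
print "$once\n";      # |empty||empty|

# Repeating until nothing matches fills in the rest.
my $looped = $line;
1 while $looped =~ s/\|\|/|empty|/g;
print "$looped\n";    # |empty|empty|empty|

# A lookahead does not consume the second pipe, so one pass suffices.
my $ahead = $line;
$ahead =~ s/\|(?=\|)/|empty/g;
print "$ahead\n";     # |empty|empty|empty|
```

Like the look-behind version, neither variant touches an empty field at the very start or end of the line.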
Re: pipe delimited file problem
by johngg (Canon) on Jan 22, 2007 at 14:25 UTC
    To expand on imp's suggestion with look-behind assertions, you could use look-behind and look-ahead assertions in combination so that you find a point that is preceded by a pipe symbol or the beginning of the line and followed by a pipe symbol or the newline and do the substitution there. Look-behinds have to be fixed width, hence the alternation of two of them. I have set up a variable $litPipe to hold a literal pipe symbol to avoid a lot of escaping inside the regular expression.

    use strict;
    use warnings;

    my $litPipe = q{\|};
    my $rxBetween = qr
        {(?x)
            (?:
                (?<=\A)
                |
                (?<=$litPipe)
            )
            (?=$litPipe|\n)
        };

    while (<DATA>)
    {
        s{$rxBetween}{EMPTY}g;
        print;
    }

    __END__
    a|b|c|d|e
    f||h|i|j
    |l|m|n|o
    ||||t
    u|v|w|x|

    Here's the output

    a|b|c|d|e
    f|EMPTY|h|i|j
    EMPTY|l|m|n|o
    EMPTY|EMPTY|EMPTY|EMPTY|t
    u|v|w|x|EMPTY

    Because the regular expression pins down exactly where you want to do the substitution it copes well with beginning and end of line situations.

    I hope this is of use.

    Cheers,

    JohnGG
