http://www.perlmonks.org?node_id=465789

juo has asked for the wisdom of the Perl Monks concerning the following question:

I have been trying to split a line using multiple conditions but have failed to do so. Does anybody have an idea?

FDR [62.10060.051-F] [62.10051.381] 0 1 0

For example I want to split the above on space but if I have brackets it should take the whole string and ignore spaces within the bracket area. So in total I want to have six fields. I would like to do this in one split line.

# This only works until the first bracket
my @feeder_line = split/\s+\[/;

Re: Split using multiple conditions
by bart (Canon) on Jun 11, 2005 at 11:31 UTC
    That's an official FAQ: perlfaq 4: How can I split a [character] delimited string except when inside [character]?

    Personally, I'd be inclined to use the dual approach: match the stuff between brackets, or nonspaces.

    $_ = 'FDR [62.10060.051-F] [62.10051.381] [this includes spaces!] 0 1 0';
    @parts = /\[.*?\]|[^\[\]\ ]+/g;
    $\ = "\n";
    print for @parts;

    Yes it can be that compact. Result:

    FDR
    [62.10060.051-F]
    [62.10051.381]
    [this includes spaces!]
    0
    1
    0

    A limitation is that you can't easily treat each single space as a separator, which would return an empty string for the section between two consecutive spaces.
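    Applied to the six-field line from the question, the same match-based idea can be wrapped in a small sub. This is only a sketch: the name split_fields is hypothetical (it does not appear in the thread), and it assumes brackets are balanced and never nested. It uses \s instead of a literal space so that tabs also separate fields.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return each bracketed group
# whole, and every other run of non-space, non-bracket characters.
# Assumes brackets are balanced and never nested.
sub split_fields {
    my ($line) = @_;
    return $line =~ /\[.*?\]|[^\[\]\s]+/g;
}

my @fields = split_fields('FDR [62.10060.051-F] [62.10051.381] 0 1 0');
print scalar(@fields), " fields\n";   # 6 fields
print "$_\n" for @fields;
```

    Because the regex matches fields rather than delimiters, runs of several spaces are simply skipped, which is the limitation mentioned above.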

Re: Split using multiple conditions
by mda2 (Hermit) on Jun 11, 2005 at 15:09 UTC
    bart gave a great response! But to address your original attempt: your split regex needs a quantifier:

    $_ = 'FDR [62.10060.051-F] [62.10051.381] 0 1 0';
    @f1 = split/\s+\[/;      #>> split only \s+ AND [ ...
    @f2 = split/\s+\[?/;     #>> split \s+ OR \s+[ ...
    @f3 = split/\]?\s+\[?/;  #>> split parts, without []...
    print join(" + ", @f1), "\n";
    print join(" + ", @f2), "\n";
    print join(" + ", @f3), "\n";
    __END__
    FDR + 62.10060.051-F] + 62.10051.381] 0 1 0
    FDR + 62.10060.051-F] + 62.10051.381] + 0 + 1 + 0
    FDR + 62.10060.051-F + 62.10051.381 + 0 + 1 + 0

    --
    Marco Antonio
    Rio-PM

Re: Split using multiple conditions
by ikegami (Patriarch) on Jun 11, 2005 at 16:15 UTC

    You can use a single expression like bart showed, but I find the following easier to understand (and maintain):

    # Separate the fields.
    my @feeder_line = split /\s+/;

    # Clean up the data:
    # Remove the brackets from the 2nd and 3rd fields.
    foreach (@feeder_line[1, 2]) {
        s/^\[//;
        s/\]$//;
    }
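    A self-contained variation on this split-then-clean approach, run against the sample line from the question. It is a sketch rather than ikegami's exact code: it strips brackets from every field instead of fixed positions, and it assumes the bracketed fields never contain spaces.

```perl
use strict;
use warnings;

# Sample line from the question; assumes no spaces inside the brackets.
my $line = 'FDR [62.10060.051-F] [62.10051.381] 0 1 0';

# Separate the fields on runs of whitespace.
my @feeder_line = split /\s+/, $line;

# Strip a leading "[" and a trailing "]" from every field, rather than
# from fixed positions (a variation, not the code posted above).
for (@feeder_line) {
    s/^\[//;
    s/\]$//;
}

print join(' + ', @feeder_line), "\n";
```

    Looping over the array aliases $_ to each element, so the substitutions modify @feeder_line in place.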
Re: Split using multiple conditions
by dws (Chancellor) on Jun 11, 2005 at 21:29 UTC

    Another alternative is to remove the brackets first, then split.

    my ($nobrackets = $_) =~ s/(\[|\])//g;
    my @feeder_line = split ' ', $nobrackets;

      Unfortunately, this doesn't do quite what the original poster asked for. Take the input string from bart's code snippet above and plug it into yours:

      $_ = 'FDR [62.10060.051-F] [62.10051.381] [this includes spaces!] 0 1 0';
      ($nobrackets = $_) =~ s/(\[|\])//g;
      @feeder_line = split ' ', $nobrackets;
      $\ = "\n";
      print for @feeder_line;
      __END__
      FDR
      62.10060.051-F
      62.10051.381
      this
      includes
      spaces!
      0
      1
      0

      N.B.: I have removed the "my"s because my ($nobrackets = $_) ... results in the error message: Can't use global $_ in "my" at - line 1, near "= $_". The correct syntax is (my $nobrackets = $_) ...
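      For reference, a minimal sketch of the remove-brackets-then-split approach with the declaration in the right place, so that it compiles under strict. It uses the original six-field line and a character class in place of the alternation; like the version above, it only gives the desired fields when the brackets contain no spaces.

```perl
use strict;
use warnings;

$_ = 'FDR [62.10060.051-F] [62.10051.381] 0 1 0';

# Declare the copy inside the parentheses, then edit the copy in place;
# $_ itself is left unchanged.
(my $nobrackets = $_) =~ s/[\[\]]//g;

# split ' ' splits on runs of whitespace and skips leading whitespace.
my @feeder_line = split ' ', $nobrackets;

print scalar(@feeder_line), "\n";   # 6
print "$_\n" for @feeder_line;
```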