PerlMonks  

check if string is valid with special cases

by ovedpo15 (Monk)
on May 15, 2018 at 10:13 UTC (#1214533=perlquestion)

ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:

Consider the following string: a,b,c,d,e,f
This string has 6 substrings and 5 commas between them. Each of the substrings can contain any symbol. In the end I would like to split it like this:

 my ($a,$b,$c,$d,$e,$f) = split(/,/, $string);

But first I would like to check that the string is valid, meaning there is a substring between each pair of commas.
I could check it like this:  if(!defined($a) || !defined($b) || ... || !defined($f)) ...

but that doesn't look very good and it's too long. I would like to check it with split or a regex instead.
I also tried if(($string =~ tr/,//) != 5) ... but that isn't a good idea, because it won't catch the "a,b,c,d,,f" case, or the case where one of the substrings contains a comma (for example: $b = "hello_world,bye";)
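
To make the two failed attempts concrete, here is a minimal sketch of what split and tr actually return for the problem string. The variable names are only for illustration. Note that an empty field in the middle comes back as a defined empty string, so a defined() check would not catch it either, and that split drops trailing empty fields unless it is given a negative limit.

```perl
use strict;
use warnings;

my $string = 'a,b,c,d,,f';

# tr/,// counts the commas without changing the string: 5 here,
# so the count alone cannot see the empty fifth field.
my $commas = ($string =~ tr/,//);

# split returns ('a','b','c','d','','f'): six elements, all defined.
my @fields      = split /,/, $string;
my $undef_count = grep { !defined $_ } @fields;   # 0
my $empty_count = grep { $_ eq '' } @fields;      # 1

print "commas=$commas fields=", scalar(@fields),
      " undef=$undef_count empty=$empty_count\n";

# Trailing empty fields are dropped unless a negative limit is given.
my @short = split /,/, 'a,b,c,d,e,';       # 5 elements
my @full  = split /,/, 'a,b,c,d,e,', -1;   # 6 elements, the last is ''

print "without limit: ", scalar(@short), ", with -1: ", scalar(@full), "\n";
```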

Replies are listed 'Best First'.
Re: check if string is valid with special cases
by swl (Deacon) on May 15, 2018 at 10:48 UTC

    If your strings could contain embedded commas then use Text::CSV_XS or similar to split it into an array, then all or any from List::Util to check that none or some are blank.

    use Text::CSV_XS;
    use List::Util qw /any/;

    my $csv = Text::CSV_XS->new;
    my @strings = (
        'a,b,c,d,e,f',
        'a,b,,d,e,f',
        'a,b,"c,and,some,commas",d,e,f',
        'a,b,c,d,e',
    );

    foreach my $string (@strings) {
        print "Checking $string\n";
        my $status = $csv->parse ($string);
        my @array = $csv->fields;
        warn ($csv->error_input) if !$status;
        print "Did not get six entries in $string\n" if not @array == 6;
        print "one entry in $string is zero length\n" if any {!length $_} @array;
        print join ':', @array;
        print "\n\n";
    }
Re: check if string is valid with special cases
by hippo (Chancellor) on May 15, 2018 at 11:07 UTC

    If I've understood your specification correctly then this should perform the tests you require.

    use strict;
    use warnings;
    use Test::More tests => 2;

    my $valid_in   = 'a,b,c,d,e,f';
    my $invalid_in = 'a,b,c,d,,f';
    ok (test_it ($valid_in), "'$valid_in' is a valid argument");
    ok (!test_it ($invalid_in), "'$invalid_in' is an invalid argument");

    sub test_it {
        my ($string) = @_;
        my @fields = split (/,/, $string);
        my $emptyfields = grep { $_ eq '' } @fields;
        if (scalar @fields == 6 && $emptyfields == 0) {
            return @fields;
        }
        return 0;
    }
Re: check if string is valid with special cases (updated)
by haukex (Chancellor) on May 15, 2018 at 10:22 UTC

    TIMTOWTDI; personally I'd write:

    die "invalid format: $string" unless $string=~/\A[^,]+(?:,[^,]+){5}\z/;

    Update: Note your specification is a bit unclear, you say "each one of the substrings can contain whatever symbol there is" but then later on say that the strings can't* contain commas. What about, for example, "1,2,3, ,5,6" (which the above regex will call valid)? See also Re: How to ask better questions using Test::More and sample data.

    * Update 2: Sorry, I misunderstood your post, you're saying the strings can contain commas. See my reply below.
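
For anyone wanting to try the regex from this post against the cases raised in the thread, a small sketch follows. The first pattern is the one posted above. The second, $six_strict, is a hypothetical variant that is not from the thread: it assumes a field made up only of whitespace should count as invalid, and requires at least one non-whitespace character per field.

```perl
use strict;
use warnings;

# Pattern from the post above: exactly six non-empty, comma-free fields.
my $six_fields = qr/\A[^,]+(?:,[^,]+){5}\z/;

# Hypothetical stricter variant (an assumption, not from the thread):
# each field must also contain at least one non-whitespace character.
my $field      = qr/[^,]*[^,\s][^,]*/;
my $six_strict = qr/\A$field(?:,$field){5}\z/;

for my $s ('a,b,c,d,e,f', 'a,b,c,d,,f', 'a,b,c,d,e', '1,2,3, ,5,6') {
    printf "%-14s original: %-3s strict: %s\n",
        "'$s'",
        ($s =~ $six_fields ? 'ok' : 'bad'),
        ($s =~ $six_strict ? 'ok' : 'bad');
}
```

Neither pattern helps if a field may itself contain a comma; that case needs a quoting convention such as CSV.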

      I'm sorry, I meant that the substring can contain commas in it. Nevertheless, is it possible to include the case of a space (bad case - "1,2,3, ,4,5") in the regex you wrote? I think it's the regex I need.
        the substring can contain commas in it

        Sorry, this doesn't make sense to me. If the input string is "1,2,3,4,5,6,7", then does $a get "1,2", or does $b get "2,3", and so on...

        Or is this CSV, as swl correctly pointed out? Then you should use Text::CSV.

        If not, then as I said, please provide lots of examples of valid input with the expected output, as well as lots of examples of invalid input.

Re: check if string is valid with special cases (updated)
by AnomalousMonk (Bishop) on May 15, 2018 at 14:09 UTC

    I agree with others that this looks like a Text::CSV problem (and also that your various problem statements are rather vague).

    However, in a general regex parsing (and regexes are often not the best approach to parsing) situation, my approach tends to be something like

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my $s = 'a,b,c,d,e,f';
     ;;
     my $sym = qr{ [^,] }xms;
     my $sep = qr{ , }xms;
     ;;
     $s =~ m{ \A $sym (?: $sep $sym){5} \z }xms or die qq{bad string: '$s'};
     ;;
     my ($u, $v, $w, $x, $y, $z) = $s =~ m{ $sym }xmsg;
     dd $u, $v, $w, $x, $y, $z;
     "
    ("a", "b", "c", "d", "e", "f")

    Once you know a string is valid, it's often quite easy to strip out sub-strings of interest. Oh, you say the separator pattern should include possible spaces? Then
        my $sep = qr{ \s* , \s* }xms;
    Oh, the "symbol" may be more than a single character and must also exclude spaces? Then
        my $sym = qr{ [^,\s]+ }xms;
    And so on. Separately defining $sym and $sep makes them easy to change and makes any change propagate throughout the code as necessary (DRY).

    And yes, test all this stuff! (Test::More and friends.)
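
As a rough illustration of the validate-then-extract idea with both of the modified definitions plugged in, here is a sketch as a standalone script rather than a one-liner. The helper name six_symbols is made up for this example, and the script uses only core Perl.

```perl
use strict;
use warnings;

my $sym = qr{ [^,\s]+ }xms;      # one "symbol": no commas, no whitespace
my $sep = qr{ \s* , \s* }xms;    # a comma with optional surrounding whitespace

# Hypothetical helper: returns the six symbols, or an empty list
# if the string does not have exactly six of them.
sub six_symbols {
    my ($s) = @_;
    return unless $s =~ m{ \A $sym (?: $sep $sym){5} \z }xms;
    return $s =~ m{ $sym }xmsg;  # no capture groups, so /g returns each whole match
}

for my $s ('a,b,c,d,e,f', 'a , b,CC,d,e,fgh', 'a,b,,d,e,f', 'a,b,c, ,e,f') {
    my @got = six_symbols($s);
    print "'$s' => ", (@got ? join(':', @got) : 'bad string'), "\n";
}
```

Changing what counts as a symbol or a separator then only means editing the two qr{} definitions at the top.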

    Update: Here's another approach to regex (again, maybe not the best option) parsing. It combines validation and extraction into a single regex, but the regex is significantly more complex and probably a bit slower. It also needs Perl version 5.10+ for the \K operator.

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "use 5.010;
     ;;
     my $s = 'a,b, CC ,d , e,fgh ';
     ;;
     my $sym = qr{ [^,\s]+ }xms;
     my $sep = qr{ \s* , \s* }xms;
     ;;
     my $n_syms = my ($u, $v, $w, $x, $y, $z) =
         $s =~ m{ (?: \G (?! \A) $sep | \A \s*) \K $sym (?= (?: $sep $sym)* \s* \z) }xmsg;
     ;;
     $n_syms == 6 or die qq{bad string: '$s'};
     dd $u, $v, $w, $x, $y, $z;
     "
    ("a", "b", "CC", "d", "e", "fgh")

    (Testing, testing...)


    Give a man a fish:  <%-{-{-{-<

Re: check if string is valid with special cases
by mr_ron (Hermit) on May 15, 2018 at 21:29 UTC
    Like most of the other answers I will recommend Text::CSV but with a hopefully helpful explanation. Your question is asking for comma separated fields but a field can also contain a comma (,) enclosed in quotes ("). What about a field that needs to have " characters in it as well? CSV has a standard way of handling all these issues and it is not so easy to do with one regex. What about a,b,c,"",e,f ? Again CSV will just handle it. The use of List::Util suggested by swl makes a good refinement but hopefully the example below is a start:
    #!/usr/bin/env perl
    use Modern::Perl;
    use Text::CSV;

    my $csv = Text::CSV->new;
    my @strings = (
        'a,"b,c",3,4,5,6',
        'a,"b,c",3,4,5',
        'a,"b,c",3,4,5,""',
        'abc,def,"ghi,jkl",,mno,6'
    );

    foreach my $s (@strings) {
        if (my $status = $csv->parse($s)) {
            if ( (grep { /\w/ } $csv->fields) != 6 ) {
                warn "row failed: ", $csv->string
            }
        }
        else {
            warn $csv->error_input
        }
    }

    Ron
Re: check if string is valid with special cases
by kcott (Bishop) on May 16, 2018 at 09:46 UTC

    G'day ovedpo15,

    You can do your validation with this condition:

    $string =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $string

    I ran these tests based on the ever-shifting goal posts throughout this thread. :-)

    $ alias perle
    alias perle='perl -Mstrict -Mwarnings -Mautodie=:all -E'
    $ perle 'my $x = "a,b,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
    Y
    $ perle 'my $x = "a,b ,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
    Y
    $ perle 'my $x = "a, ,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
    N
    $ perle 'my $x = "a, ,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
    N
    $ perle 'my $x = "a,,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
    N
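
For readers who find the one-line condition dense, here is a sketch that spells out the same comparison step by step. The sub name fields_all_nonblank is invented for this example. The idea is that a string with N commas is accepted only if split yields N+1 fields that still have some length after spaces are removed. As in the condition above, this does not check for exactly six fields, and the /r flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;
use 5.014;    # y///r (non-destructive transliteration) and say

# Hypothetical helper wrapping the condition from the post above.
sub fields_all_nonblank {
    my ($string) = @_;

    # y/,/,/ replaces each comma with itself, so it only counts them.
    my $commas = ($string =~ y/,/,/);

    # y/ //dr returns a copy of $_ with spaces deleted; grep in scalar
    # context counts the fields whose copy still has some length.
    my $nonblank = grep { length y/ //dr } split /,/, $string;

    return $commas == $nonblank - 1;
}

for my $x ('a,b,c', 'a,b ,c', 'a, ,c', 'a,,c') {
    say "'$x' => ", fields_all_nonblank($x) ? 'Y' : 'N';
}
```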

    — Ken

Re: check if string is valid with special cases
by Veltro (Friar) on May 15, 2018 at 15:03 UTC
    -

Node Type: perlquestion [id://1214533]
Approved by haukex