PerlMonks  

Split criteria

by harishnuti (Beadle)
on Nov 05, 2008 at 09:09 UTC ( [id://721582]=perlquestion )

harishnuti has asked for the wisdom of the Perl Monks concerning the following question:


Hello Monks, I am having trouble splitting the data below:

!A001ST!,!98!,!1!,!01/10/1999!,!EUROPEENNE!,!0!,!EUR!,!6!,!7!,!0!,!98!,!1!
!A001ST!,!AD,CD!,!1!,!20/05/2004!,!ANDORRA!,!0!,!EUR,USD!,!6!,!7!,!0!,!AD!,!1!

I want to split the above data on commas, but some fields have commas inside the ! ! pair, and I want to keep those.
Please help me frame the split criteria. This is what I am doing at the moment:
#!/usr/bin/perl
use strict;
use warnings;

print "Splitting data by comma as a delimiter \n";
while (<DATA>){
    my @data = split /\,/,$_;
    print "Data splitted is as below \n";
    print join("==>",@data),"\n";
    # i will do something else with splitted data in this iteration
}
__DATA__
!A001ST!,!98!,!1!,!01/10/1999!,!EUROPEENNE!,!0!,!EUR!,!6!,!7!,!0!,!98!,!1!
!A001ST!,!AD,CD!,!1!,!20/05/2004!,!ANDORRA!,!0!,!EUR,USD!,!6!,!7!,!0!,!AD!,!1!

The output is not what I expected:

Splitting data by comma as a delimiter
Data splitted is as below
!A001ST!==>!98!==>!1!==>!01/10/1999!==>!EUROPEENNE!==>!0!==>!EUR!==>!6!==>!7!==>!0!==>!98!==>!1!
Data splitted is as below
!A001ST!==>!AD==>CD!==>!1!==>!20/05/2004!==>!ANDORRA!==>!0!==>!EUR==>USD!==>!6!==>!7!==>!0!==>!AD!==>!1!

As you can see above, !AD,CD! was also split, giving !AD==>CD!.
What I need is:
!A001ST!==>!AD,CD!==> and so on

Replies are listed 'Best First'.
Re: Split criteria
by GrandFather (Saint) on Nov 05, 2008 at 09:54 UTC

    The standard technique is to use Text::xSV or Text::CSV. In this case it has to be Text::CSV because you are using an unusual quote character (!). Consider:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV;

    my $csv = Text::CSV->new ({quote_char => '!'});

    print "Splitting data by comma as a delimiter \n";
    while (<DATA>){
        next unless $csv->parse ($_);
        my @data = $csv->fields ();
        print "Data splitted is as below \n";
        print join("==>",@data),"\n";
        # i will do something else with splitted data in this iteration
    }
    __DATA__
    !A001ST!,!98!,!1!,!01/10/1999!,!EUROPEENNE!,!0!,!EUR!,!6!,!7!,!0!,!98!,!1!
    !A001ST!,!AD,CD!,!1!,!20/05/2004!,!ANDORRA!,!0!,!EUR,USD!,!6!,!7!,!0!,!AD!,!1!

    Prints:

    Splitting data by comma as a delimiter
    Data splitted is as below
    A001ST==>98==>1==>01/10/1999==>EUROPEENNE==>0==>EUR==>6==>7==>0==>98==>1
    Data splitted is as below
    A001ST==>AD,CD==>1==>20/05/2004==>ANDORRA==>0==>EUR,USD==>6==>7==>0==>AD==>1

    Perl reduces RSI - it saves typing
Re: Split criteria
by BrowserUk (Patriarch) on Nov 05, 2008 at 09:47 UTC
Re: Split criteria
by frayoyo (Monk) on Nov 05, 2008 at 09:51 UTC
    Or you could use a regex with a zero-width look-ahead assertion:

    my $line = q(!A001ST!,!AD,CD!,!1!,!20/05/2004!,!ANDORRA!,!0!,!EUR,USD!,!6!,!7!,!0!,!AD!,!1!);

    # only commas followed by "!" split fields
    my @data = split(/\,\s*(?=!)/, $line);
    print "Data splitted is as below \n";
    print join("==>",@data),"\n";

    From perldoc perlre:

    "(?=pattern)"
        A zero-width positive look-ahead assertion. For example, "/\w+(?=\t)/" matches a word followed by a tab, without including the tab in $&.
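
To make the effect of the look-ahead concrete, here is a minimal sketch that runs a plain comma split and the look-ahead split side by side on a shortened version of the sample line. The helper name split_on_field_commas is made up for this illustration and is not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: split only on commas that are followed by '!',
# i.e. commas that sit right before the start of the next field.
sub split_on_field_commas {
    my ($line) = @_;
    return split /,\s*(?=!)/, $line;
}

my $sample = q(!A001ST!,!AD,CD!,!1!);

# A plain split consumes every comma, including the one inside !AD,CD!.
my @naive = split /,/, $sample;

# The look-ahead checks for '!' without consuming it, so the '!' stays
# attached to the field that follows the comma.
my @fields = split_on_field_commas($sample);

print scalar(@naive),  " fields: ", join("==>", @naive),  "\n";
print scalar(@fields), " fields: ", join("==>", @fields), "\n";
```

The first line reports four fields and the second reports three. Note that this still assumes no field contains a comma immediately followed by "!"; such a field would be split in the wrong place.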
Re: Split criteria
by cdarke (Prior) on Nov 05, 2008 at 12:16 UTC
    Maybe I'm missing something, but the change seems quite simple:
    my @data = /(!.*?!)/g;
    Gives:
    Splitting data by comma as a delimiter
    Data splitted is as below
    !A001ST!==>!98!==>!1!==>!01/10/1999!==>!EUROPEENNE!==>!0!==>!EUR!==>!6!==>!7!==>!0!==>!98!==>!1!
    Data splitted is as below
    !A001ST!==>!AD,CD!==>!1!==>!20/05/2004!==>!ANDORRA!==>!0!==>!EUR,USD!==>!6!==>!7!==>!0!==>!AD!==>!1!
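
If the surrounding ! characters are not wanted in the result, the same global-match idea can capture only the text between them. This is a small variation on the match above, sketched under the assumption that no field contains a literal "!"; the helper name extract_bare_fields is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between each pair of '!'.
sub extract_bare_fields {
    my ($line) = @_;
    # In list context, a /g match with one capture group returns
    # every captured substring.
    return $line =~ /!(.*?)!/g;
}

my @bare = extract_bare_fields(q(!A001ST!,!AD,CD!,!1!));
print join("==>", @bare), "\n";
```

For the shortened sample line this prints A001ST==>AD,CD==>1.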
Re: Split criteria
by repellent (Priest) on Nov 05, 2008 at 09:37 UTC
    Which comma inside ! ! do you wish to retain? There is a comma inside !,! too. You need to disambiguate the comma somehow.

    Perhaps:
    my $line = q(!A001ST!,!AD,CD!,!1!,!20/05/2004!,!ANDORRA!,!0!,!EUR,USD!,!6!,!7!,!0!,!AD!,!1!);

    # assume fields begin and end with ! !
    my @data = $line =~ /^!(.*)!$/ && split(/!,!/, $1);
Re: Split criteria
by arkturuz (Curate) on Nov 05, 2008 at 09:45 UTC
    You have an unfortunate field delimiter, so something like this might help:
    while(<DATA>) {
        my $str = $_;
        $str =~ s/!,!/!:::!/g;
        my @parts = split(':::', $str);
        print join("==>", @parts), "\n";
    }
    Output:
    !A001ST!==>!98!==>!1!==>!01/10/1999!==>!EUROPEENNE!==>!0!==>!EUR!==>!6!==>!7!==>!0!==>!98!==>!1!
    !A001ST!==>!AD,CD!==>!1!==>!20/05/2004!==>!ANDORRA!==>!0!==>!EUR,USD!==>!6!==>!7!==>!0!==>!AD!==>!1!
      That would break as soon as there's a field containing ':::'.
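
The objection to the ':::' placeholder can be reproduced with a made-up record whose second field contains ':::'. The sketch below wraps the placeholder approach in a hypothetical helper, split_via_placeholder, purely so it can be called on a test string; the record itself is invented and does not come from the original data.

```perl
use strict;
use warnings;

# The placeholder approach, wrapped in a hypothetical helper.
sub split_via_placeholder {
    my ($str) = @_;
    $str =~ s/!,!/!:::!/g;       # mark the field boundaries
    return split /:::/, $str;    # then split on the marker
}

# Invented record: the second field happens to contain ':::'.
my $record = q(!A001ST!,!AD:::CD!,!1!);

my @parts = split_via_placeholder($record);
print scalar(@parts), " parts: ", join("==>", @parts), "\n";
```

The record has three fields but comes back as four parts, because the ':::' inside the field is indistinguishable from the inserted marker. Splitting directly on the boundary pattern, without an intermediate marker, avoids this particular failure.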
Re: Split criteria
by flamewise (Initiate) on Nov 05, 2008 at 16:02 UTC
    Could you try working with zero-width look-behind and look-ahead assertions, which make sure you only split on a , that sits between two ! characters?

    Try to split with this:
    my @data = split /(?<=!),(?=!)/,$_;
    This only works if all your data is guaranteed to be embedded in a pair of exclamation marks.
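
The caveat about every field being wrapped in exclamation marks can be seen with a small sketch. The second input line below is invented for illustration (it has an unquoted field), and split_between_bangs is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split only on commas that have '!' on both sides.
sub split_between_bangs {
    my ($line) = @_;
    return split /(?<=!),(?=!)/, $line;
}

my $all_quoted = q(!A001ST!,!AD,CD!,!1!);
my $mixed      = q(!A001ST!,98,!1!);    # invented: the field 98 is unquoted

for my $line ($all_quoted, $mixed) {
    my @fields = split_between_bangs($line);
    print scalar(@fields), " field(s): ", join("==>", @fields), "\n";
}
```

The fully quoted line yields three fields. The mixed line comes back as a single field, because neither comma around 98 has a ! on both sides, so the pattern never matches.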

      Thanks for this, I now have a clear idea of how to do the split. Once again, thanks to all; the replies above solved my problem.

Node Type: perlquestion [id://721582]
Approved by repellent