PerlMonks  

Stuck in komplexer regex, at least for me

by ultibuzz (Monk)
on Mar 26, 2007 at 15:14 UTC (#606597, perlquestion)
ultibuzz has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
the problem is that the first 0 should not be touched, but all following zeros up to the first non-zero digit should be removed, except when there is only one 0.

I managed it as far as you can see in my code, but these 2 problems are left.

This is the pattern: s/(?<=\d{2})0*(?<=\d{4})0+//

This should be produced:

given        -> after regex
215000007801 -> 21507801
300000324002 -> 30324002
890000457651 -> 89457651
210004563401 -> 214563401
201045139158 -> 20145139158

kd ultibuzz

Re: Stuck in komplexer regex, at least for me
by johngg (Abbot) on Mar 26, 2007 at 15:49 UTC
    From your output it looks as if you want to replace the first occurrence of multiple zeros with a single zero, any subsequent multiple zero group should be left alone. This seems to work

    #!/usr/local/bin/perl
    #
    use strict;
    use warnings;

    print map { s{0{2,}}{0}; $_ } <DATA>;

    __END__
    215000007801
    300000324002
    890000457651
    210004563401
    201045139158

    and the output is

    21507801
    30324002
    890457651
    2104563401
    201045139158

    I hope this is of use.

    Cheers,

    JohnGG

    Update: Looks like I've misunderstood. If you say the first zero should not be touched, why does 890000457651 become 89457651 and not 890457651? Perhaps you could clarify what you require.

      Sorry, I forgot to say that the "first 0 is not touched" rule only applies when the 0 comes after position 3.
      kd ultibuzz

        I'm afraid things are becoming more confusing rather than less. From the first four examples in your desired output I can guess at these rules:

        Look for a sequence of two or more zeros

        If the sequence found starts in position 1, 2 or 3, (counting from 1 but note that Perl usually counts from 0), replace all the zeros by nothing

        If the sequence starts from position 4 or greater, replace all but one of the zeros with nothing

        I can't see how the fifth example fits into anything you have described.

        Cheers,

        JohnGG

      print map { s{0{2,}}{0}; $_ } <DATA>;

      I know this is completely OT and that in this particular case it wouldn't make a difference, but we recommend all the time against slurping files in all at once unless really needed, so please do not spread the word against this recommendation:

      s/0{2,}/0/, print while <DATA>;

      is not terribly more verbose. (I also changed the curlies as delimiters in the s operator because in this case they seemed confusing to me.)

        we recommend all the time against slurping files in all at once unless really needed

        Do we? How strange. Why?

        Seems to me that slurping is a perfectly valid technique in the right circumstances, for instance, parsing command output or working with small to middling data sets. People have even gone to the trouble of writing modules to support the idiom.

        In this case it would appear that ultibuzz has a data set of 5000000 numbers so slurping, as it turns out, would definitely not be appropriate.

        Cheers,

        JohnGG

Re: Stuck in komplexer regex, at least for me
by Moron (Curate) on Mar 26, 2007 at 15:55 UTC
    I think you are suffering slightly from the popular compulsion to do everything in the regexp. Personally, I would have gone for the simplest thing that came to mind e.g.:
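    my $keep = substr( $_, 0, 2 );   # the first two digits are never modified
    $_ = substr( $_, 2 );            # everything after them
    s/0+/0/;                         # squeeze the first run of zeros to a single 0
    $_ = $keep . $_;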
    and then also reassured myself that such a simple regexp should gain significantly in performance and should win the trade-off against having the extra substr operations which are cheap by comparison to m//.

    -M

    Free your mind

Re: Stuck in komplexer regex, at least for me
by kyle (Abbot) on Mar 26, 2007 at 16:18 UTC

    For the benefit of other monks who might be trying their hand at this:

    use Test::More;

    my %output_for = (
        '215000007801' => '21507801',
        '300000324002' => '30324002',
        '890000457651' => '89457651',
        '210004563401' => '214563401',
        '201045139158' => '20145139158',
    );

    plan tests => scalar keys %output_for;

    sub solution {
        # (this doesn't work)
        $_[0] =~ s{ \A ([^0]+ 0) ([^0]*) 0+ ([^0]+0) }{$1$2$3}x
        # ||
        # $_[0] =~ s{ \A ([^0]+) 0+ ([^0]) }{$1$2}x;
    }

    while ( my ($input, $correct_output) = each %output_for ) {
        my $orig_input = $input;
        solution( $input );
        is( $input, $correct_output, "Solved '$orig_input'" );
    }

Re: Stuck in komplexer regex, at least for me
by saintly (Scribe) on Mar 26, 2007 at 16:26 UTC
    Hmm, I don't think I understand the rule...

    201045139158 -> 20145139158
    Why is the 2nd 0 eliminated?

      Because it's not Thursday after dark. If it were, the 0 would have been replaced with a jack and he'd be on the way to a royal fizzbin. Unless he got a kronk, but that'd just be bad luck.

      I think the problem is that he doesn't understand his own spec well enough either to write the regex or to explain it to us. It's probably time for the OP to sit down, go back to square one, and enumerate just what it is that he is trying to accomplish.

Re: Stuck in komplexer regex, at least for me
by saintly (Scribe) on Mar 26, 2007 at 17:17 UTC
    Well, I can make a ruleset that seems to fit:
    1. Always keep the first two numbers intact
    2. If any zeros start at position 3, remove them all.
    3. If you didn't do that, then if two or more zeros start after position 3, truncate them to a single 0
    4. If neither rule 2 nor rule 3 applied, then entirely remove a single 0 starting at position 3 or later.

    What happens if a single 0 starts at position 3? What does '35010333356' turn into?

    Here's my code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Test::More;

    my %output_for = (
        '215000007801' => '21507801',
        '300000324002' => '30324002',
        '890000457651' => '89457651',
        '210004563401' => '214563401',
        '201045139158' => '20145139158',
    );

    plan tests => scalar keys %output_for;

    while ( my ($input, $correct_output) = each %output_for ) {
        my $orig_input = $input;
        $input = &compress_numstring( $input );
        is( $input, $correct_output, "Solved '$orig_input'" );
    }

    sub compress_numstring {
        my $starting_num = shift;
        return $starting_num
            unless(defined $starting_num && $starting_num =~ /^(\d{2})(\d+)/);
        my( $keep, $modify ) = ($1,$2);
        ( $modify =~ s/^0+// )
            || ( $modify =~ s/0{2,}/0/ )
            || ( $modify =~ s/(?<!0)0([1-9]|$)/$1/ );
        return $keep . $modify;
    }
    It passes the validation test, but it may fail further tests since the rules aren't clearly spelled out.
      (for the regex fanboys:
      sub compress_numstring { substr($_[0], 2) =~ s/(^0+|(0)0+|(?<!0)0([1-9]|$))/$2$3/; return $_[0]; }

      does the same thing as the other function, but with more job security)
      Unfortunately, 'use warnings' will complain.
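
The complaint is presumably "Use of uninitialized value": whichever of $2 and $3 did not take part in the match is undef when it is interpolated into the replacement. Below is a minimal sketch of a warning-free variant. It assumes Perl 5.10 or later (for the // operator) and input of at least two characters; the outer capture group is dropped, so the groups are renumbered to $1 and $2.

```perl
use strict;
use warnings;

# Sketch only: same three alternatives as the one-liner above, but the
# replacement is built with /e so that an unset capture becomes ''.
sub compress_numstring {
    my ($num) = @_;
    substr($num, 2) =~ s{ ^0+ | (0)0+ | (?<!0)0([1-9]|$) }
                        { ($1 // '') . ($2 // '') }xe;
    return $num;
}

print compress_numstring($_), "\n"
    for qw(215000007801 300000324002 890000457651 210004563401 201045139158);
```

Like the one-liner, this works on a copy of everything after the first two digits, so it has only been checked against the five examples given in the original question.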
Re: Stuck in komplexer regex, at least for me
by ultibuzz (Monk) on Mar 26, 2007 at 19:17 UTC

    Sorry for the bad explanation, I'll try to explain it better.
    The following regex produces this:

    s/(?<=\d{2})0*(?<=\d{4})0+//

    output
    215000007801 -> 2157801
    300000324002 -> 30324002
    890000457651 -> 89457651
    210004563401 -> 214563401
    201045139158 -> 201045139158

    These 2 outputs are wrong:
    201045139158 -> 201045139158
    215000007801 -> 2157801
    They should be:
    215000007801 -> 21507801
    201045139158 -> 20145139158


    Now I'll try to explain the rule better; counting starts at 0:
    digits 0 and 1 should be left untouched
    if digit 2 is a 0 and the next digit is a 0, remove all 0s up to the first non-0
    if digit 2 is a 0 and the next digit is a non-0, remove the 0
    if digit 3 is a 0 and is followed by a 0, remove all 0s except the 0 at digit 3
    if digit 3 is a 0 and the next digit is a non-0, remove only that 0
    if digit 4 is a 0, remove it and all following 0s up to the first non-0

    I hope that makes it a bit clearer; I know it's kind of confusing.
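
Read literally, the positional rules above can be sketched as a chain of anchored substitutions, one per rule. This is only an illustration under two assumptions that the post does not state: positions always refer to the original string, and only the first rule that matches is applied. The function name strip_padding is made up for the example.

```perl
use strict;
use warnings;

# Sketch only: apply the first positional rule that matches, then stop.
sub strip_padding {
    my ($n) = @_;
    # digit 2 is a 0: remove it and any zeros that follow it
    return $n if $n =~ s/^(\d{2})0+/$1/;
    # digit 3 is a 0 followed by more zeros: keep only the 0 at digit 3
    return $n if $n =~ s/^(\d{3})00+/${1}0/;
    # digit 3 is a lone 0: remove it
    return $n if $n =~ s/^(\d{3})0(?!0)/$1/;
    # digit 4 is a 0: remove it and any zeros that follow it
    return $n if $n =~ s/^(\d{4})0+/$1/;
    return $n;    # no rule applied
}

print "$_ -> ", strip_padding($_), "\n"
    for qw(215000007801 300000324002 890000457651 210004563401 201045139158);
```

Keeping one substitution per rule makes it easy to reorder or drop a rule once the real padding scheme is known.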

    kd ultibuzz

    I will test the help you have all given tomorrow at the office, thanks a lot.

      Hi, you need an anchor '^' to make sure the match starts at the beginning of your string. Your requirements can be written as two patterns, which is much easier to understand (the order of the two s/// expressions matters):
      #!/usr/bin/perl
      use warnings;
      use strict;
      
      while(<DATA>) {
          s/^(\d\d[1-9])0(?=[1-9])/$1/;
          s/^(\d\d(?:[1-9]0)?)0+/$1/;
      
          print;
      }
      
      __DATA__
      215000007801
      300000324002
      890000457651
      210004563401
      201045139158
      
      Regards,
      Xicheng

        You're right, 2 patterns look easier.

        I am testing 5 million numbers at the moment; afterwards they will be checked against the system, and then I will know whether all fit or some fail.

        Same testing at the moment for the regex fanboy pattern ;)

        Thanks a lot for the quick and very good help.
        kd ultibuzz



        UPDATE: there is a problem with numbers like

        215100069395
        215100069395
        215100153821
        They should change into
        215169395
        215169395
        2151153821

        but they remained unchanged.

        UPDATE 2: I have it running with an if statement: if digit 2 or 3 is 0, use the new pattern, else my old one ^^
        This isn't nice at all and I don't like it ;)
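
The actual code for that workaround was not posted. A sketch of what such a dispatch might look like, assuming "new pattern" means the two anchored substitutions from the reply above and "old one" means the original lookbehind regex (the function name strip_fill_zeros is made up for the example):

```perl
use strict;
use warnings;

# Sketch only: choose the pattern set by looking at digits 2 and 3
# (counting from 0), as described in UPDATE 2.
sub strip_fill_zeros {
    my ($n) = @_;
    if ( substr($n, 2, 2) =~ /0/ ) {
        # digit 2 or 3 is a 0: the two anchored substitutions
        $n =~ s/^(\d\d[1-9])0(?=[1-9])/$1/;
        $n =~ s/^(\d\d(?:[1-9]0)?)0+/$1/;
    }
    else {
        # otherwise: the original lookbehind pattern
        $n =~ s/(?<=\d{2})0*(?<=\d{4})0+//;
    }
    return $n;
}

print "$_ -> ", strip_fill_zeros($_), "\n"
    for qw(215000007801 201045139158 215100069395 215100153821);
```

This has only been checked against the sample numbers quoted in this thread, not against the 5 million real ones.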

Re: Stuck in komplexer regex, at least for me
by chrism01 (Friar) on Mar 26, 2007 at 22:53 UTC
    Just out of curiosity, can you tell us what this does or is for in real life?
    Seems like a weird set of rules.

    Cheers
    Chris

      Sure.
      We get 12-digit numbers from another firm; the original numbers can vary from 4 to 12 digits and are filled up with 0s.
      The problem is that we got a letter explaining how they fill up the numbers, but it misses several possibilities, so I am left with the problem of decoding these 12-digit numbers: removing the right 0s and not the 0s that are part of the number.

      The explanation was 2 sentences, which doesn't really help with anything ^^
      And we are not in a position to force a change in their process (then we would get these numbers in paper form), so we need workarounds to get the right numbers, or at least not many failures :D

      kd ultibuzz

        I feel the need to ask, might it be easier to convert your numbers to the other firm's numbers? Or maybe not try to convert anyone's numbers, but use both yours and theirs and match them up using a database or something? Maybe if I understood how and why they seem to add 0 in very odd places, it would be easier to figure out how to deal with this. I'm not a regex person, so this is my stab at this: I'm sure there is a serious performance hit for not using a regex.
