PerlMonks  

anish_batra: Here is why your solution is broken in almost the same way as the original poster's code. This will be a slight oversimplification.

The regexp engine loves to find matches. It's its duty to find them. You are giving it all the tools it needs to match the following string:

$data = "Johnson%Andrew%BX321%Accountant";

Here's why:

  • .* will greedily match as large a string as possible (or nothing at all), so long as a '%' character comes next. In this case, on the first pass, it will match "Johnson%Andrew%BX321", stopping just before the "%Accountant" portion of the string...
  • Next, the RE engine moves on to the second .*% term. Uh oh... there is no '%' left in "Accountant", so for this term to match, the engine needs to backtrack into the first subexpression.
  • Back to the first subexpression... The original .*% has been told it was too greedy. Now it tries again and this time matches "Johnson%Andrew%".
  • Now the second subexpression is allowed to match "BX321%"
  • Finally, 'A' is matched from "Accountant".
  • The regexp engine has done its job: It found a way to make "Johnson%Andrew%BX321%Accountant" match.
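
The walkthrough above can be checked with a few lines of Perl. This is only a sketch: the pattern is my reconstruction from the steps described (two .*% terms followed by a literal 'A'), and the capture groups and the helper name terms_matched are added purely to show what each term ends up holding.

```perl
use strict;
use warnings;

# Assumed pattern, reconstructed from the walkthrough: two ".*%" terms
# followed by a literal 'A'. The parentheses are added only so we can
# see what each term matched. terms_matched() is a hypothetical helper.
sub terms_matched {
    my ($string) = @_;
    my @terms = $string =~ /(.*%)(.*%)A/;
    return @terms;    # empty list if the pattern did not match
}

my $data = "Johnson%Andrew%BX321%Accountant";
my @terms = terms_matched($data);

if (@terms) {
    print "first term:  $terms[0]\n";
    print "second term: $terms[1]\n";
}
else {
    print "no match\n";
}
```

Run against the string above, this reports "Johnson%Andrew%" for the first term and "BX321%" for the second, which is the false positive described in the list.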

But that's not what the OP actually wanted to have happen. He wanted strings like "Johnson%Andrew%AX321%Accountant" to pass, and "Johnson%Andrew%BX321%Accountant" to fail. You simply showed him another way to get the wrong result again. And, in fact, your solution results in some backtracking within the RE engine, so not only does it provide false positives, it does so inefficiently.

Either you didn't understand the question, or you did understand it, but didn't test your code. There's no shame in considering a solution that doesn't work. The problem is when it gets posted. This is the third or fourth answer in a row that you've provided which fails to meet the OP's simple requirements. My suggestion: always test your code with a variety of possibly valid data-sets before posting answers... at least until accurate responses become second nature. To be honest, I'm still hesitant to post regexp responses until after I've tested them -- they're so easy to get wrong. But the lesson should be: test your solutions before posting.
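
As a rough illustration of that kind of testing, here is a small table-driven check using Test::More. Everything in it is an assumption on my part: the requirement (the third '%'-separated field must start with 'A') is inferred from the two example strings in this post, the extra test strings are made up, and $candidate is just one possible pattern, not necessarily the one offered elsewhere in the thread.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical candidate: anchor at the start and let each field hold
# anything except '%', so the engine cannot slide across a delimiter
# to find an 'A' somewhere later in the string.
my $candidate = qr/^[^%]*%[^%]*%A/;

# Each case is [ input string, expected result (1 = match, 0 = no match) ].
# The first two strings come from the post; the rest are invented.
my @cases = (
    [ "Johnson%Andrew%AX321%Accountant", 1 ],
    [ "Johnson%Andrew%BX321%Accountant", 0 ],
    [ "Adams%Alice%BX321%Accountant",    0 ],
    [ "Johnson%Andrew%BX321%Analyst",    0 ],
);

for my $case (@cases) {
    my ( $string, $expected ) = @$case;
    my $got = $string =~ $candidate ? 1 : 0;
    is( $got, $expected, $string );
}

done_testing();
```

The point is less the particular pattern than the habit: include at least one string that must pass, one that must fail, and a few that try to fool the pattern with an 'A' in the wrong field.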

The Monastery welcomes learners. That's one of the biggest reasons we're here. We all started somewhere. And answering questions is a great way to consider new problems and to learn from them. I'm not suggesting that you refrain from answering. I'm suggesting, and asking as a fellow PerlMonk, that you test your code before posting it. One doesn't learn much from posting broken solutions. One learns by studying how to create a valid solution.

Furthermore, posting broken code does your fellow PerlMonks a disservice. Sure, there's more than one way to do it. But another newcomer may not immediately recognize that your solutions have bugs, may use them, and may find out the hard way. That's not good for the user, for Perl, or for the Perl community.

One suggestion I have... if you're unsure about a solution, you might even consider chatting about it in the CB before posting it. Put it in your scratchpad and say, "Is [pad://anish_batra] a valid solution to [id://123456]?" If it's a good idea, post it. If it's wrong, the folks in the chatterbox will probably be glad to explain why.


Dave


In reply to Re^2: Regex help needed by davido
in thread Regex help needed by ghosh123
