PerlMonks  

Re^3: I think regex Should Help Here... but How!?

by ozboomer (Friar)
on Feb 16, 2014 at 03:37 UTC ( [id://1075073]=note )


in reply to Re^2: I think regex Should Help Here... but How!?
in thread I think regex Should Help Here... but How!?

(see the updates I've made to the OP)

Sure, the ugly code produces the output I want: the supplied category (with an additional trailing dot), which comes from the list of possible categories, matches the category in the record (again, doctored with a trailing dot) at the required 'level of matching'.

Although it's not my current application, perhaps think of the number of postings in the Usenet hierarchies. The data might be:

comp.lang.c,100
comp.lang.beta,23
comp.lang.java.help,123
comp.object,12
alt.3d,12
alt.animals.llama,1423
...

The types of question I'm looking to answer:

"How many postings are there in the 'comp' hierarchy and below?"

For this question, we can say:

Matches: comp, comp.lang, comp.lang.c... (the group names all start with 'comp')

Do not Match: alt, alt.3d, alt.animals.llama... (the group names do not start with 'comp')

"How many postings are there in the 'alt.*' hierarchy and below?"

For this question, we can say:

Matches: alt.3d, alt.animals.llama... (the group names all start with 'alt.{something}' and {something} is non-null)

Do not match: alt, comp, comp.lang.c... (the group names do NOT start with 'alt.{something}' and {something} is non-null)
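One way to read the two questions above is as two kinds of query: a bare name such as 'comp' (the group itself or anything below it) and a wildcard such as 'alt.*' (at least one non-empty component below 'alt'). The following is only a sketch under that reading; query_to_regex and count_postings are made-up helper names, not anything from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a query such as 'comp' or 'alt.*' into a regex.
#   'comp'  -> the group is 'comp' itself or anything below it
#   'alt.*' -> the group needs at least one non-empty component below 'alt'
sub query_to_regex {
    my ($query) = @_;
    if ($query =~ /\A(.+)\.\*\z/) {
        my $base = quotemeta $1;            # escape dots, '+', etc.
        return qr/\A${base}\.[^.]/;
    }
    my $base = quotemeta $query;
    return qr/\A${base}(?:\.|\z)/;          # whole component, so 'computers' fails
}

# Hypothetical helper: sum the counts of all "group,count" records
# whose group matches the query.
sub count_postings {
    my ($query, @recs) = @_;
    my $re    = query_to_regex($query);
    my $total = 0;
    for my $rec (@recs) {
        my ($group, $count) = split /,/, $rec, 2;
        $total += $count if $group =~ $re;
    }
    return $total;
}

# Sample data copied from the post.
my @records = (
    'comp.lang.c,100',
    'comp.lang.beta,23',
    'comp.lang.java.help,123',
    'comp.object,12',
    'alt.3d,12',
    'alt.animals.llama,1423',
);

# Prints 258 for 'comp' and 1435 for 'alt.*' with the sample data.
printf "%-6s %d\n", $_, count_postings($_, @records) for 'comp', 'alt.*';
```

The quotemeta call matters for group names like comp.lang.c++, where '+' and '.' would otherwise be treated as regex metacharacters.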

Conceptually, it's such a simple thing: "Does RECORD CAT start with the TEST string?" ...

TEST         RECORD          MATCHES?

comp         comp.lang       Yes
comp         comp            Yes
comp         comp.hw         No

comp.lang    comp            No
comp.lang    comp.lang       Yes
comp.lang    comp.lang.c     Yes
comp.lang    comp.lang.c++   Yes
comp.lang    alt             No
comp.lang    alt.test        No

This is why I was thinking there must be a simple regex way to say "give me the first 2 items from the category string" (using parentheses, with a dot or end-of-string as the separator, giving 'comp.lang'), which I would then compare to the start of the record string (in a simple regex: /^$rec_string\.*$/ or something)...
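The "first 2 items" idea can be sketched roughly as follows. This is only an illustration, assuming a component is any run of non-dot characters; first_components and matches are hypothetical names, and the comparison works on whole components, so every record at or below the test category counts as a match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $n dot-separated components of $cat,
# or undef if $cat has fewer than $n components.
sub first_components {
    my ($cat, $n) = @_;
    return undef if $n < 1;
    my $more = $n - 1;    # components after the first one
    return $cat =~ /\A([^.]+(?:\.[^.]+){$more})(?:\.|\z)/ ? $1 : undef;
}

# Hypothetical helper: does the record category start with the test
# category, compared component by component?
sub matches {
    my ($test, $record) = @_;
    my @parts = split /\./, $test;
    my $head  = first_components($record, scalar @parts);
    return (defined($head) && $head eq $test) ? 1 : 0;
}

for my $pair (['comp.lang', 'comp'],
              ['comp.lang', 'comp.lang.c++'],
              ['comp.lang', 'alt.test']) {
    my ($test, $record) = @$pair;
    printf "%-10s %-14s %s\n", $test, $record,
        matches($test, $record) ? 'Yes' : 'No';
}
```

Comparing the extracted head with eq rather than interpolating it into another regex sidesteps escaping problems with names such as comp.lang.c++.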

...shaking his head in bewilderment...


Replies are listed 'Best First'.
Re^4: I think regex Should Help Here... but How!?
by AnomalousMonk (Archbishop) on Feb 16, 2014 at 14:50 UTC

    I had thought I had given a 'pure' regex approach (for what it's worth) that satisfied your original request, one that can easily be adjusted for the terminal-dot versus no-terminal-dot alternatives; which of these you require is a point I still do not quite grasp. Your response to kcott's reply below indicates you are satisfied with the code you have now, so I will not comment further along these lines.

    However, I would encourage you to become familiar with regular expression techniques and be wildered no more! In addition to the valuable links given by others in this thread, I have found Jeffrey Friedl's (admittedly rather expensive) book Mastering Regular Expressions to be very helpful; see his site.

    Update: Or perhaps I should have said "be less wildered", for even though I've been using and studying regexes a long time now, I still regularly trip over them and fall flat on my face! But hang in there and enlightenment will come.

Re^4: I think regex Should Help Here... but How!?
by AnomalousMonk (Archbishop) on Feb 16, 2014 at 20:55 UTC

    Nope, just couldn't leave it alone. Here's a solution to analyzing the Usenet hierarchies data. Note this is cumulative: repeated entries add together.
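    The code from this reply is not preserved in this copy of the thread. As a stand-in, and not as AnomalousMonk's actual solution, here is a minimal sketch of one cumulative approach: each record's count is added to every level of its hierarchy, so repeated entries simply add together. The tally helper and the repeated 'comp.lang.c,5' record are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: add each record's count to every level of its
# hierarchy, so 'comp.lang.c,100' contributes 100 to 'comp',
# 'comp.lang' and 'comp.lang.c'. Returns a hash reference of totals.
sub tally {
    my (@recs) = @_;
    my %total;
    for my $rec (@recs) {
        my ($group, $count) = split /,/, $rec, 2;
        my @parts = split /\./, $group;
        for my $i (0 .. $#parts) {
            my $prefix = join '.', @parts[0 .. $i];
            $total{$prefix} += $count;      # cumulative: repeats add up
        }
    }
    return \%total;
}

my $totals = tally(
    'comp.lang.c,100',
    'comp.lang.beta,23',
    'comp.lang.java.help,123',
    'comp.object,12',
    'alt.3d,12',
    'alt.animals.llama,1423',
    'comp.lang.c,5',                        # made-up repeated entry
);

print "$_ => $totals->{$_}\n" for sort keys %$totals;
```

    Once the totals are built, a question such as "how many postings in 'comp' and below?" is a single hash lookup, at the cost of storing one entry per hierarchy level.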
