PerlMonks  

Re: Re: Regular Expression Builder

by Anonymous Monk
on Aug 30, 2002 at 17:05 UTC (#194182)


in reply to Re: Regular Expression Builder
in thread Regular Expression Builder

But I don't think this will scale very well... (and it probably has subtle problems anyway)

One quibble is that because \d is a subset of \w, a string such as "abc123def" will get \w{9} in your version. Here's a slightly improved version (for some definition of improved):

my $string=" \aabc123def!*#\n";
$string=~s{
     ([[:digit:]]+)
    |([[:alpha:]]+)
    |([[:punct:]]+)
    |([[:space:]]+)
    |([[:cntrl:]]+)
    |(.)
}{
      defined($1) ? '[[:digit:]]{'.length($1).'}'
    : defined($2) ? '[[:alpha:]]{'.length($2).'}'
    : defined($3) ? '[[:punct:]]{'.length($3).'}'
    : defined($4) ? '[[:space:]]{'.length($4).'}'
    : defined($5) ? '[[:cntrl:]]{'.length($5).'}'
    : "\Q$+\E" # anything else?
}gex;
print $string;

But it still has problems (for example, \n is in both :space: and :cntrl: so "\n\a" produces [[:space:]]{1}[[:cntrl:]]{1}, but "\a\n" produces [[:cntrl:]]{2}).
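
The order dependence is easy to reproduce by wrapping the substitution in a function. This is only a minimal sketch: the name `generalize` is hypothetical, and `quotemeta($6)` stands in for the "\Q$+\E" fallback.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above.
sub generalize {
    my ($string) = @_;
    $string =~ s{
         ([[:digit:]]+)
        |([[:alpha:]]+)
        |([[:punct:]]+)
        |([[:space:]]+)
        |([[:cntrl:]]+)
        |(.)
    }{
          defined($1) ? '[[:digit:]]{' . length($1) . '}'
        : defined($2) ? '[[:alpha:]]{' . length($2) . '}'
        : defined($3) ? '[[:punct:]]{' . length($3) . '}'
        : defined($4) ? '[[:space:]]{' . length($4) . '}'
        : defined($5) ? '[[:cntrl:]]{' . length($5) . '}'
        : quotemeta($6)    # catch-all for anything the classes missed
    }gex;
    return $string;
}

# "\n" is tried against [[:space:]] first, so it never reaches [[:cntrl:]]
print generalize("\n\a"), "\n";    # [[:space:]]{1}[[:cntrl:]]{1}

# "\a" is not whitespace, so [[:cntrl:]]+ starts the run and swallows "\n" too
print generalize("\a\n"), "\n";    # [[:cntrl:]]{2}
```

Reordering the alternatives only moves the ambiguity around; which class "\n" ought to belong to is itself one of the assumptions.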


Re: Re: Re: Regular Expression Builder
by demerphq (Chancellor) on Aug 30, 2002 at 17:18 UTC
    One quibble is that because \d is a subset of \w then a string such as "abc123def" will get \w{9} in your version.

    Yup. But personally I consider that a feature, not a bug. :-) After all, ldkjdlkjf2098kklls probably isn't [[:alpha:]]+\d+[[:alpha:]]+.

    But we are both in agreement that there isn't a good way to do this, although as we both have shown there are a variety of bad ways to do it... BTW, is the . really necessary? I don't think it is, as the s/// will just skip the char if it doesn't match.

    Oh, and I considered using something like what you posted here, but I figured that since I tend not to use the POSIX char classes much, others probably wouldn't either.

    :-)

    Yves / DeMerphq
    ---
    Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)

      But we are both in agreement that there isn't a good way to do this

      Agreed. Anything that tries to generalize beyond "\Q$string\E" requires a variety of assumptions.
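
      For comparison, the assumption-free baseline looks like this. It is a minimal sketch with made-up sample strings: the escaped pattern matches only the literal string, while the unescaped one treats ".", "*", "+" and the parentheses as regex syntax.

      ```perl
      use strict;
      use warnings;

      my $string = "a.b*c (1+1)";

      my $strict = qr/\A\Q$string\E\z/;   # every metacharacter escaped
      my $loose  = qr/\A$string\z/;       # metacharacters left live

      print "strict matches the literal\n"    if $string     =~ $strict;
      print "strict rejects a lookalike\n"    if "axbbc 11"  !~ $strict;
      print "loose accepts the lookalike\n"   if "axbbc 11"  =~ $loose;
      ```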

      BTW, is the . really necessary? I don't think it is, as the s/// will just skip the char if it doesn't match.

      Just trying to be careful :-) In case I neglected something with those classes, and that something also required escaping, then just leaving it in the string wouldn't result in a valid re. (I don't tend to use POSIX char classes either, so I wasn't sure exactly how inclusive I was being).

      After all, ldkjdlkjf2098kklls probably isn't [[:alpha:]]+\d+[[:alpha:]]+
      I did play around once with a small script that did these things, when I was bored, and no, it probably isn't. What I did was take several strings and try to derive a common expression from them: first looking for similarities, like a sequence of numbers in the middle or whitespace at the end, and then building sub-regexes from the parts. I think starting by splitting on non-word characters and such gave so-so results for things like email addresses.

      Of course, I never got any really usable results, but it was a fun exercise. :) What I wanted to say was that to decide which it should be, you need a decent sample of several strings that should all match. Then it is sometimes possible to get something to build upon. Maybe. :)
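
      A rough illustration of the several-samples idea, and not the script described above: split each sample into runs of one character class, and if every sample has the same sequence of classes, emit each class with the smallest and largest run length seen. The names `runs` and `common_regex` and the choice of classes are assumptions made for this sketch.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: split a string into [class, length] runs.
      sub runs {
          my ($s) = @_;
          my @runs;
          while ($s =~ /\G(?:(\d+)|([[:alpha:]]+)|(\s+)|(.))/gcs) {
              my ($class, $text) =
                    defined $1 ? ('\d',          $1)
                  : defined $2 ? ('[[:alpha:]]', $2)
                  : defined $3 ? ('\s',          $3)
                  :              (quotemeta($4), $4);   # single literal char
              push @runs, [ $class, length $text ];
          }
          return @runs;
      }

      # Hypothetical helper: returns a pattern string, or nothing when the
      # samples do not share the same sequence of classes.
      sub common_regex {
          return unless @_;
          my @samples = map { [ runs($_) ] } @_;
          my $first   = $samples[0];
          for my $s (@samples) {
              return unless @$s == @$first;
          }
          my $re = '';
          for my $i (0 .. $#$first) {
              my $class = $first->[$i][0];
              my ($min, $max) = ($first->[$i][1]) x 2;
              for my $s (@samples) {
                  return unless $s->[$i][0] eq $class;
                  my $len = $s->[$i][1];
                  $min = $len if $len < $min;
                  $max = $len if $len > $max;
              }
              $re .= $class . '{' . ($min == $max ? $min : "$min,$max") . '}';
          }
          return $re;
      }

      print common_regex("abc123def", "xy4567z"), "\n";
      # [[:alpha:]]{2,3}\d{3,4}[[:alpha:]]{1,3}
      ```

      Samples with different shapes, such as "abc123" and "123abc", yield no pattern at all, which is where the real work of looking for similarities would have to begin.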


      You have moved into a dark place.
      It is pitch black. You are likely to be eaten by a grue.
