Re: Regular Expression Builder

by demerphq (Chancellor)
on Aug 30, 2002 at 16:19 UTC


in reply to Regular Expression Builder

I doubt that there is a robust way to do this, but here's a really simple way:

my $string="123 abcdef";
$string=~s{(\d+)|(\w+)|(\s+)}
          { defined($1) ? '\\d{'.length($1).'}' :
            defined($2) ? '\\w{'.length($2).'}' :
                          '\\s{'.length($3).'}'
          }ge;
print $string;
__END__
\d{3}\s{1}\w{6}
But I don't think this will scale very well... (and probably has subtle problems anyway)
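As a minimal sketch of how the generated pattern might be used, the substitution can be wrapped in a sub and the result checked against other strings. The name `shape_regex` and the test strings are assumptions made for illustration; they are not part of the post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): turn a sample string into a
# pattern describing it as runs of \d, \w and \s with exact lengths.
sub shape_regex {
    my ($string) = @_;
    $string =~ s{(\d+)|(\w+)|(\s+)}
                { defined($1) ? '\\d{' . length($1) . '}'
                : defined($2) ? '\\w{' . length($2) . '}'
                :               '\\s{' . length($3) . '}'
                }ge;
    return $string;
}

my $pattern = shape_regex("123 abcdef");
print "$pattern\n";    # \d{3}\s{1}\w{6}

# A string with the same run lengths matches; a different one does not.
print "987 zyxwvu: ", ("987 zyxwvu" =~ /\A$pattern\z/ ? "match" : "no match"), "\n";
print "98 zyxwvu: ",  ("98 zyxwvu"  =~ /\A$pattern\z/ ? "match" : "no match"), "\n";
```

Because the run lengths are exact, the pattern only accepts strings laid out exactly like the sample, which is one reason the approach may not generalize well.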

Yves / DeMerphq
---
Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)


Re: Re: Regular Expression Builder
by Anonymous Monk on Aug 30, 2002 at 17:05 UTC
    But I don't think this will scale very well... (and probably has subtle problems anyway)

    One quibble is that because \d is a subset of \w then a string such as "abc123def" will get \w{9} in your version. Here's a slightly improved version (for some definition of improved):

    my $string=" \aabc123def!*#\n";
    $string=~s{ ([[:digit:]]+)
               |([[:alpha:]]+)
               |([[:punct:]]+)
               |([[:space:]]+)
               |([[:cntrl:]]+)
               |(.)
              }
              { defined($1) ? '[[:digit:]]{'.length($1).'}' :
                defined($2) ? '[[:alpha:]]{'.length($2).'}' :
                defined($3) ? '[[:punct:]]{'.length($3).'}' :
                defined($4) ? '[[:space:]]{'.length($4).'}' :
                defined($5) ? '[[:cntrl:]]{'.length($5).'}' :
                              "\Q$+\E" # anything else?
              }gex;
    print $string;

    But it still has problems (for example, \n is in both :space: and :cntrl: so "\n\a" produces [[:space:]]{1}[[:cntrl:]]{1}, but "\a\n" produces [[:cntrl:]]{2}).
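The order dependence described above can be reproduced with a small sketch. It wraps the same substitution in a sub; the name `posix_shape` is an assumption for illustration, and `quotemeta($6)` stands in for `"\Q$+\E"`, which should have the same effect in the final branch.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the POSIX-class substitution from the post.
sub posix_shape {
    my ($string) = @_;
    $string =~ s{ ([[:digit:]]+)
                | ([[:alpha:]]+)
                | ([[:punct:]]+)
                | ([[:space:]]+)
                | ([[:cntrl:]]+)
                | (.)
                }
                { defined($1) ? '[[:digit:]]{' . length($1) . '}'
                : defined($2) ? '[[:alpha:]]{' . length($2) . '}'
                : defined($3) ? '[[:punct:]]{' . length($3) . '}'
                : defined($4) ? '[[:space:]]{' . length($4) . '}'
                : defined($5) ? '[[:cntrl:]]{' . length($5) . '}'
                :               quotemeta($6)   # catch-all branch
                }gex;
    return $string;
}

print posix_shape("abc123def"), "\n";  # [[:alpha:]]{3}[[:digit:]]{3}[[:alpha:]]{3}
print posix_shape("\n\a"), "\n";       # [[:space:]]{1}[[:cntrl:]]{1}
print posix_shape("\a\n"), "\n";       # [[:cntrl:]]{2}
```

The last two lines differ only in the order of the same two characters: the greedy `[[:cntrl:]]+` run absorbs the newline when the bell comes first, while `[[:space:]]+` cannot absorb the bell.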

      One quibble is that because \d is a subset of \w then a string such as "abc123def" will get \w{9} in your version.

      Yup. But personally I consider that a feature, not a bug. :-) After all, ldkjdlkjf2098kklls probably isn't [[:alpha:]]+\d+[[:alpha:]]+

      But we are both in agreement that there isn't a good way to do this, although as we both have shown there are a variety of bad ways to do it... BTW, is the . really necessary? I don't think it is, as the s/// will just skip the char if it doesn't match.

      Oh, and I considered using something like what you posted here, but I figured that since I tend not to use the POSIX char classes that much, others probably wouldn't either.

      :-)

      Yves / DeMerphq
      ---
      Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)

        But we are both in agreement that there isn't a good way to do this

        Agreed. Anything that tries to generalize beyond "\Q$string\E" requires a variety of assumptions.
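For reference, a minimal sketch of that assumption-free baseline: quoting the whole string yields a pattern that matches the sample and nothing else. The sample string here is made up for illustration.

```perl
use strict;
use warnings;

my $string = "1+1 (approx.)";

# \Q...\E escapes every regex metacharacter, so the pattern is the literal string.
my $exact = qr/\A\Q$string\E\z/;

print "matches itself\n"      if     $string          =~ $exact;
print "rejects a near miss\n" unless "1+2 (approx.)"  =~ $exact;
```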

        BTW, is the . really necessary? I don't think it is, as the s/// will just skip the char if it doesn't match.

        Just trying to be careful :-) In case I neglected something with those classes, and that something also required escaping, then just leaving it in the string wouldn't result in a valid re. (I don't tend to use POSIX char classes either, so I wasn't sure exactly how inclusive I was being).
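A small sketch of why the catch-all can matter, using the simpler \d/\w/\s version from earlier in the thread, where punctuation is not covered by any class. The sub names (`class_run`, `shape_loose`, `shape_escaped`, `compiles`) are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: format one run from whichever capture is defined.
sub class_run {
    my ($d, $w, $s) = @_;
    return defined $d ? '\\d{' . length($d) . '}'
         : defined $w ? '\\w{' . length($w) . '}'
         :              '\\s{' . length($s) . '}';
}

# Without a catch-all: unmatched characters stay in the string as they are.
sub shape_loose {
    my ($str) = @_;
    $str =~ s{(\d+)|(\w+)|(\s+)}{ class_run($1, $2, $3) }ge;
    return $str;
}

# With a catch-all: anything else is escaped with quotemeta.
sub shape_escaped {
    my ($str) = @_;
    $str =~ s{(\d+)|(\w+)|(\s+)|(.)}
             { defined($4) ? quotemeta($4) : class_run($1, $2, $3) }ge;
    return $str;
}

# True if the string compiles as a regex.
sub compiles {
    my ($pattern) = @_;
    my $ok = eval { qr/$pattern/; 1 };
    return $ok ? 1 : 0;
}

for my $pattern (shape_loose("a(b"), shape_escaped("a(b")) {
    printf "%-14s %s\n", $pattern, compiles($pattern) ? "compiles" : "does not compile";
}
```

With the sample "a(b", the loose version leaves a bare `(` in the output, which is not a valid regex, while the escaped version emits `\(`.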

        After all, ldkjdlkjf2098kklls probably isn't [[:alpha:]]+\d+[[:alpha:]]+
        I did play around once with a small script that did these things, when I was bored, and no, it probably isn't. But what I did was take several strings and try to derive a common expression out of them: first looking for similarities, like a sequence of numbers in the middle, or whitespace at the end, or whatever, and then building sub-regexes from the parts. I think starting with splitting on non-words and such gave so-so results for things like email addresses.

        Of course, I never got any really usable results, but it was a fun exercise. :) What I wanted to say was that to decide which it should be, you need a decent sample of several strings that should all match. Then it is sometimes possible to get something to build upon. Maybe. :)
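One way the multi-sample idea might look, as a rough sketch and not the script described above: split each sample into runs, require that all samples have the same sequence of classes, and widen each run length into a {min,max} range. The names `runs` and `common_regex`, the choice of classes, and the sample addresses are all assumptions.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Split a sample into [class, length] pairs; any character outside the
# three classes becomes its own escaped literal of length 1.
sub runs {
    my ($str) = @_;
    my @runs;
    while ($str =~ /(\d+)|([[:alpha:]]+)|(\s+)|(.)/gs) {
        push @runs, defined $1 ? ['\d',          length($1)]
                  : defined $2 ? ['[[:alpha:]]', length($2)]
                  : defined $3 ? ['\s',          length($3)]
                  :              [quotemeta($4), 1];
    }
    return @runs;
}

# Build one pattern for all samples, or return nothing if their
# class sequences differ.
sub common_regex {
    return unless @_;
    my @samples = map { [ runs($_) ] } @_;
    my $first   = $samples[0];
    for my $sample (@samples) {
        return if @$sample != @$first;
        for my $i (0 .. $#$first) {
            return if $sample->[$i][0] ne $first->[$i][0];
        }
    }
    my $regex = '';
    for my $i (0 .. $#$first) {
        my @lengths = map { $_->[$i][1] } @samples;
        my ($lo, $hi) = (min(@lengths), max(@lengths));
        $regex .= $first->[$i][0]
                . ($lo == $hi ? '{' . $lo . '}' : '{' . $lo . ',' . $hi . '}');
    }
    return $regex;
}

my $regex = common_regex('bob@example.com', 'alice@site.org');
print "$regex\n";
print "carol\@domain.net matches\n" if 'carol@domain.net' =~ /\A$regex\z/;
```

Even this only handles samples whose structure lines up exactly; anything looser needs further assumptions about which parts may vary.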


        You have moved into a dark place.
        It is pitch black. You are likely to be eaten by a grue.
