PerlMonks  

(jeffa) 5Re: More Variable length regex issues

by jeffa (Bishop)
on Jun 09, 2003 at 04:36 UTC


in reply to Re: (jeffa) 3Re: More Variable length regex issues
in thread More Variable length regex issues

You keep on using that word 'split' ... i do not think you know what it means. ;) Consider the following:
my $str = 'foo,bar,moo,cow';
my @value = $str =~ m/(\w+)\,?/g;
print "@value\n";
@value = split(',',$str);
print "@value\n";
They both achieve the same results, and guess which one is easier to understand?

You say you have non-repeatable fields; how does using a regex make this easier than split? What do you think split uses to split? A regex! Besides, oro has a family of split functions. You could always do a series of splits if multiple delimiters are used:

my $str = 'a,b,c:d,e,f:g,h,i';
my @part = split(':',$str);
foreach my $part (@part) {
    my @subpart = split(',',$part);
    print "@subpart\n";
}
The split functions found in the org.apache.oro package can do this, you just have to jump through more hoops. ;) Not that it matters, but one of my beefs about Java is not being able to process lists easily like you can in Perl:
print $_,$/ for map split(',',$_), split(':', $str);
Best of luck.

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

Re: (jeffa) 5Re: More Variable length regex issues
by dextius (Monk) on Jun 09, 2003 at 05:14 UTC
    I am not explaining this issue clearly. Your examples do not quite address my criteria because I have not fully explained my problem; I apologize.

    I have a string of characters that use the same delimiter. Some of the fields are mandatory, some are optional, and some may be repeated infinitely. I want to extract those values AND validate the fields all at once within a single regular expression. I want these values to be available to me afterward. A simple example..

    use Data::Dumper;
    my $foo = "one,123,a s d f,a,b,c,d,e,f,g,h";
    my @bar = $foo =~ /^([a-z]{3}),([0-9]{3}),([a-z\s]{1,7}),(?:([a-z]),|([a-z]$)){1,}/;
    print Dumper(\@bar);

    Consider everything after the 3rd element to repeat, possibly to infinity, but we need to make sure they are single characters, otherwise I want the entire regex to fail immediately.

    Again, thank you for your time. You have spent more than enough of it working with me, and I very much appreciate it.

      Whoa, whoa, whoa there. Why do you have the (arbitrary) requirement that everything has to be done in the regex? IMHO, long regexen are what lead to the stereotype of perl looking like line-noise. I would suggest using split, and then validating the elements that you need to validate in separate statements. If you'd like, you can gather up your validation and pack it in to a subroutine. Just try to think of the poor bastard who has to come behind you and maintain the code.
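
      A minimal sketch of the split-then-validate idea, not a definitive implementation. The subroutine name parse_record and its return convention (empty list on failure) are hypothetical; the field rules are taken from the regex posted above, on the assumption that the three leading fields are mandatory and at least one single-letter field must follow.

```perl
use strict;
use warnings;

# Hypothetical helper: split first, then validate each piece on its own.
# Returns the list of fields on success, or an empty list on failure.
sub parse_record {
    my ($line) = @_;

    # Limit of 4: three leading fields plus one chunk holding the rest.
    my ($name, $code, $text, $rest) = split /,/, $line, 4;
    return unless defined $rest;

    return unless $name =~ /^[a-z]{3}\z/;       # three lowercase letters
    return unless $code =~ /^[0-9]{3}\z/;       # three digits
    return unless $text =~ /^[a-z\s]{1,7}\z/;   # 1 to 7 letters or spaces

    # Limit of -1 keeps trailing empty fields, so "a,b," is rejected below.
    my @tail = split /,/, $rest, -1;
    return unless @tail;                        # at least one repeated field
    for my $item (@tail) {
        return unless $item =~ /^[a-z]\z/;      # single lowercase letter
    }

    return ($name, $code, $text, @tail);
}

my @ok  = parse_record('one,123,a s d f,a,b,c,d,e,f,g,h');
my @bad = parse_record('one,123,a s d f,a,b,c,bad,e,f,g,h');
print scalar(@ok), " fields parsed\n";
print "second record rejected\n" unless @bad;
```

      Each rule sits on its own line, so whoever maintains this later can change one field's check without re-reading the others.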

      Also, minor nit, "infinite" ne "arbitrary". If there were an infinite number of fields, not only would you have run out of disk space by now, but you couldn't do anything with it, since you couldn't hold it in memory. ;) Arbitrary means "as much as you want", whereas infinite means "without end".

      thor

        Why do we have to apologise to those that don't take the time to see beyond the terse but immensely powerful notation that makes up the language 'perl 5 regex'?

        My Dad would never have managed to wrap his brain around [(x-h)^2]/a^2 + [(y-k)^2]/b^2 = 1, although that didn't stop him from accurately (within obvious tolerances) cutting an oval from a piece of 1/2 inch ply using nothing but a piece of string, two nails and a piece of chalk.

        To him, the whole concept of algebraic notation was an anathema, but it's doubtful if there are many people reading this for whom that formula isn't eminently readable. The difference? Education. My father left school aged 12 and started his 10-year apprenticeship as a carpenter aged 14. He never had the opportunity to learn algebra.

        The following short extract from here

        When 235U captures a neutron, the resulting 236U nucleus emits γ-rays as it deexcites to the ground state about 15% of the time, and undergoes fission about 85%. The fission process is somewhat analogous to the oscillations of a liquid drop. Using the liquid drop model Bohr and Wheeler calculated the critical energy Ec needed by the 236U nucleus to undergo fission. For this nucleus, the critical energy is 5.3 MeV, which is less than the 6.4 MeV of excitation energy produced when 235U captures a neutron. The capture of a neutron by 235U therefore produces an excited state of 236U that has more than enough energy to break apart. On the other hand, the critical energy for fission of the 239U nucleus is 5.9 MeV. The capture of a neutron by a 238U nucleus produces an excitation energy of only 5.2 MeV. Therefore, when a neutron is captured by 238U to form 239U, the excitation energy is not great enough for fission to occur. In this case, the excited 239U nucleus deexcites by γ-emission, and then decays to 239Np by β-decay, and then again to 239Pu by β-decay.

        A fissioning nucleus can break into 2 medium-mass fragments in many different ways. Depending on the particular reaction, 1, 2 or 3 neutrons may be emitted. The average number of neutrons emitted in the fission of 235U is about 2.5.

        describes (roughly) the same thing as n + 235U --> 141Ba + 92Kr + 3n

        Now they both mean precious little to me, but to those that live and work in the field of nuclear physics, I'm pretty sure that the latter concise form is infinitely less unwieldy and more practical to work with in correspondence, notes, reports and papers as well as in aggregate works in which this formula is only a part.

        In the same way, regexes are simply a short-hand notation that allow the capturing of complex aggregate programming concepts in a concise, wieldy fashion.

        For people to dismiss regexes, much less the whole of perl(*), as "line noise" because they haven't bothered to take the time to understand them, and the power they represent, requires no apology from us, but from them.

        (*) As one who did exactly this, professionally, twice, I hereby apologise to the perl community at large, and Mr. Wall in particular for this heinous crime!


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


      Do you still think that you have to perform this task with one regular expression? (And a horribly unreadable, broken one at that.) split is perfectly capable of stopping after it finds the, say, 3rd element. Then you can do something different with the rest:
      use Data::Dumper;
      my $foo = 'one,123,a s d f,a,b,c,bad,e,f,g,h';
      my @first = split(',',$foo,4);
      my @rest = split(',',pop @first);
      print Dumper \@first, \@rest;
      for (0..$#rest) {
          die "index $_ is bad: '$rest[$_]'" if length($rest[$_]) != 1;
      }
      Think in chunks. Don't try to swallow the whole pill at once.
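
      One reason the single-regex route is hard to get right: in Perl, a capture group under a quantifier keeps only what its last iteration matched, so a repeated group cannot hand back every repeated field. A small sketch of that behaviour follows, next to a "validate the shape, then split" alternative. The variable names and the sample string are made up for illustration.

```perl
use strict;
use warnings;

my $tail = 'a,b,c,d';

# The capture group is overwritten on every pass of the quantifier,
# so only the final repetition survives.
my @last_only = $tail =~ /^(?:([a-z]),?)+$/;

# Validate the overall shape once, then let split collect every item.
my @all = $tail =~ /^[a-z](?:,[a-z])*\z/ ? split(/,/, $tail) : ();

print "quantified group: @last_only\n";   # prints "quantified group: d"
print "validate + split: @all\n";         # prints "validate + split: a b c d"
```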

      jeffa

      
