samizdat has asked for the
wisdom of the Perl Monks concerning the following question:
Hi, all 
Esteemed codegrokkers, I need to create a regex which will parse either xyz = 'some long exp' or xyz = some long exp (with no more than one space separating parts) from a long string containing multiple examples of this.
TIA for your assistance! :D
UPDATE: example as requested:
drsubc = agauss(0, 1, 3) delm1 = '0 + 0.045u*distm1' delm2 = '0 + 0.07u*distm2' delm3 = '0 + 0.07u*distm3' delm4 = '0 + 0.07u*distm4' delmt = '0 + 0.07u*distmt' delml = '0.16u + 0.43u*distml' delam = '0.32u + 0.86u*distam' dele1 = '0 + 0.25u*diste1' dele2 = '0 + 0.25u*diste2' delma = '0.16u + 0.6u*distma' pmsxt = 'npmsxt + 12.5u*dpmsxt' tih = 0.35u capct = '0.50u + 0.13u*xdcapct' capcti = '0.55u + 0.13u*xdcapct' m1t = '0.41u + 0.05u*xdm1t' m1ti = '0.36u + 0.05u*xdm1t' m2t = '0.48u + 0.057u*dm2t' m3t = '0.48u + 0.057u*dm3t' m4t = '0.48u + 0.057u*dm4t' mtt = '0.48u + 0.057u*dmtt' qtt = '0.242u + 0.0202u*dqtt' htt = '0.242u + 0.0202u*dhtt' mlt = '2.0u + 0.2u*dmlt' amt = '4.0u + 0.4u*damt' e1t = '3.0u + 0.5u*de1t' e2t = '4.0u + 0.5u*xde1mat' mat = '4.0u + 0.4u*dmat' m1m2t = '0.35u + 0.05u*dm1m2t'
The goal is to separate out all the parameters (or function definitions) and their value expressions.
Re: how to find what's not there with a regex?
by pbeckingham (Parson) on Aug 24, 2005 at 13:42 UTC

#! /usr/bin/perl
use strict;
use warnings;
while (<DATA>)
{
    chomp;
    print "[$_]\n" for /\s*([^=]+\s+=\s+'[^']+'|\S+\s+=(?:\s+[^=]+)+(?:(?=\s+\S+\s+=)|$))/msg;
}
__DATA__
xyz = 'some long exp' xyz = some long exp with no more than one space separating parts a = b c d = e
drsubc = agauss(0, 1, 3) delm1 = '0 + 0.045u*distm1' delm2 = '0 + 0.07u*distm2' delm3 = '0 + 0.07u*distm3' delm4 = '0 + 0.07u*distm4' delmt = '0 + 0.07u*distmt' delml = '0.16u + 0.43u*distml' delam = '0.32u + 0.86u*distam' dele1 = '0 + 0.25u*diste1' dele2 = '0 + 0.25u*diste2' delma = '0.16u + 0.6u*distma' pmsxt = 'npmsxt + 12.5u*dpmsxt' tih = 0.35u capct = '0.50u + 0.13u*xdcapct' capcti = '0.55u + 0.13u*xdcapct' m1t = '0.41u + 0.05u*xdm1t' m1ti = '0.36u + 0.05u*xdm1t' m2t = '0.48u + 0.057u*dm2t' m3t = '0.48u + 0.057u*dm3t' m4t = '0.48u + 0.057u*dm4t' mtt = '0.48u + 0.057u*dmtt' qtt = '0.242u + 0.0202u*dqtt' htt = '0.242u + 0.0202u*dhtt' mlt = '2.0u + 0.2u*dmlt' amt = '4.0u + 0.4u*damt' e1t = '3.0u + 0.5u*de1t' e2t = '4.0u + 0.5u*xde1mat' mat = '4.0u + 0.4u*dmat' m1m2t = '0.35u + 0.05u*dm1m2t'
Generates the output:
[xyz = 'some long exp']
[xyz = some long exp with no more than one space separating parts]
[a = b c]
[d = e]
[drsubc = agauss(0, 1, 3) ]
[delm1 = '0 + 0.045u*distm1']
[delm2 = '0 + 0.07u*distm2']
[delm3 = '0 + 0.07u*distm3']
[delm4 = '0 + 0.07u*distm4']
[delmt = '0 + 0.07u*distmt']
[delml = '0.16u + 0.43u*distml']
[delam = '0.32u + 0.86u*distam']
[dele1 = '0 + 0.25u*diste1']
[dele2 = '0 + 0.25u*diste2']
[delma = '0.16u + 0.6u*distma']
[pmsxt = 'npmsxt + 12.5u*dpmsxt']
[tih = 0.35u ]
[capct = '0.50u + 0.13u*xdcapct']
[capcti = '0.55u + 0.13u*xdcapct']
[m1t = '0.41u + 0.05u*xdm1t']
[m1ti = '0.36u + 0.05u*xdm1t']
[m2t = '0.48u + 0.057u*dm2t']
[m3t = '0.48u + 0.057u*dm3t']
[m4t = '0.48u + 0.057u*dm4t']
[mtt = '0.48u + 0.057u*dmtt']
[qtt = '0.242u + 0.0202u*dqtt']
[htt = '0.242u + 0.0202u*dhtt']
[mlt = '2.0u + 0.2u*dmlt']
[amt = '4.0u + 0.4u*damt']
[e1t = '3.0u + 0.5u*de1t']
[e2t = '4.0u + 0.5u*xde1mat']
[mat = '4.0u + 0.4u*dmat']
[m1m2t = '0.35u + 0.05u*dm1m2t']
pbeckingham  typist, perishable vertebrate.
Almost. I think you're on the right track, because your solution's caught all but drsubc and tih correctly. Let me study what you've done, and thanks very much!
Re: how to find what's not there with a regex?
by Eimi Metamorphoumai (Deacon) on Aug 24, 2005 at 13:29 UTC

Please read How (Not) To Ask A Question. In particular, could you please post some sample data, along with exactly what parts you're trying to extract, and what the criteria are? Reread your question from the point of view of someone who doesn't already know what you want, and I think you'll see that you're leaving out pretty much everything we need to know.
Update: Now there's some data, but still no real specification of how your parameters are separated or what's really going on here. It looks like this may do what you want, but if not you'll have to step back for a moment and think about what you're doing.
#!/usr/bin/perl -l
use strict;
use warnings;
use Data::Dumper;
my %variables;
undef $/;
$_ = <DATA>;
while (s/\s*(\w+)\s*=\s*([^=]+)\s*\z//){
    $variables{$1} = $2;
}
print Dumper(\%variables);
__DATA__
drsubc = agauss(0, 1, 3) delm1 = '0 + 0.045u*distm1' delm2 = '0 + 0.07u*distm2' delm3 = '0 + 0.07u*distm3' delm4 = '0 + 0.07u*distm4' delmt = '0 + 0.07u*distmt' delml = '0.16u + 0.43u*distml' delam = '0.32u + 0.86u*distam' dele1 = '0 + 0.25u*diste1' dele2 = '0 + 0.25u*diste2' delma = '0.16u + 0.6u*distma' pmsxt = 'npmsxt + 12.5u*dpmsxt' tih = 0.35u capct = '0.50u + 0.13u*xdcapct' capcti = '0.55u + 0.13u*xdcapct' m1t = '0.41u + 0.05u*xdm1t' m1ti = '0.36u + 0.05u*xdm1t' m2t = '0.48u + 0.057u*dm2t' m3t = '0.48u + 0.057u*dm3t' m4t = '0.48u + 0.057u*dm4t' mtt = '0.48u + 0.057u*dmtt' qtt = '0.242u + 0.0202u*dqtt' htt = '0.242u + 0.0202u*dhtt' mlt = '2.0u + 0.2u*dmlt' amt = '4.0u + 0.4u*damt' e1t = '3.0u + 0.5u*de1t' e2t = '4.0u + 0.5u*xde1mat' mat = '4.0u + 0.4u*dmat' m1m2t = '0.35u + 0.05u*dm1m2t'
Re: how to find what's not there with a regex?
by inman (Curate) on Aug 24, 2005 at 15:43 UTC

Reversing the initial input makes the regex easier. The resulting array needs reversing and every item in the array needs reversing.
my $data = reverse <DATA>;
my @answers = map {scalar reverse $_}
reverse $data =~ /(.*?\s*?=\s.*?\w+)/g;
print "$_\n" foreach (@answers);
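A small worked sketch of why the reversal helps, using a made-up sample line rather than the thread's data file: once the string is reversed, each value comes before its "= name", so a lazy match can stop at the first name it meets instead of having to guess where a value ends. The sample line and the final trimming step are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line (not from the thread's data file).
my $line = "tih = 0.35u capct = '0.50u + 0.13u*xdcapct' drsubc = agauss(0, 1, 3)";

# Reverse the whole string: every value now precedes its "= name".
my $rev = reverse $line;

# Lazily take a reversed value, then "=", one whitespace character and the
# reversed name; un-reverse each match and restore the original order.
my @pairs = map { scalar reverse $_ }
            reverse $rev =~ /(.*?\s*?=\s.*?\w+)/g;

# Each match keeps the whitespace that separated it from its neighbour,
# so trim it off the end.
s/\s+\z// for @pairs;

print "$_\n" for @pairs;
```

The name is matched with a greedy \w+, which is safe in the reversed string because the name is followed by whitespace (or the end of the string) rather than by more of the value.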
That's brilliant. You're absolutely right, that makes it much simpler!!!
Re: how to find what's not there with a regex?
by ikegami (Pope) on Aug 24, 2005 at 13:50 UTC

This works with your data:
while (<>) {
    chomp;
    while (
        /
            (\w+)         # Name ($1)
            \s*           # Spaces (optional)
            =             # Equal sign
            \s*           # Spaces (optional)
            (
                '         # Quote
                [^']*     # Non-quotes
                '         # Quote
            |             # or
                [^'\s]+   # Non-spaces, non-quotes
            )
        /xg
    ) {
        my ($name, $expr) = ($1, $2);
        $expr = substr($expr, 1, -1)
            if substr($expr, 0, 1) eq "'";
        print("var: $name, expr: $expr\n");
    }
}
Updated to catch unquoted expressions.
Output:
var: drsubc, expr: agauss(0, <-- Doesn't work :(
var: delm1, expr: 0 + 0.045u*distm1
var: delm2, expr: 0 + 0.07u*distm2
var: delm3, expr: 0 + 0.07u*distm3
var: delm4, expr: 0 + 0.07u*distm4
var: delmt, expr: 0 + 0.07u*distmt
var: delml, expr: 0.16u + 0.43u*distml
var: delam, expr: 0.32u + 0.86u*distam
var: dele1, expr: 0 + 0.25u*diste1
var: dele2, expr: 0 + 0.25u*diste2
var: delma, expr: 0.16u + 0.6u*distma
var: pmsxt, expr: npmsxt + 12.5u*dpmsxt
var: tih, expr: 0.35u <-- Works :)
var: capct, expr: 0.50u + 0.13u*xdcapct
var: capcti, expr: 0.55u + 0.13u*xdcapct
var: m1t, expr: 0.41u + 0.05u*xdm1t
var: m1ti, expr: 0.36u + 0.05u*xdm1t
var: m2t, expr: 0.48u + 0.057u*dm2t
var: m3t, expr: 0.48u + 0.057u*dm3t
var: m4t, expr: 0.48u + 0.057u*dm4t
var: mtt, expr: 0.48u + 0.057u*dmtt
var: qtt, expr: 0.242u + 0.0202u*dqtt
var: htt, expr: 0.242u + 0.0202u*dhtt
var: mlt, expr: 2.0u + 0.2u*dmlt
var: amt, expr: 4.0u + 0.4u*damt
var: e1t, expr: 3.0u + 0.5u*de1t
var: e2t, expr: 4.0u + 0.5u*xde1mat
var: mat, expr: 4.0u + 0.4u*dmat
var: m1m2t, expr: 0.35u + 0.05u*dm1m2t
That works with the quoted variant, ikegami, but not the unquoted variant, like the first function. How do I say 'anything including spaces up to the first occurrence of more than one space in a row'?

while (<>) {
    chomp;
    while (
        /
            (\w+)                      # An identifier.
            \s* = \s*                  # Equal with opt spaces.
            (
                (?:
                    (?! \s+ \w+ \s* = )  # Stop if we see the next formula.
                    .                    # A character.
                )+
            )
        /xg
    ) {
        my ($name, $expr) = ($1, $2);
        $expr = substr($expr, 1, -1)
            if substr($expr, 0, 1) eq "'";
        print("var: $name, expr: $expr\n");
    }
}
Output:
var: drsubc, expr: agauss(0, 1, 3) <-- Works
var: delm1, expr: 0 + 0.045u*distm1
var: delm2, expr: 0 + 0.07u*distm2
var: delm3, expr: 0 + 0.07u*distm3
var: delm4, expr: 0 + 0.07u*distm4
var: delmt, expr: 0 + 0.07u*distmt
var: delml, expr: 0.16u + 0.43u*distml
var: delam, expr: 0.32u + 0.86u*distam
var: dele1, expr: 0 + 0.25u*diste1
var: dele2, expr: 0 + 0.25u*diste2
var: delma, expr: 0.16u + 0.6u*distma
var: pmsxt, expr: npmsxt + 12.5u*dpmsxt
var: tih, expr: 0.35u <-- Works
var: capct, expr: 0.50u + 0.13u*xdcapct
var: capcti, expr: 0.55u + 0.13u*xdcapct
var: m1t, expr: 0.41u + 0.05u*xdm1t
var: m1ti, expr: 0.36u + 0.05u*xdm1t
var: m2t, expr: 0.48u + 0.057u*dm2t
var: m3t, expr: 0.48u + 0.057u*dm3t
var: m4t, expr: 0.48u + 0.057u*dm4t
var: mtt, expr: 0.48u + 0.057u*dmtt
var: qtt, expr: 0.242u + 0.0202u*dqtt
var: htt, expr: 0.242u + 0.0202u*dhtt
var: mlt, expr: 2.0u + 0.2u*dmlt
var: amt, expr: 4.0u + 0.4u*damt
var: e1t, expr: 3.0u + 0.5u*de1t
var: e2t, expr: 4.0u + 0.5u*xde1mat
var: mat, expr: 4.0u + 0.4u*dmat
var: m1m2t, expr: 0.35u + 0.05u*dm1m2t
How do I say 'anything including spaces up to the first occurrence of more than one space in a row'?
A literal translation (untested) would be
/(?>.*?(?=  ))/s.
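A minimal sketch of the same "stop at the first run of two spaces" idea, tried on a made-up line. It is a variant rather than the exact pattern above: it spells the two spaces as [ ]{2} so they survive the /x flag, and it also accepts the end of the string so the last assignment is not lost. The sample text and the %value_of hash are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: two spaces between assignments, one space inside values.
my $text = "drsubc = agauss(0, 1, 3)  tih = 0.35u  mlt = '2.0u + 0.2u*dmlt'";

my %value_of;
while ($text =~ /
        (\w+) \s* = \s*         # name and equal sign
        (.*?)                   # value, as short as possible
        (?= [ ]{2} | \z )       # ...up to two spaces in a row, or the end
    /xsg) {
    $value_of{$1} = $2;
}

print "$_ => $value_of{$_}\n" for sort keys %value_of;
```

Quoted values keep their quotes here; they can be stripped afterwards if the whole value is wrapped in them.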

I fixed it while you were replying :)
Re: how to find what's not there with a regex?
by BrowserUk (Pope) on Aug 24, 2005 at 13:57 UTC

#! perl -slw
use strict;
while( <DATA> ) {
    print "$1 : ", $2||$3 while m[
        (\w+)               ## the name
        \s+=\s+             ## the =
        (?:                 ## Either
            ' ( [^']+ ) '   ## all the non-quotes between quotes
        |                   ## or
            (.*?)           ## the minimum
        )
        \s{2,}              ## absorb the two or more spaces
    ]gx;
}
=results
P:\test>junk
drsubc : agauss(0, 1, 3)
delm1 : 0 + 0.045u*distm1
delm2 : 0 + 0.07u*distm2
delm3 : 0 + 0.07u*distm3
delm4 : 0 + 0.07u*distm4
delmt : 0 + 0.07u*distmt
delml : 0.16u + 0.43u*distml
delam : 0.32u + 0.86u*distam
dele1 : 0 + 0.25u*diste1
dele2 : 0 + 0.25u*diste2
delma : 0.16u + 0.6u*distma
pmsxt : npmsxt + 12.5u*dpmsxt
tih : 0.35u
capct : 0.50u + 0.13u*xdcapct
capcti : 0.55u + 0.13u*xdcapct
m1t : 0.41u + 0.05u*xdm1t
m1ti : 0.36u + 0.05u*xdm1t
m2t : 0.48u + 0.057u*dm2t
m3t : 0.48u + 0.057u*dm3t
m4t : 0.48u + 0.057u*dm4t
mtt : 0.48u + 0.057u*dmtt
qtt : 0.242u + 0.0202u*dqtt
htt : 0.242u + 0.0202u*dhtt
mlt : 2.0u + 0.2u*dmlt
amt : 4.0u + 0.4u*damt
e1t : 3.0u + 0.5u*de1t
e2t : 4.0u + 0.5u*xde1mat
mat : 4.0u + 0.4u*dmat
m1m2t : 0.35u + 0.05u*dm1m2t
=cut
__DATA__
drsubc = agauss(0, 1, 3) delm1 = '0 + 0.045u*distm1' delm2 = '0 + 0.07u*distm2' delm3 = '0 + 0.07u*distm3' delm4 = '0 + 0.07u*distm4' delmt = '0 + 0.07u*distmt' delml = '0.16u + 0.43u*distml' delam = '0.32u + 0.86u*distam' dele1 = '0 + 0.25u*diste1' dele2 = '0 + 0.25u*diste2' delma = '0.16u + 0.6u*distma' pmsxt = 'npmsxt + 12.5u*dpmsxt' tih = 0.35u capct = '0.50u + 0.13u*xdcapct' capcti = '0.55u + 0.13u*xdcapct' m1t = '0.41u + 0.05u*xdm1t' m1ti = '0.36u + 0.05u*xdm1t' m2t = '0.48u + 0.057u*dm2t' m3t = '0.48u + 0.057u*dm3t' m4t = '0.48u + 0.057u*dm4t' mtt = '0.48u + 0.057u*dmtt' qtt = '0.242u + 0.0202u*dqtt' htt = '0.242u + 0.0202u*dhtt' mlt = '2.0u + 0.2u*dmlt' amt = '4.0u + 0.4u*damt' e1t = '3.0u + 0.5u*de1t' e2t = '4.0u + 0.5u*xde1mat' mat = '4.0u + 0.4u*dmat' m1m2t = '0.35u + 0.05u*dm1m2t'
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.  Rule 1 has a caveat!  Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
dnwrs = agauss('cnr_res/3',1,3)
An even loonier case... thanks, all, for the help. I think I'm going to have to go back to the original multiline source and see if these are more identifiable there.

#! perl -slw
use strict;
while( <DATA> ) {
    m[(\w+)\s+=\s+'?(.+)'?]
        and print "$1 : $2"
        for split /\s{2,}(?=\w+\s+=)/, $_;
}
__END__
P:\test>junk
drsubc : agauss(0, 1, 3)
delm1 : 0 + 0.045u*distm1'
dnwrs : agauss('cnr_res/3',1,3)
delm2 : 0 + 0.07u*distm2'
delm3 : 0 + 0.07u*distm3'
delm4 : 0 + 0.07u*distm4'
delmt : 0 + 0.07u*distmt'
delml : 0.16u + 0.43u*distml'
delam : 0.32u + 0.86u*distam'
dele1 : 0 + 0.25u*diste1'
dele2 : 0 + 0.25u*diste2'
delma : 0.16u + 0.6u*distma'
pmsxt : npmsxt + 12.5u*dpmsxt'
tih : 0.35u
capct : 0.50u + 0.13u*xdcapct'
capcti : 0.55u + 0.13u*xdcapct'
m1t : 0.41u + 0.05u*xdm1t'
m1ti : 0.36u + 0.05u*xdm1t'
m2t : 0.48u + 0.057u*dm2t'
m3t : 0.48u + 0.057u*dm3t'
m4t : 0.48u + 0.057u*dm4t'
mtt : 0.48u + 0.057u*dmtt'
qtt : 0.242u + 0.0202u*dqtt'
htt : 0.242u + 0.0202u*dhtt'
mlt : 2.0u + 0.2u*dmlt'
amt : 4.0u + 0.4u*damt'
e1t : 3.0u + 0.5u*de1t'
e2t : 4.0u + 0.5u*xde1mat'
mat : 4.0u + 0.4u*dmat'
m1m2t : 0.35u + 0.05u*dm1m2t'
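The output above still carries the closing quote on the quoted values, because the greedy (.+) swallows it before the optional '? gets a chance. One possible follow-up, sketched under the assumption that assignments are separated by two or more spaces, keeps the same split and strips quotes only when they wrap the whole expression, so the quotes inside agauss('cnr_res/3',1,3) are left alone. The sample line and the @parsed array are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input with two spaces between assignments.
my $line = "drsubc = agauss(0, 1, 3)  delm1 = '0 + 0.045u*distm1'  dnwrs = agauss('cnr_res/3',1,3)";

my @parsed;
for my $chunk (split /\s{2,}(?=\w+\s+=)/, $line) {
    # Name, equal sign, then the rest of the chunk minus trailing whitespace.
    my ($name, $expr) = $chunk =~ /^(\w+)\s+=\s+(.+?)\s*$/ or next;

    # Strip the quotes only when one pair wraps the entire expression.
    $expr =~ s/^'([^']*)'$/$1/;

    push @parsed, [$name, $expr];
}

print "$_->[0] : $_->[1]\n" for @parsed;
```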
Re: how to find what's not there with a regex?
by davidrw (Prior) on Aug 24, 2005 at 13:53 UTC

Maybe something like this (you could do %matches instead of @matches if desired, as well):
my @matches = $input =~ m/\b(\w+)\s+=( '.*?'|( \S+)+)/sg;
Match the LHS and then the equals sign, and then either a single-quoted string or a sequence of sets each made of a single space followed by a word.
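A sketch of the %matches variant, adapted rather than copied from the one-liner above: filling a hash straight from the match list only works if each match returns exactly two captures, so the inner group is made non-capturing and the separating space is moved outside the value. The sample $input is hypothetical and assumes two spaces between assignments.

```perl
use strict;
use warnings;

# Hypothetical input: single spaces inside values, two spaces between assignments.
my $input = "tih = 0.35u  drsubc = agauss(0, 1, 3)  mlt = '2.0u + 0.2u*dmlt'";

# Two captures per match (name, value), so the flat list maps onto a hash.
# The value is either a single-quoted string or words joined by single spaces.
my %matches = $input =~ m/\b(\w+)\s+= ('.*?'|\S+(?: \S+)*)/sg;

print "$_ => $matches{$_}\n" for sort keys %matches;
```

Because (?: \S+)* insists on exactly one space before each further word, the unquoted alternative stops by itself at the first double space.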
QM

