Re: Determining what part of a regex matched.

by blakem (Monsignor)
on Mar 07, 2003 at 10:25 UTC

in reply to Determining what part of a regex matched.

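Here is how I would tokenize it. Note that \d is a subset of \w, so any tokenizer that uses both is probably broken: whichever one is tried first decides how digits are classified, and if \w+ comes first, \d+ never gets a chance to match. That is why the word branch below uses [a-zA-Z_] (ASCII \w without the digits) instead of \w.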
#!/usr/bin/perl -wT
use strict;

my $text = 'The world is foo 2!';
my (@words, @numbers, @spaces, @others);

# \G anchors each match at pos($text); /gc advances pos() on success
# and leaves it alone on failure, so the next branch can try the same spot.
while ((pos($text) || 0) != length($text)) {
    if ($text =~ /\G([a-zA-Z_]+)/gc) {
        push @words, $1;    # or call whatever handler you want
    } elsif ($text =~ /\G(\d+)/gc) {
        push @numbers, $1;
    } elsif ($text =~ /\G(\s+)/gc) {
        push @spaces, $1;
    } elsif ($text =~ /\G([^\w\s]+)/gc) {
        push @others, $1;
    } else {
        # Nothing matched, so pos() did not move; stop instead of looping forever.
        warn "tokenizer is broken\n";
        last;
    }
}

print "W: @words\n";
print "N: @numbers\n";
print "S: @spaces\n";
print "O: @others\n";

__END__
W: The world is foo
N: 2
S:
O: !
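To see the \d-versus-\w problem concretely, here is a deliberately broken sketch that tries \w+ before \d+. The input string and variable names are made up for illustration; only the ordering of the branches matters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Deliberately broken ordering: \w+ is tried before \d+.
# Because \d is a subset of \w, the \w+ branch grabs "2" and "bar42",
# so the \d+ branch never fires.
my $text = 'foo 2 bar42';
my (@words, @numbers);

while ((pos($text) || 0) != length($text)) {
    if    ($text =~ /\G(\w+)/gc) { push @words,   $1 }
    elsif ($text =~ /\G(\d+)/gc) { push @numbers, $1 }   # never reached for digits
    elsif ($text =~ /\G\s+/gc)   { }                     # skip whitespace
    else  { warn "no token matched\n"; last }
}

print "W: @words\n";      # W: foo 2 bar42
print "N: @numbers\n";    # N: (empty)
```

Swapping the first two branches would let "2" land in @numbers, but "bar42" would still be a single word token, which is why a tokenizer like this should use a word class that excludes digits.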

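A different way to answer the original question (which part of a regex matched) is a single alternation with one named group per token class. After each successful match, the %+ hash holds only the named groups that actually captured, so its one key names the branch that matched. This is an alternative, not the approach above: named captures need Perl 5.10 or later, which postdates this note, and the group names and the %tokens hash are illustrative choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10+

my $text = 'The world is foo 2!';
my %tokens;    # token type => list of matched strings (illustrative)

# One regex, one named group per token class; order still matters.
my $token_re = qr/
    \G (?:
        (?<word>   [a-zA-Z_]+ )
      | (?<number> \d+        )
      | (?<space>  \s+        )
      | (?<other>  [^\w\s]+   )
    )
/x;

while ((pos($text) || 0) != length($text)) {
    unless ($text =~ /$token_re/gc) {
        warn "tokenizer is broken\n";
        last;
    }
    # %+ only lists groups that captured, so there is exactly one key here.
    my ($type) = keys %+;
    push @{ $tokens{$type} }, $+{$type};
}

for my $type (qw(word number space other)) {
    my @list = @{ $tokens{$type} || [] };
    print "$type: @list\n";
}
```

The per-branch handler from the original loop becomes a lookup on $type here, so a dispatch hash keyed by group name would slot in where the push is.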

