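# Benchmark two regexes that split HTML/XML into tokens: opening tags,
# closing tags, self-closing tags and words. rex1 builds the word pattern
# from bracketed character classes, rex2 from alternations of single-character
# atoms; both should find the same tokens.
use strict;
use warnings;
use utf8;    # decode this source as UTF-8 so CJK text in get_text() is matched as characters
use Benchmark qw(cmpthese timethese);

# Word characters: Unicode letters (Lu, Ll, Lt) and numbers (Nd, Nl, No),
# CJK ideographs U+4E00..U+9FA5, U+3007 and Hangzhou numerals U+3021..U+3029.
my $rex1 = qr!
      (<[^/]([^>]*[^/>])?>)    # opening tag
    | (</[^>]*>)               # closing tag
    | (<[^>]*/>)               # self-closing tag
    | ( [\p{Lu}\p{Ll}\p{Lt}\p{Nd}\p{Nl}\p{No}\x{4e00}-\x{9fa5}\x{3007}\x{3021}-\x{3029}]
        (?: [-\p{Lu}\p{Ll}\p{Lt}\p{Nd}\p{Nl}\p{No}._:''\x{4e00}-\x{9fa5}\x{3007}\x{3021}-\x{3029}]*
            [\p{Lu}\p{Ll}\p{Lt}\p{Nd}\p{Nl}\p{No}\x{4e00}-\x{9fa5}\x{3007}\x{3021}-\x{3029}] )? )    # word
!x;

my $rex2 = qr!
      (<[^/]([^>]*[^/>])?>)    # opening tag
    | (</[^>]*>)               # closing tag
    | (<[^>]*/>)               # self-closing tag
    | ( (\p{Lu}|\p{Ll}|\p{Lt}|\p{Nd}|\p{Nl}|\p{No}|[\x{4e00}-\x{9fa5}]|\x{3007}|[\x{3021}-\x{3029}])
        ( (\p{Lu}|\p{Ll}|\p{Lt}|\p{Nd}|\p{Nl}|\p{No}|[-._:'']|[\x{4e00}-\x{9fa5}]|\x{3007}|[\x{3021}-\x{3029}])*
          (\p{Lu}|\p{Ll}|\p{Lt}|\p{Nd}|\p{Nl}|\p{No}|[\x{4e00}-\x{9fa5}]|\x{3007}|[\x{3021}-\x{3029}]) )? )    # word
!x;

my $html = get_text();    # replace the placeholder in get_text() with real input

my ($count1, $count2);

# Code refs are closures and can see the lexicals above; Benchmark's
# string form is eval'ed inside Benchmark.pm, where they are not visible.
my $tests = {
    rex1 => sub { $count1 = 0; $count1++ while $html =~ /$rex1/g; },
    rex2 => sub { $count2 = 0; $count2++ while $html =~ /$rex2/g; },
};

# Run each test for at least 1 CPU second, then build the comparison chart
# from those same results instead of timing everything a second time.
my $results = timethese(-1, $tests);
cmpthese($results);

# Token counts from the last run of each regex; they should be equal.
print "$count1\n$count2\n";

sub get_text {
    return <<'EOFTEXT';
*** PUT SOME HTML/XML HERE ***
EOFTEXT
}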