You don't need regular expressions to solve this problem.
Using index and substr is an order of magnitude less CPU intensive. Before choosing index, make sure your XML tags are ALWAYS written the same way: '<GENE>', '<Gene>' and '< GENE>' are completely different strings as far as index is concerned.
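To make that warning concrete, here is a small sketch of how index treats the tag variants. The sample text and the variable names are made up for illustration only.

```perl
use strict;
use warnings;

# index() does an exact, case-sensitive substring search and
# returns -1 when the substring is not found.
# Hypothetical sample text containing three spellings of the tag.
my $text = 'a <Gene> b < GENE> c <GENE> d';

# Only the last tag is an exact match for '<GENE>'.
my $exact = index($text, '<GENE>');

# A string holding only the mixed-case variant yields no match at all.
my $missing = index('a <Gene> b', '<GENE>');

print "exact: $exact, missing: $missing\n";
```

If your data does mix spellings like this, normalize the tags first or index will silently skip them.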
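my $string = 'Furthermore , expression of <GENE> Vpu </GENE> in Jurkat T '
           . 'cells rendered them more susceptible to <GENE> Fas </GENE> - induced';
my $start = 0;
while ((my $beg = index($string, '<GENE>', $start)) > -1) {
    # Look for the closing tag after the opening tag just found.
    my $end = index($string, '</GENE>', $beg);
    last if $end == -1;    # unmatched opening tag: stop instead of looping forever
    $end += 7;             # 7 is the length of '</GENE>'
    substr($string, $beg, $end - $beg, '');    # 4-argument substr deletes in place
    $start = $beg;         # the string just shrank, so resume at the removal point
}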
I do notice that this leaves a double space where each GENE element used to be, though.
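If the double space matters, one way to clean it up without reaching for a regular expression is tr with the /s (squeeze) modifier. This is only a sketch: the sample string below is a hypothetical, shortened result of the removal, and note that it squeezes every run of spaces in the string, not just the ones left by the removed elements.

```perl
use strict;
use warnings;

# Hypothetical sample: the kind of string left after the tagged spans are removed.
my $string = 'Furthermore , expression of  in Jurkat T cells';

# tr/ //s maps space to space and squeezes each run of spaces into one.
# tr/// is a character translator, not a regular expression.
(my $squeezed = $string) =~ tr/ //s;

print "$squeezed\n";
```

Copying into $squeezed keeps the original string intact; apply tr directly to $string if you do not need it afterwards.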