Your main issue seems to be that you fetch too much or the wrong things. Reminder: you need to put the part you want to capture in round brackets (a capture group), which is the \w\w\d\d\d\d\d in this case. So your regex could look like
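
A minimal sketch of that idea, written in Python only for illustration (the thread does not say which language is in use). The sample lines, the `pattern` name and the `extract_ids` helper are hypothetical, and the exact layout of the real file is an assumption. The pattern anchors on the literal DR   Pfam text and puts only the \w\w\d\d\d\d\d part in round brackets:

```python
import re

# Hypothetical sample lines; the layout of the real input file is an assumption.
lines = [
    "DR   Pfam; PF00069; Pkinase; 1.",
    "DR   PROSITE; PS00107; PROTEIN_KINASE_ATP; 1.",
    "DR   Pfam; PF07714; PK_Tyr_Ser-Thr; 1.",
]

# Anchor on "DR", whitespace and "Pfam", skip any separator characters,
# then capture two word characters followed by five digits.
pattern = re.compile(r"DR\s+Pfam\W+(\w\w\d\d\d\d\d)")


def extract_ids(lines):
    """Return the captured group from every line that matches the pattern."""
    ids = []
    for line in lines:
        match = pattern.search(line)
        if match:
            # group(1) is the part inside the round brackets only.
            ids.append(match.group(1))
    return ids


print(extract_ids(lines))  # ['PF00069', 'PF07714']
```

Note that \w also matches digits and the underscore, so if the prefix must be letters only, something like [A-Z][A-Z] in place of \w\w would be stricter.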

(I don't see the need for look-aheads or look-behinds here.)

HTH, Rata

Update Flexx: why do you assume that the second field of a semicolon-separated file is meant? I agree that the specification is very vague. However, from the examples given by invaderzard, it seems that the text DR   Pfam as well as the format (2 letters, 5 digits) are the important parts ...