Thank you everyone for the great help.
I ended up using CPD (PMD's copy/paste detector) with very good results. Amazingly enough, it even ran straight from the Java Web Start link.
I was worried that an automated tool might have trouble because the PHP also contains HTML and VML (ugh). But the output clearly shows that 20 or so PHP files share a significant amount of code, on the order of 100-150 lines, at various places that CPD reports.
So this dedupe should cut another several thousand lines of code. I'm trying to get to a code base that is actually maintainable by a mere mortal like myself, or by someone else. The code was all written by a single author.