Thanks for your recommendations!
I changed a lot in my code, and it is now about 6 times faster:
* used vec to get each byte from the file string.
* removed all "." and ".=" operators in frequently executed places.
* rewrote the position-byte handling (use "&" instead of regex, binary strings and no loops).
* precalculated the CRC "for" loop and stored the results in a lookup table (256 bytes).
I made these changes step by step. However, my version that uses arrays instead of strings and vec is faster (e.g. for the CRC lookup table; the array size never changes while the program runs). Could it be that this differs between Windows, which I use, and Linux?
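For illustration, here is a minimal sketch of the two lookup-table variants being compared. It is not my real code: the CRC-8 polynomial 0x07 and the names crc8_array and crc8_string are assumptions made for the example.

```perl
use strict;
use warnings;

# Assumption: CRC-8 with polynomial 0x07; the real polynomial may differ.
my $POLY = 0x07;

# Precompute the 8-step bit loop once for every possible byte value.
my @crc_table;
for my $byte (0 .. 255) {
    my $crc = $byte;
    for (1 .. 8) {
        $crc = ($crc & 0x80)
             ? ((($crc << 1) ^ $POLY) & 0xFF)
             : (($crc << 1) & 0xFF);
    }
    $crc_table[$byte] = $crc;
}

# The same table packed into a 256-byte string, to be read with vec().
my $crc_string = pack 'C*', @crc_table;

# Variant 1: table held in an array.
sub crc8_array {
    my ($data) = @_;
    my $crc = 0;
    $crc = $crc_table[ $crc ^ vec($data, $_, 8) ] for 0 .. length($data) - 1;
    return $crc;
}

# Variant 2: table held in a string and indexed with vec().
sub crc8_string {
    my ($data) = @_;
    my $crc = 0;
    $crc = vec($crc_string, $crc ^ vec($data, $_, 8), 8)
        for 0 .. length($data) - 1;
    return $crc;
}

printf "array: 0x%02X  string: 0x%02X\n",
    crc8_array("123456789"), crc8_string("123456789");
```

Both variants return the same CRC; they differ only in whether the table lookup is an array access or a vec() call on a string.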
In the profile from -MDevel::NYTProf I can see a large amount of time attributed to the last statement of a subroutine, e.g. 3.8 seconds on the last statement ($fpos++;) of readByte() for 1,700,000 calls. This time is nearly independent of the statement itself. Inserting a return makes it slower: 4 s for the return. The $fpos++ statement is then rated as fast (200 ms). Where does this time come from?

The following is also not clear to me: I tried inlining the body of the subroutine at the call site instead of calling it. In the profiler this version is 15% faster, but comparing the two versions without the profiler shows less than 1% difference in speed. So it is not worth doing. Any idea what is happening here?
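For reference, a comparison without the profiler can be sketched with the core Benchmark module. This is only an illustration: the test data, the body of readByte() and the names sum_with_sub and sum_inlined are assumptions, not my actual program.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 100,000 bytes cycling through 0..255.
my $data = pack 'C*', map { $_ % 256 } 0 .. 99_999;
my $fpos = 0;

# Assumed shape of readByte(): fetch one byte, then advance the position.
sub readByte {
    my $byte = vec($data, $fpos, 8);
    $fpos++;
    return $byte;
}

# Version 1: one subroutine call per byte.
sub sum_with_sub {
    $fpos = 0;
    my $sum = 0;
    $sum += readByte() for 1 .. length $data;
    return $sum;
}

# Version 2: the body of readByte() written out at the call site.
sub sum_inlined {
    $fpos = 0;
    my $sum = 0;
    for (1 .. length $data) {
        $sum += vec($data, $fpos, 8);
        $fpos++;
    }
    return $sum;
}

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    with_sub => \&sum_with_sub,
    inlined  => \&sum_inlined,
});
```

Running this once normally and once under the profiler shows how far the two sets of numbers differ for the same code.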