Laurent_R:
I'm dismantling a large (5GB) binary file archive, and the first 36 bytes of each file entry are stuff I haven't determined the purpose of. Then come the filename (variable length) and the data. The filename appears to be unicodey and terminated by a 0, so it looks like: (letter, 0, letter, 0, ..., letter, 0, 0, 0). Since the filename is variable length, a regex felt like the simplest tool for dismantling it.
Normally when exploring things like this, I take things apart, and as I find the patterns, I improve the parsing. This file seems to freely mix binary, Unicode and normal ASCII, so I'm still thinking about how best to dismantle it. I also don't know much about the internal structure of the file yet, beyond a very rough overview. I could look it up on the 'net, but I like figuring stuff out as much as I can first before looking at the answer in the back of the book.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
OK, roboticus, thank you for answering, I now understand your context.
If you wanted to grab the first 36 chars from the start of a string, then grab the first subsequent group that was terminated by (and did not contain) a \0\0\0 sequence, what regex would you use?
/^.{36,}?\0\0\0/
But that doesn't differentiate between the first 36 chars and the subsequent whatsit, and includes the \0\0\0, and needs the use of $& or a substr operation to access what was matched. My guess (supported by roboticus's later post) was that the chunks were wanted separately, sans terminator. Given that assumption, the regex didn't seem so strange. But there are many paths...
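For what it's worth, here is a minimal sketch of one of those paths, assuming the layout roboticus described (36 unknown bytes, then a UTF-16LE filename ending in a 16-bit NUL, then the data). The helper name split_entry and the sample entry are made up for illustration; this is not roboticus's actual code. One detail worth noting: in the (letter, 0, ..., letter, 0, 0, 0) layout, the first of the three zeros is the high byte of the last letter, so a lazy capture that stops at \0\0\0 comes out one byte short. Matching the name as two-byte units and stopping at an aligned \0\0 sidesteps that.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: split one entry into header, filename, and the rest.
# Returns an empty list if the buffer does not fit the assumed layout.
sub split_entry {
    my ($buf) = @_;
    # /s lets . match any byte, including "\n".
    # (.{36})     the 36 header bytes of unknown purpose
    # ((?:..)*?)  the filename as 2-byte code units, as few as possible
    # \0\0        the 16-bit NUL terminator, aligned on a code-unit boundary
    return unless $buf =~ /\A(.{36})((?:..)*?)\0\0/s;
    my ($header, $raw_name) = ($1, $2);
    my $rest = substr($buf, $+[0]);    # everything after the terminator
    return ($header, decode('UTF-16LE', $raw_name), $rest);
}

# Made-up sample: 36 filler bytes, "ab.txt" in UTF-16LE, terminator, data.
my $entry = ("\xFF" x 36) . "a\0b\0.\0t\0x\0t\0" . "\0\0" . "DATA";
my ($header, $name, $rest) = split_entry($entry);
printf "%d header bytes, name '%s', %d data bytes follow\n",
    length($header), $name, length($rest);
```

The captures hand back the header and the name separately, without the terminator, so there is no need for $& or a follow-up substr on the matched text. Whether the real archive actually uses UTF-16LE is still a guess at this point.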
If you wanted to grab the first 36 chars from the start of a string, then grab the first subsequent group that was terminated by (and did not contain) a \0\0\0 sequence, what regex would you use?
Yes, AnomalousMonk, you are right: if I wanted to do that, I would probably use a regex very similar to the one roboticus used. I was really wondering why he wanted to do something a bit strange like that, and he has not yet provided an answer that explains it all.