Re: Regex trouble w/ embedded 0s?

Re^2: Regex trouble w/ embedded 0s? by roboticus (Chancellor) on Jul 11, 2014 at 12:25 UTC
Laurent_R: I'm dismantling a large (5GB) binary file archive, and the first 36 bytes of each file entry are stuff whose purpose I haven't determined yet. Then come the filename (variable length) and the data. The filename appears to be unicodey, terminated by a 0, so it looks like: (letter, 0, letter, 0, ..., letter, 0, 0, 0). Since the filename is variable length, a regex felt like the simplest way to take it apart.

Normally when exploring things like this, I take things apart, and as I find the patterns, I improve the parsing. This file seems to freely mix binary, Unicode and normal ASCII, and I'm still thinking about how best to dismantle it. I also don't know much about the internal structure of the file yet, beyond a very rough overview. I could look it up on the 'net, but I like figuring out as much as I can myself before looking at the answer in the back of the book.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
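For anyone following along, here is a minimal non-regex sketch of the layout described above. It assumes the "unicodey" name is UTF-16LE ending in a two-byte NUL, which the post does not confirm, and the sample record and all variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical record: 36 unknown header bytes, a name stored as
# (letter, 0) pairs, a two-byte NUL terminator, then the data.
# UTF-16LE is an assumption, not something the archive is known to use.
my $record = ( "\x01" x 36 ) . "f\0o\0o\0" . "\0\0" . 'payload';

my $header = substr $record, 0, 36;

# Walk the name two bytes at a time so the terminator is only
# recognised on a 16-bit boundary.
my $pos        = 36;
my $name_bytes = '';
while ( $pos + 2 <= length $record ) {
    my $unit = substr $record, $pos, 2;
    $pos += 2;
    last if $unit eq "\0\0";
    $name_bytes .= $unit;
}

my $name = decode( 'UTF-16LE', $name_bytes );
my $data = substr $record, $pos;

print "name=$name, data=$data\n";
```

Stepping in 2-byte units keeps the last letter's trailing zero attached to the name, which a plain search for three zero bytes would not do.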
Re^3: Regex trouble w/ embedded 0s? by Laurent_R (Canon) on Jul 11, 2014 at 17:31 UTC
OK, roboticus, thank you for answering, I now understand your context.
Re^2: Regex trouble w/ embedded 0s? by AnomalousMonk (Archbishop) on Jul 11, 2014 at 12:09 UTC
If you wanted to grab the first 36 chars from the start of a string, then grab the first subsequent group that was terminated by (and did not contain) a `\0\0\0` sequence, what regex would you use?
Re^3: Regex trouble w/ embedded 0s? by choroba (Cardinal) on Jul 11, 2014 at 12:40 UTC
I dunno, prolly `/^.{36,}?\0\0\0/`

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^4: Regex trouble w/ embedded 0s? by AnomalousMonk (Archbishop) on Jul 11, 2014 at 13:00 UTC
But that doesn't differentiate between the first 36 chars and the subsequent whatsit, and includes the `\0\0\0`, and needs the use of `$&` or a substr operation to access what was matched. My guess (supported by roboticus's later post) was that the chunks were wanted separately, sans terminator. Given that assumption, the regex didn't seem so strange. But there are many paths...
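A minimal sketch of the difference between the two styles, run against a made-up sample record. The capturing regex is only one plausible way to get the chunks separately, not necessarily the regex roboticus used, and the `/s` modifier is an addition here so that `.` also matches newline bytes in binary data.

```perl
use strict;
use warnings;

# Hypothetical sample: 36 filler header bytes, the name "abc" as
# (letter, 0) pairs, a two-byte NUL terminator, then payload data.
my $record = ( 'H' x 36 ) . "a\0b\0c\0" . "\0\0" . 'DATA';

# Whole-match style: one undifferentiated span, terminator included.
# @- and @+ give the match offsets without resorting to $&.
my $whole = '';
if ( $record =~ /^.{36,}?\0\0\0/s ) {
    $whole = substr $record, $-[0], $+[0] - $-[0];
}

# Capturing style: header and name come back separately, sans terminator.
my ( $head, $name ) = $record =~ /\A(.{36})(.*?)\0\0\0/s;

printf "whole: %d bytes, head: %d bytes, name: %d bytes\n",
    length($whole), length($head), length($name);
```

With this sample the captured name is `a\0b\0c`: the zero byte that belongs to the last letter is swallowed as part of the three-zero terminator, so it has to be added back before decoding the name as 16-bit characters.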
Re^3: Regex trouble w/ embedded 0s? by Laurent_R (Canon) on Jul 11, 2014 at 17:38 UTC
"If you wanted to grab the first 36 chars from the start of a string, then grab the first subsequent group that was terminated by (and did not contain) a `\0\0\0` sequence, what regex would you use?"

Yes, AnomalousMonk, you are right: if I wanted to do that, I would probably use a regex very similar to what roboticus used. I was really wondering why he wanted to do something a bit strange like that, and he has now provided an answer which explains it all.

