PerlMonks  

Re^4: Match all Non-0 and Letters

by haukex (Archbishop)
on Jun 27, 2017 at 08:19 UTC ( [id://1193667]=note )


in reply to Re^3: Match all Non-0 and Letters
in thread Match all Non-0 and Letters

This is certainly a reasonable guess

Yes, it's just a guess, but I wanted to offer a possible alternative to the guess that the OP doesn't know their specifications and doesn't know what hex is.

anomalies, likewise, could now be tackled in the context of that now-successfully-decoded integer (not text ...) data stream. In general, does not make good sense to me to attack the file with regular expressions

Well, in my hypothetical situation of a serial data stream corrupted by noise, decoding into integers first and then inspecting those integers for bad values unfortunately won't work. The reason is that corruption on such streams can include inserted or dropped bytes, so it's entirely possible that none of the incoming data is aligned on 32-bit boundaries. In that case one needs a state machine to reacquire synchronization with the data stream, and Perl's regular expressions are actually a decent tool for that job. Note how none of the valid values in the following stream are aligned on 4-byte boundaries:

my $datastr = "BEEF00000001AB0000000200000700000003F00D";
print "$_\n" for $datastr=~/0{7}[0-9]/g;
__END__
00000001
00000002
00000003
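To make the alignment point a bit more concrete, here is a rough sketch that contrasts cutting the same sample stream into fixed 8-character chunks with letting the regex resynchronize on each record. The sub names `fixed_chunks` and `resync_records` are made up for illustration, and I'm assuming the stream is hex text in which one 32-bit value takes 8 characters:

```perl
use strict;
use warnings;

# Same sample stream as above: hex text with noise between the records.
my $datastr = "BEEF00000001AB0000000200000700000003F00D";

# Naive approach (hypothetical helper): cut the stream into fixed
# 8-character chunks. One inserted or dropped byte shifts every later chunk.
sub fixed_chunks {
    my ($str) = @_;
    return unpack "(A8)*", $str;
}

# Resynchronizing approach (hypothetical helper): let the regex engine
# scan for the next well-formed record and note where each one starts.
sub resync_records {
    my ($str) = @_;
    my @found;
    while ( $str =~ /0{7}[0-9]/g ) {
        # $-[0] is the offset at which the last successful match began
        push @found, { offset => $-[0], value => substr( $str, $-[0], 8 ) };
    }
    return @found;
}

print "fixed chunks: ", join( " ", fixed_chunks($datastr) ), "\n";
for my $rec ( resync_records($datastr) ) {
    printf "offset %2d (mod 8 = %d): %s\n",
        $rec->{offset}, $rec->{offset} % 8, $rec->{value};
}
```

With this sample, none of the fixed chunks is a valid record, while the regex finds the three records at character offsets 4, 14 and 28, none of which is a multiple of 8. A real decoder would probably want to do more than this, such as logging the skipped bytes, but the resynchronization itself is just the pattern match.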

Re^5: Match all Non-0 and Letters
by anonymized user 468275 (Curate) on Jun 27, 2017 at 14:09 UTC
    I asked the question 'are you sure it's corruption?' (yet to be answered by the OP) because Occam's razor makes the proposition that the data is always hex more likely than the OP's notion that the correct 'uncorrupted' format should be 8 decimal digits with leading zeros. The latter proposition would require a bizarre explanation, to say the least (a weirdly written COBOL program?), even before getting into how a hypothetical hex gremlin performed the alleged corruption on top of that.

    (Occam's razor: that the simplest of competing theories be preferred to the more complex or that explanations of unknown phenomena be sought first in terms of known quantities.)

    One world, one people

      the latter proposition would require a bizarre explanation to say the least

      What you call "bizarre" is in my experience completely normal. I myself would not design a data format in this way, but I have worked with plenty of binary data formats that make somewhat strange choices, such as storing a value from 0 to 9 in a 32-bit field. Just a month ago I finished implementing a driver for a proprietary network protocol that, among other things, has a 32-bit-wide "flag" field in which only the lowest 3 bits are used. As for how the corruption might have gotten there, I already explained one possibility, which in the ECE world is, despite being avoidable, unfortunately still completely normal.

      So as I said, given that the OP seemed to be clear on the expected format, I just wanted to provide a different perspective for the explanation.
