Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Code review: Start and End markers to ensure Serial data is valid

by stevieb (Canon)
on Aug 03, 2019 at 20:34 UTC ( #11103824=perlquestion: print w/replies, xml ) Need Help??

stevieb has asked for the wisdom of the Perl Monks concerning the following question:

Changing it up a bit here. I'm doing some prototype code that will eventually be translated into C/XS, but I'm having some brain fart moments here trying to sort out whether my logic is, actually logical.

Because the distribution isn't available yet, it's not test-worthy. What I'm looking for is eyes on the logic itself, not whether it works or not. What's supposed to happen is this:

caller sends in a start data marker ([), an end data marker (]), a digit that represents the number of times the start/end markers should be seen before identifying we have data (3), and finally, a test "reset" identifier (!) for the transmitter to say "wait, reset that crap, let's start over"

I've got it working in basic test cases, but I'm wondering if some keen-eyed Monks can have a glance at the situation (I've left everything as simple-to-understand if() statements for this review) to see how I could expand on the assurance aspect of ensuring the receiver only gets good data, and continues to work if not.

Receiver (rx):

use warnings; use strict; use RPi::Serial; my $s = RPi::Serial->new('/dev/ttyUSB0', 9600); my $data; my ($rx_started, $rx_ended) = (0, 0); while (1){ if ($s->avail){ my $data_populated = rx('[', ']', 3, '!'); if ($data_populated){ print "$data\n"; rx_reset(); } } } sub rx { my ($start, $end, $delim_count, $rx_reset) = @_; my $c = chr($s->getc); # getc() returns the ord() val on a char* p +erl-wise if ($c ne $start && ! $rx_started == $delim_count){ rx_reset(); return; } if ($c eq $rx_reset){ rx_reset(); return; } if ($c eq $start){ $rx_started++; return; } if ($c eq $end){ $rx_ended++; } if ($rx_started == $delim_count && ! $rx_ended){ $data .= $c; } if ($rx_started == $delim_count && $rx_ended == $delim_count){ return 1; } } sub rx_reset { $rx_started = 0; $rx_ended = 0; $data = ''; }

Sender (tx):

use warnings; use strict; use RPi::Serial; my $s = RPi::Serial->new('/dev/ttyS0', 9600); for (qw(x [[[hello]]] ! [[[world]]] a b)){ # x = no start marker seen yet, reset # [[[ = start data ok # hello = data # ]]] = end data ok # ! = RX reset command # [[[ = start data ok # world = data # ]]] = end data ok # a = no start marker set, reset # b = no start marker set, reset $s->puts($_); }

Works fine in such a situation, here's the output after starting rx in one window, and running tx in another window after that:

$ perl examples/rx.pl hello world

I apologize for the lack of a testable example, but this is a thought-process-only kind of post I'm trying to get ideas from by eye at this time.

Thanks all,

-stevieb

Replies are listed 'Best First'.
Re: Code review: Start and End markers to ensure Serial data is valid (updated)
by haukex (Chancellor) on Aug 03, 2019 at 21:22 UTC

    This is a really classic use for state machines. The more variables and code paths you introduce, the more complex your logic gets and the more cases you have to cover. Instead, the really basic state machine structure looks either like this:

    my $state = 'idle'; while ( my $input = <> ) { if ( $state eq 'idle' ) { if ( $input =~ /FOO/ ) { $state = 'foo'; } elsif ( $input =~ /BAR/ ) { $state = 'bar'; } elsif ( $input =~ ... else { die "unknown input: $input" } } elsif ( $state eq 'foo' ) { if ( $input =~ /FOO/ ) { $state = 'foo'; } elsif ( $input =~ ... else { die "unknown input: $input" } } elsif ( $state eq ... else { die "internal error: bad state $state" } }

    Or like this:

    my $state = 'idle'; while ( my $input = <> ) { if ( $input =~ /FOO/ ) { if ( $state eq 'idle' ) { $state = 'foo'; } elsif ( $state eq 'foo' ) { $state = 'foo'; } elsif ( $state eq ... else { die "internal error: bad state $state" } } elsif ( $input =~ /BAR/ ) { if ( $state eq 'idle' ) { $state = 'bar'; } elsif ( $state eq ... else { die "internal error: bad state $state" } } elsif ( $input =~ ... else { die "unknown input: $input" } }

    Here's a solution to parsing your example data that uses only the state variables $state and $curdata to hold state, with output being temporarily stored in @outdata. I didn't implement the handling of the []3 sequence or !, since those can be implemented as extensions of this pattern.

    Note that it takes a bit of discipline to stick to this pattern and to remember to cover all cases in all states. For example, if you really are making the number of opening and closing brackets variable, you'll need two more state variables: one to store the expected number of brackets, and one to count the number of seen brackets. You'll need to remember reset these variables in all the right places. <update> In fact, it'd be the most formal approach to always cover every input in every state (or every state with every input), and to set all the state variables in every branch of the code, and think about what the output should be for every branch of the code. Of course, that gets quite redundant, and if a variable hasn't changed, it doesn't need to be re-set. But the takeaway here is that one at least needs to think about it, and sticking with the formal structure above helps in doing so. </update>

    In the following code, I've opted to make three different start states and two different end states, which may seem wasteful, but often encoding a couple of extra bits into the state is the "safer" way to code a state machine, instead of adding yet another state variable (and if you draw out a state diagram, it'll be easier to understand). If all the string comparisons bother you, then use constants and use integer comparisons on $state.

    use warnings; use strict; use Test::More tests=>2; is_deeply parse_data('x[[[hello]]]![[[world]]]ab'), ['hello','world']; is_deeply parse_data('x[y[[z[[[[[[[[[foo]1]]2]]]3'), ['[[[[[[foo]1]]2']; sub parse_data { my @indata = split //, shift; ; my $state = 'await_start1'; my $curdata = undef; my @outdata; for my $c (@indata) { if ( $state eq 'await_start1' ) { if ( $c eq '[' ) { $state = 'await_start2' } } elsif ( $state eq 'await_start2' ) { if ( $c eq '[' ) { $state = 'await_start3' } else { $state = 'await_start1' } } elsif ( $state eq 'await_start3' ) { if ( $c eq '[' ) { $state = 'in_data'; $curdata = '' } else { $state = 'await_start1' } } elsif ( $state eq 'in_data' ) { if ( $c eq ']' ) { $state = 'got_end1' } else { $curdata .= $c; $state='in_data' } } elsif ( $state eq 'got_end1' ) { if ( $c eq ']' ) { $state = 'got_end2' } else { $curdata .= ']'.$c; $state='in_data' } } elsif ( $state eq 'got_end2' ) { if ( $c eq ']' ) { push @outdata, $curdata; $curdata = undef; $state = 'await_start1'; } else { $curdata .= ']]'.$c; $state='in_data' } } else { die "internal error: bad state $state" } } return \@outdata; }

    In general, when designing serial protocols, there are two routes to take: packet-based and stream-based. Packet-based protocols usually transmit a header that often includes a fixed start sequence (so that synchronization can be reacquired if a byte is lost or inserted) plus length byte(s) (depending on the maximum packet size), and a checksum at the end. Stream-based protocols, like this one, usually just include some kind of start sequence (like your [[[), an end sequence, and hopefully a checksum at the end. Personally I also include line-based protocols in this category, they're just a slight variation on the stream idea. Stream-based protocols have the issue that the start sequence and any other special bytes need to be escaped (which is probably why you've made the number of [s variable?), while packet-based protocols don't have that issue - instead, they have the problem that if transmitted over a system where bytes can be dropped or inserted, reacquiring synchronization in such cases takes more code, although state machines can handle that too, of course. Note that there are of course other ways to escape data than to wrap a bunch of [[[]]]s around it - think quoted strings, for example :-)

    Update: Minor edits for clarity. Also, if your goal, as the title says, is to ensure the data is transmitted correctly, then you really should use CRCs. For short pieces of data over mediums that are unlikely to corrupt the data, even something as simple as a CRC-8 is better than nothing.

    Update 2: A couple more edits in the text with some more thoughts.

Re: Code review: Start and End markers to ensure Serial data is valid
by roboticus (Chancellor) on Aug 04, 2019 at 17:30 UTC

    stevieb:

    At first, I was going to give a reply somewhat like haukex's, but use a table-driven state machine, but as haukex already mentioned state machines with implementations, I thought I'd just talk about how I try to ensure my state machine code has as few error(s) as possible. Heh, I intended to make it short, but it still rambled on a bit longer than expected. 8^P

    First, even for a simple state machine, I'll sketch it out on a piece of paper with plenty of whitespace, as I tend to draw/erase a good bit while refining it. I'll also make the nodes kinda big with some partitions in them, like this:

    +-----------------------+ | Node_Name (notes) | +-------+--------+------+ | Enter | Remain | Exit | +-------+--------+------+

    Generally, I try to keep the Enter/Remain/Exit boxes empty, but those are the boxes I use to document special actions to take when entering the node from a different node, when remaining in the same node after consuming an event, or to perform immediately before exiting the node. Your code gets complicated if you have too many actions occurring at various times. So I start off by putting in the actions I expect to need where I think I'll need them. As I refine the machine, I'll try to remove things or move them to places that are more consistent with the rest of the machine to simplify implementation. I also allow edges to have actions, which I'll put in parenthesis, if I need them.

    So I tried to draw a machine for your problem and I got this far:

    +--------------------+ +--------------------+ | Idle |--[BEG]->| Open | +------+------+------+ +------+------+------+ | | | |<-[RST]--|cnt=1 | | | +------+------+------+ +------+------+------+ ^ ^ ^ | ^ | | | +-[RST]-+ [eps:cnt==3] | [BEG(++cnt)] [RST] [eps:cnt==3] | | | [END(--cnt)] | (trim,publish) | | | | | | | | +-----+ +--------------------+ | v | Close | | +--------------------+ +------+------+------+ +--| Data | |cnt=1 | | |<-[END]--+-----+-------+------+ +------+------+------+ | |addChar| | | ^ +-----+-------+------+ [END(--cnt)] | [BEG(++cnt)] | [else] | | | +---------+

    (Whew! ASCII art is difficult to draw as well as ugly. I should really try to start learning the Perlmonks codebase and try to implement my dream extension: a *limited* SVG handler to allow simple text/line/box/circle drawings.)

    During and after drawing a machine, I like to imagine what the problems are at each possible state, and look for edges that aren't handled to see what implications arise. Thinking about the variables we want to use to capture the state and looking at what cases we want to handle when examining each edge, you can come up with various questions and/or test cases to think about. A few I came up with are:

    • Are the BEG/END delimiters allowed to be inside the data?
      "[[[data[more[data]]]" ==?== "datamoredata"
      "[[[data[[[]data]]]" ==?== ( "data[[[]data" or "data[[[data" )
    • Do the BEG/END delimiters allow you to escape-out noisy junk?
      "[[[data]junk[data]]]" ==?== "datadata"

    After coming up and deciding on what to do about all the special cases, I then start to review the machine to see how to simplify it. By this, I don't necessarily mean to reduce the number of states and/or edges (though if possibilities arise, I'll certainly try to do so). Rather, I want to simplify the implementation of the machine.

    Having actions occurring in various places in the machine makes implementation confusing. For example, I have actions occurring on edges, and in the onEntry, and onRemain slots on some nodes. Sometimes a bit of thinking can let you move some actions around to reduce the number of places you have to think about actions.

    Another thing in a state machine that can cause complexity are "special" edge types, such as the "else" edge type (meaning anything but END/BEG on the Close state) and the epsilon nodes that activate if we're in a state (such as Open/Close) and a condition happens to be true.

    Sometimes you can simplify the code by adding edges/nodes to the state machine: it may make the machine a bit larger, but can let you put actions in more consistent locations and/or let you remove some custom edge types you may otherwise need.

    Ah, well, enough rambling on this. I've got a couple Raspberry Pi computers to bring on-line. BTW: Since you're the Perlberry Pi guy, is there a hardcopy option available or coming soon for "Programming the Raspberry Pi with Perl"? I'm not really an eBook sorta guy, I'd rather buy something I can put on the bookshelf. (Yeah, I frequently print PDFs so I can tuck 'em in my clipboard and read 'em at lunch or whatever.)

    Edit: Added code tags to test cases to prevent linkification.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Code review: Start and End markers to ensure Serial data is valid
by Anonymous Monk on Aug 04, 2019 at 21:55 UTC
    Before you bother to re-invent a serial communication protocol, have you thoroughly examined every one that already exists? Didn't think so.
      "Before you bother to re-invent a serial communication protocol, have you thoroughly examined every one that already exists?"

      All? No. I've researched several different techniques because although UART does have standard protocol, there's no one defined packet protocol. Also no, I don't have the time to "thoroughly examine(d) every one that already exists".

      Also, my question was an extremely high-level one looking for base ideas so that I can consolidate a packet protocol that I can use interchangeably with I2C, SPI, UART and the custom parallel hardware interface I'm designing into a PDIP IC without redesigning (or, as you say, re-inventing) TCP that'll work on my Linux/Unix systems, as well as on micro-controller devices with as small as 2KB of RAM, so I'm not looking for an already-created exact solution.

      So, your "Didn't think so" is my "I've designed a multi-I2C-bus bit-bang technology for the Arduino/ESP that I'm currently porting to the Pi/Linux, so re-inventing helps me when need be".

      Re-inventing a wheel isn't necessarily a bad thing; I've done it many times, as have all experienced developers/engineers. Reverse engineering and re-engineering something is a valuable learning exercise, and sometimes, the re-invention is better than the original :)

        I can consolidate a packet protocol that I can use interchangeably with I2C, SPI, UART and the custom parallel hardware interface I'm designing into a PDIP IC without redesigning (or, as you say, re-inventing) TCP that'll work on my Linux/Unix systems, as well as on micro-controller devices with as small as 2KB of RAM, so I'm not looking for an already-created exact solution.

        At work, we have a protocol that should fit your needs. Unfortunately, it is not available for the public. But I think I can give some hints:

        • The protocol is actually a very tiny protocol stack.
        • At the lowest level for transport over RS232, RS485, and similar, we have a message transport layer that uses some selected byte values to indicate start of message, end of message, and escape to allow transporting these three bytes without confusing the receiver.
        • When using SPI, the CS line is used to indicate start and end of message, no start/end/escaping bytes needed.
        • When using UDP/IP, we use a UDP packet per message, also no start/end/escaping bytes needed.
        • We haven't used a parallel port so far, but assumung something like an old-fashioned PC parallel port, we could either use a spare status / control line to mark a message frame (like CS does for SPI), or we could simply use the parallel port as a byte transport layer like RS232, and use start, stop, and escape byte.
        • There is also an IC transport, specified but not yet implemented, very similar to SPI.
        • TCP/IP and Bluetooth transports emulate a simple serial port.
        • New message transport layers can be added as needed.
        • On top of that message transport layers, we have a fixed-size header with some addressing, length, and optionally flags, an optional checksum and an optional payload. Part of the addressing is used to address devices on a bus, if we use a bus. Another part of the addressing is used to access various functions in software. In this respect, the message layer is VERY similar to IC, and IC was one of the main inspirations for the entire protocol.
        • Our protocol allows to transport a payload containing an entire message, so you can transport messages through a chain of several connected devices in a hierarchy (a tree of nodes).
        • Usually, we use a master-slave setup, but the protocol can also be used in a peer-to-peer setup if you have a full-duplex communication path.
        • Using some standardized messages, we can detect and identify devices on a bus, even hot-plugged.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://11103824]
Approved by johngg
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (4)
As of 2020-05-31 17:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    If programming languages were movie genres, Perl would be:















    Results (175 votes). Check out past polls.

    Notices?