
Octal Weirdness

by HaB (Sexton)
on Dec 20, 2000 at 01:59 UTC

HaB has asked for the wisdom of the Perl Monks concerning the following question:

Working on a server app that receives incoming text messages and sends back a corresponding ACK message based on certain data contained in the incoming msg. The incoming msgs are encapsulated with a start block and an end block, which can be specified in a config file that the server reads its options from. These blocks are specified by the octal value of the ASCII character. For example, the start block could be a vertical tab (\013) and the end block a file separator (FS, \034). The corresponding entries in the config file would be:

startblock=\013
endblock=\034

The config file is read in at runtime and stored in a hash, so those 2 values would go in as

$config{'startblock'} = "\013";
$config{'endblock'} = "\034";

Now, in order to facilitate message parsing, as soon as an incoming message is received, these chars are stripped out of the message like so:
$in_msg =~ s/$config{'startblock'}//g;
$in_msg =~ s/$config{'endblock'}//g;

which works just fine. The problem shows up when I try to build the ACK message. It also needs to be encapsulated by the same two chars. I have tried all of the following:
# all in quotes
$ack = "$config{'startblock'}<rest of ACK msg>$config{'endblock'}";
# encap chars outside of quotes
$ack = $config{'startblock'} . "<rest of ACK>" . $config{'endblock'};
# regex
$ack = "<ack msg here>";
$ack =~ s/(.*)/$config{'startblock'}$1$config{'endblock'}/;
all with the same result. The ACK message ends up containing the literal '\013' instead of the vertical tab. Same with the '\034'.

What gives? I can't understand why it works perfectly in the stripping regex and fails when building the ACK. I'm assuming it's some sort of regex internal thing I don't know about. Any enlightenment would be most appreciated.



(tye)Re: Octal Weirdness
by tye (Sage) on Dec 20, 2000 at 02:22 UTC

    The only reason that:

    $in_msg =~ s/$config{'startblock'}//g;
    $in_msg =~ s/$config{'endblock'}//g;
    works is that your strings go through an extra interpolation phase when used in a regex.

    When you say that your variables end up getting set like:

    $config{'startblock'} = "\013";
    $config{'endblock'} = "\034";
    you are wrong. In Perl, "\013" gives you a single character, but your values are read from a file, so each one ends up as the 4 characters that '\013' or "\\013" would give you.
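A quick way to see the difference tye describes is to compare lengths. The snippet below is only an illustration (the variable names are made up for it): a single-quoted '\013', which is effectively what a line read from the config file contains, is four characters, while "\013" is one. It also shows why the stripping regex still matched: once the four characters land in a pattern, the regex engine reads them as an octal escape.

```perl
use strict;
use warnings;

# What the config reader actually stores: backslash, '0', '1', '3'.
my $from_file = '\013';
# What the poster expected: a single vertical-tab character (chr 11).
my $expected  = "\013";

printf "from file: %d chars\n", length $from_file;   # from file: 4 chars
printf "expected:  %d char\n",  length $expected;    # expected:  1 char

# Interpolated into a pattern, the text \013 is read by the regex engine
# as an octal escape, so it still matches a real vertical tab.
my $msg = "\013payload\034";
(my $stripped = $msg) =~ s/$from_file//g;
printf "stripped:  %d chars\n", length $stripped;    # stripped:  8 chars

# Concatenation does no such decoding: the backslash text is copied as-is.
my $ack = $from_file . 'ACK';
print "ack:       $ack\n";                           # ack:       \013ACK
```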

    If you want \ to mean something special when used in your config file, then you'll have to add code to provide that special meaning. For example:

    could be applied to such values to parse \0 octal escapes.

    Updated as if via s/octal/oct/ to correct the bug noted by japhy. Thanks, japhy.
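The code tye refers to is missing from this copy of the thread. A minimal stand-in, not his original, might look like the following; `decode_octal` is a hypothetical name, and the sketch assumes escapes are written as a backslash, a zero and up to two more octal digits (as with \013 and \034 in the config above). Each one is converted with the `oct` and `chr` builtins.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each \0NN octal escape with the character
# it names. Any other backslash sequence is left untouched.
sub decode_octal {
    my ($s) = @_;
    $s =~ s/\\(0[0-7]{0,2})/chr(oct $1)/ge;
    return $s;
}

# Values as they come out of the config file (literal backslashes).
my %config = (startblock => '\013', endblock => '\034');
$config{$_} = decode_octal($config{$_}) for keys %config;

my $ack = $config{startblock} . 'ACK' . $config{endblock};
printf "%vd\n", $ack;   # 11.65.67.75.28
```

Once the values are decoded like this, ordinary concatenation or interpolation builds the ACK with the real control characters.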

            - tye (but my friends call me "Tye")
      Instead of writing your own regex to parse escapes, wouldn't it be more sensible to let someone enter any of a set of already known escapes such as \xA, \c[ or \033? Those are all supported by Perl and are already interpreted in any qq() or regex.
      $config{startblock} = qq $config{startblock};
      Or something like that (maybe one which actually works).

      Update: I was looking for
      $config{startblock} = eval "qq($config{startblock})";
      Update 2: Nope, that uses eval, as tye points out below. I forgot about that when I first posted; I thought there was another way using just qq() without eval, which has all the drawbacks tye mentions.

        Cool! Now if I can get to your config file I can write:

        startblock=@{[system('rm -rf /')]}
        midblock=);system("rm -rf /");qq(
        endblock=$x{system('rm -rf /')}
        You can argue whether this is a feature or not. Personally, I do occasionally make config files that are written in Perl and so carry this risk. But if the config file isn't written in Perl, then I define the format and don't allow arbitrary Perl to sneak in. I think that fits the principle of least surprise: if the config file doesn't look like Perl code, then don't allow Perl code in it.
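To see the risk without deleting anything, here is a harmless variation: the made-up config value below uses the @{[ ... ]} interpolation trick, and the `eval "qq(...)"` suggestion from the earlier reply runs the embedded expression. Any expression could sit where 6 * 7 is.

```perl
use strict;
use warnings;

# A value someone might put in the config file; the "payload" is just 6 * 7.
my $value = 'answer=@{[ 6 * 7 ]}';

# eval-based decoding interpolates, so it also executes the embedded code.
my $decoded = eval "qq($value)";
die $@ if $@;

print "$decoded\n";   # answer=42
```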

        It would be nice if there were a very simple and efficient way to get Perl to parse all \ escapes without also doing dangerous variable interpolations. You could try to find or write a module to do this and then try to keep it updated so it stays in sync with what Perl does.

        You can't use the same code that Perl uses to do this because it is all muddled up with the lexer so that it can translate "hi\U\l$x ok" into "hi".lcfirst(uc($x))." OK".
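For the module route, a minimal sketch might handle only the escapes a config file plausibly needs and leave everything else alone. The helper name `decode_escapes` and the chosen subset (\\, \t, \n, \r, \e, \0NN octal, \xHH, \x{H...}, \cX) are assumptions for illustration; it deliberately does not attempt \U, \l, \N{...} or any variable interpolation, so it cannot drift out of sync in dangerous ways, only in incomplete ones.

```perl
use strict;
use warnings;

# Hypothetical decoder for a fixed subset of Perl-style backslash escapes.
# It never calls eval, so '$' and '@' in the input always stay literal.
my %simple = (t => "\t", n => "\n", r => "\r", e => "\e", '\\' => '\\');

sub decode_escapes {
    my ($s) = @_;
    $s =~ s{
        \\ (?:
            ([tnre\\])              # single-letter escapes and the backslash
          | (0[0-7]{0,2})           # octal escape
          | x\{([0-9A-Fa-f]+)\}     # braced hex escape
          | x([0-9A-Fa-f]{1,2})     # short hex escape
          | c([\@-_a-z])            # control character
        )
    }{
          defined $1 ? $simple{$1}
        : defined $2 ? chr(oct $2)
        : defined $3 ? chr(hex $3)
        : defined $4 ? chr(hex $4)
        :              chr(ord(uc $5) ^ 64)
    }gex;
    return $s;
}

printf "%s => %vd\n", $_, decode_escapes($_) for '\013', '\x1C', '\x{1c}', '\c[';
print decode_escapes('cost: $5 @ 10%\n');
```

Unrecognized sequences such as \q pass through unchanged; whether that or an error is the better behavior for a config file is a design choice.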

        You can also try to use eval for this but try to protect '$' and '@' from interpolation:

        $str = $config{startblock};
        $str =~ s#(\\*)([$@])#$1."\\"x(1&length$1).$2#ge;
        $str = eval "qq\@$str\@";
        which doesn't look too bad.
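If you want to reuse this protect-then-eval trick, a few details in the one-liner seem worth adjusting, as far as I can tell. `1&length$1` is 1 for an odd run of backslashes, which is when the sigil is already escaped, so the parity test looks reversed. `$@` inside `[$@]` is itself a variable and may be interpolated into the pattern. And with `@` as the qq delimiter, Perl removes the backslash from `\@` before interpolating. The sketch below (hypothetical name `decode_with_eval`) flips the parity test, writes the class as `[\$\@~]`, and uses `~` as the delimiter, escaping it too. It still hands the string to eval, so it is only as safe as that escaping is complete.

```perl
use strict;
use warnings;

# Hypothetical helper: let Perl's own qq() parser decode backslash escapes,
# after making sure nothing in the value can interpolate.
sub decode_with_eval {
    my ($str) = @_;
    # Escape every '$', '@' and '~' (the qq delimiter used below) that is
    # not already escaped, i.e. that follows an even-length run of
    # backslashes (0, 2, 4, ...).
    $str =~ s/(\\*)([\$\@~])/$1 . (length($1) % 2 ? '' : '\\') . $2/ge;
    my $out = eval "qq~$str~";
    die "cannot decode config value '$_[0]': $@" if $@;
    return $out;
}

printf "%vd\n", decode_with_eval('\013ACK\034');   # 11.65.67.75.28
print decode_with_eval('cost: $5 @ 10% ~ @{[ 6*7 ]}'), "\n";
```

The `~` delimiter is arbitrary; any character works as long as the substitution escapes it as well, but it should not be `$` or `@`, because an escaped delimiter loses its backslash before interpolation.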

                - tye (but my friends call me "Tye")
