PerlMonks  

Regex for replacing hidden character: « and L

by neversaint (Deacon)
on Mar 18, 2006 at 04:46 UTC

neversaint has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,
I have an executable whose output I capture as an array in Perl. The construct looks like this:
    my @output = `./some_executable.out -f $some_param`;
    my %nhash;
    foreach my $output (@output) {
        chomp($output);
        my ( $tm, $val ) = split( " ", $output );
        print "BEFORE: $tm\n";
        # this regex is not robust because it can't capture «
        $tm =~ s/L//;
        print "AFTER: $tm\n";
        $nhash{$tm} = $val;
    }
Now, the problem is that on different Linux boxes the string ($tm) may have different endings:
    # sometimes this:
    $tm = 'fooL';
    # other times this:
    $tm = 'foo«';
depending on which CPU I'm using. My questions are:
  • Is there a universal metacharacter I can use to remove this sort of line ending?
  • What causes these different endings?


---
neversaint and everlastingly indebted.......

Re: Regex for replacing hidden character: « and L
by davido (Cardinal) on Mar 18, 2006 at 05:13 UTC

    It would be very difficult to guess the cause of '<<' and 'L' as line endings without knowing what utility is generating them. The most common line-ending differences don't involve "L" and "<<"; they stem from differences in how Windows, Linux/Unix, and Macs end their lines. But you said that all the various line-ending varieties are being seen on different Linux boxes, so that rules out Windows and Mac interference.

    There isn't a universal metacharacter that matches the letter "L" and the character "<<" (sorry for my use of << instead of your nifty character; my keyboard doesn't have that key and I don't know its ordinal value or HTML entity name) while excluding all other characters. But Perl's regular expressions let you build your own character classes, and you can even specify characters by their ordinal values. For example, if the '<<' character had an ordinal value of 1b in hex (it doesn't; this is just an example), you could substitute it and 'L' out of existence like this:

    s/[L\x1b]//;
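A minimal runnable sketch of the same idea, assuming the odd character really is chr(0xAB) ('«', the value identified in the next reply) and that it arrives as a single byte or decoded character. The helper name strip_odd_suffix is hypothetical, and the \z anchor is an addition to Dave's pattern so that only a character at the very end of $tm is touched.

```perl
use strict;
use warnings;

# Hypothetical helper: remove ONE trailing 'L' or chr(0xAB) ('«').
# The \z anchor means an 'L' or '«' elsewhere in the string is kept.
# Assumes '«' is a single character 0xAB, not the UTF-8 pair 0xC2 0xAB.
sub strip_odd_suffix {
    my ($s) = @_;
    $s =~ s/[L\x{AB}]\z//;
    return $s;
}

print strip_odd_suffix('fooL'), "\n";        # foo
print strip_odd_suffix("foo\x{AB}"), "\n";   # foo
print strip_odd_suffix('Lfoo'), "\n";        # Lfoo (leading L kept)
```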

    Dave

Re: Regex for replacing hidden character: « and L
by GrandFather (Saint) on Mar 18, 2006 at 05:18 UTC

    Almost always the answer to this sort of problem involves line-end sequence characters. « is character 171 (0xab), and it is not at all clear to me how the normal line-end characters (\n and \r) could mutate into that unless something is translating all unprintable characters into it. Assuming that to be the case:

    Perl is expecting the line separator sequence to be a newline character ("\n"). If the application you are running generates output for Windows or a Mac, or for other purposes (network output, for example), the line end may be the sequence "\r\n" (Windows and network protocols) or "\r" (classic Mac OS). I think it most likely that the odd character you are seeing is a "\r", which you could handle as:

    $tm =~ s/[L\r]//;

    However, I doubt that you really want an 'L' in there, so most likely what you want is:

    $tm =~ s/\r//;
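A slightly more general sketch (the helper name strip_line_end is hypothetical): instead of removing a single "\r", strip any run of trailing CR/LF characters, so lines ending in "\n", "\r\n", or a bare "\r" all come out the same. Used on each line, it would take the place of chomp in the original loop.

```perl
use strict;
use warnings;

# Hypothetical helper: remove any run of trailing CR/LF characters.
# chomp alone only removes the current value of $/ (normally "\n"),
# so a "\r" left over from a "\r\n" ending would survive it.
sub strip_line_end {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

for my $raw ("foo 1\n", "foo 1\r\n", "foo 1\r") {
    print '[', strip_line_end($raw), "]\n";   # [foo 1] each time
}
```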

    DWIM is Perl's answer to Gödel
Re^2: Regex for replacing hidden character: « and L
      by virtualsue

      If we ever see a day when there are line-ending incompatibilities even between different versions of Linux, ACK! In this case, it looks as though the same script is being run on multiple Linux machines, and which odd character shows up depends on the box on which the command is run. If so, I sure hope that is not a newline problem. In neversaint's script he splits on whitespace and $tm is the first field of the split, not the last, so a stray line-ending character could not end up in $tm anyway. :)
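A small sketch to check the split argument (first_two_fields is a hypothetical name). It shows what split ' ' does with a line that still ends in "\r\n": the CR counts as whitespace, so it acts as a separator rather than landing in either field.

```perl
use strict;
use warnings;

# split with the special pattern ' ' splits on runs of whitespace (\s+)
# and ignores leading whitespace. Because \s also matches "\r", a CR at
# the end of the line is treated as a separator and never ends up in $tm.
sub first_two_fields {
    my ($line) = @_;
    my ($tm, $val) = split ' ', $line;
    return ($tm, $val);
}

my ($tm, $val) = first_two_fields("fooL 42\r\n");
print "tm=[$tm] val=[$val]\n";   # tm=[fooL] val=[42]
```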

      neversaint, if you always have to remove the last character of $tm, maybe you should just chop $tm rather than fuss with removing the char via s///.
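A sketch of the chop suggestion (drop_last_char is a hypothetical wrapper that works on a copy). It assumes, as the suggestion does, that the unwanted suffix is always exactly one character, whatever that character happens to be.

```perl
use strict;
use warnings;

# chop removes the last character of a string in place and returns it,
# without caring WHAT that character is (unlike s/L//).
sub drop_last_char {
    my ($s) = @_;      # work on a copy; the caller's string is untouched
    chop $s;
    return $s;
}

for my $tm ('fooL', "foo\x{AB}") {
    printf "last char 0x%02X, after chop [%s]\n",
        ord(substr($tm, -1)), drop_last_char($tm);
}
```

Note that chop removes a character even when nothing odd is there, so it only fits if the suffix is guaranteed to be present.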

      Update: wording change in first paragraph

Re: Regex for replacing hidden character: « and L
by graff (Chancellor) on Mar 18, 2006 at 22:56 UTC
    Based on your snippet (as correctly pointed out in an earlier reply), it looks like line-termination characters have nothing at all to do with your problem (I'm guessing the first two replies were thrown off by your reference to "line endings"; d'oh!).

    To figure out what is going on, you might try something like this on your various linux boxes:

    ./some_executable.out -f some_param | od -txC -a
    (Look up the man page for "od" to see if other options might make its output more useful to you.)
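If it is easier to look from inside the Perl script, something like the following prints each character of $tm as hex, roughly what od -tx1 shows for the whole output. The helper name hex_dump is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: show every character of a string as a hex code,
# so invisible or odd characters become visible.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $s;
}

print hex_dump("foo\x{AB}"), "\n";     # 66 6f 6f ab
print hex_dump("foo\xC2\xAB"), "\n";   # 66 6f 6f c2 ab
```

The second line shows what '«' would look like if it arrived as UTF-8 bytes rather than a single 0xAB.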

    Is the executable copied as a binary file from one linux box to the other, or is it compiled from source code on each box? If the latter, is it compiled with the same configuration each time, or do things like libraries, locale, etc, differ from one box to the next?

    Whatever it is, that executable is printing one or more lines of output, your script is reading each line as an array element, you are removing the line-feed from each line, and then assigning the first two space-separated tokens on the line to $tm and $val.
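A sketch of that flow as a hypothetical read_pairs helper. It swaps the backticks for the list form of a piped open, which reads the output line by line and passes arguments without going through a shell. This assumes a Unix-like system, which matches the Linux boxes in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command (list form, no shell), read its
# output line by line, chomp, and keep the first two whitespace-separated
# fields of each line as a key/value pair.
sub read_pairs {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    my %nhash;
    while (my $line = <$fh>) {
        chomp $line;                       # removes the trailing "\n"
        my ($tm, $val) = split ' ', $line;
        next unless defined $val;          # skip blank or one-field lines
        $nhash{$tm} = $val;
    }
    close($fh) or warn "$cmd[0] exited with status $?\n";
    return %nhash;
}

# Demo: the running perl ($^X) stands in for ./some_executable.out
my %h = read_pairs($^X, '-e', 'print "fooL 1\nbar 2\n"');
print "$_ => $h{$_}\n" for sort keys %h;   # bar => 2, then fooL => 1
```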

    And you're finding some goofy character at the end of that first token ($tm), which you'd like to remove no matter what it may be; chop $tm should work fine for that (as pointed out by virtualsue above).

    But you'll want to look at it carefully, in detail. If the executable is returning some sort of multi-byte wide character, you might need something like "od" to see all the bytes. Come back with more detail if you get stumped, but hopefully, "od" will make things clearer for you.
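If it does turn out to be multi-byte, here is a sketch of why chop alone may not be enough. It assumes (hypothetically) that the executable writes UTF-8, in which '«' (U+00AB) is the two bytes 0xC2 0xAB. The helpers chop_raw and chop_decoded are illustrative names; decode comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# chop on raw bytes removes only the last byte (0xAB), leaving 0xC2.
sub chop_raw {
    my ($s) = @_;
    chop $s;
    return $s;
}

# Decoding first turns 0xC2 0xAB back into the single character U+00AB,
# so chop removes the whole '«'.
sub chop_decoded {
    my ($s) = @_;
    $s = decode('UTF-8', $s);
    chop $s;
    return $s;
}

my $raw = "foo\xC2\xAB";                                        # 5 bytes
printf "raw chop leaves %d bytes\n", length(chop_raw($raw));   # 4
printf "decoded chop leaves [%s]\n", chop_decoded($raw);       # [foo]
```

When reading from a filehandle, calling binmode($fh, ':encoding(UTF-8)') after opening it decodes every line the same way.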

    (Updated to clarify that I wasn't implying any misunderstanding on virtualsue's part.)

Node Type: perlquestion [id://537639]
Approved by idsfa