This is my answer to Ovid's "Perl is C": my entry to the 4th Obfuscated Perl Contest.
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <fcntl.h>
#define open(a,b) open(b,a)
#define $ARGV argv
#define $i i
#define x : /* aren't four #define way too much?
unshift @ARGV, $_ = $ARGV[0]; "*/
main(int argc, char *argv[]) { // "; {
int m=1, i[14]; char * pp; int p=-1;
int q, F=3; char * qq = "Hello\, world!\n";
i[12]=537463307; i[13]=3085; //,; $_ = "" if(length!=2);
if(m+-p?(argc>1&&!strcmp(argv[1],"-p"))?p+i? 1 : 1 x 0 x 0) {
printf(qq/*\bThe Perl Journal\n/#*/
); exit(0); }
qq="=;#"; argv[0][0]='\0'; memset(i,0,48);
$i[10]=($i[11]=(q/*\b/&&scalar@ARGV))-1;#*/=0) + argc)-1;
do{
if($i[11]<2) { $i[10]=1; q/*/&&*F=*STDIN;#*/=F=0;
} else { open(O_RDONLY, $ARGV[$i[11]-$i[10]]);//; *F=*O_RDONLY;
}
while(read(F, $i, 1)>0) {
++$i[4]^(q=/*.=,$_=$i);#*/0); pp=i;
$i[3]+=m=( *pp^0x0A)?/*\n=;#*/0:1; for(qq=&i[12];*qq;*pp^*qq++||(q=1));
if(m=/*[ \n\f\r\xB]=#*/q
) { if($i[1]){$i[$i[1]]++; $i[1]=0; }} else { $i[1]=2;}
}
if($i[1]){$i[$i[1]]++;};
printf("%7d %7d %7d %s\n",$i[3],$i[2],$i[4],$ARGV[$i[11]-$i[10]]);
close(F);
if($i[11]>2){for($i[1]=2;$i[$i[1]+4]+=$i[$i[1]];$i[1]++){$i[$i[1]]=0;};$i[1]=0;}
} while(--$i[10]);
if($i[11]>2) { printf("%7d %7d %7d total\n",$i[7],$i[6],$i[8]); }
}
You can name it wc.c and either compile it as a C program or run it as a
Perl program. The name says it all.
Be careful: one line contains an embedded tab character.
(Ovid - Spoiler) Re: C is Perl
by Ovid (Cardinal) on Dec 23, 2001 at 01:24 UTC
<Samuel Jackson Voice>
Allow me to retort!
</Samuel Jackson Voice>
This code is brilliant++ and I am humbled. Since this is listed as a
follow-up to my post, I suppose it's only fitting that I post the spoiler :)
To understand what the program is supposed to do, try man wc.
Note: to write this spoiler, I used this snippet:
perl -MO=Deparse wc.c > deparse.pl
When figuring out the code, I cleaned up the formatting and removed the "useless" code, so your output will be different from mine.
To get the C code, I used this:
gcc wc.c -E > wc.txt
# note the comma. This will unshift the first command line
# argument onto @ARGV and set $_
unshift @ARGV, $_ = $ARGV[0];
# clear $_ if its length is not 2
$_ = '' if length $_ != 2;
# if the first argument is -p or -P, print The Perl Journal
if (/-p/i)
{
# \cH is a backspace
printf "*\cHThe Perl Journal\n";
exit 0;
}
# set $i[11] to the length of @ARGV (remember, this is one greater
# than the number of arguments) and set $i[10] to the original length
# (number of files on the command line)
$i[10] = ($i[11] = scalar @ARGV) - 1;
# check the while{} at the end
# this will execute once for each file
do
{
# if we had no command line arguments, we need to set $i[10] to 1 to ensure
# the loop will exit, and we need to read our input from STDIN.
if ($i[11] < 2)
{
$i[10] = 1;
*F = *STDIN;
}
# if we had arguments, we want to open each file in turn. Note that O_RDONLY
# is a filehandle, not a constant. Also note that with the while at the end
# of this loop, $i[10] is being decremented, so we can loop through the
# files this way.
else
{
open O_RDONLY, $ARGV[$i[11] - $i[10]];
*F = *O_RDONLY;
}
# read one byte at a time from the file, until EOF.
while (read(F, $i, 1) > 0)
{
# increment $i[4] by one (thus, this will be the file size)
++$i[4];
# set $_ to whatever byte we read
$_ = $i;
# if $_ is a newline, then the match will return 1, thus making $i[3]
# the number of lines in the file. ( *pp^0x0A) is superfluous in the
# Perl program, but it is used in the C program. My C is pretty rusty,
# but here goes:
#     i[3] += m = ( *pp^0x0A ) ? 0 : 1;
#     pp has been set to i, so *pp is the last character read. Here, we
#     do an XOR with 0x0A (a newline character). If any bits are set, we
#     know it's not a newline, so m is set to 0, else m is set to 1.
#     i[3] is incremented by m.
# The /* in the regex is left over from the C comment in the obfu:
#     $i[3]+=m=( *pp^0x0A)?/*\n=;#*/0:1;
# To Perl it simply means "zero or more slashes", so it does no harm.
$i[3] += m[( *pp^0x0A)?/*\n];
#-------------------------------------
# The following section is rather confusing. It is a word count. It works
# by setting $i[1] to a true value when it encounters a non-whitespace
# character and then incrementing $i[2] by one when it next encounters a
# whitespace character (whitespace as defined by the character class in
# the match).
# Again, we see the /* as an artifact from the original file:
#     if(m=/*[ \n\f\r\xB]=#*/q
# If we match a space, newline, formfeed, carriage return, or vertical
# tab (\xB is 0x0B, not Ctrl-B):
# I believe this is where the embedded tab should be, but on my system,
# it was transformed to a space.
if (m[/*[ \n\f\r\xB]])
{
# if we've set $i[1], then we want to increment $i[2] by one
# and reset $i[1] to 0 (false)
if ($i[1])
{
++$i[$i[1]];
$i[1] = 0;
}
}
# if we didn't match, set $i[1] to 2 (which is the index of the array
# element we wish to increment for counting words). For the most part,
# this means, "set this variable if we have a non-whitespace character"
else
{
$i[1] = 2;
}
}
#-------------------------------------
# if we got this far and $i[1] is true (it will be set to 2), then we have
# an extra word that we didn't account for, so we add 1 to the word count
if ($i[1])
{
++$i[$i[1]];
}
# print number of lines, word count, file size, and the name of the file
printf "%7d %7d %7d %s\n", $i[3], $i[2], $i[4], $ARGV[$i[11] - $i[10]];
close F;
# if we had more than one argument, we need to total the results
if ($i[11] > 2)
{
# This adds @i[2..4] into the running totals in @i[6..8] and then
# resets @i[2..4] to zero. When we get to $i[5], it's never been set
# and is evaluated as zero (and so is $i[9]). This causes the entire
# expression '$i[$i[1] + 4] += $i[$i[1]]' to return a zero, evaluating
# as false and thus terminating the loop.
for ($i[1] = 2; $i[$i[1] + 4] += $i[$i[1]]; ++$i[1])
{
$i[$i[1]] = 0;
}
$i[1] = 0;
}
} while --$i[10];
# if we had more than one argument, we need to print the results
if ($i[11] > 2)
{
printf "%7d %7d %7d total\n", $i[7], $i[6], $i[8];
}
I'm not going to break it down, but if you know any C, this should be
relatively easy to follow by tracing through the above code. The logic
is the same (though there are a few parts that I don't get).
One difference is that when you pass it -p as the first argument, it
prints "Hello, world!\n" instead of "The Perl Journal".
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <fcntl.h>
main(int argc, char *argv[])
{
int m=1, i[14];
char * pp;
int p=-1;
int q, F=3;
char * qq = "Hello\, world!\n";
/* red herring? */
i[12]=537463307; i[13]=3085;
if (m+-p?(argc>1&&!strcmp(argv[1],"-p"))?p+i? 1 : 1 : 0 : 0)
{
printf(qq); exit(0);
}
qq="=;#"; argv[0][0]='\0';
memset(i,0,48);
i[10]=(i[11]=(q =0) + argc)-1;
do{
if(i[11]<2)
{
i[10]=1; q =F=0;
}
else
{
open( argv[i[11] - i[10]], 0 ) ;
}
while(read(F, i , 1)>0)
{
++i[4]^(q=0);
pp=i;
i[3] += m=(*pp^0x0A)? 0:1;
for(qq=&i[12];*qq;*pp^*qq++||(q=1));
if(m=q)
{
if(i[1])
{
i[i[1]]++;
i[1]=0;
}
}
else
{
i[1]=2;
}
}
if(i[1])
{
i[i[1]]++;
};
printf("%7d %7d %7d %s\n", i[3], i[2], i[4], argv[i[11]-i[10]]);
close(F);
if(i [11]>2)
{
for( i[1]=2;i[i[1]+4]+=i[i[1]];i[1]++)
{
i[i[1]]=0;
};i [1]=0;
}
} while(-- i [10]);
if(i[11]>2)
{
printf("%7d %7d %7d total\n",i [7],i [6],i [8]);
}
}
Cheers,
Ovid
To complete Ovid's very good explanation, I must say
he is absolutely right about the embedded tab. In fact, it
was put at the 15th column, so that it looks like a space.
Here is a line-by-line explanation, which is an excerpt
from the SOLUTION file I sent along with the program:
Most of the counting information is held in @i, alias int i[] in C.
Here's what it holds:
- 0: current char to test, and some other junk
- 1: 2 if in word, 0 otherwise
- 2: number of words in the current file
- 3: number of lines in the current file
- 4: number of chars in the current file
- 5: 0, so that the "add to the total" for loop can end
- 6: total number of words
- 7: total number of lines
- 8: total number of chars
- 9: 0, same as i[5]
- 10: number of files left to parse
- 11: number of command-line parameters (C's argc)
- 12: 4 tokens disguised as an int (537463307 is 0x20090A0B: SPACE, TAB, LF, VT)
- 13: 2 tokens (and two '\0') disguised as an int (3085 is 0x00000C0D: NUL, NUL, FF, CR)
- 13 (alternative value): the same 2 tokens (and two '\0') disguised as an int (202178560 is 0x0C0D0000: FF, CR, NUL, NUL)
Note: i[13] can have two different values, depending on your hardware
(specifically, whether it is big- or little-endian). The included wc.c
file uses 3085, which is meant for little-endian x86 CPUs.
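To see why endianness matters here, a small C sketch (not part of wc.c; the helper names are hypothetical) can copy out the bytes of those two ints in the order they sit in memory. Like wc.c's memset(i,0,48), it assumes 4-byte ints.

```c
#include <assert.h>
#include <string.h>

/* Copy the bytes of the two "disguised" ints into out[0..7] in memory
   (address) order.  Assumes 4-byte ints, as wc.c's memset(i,0,48) does. */
void delimiter_bytes(int i12, int i13, unsigned char out[8])
{
    assert(sizeof(int) == 4);
    memcpy(out, &i12, 4);
    memcpy(out + 4, &i13, 4);
}

/* Returns 1 if the least significant byte of an int is stored first. */
int is_little_endian(void)
{
    unsigned int one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 1;
}
```

On a little-endian CPU such as x86, delimiter_bytes(537463307, 3085, b) yields VT, LF, TAB, SPACE, CR, FF, NUL, NUL: six delimiters, then the NUL that stops wc.c's scan. On a big-endian CPU the bytes of 3085 come out as NUL, NUL, FF, CR, so the scan stops before FF and CR, which is why 202178560 is needed there.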
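The first four lines are #include directives, which Perl sees as comments
(and which are useless to it).
The following #defines are ignored by Perl too, but a C comment starts
at the end of the last one, allowing Perl to start doing something: in
this case, we put an extra element at the front of @ARGV (a copy of the
first argument, which also lands in $_), so that it's the same size as
argv. We also hide away the C declaration of main() in a string.
On line 10, the // (a comment for C, but still inside the string for Perl)
hides the closing quote, and an opening { meant for Perl, from the C
compiler. This lets us land back on our feet and start the real
initialisation.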
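The first #define works because the C preprocessor does not expand a macro again inside its own replacement, so open(b,a) ends up calling the real open() with the arguments swapped into C order (in Perl, O_RDONLY is the filehandle; in C, it is the flag). A minimal sketch of the same trick, using a hypothetical function fake_open() in place of open():

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char last_call[64];   /* records what the real function received */

/* Hypothetical stand-in for open(): takes (path, flags) in C order. */
int fake_open(const char *path, int flags)
{
    assert(path != NULL);
    snprintf(last_call, sizeof last_call, "%s,%d", path, flags);
    return flags;
}

/* The wc.c trick: from here on, callers use Perl's order (flags first,
   name second).  fake_open is not re-expanded inside its own replacement,
   so fake_open(b, a) is a call to the real function above. */
#define fake_open(a, b) fake_open(b, a)

int call_in_perl_order(void)
{
    return fake_open(0, "wc.c");   /* becomes fake_open("wc.c", 0) */
}
```

The macro has to be defined after the function, otherwise it would also rewrite the function's own definition.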
On line 11, some C variables are declared, while Perl performs a useless
match and feeds it to int(), all in a void context.
On lines 12 and 13 the same trick is used, with Perl's q (single-quote)
operator this time. By the end of line 13 we make sure $_ holds the first
command-line parameter only if it is exactly two characters long (to test
for equality with -p); otherwise it is emptied. Then, on line 14, we check
that equality, with a little bit of ???:::... Here is a commented
version:
if(m+-p?(argc>1&&!strcmp(argv[1],"-p"))?p+i? 1 : 1 x 0 x 0) {
Perl: m+-p?(argc>1&&!strcmp(argv[1],"-p"))?p+i is a match delimited by +
      (with the /i flag) that matches -p, thanks to the ? quantifiers
      that make everything in between optional.
      The alternatives for ?: are 1 or 1 x 0 x 0, that is to say 1 and "".
C:    m+-p equals 2, so it's always true.
      argc>1&&!strcmp(argv[1],"-p") is true if the first param is -p.
      We don't care about the value of p+i (but it's probably true).
      x being replaced by : by the preprocessor, we check for
      T?something?T?1:1:0:0 which, depending on 'something', equals the
      third value if 'something' is false, and either the 1st or 2nd
      value if 'something' is true.
To sum up, if the first parameter is -p, we print either "Hello, world!"
(in C) or "The Perl Journal" (in Perl), and then exit.
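The C half of that test can be checked in isolation. The sketch below (the function name wants_hello is hypothetical) keeps the #define x : trick and returns what the if tests. One deliberate change: wc.c tests p+i in the innermost position, which forms a pointer just before the array; since both of its branches are 1 anyway, the sketch tests p instead.

```c
#include <assert.h>
#include <string.h>

#define x :   /* Perl reads x as string repetition; C sees the ':' of ?: */

/* C reading of wc.c's line 14.  m+-p is 1 - (-1) = 2, so the outer
   test is always true and only the strcmp() decides the result. */
int wants_hello(int argc, char *argv[])
{
    int m = 1, p = -1;

    assert(argc >= 1);
    return m+-p?(argc>1&&!strcmp(argv[1],"-p"))?p? 1 : 1 x 0 x 0;
}

#undef x
```

Unlike the Perl side, which matches case-insensitively, the C side only reacts to a lowercase -p.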
While defining qq as the original C string, we hide the 6 chars
delimiting words in i[12] and i[13]... By the way, if the C version of
the program doesn't count words correctly for you, you might have to
change i[13] from 3085 to 202178560... It depends on your CPU's
endianness.
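One way to avoid editing the constant by hand, offered only as a sketch and not what wc.c does, is to copy the delimiter bytes into the two ints directly, so their order in memory no longer depends on the CPU. The function name load_delimiters is hypothetical, and 4-byte ints are assumed as in wc.c.

```c
#include <assert.h>
#include <string.h>

/* Fills words[0] and words[1] (the roles of i[12] and i[13]) with the six
   word delimiters and two NUL terminators, byte by byte, so the layout in
   memory is the same on big- and little-endian CPUs. */
void load_delimiters(int words[2])
{
    static const char delims[8] = { ' ', '\t', '\n', '\v', '\f', '\r', '\0', '\0' };

    assert(sizeof(int) == 4);
    memcpy(words, delims, sizeof delims);
}
```

It gives up the disguise, of course: the delimiters become plainly visible in the source.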
Line 18, we store the total number of command line arguments (C meaning),
and the same number minus one, which is the total number of files to check.
Lines 19-33, the do-while loop processes each file. If there is no file,
line 20 takes care of it and prepares STDIN to be read (F is made equal
to 0, or *STDIN, depending on your view). NB: no file means argc==1,
i.e. i[11]<2.
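To make the index arithmetic concrete, here is a sketch that replays only the loop bookkeeping and records which argv slot each pass reads; it ignores the counting entirely, and the function name file_order is hypothetical.

```c
#include <assert.h>

/* Replays only the bookkeeping of wc.c's file loop (lines 18-33):
   i[11] = argc, i[10] = argc - 1, and each pass reads argv[i[11] - i[10]]
   until --i[10] reaches 0.  With no file arguments (argc == 1), i[10] is
   forced to 1, so the body runs once, on argv[0] (emptied in wc.c) with
   stdin as input.  Stores the visited argv indices in order[] (which must
   have room for argc entries) and returns how many there were. */
int file_order(int argc, int order[])
{
    int i10, i11, n = 0;

    assert(argc >= 1);
    i10 = (i11 = argc) - 1;
    do {
        if (i11 < 2)
            i10 = 1;
        order[n++] = i11 - i10;
    } while (--i10);
    return n;
}
```

With three file names (argc == 4) the passes read argv[1], argv[2] and argv[3], in that order.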
Line 21, we open the current file (thanks to the open C macro, which
swaps the arguments into C order). Its file descriptor is none other
than 3. At least, it behaved so on the various machines I tried.
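The descriptor being 3 is not luck: POSIX requires open() to return the lowest-numbered descriptor not currently open, and 0, 1 and 2 are normally taken by stdin, stdout and stderr. A sketch that checks the rule directly, assuming a POSIX system with /dev/null (the function name is hypothetical):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open /dev/null twice, close the first descriptor, and open again.
   Because open() returns the lowest free descriptor, the third call
   gets back the number the first one released.  Returns 1 if so. */
int lowest_fd_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    int c;

    assert(a >= 0 && b > a);
    close(a);
    c = open("/dev/null", O_RDONLY);
    close(b);
    close(c);
    return c == a;
}
```

If a program closed stdin before this point, the same rule would hand out descriptor 0 instead, and wc.c's hard-coded F=3 would read the wrong thing.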
Line 23, our while loop reads the file byte by byte (maybe I could improve
performance here, by using a bigger buffer...)
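The bigger-buffer idea could look like the sketch below: one read() fills up to BUFSIZ bytes, and the loop then walks through the buffer instead of making one system call per byte. The helper count_bytes_fd is hypothetical and, to stay short, only counts bytes and newlines.

```c
#include <assert.h>
#include <stdio.h>      /* BUFSIZ */
#include <unistd.h>     /* read, ssize_t */

/* Count bytes and newlines on fd, reading a buffer at a time.
   Returns 0 on success, -1 if read() fails. */
int count_bytes_fd(int fd, long *bytes, long *lines)
{
    char buf[BUFSIZ];
    ssize_t n;

    assert(bytes != NULL && lines != NULL);
    *bytes = *lines = 0;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        *bytes += n;
        for (ssize_t k = 0; k < n; k++)
            if (buf[k] == '\n')
                ++*lines;
    }
    return n < 0 ? -1 : 0;
}
```

A fuller version would also run the word logic over each buffer and retry when read() is interrupted by a signal.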
Line 24, one more byte is counted. Plus, $_ is made equal to the current
char, and the C pointer pp is made to point at it.
Then, line 25, we check for a newline within the Perl match (in C, we
test for equality with an XOR that yields zero, while in Perl we use the
match, which does what we want because the string is only one byte long
anyway...).
At the end of this line, the C for loop tests our current char against
the values stored in i[12] and i[13] (that's where endianness comes into
play!). If it matches one of them, we raise the q flag.
Line 26, we check our byte in Perl, with a regex. In C, we check the
flag.
So, line 27: if the byte is a delimiter and we were in a word, we
increment i[2] (that's the value of i[1]) and lower the "in word" flag
(i[1]). Otherwise, we raise the "in word" flag.
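The per-byte logic of lines 24-29 can be sketched as a pure function over a buffer. This sketch is hypothetical in its names, and it swaps the packed ints for a plain strchr() over the same six delimiters; the in_word variable plays the role of i[1].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct counts { long lines, words, chars; };

/* Same algorithm as wc.c: count chars and newlines, and count a word each
   time a delimiter ends a run of non-delimiters, plus one more at the end
   if the input ended inside a word. */
struct counts count_buffer(const char *buf, size_t n)
{
    struct counts c = { 0, 0, 0 };
    int in_word = 0;                     /* plays the role of i[1] */

    assert(buf != NULL || n == 0);
    for (size_t k = 0; k < n; k++) {
        char ch = buf[k];
        c.chars++;
        if (ch == '\n')
            c.lines++;
        /* strchr() would also "find" the terminating NUL, so exclude it */
        if (ch != '\0' && strchr(" \t\n\v\f\r", ch)) {
            if (in_word) {
                c.words++;
                in_word = 0;
            }
        } else {
            in_word = 1;
        }
    }
    if (in_word)                         /* the last word, as on line 29 */
        c.words++;
    return c;
}
```

For "Hello, world!\n" it reports 1 line, 2 words and 14 chars, matching what wc.c prints for the same input.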
When the file is entirely read, line 29, we check and count the last word,
if any. Line 30, we print the results (if our file was stdin, we made sure
its name was "" on line 17...).
Then we close the file.
Line 32, the counts are added to the grand total. Thanks to i[5]
and i[9],
our for loop ends when we want.
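Written out, the totalling loop on line 32 is a sentinel loop whose condition is the freshly updated running total. A literal transcription as a sketch (the function name add_to_totals is hypothetical):

```c
#include <assert.h>

/* Literal transcription of wc.c's line 32:
     for(i[1]=2; i[i[1]+4]+=i[i[1]]; i[1]++) i[i[1]]=0;  i[1]=0;
   Adds i[2..4] (words, lines, chars) into i[6..8] and clears them.
   i[5] and i[9] are always 0, so once i[1] reaches 5 the test
   i[9] += i[5] yields 0 and the loop ends. */
void add_to_totals(int i[14])
{
    assert(i[5] == 0 && i[9] == 0);
    for (i[1] = 2; i[i[1] + 4] += i[i[1]]; i[1]++)
        i[i[1]] = 0;
    i[1] = 0;
}
```

Since the condition is the total itself, the loop also stops whenever a running total is still 0. For example, while no words have been seen yet, it ends at i[6], so that file's line and char counts are neither added to the totals nor cleared.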
Line 34, if we had more than one file, we print the grand total.
OK, there are some differences in behavior between these progs and the
original GNU wc... Mainly error handling when one of the files doesn't
exist.
Re: C is Perl
by stefp (Vicar) on Dec 28, 2001 at 00:35 UTC
A multilanguage program deserves a multilingual comment.
First you stumble onto the C, then onto the Perl, only to land back on the C. But there is a missing PostScript. :)
-- stefp