Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

failing to use getdents system call on Linux

by glasswalk3r (Friar)
on Nov 02, 2016 at 13:56 UTC ( [id://1175138]=perlquestion: print w/replies, xml ) Need Help??

glasswalk3r has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

Some time ago I wrote using Linux getdents syscall and after some help I was able to get a system call to the getdents to work on Linux (worth to mention that the trick is restrict to files over NFS).

Everything was working fine until I recently tried again to use it: binary data returned is not following anymore the expected pattern (using unpack for that) and I don't know why. The pattern was created by following the getdents documentation and I'm running in two problems:

  1. I'm not getting the dentry name (most important thing)
  2. I'm not getting the dirent related length, so I'm running in a infinite loop

I'm not sure if it is a issue with h2ph usage (I just went over /usr/include and got everything from there) or with the interpreter itself.

Trying to debug, I went over the perl code and C (working) code and forced both to write to binary files (open with :raw in Perl and fopen/fwrite in C) and compared the output of hexdump from both:

From C:

$ head hardlist.txt 00000000 42 ed 09 00 00 00 00 00 01 00 00 00 00 00 00 00 |B........ +.......| 00000010 18 00 2e 00 00 00 00 04 e2 6c 08 00 00 00 00 00 |......... +l......| 00000020 0d 8e 0e 00 00 00 00 00 18 00 2e 2e 00 00 00 04 |......... +.......| 00000030 cb ed 09 00 00 00 00 00 a1 07 1e 00 00 00 00 00 |......... +.......| 00000040 e2 6c 08 00 00 00 00 00 0d 8e 0e 00 00 00 00 00 |.l....... +.......| 00000050 18 00 2e 2e 00 00 00 04 cb ed 09 00 00 00 00 00 |......... +.......| 00000060 a1 07 1e 00 00 00 00 00 20 00 78 61 61 61 61 61 |........ +.xaaaaa| 00000070 61 61 61 66 66 00 00 08 54 f0 09 00 00 00 00 00 |aaaff...T +.......| 00000080 cb ed 09 00 00 00 00 00 a1 07 1e 00 00 00 00 00 |......... +.......| 00000090 20 00 78 61 61 61 61 61 61 61 61 66 66 00 00 08 | .xaaaaaa +aaff...|

From Perl

head perldata.txt 00000000 42 ed 09 00 00 00 00 00 01 00 00 00 00 00 00 00 |B........ +.......| 00000010 18 00 2e 00 00 00 00 04 e2 6c 08 00 00 00 00 00 |......... +l......| 00000020 0d 8e 0e 00 00 00 00 00 18 00 2e 2e 00 00 00 04 |......... +.......| 00000030 cb ed 09 00 00 00 00 00 a1 07 1e 00 00 00 00 00 |......... +.......| 00000040 20 00 78 61 61 61 61 61 61 61 61 66 66 00 00 08 | .xaaaaaa +aaff...| 00000050 54 f0 09 00 00 00 00 00 f3 a6 28 00 00 00 00 00 |T........ +.(.....| 00000060 20 00 78 61 61 61 61 61 61 61 62 64 75 00 00 08 | .xaaaaaa +abdu...| 00000070 f7 f0 09 00 00 00 00 00 c8 13 67 00 00 00 00 00 |......... +.g.....| 00000080 20 00 78 61 61 61 61 61 61 61 62 6a 77 00 00 08 | .xaaaaaa +abjw...| 00000090 c5 ee 09 00 00 00 00 00 6e 5c 6d 00 00 00 00 00 |........n +\m.....|

The filenames in both cases are xaaaaaaaaff, so the filenames as coming in both cases, but not coming in the expected sequence in the Perl code. This is a sample of files under the directory:

[me@localhost ~]$ ls -l sample | head | cut -d ' ' -f 10 xaaaaaaaaaa xaaaaaaaaab xaaaaaaaaac xaaaaaaaaad xaaaaaaaaae xaaaaaaaaaf xaaaaaaaaag xaaaaaaaaah xaaaaaaaaai

I can see the same results of hexdump from the Perl debugger (using x command or Data::HexDump) on the $buf scalar, so my guess I'm missing something over there.

I tried running those tests both with the standard perl shipped with the Linux distribution and with v5.24.0 compiled with perlbrew. Results are the same.

Here is both the C and Perl code used for testing:

C code:

#define _GNU_SOURCE #include <dirent.h> /* Defines DT_* constants */ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <sys/stat.h> #include <sys/syscall.h> #define handle_error(msg) \ do { perror(msg); exit(EXIT_FAILURE); } while (0) struct linux_dirent { long d_ino; off_t d_off; unsigned short d_reclen; char d_name[]; }; #define BUF_SIZE 1024*1024*5 int main(int argc, char *argv[]) { int fd, nread; char buf[BUF_SIZE]; struct linux_dirent *d; int bpos; char d_type; FILE *fn; fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY); fn = fopen("/home/alcjunio/hardlist.output", "w"); if (fd == -1) handle_error("open"); for ( ; ; ) { nread = syscall(SYS_getdents, fd, buf, BUF_SIZE); if (nread == -1) handle_error("getdents"); if (nread == 0) break; for (bpos = 0; bpos < nread;) { d = (struct linux_dirent *) (buf + bpos); if (d->d_ino != 0) { printf("%s\n", (char *) d->d_name); fwrite(d,sizeof(d),sizeof(d),fn); } bpos += d->d_reclen; } } fclose(fn); exit(EXIT_SUCCESS); }

Perl code:

#!/usr/bin/env perl use warnings; use strict; use File::Spec; use Getopt::Std; use Fcntl; use constant BUF_SIZE => 4096; require 'syscall.ph'; my %opts; getopts( 'd:', \%opts ); die 'option -d <DIRECTORY> is required' unless ( ( exists( $opts{d} ) ) and ( defined( $opts{d} ) ) ); sysopen( my $fd, $opts{d}, O_RDONLY | O_DIRECTORY ); open(my $out,'>:raw','perldata.bin') or die $!; while (1) { my $buf = "\0" x BUF_SIZE; my $read = syscall( &SYS_getdents, fileno($fd), $buf, BUF_SIZE ); if ( ( $read == -1 ) and ( $! != 0 ) ) { die "failed to syscal getdents: $!"; } print $out($buf); close($out); # last; # from this point forward, this will enter in a infinite loop beca +use $len is being set with 0 last if ( $read == 0 ); while ( $read != 0 ) { my ( $ino, $off, $len, $name ) = unpack( "LLSZ*", $buf ); unless ( ( $name eq '.' ) or ( $name eq '..' ) ) { my $path = File::Spec->catfile( $opts{d}, $name ); print $path,"\n"; } substr( $buf, 0, $len ) = ''; $read -= $len; } }

And here is the output from strace by running both programs (only after opening the directory:

From C program:

open("/home/alcjunio/hardlist.output", O_WRONLY|O_CREAT|O_TRUNC, 0666) + = 4 getdents(3, /* 1002 entries */, 5242880) = 32048 fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffd778f920) = -1 ENOTTY (I +nappropriate ioctl for device) mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, +0) = 0x7f696e37b000 fstat(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, +0) = 0x7f696e37a000 write(4, "B\355\t\0\0\0\0\0\1\0\0\0\0\0\0\0\30\0.\0\0\0\0\4\342l\10\0\ +0\0\0\0"..., 4096) = 4096 write(4, "\1\357\t\0\0\0\0\0\261\207\223\7\0\0\0\0 \0xaaaaaaaaqw\0\0\1 +0"..., 4096) = 4096 write(4, "\334\357\t\0\0\0\0\0\211\234s\20\0\0\0\0 \0xaaaaaaaaze\0\0\1 +0"..., 4096) = 4096 write(4, "\222\356\t\0\0\0\0\0\224\22\364\27\0\0\0\0 \0xaaaaaaaamt\0\0 +\10"..., 4096) = 4096 write(4, "\4\356\t\0\0\0\0\0\262\203\307\37\0\0\0\0 \0xaaaaaaaahk\0\0\ +10"..., 4096) = 4096 write(1, ".\n..\nxaaaaaaaaff\nxaaaaaaabdu\nxaa"..., 4096) = 4096 write(4, "\230\355\t\0\0\0\0\0>\327\16)\0\0\0\0 \0xaaaaaaaadg\0\0\10". +.., 4096) = 4096 write(4, ",\356\t\0\0\0\0\0\271\343\0050\0\0\0\0 \0xaaaaaaaaix\0\0\10" +..., 4096) = 4096 write(4, "W\356\t\0\0\0\0\0002\247p8\0\0\0\0 \0xaaaaaaaako\0\0\10"..., + 4096) = 4096 write(4, "\203\356\t\0\0\0\0\0\4!\274?\0\0\0\0 \0xaaaaaaaame\0\0\10".. +., 4096) = 4096 write(4, "\325\357\t\0\0\0\0\0\"\216\34G\0\0\0\0 \0xaaaaaaaayx\0\0\10" +..., 4096) = 4096 write(1, "\nxaaaaaaabhm\nxaaaaaaaalr\nxaaaaaa"..., 4096) = 4096 write(4, "\245\356\t\0\0\0\0\0\341P#O\0\0\0\0 \0xaaaaaaaanm\0\0\10"... +, 4096) = 4096 write(4, "I\357\t\0\0\0\0\0\374]\206V\0\0\0\0 \0xaaaaaaaatq\0\0\10"... +, 4096) = 4096 write(4, "\346\357\t\0\0\0\0\0\232\351\226`\0\0\0\0 \0xaaaaaaaazo\0\0\ +10"..., 4096) = 4096 write(4, "r\360\t\0\0\0\0\0%CTi\0\0\0\0 \0xaaaaaaabew\0\0\10"..., 4096 +) = 4096 write(4, "t\360\t\0\0\0\0\0007\222^q\0\0\0\0 \0xaaaaaaabey\0\0\10"..., + 4096) = 4096 getdents(3, /* 0 entries */, 5242880) = 0 write(4, "\235\356\t\0\0\0\0\0\316H\265z\0\0\0\0 \0xaaaaaaaane\0\0\10" +..., 2688) = 2688 close(4) = 0 munmap(0x7f696e37a000, 4096) = 0 write(1, "aaaaaama\nxaaaaaaaado\nxaaaaaaaajw"..., 3813) = 3813 exit_group(0) = ?

From Perl program:

open("sample/", O_RDONLY|O_DIRECTORY) = 3 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff56bc6bf0) = -1 ENOTTY (I +nappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 fstat(3, {st_mode=S_IFDIR|0775, st_size=36864, ...}) = 0 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 open("perldata.bin", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4 ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff56bc6b50) = -1 ENOTTY (I +nappropriate ioctl for device) lseek(4, 0, SEEK_CUR) = 0 fstat(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 getdents(3, /* 128 entries */, 4096) = 4080 write(4, "B\355\t\0\0\0\0\0\1\0\0\0\0\0\0\0\30\0.\0\0\0\0\4\342l\10\0\ +0\0\0\0"..., 4096) = 4096 close(4) = 0 close(3) = 0 rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGINT, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGQUIT, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGILL, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGTRAP, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGABRT, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGBUS, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGFPE, NULL, {0x1, [FPE], SA_RESTORER|SA_RESTART, 0x3cb8 +830030}, 8) = 0 rt_sigaction(SIGKILL, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGUSR1, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGSEGV, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGUSR2, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGPIPE, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGALRM, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGTERM, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGSTKFLT, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGCONT, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGSTOP, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGTSTP, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGTTIN, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGTTOU, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGURG, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGXCPU, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGXFSZ, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGVTALRM, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGPROF, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGWINCH, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGIO, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGPWR, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGSYS, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_2, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_3, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_4, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_5, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_6, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_7, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_8, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_9, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_10, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_11, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_12, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_13, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_14, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_15, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_16, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_17, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_18, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_19, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_20, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_21, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_22, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_23, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_24, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_25, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_26, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_27, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_28, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_29, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_30, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_31, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGRT_32, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGABRT, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGIO, NULL, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGSYS, NULL, {SIG_DFL, [], 0}, 8) = 0 exit_group(0) = ?

UPDATE

The issue was with unpack, which was not using the native short and long sizes. Quoting pack documentation:

The integer types s, S , l , and L may be followed by a ! modifier to specify native shorts or longs. As shown in the example above, a bare l means exactly 32 bits, although the native long as seen by the local C compiler may be larger. This is mainly an issue on 64-bit platforms. You can see whether using ! makes any difference this way:
printf "format s is %d, s! is %d\n", length pack("s"), length pack("s! +"); printf "format l is %d, l! is %d\n", length pack("l"), length pack("l! +");

This is what I got in my 64 bits Linux box:

[me@localhost ~]$ ./test3.pl format s is 2, s! is 2 format l is 4, l! is 8

Thanks to Corion and the anonymous monk too shy to identify himself. :-)

Alceu Rodrigues de Freitas Junior
---------------------------------
"You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill

Replies are listed 'Best First'.
Re: failing to use getdents system call on Linux
by Corion (Patriarch) on Nov 02, 2016 at 14:03 UTC

    Have you looked at what Perl and C pass to the syscalls and what they get back using truss or strace?

    If that data is still as you'd expect it, then it's maybe how you unpack things. Maybe you can show us a small representative piece of data and the corresponding code that unpacks the data? That would help us get closer to seeing what you see on your machine.

      Thanks Corion, I updated the node with the information you requested (inside readmore tags).

      Alceu Rodrigues de Freitas Junior
      ---------------------------------
      "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill

        Your C code uses a much larger buffer. Maybe that circumvents some edge case that your Perl logic doesn't handle.

        Also, the buffer handling is different between your C and Perl programs. With Perl, you cut from the front of your buffer while with C, you walk a pointer over it. I would rewrite the Perl program to use substr maybe to mimic the C code closer.

        Also, is there a reason why you're avoiding readdir?

Re: failing to use getdents system call on Linux
by Anonymous Monk on Nov 06, 2016 at 19:41 UTC

    C code has the loop condition (bpos < nread); perl variant hasn't got one. The syscall() does not truncate $buf, you'll have to do it yourself to avoid running into zero records.

      But the logic in the Perl program is different. It subtracts from $read and shortens the buffer by the processed bytes.

      I think the Perl program could run into an infinite loop if $read becomes negative, but that doesn't seem to be the problem here...

        Aye, sorry, scratch my hasty that. The syscall() must return a good value. OP problem seems to be in the unpack template. Should try native types instead: "L!L!SZ*". Perl L is 4 bytes...

        Oh yes, infinite loop is a problem here, since I'm getting 1 or zero from $len. Here is the updated code:

        The part changed to Pod is exactly the one that put me in a "infinite" loop (probably not infinite, but long enough to create a very large file, but I killed the process once it reached GB).

        As you can see in the code, I used the same buffer size as in the C code... but results are the same:

        Alceu Rodrigues de Freitas Junior
        ---------------------------------
        "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1175138]
Approved by kcott
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (8)
As of 2024-03-28 17:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found