HuckinFappy has asked for the wisdom of the Perl Monks concerning the following question:
I'm still hunting the origin of the "Bad File Descriptor" errors we get at random points in our tools.
I found another lead. I've searched here and Googled, but no luck so far. Any time the "Bad File Descriptor" shows up in my application logs, I get an error showing up in the Windows System Event Viewer stating:
"Application popup: perl.exe - Application error: The instruction at "0x77c46fa3" referenced memory at "0x0188d000". The memory could not be "read""
A couple of interesting(?) tidbits:
The first memory location remains static. On every error I've found, it's "0x77c46fa3".
The second memory location is static per machine. On different machines it may be "0x0188d000", or "0x01889000", or "0x0188c000", but it remains constant across errors on a given machine.
The second memory location always follows the pattern described above: 0x0188{$FOO}000, where $FOO is the only digit that changes.
I'm way over my head at this point, but I'm hoping this will help someone point me in the right direction.
Re: Win32 - Memory can not be "read"
by GrandFather (Saint) on Oct 04, 2006 at 20:58 UTC
A slight translation may help (but probably not). The 'instruction at "0x77c46fa3"' bit means that an instruction at that (run-time) address caused the fault. If you can get a symbol dump of the code you can, in principle, find the (machine language) instruction that is causing the trouble and should be able to identify the routine that is involved. A much more useful thing generally is to get a stack dump so you can figure out something of the context of the failure. You should be able to get at least a partial stack dump from the System Event Viewer.
The 'memory could not be "read"' means an invalid access. The instruction was trying to access memory that the process doesn't own. Normally that means a bad pointer - either uninitialised, or trashed in some fashion. It may be an access beyond the end of an array for example.
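As a rough illustration of the "access beyond the end" idea: the three data addresses quoted in the question all end in 000. Assuming 4 KiB pages (the usual size on 32-bit x86 Windows, but an assumption about these machines), each one is the first byte of a page. That would be consistent with a copy that ran off the end of the last mapped page of a buffer, though it does not prove it. A minimal sketch of the check; the helper name is_page_aligned is made up for illustration:

```perl
use strict;
use warnings;

# Assumption: 4 KiB pages, the usual size on 32-bit x86 Windows.
my $PAGE_SIZE = 4096;

# Hypothetical helper: true when the address is the first byte of a page.
sub is_page_aligned {
    my ($addr) = @_;
    return ( $addr & ( $PAGE_SIZE - 1 ) ) == 0;
}

# Data addresses quoted in the question, plus the instruction address
# for contrast.
my @addresses = ( 0x0188d000, 0x01889000, 0x0188c000, 0x77c46fa3 );

for my $addr (@addresses) {
    printf "0x%08x %s\n", $addr,
        is_page_aligned($addr) ? 'page boundary' : 'inside a page';
}
```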
What version of Perl are you using, and can you post some code that you suspect of generating the problem? How often does it happen?
DWIM is Perl's answer to Gödel
Thanks for the translation. That's roughly in line with what I thought it was telling me. Unfortunately, this is all very sporadic, so I'm not sure how to go about capturing a symbol dump.
I also don't see any stack dump in the System Event Viewer.
This is perl 5.8.5 (I'll include perl -V at the end of this reply).
I don't have any specific code I'm suspicious of. I've seen this happen, apparently, when my scripts are:
- Running make
- Using File::Find to gather a collection of related files
- Running 'reg query' to get registry settings
- Spawning a variety of other system commands
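Since every item in that list involves external commands or file handles, one way to narrow things down might be to record how each spawned command ends. "Bad file descriptor" is the message text for the EBADF errno value, so capturing $!, $^E and $? right after each call could show which step first sees it. A minimal sketch, assuming the commands are run through system(); the wrapper name run_logged and the result fields are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical wrapper: run an external command and return a hash
# describing how it ended, so intermittent failures leave a trail.
sub run_logged {
    my (@cmd) = @_;
    my $started = time;
    my $rc      = system(@cmd);

    # Capture the error variables immediately; later calls overwrite them.
    my ( $errno, $os_error, $status ) = ( "$!", "$^E", $? );

    my %result = (
        command => "@cmd",
        started => $started,
        elapsed => time - $started,
        ok      => ( $rc == 0 ? 1 : 0 ),
    );
    if ( $rc == -1 ) {
        # system() could not start the command at all.
        $result{failure} = "could not start: $errno (OS error: $os_error)";
    }
    elsif ( $rc != 0 ) {
        $result{failure} = sprintf 'exit code %d, signal %d',
            $status >> 8, $status & 127;
    }
    return \%result;
}

# Example: a child perl that deliberately exits with status 3.
my $r = run_logged( $^X, '-e', 'exit 3' );
print STDERR "$r->{command}: $r->{failure}\n" unless $r->{ok};
```

Writing the returned hash to the application log next to each "Bad File Descriptor" entry would make it easier to see whether the failures cluster around one kind of command.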
Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
  Platform:
    osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=undef use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"t:\perl\5.8.5\lib\CORE" -machine:x86'
    libpth=\lib t:\perl\5.8.5\lib
    libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"t:\perl\5.8.5\lib\CORE" -machine:x86'
Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
  Built under MSWin32
  Compiled at Aug 12 2004 13:37:45
  @INC:
    c:/xgsPerl/5.8.5/lib
    c:/xgsPerl/site/5.8.5/lib
    c:/xgsPerl/site/lib
    .
I'm not sure how to go about capturing a symbol dump
Does the pop-up you get have an option (button) that provides a dump? I've a vague notion there's a "More Info" type button you can select, which writes the dump to a file ... but I'm not altogether sure.
Cheers, Rob
Re: Win32 - Memory can not be "read"
by jbert (Priest) on Oct 05, 2006 at 08:21 UTC
Are the boxes under quite a lot of load? Does this load vary during the day and does the occurrence of the problem relate to this in any way?
You mention that the scripts are spawning children. Are they waiting for them to exit or running in parallel with them? How about the child scripts? Does task manager show a shedload of child processes running?
If you can't capture the problem at the time, turn on performance monitoring and graph these things over the day, so you can look for spikes/ceilings around the time of the problem. Also, don't just look at the bad machines, duplicate all the measurements on the 'good' ones and look for differences.
Do the machines tend to get sick at approx the same time (suggests an external, i.e. network factor)? Look at the network topology, do the sick machines share any factors there (same switch?)?
Some possibilities: 'memory' (vm exhaustion?), number of handles per process, maximum process stack depth (not perl stack, the underlying C stack), number of threads per proc, total number of procs running on the box, total number of threads on the box, etc.
Since it's intermittent, it could be a more classic race, caused by general slowdown etc. So, lastly, can you 'induce' an episode by adding some load to one of your machines? Try different types of load (a cpu burner, a mem hog, a ping flood, a process which starts a lot of children).
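The last kind of load, a process which starts a lot of children, could be sketched roughly as below. This is only an illustration under stated assumptions: the name spawn_children, the child count and the sleep time are all made up, and on Windows fork() is emulated with interpreter threads, so it needs an ithreads-enabled perl.

```perl
use strict;
use warnings;

# Hypothetical load generator: start $count short-lived child perl
# processes at once, then wait for all of them to finish.
sub spawn_children {
    my ( $count, $seconds ) = @_;
    my @pids;
    for my $n ( 1 .. $count ) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: replace ourselves with a perl that just sleeps.
            exec( $^X, '-e', "sleep $seconds" ) or die "exec failed: $!";
        }
        push @pids, $pid;
    }

    # Parent: reap every child and count the ones that ended badly.
    my $failed = 0;
    for my $pid (@pids) {
        waitpid( $pid, 0 );
        $failed++ if $? != 0;
    }
    return ( scalar @pids, $failed );
}

my ( $started, $failed ) = spawn_children( 20, 1 );
print "started $started children, $failed exited abnormally\n";
```

Running this alongside a build job, and raising the count gradually, is one way to see whether process pressure alone brings the error on.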
It sounds like a build environment, so I wouldn't imagine the perl is too complex - is this right? Or is there a lot of hairy code in there? And again, what about the child processes?
Hope this helps, intermittent probs are always tough.
Good luck.
Thanks for all the ideas jbert. The machines are under relatively significant load, but not in terms of number of processes. These dual processor machines have the following on them:
- A perl client script (well, 2...one per processor)
- Client polls server to get next available build job
- Client spawns perl script (and then waits for it) which:
- converts make template into make file
- runs make
- client notifies server job is complete and gets next one
So there's not a lot going on in parallel, but the network/cpu/memory can get pretty hammered (the local disk is almost a noop in most of these cases)
The perl is slightly more complicated than you see in most build environments, but it's not rocket science by any means.
I was able last night to re-enable the pop-ups, and caught some data this morning. It's Greek to me, but I'll include it here for the sake of completeness. First, the last few frames of the stack trace (I have it all for anyone interested):
>msvcrt.dll!77c46fa3()
perl58.dll!Perl_newFOROP(interpreter * my_perl=0x00225ffc, long flags=0, char * label=0x01867624, unsigned long forline=32, op * sv=0x00000000, op * expr=0x018956f4, op * block=0x018955dc, op * cont=0x00000000) Line 3877 + 0x9 C
perl58.dll!Perl_yyparse(interpreter * my_perl=0x0173adfc) Line 257 + 0x18 C
perl58.dll!S_doeval(interpreter * my_perl=0x01890668, int gimme=0, op * * startop=0x00000000, cv * outside=0x00000000, unsigned long seq=2630) Line 2817 + 0x6 C
perl58.dll!Perl_pp_require(interpreter * my_perl=0x0167a5ac) Line 3314 + 0x3a C
perl58.dll!Perl_runops_standard(interpreter * my_perl=0x00225ffc) Line 23 + 0xc C
And the disassembly of the memory around where it faulted:
77C46F9B and edx,3
77C46F9E cmp ecx,8
77C46FA1 jb 77C46FCC
77C46FA3 rep movs dword ptr [edi],dword ptr [esi]
77C46FA5 jmp dword ptr [edx*4+77C470B8h]
77C46FAC mov eax,edi
77C46FAE mov edx,3
77C46FB3 sub ecx,4
I did go look up op.c, which is where Perl_newFOROP() is defined, and found line 3877, which is:
Copy(loop,tmp,1,LOOP);
As I say, it's Greek to me, but maybe someone sees something significant here?
Thanks to all. I know we're on the fringe of "is this a perl issue or a Windows problem", which means we're moving from something I know something about into the huge unknown for me (I'm completely Linux-centric).
perl58.dll!Perl_newFOROP(         ??? Doing a for loop ???
    interpreter * my_perl=0x00225ffc,
    long flags=0,
    char *label=0x01867624,
    unsigned long forline=32,     ??? Does this mean line 32 of the source file ???
    op * sv=0x00000000,
    op * expr=0x018956f4,
    op * block=0x018955dc,
    op * cont=0x00000000
) Line 3877 + 0x9 C
perl58.dll!Perl_yyparse(          ??? parsing ???
    interpreter * my_perl=0x0173adfc
) Line 257 + 0x18 C
perl58.dll!S_doeval(              ??? Evaling the code ???
    interpreter * my_perl=0x01890668,
    int gimme=0,
    op * * startop=0x00000000,
    cv * outside=0x00000000,
    unsigned long seq=2630
) Line 2817 + 0x6 C
perl58.dll!Perl_pp_require(       ??? Loading a module ???
    interpreter * my_perl=0x0167a5ac
) Line 3314 + 0x3a C
perl58.dll!Perl_runops_standard(
    interpreter * my_perl=0x00225ffc
) Line 23 + 0xc C
The stack trace suggests that the problem occurs when you are processing a for loop. And the assembler instruction where the trap is occurring is the rep movs ... which supports that.
From that, I'd hazard a wild guess that you have a for loop that is running off the end of the data it is copying, and from the stack trace, it looks like that for loop is located in a module--maybe at line 32 of the file?
Like I say, that's mostly guesswork, but if your script doesn't have too many dependencies it might be worth looking at line 32 of each of them and seeing if there is a for loop there, before dismissing this completely :)
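Checking line 32 of every dependency by hand gets tedious, so here is a small sketch that does it for whatever modules are loaded. One hedge on the interpretation: Perl_newFOROP sits under Perl_yyparse and Perl_pp_require in the trace, which suggests the fault happened while a for statement was being compiled during a require, so a loop found this way would be a candidate regardless of what data it iterates over. The helper name for_loops_at_line is hypothetical, and the regex is a crude match on the keywords only.

```perl
use strict;
use warnings;

# Hypothetical helper: return [file, text] pairs for every file whose
# given line contains a "for" or "foreach" keyword.
sub for_loops_at_line {
    my ( $line_no, @files ) = @_;
    my @hits;
    for my $file (@files) {
        open my $fh, '<', $file or do { warn "skip $file: $!\n"; next };
        while ( my $text = <$fh> ) {
            next unless $. == $line_no;    # $. is the current line number
            chomp $text;
            push @hits, [ $file, $text ] if $text =~ /\b(?:for|foreach)\b/;
            last;
        }
        close $fh;
    }
    return @hits;
}

# %INC maps each module loaded so far to the file it came from, so run
# this after the suspect script's own "use" lines.
my @loaded = grep { defined($_) && -f $_ } values %INC;
printf "%s line 32: %s\n", @$_ for for_loops_at_line( 32, @loaded );
```

Modules pulled in later with a run-time require will not be in %INC yet, so the check is best run at the end of the script, for example from an END block.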
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
This appears to be related to this ActiveState bug report against ActivePerl 809 (built on 5.8.3) and also this Perlbug ticket for 5.8.0 on Linux. It's listed as unconfirmed and medium priority by ActiveState. It's stalled waiting for example code that triggers it on perlbug.
There seems to be a pretty good explanation of what appears to be going on (although I can't vouch for its accuracy). It seems from the two separate bug reports against two different dot releases on two different platforms that the bug came from upstream of ActiveState and affects at least four dot releases of 5.8 so far. I don't see anything in change logs saying it's been fixed in newer versions, but I'll admit I may have missed it. Also, it may be lumped into one of the "many things fixed" lines somewhere in a perldelta.
I can't find an ActivePerl built from 5.8.5 so I guess it's pretty certain you're using something else. Another Perl on Windows project or built from source? On cygwin or not?
Since the bug reports are waiting for an example or more of the problem, I'd suggest replying to stmpeters on the Perl 5 RT system at the above-mentioned bug #34450 about your code. If it turns out to be a known issue fixed in a newer version, at least the bug tracker will be updated to show that and you'll be told what version was first fixed.
Of course, this being PerlMonks, someone with the skills, time, and access to look into this might be reading the thread right now. I wouldn't hold my breath waiting for that, though.
Might be good to test a newer version first. If a test box with 5.8.7 or 5.8.8 doesn't seem to fix it, you might consider a bug report after that.
Re: Win32 - Memory can not be "read"
by Jack B. Nymbol (Acolyte) on Oct 05, 2006 at 04:47 UTC
Hey H. Fappy,
Have you tried the "start" command? It may be better suited to your needs. Not sure when that showed up (w2k??) but it works for me on XP.
JB