PerlMonks  

Where to find info on low level perl internals names?

by Anonymous Monk
on Oct 25, 2011 at 10:37 UTC (#933592, perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hiya Monks,

I'm trying to grok the infamous 'B' Module.
http://search.cpan.org/perldoc?B

This documentation assumes that the reader knows a fair amount about perl's internals including such things as SVs, OPs and the internal symbol table and syntax tree of a program.

I've read through the most obvious places... http://perldoc.perl.org/DB.html
http://perldoc.perl.org/perldebug.html
http://perldoc.perl.org/perldebguts.html
/usr/libdata/perl5/i386-openbsd/5.10.1/CORE/perl.h
/usr/libdata/perl5/i386-openbsd/5.10.1/CORE/handy.h

I'd like to find info on the naming convention and/or name definitions. Basically, what do the names mean and why are they named that way?

NAME    KNOWN           WILD GUESSES
SV      is a scalar     (WTF does the 'V' stand for?)
RV      is a Reference  (well, it was until Perl 5.11.*)
PV      is a string     (WTF does 'P' stand for?)
IV      is an integer
NV      is a number
LV      is ?            ("LeftValue?" LVALUE? Local? )
GV      is ?            (General? Global?, Garbage-Collected?)
AV      is ?            (Array?)
HV      is ?            (Hash?)
CV      is ?
FM      is ?
IO      is ?

PVIV    string and integer slots
PVNV    string and number slots
PVLV    is ?            (string of left value, e.g. func name)

OP                      (Operation?)
COP
LOOP                    (Loop Operation?)
PMOP
PVOP                    (String Operation?)
SVOP                    (Scalar Operation?)
UNOP
BINOP
LOGOP                   (Logic Operation? e.g. 'and' 'or')
PADOP
LISTOP

Much to the surprise of my local prune vendor, 'BM' is Boyer-Moore, a string search algorithm, but as far as I know, it's no longer a used type/name. Digging around for the obvious is helpful... but it's like trying to learn by osmosis.

$ perl -e 'print(ref(\pos) . "\n");'
LVALUE

$ perl -e 'use B; print(ref(B::svref_2object(\substr "foo",1)) . "\n");'
B::PVLV

$ perl -MO=Terse -ce '$v=2; $v=2.3; $v="txt"; print index($v,"x",0);'
LISTOP (0x7d61afc0) leave [1]
    OP (0x7df6e560) enter
    COP (0x8b34b340) nextstate
    BINOP (0x7d61aa60) sassign
        SVOP (0x7d61a9c0) const IV (0x85306280) 2
        UNOP (0x7d61af20) null [15]
            SVOP (0x7d61afe0) gvsv GV (0x86712440) *v
    COP (0x8b34b380) nextstate
    BINOP (0x7d61afa0) sassign
        SVOP (0x7cd4e6c0) const NV (0x86712460) 2.3
        UNOP (0x7d61af60) null [15]
            SVOP (0x7d61ae60) gvsv GV (0x86712440) *v
    COP (0x7f6ffe40) nextstate
    BINOP (0x7df6edc0) sassign
        SVOP (0x7df6ee60) const PV (0x86712510) "txt"
        UNOP (0x7df6ed00) null [15]
            SVOP (0x7df6eca0) gvsv GV (0x86712440) *v
    COP (0x7f6fff80) nextstate
    LISTOP (0x7df6ef60) print
        OP (0x7df6ef20) pushmark
        LISTOP (0x7d61ac80) index [1]
            OP (0x7df6ed60) null [3]
            UNOP (0x7df6eee0) null [15]
                SVOP (0x7df6eda0) gvsv GV (0x86712440) *v
            SVOP (0x7d61ad00) const GV (0x867124b0) "x"
            SVOP (0x7df6eec0) const IV (0x86712470) 0
-e syntax OK

$ perl -MO=Terse -ce '@a = qw(1 two 3);foreach (@a) {print "$_\n";}'
LISTOP (0x8b2d0ec0) leave [1]
    OP (0x81b42e00) enter
    COP (0x8945b100) nextstate
    BINOP (0x818e7080) aassign [2]
        UNOP (0x83c67f60) null [142]
            OP (0x83c67fe0) pushmark
            SVOP (0x83c67f80) const PV (0x888cf470) "1"
            SVOP (0x83c67ca0) const PV (0x888cf510) "two"
            SVOP (0x83c67fa0) const PV (0x888cf4b0) "3"
        UNOP (0x83c67ce0) null [142]
            OP (0x8a367fa0) pushmark
            UNOP (0x83c67f40) rv2av [1]
                SVOP (0x7e65a4c0) gv GV (0x888cf440) *a
    COP (0x88486f80) nextstate
    BINOP (0x87348f00) leaveloop
        LOOP (0x88486ec0) enteriter
            OP (0x87348f80) null [3]
            UNOP (0x873489a0) null [142]
                OP (0x87348ee0) pushmark
                UNOP (0x87348a20) rv2av [3]
                    SVOP (0x87348680) gv GV (0x888cf440) *a
            SVOP (0x87348fc0) gv GV (0x7f67f170) *_
        UNOP (0x87348d80) null
            LOGOP (0x87348f60) and
                OP (0x87348fa0) iter
                LISTOP (0x87348e40) lineseq
                    COP (0x8945b140) nextstate
                    LISTOP (0x87348c20) print
                        OP (0x87348d20) pushmark
                        UNOP (0x87348f40) null [67]
                            OP (0x87348e20) null [3]
                            BINOP (0x83c67c60) concat [4]
                                UNOP (0x83c679c0) null [15]
                                    SVOP (0x83c67f00) gvsv GV (0x7f67f170) *_
                                SVOP (0x87348da0) const PV (0x888cf490) "\n"
                    OP (0x87348e80) unstack
-e syntax OK
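For anyone who wants to poke at the same tree from inside a script rather than through -MO=Terse, here is a minimal sketch using only the core B module. The helper names collect_ops and ops_of are made up for this illustration, and the exact ops you see will vary between Perl versions and builds (for example, threaded builds use PADOP where unthreaded builds use SVOP for globs).

```perl
use strict;
use warnings;
use B qw(svref_2object class);

# Recursively collect [depth, B class, op name] for an op and its children.
# (collect_ops and ops_of are hypothetical helper names, not part of B.)
sub collect_ops {
    my ($op, $depth, $out) = @_;
    push @$out, [ $depth, class($op), $op->name ];

    # Only ops flagged with OPf_KIDS have a ->first child to descend into.
    return unless $op->flags & B::OPf_KIDS();
    for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
        collect_ops($kid, $depth + 1, $out);
    }
}

# Return the list of ops reachable from the root of a code reference.
sub ops_of {
    my ($code) = @_;
    my @ops;
    collect_ops(svref_2object($code)->ROOT, 0, \@ops);
    return @ops;
}

# Print an indented listing, loosely similar to what B::Terse shows.
for my $entry (ops_of(sub { print "x" })) {
    my ($depth, $class, $name) = @$entry;
    printf "%s%s %s\n", '    ' x $depth, $class, $name;
}
```

The first column of each line is the B class (UNOP, LISTOP, COP, ...) and the second is the op name, which are the same two pieces of information the Terse dumps above show for each op.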

If you know where the names are documented, or know that they are undocumented, please kick the knowledge downstairs to the unwashed. Thanks!

Re: Where to find info on low level perl internals names?
by BrowserUk (Pope) on Oct 25, 2011 at 10:57 UTC

    A few:

    (WTF does the 'V' stand for?)

    Also wild-assed guess: 'V' stands for variable. So SV stands for Scalar Variable.

    (WTF does 'P' stand for?)

    'P' stands for Pointer; as in the value stored here is a pointer to the (string) value as opposed to IV where the value stored here is the Integer Variable itself; or NV where the value is the value of the Number Variable itself.

    LV is ? ("LeftValue?" LVALUE? Local? )

    Left value, in the C language lvalue sense: a value that can appear on the left side of the assignment operator (e.g. not a constant).

    GV is ? (General? Global?, Garbage-Collected?)

    Global Variable.

    • CV => Code Variable (a perl subroutine pointer.)
    • IO => io object handle: Like STDIN, STDOUT STDERR, ARGV etc.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      GV is a "glob", short for "typeglob", as in "a bunch of types". They are used as symbol table entries, although they can exist outside the symbol table, so they aren't necessarily global. open my $fh, ... populates $fh with a reference to a "non-global" glob.

        And what do you think the 'glob' in 'typeglob' stands for?


Re: Where to find info on low level perl internals names?
by Not_a_Number (Parson) on Oct 25, 2011 at 10:59 UTC
Re: Where to find info on low level perl internals names?
by bart (Canon) on Oct 25, 2011 at 11:02 UTC
    I've read through the most obvious places...
    Have you also checked out Perlguts Illustrated? I don't think so, or you'd have seen this:
    The first things to look at are the data structures that represent Perl data; scalars of various kinds, arrays and hashes. Internally Perl calls a scalar SV (scalar value), an array AV (array value) and a hash HV (hash value). In addition it uses IV for integer value, NV for numeric value (aka double), PV for a pointer value (aka string value (char*), but 'S' was already taken), and RV for reference value. The IVs are further guaranteed to be big enough to hold a void* pointer.

    update Link updated (future proofed) to always point to the latest version, as per the follow-up. Thanks, anon! I knew about "dist", but not that it would work for this particular case.

Re: Where to find info on low level perl internals names?
by Corion (Pope) on Oct 25, 2011 at 11:04 UTC

    For general information on Perl's internal memory structures, I would look at illguts. The names for the various structures are best treated as Just Names, but the following mnemonics work for me:

    SV - Scalar Value
    IV - Integer Value
    NV - Numerical Value
    RV - Reference Value
    PV - Pointer Value (like char*, for example)
    LV - I haven't encountered this one, maybe it lives in illguts
    GV - glob value. The glob is (I think) shorthand for global
    AV - Array Value
    HV - Hash Value
    CV - Code Value (think subroutine reference)
    FM - Format
    IO - Filehandle
    The ops are also documented in illguts I believe.
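To check mnemonics like these against a running perl, one option is to ask B which class it assigns to a few ordinary values. This is only a sketch: b_class is a helper name invented here, and the sample list is an arbitrary selection.

```perl
use strict;
use warnings;
use B qw(svref_2object class);

# Hypothetical helper: the B class name of whatever a reference points at.
sub b_class { return class(svref_2object($_[0])) }

my @samples = (
    [ 'string constant'  => \"a"        ],
    [ 'integer constant' => \123        ],
    [ 'float constant'   => \1.3        ],
    [ 'array'            => []          ],
    [ 'hash'             => {}          ],
    [ 'sub'              => sub {}      ],
    [ 'glob'             => \*STDOUT    ],
    [ 'io handle'        => *STDOUT{IO} ],
);

# One line per sample: a description, then the B class (PV, IV, NV, AV, ...).
printf "%-17s %s\n", $_->[0], b_class($_->[1]) for @samples;
```

B::class simply strips the leading "B::" from the package the B object is blessed into, so the names printed are the B class names rather than the C-level type names.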

    Update: Typoo

      LV = left hand value, used for "substr($string, 0, 10) = "replacement value";" system.
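A small sketch of that substr case, assuming nothing beyond core Perl and the B module. It shows the assignment through substr and what Perl and B report for a reference to such an lvalue.

```perl
use strict;
use warnings;
use B qw(svref_2object class);

my $string = "Hello, world";

# substr() used as an lvalue: assigning to it edits $string in place.
substr($string, 0, 5) = "HELLO";
print "$string\n";

# A reference to that lvalue is reported as LVALUE by ref(),
# and B sees the underlying scalar as a PVLV.
my $lv_ref = \substr($string, 0, 5);
print ref($lv_ref), "\n";
print class(svref_2object($lv_ref)), "\n";
```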
Re: Where to find info on low level perl internals names?
by Tux (Monsignor) on Oct 25, 2011 at 12:49 UTC

    It's all documented :)

    $ perldoc guts
     :
    Variables
      Datatypes
        Perl has three typedefs that handle Perl's three main data types:

            SV  Scalar Value
            AV  Array Value
            HV  Hash Value

        Each typedef has specific routines that manipulate the various data types.

      What is an "IV"?
        Perl uses a special typedef IV which is a simple signed integer type that is guaranteed to be large enough to hold a pointer (as well as an integer). Additionally, there is the UV, which is simply an unsigned IV.

        Perl also uses two special typedefs, I32 and I16, which will always be at least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16, as well.) They will usually be exactly 32 and 16 bits long, but on Crays they will both be 64 bits.
     :
     : etc etc
    $

    Next to guts you might want to take a look at xs and xstut.

    I don't think your WTF?'s are appropriate.


    Enjoy, Have FUN! H.Merijn

      Thank you. I remembered seeing your name in a handy.h comment. I doubt I can edit out the WTF?'s as an Anonymonk Mous but I would if I could.

Re: Where to find info on low level perl internals names?
by Anonymous Monk on Oct 25, 2011 at 12:52 UTC

    Many Thanks To All!

    Perl Guts Illustrated was exactly what I was looking for and oddly enough, I had seen and visited a link to it, but it was 404'd.

Re: Where to find info on low level perl internals names?
by ikegami (Pope) on Oct 25, 2011 at 19:38 UTC

    Your use of "is" is incorrect when talking about scalar types. An IV is not necessarily an integer.

    $ perl -MB -E'$_=123; $_=undef; say B::class B::svref_2object \$_'
    IV

    SV is the base "class" for all Perl variables (not just scalars), but yeah, the "S" stands for "scalar".
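The same point as a script, for those who prefer to see the flags too. This is a sketch: describe is a helper name made up here, and it only looks at the SVf_IOK flag, which says whether the integer slot currently holds a valid value.

```perl
use strict;
use warnings;
use B qw(svref_2object class);

# Hypothetical helper: report the B class and whether the integer slot is valid.
sub describe {
    my ($ref) = @_;
    my $sv = svref_2object($ref);
    return sprintf "%s, IOK=%d", class($sv), ($sv->FLAGS & B::SVf_IOK() ? 1 : 0);
}

my $x = 123;
print describe(\$x), "\n";    # the scalar holds an integer

$x = undef;
print describe(\$x), "\n";    # still an IV body, but no integer in it
```

The scalar keeps its IV body after the undef assignment; only the flag that marks the integer as valid is cleared. That is why "IV" names the kind of container rather than promising an integer inside it.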

    Here goes:

    B class name | Actual SV type | Description | Example
    SPECIAL | NULL | Can hold undef | perl -MB -E'say B::class B::svref_2object \undef'
    PV | PV | Can hold undef, a string of 8 bit chars or a string of 32/64 bit chars | perl -MB -E'say B::class B::svref_2object \"a"'
    IV | IV | Can hold undef, a reference, a signed int or an unsigned integer | perl -MB -E'say B::class B::svref_2object \123'
    NV | NV | Can hold undef or a floating point number | perl -MB -E'say B::class B::svref_2object \1.3'
    PVIV | PVIV | Can hold undef, a reference, a signed int, an unsigned integer, a string of 8 bit chars and/or a string of 32/64 bit chars | perl -MB -E'$_=123; "".$_; say B::class B::svref_2object \$_'
    PVNV | PVNV | Can hold undef, a reference, a signed int, an unsigned integer, a floating point number, a string of 8 bit chars and/or a string of 32/64 bit chars | perl -MB -E'$_=1.3; "".$_; say B::class B::svref_2object \$_'
    PVMG | PVMG | A PVNV that supports magic | perl -MB -E'say B::class B::svref_2object \$|'
    PVLV | PVLV | A PVMG with extra fields, used for lvalues | perl -MB -E'say B::class B::svref_2object \substr("",0)'
    GV | PVGV | A glob | perl -MB -E'say B::class B::svref_2object \*FOO'
    AV | PVAV | An array | perl -MB -E'say B::class B::svref_2object []'
    HV | PVHV | A hash | perl -MB -E'say B::class B::svref_2object {}'
    CV | PVCV | A sub | perl -MB -E'say B::class B::svref_2object sub{}'
    FM | PVFM | A format | (see the script after the table)
    IO | PVIO | Can hold a file handle or a directory handle | perl -MB -E'say B::class B::svref_2object *STDOUT{IO}'
    REGEXP | REGEXP | A regexp object | perl -MB -E'say B::class B::svref_2object qr//'

    Example for FM:

    use B;
    use feature qw( say );
    format X =
    Foo
    .
    say B::class B::svref_2object *X{FORMAT};

    The "P" in "PV" is for "pointer".

    Where "and/or" is used, all combinations are possible, with the following exceptions:

    • An undefined scalar is one that contains nothing at all, so a scalar cannot contain both undef and something else.
    • A scalar cannot contain two or more of the following at a time: a reference, a signed integer and an unsigned integer.
    • A scalar cannot contain both of the following at a time: a string of 8 bit chars and a string of 32/64 bit chars

    While it is technically possible for some scalars to contain both a reference and something else, Perl doesn't create these, and I don't know how safe it is.

      I revamped the table in the parent.

      • Cleared up confusion that came from "IV" and the like having somewhat different meanings in different contexts.
      • Cleared up confusion due to difference between B class names and the actual SV type name
      • Since the question was about B class names, the examples were changed from using Devel::Peek to using B.
      • Added many missing class names.
      A scalar cannot contain both of the following at a time: a string of 8 bit chars and a string of 32/64 bit chars

      32bit chars? 64bit chars? UTF 16? UTF 32? UTF 64 jkjk?

        32bit chars? 64bit chars?

        32 on 32 bit builds. 64 on 64 bit builds. The format actually allows for 72 bit numbers, but Perl doesn't provide a means of storing and fetching values that large.

        UTF 16? UTF 32? UTF 64

        Are you asking about the format? It's a variable width format based on UTF-8 confusingly called utf8, but it has nothing to do with Unicode. For starters, the highest possible Unicode character is only 0x10FFFF, far less than what utf8 allows. Unicode has a bunch of reserved and private use and whatnot code points, but not these strings. Unicode imposes certain semantics, but not these strings.
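A quick sketch of what that means in practice, using only built-ins. The variable names are arbitrary; utf8::is_utf8 reports whether a string is stored in the internal variable-width format, not whether it is valid Unicode.

```perl
use strict;
use warnings;

my $bytes = "abc";           # fits in 8 bit chars
my $wide  = chr(0x263A);     # too big for 8 bits, so stored in the utf8 format
my $above = chr(0x110000);   # one past the highest Unicode code point

# The internal flag distinguishes the two storage formats.
printf "utf8 flag: bytes=%d wide=%d\n",
    (utf8::is_utf8($bytes) ? 1 : 0), (utf8::is_utf8($wide) ? 1 : 0);

# Perl stores a "character" that Unicode does not define at all.
printf "non-Unicode char: ord=0x%X length=%d\n", ord($above), length($above);
```

Nothing is written out in an encoding here, so no non-Unicode warnings are involved; the string simply holds a number larger than any Unicode code point.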

      Note that there are actually two different types of content for IO that differ only slightly in internal storage, but a lot in behavior. There are open files (including sockets and scalar-IO) and open directories. The difference is invisible from the outside:

      $ perl -MDP -we'open DH,$0;DDump*DH'
      SV = PVGV(0x11d7dc0) at 0x76a5e0
        REFCNT = 3
        FLAGS = (MULTI)
        NAME = "DH"
        NAMELEN = 2
        GvSTASH = 0x74af48 "main"
        GP = 0x771bc0
          SV = 0x0
          REFCNT = 1
          IO = 0x76a5f8
          FORM = 0x0
          AV = 0x0
          HV = 0x0
          CV = 0x0
          CVGEN = 0x0
          LINE = 1
          FILE = "-e"
          FLAGS = 0x2
          EGV = 0x76a5e0 "DH"

      $ perl -MDP -we'opendir DH,".";DDump*DH'
      SV = PVGV(0x11d7e00) at 0x76a5f0
        REFCNT = 3
        FLAGS = (MULTI)
        NAME = "DH"
        NAMELEN = 2
        GvSTASH = 0x74af48 "main"
        GP = 0x771b30
          SV = 0x0
          REFCNT = 1
          IO = 0x76a620
          FORM = 0x0
          AV = 0x0
          HV = 0x0
          CV = 0x0
          CVGEN = 0x0
          LINE = 1
          FILE = "-e"
          FLAGS = 0x2
          EGV = 0x76a5f0 "DH"

      Enjoy, Have FUN! H.Merijn

        In both cases, we have scalars of the same type (PVIO). But just like an IV can hold more than one kind of data, PVIO can contain one of two types of handles. I'll clarify that in my table.

        By the way, you didn't show the difference, so here goes:

        $ perl -MDevel::Peek -we'open FH,$^X; Dump *FH{IO}'
        SV = IV(0x7564a8) at 0x7564b0
          REFCNT = 1
          FLAGS = (TEMP,ROK)
          RV = 0x7693f0
          SV = PVIO(0x76ccb8) at 0x7693f0
            REFCNT = 2
            FLAGS = (OBJECT)
            STASH = 0x768b68 "IO::File"
            IFP = 0x763b30
            OFP = 0x0
            DIRP = 0x0
            LINES = 0
            PAGE = 0
            PAGE_LEN = 60
            LINES_LEFT = 0
            TOP_GV = 0x0
            FMT_GV = 0x0
            BOTTOM_GV = 0x0
            TYPE = '<'
            FLAGS = 0x0

        $ perl -MDevel::Peek -we'opendir DH,"."; Dump *DH{IO}'
        SV = IV(0x759d18) at 0x759d20
          REFCNT = 1
          FLAGS = (TEMP,ROK)
          RV = 0x76cc78
          SV = PVIO(0x770528) at 0x76cc78
            REFCNT = 2
            FLAGS = (OBJECT)
            STASH = 0x76c3d8 "IO::File"
            IFP = 0x0
            OFP = 0x0
            DIRP = 0x7a7330
            LINES = 0
            PAGE = 0
            PAGE_LEN = 60
            LINES_LEFT = 0
            TOP_GV = 0x0
            FMT_GV = 0x0
            BOTTOM_GV = 0x0
            TYPE = '\0'
            FLAGS = 0x0
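The same comparison can be sketched through B instead of Devel::Peek. Assumptions in this sketch: an in-memory read handle stands in for the file (so the script does not depend on any particular file existing), and B::IO's IoTYPE accessor is taken to correspond to the TYPE field in the dumps above.

```perl
use strict;
use warnings;
use B qw(svref_2object class);

# An in-memory read handle stands in for an ordinary file here.
open my $fh, '<', \"in-memory data" or die "open failed: $!";
opendir my $dh, '.'                 or die "opendir failed: $!";

for my $handle ($fh, $dh) {
    my $io   = svref_2object(*{$handle}{IO});   # the IO object inside the glob
    my $type = $io->IoTYPE;                     # mode character kept in the IO

    # Show a NUL type byte in a printable form.
    printf "%s type=%s\n", class($io), ($type eq "\0" ? '\0' : $type);
}
```

Both handles come back as the same B class, which matches the point made above: one scalar type, two kinds of handle.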

        Thanks again!

        I'm kinda laughing at myself, and you might enjoy the laugh as well, so I might as well share... I spent a few minutes staring at the output of those two commands you posted trying to spot "the difference." ;)

        (sigh) sometimes I make myself wonder.

Reaped: Re: Where to find info on low level perl internals names?
by NodeReaper (Curate) on Oct 26, 2011 at 12:53 UTC
Re: Where to find info on low level perl internals names?
by Khen1950fx (Canon) on Oct 27, 2011 at 13:13 UTC
    I've been using Opcode and Opcodes to grok B and the opcodes. For example,
    #!/usr/bin/perl
    use strict;
    use Opcodes;
    use Opcode qw(opdump);
    use Data::Dumper::Concise;

    print Dumper (opdump), "\n",
        (scalar opcodes), "\n",
        (opname2code('gv')), "\n",
        (opdesc(7)), "\n",
        (opclass(7)), "\n",
        (opdesc(opclass(7))), "\n";

    • (opdump) lists all the opcodes.
    • (scalar opcodes) gives the number of opcodes for your version of perl.
    • (opname2code('gv')) gives the number ('code') of gv.
    • (opdesc(7)) gives a short description; 7 is the number for gv.
    • (opclass(7)) gives the class, such as OP, COP, UNOP, BINOP, etc.; here, gv is a svop_or_padop, which is 6.
    • (opdesc(opclass(7))) tells you that it's a scalar variable.
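If the CPAN module Opcodes is not installed, a rougher version of the same lookups can be sketched with core modules only. This is an alternative, not a reproduction of the script above: it uses B::opnumber and B::ppname plus Opcode::opdesc, and it looks the op up by name because the numeric code differs between Perl versions.

```perl
use strict;
use warnings;
use B ();
use Opcode qw(opdesc);

my $name   = 'gv';
my $number = B::opnumber($name);   # numeric opcode; varies between Perl versions
my $ppname = B::ppname($number);   # name of the C function that implements the op
my ($desc) = opdesc($name);        # short human-readable description

print "$name: number=$number pp=$ppname desc=$desc\n";
```

The core Opcode::opdesc takes op names rather than numbers, which is one visible difference from the opdesc imported from Opcodes in the script above.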

Node Type: perlquestion [id://933592]
Front-paged by Arunbear