Item Description: Scans C source code for functions, typedefs, macros, variables, etc.

Review Synopsis: A useful module for extracting information from C source files, with a lot of cool Perl inside.


The C::Scan module performs fast, accurate scanning of C source code. It provides an object interface for accessing information about a particular C source file. After creating the initial object, the main interface is a get() method that fetches information using a set of pre-defined keywords, each of which specifies the type of information you want. Much of the information is available either raw or parsed, depending on the specific keyword used (for example, 'fdecls' vs. 'parsed_fdecls').

Why should you use it?

You want to use Perl to extract information about C source code, including the functions declared or defined, arguments to functions, typedefs, macros defined, etc.

Why should you NOT use it?

Any bad points?

The documentation is lacking. This is really annoying because almost all of the keyword fetches that try to parse the text use complex and arbitrary structures for return values: an array ref of refs to arrays that each hold five defined values, a hash ref where the hash values are array refs to two-element arrays, etc. Don't be surprised if you have to dive into the code to really figure out what's being returned.

Related Modules

C::Scan is an example of extremely powerful use of the Data::Flow module (not surprising, as both were originally written by Ilya). The keywords you use to fetch information are the underlying Data::Flow recipe keywords.

Personal notes

I used C::Scan to create a code pre-processor that would scan our C source and dump various information into structures for use by an administrative interface. This ended up eliminating several steps in our process that would always break when someone added a new command function but didn't update the right help-text table, etc.

I learned a lot from threading my way through the C::Scan source code. It makes liberal use of \G in regexes to loop through text looking for pieces it can identify as a function, typedef, etc., and the pos builtin to fetch and set the offset for the searches. This allows the module to use multiple copies of the text side-by-side, one with the comments and strings whited out and the other with full text. This way, it can scan a "sanitized" version to identify C syntax by position, but then return full text from the other string. This is an extremely effective and astonishingly efficient technique.
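
To make the two-copy idea concrete, here is a minimal sketch of the technique. It is my own illustration, not code taken from C::Scan: the names blank_out, sanitize and scan_decls and the sample text are made up for the example, and the regexes recognize far less C than the module does (no character literals, no // comments, no nested parentheses in argument lists).

```perl
use strict;
use warnings;

# Hypothetical helper: turn every character except newlines into a space.
sub blank_out {
    my ($s) = @_;
    $s =~ tr/\n/ /c;
    return $s;
}

# Hypothetical helper: white out block comments and string literals.
# The length is preserved, so offsets match the original text exactly.
sub sanitize {
    my ($text) = @_;
    (my $clean = $text) =~ s{(/\*.*?\*/|"(?:\\.|[^"\\])*")}{blank_out($1)}gse;
    return $clean;
}

# Walk the sanitized copy with \G and /gc, but cut the full text of each
# match out of the original copy using the same offsets.
sub scan_decls {
    my ($text) = @_;
    my $clean = sanitize($text);
    my @decls;
    pos($clean) = 0;
    while (1) {
        next if $clean =~ /\G\s+/gc;            # skip whitespace
        last if pos($clean) >= length $clean;   # end of input
        my $start = pos($clean);
        if ($clean =~ /\G[\w\s*]+?\b(\w+)\s*\([^()]*\)\s*;/gc) {
            push @decls, {
                name => $1,
                text => substr($text, $start, pos($clean) - $start),
            };
            next;
        }
        $clean =~ /\G[^;]*;?/gc;                # not a declaration: skip it
    }
    return @decls;
}

my $sample = <<'END_C';
/* old: int removed(void); */
int add(int a, int b);
const char *msg = "not_a_call(1);";
void log_it(const char *s /* text (utf-8) */);
END_C

print "$_->{name}: $_->{text}\n" for scan_decls($sample);
# prints:
# add: int add(int a, int b);
# log_it: void log_it(const char *s /* text (utf-8) */);
```

The /c modifier matters here: without it a failed match resets pos, and the next \G would start over from the beginning of the string. Note also that the commented-out declaration and the parentheses inside the second comment never confuse the scan, yet the comment still appears in the text returned for log_it.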


Examples of a few ways to pull information from C::Scan:
    use C::Scan;

    my $c = C::Scan->new(
        filename        => 'foo.c',
        filename_filter => 'foo.c',
        add_cppflags    => '-DFOOBAR',
        includeDirs     => [ 'dir1', 'dir2' ],
    );

    #
    # Fetch and iterate through information about function declarations.
    #
    my $fdecls = $c->get('parsed_fdecls');
    foreach my $func (@$fdecls) {
        my ($type, $name, $args, $full_text, undef) = @$func;
        foreach my $arg (@$args) {
            my ($atype, $aname, $aargs, $afull_text, $array_modifiers) = @$arg;
        }
    }

    #
    # Fetch and iterate through information about #define values w/out args.
    #
    my $defines = $c->get('defines_no_args');
    foreach my $macro_name (keys %$defines) {
        my $macro_text = $defines->{$macro_name};
    }

    #
    # Fetch and iterate through information about #define macros w/args.
    #
    my $macros = $c->get('defines_args');
    foreach my $macro_name (keys %$macros) {
        # Each value is a two-element array ref: [ \@args, $text ].
        my ($arg_ref, $macro_text) = @{ $macros->{$macro_name} };
        my @macro_args = @$arg_ref;
    }