PerlMonks  

Finding C++ single inheritance occurrences

by gibsonca (Beadle)
on May 18, 2011 at 16:53 UTC (#905513=perlquestion)
gibsonca has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to find occurrences of single inheritance (SI) in C++ source and header files. Not knowing C++ all that well, I thought finding SIs would mean looking for a string of the following form:

 class <str1> : public <str3>::<str2>

However, I was informed by someone who knows C++ and Perl (no longer here) that the best search for SI on a given file would be

@si = grep /(class|struct)\s+[^:{;]+:[\w<>\s]+{/, $string;

I am a little shaky how this regexp works, and while it finds most of the SIs I have given it, it does not work all of the time. So I think I am a little confused about both the C++ and the regexp. Any suggestions on how to robustly find SI occurrences? Thanks --cg

Re: Finding C++ single inheritance occurrences
by roboticus (Canon) on May 18, 2011 at 17:07 UTC

    gibsonca:

    If it doesn't work all the time, what fails? Once you identify cases that don't work with the current regex, fix the regex to be more inclusive, or add another regex to detect the case(s).

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Finding C++ single inheritance occurrences
by toolic (Chancellor) on May 18, 2011 at 17:07 UTC
    I am a little shaky how this regexp works
    YAPE::Regex::Explain might help...
    The regular expression:

    (?-imsx:(class|struct)\s+[^:{;]+:[\w<>\s]+{)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        class                    'class'
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        struct                   'struct'
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      [^:{;]+                  any character except: ':', '{', ';' (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      :                        ':'
    ----------------------------------------------------------------------
      [\w<>\s]+                any character of: word characters (a-z,
                               A-Z, 0-9, _), '<', '>', whitespace (\n,
                               \r, \t, \f, and " ") (1 or more times
                               (matching the most amount possible))
    ----------------------------------------------------------------------
      {                        '{'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    Typically, you would provide a LIST to grep, instead of a single scalar.

      Typically, you would provide a LIST to grep, instead of a single scalar.

      Conversely, if one only has one string to match, one wouldn't use grep.

      my @matches = $string =~ /(?:class|struct)\s+[^:{;]+:[\w<>\s]+\{/g;
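To make the difference concrete, here is a minimal sketch comparing the two calls. The sample `$string` is invented for illustration, and the literal brace is escaped as `\{` because newer perls complain about an unescaped one.

```perl
use strict;
use warnings;

# Hypothetical sample input; not taken from the original post.
my $string = "class A : public B {\n};\nclass C : public D {\n};\n";

my $re = qr/(?:class|struct)\s+[^:{;]+:[\w<>\s]+\{/;

# grep over a one-element list: yields the whole string, or nothing.
my @from_grep = grep { /$re/ } $string;

# Global match in list context: yields each matched substring.
my @from_match = $string =~ /$re/g;

print scalar(@from_grep),  " element(s) from grep\n";
print scalar(@from_match), " match(es) from m//g\n";
print "[$_]\n" for @from_match;
```

With no capturing groups in the pattern, the list-context match returns the full text of each match, which is usually what you want to inspect.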
Re: Finding C++ single inheritance occurrences
by educated_foo (Vicar) on May 18, 2011 at 17:42 UTC
    That code doesn't work at all: it returns the entire $string if it matches the expression, or nothing otherwise. Also, the expression captures the word "class" or "struct", which is pretty useless. You were on the right track with your pseudocode, and probably want to do something along these lines:
    %stuff = ($text =~ /(?:class|struct) \s+ ([\w\d_:]+) \s* : \s* (?:public|protected|private) \s+ ([\w\d_:]+) \s* \{/gsx);
    Then keys %stuff would be the class names, with $stuff{CLASS} being CLASS's parent.
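As a rough illustration of how that expression behaves, the sketch below runs it over a few made-up declarations. The sample text and the name `%parent_of` are assumptions for the example, and `[\w\d_:]` is shortened to `[\w:]`, since `\w` already covers digits and the underscore.

```perl
use strict;
use warnings;

# Hypothetical sample input, invented for illustration.
my $text = <<'CPP';
class Circle : public Shape {
};
struct Node : private detail::Base
{
};
class Plain {
};
CPP

# Each match contributes two captures (derived, base), so the flat list
# returned in list context can be assigned straight into a hash.
my %parent_of = ($text =~ /
    (?:class|struct) \s+ ([\w:]+) \s* : \s*
    (?:public|protected|private) \s+ ([\w:]+) \s* \{
/gsx);

print "$_ -> $parent_of{$_}\n" for sort keys %parent_of;
```

Note that this pattern still says nothing about a second base after a comma: a declaration with two bases simply fails to match, and a base written without an access keyword is missed as well.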
Re: Finding C++ single inheritance occurrences
by John M. Dlugosz (Monsignor) on May 18, 2011 at 19:39 UTC
    It's been pointed out that the grep usage doesn't make any sense, and it doesn't return the class names it would find.

    But the expression itself is odd, too. As you can see from toolic's explanation, it grabs everything after the class keyword up to the next : character. It doesn't make use of \w or other identifier-forming knowledge, even though \w is used later.

    After the ':', it finds word characters, whitespace, and angle brackets. Why angle brackets and whitespace? To handle templates. So something like public map <int, char> can be parsed as a base class. Oh, wait! The comma is not allowed so this won't parse! The comma is left out of the set, because multiple inheritance looks like public name1, private name2 and the idea I guess was that if it contains a comma before finding the '{' then it's not SI.

    But that's not the case for templates with more than one parameter.
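A quick way to see this is to feed the pattern a few declarations. This is only a sketch: the three declarations are invented for the test, and the brace is escaped as `\{`.

```perl
use strict;
use warnings;

my $re = qr/(?:class|struct)\s+[^:{;]+:[\w<>\s]+\{/;

# Hypothetical declarations, invented for illustration.
my @decls = (
    'class A : public B {',                   # plain single inheritance
    'class Table : public map<int, char> {',  # SI, two-parameter template base
    'class C : public B1, private B2 {',      # multiple inheritance
);

my %matched;
for my $decl (@decls) {
    $matched{$decl} = $decl =~ $re ? 1 : 0;
    printf "%-9s %s\n", ($matched{$decl} ? 'match' : 'no match'), $decl;
}
```

The template case is rejected for the same reason as the multiple-inheritance case: the comma is not in the character class, so the pattern cannot tell the two apart.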

    It would also be confused by comments and any non-standard keywords and modifiers that are used in the declaration. Applied line by line, it requires the whole declaration to be on one line. It won't work at all if preprocessor macros are involved.

    It requires the stuff to be followed by a '{', which will match actual definitions, but not (just) declarations.

    As for the criteria you mention: You already saw that you could use struct as well as class. You can have other visibility keywords than public or none at all. Matching just what you showed doesn't tell you that there's no ", base2" following it!

    Requiring two strings separated by a :: isn't useful. Class names might be qualified with exactly one ::, with none, or with any number of them. Are you looking at the last :: to find the base name (as str2) and the qualification used (as str3)? Then the front part needs to be optional, in case no :: is mentioned at all.

    The devil is in what you allow in each <str>, since allowing anything will let through all kinds of junk.

    In general, it cannot be done this way, since it requires a parse of the grammar and not a simple pattern. But it could be made to work well enough for the actual cases you have.

    To allow it to be more fault-tolerant, I suggest you program your tool to report on all occurrences of class/struct it finds, along with the determination of "yes" (it is SI), "clearly no", and "can't really understand it". That way you can review the results and make sure it's not missing something.
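One possible shape for such a report is sketched below. The names `classify` and `report`, the `$base` pattern, and the sample source are all assumptions made for the example. The rules are deliberately conservative: anything that is not obviously one plain base or several plain bases lands in "unclear".

```perl
use strict;
use warnings;

# One base specifier: optional virtual/access keywords, then a possibly
# qualified name. Template arguments are deliberately not understood.
my $base = qr/^\s*(?:(?:virtual|public|protected|private)\s+)*(?:::)?\w+(?:::\w+)*\s*$/;

# $head is the text from "class"/"struct" up to, but not including, "{" or ";".
sub classify {
    my ($head) = @_;
    my ($name, $rest) = $head =~ /^(?:class|struct)\s+(\w+)\s*(.*)$/s
        or return 'unclear';
    return 'no' if $rest eq '';                # no base clause at all
    $rest =~ s/^:(?!:)// or return 'unclear';  # base clause must start with ':'
    my @bases = split /,/, $rest, -1;
    return 'unclear' if grep { $_ !~ $base } @bases;
    return @bases == 1 ? 'yes' : 'no';
}

# Returns one [verdict, head] pair per class/struct occurrence found.
sub report {
    my ($source) = @_;
    my @rows;
    while ($source =~ /\b((?:class|struct)\b[^{;]*)[{;]/g) {
        my $head = $1;
        $head =~ s/\s+/ /g;    # treat line breaks as ordinary whitespace
        $head =~ s/ $//;
        push @rows, [ classify($head), $head ];
    }
    return @rows;
}

# Hypothetical sample input, invented for illustration.
my $source = <<'CPP';
class Fwd;
class A : public B {
};
struct C : public B1,
           private B2 {
};
class Table : public std::map<int, char> {
};
CPP

my @rows = report($source);
printf "%-8s %s\n", @$_ for @rows;
```

Reviewing the "unclear" rows by hand then tells you which cases are worth teaching the pattern about, instead of silently losing them.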

    If I were to do this without elaborate parsing, I'd slurp the whole file as one string and pre-condition it: remove comments and funny extended keywords, then replace <> template arguments with a simple token. Since the declarations have not been found yet at that stage, the pre-conditioning must not introduce false positives or other artifacts that would mess up the next step. Then search for all matches with a pattern similar to those shown already, treating line breaks as whitespace so that declarations spanning several lines are handled.
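A minimal sketch of that pre-conditioning approach follows, under some loud assumptions: comments are stripped naively (string literals containing `//` or `/*` would be damaged), `__declspec(...)` stands in for whatever extended keywords your code base uses, and every template argument list is collapsed to `<>`. The names `precondition`, `$name` and `$si` are made up for the example.

```perl
use strict;
use warnings;

sub precondition {
    my ($src) = @_;
    $src =~ s{/\*.*?\*/}{ }gs;                  # block comments
    $src =~ s{//[^\n]*}{}g;                     # line comments
    $src =~ s/\b__declspec\s*\([^()]*\)//g;     # one example of an extended keyword
    # Collapse template argument lists, innermost first, to a placeholder,
    # then turn each placeholder into the simple token "<>".
    1 while $src =~ s/<[^<>{};]*>/\x01/g;
    $src =~ s/\x01/<>/g;
    $src =~ s/\s+/ /g;                          # line breaks become plain spaces
    return $src;
}

# A possibly qualified name, each part optionally followed by "<>".
my $name = qr/(?:::)?\w+(?:<>)?(?:::\w+(?:<>)?)*/;

# Exactly one base, then "{": a comma before the brace prevents a match.
my $si = qr/\b(?:class|struct)\s+($name)\s*:\s*(?:(?:virtual|public|protected|private)\s+)*($name)\s*\{/;

# Hypothetical sample input, invented for illustration.
my $source = <<'CPP';
// class Ghost : public Nothing {
class __declspec(dllexport) Table   /* lookup table */
    : public std::map<int, std::pair<char, char> >
{
};
class Multi : public A, public B {
};
CPP

my $clean = precondition($source);
my @found;
while ($clean =~ /$si/g) {
    push @found, [ $1, $2 ];
    print "$1 derives from $2\n";
}
```

Collapsing the template arguments first is what lets the comma keep its meaning as a separator between bases, so the multi-parameter template case and the multiple-inheritance case no longer look alike.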

Re: Finding C++ single inheritance occurrences
by roboticus (Canon) on May 19, 2011 at 20:28 UTC

    gibsonca:

    John's response shows many of the shortcomings of working on the source code. However, if the code already exists, then it presumably compiles. Perhaps you can parse the debugging information of the compiled object or executable files. The output of objdump is rather arcane, but looks regular enough to more easily parse than the original source code. Other compilers may have other debug or cross-reference output that you may find easier to handle.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.
