A simple C checking script

by heatblazer (Scribe)
on Mar 28, 2012 at 08:51 UTC (#962109=perlquestion)
heatblazer has asked for the wisdom of the Perl Monks concerning the following question:

Hello. Recently a friend of mine, who is studying programming at university for a master's degree, asked me to help write a script for Unix that identifies a C file and opens it for reading or writing. Here is my suggested script:

#!/usr/bin/perl
use warnings;
use utf8;
use strict;

my $arguments = shift;

# Can we compile the file? The list form of system() avoids the shell.
if ( system('gcc', $arguments) == 0 ) {
    print "C file.. opening... \n";
    open(my $fh, '<', $arguments) or die("No file: $!\n");
    while ( <$fh> ) {
        print $_;    # read it, edit it or whatever is needed
    }
    close($fh);
}
else {
    print "Not a C file\n";
}

I know you'd suggest searching the file with a regex or matching the .c extension. That's fair, but you can't be certain a file is C just by matching #include <stdio.h> or int main(void)... So I suggested trying to compile it: if that succeeds it's a C file, and if it fails it's something else. Opinions?

Replies are listed 'Best First'.
Re: A simple C checking script
by GrandFather (Sage) on Mar 28, 2012 at 09:16 UTC

    Somewhat depending on what you consider to be a C file, that is a highly non-trivial task! I don't think the compiler test would satisfy me because it would generate too many false negatives and maybe even some false positives. I've written plenty of files that I would consider C (or C++) files, but which a compiler would fail to compile, either because of missing header files or because of syntax errors.

    In like fashion probing the file for interesting common character sequences is likely to be rather error prone.

    What is your friend's use case? Often if you focus just on solving the problem for the use case you can simplify the solution and end up with something that is robust for the task at hand. For example just matching files that have a .c extension may be sufficient and is very easy, fast to implement and very fast executing.

    True laziness is hard work

      Well, it appears that parsing some gcc output would be nice too. I agree that a simple grep-style match for .c and .h files is easy, but it doesn't really solve the problem, since nothing stops you from writing Perl/JS/HTML in .c or .h files... or just some rubbish ASCII text with no particular purpose. My idea is: if it compiles, it's C.

Re: simple C checking script
by Ratazong (Monsignor) on Mar 28, 2012 at 09:22 UTC

    Are you sure you are not over-extending the request of your friend?

    Looking at the content is difficult:

    • not every C-file needs main()
    • not every C-file includes stdio.h
    The idea to compile the file to see if it runs will also lead to problems:
    • what if it is a C-file, which however contains a syntax-error? Probably it should be listed nevertheless...
    • what if the C-file needs some non-standard compile options (and compiling fails if you don't provide them?)
    • what if it is a simple C++ program/file which nevertheless compiles as C?
    Therefore I would just look at the extensions .c and .h (and possibly additional extensions relevant to your domain).
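
The extension check suggested here could be sketched in Perl roughly as follows. The helper name `looks_like_c_source` and its default extension list are assumptions made for illustration, not something from the thread; the check looks only at the file name and never at the content.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the file name ends in one of the given
# extensions (default: .c and .h), otherwise 0. The match is case-sensitive
# on purpose, since gcc treats an upper-case .C suffix as C++.
sub looks_like_c_source {
    my ($path, @extensions) = @_;
    @extensions = qw(c h) unless @extensions;
    my $alternation = join '|', map { quotemeta($_) } @extensions;
    return $path =~ /\.(?:$alternation)\z/ ? 1 : 0;
}

# Print the command-line arguments that pass the check.
print "$_\n" for grep { looks_like_c_source($_) } @ARGV;
```

Extra extensions for a particular domain can be passed after the path, for example `looks_like_c_source($file, qw(c h inc))`.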

    HTH, Rata

      OK, then a combined check: a first pass for .c and .h, then maybe a compile check... Well, it's very rough code that I typed in a minute or two as it came into my head. There could be more additions, like using gcc's -Wall or maybe valgrind too, but it becomes complicated once you have to parse valgrind output. And yes, C++ can be built with gcc, but since C++ is an addition to C... I don't know why he wants to detect C files rather than C++. Actually, it's no trivial task if he intends to do it completely. Matching .c and .h could return files with any code inside: Java, Perl, C++... What is stopping you from writing JavaScript in a .c file?
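
A minimal sketch of such a combined check, assuming gcc is on the PATH. The names `gcc_accepts` and `classify_c_file` and the three labels are hypothetical, chosen for illustration. It reports "right extension but does not compile" as its own category instead of "not C", since a C file with a syntax error or a missing header is arguably still a C file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Default compile check: ask gcc to check the file without writing any
# output files. Assumes gcc is installed; -fsyntax-only is a gcc option.
sub gcc_accepts {
    my ($path) = @_;
    # The list form of system() bypasses the shell, so odd file names are safe.
    return system('gcc', '-fsyntax-only', $path) == 0;
}

# Hypothetical classifier returning one of three labels:
#   'not-c'    - the extension is not .c or .h
#   'c'        - right extension and the compile check passes
#   'c-broken' - right extension but the compile check fails
# The optional second argument is a code reference standing in for the compiler.
sub classify_c_file {
    my ($path, $compile_check) = @_;
    $compile_check ||= \&gcc_accepts;
    return 'not-c' unless $path =~ /\.[ch]\z/;
    return $compile_check->($path) ? 'c' : 'c-broken';
}

# Report a label for every file named on the command line.
printf "%-10s %s\n", classify_c_file($_), $_ for @ARGV;
```

Doing the cheap extension test first means the compiler only runs on plausible candidates, and passing the compile check in as a code reference makes it easy to swap in different compiler options later.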

        I fear you will never get a solution that extracts all c-files, and no others... no matter how hard you work.

        That's why you should spend more time with your friend first (as suggested by GrandFather):

        • how fault-tolerant should your solution be? (e.g. a 90%-detection-rate might be fine)
        • is it more desirable to have false positives (files wrongly classified as C) or false negatives (C files that have been missed)?
        • what is the environment (how likely are people to use "wrong" extensions, e.g. a Perl script in a .c file)?
        • how fast should your solution be? (e.g. is it acceptable to run a lot of plausibility checks on hundreds of 20 MB .cpp files to ensure that none of them is in reality C?)
        In my experience, the time spent understanding your customer's use case and investigating with him where it is possible to "cut corners" is well invested. It opens up easier solutions and leads to higher customer satisfaction. And fewer change requests afterwards ...

        HTH, Rata
