PerlMonks  

How best to identify & Categorize Source Code?

by Zadeh (Beadle)
on Mar 31, 2008 at 22:19 UTC ( [id://677653]=perlquestion )

Zadeh has asked for the wisdom of the Perl Monks concerning the following question:

As an ad-hoc method, some Perl apps I use maintain an ever-growing list of extensions (.c, .cpp, .h, .pl, ...) to recognize whether a file is source code. I can think of a number of problems with this approach:

1) All files have to have an extension. It's not uncommon for people to save scripts and makefiles without one.
2) There's an implicit assumption that there is a one-to-one mapping between each unique extension and the kind of content it should have.
3) You have to continually maintain a list of these extensions.

There's got to be a better way. On *nix I often do something like this:

$ file -s some_file.c

and then see:

some_file.c: ASCII C program text

This brings me to some more questions: is there a nice, tidy Perl module that accomplishes this? If not, how best to implement it?
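
For illustration only, here is a minimal sketch of getting the same effect from Perl by shelling out to the system `file` utility. It assumes `file` is on the PATH and accepts `-b` (brief output, no filename prefix). The helper names `file_description` and `looks_like_source` are hypothetical, and the keyword match is an assumed heuristic based on descriptions such as "ASCII C program text", not anything `file` itself guarantees.

```perl
use strict;
use warnings;

# Ask the system `file` utility to describe a path.
# Returns the description string, or undef if the command could not be run.
sub file_description {
    my ($path) = @_;
    # List-form pipe open avoids passing the path through a shell.
    open(my $fh, '-|', 'file', '-b', '--', $path) or return;
    my $desc = <$fh>;
    close $fh;
    return unless defined $desc;
    chomp $desc;
    return $desc;
}

# Assumed heuristic: descriptions of source files tend to contain
# words such as "program", "script", or "source".
sub looks_like_source {
    my ($desc) = @_;
    return 0 unless defined $desc;
    return $desc =~ /\b(?:program|script|source)\b/i ? 1 : 0;
}

# Describe every path given on the command line.
for my $path (@ARGV) {
    my $desc = file_description($path);
    printf "%s: %s%s\n",
        $path,
        defined $desc ? $desc : 'unknown',
        looks_like_source($desc) ? ' [source]' : '';
}
```

This only works where a `file` binary exists, and the wording of its descriptions differs between versions, so the keyword list would probably need tuning.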

Replies are listed 'Best First'.
Re: How best to identify & Categorize Source Code?
by perrin (Chancellor) on Mar 31, 2008 at 22:50 UTC
      I made a go at this initially, but the only thing it returns so far is "text/plain" which doesn't help much. What am I missing?
        It should be using a technique very similar to the "file" command. Try feeding it your /etc/magic file.
Re: How best to identify & Categorize Source Code?
by apl (Monsignor) on Apr 01, 2008 at 09:55 UTC
    If you're on *nix, you could read the first line to see what compiler/interpreter/shell is invoked....
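
    A rough sketch of that first-line check, using only core Perl. The function names are hypothetical, and it only helps for scripts that carry a "#!" line; C sources and the like have none.

```perl
use strict;
use warnings;

# Return the interpreter named on a "#!" first line, or undef if there is none.
sub interpreter_from_shebang {
    my ($first_line) = @_;
    return unless defined $first_line;
    return unless $first_line =~ m{^#!\s*(\S+)(?:\s+(\S+))?};
    my ($prog, $arg) = ($1, $2);
    # "#!/usr/bin/env perl" names the real interpreter as env's first argument.
    if ($prog =~ m{(?:^|/)env$} && defined $arg) {
        $prog = $arg;
    }
    $prog =~ s{.*/}{};    # strip any leading directory
    return $prog;
}

# Read only the first line of a file and inspect it.
sub interpreter_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    my $first = <$fh>;
    close $fh;
    return interpreter_from_shebang($first);
}

# Report the interpreter for every path given on the command line.
for my $path (@ARGV) {
    my $interp = interpreter_of_file($path);
    printf "%s: %s\n", $path, defined $interp ? $interp : 'no shebang line';
}
```

    The sketch does not handle options passed to env (for example "env -S"), so treat the result as a hint rather than a definitive answer.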
Re: How best to identify & Categorize Source Code?
by Arunbear (Prior) on Apr 01, 2008 at 18:40 UTC
    There is File::Comments, though it is alpha software according to its docs. Alternatively, there is File for Windows, which may be useful if you're on Win32.
Re: How best to identify & Categorize Source Code?
by Errto (Vicar) on Apr 01, 2008 at 19:36 UTC
    Try File::Type. I've used it only a bit, but it at least claims to fix some of the problems with File::MMagic.
