PerlMonks  

Re^4: heuristic to detect (perl) code

by LanX (Canon)
on Jan 20, 2013 at 10:46 UTC ( #1014281=note )


in reply to Re^3: heuristic to detect (perl) code
in thread heuristic to detect (perl) code

It's not that easy

for instance:

1) A comment's '#' should (mostly) follow a newline or a semicolon; or, to be more precise, the '#' shouldn't be directly preceded by a quote-like operator (s, y, tr, q, qq, qr, qw or whatever), because there it acts as a delimiter.

2) Strings are closed by the same quote that opened them, so you need to capture the opening one and check the closing one with \1.

3) __DATA__ must appear at the start of a line; OTOH the existence of __DATA__ is already a good indicator of Perl code.
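
To make the three points concrete, here is a rough sketch of how they might look as regexes. It is written in Python only for illustration (the patterns carry over to Perl almost unchanged). The names `first_hash_is_comment` and `perl_signals` are made up for this example, 'm' is my own addition to the operator list, and only the first '#' per line is inspected, so things like `$#array` will still fool it.

```python
import re

# '#' is a delimiter, not a comment, when it directly follows a quote-like
# operator (q#...#, s#a#b#, ...). 'm' is an assumption added to the list.
QUOTE_LIKE_BEFORE_HASH = re.compile(r"(?<![\w$@%])(?:qq|qw|qr|tr|q|y|s|m)$")

# Capture the opening quote and require the same one to close it (\1).
QUOTED_STRING = re.compile(r"""(["'])(?:\\.|(?!\1).)*\1""")

# __DATA__ only counts when it appears at the start of a line.
DATA_MARKER = re.compile(r"^__DATA__\b", re.MULTILINE)


def first_hash_is_comment(line: str) -> bool:
    """Rough check: does the first '#' in the line start a comment?"""
    pos = line.find("#")
    if pos == -1:
        return False
    # If the text before '#' ends in a quote-like operator, '#' is a delimiter.
    return QUOTE_LIKE_BEFORE_HASH.search(line[:pos]) is None


def perl_signals(text: str) -> dict:
    """Report which of the three (hypothetical) signals fire for a text."""
    return {
        "comment": any(first_hash_is_comment(l) for l in text.splitlines()),
        "string": QUOTED_STRING.search(text) is not None,
        "data": DATA_MARKER.search(text) is not None,
    }
```

Note that the string check also fires on ordinary prose with two apostrophes, which is exactly why none of these signals is reliable on its own.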

I think discussing single strategies is in vain; in the end you have to test and train different criteria against a suitably large number of PerlMonks posts to see whether the code sections are found.
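
A minimal sketch of what "testing criteria against posts" could mean in practice: count how often each criterion fires on snippets already labelled as code versus plain text. The function name `hit_rates`, the add-one smoothing and the two toy criteria are assumptions for illustration, and the four samples are invented, not real posts.

```python
from typing import Callable, Dict, List, Tuple


def hit_rates(
    samples: List[Tuple[str, bool]],
    criteria: Dict[str, Callable[[str], bool]],
) -> Dict[str, Tuple[float, float]]:
    """Return {name: (P(hit | code), P(hit | text))} with add-one smoothing."""
    n_code = sum(1 for _, is_code in samples if is_code)
    n_text = len(samples) - n_code
    rates = {}
    for name, test in criteria.items():
        hits_code = sum(1 for s, is_code in samples if is_code and test(s))
        hits_text = sum(1 for s, is_code in samples if not is_code and test(s))
        # Smoothing keeps the estimates away from exactly 0 or 1.
        rates[name] = (
            (hits_code + 1) / (n_code + 2),
            (hits_text + 1) / (n_text + 2),
        )
    return rates


# Invented toy data: (snippet, is_code)
samples = [
    ("my $x = 1;", True),
    ("print $x;", True),
    ("I think this is easy.", False),
    ("It costs $5; sorry.", False),
]
criteria = {
    "has_sigil": lambda s: "$" in s,
    "ends_semicolon": lambda s: s.rstrip().endswith(";"),
}
rates = hit_rates(samples, criteria)
```

A criterion is only useful if the two rates differ clearly; here `has_sigil` fires on prose too, so it carries less weight than `ends_semicolon`.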

With a Bayes classifier there is a very good mathematical method to combine the probabilities of such criteria.
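
As a rough sketch of the combination step, assuming the naive variant (criteria treated as independent given the class, which is my assumption and not something the listed products are known to do): each criterion contributes a likelihood ratio, and the ratios are multiplied onto the prior odds. The name `posterior_code` and all numbers below are invented for illustration.

```python
import math
from typing import Dict, Tuple


def posterior_code(
    prior: float,
    likelihoods: Dict[str, Tuple[float, float]],
    observed: Dict[str, bool],
) -> float:
    """Naive Bayes: P(code | observations).

    likelihoods maps a criterion name to (P(hit | code), P(hit | text));
    observed maps a criterion name to whether it fired on this snippet.
    """
    log_odds = math.log(prior / (1 - prior))
    for name, hit in observed.items():
        p_code, p_text = likelihoods[name]
        if not hit:
            # A criterion that did not fire is evidence too.
            p_code, p_text = 1 - p_code, 1 - p_text
        log_odds += math.log(p_code / p_text)
    return 1 / (1 + math.exp(-log_odds))


# Invented example numbers, not measured on real posts.
likelihoods = {"ends_semicolon": (0.75, 0.25), "has_sigil": (0.75, 0.5)}
p_both = posterior_code(0.5, likelihoods, {"ends_semicolon": True, "has_sigil": True})
p_one = posterior_code(0.5, likelihoods, {"ends_semicolon": True, "has_sigil": False})
```

Working in log-odds avoids underflow when many weak criteria are combined, and a criterion whose two rates are equal drops out automatically.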

Some of the products I listed in the OP use this approach; they are just not trained on PerlMonks posts (where tiny code snippets also appear within text) and maybe have too heavy a footprint to be integrated here.

update

For instance highlight.js has a function which returns the guessed language.

Cheers Rolf

