
Re^4: heuristic to detect (perl) code

by LanX (Chancellor)
on Jan 20, 2013 at 10:46 UTC (#1014281)

in reply to Re^3: heuristic to detect (perl) code
in thread heuristic to detect (perl) code

It's not that easy

for instance:

1) A comment's '#' should (mostly) follow a newline or a semicolon. More precisely, the '#' shouldn't be preceded by a quote-like operator (s, y, tr, q, qq, qr, qw or whatever), since those can use '#' as their delimiter.

2) Strings are closed by the same quote that opened them, so you need to capture the opening quote and check the closing one with \1.

3) __DATA__ must appear at the start of a line. OTOH, the mere presence of __DATA__ is already a good indicator of Perl code.
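To make the three points concrete, here is a rough sketch in Python (chosen only for illustration; the regexes carry over to Perl almost unchanged). The function name `perl_hints` and the patterns are my own assumptions, and point 1 is simplified to "follows a line start or a semicolon" rather than tracking quote-like operators.

```python
import re

# Point 2: a string closes with the same quote that opened it.
# The opening quote is captured and matched again with \1.
STRING_RE = re.compile(r"""(["'])(?:\\.|(?!\1).)*\1""")

# Point 1 (simplified assumption): '#' starts a comment only at the
# start of a line or after a semicolon.
COMMENT_RE = re.compile(r"(?:^|;)\s*#.*$", re.MULTILINE)

# Point 3: __DATA__ only counts when it sits at the start of a line.
DATA_RE = re.compile(r"^__DATA__\s*$", re.MULTILINE)


def perl_hints(text):
    """Return crude per-criterion evidence that `text` contains Perl code."""
    # Blank out string literals first so a '#' inside quotes is not
    # mistaken for a comment.
    stripped = STRING_RE.sub('""', text)
    return {
        "strings": len(STRING_RE.findall(text)),
        "comments": len(COMMENT_RE.findall(stripped)),
        "data_marker": bool(DATA_RE.search(text)),
    }
```

Calling `perl_hints(post_text)` on a post gives three weak signals, not a verdict. Prose with apostrophes (as in "don't") will fool the string pattern, which is exactly why such criteria have to be tested against real posts.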

I think discussing individual strategies is in vain. In the end you have to test and train different criteria against a suitably large number of PerlMonks posts to see whether the code sections are found.

A Bayes classifier gives you a very good mathematical method for combining the probabilities from such criteria.
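As a minimal sketch of how that combination works, assuming the criteria are treated as independent (the "naive" Bayes assumption): each criterion contributes a likelihood ratio, and the ratios are multiplied onto the prior odds. The function name `combine` and all probabilities in the usage note are hypothetical; real values would come from training on labelled posts.

```python
import math


def combine(prior, likelihoods, observed):
    """Naive Bayes combination of independent boolean criteria.

    prior       -- P(code) before looking at any criterion
    likelihoods -- name -> (P(criterion fires | code), P(criterion fires | prose))
    observed    -- name -> True/False, whether the criterion fired
    Returns P(code | observations).
    """
    # Work in log-odds so many small factors do not underflow.
    log_odds = math.log(prior / (1 - prior))
    for name, (p_code, p_prose) in likelihoods.items():
        if observed.get(name):
            log_odds += math.log(p_code / p_prose)
        else:
            log_odds += math.log((1 - p_code) / (1 - p_prose))
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))
```

For example, with made-up numbers `{"data_marker": (0.10, 0.001), "comments": (0.60, 0.05)}` and a prior of 0.3, a post where both criteria fire ends up with a far higher score than one where neither does. A single weak criterion barely moves the result, which is the point of combining them.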

Some of the products I listed in the OP use this approach; they are just not trained on PerlMonks posts (where tiny code snippets also appear in running text) and may have too heavy a footprint to be integrated here.


For instance, highlight.js has a function which returns the guessed language.

Cheers Rolf
