PerlMonks  

Parsing a PHP web application

by talexb (Chancellor)
on Aug 19, 2002 at 20:20 UTC ( [id://191303]=perlquestion )

talexb has asked for the wisdom of the Perl Monks concerning the following question:

I had a quick look around the Monastery and CPAN for something that would parse PHP and build some sort of a function structure and variable dictionary .. I have spent an enjoyable hour, but have turned up nothing obvious.

I'd like to use this tool against a mass of related PHP files in order to determine who calls what, which functions are in which modules, who includes what, and that type of information. The kind of stuff that is useful when trying to get a jump-start on understanding a web application.

If nothing exists, I may see what I can put together using Parse::RecDescent (I've been itching to find the right project for that module -- and the time).
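
Short of a real grammar, a rough first pass at this kind of inventory (what is defined, what is included, what gets called) can be had with a few regexes. The following is only a sketch under stated assumptions: `inventory_php`, the skip list, and the sample PHP are hypothetical, and the regexes make no attempt to ignore comments or string literals.

```perl
use strict;
use warnings;

# Words that look like calls ("if (...)") but are language constructs.
# This list is an assumption and is not exhaustive.
my %skip = map { $_ => 1 } qw(
    if elseif while for foreach switch function array list isset empty unset
    echo print return include include_once require require_once exit die
);

# Hypothetical helper: return a rough inventory of one PHP source string.
sub inventory_php {
    my ($source) = @_;
    my %info = ( defines => [], includes => [], calls => {} );

    # "function name(" is a definition; any other "name(" is treated as a call.
    # The lookbehind skips variable functions such as $fn(...).
    while ( $source =~ /\b(function\s+&?\s*)?(?<!\$)([A-Za-z_]\w*)\s*\(/g ) {
        my ( $is_def, $name ) = ( $1, $2 );
        if    ($is_def)         { push @{ $info{defines} }, $name }
        elsif ( !$skip{$name} ) { $info{calls}{$name}++ }
    }

    # include/require (with or without _once) taking a literal string.
    while ( $source =~ /\b(?:include|require)(?:_once)?\s*\(?\s*(["'])(.*?)\1/g ) {
        push @{ $info{includes} }, $2;
    }
    return \%info;
}

my $sample = <<'PHP';
<?php
require_once('lib/db.php');
include "header.php";

function show_page($id) {
    $row = db_fetch($id);
    if ($row) {
        render($row);
    }
}
?>
PHP

my $info = inventory_php($sample);
print "defines:  @{ $info->{defines} }\n";
print "includes: @{ $info->{includes} }\n";
print "calls:    ", join( ' ', sort keys %{ $info->{calls} } ), "\n";
```

Run per file and keyed by filename, the three lists would give a crude cross-reference. Method calls and includes built from variables are where a sketch like this falls down and a proper parser earns its keep.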

--t. alex
but my friends call me T.

Replies are listed 'Best First'.
Re: Parsing a PHP web application
by ignatz (Vicar) on Aug 19, 2002 at 21:59 UTC
    I did hack this together: PHP UML Diagram Generator. However, a real PHP grammar would be a very useful thing.

    AFTERTHOUGHT: As someone who makes his living mostly as a PHP programmer, I take a perverse pleasure in coding as little as possible in the language. It's my dream to someday obtain a Zen-like nirvana state where I code PHP without using PHP at all.

    ()-()
     \"/
      `                                                     
    
Re: Parsing a PHP web application
by talexb (Chancellor) on Aug 19, 2002 at 22:08 UTC
    My first cut at extracting the code from a PHP file, for those who might be interested. No Parse::RecDescent, it's not gorgeous code, but it seems to work well so far.
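
The linked code is not reproduced in this thread. For readers who want the general idea, here is a minimal sketch of pulling the PHP out of a mixed HTML/PHP file. It is not talexb's code: `extract_php_blocks` and the sample text are hypothetical, and a `?>` inside a string literal or comment would fool it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the code found between <?php (or <? / <?=)
# and the next ?>. An unclosed final block runs to the end of the input.
sub extract_php_blocks {
    my ($text) = @_;
    my @blocks;
    while ( $text =~ /<\?(?:php\b|=)?(.*?)(?:\?>|\z)/gs ) {
        my $code = $1;
        $code =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @blocks, $code if length $code;
    }
    return @blocks;
}

my $page = <<'HTML';
<html><?php echo 'hi'; ?>
<p><? $x = 1; ?></p>
<?php last();
HTML

my @blocks = extract_php_blocks($page);
print "$_\n" for @blocks;
```

Note that an XML declaration such as `<?xml ... ?>` would also be picked up by this pattern, so real input may need an extra check.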
Re: Parsing a PHP web application
by andreychek (Parson) on Aug 20, 2002 at 22:57 UTC
    talexb and I briefly discussed in the chatterbox that this might be possible using Vim syntax files. If you were to write a Perl parser for the syntax files, you could obtain a lot of information about the PHP scripts you are interested in.

    The syntax files contain lines that look like the following:
        " Env Variables
        syn keyword phpEnvVar GATEWAY_INTERFACE SERVER_NAME SERVER_SOFTWARE SERVER_PROTOCOL REQUEST_METHOD QUERY_STRING DOCUMENT_ROOT HTTP_ACCEPT HTTP_ACCEPT_CHARSET HTTP_ENCODING HTTP_ACCEPT_LANGUAGE HTTP_CONNECTION HTTP_HOST HTTP_REFERER HTTP_USER_AGENT REMOTE_ADDR REMOTE_PORT SCRIPT_FILENAME SERVER_ADMIN SERVER_PORT SERVER_SIGNATURE PATH_TRANSLATED SCRIPT_NAME REQUEST_URI contained

        " Internal Variables
        syn keyword phpIntVar GLOBALS HTTP_GET_VARS HTTP_POST_VARS HTTP_COOKIE_VARS HTTP_POST_FILES HTTP_ENV_VARS HTTP_SERVER_VARS PHP_ERRMSG PHP_SELF HTTP_RAW_POST_DATA HTTP_STATE_VARS _GET _POST _COOKIE _SERVER _ENV contained

        " Function names
        syn keyword phpFunctions apache_lookup_uri apache_note ascii2ebcdic ebcdic2ascii getallheaders virtual apache_child_terminate apache_setenv contained
        syn keyword phpFunctions array array_change_key_case array_chunk array_count_values array_diff array_filter array_flip array_fill array_intersect array_key_exists array_keys array_map array_merge array_merge_recursive array_multisort array_pad array_pop array_push array_rand array_reverse array_reduce array_shift array_slice array_splice array_sum array_unique array_unshift array_values array_walk arsort asort compact count current each end extract in_array array_search key krsort ksort list natsort natcasesort next pos prev range reset rsort shuffle sizeof sort uasort uksort usort contained

        " Comment
        syn region phpComment start="/\*" end="\*/" contained contains=phpTodo extend
        syn match phpComment "#.\{-}\(?>\|$\)\@=" contained contains=phpTodo
        syn match phpComment "//.\{-}\(?>\|$\)\@=" contained contains=phpTodo
    The above are just a few snippets from the php.vim file I received with Vim 6.1. With this, it doesn't seem like it would be too difficult to write a script to parse this syntax file, then pull out interesting information from the actual PHP scripts.

    Using the above snippets, perhaps we could create four arrays: phpEnvVar, phpIntVar, phpFunctions, and phpComment. Then, just use split or the like to put each variable, function name, and comment from the syntax file into its respective array. Once you have all the information you care about parsed out of the syntax file, you could use any number of means to extract useful info out of the PHP scripts. As talexb mentioned, Parse::RecDescent seems like a good candidate for this. However, the adventurous may even be able to get it to work with a combination of Tie::File and Quantum::Superpositions.
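
As a rough illustration of the split approach, the sketch below collects `syn keyword` lines into one array per group and then uses the function list to spot built-in calls in a bit of PHP. The names `parse_syn_keywords`, the sample lines, and the sample PHP are hypothetical. It only strips the `contained` argument seen in the snippets above, and it skips `syn region` and `syn match` lines, since phpComment is defined by patterns rather than a word list and would need different handling.

```perl
use strict;
use warnings;

# Hypothetical helper: gather "syn keyword GROUP word ..." lines by group.
sub parse_syn_keywords {
    my (@lines) = @_;
    my %words;
    for my $line (@lines) {
        # Vim comment lines and syn region/match lines do not match and are skipped.
        next unless $line =~ /^\s*syn(?:tax)?\s+keyword\s+(\S+)\s+(.*)$/;
        my ( $group, $rest ) = ( $1, $2 );
        # Drop the "contained" argument; everything else is taken as a keyword.
        push @{ $words{$group} }, grep { $_ ne 'contained' } split ' ', $rest;
    }
    return \%words;
}

my @lines = (
    '" Function names',
    'syn keyword phpFunctions apache_note virtual contained',
    'syn keyword phpFunctions array_keys count sort contained',
    'syn keyword phpIntVar GLOBALS _GET _POST contained',
    'syn region phpComment start="/\*" end="\*/" contained contains=phpTodo extend',
);
my $words = parse_syn_keywords(@lines);

# Which built-in functions does this piece of PHP call?
my %is_builtin = map { $_ => 1 } @{ $words->{phpFunctions} };
my $php        = '$n = count(array_keys($_GET)); my_helper($n);';
my @used       = grep { $is_builtin{$_} } $php =~ /\b([A-Za-z_]\w*)\s*\(/g;
print "built-ins used: @used\n";
```

In practice the lines would come from reading php.vim itself, and other keyword arguments (anything beyond `contained`) would need to be filtered out as well.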

    Good luck!
    -Eric

    --
    Lucy: "What happens if you practice the piano for 20 years and then end up not being rich and famous?"
    Schroeder: "The joy is in the playing."
