Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re^2: Swallowing an elephant in 10 easy steps

by ELISHEVA (Prior)
on Aug 13, 2009 at 18:27 UTC ( #788410=note: print w/ replies, xml ) Need Help??


in reply to Re: Swallowing an elephant in 10 easy steps
in thread Swallowing an elephant in 10 easy steps

The time drivers are the overall quality of the design, ease of access to code and database schemas, and the size of the system: the number of database tables, the complexity of the type/class system(s), the amount of code, and the number of features in whatever subsystem you explore in step 10. Rather than an average, I'll take the most recent example, Perl Monks.

The Perl Monks website has 83 data tables, two main type hierarchies (nodetypes and perl classes), a core engine of about 12K and about 600 additional code units spread throughout the database. Documentation is scattered and mostly out of date. The initial architecture seems solid but its features have been used inconsistently over time. Accessing the schema and code samples is slow because there is no tarball to download - it has to be done through the web interface or manually cut and pasted into files off line. The database/class assessment (1-4) took about 16 hours. Steps 5-7 took about 30 hours. Steps 8-10 took about 24 hours. All told that is 70 hours, including writing up documentation and formatting it with HTML.

However, I always like to leave myself some breathing space. If I were contracting to learn a system that size, I'd want 90 hours and an opportunity to reassess time schedules after the initial code walk through was complete. If a system is very poorly designed this process takes somewhat longer.

A crucial element in controlling time is controlling the amount of detail needed to gain understanding. It is easy to lose sight of the forest for the trees. That is why I advise stopping and moving onto the next phase once your categories give a place to most design elements and the categories work together to tell story. That is also why I recommend backtracking as needed. Sometimes we make mistakes about which details really matter and which can be temporarily blackboxed. Knowing I can backtrack lets me err on the side of black boxing.

The other element affecting time is, of course, the skill of the analyst or developer. I have the advantage that I have worked both at the coding and the architecture level of software. I doubt I could work that fast if I didn't know how to read code fluently and trace the flow of data through code. Having been exposed to many different system designs over the years also helps - architectural strategies leave telltale footprints and experience helps me pick up on those quickly.

However one can also learn these skills by doing. The more you practice scanning, categorizing and tracing through code and data the better you get at it. It will take longer, but the steps are designed to build on themselves and are, in a way, self-teaching. That is why you can't just do the 10 steps in parallel as jdporter jokingly suggests below.

However some theoretical context and a naturally open mind definitely helps: if you think that database tables should always have a one-to-one relationship with classes you will be very very confused by a system where that isn't true. If I had to delegate this work to someone else I probably would work up a set of reading materials on different design strategies that have been used in the past 30 years. Alternatively or in addition, I might pair an analyst with a programmer so that they could learn from each other (with neither having priority!)

Best, beth

Update: expanded description of the PerlMonks system so that it addresses all of the time drivers mentioned in the first paragaph.

Update: fixed miscalculation of time


Comment on Re^2: Swallowing an elephant in 10 easy steps
Re^3: Swallowing an elephant in 10 easy steps
by ig (Vicar) on Aug 19, 2009 at 23:11 UTC

    I haven't see the PerlMonks code, but I have seen the perl source and read much of the documentation. I wonder if you have studied it, how you would tackle it and what reasonable time line expectations might be for gaining an overall understanding of it.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://788410]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (4)
As of 2014-07-31 02:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (244 votes), past polls