Understanding unfamiliar source code can be difficult for programmers. I took on a software porting project in my first job, and this is how I got to grips with a large, unfamiliar codebase. I hope it is of use to any software developer trying to understand source code written by others.
It is harder to read source code than to write it.
Take Your Time
Take plenty of time to become familiar with the code. At this stage it is far more valuable to learn and understand the codebase than to jump in and risk adding incorrect code.
Take a Copy
Before you do anything else, take a snapshot of the code you receive and put it in a source control system (SCCS, RCS, CVS, Git, Mercurial, Subversion: whatever takes your fancy), so you can always get back to exactly what you were given. Include the documentation and any binaries too.
Use the Software
Forget the source code for a bit. Just use the software, over and over, with as many different options and set-ups as possible. Try to understand it fully from a user's perspective.
Skim through any documentation you have, particularly anything with diagrams and charts. (If there's something I don't understand, I mark it with a highlighter or a post-it note.)
If there are no docs, now is a good time to start creating them – your manager will love you :)
Try to find out about the history of the code – who wrote it, and why.
Read or Trace Through Code
(Note: this was written for sequential code; event-driven code will need a slightly different approach.)
Begin at the program's entry point: look for main(), an initialization function, or whatever the equivalent is in the language of the source code. Start there and step through the code by hand. (I did this on paper rather than with a debugger, but you might prefer to use one.)
Write down the major functions that get called, and then what each of those calls in turn. Note whether each function is part of the codebase or comes from an external library. Keep going until you have a good high-level overview of what happens when.
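Stepping through by hand is the approach described above. If you would rather have the program tell you what it calls, one optional complement is a tiny tracing helper. It records each function's name on entry, indented by call depth, so the log reads like a call tree. This is a minimal sketch, not a definitive tool. `TRACE_ENTER`, `TRACE_EXIT`, `trace_log` and the functions `read_config`, `load_data`, `parse_input` and `run_program` (a stand-in for `main()`) are all hypothetical names and are not taken from any real codebase or library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Trace state (hypothetical helper): current call depth and an in-memory log. */
static int trace_depth = 0;
static char trace_log[1024];
static size_t trace_len = 0;

/* Append the calling function's name, indented two spaces per level, so the
   log reads like a call tree. If the buffer fills up, later lines are dropped. */
static void trace_enter(const char *func)
{
    size_t room = sizeof trace_log - trace_len;
    int n = snprintf(trace_log + trace_len, room, "%*s%s\n",
                     trace_depth * 2, "", func);
    if (n > 0 && (size_t)n < room)
        trace_len += (size_t)n;
    trace_depth++;
}

/* Step back out one level; the assert catches an EXIT with no matching ENTER. */
static void trace_exit(void)
{
    assert(trace_depth > 0);
    trace_depth--;
}

/* Clear the log, e.g. before tracing a different start-up path. */
static void trace_reset(void)
{
    memset(trace_log, 0, sizeof trace_log);
    trace_len = 0;
    trace_depth = 0;
}

/* __func__ (C99) supplies the name of the enclosing function. */
#define TRACE_ENTER() trace_enter(__func__)
#define TRACE_EXIT()  trace_exit()

/* Hypothetical functions standing in for the real code being studied. */
static void read_config(void) { TRACE_ENTER(); TRACE_EXIT(); }
static void parse_input(void) { TRACE_ENTER(); TRACE_EXIT(); }
static void load_data(void)   { TRACE_ENTER(); parse_input(); TRACE_EXIT(); }

/* Stand-in for main(): the entry point the trace starts from. */
static void run_program(void)
{
    TRACE_ENTER();
    read_config();
    load_data();
    TRACE_EXIT();
}
```

To try it, call `trace_reset()` and then `run_program()` from your own `main()`, and print the log with `fputs(trace_log, stdout)`. It shows `run_program` at the top, `read_config` and `load_data` indented beneath it, and `parse_input` indented beneath `load_data`. Every traced function needs matching `TRACE_ENTER()`/`TRACE_EXIT()` calls, including before any early `return`. That makes this approach better suited to the handful of functions you are curious about than to a whole codebase.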
Read the Tests
If there are unit tests, read through them to see what behaviour they expect. You can even write some of your own to test your hypotheses about how the code works.
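A hypothesis test can be as small as a plain `assert()` from `<assert.h>`, with no test framework involved. In this sketch, `scale_reading()` is a hypothetical legacy function that stands in for the code you are trying to understand. Each test function pins down one guess about its behaviour.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical legacy function standing in for code you don't yet understand. */
static int scale_reading(int raw)
{
    if (raw < 0)
        return 0;
    if (raw > 1000)
        return 100;
    return raw / 10;
}

/* Each test records one hypothesis about scale_reading().
   Build without -DNDEBUG, otherwise assert() compiles to nothing. */
static void test_negative_input_is_clamped_to_zero(void)
{
    assert(scale_reading(-5) == 0);
}

static void test_division_truncates(void)
{
    assert(scale_reading(57) == 5);   /* 5.7 is truncated, not rounded */
}

static void test_large_input_is_capped(void)
{
    assert(scale_reading(1000) == 100);
    assert(scale_reading(5000) == 100);
}

typedef void (*test_fn)(void);

static const test_fn tests[] = {
    test_negative_input_is_clamped_to_zero,
    test_division_truncates,
    test_large_input_is_capped,
};

/* Run every test and return how many ran. A wrong hypothesis aborts
   with the failing expression, file and line. */
static int run_hypothesis_tests(void)
{
    size_t n = sizeof tests / sizeof tests[0];
    for (size_t i = 0; i < n; i++)
        tests[i]();
    printf("%zu hypotheses held\n", n);
    return (int)n;
}
```

If you call `run_hypothesis_tests()` from `main()`, it prints `3 hypotheses held` when every guess is right. If a guess is wrong, `assert()` aborts and reports the failing expression, file and line, which shows you exactly which assumption to revisit. Once your hypotheses are confirmed, the same tests become a safety net for when you start changing the code.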
Setting up a working build environment from scratch is a great way to find out what dependencies the system has, and it is a good way to understand the system anyway. Once you have it set up, you could move on to automating a build-and-test cycle.
Ask the Last Guy
If it's easy to do, or if you feel it's necessary after the other steps listed here, try to meet or at least call the previous programmers who worked on the code. Ask them about anything you don't understand, starting with high-level design issues and working down to lower-level coding issues.
Ask about the trade-offs they had to make. Ask about anything you marked in the docs. Ask them about the main functions, and where they had problems. Ask whether there are any specific areas they think they should show you.
Try to get as much as possible out of having them available, as you might not get another opportunity.
Further Reading
- Michael Feathers, Working Effectively with Legacy Code (book)
- Joel Spolsky, "Things You Should Never Do", Joel on Software (i.e. don't rewrite code from scratch)
This was originally written from the point of view of a cross-linked sequential C program. Suggestions for improvement, and other comments, are most welcome.