Simple Programming Tip #3 by Charlie Calvert

By: Charlie Calvert

Abstract: Learn how refactoring can help you create robust and easy to maintain programs.

"I'd advise you to run your process with a regular rhythm of releases, each mapped to a set of use cases and each representing a successive refinement of the system's architecture." -Grady Booch

"Refactoring is the process of taking a running program and adding to its value, not by changing its behavior but by giving it more of those qualities that enable us to continue developing at speed." - Kent Beck

The subject of this tip is refactoring. The overall theme can be expressed as follows: Programmers need to constantly refactor their code in order to make their logic easier to understand and maintain. In order to do this safely, they need to cover their classes with unit tests.

If you prefer seeing these ideas laid out as bullet points, then here is an alternative presentation. Programmers refactor their code to make it:

  1. Easier to understand.

  2. Easier to maintain.

  3. Easier to test.

What is Refactoring?

For the purposes of this article, I will define refactoring as follows: It is the process of renaming or restructuring classes, methods and variables.

If you draw an analogy between programming and writing, then refactoring your code can be thought of as a process similar to editing or rewriting a piece of text. If you go through an existing written work and focus on clarity of presentation, on finding exactly the right way of expressing an idea, and on adding clarity by breaking ideas out into new paragraphs, new sub-headings, and new chapters, then you are engaged in a process similar to refactoring.

Another interesting definition of refactoring is offered by Martin Fowler in his book "Refactoring," from Addison Wesley. Fowler writes: Refactor: (verb): to restructure software by applying a series of refactorings without changing its observable behavior. This definition is less than ideal in that it is self referential: it uses the word refactorings to define the verb refactor. Nevertheless, it brings out a key point: refactoring is not about adding new features, it is about refining existing features. This theme will appear several times in this article.
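The phrase "without changing its observable behavior" can be made concrete with a small sketch. The Python code below is only an illustration: the function names, the planet data, and the choice of language are assumptions made for this example and do not come from Fowler's book. It shows one function before and after a refactoring, followed by a check that both versions produce the same result.

```python
# Hypothetical example: all names and data are invented for illustration.

def report_before(planets):
    # Original version: one function filters, formats, and joins.
    out = []
    for p in planets:
        if p[1] > 1.0:
            out.append(p[0].upper() + ": " + str(p[1]))
    return "\n".join(out)


def is_larger_than_earth(planet):
    """Return True if the planet's radius (in Earth radii) exceeds 1.0."""
    _name, radius = planet
    return radius > 1.0


def format_planet(planet):
    """Render one planet as 'NAME: radius'."""
    name, radius = planet
    return f"{name.upper()}: {radius}"


def report_after(planets):
    # Refactored version: same output, but each step now has a name.
    return "\n".join(format_planet(p) for p in planets if is_larger_than_earth(p))


PLANETS = [("Mercury", 0.38), ("Earth", 1.0), ("Jupiter", 11.2)]

# The observable behavior is unchanged, which is what makes this a refactoring.
assert report_before(PLANETS) == report_after(PLANETS)
```

No feature was added here. The only change is that the filtering rule and the formatting rule can now be read, reused, and tested on their own.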

A few examples might help. If you have a variable called MyList, and you clarified its purpose by renaming it MyListOfPlanets or MyPlanetList, then you have refactored your code.

A more interesting example of refactoring might involve taking a single class and breaking it out into two classes. Such an action might enhance the clarity of your code or its re-usability. For instance, you might take a class called MyFilterList and break it out into two classes called MyPlanetTextFilter and MyPlanetList.

Here are three specific advantages you gain by breaking MyFilterList out into two classes called MyPlanetTextFilter and MyPlanetList:

  1. Names such as MyPlanetTextFilter and MyPlanetList are easier to understand than MyFilterList. Take a moment to think about this issue and you can see why this is true. If you hear the term MyFilterList you might ask what is being filtered and what is being listed. To discover the answer to these questions, you would need to read the code in the class. In short, you would have to become a human compiler and start parsing code in order to understand its purpose. A word like MyPlanetList, on the other hand, explains up front that the purpose of the class is to maintain a list of planets. It makes your code easier to understand.
  2. It is easier to maintain two small classes called MyPlanetList and MyPlanetTextFilter than it is to maintain a single big class called MyFilterList. Once again, it is easy to understand why this is the case. If you encounter a bug in MyFilterList, the first question you would have to ask yourself is whether the bug is in the list of planets, or in the filtering of the list of planets. You have to be sure that your fix to one part of the class does not break the other part. If, on the other hand, you are working with a class called MyPlanetList, and find a bug in it, then you need concern yourself only with a single problem domain: the act of maintaining a list of planets. In short, refactoring your code into two smaller classes helps make your code easier to debug and maintain.
  3. Finally, having broken MyFilterList out into two classes makes your code easier to test. If you are writing tests for MyFilterList, then you have to compose two types of test, one to test the status of the list of planets, and one to test the filtering of the planets. It is obviously easier to write a test for a class like MyPlanetList, since the purpose of the class is so easy to define and understand. Furthermore, you can be sure that you are testing only bugs related to the planet list itself, and not accidentally using your test to uncover bugs in the filtering process.
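The three points above can be sketched in code. The class names MyPlanetList and MyPlanetTextFilter come from the discussion; everything else (the method names, the case-insensitive matching rule, and the use of Python with its standard unittest module) is an assumption made purely for illustration.

```python
import unittest


class MyPlanetList:
    """Maintains a list of planet names and nothing else."""

    def __init__(self):
        self._planets = []

    def add(self, name):
        self._planets.append(name)

    def names(self):
        # Return a copy so callers cannot modify the internal list.
        return list(self._planets)


class MyPlanetTextFilter:
    """Selects the names that contain a piece of text, ignoring case."""

    def __init__(self, text):
        self._text = text.lower()

    def apply(self, names):
        return [n for n in names if self._text in n.lower()]


class MyPlanetListTest(unittest.TestCase):
    def test_add_keeps_insertion_order(self):
        planets = MyPlanetList()
        planets.add("Mars")
        planets.add("Venus")
        self.assertEqual(planets.names(), ["Mars", "Venus"])


class MyPlanetTextFilterTest(unittest.TestCase):
    def test_filter_ignores_case(self):
        # No MyPlanetList is needed here; the filter works on plain names.
        text_filter = MyPlanetTextFilter("AR")
        self.assertEqual(
            text_filter.apply(["Mars", "Earth", "Venus"]), ["Mars", "Earth"]
        )
```

If this were saved in a file, the tests could be run with Python's built-in runner (python -m unittest followed by the file name). Notice that each test class exercises exactly one problem domain: a failure in MyPlanetListTest cannot be caused by the filtering logic, and the filter test never touches the list class.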

To keep my presentation simple, I have come up with a simple example. There is, however, a second reason why this example may seem exceedingly obvious to you. We refactor our code in order to make it easy to understand. As a result, well refactored code should have an obvious, intuitive, perhaps even trivial feeling to it.

Who is Interested in Refactoring?

Not all programmers will be interested in refactoring. To better understand why this is the case, you need to consider three possible schools of thought about designing a program:

  1. One school suggests completely planning out your program ahead of time, defining all your program's functionality and classes. This design document then becomes an immutable guide which must be followed to the letter, regardless of consequences.
  2. Another school of thought advocates developing your code incrementally via an iterative process. Start out with a few basic design goals, then sit down and implement them. Now test your code, ask for feedback, and redesign your code to include any improvements that emerged from the testing and feedback sessions. Continue this process until you have developed an application that passes all your tests and fulfills the practical suggestions you got during feedback sessions.
  3. Combine the two methods by starting out with a moderately specific plan, and then enhance that plan by an iterative process of incremental improvements as outlined in the previous bullet point.

If you are an advocate of the first school of programming, then refactoring is not going to be important to you. After all, if everything was set in stone from the beginning, then why would you ever need to restructure your classes or rename your variables? However, if you share my advocacy of the latter schools of thought, then you need to think about refactoring your code.

Another group of programmers who will not be interested in refactoring are those who continually want to add yet one more feature to a program. If you are this type of programmer, then the whole concept of refactoring your code, and of writing unit tests, will sound boring, or worse, like a waste of time.

I would add that from my point of view, refactoring and unit testing are both practical and intellectually satisfying. When I can work at my own pace, I find programming to be more interesting than playing chess, reading a novel, watching a movie or playing a computer game. What specifically is it about programming that I find so interesting? To me, the most exciting part of writing code is finding the right structure for my program. Not quite as enthralling, but still interesting, is the act of writing tests to prove that my architecture is sound.

More obliquely, but perhaps more tellingly, Kent Beck presents us with the following jewel of wisdom: "If you can get today's work done today, but you do it in such a way that you can't possibly get tomorrow's work done tomorrow, then you lose." Kent Beck is one of the founders of the school of programming discussed in this tip. His goal is to make sure that programmers start winning, and stop losing.

The Primary Benefit of Refactoring

Let's think about refactoring your code from a slightly different perspective. From this new point of view, the main purpose of refactoring is to encapsulate code inside increasing levels of abstraction. That can sound like a complex process on first hearing. However, it is meant to promote not complexity, but simplicity.

When we work at higher levels of abstraction, we can think about complex ideas in simpler terms. Consider the following two ways of describing an object:

  1. This object is made from trees that were cut down, ground up into pulp, and then mashed together into thin white sheets. On these sheets of wood pulp, an ink prepared from refined petroleum and plant products is stamped on pages in patterns which are meaningful to trained carbon based entities who have a sophisticated cerebral cortex and a refined visual ability.
  2. This object is a book.

The first description is more specific and detailed, the second is more abstract. However, they are both ways of talking about the same object.

When it comes to reusing the same idea, most people would prefer the second explication to the first. The same is true in programming: we always have the option of writing out a series of complicated steps over and over again. We can, however, simplify the process by working at a higher level of abstraction. This usually means we encapsulate a series of steps inside an object.

We are able to use the word book in conversation because we can be sure that most listeners have a good and valid understanding of the term. In programming, we can reuse an object easily if it is well structured and well defined. One of the primary goals of refactoring is finding that well structured and well defined presentation for an object.
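To make the idea of encapsulating a series of steps inside an object a little more tangible, here is a minimal sketch in Python. The PlanetCatalog class, its methods, and the text format it parses are all hypothetical, invented only for this illustration. The parsing steps are written once, inside the object, and callers then work with the simpler idea of "a catalog".

```python
class PlanetCatalog:
    """Hypothetical class that hides the parsing steps behind one name."""

    def __init__(self, radii):
        self._radii = dict(radii)

    @classmethod
    def from_text(cls, text):
        # The detailed steps live here, in one place.
        radii = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            name, radius = line.split(",")
            radii[name.strip()] = float(radius)
        return cls(radii)

    def radius_of(self, name):
        return self._radii[name]


SOURCE = """
# name, radius in Earth radii
Earth, 1.0
Mars, 0.53
"""

# Callers think in terms of a catalog, not in terms of splitting lines.
catalog = PlanetCatalog.from_text(SOURCE)
assert catalog.radius_of("Mars") == 0.53
```

Without the class, every piece of code that needed a radius would have to repeat the strip, skip, split, and convert steps, which is the programming equivalent of describing wood pulp and ink every time you mean "book".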

What is Wrong with Refactoring and Unit Testing?

There are no magic bullets in programming. The subject is hard no matter what tools you use. Give me a moment to set up my argument, and I will try to explain what can go wrong with this technology.

When refactoring code, we want to move increasingly toward simplified levels of abstraction. We want to take complex operations and encapsulate them inside a set of objects that are easy to understand and test. Consider the following analogy. If we are trying to create a computer program that simulates a library, we might first start with a single object called Library. Then we might see that our Library consisted of several shelves of books. Rather than incorporate shelves as part of the Library object, we might instead create a Shelf object and a ShelfList object. Then we might notice that a Shelf contained many books. Again, we see the need to break out the concept of a book into a separate Book object. And so on, as you discover the chapters inside the book, and the paragraphs inside the chapters, etc.
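The library analogy can be sketched as follows. The object names Library, ShelfList, Shelf and Book come from the paragraph above; the fields, the methods, and the use of Python dataclasses are assumptions made for the sake of a short example, and a real design would likely differ.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str


@dataclass
class Shelf:
    label: str
    books: list = field(default_factory=list)

    def add(self, book):
        self.books.append(book)


@dataclass
class ShelfList:
    shelves: list = field(default_factory=list)

    def add(self, shelf):
        self.shelves.append(shelf)

    def all_books(self):
        # Flatten the books on every shelf into one list.
        return [book for shelf in self.shelves for book in shelf.books]


@dataclass
class Library:
    shelves: ShelfList = field(default_factory=ShelfList)

    def titles(self):
        return sorted(book.title for book in self.shelves.all_books())


fiction = Shelf("Fiction")
fiction.add(Book("Emma"))
technical = Shelf("Technical")
technical.add(Book("Refactoring"))

library = Library()
library.shelves.add(fiction)
library.shelves.add(technical)
assert library.titles() == ["Emma", "Refactoring"]
```

Each class is only a few lines long and can be tested by itself, but there are now four of them where a single Library object might have sufficed at first. That trade-off is the subject of the next few paragraphs.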

All object oriented programmers do some of this. Advocates of Agile or XP programming in general, and of refactoring and unit testing in particular, take this idea to an extreme. In the end, they have a lot of small classes. They might even seem to be adding to the level of complexity in their program. Instead of one or two objects, they now have many small objects.

People who don't like unit testing and refactoring always eventually come around to this point as the core of their criticism of the whole methodology. They will say, "Look, ultimately you are left with all these small objects and seeing how they fit together is not easy." No one is trying to deny this fact. Programming is difficult, and refactoring and unit testing are not a magic bullet that will suddenly make it simple. It still takes time to come to understand a well refactored program. The point, however, is that refactoring is an effective way to discover a very good structure for your program. It is, we advocates of the technique believe, a better way to find the right structure than you can achieve by planning everything out in detail ahead of time. Yes, you still need a document that describes the structure of your program so that newcomers can see how it is put together. And you still need to do some planning ahead of time. However, if you restructure properly and carefully, then your architecture should fit together neatly in a logical and cleanly thought out manner. It will be easy to test and easy to maintain. The point is that restructuring helps you find and refine your architecture, and unit testing helps you prove that your architecture is valid.

Extreme Refactoring

The great advantage of frequently unit testing and refactoring your code is that it helps you structure a process that is otherwise amorphous. When I think that I can deliver a project in one month when it really takes me two, I am actually quite correct in my original assumption. I will in fact spend about one month of those two months actually planning and writing my code. The other month will be spent trying to make sure it works right, and making sure I know enough about its structure to be able to add features and debug the code. The great thing about unit testing and refactoring is that they take that second, missing month of development, and give it a definable structure and purpose. They don't make it go away, but they give that period structure.

If you properly unit test and refactor your code, then you always know that your code works, and you always know its structure and how to test and amend that structure. Unit testing and refactoring give definition to the amorphous, unstructured portion of code development.

Open Source Magic

If you are used to working in shops that think only in terms of major releases, it can be very confusing to watch the development of a certain type of modern open source project. Sometimes I will hear a lot about a famous project, and go to SourceForge to download it. To my consternation, I discover that this project is at version 0.214. Seeing that version number, I might think the project is completely useless, and not worth downloading. But if I do download it, I might be surprised to find that nearly all the features found in the product are functioning properly. What is going on with this project? What does it mean to say that such a well developed project is only at version 0.214?

The answer here is simple. The developers are following the basic principle of releasing early and often. They have covered their project with unit tests, and know that at any one stage in its development, the whole program is working correctly and in a fairly robust manner. They won't reach version 1.0 until they have added many more features, but that fact has nothing to do with whether or not the program is robust. By using the principles outlined in this tip, and in previous tips, and in other programming tips yet unwritten, these people have discovered a means of creating robust software that is useful from a very early stage.

There are two great benefits derived from this technique:

  1. The developers can get a following who will give them good bug reports and good product ideas even during earlier stages of development. The product works right away, so testers begin using it even in early stages of development. This more or less precludes the possibility of the development team ever reaching 1.0 with some huge unfound bug lurking in their code. Big commercial development teams who think in terms of major releases, on the other hand, risk this problem every time they ship. As a rule, those shops produce major version numbers which are inherently buggy, and it is only their point releases that get thoroughly tested and cleaned up.
  2. The second benefit of releasing early and often is that bugs get fixed not in a matter of months or years, but in a matter of days, or sometimes hours. Though it does not happen often, there are probably a number of readers who have had the experience of reporting a serious bug to a big open source project, only to find that the developers fixed the bug and posted the update in a matter of hours. Though the particular bug I was reporting was an exceedingly simple one to fix, nevertheless this happened to me just the other day at www.plone.org. How can something like this occur in a world where bug fixes usually take months or years to be implemented? For readers of this article, the answer should be obvious. Fixing the bug is easy because the code is well refactored and easy to understand. Testing that the fix did not break other code is easy because the project is already covered with unit tests. (You have to write unit tests not after fixing each bug, but as you add each new feature.) After running the tests, it is just a matter of performing a build and publishing the result.

Just to avoid unnecessary disagreements, let me make it clear that I do not mean to be arguing against commercial software. Instead, I am arguing in favor of a particular development technique. There are many commercial shops that use exactly this technique. I bring up open source projects only because they allow us to see into the nature of the development process more easily than we can see into the development process at most commercial shops. I repeat, this technique applies equally to open source and commercial projects.

Summary

Earlier in this tip I outlined three schools of thought about programming. The first school of thought advocated thoroughly planning out your code ahead of time, and then sticking with that plan. The other two plans advocated some variation of an iterative development process involving feedback and testing.

A certain kind of person might go for option one on the grounds that it seems the most rigorous, the most disciplined, of the three choices. However, if you have managed to read through this entire programming tip, you must now be able to see that a combination of intense unit testing and intense refactoring is, if anything, potentially more difficult and more time consuming than the first alternative. I am not advocating this technique because I think it will make your programming cycle shorter. It will improve the odds of your success, and help prevent wasting time in the later stages of development, but it is not meant to be seen as a shortcut.

An extreme approach to unit testing and refactoring does, however, have one tremendous advantage over the first technique: It is much more fun. One of the mantras of Extreme Programming is that developers like to write tests. I have found this to be true. I find it very difficult to get the discipline or insight necessary to thoroughly plan out a project ahead of time. However, I almost always enjoy sitting down and writing and refining my tests, and there are few intellectual pursuits more rewarding for me personally than refactoring my code.

Software development has been compared to the art of herding cats. The point of this phrase is that there is something intangible, something more intuitive than scientific about the development process. If you can't define something in rigorous step by step detail, then your ability to do it well really becomes a factor of how much passion you are willing to bring to a task. One of the great advantages of unit testing and refactoring as a way of life for a programmer is that it helps engage developers in their work. It's fun. It is an endeavor one can passionately pursue on a day to day basis.

Good programmers are, in the best sense of the word, technophiles. They love technology. For reasons that we can't quite explain, we love working with complex, intricate tools such as compilers, debuggers and IDEs. The tools we use to perform unit testing and to restructure our code are a technophile's dream. It is simply fun to pop up JBuilder and use its well designed refactoring tools. There is something fascinating and pleasing about writing and running a well designed test.

The act of writing a test provides an incremental goal on the way to our bigger goal. Huge projects can become long and dreary tasks. But you can spice up that task by writing lots of small tests. Then, at the end of each day, you have something tangible, something cool, that you can look back on as an accomplishment. At the end of the day, you are not just 0.3 percent closer to finishing your project. Instead, you are the author of three completed unit tests that run and work perfectly, and you restructured your code to make it better.

The point here is that unit testing and refactoring are a difficult and sometimes painstaking way of achieving a goal, but they are also a fun and engaging way of achieving that goal. Yes they take time and work, but they also have their own, sometimes partially intangible, rewards.

