The problem of learning Object Oriented Design
The greatest problem in learning object oriented design is in the toy problems. The object oriented solution never looks better than the regular solution.
I recently read Head First Design Patterns and redid every pattern in that book in Python. You can see my results at https://github.com/jmcguire/learning/tree/master/design_patterns. Look at the original files and the new files. The new files are always longer, and never easier to grok.
The real problem is that the benefits of Object Oriented Design don’t show themselves until you are past toy problems and dealing with real-world applications, but those applications won’t fit in a book or in a paper.
In his paper on well-designed modules, Parnas shows early evidence of this problem.
“The flowchart was a useful abstraction for systems with on the order of 5,000-10,000 instructions, but as we move beyond that it does not appear to be sufficient; something additional is needed.”
(By “flowchart” he means a way of designing functions without good information hiding, which leaves them all tightly coupled.)
So until you hit programs with over 5,000 lines (instructions) of code, you won’t see the benefits of good design.
Let’s break that number down into functions⌗
A well-written subroutine should do one thing well, along with handling error checking and edge cases. A programmer should be able to hold it all in their head. Normally this is around 40 to 80 lines of code.
- A lot of programmers will waffle on this issue and say that the number of lines a function should be is how many lines a function needs. This is both technically correct and largely unhelpful. 40 - 80 lines isn’t an absolute, but it’s a good guideline.
- It’s not a coincidence that 40 to 80 lines will fit comfortably into a screen or terminal shell without scrolling. Seeing everything at once is mega-important to understanding it.
- Back in 2004, the Windows 2000 code was leaked. The programming website kuro5hin did a huge review of the code, and made admiring mention of the size of the subroutines. Most of them were around 40 - 80 lines long.
How many subroutines will a program with 5,000-10,000 instructions have? Between 63 and 250, or ~150 on average.
So to bring this back to Parnas, when your program does fewer than 150 things you won’t see the benefits of a well-modularized program, which is really what OO design principles are all about.
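The arithmetic behind that range can be sketched in a few lines of Python. This is only an illustration: it treats one instruction as one line of code, as this post does, and the function name `subroutine_range` is made up for the example.

```python
import math

def subroutine_range(instructions_low, instructions_high, lines_short=40, lines_long=80):
    """Return (fewest, most) subroutines for a range of program sizes."""
    # Fewest subroutines: the smallest program, cut into the longest subroutines.
    fewest = math.ceil(instructions_low / lines_long)
    # Most subroutines: the largest program, cut into the shortest subroutines.
    most = math.ceil(instructions_high / lines_short)
    return fewest, most

fewest, most = subroutine_range(5_000, 10_000)
midpoint = (fewest + most) / 2
print(fewest, most, midpoint)  # 63 250 156.5
```

The midpoint of the range is 156.5, which is where the rounded figure of about 150 comes from.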
But does this make sense?
Dunbar’s cognitive limits⌗
Let’s delve into the human brain.
In 1992, anthropologist Robin Dunbar was studying non-human primates, and noticed that the size of their social groups was correlated with the size of the neocortex. (The neocortex is the part of the mammalian brain that makes us great.) Juggling social interactions is a difficult task, and our mammalian brain has evolved quite a bit to handle it. We can, and need to, keep track of how each person feels about each other person in our circle. Keeping track of that information lets us protect ourselves, our children, and helps us get laid. It’s important.
Dunbar was studying non-humans, but with a bit of math and measurement he came up with the maximum social circle size for humans. 150 people. He looked around for proof, and found it everywhere, from farming villages to military groupings.
Today we even see it in businesses. In The Tipping Point, Malcolm Gladwell talked about how W. L. Gore and Associates was so devoted to keeping their teams at 150 people that they would build a new building whenever a team got larger than that. (How Gladwell got from the thesis of his book to that particular topic is a journey I have totally forgotten.)
The 150-function Program Limit⌗
Using a complete lack of anthropological knowledge, I’m going to assert that 150 items is the number of functions you can have before you need better groupings.
At 150 functions, you can (with some difficulty) keep track of all the relationships and interactions in your head. And keeping things straight in your head is a 100% necessity to programming. It’s the whole point of “being in the zone”.
Once you get larger than 150 functions, you need a better grouping. Not just splitting functions off into separate modules. That doesn’t actually solve the problem of preventing functions from interacting with other functions.
At 150 functions, you need proper design. These days that means Object Oriented Design Patterns, with private functions, interfaces (which are a way of simplifying relationships), and the whole nine yards.
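To make “private functions” and “interfaces” concrete, here is a minimal Python sketch. Every name in it (`Storage`, `InMemoryStorage`, `remember_greeting`) is hypothetical and not taken from the book or from my repository. It only shows the shape of the idea: callers depend on a small interface, and everything else stays hidden behind it.

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """The interface: the only two operations callers may rely on."""

    @abstractmethod
    def save(self, key, value): ...

    @abstractmethod
    def load(self, key): ...


class InMemoryStorage(Storage):
    """One implementation. A leading underscore marks a name as private by convention."""

    def __init__(self):
        self._items = {}

    def _normalize(self, key):
        # Private helper: no caller uses it, so it can change freely.
        return key.strip().lower()

    def save(self, key, value):
        self._items[self._normalize(key)] = value

    def load(self, key):
        return self._items[self._normalize(key)]


def remember_greeting(storage):
    # This caller knows about two methods, not about every function behind them.
    storage.save("  Greeting ", "hello")
    return storage.load("greeting")

print(remember_greeting(InMemoryStorage()))  # hello
```

Python only enforces the abstract methods; the underscore is a convention rather than a lock. Even so, a reader of `remember_greeting` has two relationships to keep in their head instead of one per helper.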
The problem with learning Object Oriented design⌗
So here’s the matter of Object Oriented Design. You’re never working with a large enough sample program for OO Design to be worth the effort. I’ve always suspected it, but now I have a spot of proof to back up that opinion.
Learning Object Oriented Design Patterns is just a problem on a problem. In Head First Design Patterns, the average number of functions in each of the original problems is around 8. Once we’ve transformed these terrible, unfocused programs into easy-to-manage wonders with proper Design Patterns, the average number of functions is 16.
It gets worse before it gets better, and you never see the “better” in a toy problem.
Notes⌗
Want to see how many functions your own Python scripts have? Run this (inconveniently-long) Unix one-liner!
for file in $(find . -type f -name "*.py"); do printf "%d\t%s\n" "$(grep -cE '^[[:space:]]*def ' "$file")" "$file"; done
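A text search can also count lines that are not really function definitions, such as a `def` that sits inside a docstring. As a rough alternative, here is a sketch that counts with Python’s own parser through the standard `ast` module. The helper names `count_functions` and `report` are made up for this example, and methods and nested functions are counted too.

```python
import ast
from pathlib import Path

def count_functions(source):
    """Count def and async def statements, including methods and nested functions."""
    tree = ast.parse(source)
    return sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )

def report(root="."):
    """Print a function count for every .py file under root."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            count = count_functions(path.read_text(encoding="utf-8"))
        except (SyntaxError, ValueError):
            continue  # skip files this Python version cannot read or parse
        print(f"{count}\t{path}")

# One plain function plus one method gives a count of two.
sample = "def a():\n    pass\n\nclass B:\n    def c(self):\n        pass\n"
print(count_functions(sample))  # 2
```

Call `report(".")` from the top of a project to get one line per file.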