When there are failures at a small level, like a deployment going wrong, there’s a meeting, and a blameless post-mortem is written and shared publicly. Normally this happens quickly, while everyone’s memories are still fresh.
When entire projects and movements fail, the opposite happens. There are no public post-mortems, and no meetings. A couple of people leave the company, some blame is privately assigned, and the rumor mill goes into overdrive. At this level, a failure can mean a derailed career.
The company I work for has been very quietly abandoning some projects that were “architectural digressions”. At best this process is filled with fun euphemisms, like “sunsetting.” At its worst, no one writes anything down and information is distributed via private conversations.
I hate this.
The most important lessons come from the largest failures, and trying to glean best practices from half-heard third-party conversations is the worst way to get better at being an architect. Human beings of all stripes are desperate to learn, and if proper, well-informed lessons aren’t offered, we all do whatever we can to fill in the gaps.
I wish there were more post-mortems on large failures.
C.A.R. Hoare learns
In his ACM Turing Award paper, “The Emperor’s Old Clothes”, C.A.R. Hoare talks about a failure he experienced as an architect. This one fact instantly makes it my favorite Turing Award talk.
C.A.R. Hoare was an “Assistant Chief Engineer,” which today would be a Director or Architect, I think. He was responsible for building operating system software for the “Elliott 503 Mark II” computer. The project had 15 programmers and a ship date 18 months out. That date slipped by three months, and then by another three months. After two years they finally produced software that ran 500x slower than it should have. It was a failure.
Hoare says of this moment:
So I still could not see how I had brought such a great misfortune upon my company. At the time I was convinced that my managers were planning to dismiss me.
But no, they were intending a far more severe punishment. “O.K. Tony,” they said. “You got us into this mess and now you’re going to get us out.” “But I don’t know how,” I protested, but their reply was simple. “Well then, you’ll have to find out.” They even expressed confidence that I could do so. I did not share their confidence. I was tempted to resign. It was the luckiest of all my lucky escapes that I did not.
Hoare goes on to talk about some root lessons that he had to learn. A year later, Hoare and his team had turned the whole project around, and soon had happy customers. They got to success by sticking their heads in the sand and not talking about their past failures.
Wait… that’s not right.
No, they got to success by talking about all of their own and their customers’ grievances and coming up with a plan.
But what should we actually plan to do when we knew only one thing - that all our previous plans had failed? I therefore called an all-day meeting of our senior programmers on October 22, 1965, to thrash out the question between us.
Dan McKinley learns
Dan McKinley talks about his own failures in Choose Boring Technology.
At one point I tried building Scala services that talked to MongoDB. I thought this would result in better infrastructure, solve all of my productivity problems, and would make me happy. But no part of that turned out to be the case.
The embarrassing wreckage of this period is still on the internet, you can go find it and make fun of me. And Etsy employees still give me shit about it, with good cause.
McKinley’s “boring technology” talk is about how he learned a hard lesson on architecture choices. Read it if you want to know specifics, but I love that he started with the failure that prompted him to do better.
Post-mortem of Post-mortems
Not all lessons are learned. Later in his ACM Turing Award paper, C.A.R. Hoare writes about the difficulties around Algol 68. In the words of Wikipedia, “The effort took five years, burned out many of the greatest names in computer science…” Two of the “great names” who burned out were C.A.R. Hoare and Edsger Dijkstra. They were both disillusioned again during the Ada language design process, which Dijkstra summed up as, “Technical incompetence, probably enhanced by dishonesty.”
As a final note, I was sent down this path by a quote from C.A.R. Hoare in Joe Duffy’s “A few thoughts on the role of software architects”.