First learn to use this. Then I'll teach you to use this.
In Braveheart, these lines were spoken by Argyle, William Wallace's uncle. Young William Wallace had just lost his father and brother at the hands of the English. Understandably, he wanted to do something about it. He picked up Argyle's heavy sword and tried to wield it. Argyle took it from him and said, pointing to his head, "First learn to use this." Then, pointing to the sword, he added, "Then I'll teach you to use this." This was important guidance for young William.
Today we would say, "Think before you act." "Walk before you run." Or even, "Plan, then do." It was guidance that would have served me well in those early days at my first job.
On the day I started work, it was my third time in Derry. The first was my interview, the second was my medical, and now the third was the day I started work. I was apprehensive. I wondered had I oversold myself! Would I be able to match up? The inferiority complex that I had consigned to the back of my mind reared its head again. I was in the graduate recruitment program, so I knew I would be one of many under evaluation for opportunities. In this regard I compensated by adapting my personality to be a bit more brash than it actually was. I found myself being driven by the desire for results. I would write more programs faster than anybody else. And even though it worked nine times out of ten, maybe even ninety-nine times out of one hundred, I was wrong to adopt this strategy. It had me acting before thinking. I was using my arms at the keyboard before I was using my head to fully think out and document what I would be typing. I was taking more risk than I needed to. I was a man in a hurry.
One example comes to mind. I had just finished writing and testing some code, and I was getting ready to deploy it into production. This was DevOps before it became fashionable. In those days we wrote, tested, and deployed into production. We were even our own system administrators. But I digress.
As I always did before I deployed code into production, I decided to back up the source-code development disk. That required mounting the backup source-code platter into a removable disk drive, spinning it up, initializing it, and copying from the source-code drive to the now blank destination drive. As a safety precaution, I would write-protect the source-code drive. I had done this many times before, so I was perhaps a bit more cavalier about it than I should have been. I pressed the write-protect button. Unfortunately, I pressed it on the wrong drive. I inadvertently protected the blank destination drive, the drive I was writing to, and not the one I was reading from. That would not have been a problem at all if the rest of my chore had gone without a hitch, but alas it didn't.
To compound matters, I reversed the source-code disk and the blank destination disk in the copy command. So, I was copying from the newly initialized empty drive to the full drive. Had I write-protected the correct drive, i.e., the source-code drive, that would not have been a problem. I would just have gotten an error message that I actually would have been relieved to get. But having just made two errors, the result was catastrophic. I was actually overwriting the data I thought I was protecting. I wasn't protecting anything. I was copying an empty drive atop a full drive. As soon as I realized that I had reversed the parameters on the copy command and that the write-protect error had not come up, I ran to the computer room to physically stop the disk copy. It was too late. The transfer had started and the disk I wanted to protect was corrupt. My heart sank. This was the worst possible outcome. I was in a quandary. I knew I would have to step up and admit this to my boss, and to my peers. After all, not only had I destroyed my own data, but I had destroyed theirs too.
Fleetingly, I thought about just covering it all up, but I knew that would come back to haunt me even worse, and result in my termination. As it was, I wasn't sure of my job security. But I had to come clean, and I did. My boss was surprisingly calm about it. Don't get me wrong, he wasn't happy, but it could have been worse. It could have been a production disk. He was looking on the bright side as much as he could. My peers were a bit more upset, and understandably so. After all, I had just wiped out their code on the source-code disk. And there was no backup copy now either. I had just initialized it. Our only backup now was printed copies. Thankfully, we religiously printed out hard copies and retained them. There would be a hell of a lot of typing in our collective future! I know that was what was on their minds. There were potentially days, if not weeks, in front of us, retyping source code from hard copy, then compiling, testing, and deploying into production.
The deployment into production was necessary to ensure the running images were drawn from the newly typed-in online source-code. Everybody steeled themselves for this task. I carried the lion's share of the burden since it was my mistake, though as time went on, my colleagues picked up more of the slack than they said they would, and it all evened out in the wash, more than I expected.
We contacted the vendor of the disk technology to ask for their assistance. They were able to perform some forensic diagnostics on the corrupted disk and restore the header. That enabled the disk to be mounted, and an integrity check revealed that about eighty percent of the disk was intact. Only about twenty percent had to be recreated. That was a relief for me particularly, but also for everybody else. It meant that our work was only a fraction of what we thought it would be. We didn't celebrate but we did welcome the break that we had got.
As with any negative experience that affects many people, this one left its mark. I became gun-shy about physically handling and copying disks, and asked others to do that for me. But thankfully my boss wasn't going to let me wallow in any self-pity. He forced me to get back in the saddle by executing backups for everybody in our group. However, he was unambiguously clear in his expectations. Another mishap would mean curtains for me. In my mind I knew success meant a restoration of confidence, a confidence that I never took for granted, ever again. A lesson was learned in that experience that I remember to this day. One can generally be forgiven for making a mistake once, but repeating that mistake? Uh-oh!
In the debrief with my boss about the incident, I was asked to explain what went wrong. I described the two mistakes that I had made: write-protecting the wrong disk drive, and reversing the from/to on the copy command. He agreed that yes, that was the immediate cause, but then he asked me about the root cause. I had never heard about the root cause before. I had only ever heard about the cause of a problem. The distinction between immediate and root was new to me. But it was something I would never forget. The immediate cause may be what precipitates a particular problem, but unless the root cause is addressed, it could happen again, and in this case not just to me, but to anybody else. Addressing the root cause would ensure it never happened again.
Think about it this way. A man goes out in the morning and his car won't start. His battery is flat. He is late for work. Was his flat battery the immediate cause or the root cause of his being late for work? It was the immediate cause! To get to the root cause you have to ask, why was the battery flat? In this case it was flat because it didn't have sufficient distilled water to maintain its charge. Continuing the quest for the root cause you would next ask, why was there not enough distilled water in the battery to hold the battery charge? The answer in this example is because scheduled maintenance was not performed. By continuing to ask why until there are no more whys to ask, you get to the root cause. In the case of the flat battery the root cause was the absence of an effective preventative maintenance program. In the case of my mishap, it was the absence of a defined process for backing up development disks, of documented procedures for executing specific tasks within that process, and of clear roles and responsibilities for performing those tasks.
Once the root cause of any problem is understood, corrective actions can be identified and implemented to...