Saturday, January 5, 2008

Why Software Quality Sucks

It was an important customer demo at a big industry trade show. We had worked frantically all through the previous night to get the systems set up for this demo. The demo began very well, but halfway through, the program crashed. It was a disaster! Since the demo was for a large and important potential customer, a whole bunch of my company's suits (non-techie executives) were present, and they were clearly upset. Apparently they had seen this all too often and couldn't understand why we developers were so incapable of producing bug-free software, and why we should be paid so much money for the junk we produce!

At the hospitality suite that night, after everyone was happily buzzed and relaxed, I accosted some of these suits to try to explain what had happened and provide excuses for the failed demo. After the usual excuses about demo'ing with the latest unstable dev build, lack of sufficient resources to code, this other team not delivering their fixes on time, bad hardware etc. (all of which they had probably heard many times before), the conversation drifted on to a more general discussion of software and bugs. I found myself trying to impress upon them the difficulty of building bug-free software.

Now this was easy, and I felt confident I could convince them, because I had Bill Gates on my side. That's right, Bill Gates of Microsoft. It so happened that around the same time as this trade show, Bill Gates was demo'ing Windows 98 at COMDEX, introducing it to the whole world for the first time. And it crashed and burned in the middle of the demo! An embarrassed Gates mumbled something about needing to iron out some more bugs. I mentioned this incident to my suits. Some of them had read about it in the papers. I told them Microsoft had 2000 very bright engineers working on Windows 98 and asked rhetorically why they still couldn't guarantee bug-free code.

The software we were building and demo'ing was an incredibly complex, large-scale, real-time network monitoring and control system: mission-critical and high-availability, with hardware interfaces receiving hundreds of thousands of data points and heavy-duty number crunching. I reminded them that we had just 3 engineers working on the portion of the demo that crashed. The point was that if Microsoft, with its huge army of the world's best developers, could not produce bug-free software, how could we, with far fewer resources and equally complex systems to build, produce anything better? The suits seemed to see the point. By the end of the night I had totally convinced them that software is, by nature, very complex and that it was virtually impossible to build systems of better quality than Windows. One just had to live with the fact that software systems will malfunction and crash periodically. Thanks, Bill!

That was 10 years ago. Windows has come a long way since those days and we now see less of the infamous "blue screen of death". Software in general has matured over the last decade. However, commercial software is nowhere close to the level of maturity where high quality is taken for granted. Any enterprise-scale software is fraught with bugs requiring continuous application of hot-fixes, patches, upgrades etc. Ask any IT professional in any company about the quality of the software they support and invariably you will hear complaints. Complaints are likely to be more vociferous when it comes to business applications software.

So why does software quality in general, and business applications software in particular, suck so badly? Why is it that software is not as reliable as (say) bridges? Some would argue that the software industry is still very new. After all, they argue, we've been building software for less than 50 years while we've been building bridges for thousands of years. Others would argue that building software is much more complex than building bridges or cars, and as complexity increases, so does the probability of defects. There are some who would point to the lack of standardization in the way certain common software elements are built, which results in defects when those elements are integrated into one application. Then there is the camp that feels that a significant number of software engineers lack the proper training and skills, and that the software industry is probably the only industry where we don't make a distinction between technicians and engineers (think of electrician versus electrical engineer, plumber versus civil engineer etc.). In software, anyone who works with code is a software engineer, and problems arise when technicians try to do the job of engineers.

While all of the above reasons may be valid, I believe the fundamental reason for poor software quality is the way we build it. This is more true for applications software than systems software. In spite of all the advances in languages, IDEs, design patterns, open-source libraries, runtimes and standards, building large applications software remains horrendously tedious, labor intensive and error prone. I think we need a radical shift in the fundamental mechanics of writing code in a programming language. We need a new vocabulary that lets application developers state their business logic rather than code it. Modeling tools have attempted to address some basic part of this but have been very code and programming oriented (thinking in terms of classes in UML, for example). Model Driven Architectures (MDA) never lived up to the hype and simply provide a language to describe the high-level problem without providing a real-life solution.

I have no idea what this new vocabulary and mechanics for building software would look like in its final form (although I have a name for it - Yeti), but it seems to me intuitively that it will have the following characteristics:
  • Rules would figure in it prominently, and application building will primarily involve describing business rules
  • Application logic would be expressed in a natural language syntax
  • Application builders would not be programming (as we know it) in languages such as Java, C++ or JavaScript
  • Applications would be able to process incomplete information, employing fuzzy logic and learning
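
To make the first characteristic a little more concrete, here is a minimal sketch in Python of what "stating" business rules, rather than coding them, might look like. This is not Yeti, and it is not taken from any real rules engine: the `Rule` class, the `COMPARISONS` table, the `outcomes` function and the example order facts are all hypothetical names invented for illustration. The only point is that each rule is a line of data that reads almost like a sentence, while the procedural code that evaluates rules is written once.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Hypothetical vocabulary: phrases a rule author may use, mapped to comparisons.
COMPARISONS = {
    "is more than": operator.gt,
    "is at least": operator.ge,
    "is": operator.eq,
}


@dataclass(frozen=True)
class Rule:
    """One business rule, stated as data: '<fact> <comparison> <value>'."""
    fact: str
    comparison: str
    value: Any
    outcome: str

    def applies_to(self, facts: dict[str, Any]) -> bool:
        # Assumption of this sketch: a rule about a missing fact does not fire.
        if self.fact not in facts:
            return False
        return COMPARISONS[self.comparison](facts[self.fact], self.value)


# The "application" is just this list of statements (made-up example rules).
RULES = [
    Rule("order_total", "is more than", 10_000, "require manager approval"),
    Rule("customer_years", "is at least", 5, "apply loyalty discount"),
    Rule("country", "is", "US", "charge sales tax"),
]


def outcomes(facts: dict[str, Any], rules: list[Rule] = RULES) -> list[str]:
    """Return the outcome of every rule that applies to the given facts."""
    return [rule.outcome for rule in rules if rule.applies_to(facts)]


order = {"order_total": 12_500, "customer_years": 2}
print(outcomes(order))  # prints ['require manager approval']
```

Skipping rules whose fact is missing is only a crude stand-in for processing incomplete information; it involves no fuzzy logic or learning, and the rules are still a long way from natural language.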

Maybe I'll never see Yeti in my lifetime. Maybe Yeti simply cannot exist given the basic von Neumann model for computers.
Or maybe not.
