Project recovery, or putting projects back on track, is a multi-faceted, multi-phased effort. It involves taking a macro view of the project and examining specific problem areas to carry out root cause analysis and implement the most important corrective actions, as determined through Pareto charts and weighted decision tables. Carrying out root cause analysis requires experiential insight: looking for clues in common problem areas, then examining each area closely to identify the root cause. Another article explains the causes to look for in the requirements area; this article describes the common causes in other life cycle phases that contribute to project failure.
Common Causes Leading to Project Trouble
While a number of reasons can lead to trouble or even project failure, apart from the typical causes directly related to the requirements phase they can be classified mainly into four categories:
- Not fixing the defects early
- Not developing a traceability matrix and using it throughout the project
- The chaos resulting from non-adherence to life cycle models
- Absence of project management decisions during the testing phase
Each of these categories is explained in detail during the remainder of this article.
Not Fixing Defects Early
Defect fixing in a software development life cycle has an economic angle to it. Software development follows an expanding pyramid structure: one line of requirement translates into multiple functional features, which translate into dozens of design features, which in turn translate into hundreds of test cases and hundreds of lines of code. Therefore, when a defect from an earlier phase has to be fixed in a later phase, not only is the phase in which the defect originated impacted, but the whole set of features in the subsequent life cycle phases is impacted as well. In quantitative terms, if it takes 1 unit of effort to fix a defect in the phase in which it originated, it can take up to 10 units to fix it in the next phase, and 100 to 1,000 units by the end of the project, as illustrated in the following diagram:
As the diagram shows, the cost of fixing a defect rises exponentially as the phases progress. More details can be found in Capers Jones's book on the economics of software quality.
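The escalation rule above can be sketched as a short calculation. This is only an illustration: the phase names and the uniform 10x-per-phase multiplier are assumptions drawn from the rough industry figures cited in the text, not measurements from any specific project.

```python
# Illustrative sketch of the defect cost-escalation rule described above.
# The 10x-per-phase multiplier is the rough industry figure cited in the
# text, not data from a specific project.

PHASES = ["requirements", "design", "coding", "testing", "deployment"]

def fix_cost(origin: str, detected: str, base_effort: float = 1.0) -> float:
    """Effort to fix a defect that originated in `origin` but was only
    detected in `detected`, assuming a 10x escalation per phase."""
    gap = PHASES.index(detected) - PHASES.index(origin)
    if gap < 0:
        raise ValueError("a defect cannot be detected before it originates")
    return base_effort * (10 ** gap)

# A requirements defect caught in its own phase costs 1 unit...
print(fix_cost("requirements", "requirements"))  # 1.0
# ...but the same defect caught only in testing costs 1000 units.
print(fix_cost("requirements", "testing"))       # 1000.0
```

The exponent is the only moving part: each phase the defect survives multiplies the correction effort, which is why phase-end detection pays for itself.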
Because of this economics of defects, when defect detection and correction do not happen in a timely manner, defects accumulate toward the end phases and take enormous effort (and hence cost) to fix. Management and customers rarely allow the team the leeway to spend as much effort as is required to eliminate the root causes of those defects, which worsens the situation further.
As defects accumulate, the project enters a vicious circle: it keeps losing time and effort (and hence money) fixing defects, yet paradoxically management does not allow as much time and effort as is needed to eliminate the defects at the root, so correction work is squeezed into grudgingly allowed, narrow windows that never lead to their elimination. Over time, the team will have spent a huge amount of time, effort, and cost without a proportionate reduction in defects, because the resources were always allocated in insufficient increments. The losses spiral, and the project inches toward disastrous failure.
The long-term solution to this problem is to introduce phase-end reviews where the process guidelines lack them, and to tighten them where they exist. Every phase-end review, such as the requirements review, functional specification review, or design review, should lead to effective detection of defects in that phase, and their correction within that same phase.
A short-term solution is to convince management of the potential impact of this spiraling effect, estimate the defect correction effort pragmatically at the point of project revival, and re-plan the project by allocating enough time for defect correction. This gives the team enough breathing space to correct the accumulated defects effectively and bring the project back on track.
Not Developing a Traceability Matrix and Using It
A traceability matrix is a widely known practice for ensuring that the later life cycle outputs actually conform to the requirements stated. In other words, the functional features should actually correspond to the requirements, the design should conform to functional features and the code should conform to the design. This conformance should be maintained explicitly in a traceability matrix and should be used in all phase-end reviews and in testing.
Not using a traceability matrix can lead to a sudden tsunami: the project appears to be sailing smoothly, but when a stakeholder reviews it, they say, "This is not what we wanted." The entire project then goes into a tailspin and enters crisis mode.
Developing a traceability matrix and using it throughout the project is the solution to this cause. The traceability matrix format provided in the PMBOK Guide is a good one to follow; a slightly simplified format is suggested below:
| Requirement Id | Requirement Description | Functional Feature Ids | Design Elements | Code | Test Cases | Remarks |
|---|---|---|---|---|---|---|
| Req 001 | | Feature 1, Feature 2, Feature 3 | Design section 1.1.2, Design section 2.3.4 | File names / function names | Test case ids | |
| Req 002 | | | | | | |
Table: A typical Format of a Traceability Matrix
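A matrix in this format also lends itself to automated consistency checks before each phase-end review. The sketch below is a minimal illustration; the field names and sample records are assumptions, not part of the PMBOK format.

```python
# Minimal sketch of a traceability matrix as a list of records.
# Field names and sample data are illustrative assumptions.
matrix = [
    {"req_id": "Req 001", "features": ["Feature 1", "Feature 2"],
     "design": ["Section 1.1.2"], "code": ["billing.py"], "tests": ["TC-01"]},
    {"req_id": "Req 002", "features": ["Feature 3"],
     "design": [], "code": [], "tests": []},
]

def untraced(matrix, column):
    """Requirements with no entries in the given column of the matrix."""
    return [row["req_id"] for row in matrix if not row[column]]

# Run at a phase-end review to catch gaps before they become a "tsunami".
print(untraced(matrix, "tests"))   # ['Req 002']
print(untraced(matrix, "design"))  # ['Req 002']
```

A requirement that shows up in such a report has no downstream conformance evidence, which is exactly the gap a stakeholder review would otherwise expose late.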
Non-Adherence to Life Cycle Models
Sometimes, software development teams lack a clear understanding of the implications of the chosen life cycle model. The last decade has witnessed the emergence of a number of life cycle models that are very similar in nature, and the lines differentiating them can be blurred. Some blurred lines that can cause confusion and lead to customer dissatisfaction are listed below:
- Inability to differentiate between iterative and incremental life cycle models
- Inability to distinguish an 'iteration', an 'increment', and a 'sprint' of a Scrum model
- Inability to choose the right life cycle model based on project circumstances
In one real-life project, a 'spiral' life cycle model was chosen, which the development team interpreted as an iterative model and the customer interpreted as an incremental model. The team and the customer never sat down to iron out the nitty-gritty of the life cycle model, and the communication gap remained. At the end of one 'iteration' or 'increment', when the team delivered the product, the customer was unhappy and objected to its level of quality. The team, however, was quite confident that the quality level was consistent with the agreed-upon standards. Nevertheless, it is the customer's opinion that matters, and the project entered crisis mode.
The trouble in this project arose from an unclear understanding of the implications of the 'incremental' and 'iterative' models. In an incremental model, a part of the complete deliverable is fully developed to production (deployment) quality standards. Once delivered, that part is turned over to operations and is not developed again. In an iterative model, by contrast, a part or the whole of the system is delivered merely to give potential users a look and feel of the product, and it is not expected to be at production quality. Deliverables of the first iteration are not turned over to operations (not deployed in a production environment); they are developed and refined in a second iteration, and sometimes in several more. Only the deliverables of the final iteration are of production quality and are turned over to the operations team.
There are several real-life incidents of project crisis that have arisen because of a lack of understanding of life cycle models.
The way to avoid or come out of such a crisis is to understand two aspects of the life cycle model in particular: the ETVX criteria and the acceptance criteria, both of which are explained below.
1. ETVX Criteria:
The acronym ETVX stands for Entry, Task, Verification, and eXit. One life cycle model is distinguished from another primarily by its ETVX criteria. The ETVX model defines the criteria for Entry into a phase, the Tasks to be performed in the phase, the Verification mechanisms used in the phase, and the criteria to be met to eXit the phase. For instance, in a waterfall model, one Entry criterion for the design phase is that the functional specification document has been approved. The Tasks could be, for instance, designing algorithms, database models, and data structures. Verification could be design document inspection, and the eXit criterion could be approval of the design document. It is this ETVX model that distinguishes an incremental model from an iterative model, and a Scrum model from both.
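The ETVX definition of a phase can be captured as a simple checklist structure. The sketch below is illustrative; the phase name and criteria strings are assumptions standing in for whatever a team's own process guideline specifies.

```python
# Sketch of a life cycle phase defined by its ETVX criteria.
# The criteria text is illustrative; a real process guideline
# would supply its own.
from dataclasses import dataclass, field

@dataclass
class Phase:
    name: str
    entry: list = field(default_factory=list)         # Entry criteria
    tasks: list = field(default_factory=list)         # Tasks to perform
    verification: list = field(default_factory=list)  # Verification mechanisms
    exit: list = field(default_factory=list)          # eXit criteria

design = Phase(
    name="design",
    entry=["functional specification approved"],
    tasks=["design algorithms", "design database models", "design data structures"],
    verification=["design document inspection"],
    exit=["design document approved"],
)

def may_enter(phase: Phase, satisfied: set) -> bool:
    """True only if every entry criterion of the phase is satisfied."""
    return all(criterion in satisfied for criterion in phase.entry)

print(may_enter(design, {"functional specification approved"}))  # True
print(may_enter(design, set()))                                  # False
```

Writing the criteria down in a structure like this forces the team and the customer to agree on what each phase gate actually requires, which is precisely the agreement the spiral-model project above never reached.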
2. Acceptance Criteria:
While the verification and exit criteria of the ETVX model apply to each phase within the life cycle model, the acceptance criteria are defined at the project level: they list the deliverables and the conditions each must meet for the customer to consider delivery accepted. Any ambiguity in the acceptance criteria can lead to a crisis at the very end of the project; well-defined acceptance criteria prevent such circumstances.
Absence of Project Management Decisions During Testing
Software testing has certain engineering characteristics that the project manager must understand in order to take suitable decisions based on the status of testing. Several projects enter crisis mode during the testing phase; in the worst cases, testing takes 2-5 times more duration and effort than the rest of the project, with the entire gamut of life cycle activities carried out during testing. Such projects certainly end up in crisis. Two typical causes lead to crisis during the testing phase. One is the lack of a process for early detection and correction of defects, which was discussed in detail in the earlier section. The second is the lack of an engineering perspective on the testing process: no project management decisions are taken to control the course of testing, and testing and fixing go on in their own free-flowing, uncontrolled course. Knowledge of two characteristics of testing as a process helps with this second type of root cause:
- After a certain number of test-fix cycles, say 2 to 5 rounds, if the number of residual defects (known defects remaining in the system) is still unacceptable, the project manager must stop fixing and carry out a root cause analysis rather than continuing an endless test-fix cycle that would throw the project into crisis. The characteristic of the testing process that demands this decision is that defect elimination does not follow a linear path, as shown in diagram 1 below. Instead, it follows a non-linear pattern, as shown in diagram 2: for a few cycles the number of defects falls sharply, but thereafter the process of fixing itself starts introducing more and more defects. Beyond a few cycles, the project enters an unstable mode, and simply continuing to fix cannot eliminate the remaining defects. Depending on the root cause analysis, reworking some earlier-phase deliverables and some amount of recoding may be required to eliminate the residual defects.
- It is important that testing is carried out as a series of test-fix cycles rather than as a single, monolithic testing phase. Test planning must set an acceptable number of defects for each cycle, with the allowance reducing progressively from cycle to cycle. The rationale is similar to the reasons explained in the previous paragraph: quality cannot be achieved by indefinite fixing alone; it has to be built in through quality assurance strategies applied throughout the life cycle. This is why mission-critical systems such as flight control systems are developed in an entirely different way from, say, desktop products. The defect allowance for a flight control system is extremely low compared to a desktop product, and the higher level of quality is built in through the process and methodologies followed in carrying out the tasks and in the verification and validation activities at each stage of the project. Thus, the quality goal and the quality assurance strategies chosen provide enough visibility ahead to determine an acceptable number of defects for each cycle. If the number of defects found is significantly higher than the planned level, the project manager must do a root cause analysis rather than continue the test-fix cycle.
These two aspects of software testing can provide a degree of control to the project manager over the testing process.
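The decision rule in the two points above can be expressed as a small check run after each test-fix cycle. The thresholds and cycle counts below are illustrative assumptions, not prescriptive values.

```python
# Sketch of the per-cycle control decision described above.
# planned[i] is the acceptable defect count for cycle i (decreasing),
# found[i] is the number of defects actually found in that cycle.
# The sample numbers and the 5-cycle cap are illustrative assumptions.
def cycle_decision(planned, found, max_cycles=5):
    """Return the action to take after each completed test-fix cycle."""
    decisions = []
    for cycle, (limit, defects) in enumerate(zip(planned, found), start=1):
        if defects > limit or cycle > max_cycles:
            decisions.append((cycle, "stop: root cause analysis"))
        else:
            decisions.append((cycle, "continue test-fix"))
    return decisions

planned = [50, 25, 10]   # progressively tighter defect allowance per cycle
found   = [45, 20, 30]   # cycle 3 blows past its allowance
print(cycle_decision(planned, found))
# [(1, 'continue test-fix'), (2, 'continue test-fix'),
#  (3, 'stop: root cause analysis')]
```

The point of the rule is that the stop decision is made against a pre-agreed plan rather than left to the momentum of an open-ended test-fix loop.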
Apart from the requirements-related best practices, certain key practices followed throughout the project life cycle can prevent a project from entering crisis mode. The best practices described above are summarized below:
- Early detection and correction of defects
- Development and usage of traceability matrix
- Establishing clarity about the ETVX criteria of the life cycle model and clarity about acceptance criteria
- Carrying out testing in multiple test-fix cycles, with an allowable number of defects defined in the test plan, and carrying out a root cause analysis if the number of defects in a cycle exceeds the planned level