Steps for failure analysis

To interpret a failure accurately, the analyst has to gather all pertinent facts and then decide what caused the failure. To be consistent, the analyst should develop and follow a logic path that ensures no critical feature is overlooked. The following steps should be taken:

Decide what to do. How detailed an analysis is necessary? Before starting, try to decide how important the analysis is. If the failure is relatively insignificant in cost and inconvenience, it deserves only a cursory analysis, and the more detailed steps can be skipped. This strategy does, however, increase the chance of error. Some failures deserve a 20-minute analysis with an 80 percent probability of being correct, but critical failures require true root cause failure analysis (RCFA), in which no questions are left unanswered. RCFA may require hundreds of man-hours, but it offers the best assurance of an accurate answer.
Find out what happened. The most important step in solving a plant failure is to seek answers as soon as possible after it happens and to talk to the people involved. Ask for their opinions, because they know the everyday occurrences at their worksite and their machinery better than anyone. Ask questions and try to get firsthand accounts. Do not leave until you have a good understanding of exactly what happened and the sequence of events leading up to it.

Make a preliminary investigation. At the site, examine the broken parts, looking for clues. Do not clean them yet because cleaning could wash away vital information. Document the conditions accurately and take photographs from a variety of angles of both the failed parts and the surroundings.

Gather background data. What are the original design and the current operating conditions? While still at the site, determine the operating conditions: time, temperatures, amperage, voltage, load, humidity, pressure, lubricants, materials, operating procedures, shifts, corrosives, vibration, etc. Compare the actual operating conditions with the design conditions. Look at everything that could have an effect on machine operation.
Determine what failed. After you leave the site and the immediate crush of the failure, look at the initial evidence and decide what failed first (the primary failure) and what secondary failures resulted from it. Sometimes these decisions are very difficult because of the scope of the analysis that is necessary.

Find out what changed. Compare current operating conditions with those in the past. Has surrounding equipment been altered or revised?

Examine and analyze the primary failure. Clean the component and look at it under low-power magnification, 5x to 50x. What does the failure face look like? From the failure face, determine the forces that were acting on the part. Were conditions consistent with the design? With actual operation? Are there other cracks or suspicious signs in the area of the failure? Important surfaces should be photographed and preserved for reference.
Characterize the failed piece and the support material. Perform hardness testing, dye penetrant and ultrasonic examination, lubricant analysis, alloy analysis, etc. Examine the failed part and the components around it to understand what they are. Check whether the results agree with design conditions.

Conduct detailed chemical and metallurgical analyses. Sophisticated chemical and metallurgical techniques may reveal clues to material weaknesses or minute quantities of chemicals that may cause unusual fractures.
Determine the failure type and the forces that caused it. Review all the steps listed. Leaving any questions unasked or unanswered reduces the accuracy of the analysis.
Determine the root causes. Always ask, "Why did the failure happen in the first place?" This question usually leads to human factors and management systems. Typical root causes like "The shaft failed because of an engineering error" or "The shaft failed because it was not aligned properly" expose areas where huge advances can be realized. However, these problems have to be dealt with differently; people will have to recognize personal errors and change the way they think and act.
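
The comparison of actual operating conditions with design conditions, described in the background-data step, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter names, the design ranges, and the readings are hypothetical and do not come from any real machine, and the helper names DesignRange and find_deviations are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignRange:
    """Allowed operating window for one parameter (hypothetical values)."""
    low: float
    high: float


def find_deviations(design, actual):
    """Return the parameters that need follow-up.

    Maps each parameter name to its measured value when the value lies
    outside the design range, or to None when no reading was recorded.
    """
    deviations = {}
    for name, allowed in design.items():
        value = actual.get(name)
        if value is None:
            deviations[name] = None  # no reading was taken at the site
        elif not (allowed.low <= value <= allowed.high):
            deviations[name] = value  # measured outside the design window
    return deviations


# Hypothetical design data and site readings, for illustration only.
design = {
    "temperature_c": DesignRange(10.0, 80.0),
    "load_kn": DesignRange(0.0, 50.0),
    "vibration_mm_s": DesignRange(0.0, 4.5),
}
actual = {"temperature_c": 95.0, "load_kn": 42.0}

for name, value in find_deviations(design, actual).items():
    status = "not measured" if value is None else f"out of range: {value}"
    print(f"{name}: {status}")
```

Parameters with no reading are reported along with out-of-range ones, because a missing measurement is itself an unanswered question. The same function can be reused for the "find out what changed" step by building the ranges from past operating records instead of design data.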

Laboratory Equipment