We don’t do a lot of it, but every now and again we get asked to play engineering detective, to try to figure out why a product is broken. Typically our customers in this case are not the manufacturer, but rather an OEM that is buying a product or subsystem, integrating it with other hardware or software, and reselling it. Sometimes this is also needed by OEMs who are in divorce proceedings with their China ODM. And sometimes it is true failure analysis for the customer who designed the product (or had it designed).
Typically the starting point is field data, and typically that data is not good. Products failing, or performing marginally, in numbers high enough to warrant attention. In many cases the product or subassembly in question is part of a larger system, making it difficult to discern which failure modes are due to which part of the system. Overlaying this is the general mega-challenge of forensic engineering, which is sorting out what is cause and what is effect. Did the wear on a gear cause the system to fail, or did the wear occur because something else in the system caused it to wear prematurely?
One of the main challenges of this type of forensic engineering is trying to reproduce failure modes, often intermittent, often requiring patience. Even with a mean time to failure (MTTF) of, say, 1 year, a bad number in most markets, testing 10 samples for 12 months will see only about half of them fail once (roughly 5 or 6, depending on the failure distribution). Hence patience, but finding acceleration factors or ways to artificially induce failure is also critical.
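To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes a constant failure rate (exponential time-to-failure), which is my assumption for illustration and not something the field data necessarily supports; under that model a bit more than half the units fail by the MTTF, while a symmetric distribution would give exactly half. The function names and the single `acceleration` multiplier are hypothetical, a stand-in for whatever stress factor (temperature, duty cycle, load) actually speeds things up.

```python
import math


def expected_failures(n_units, test_months, mttf_months, acceleration=1.0):
    """Expected number of units that have failed at least once.

    Assumes exponential time-to-failure (constant failure rate).
    `acceleration` is a hypothetical factor: 1.0 means normal use,
    10.0 means the stress test ages the product 10x faster.
    """
    effective_months = test_months * acceleration
    p_fail = 1.0 - math.exp(-effective_months / mttf_months)
    return n_units * p_fail


def months_until_fraction_failed(fraction, mttf_months, acceleration=1.0):
    """Test duration needed before `fraction` of units are expected to fail."""
    return -mttf_months * math.log(1.0 - fraction) / acceleration


if __name__ == "__main__":
    # 10 samples, 12 months on test, MTTF of 12 months, no acceleration
    print(round(expected_failures(10, 12, 12), 1))            # 6.3
    # Months until half the samples have failed, normal use vs. 10x stress
    print(round(months_until_fraction_failed(0.5, 12), 1))     # 8.3
    print(round(months_until_fraction_failed(0.5, 12, 10), 2)) # 0.83
```

The point is less the exact figures than the shape of the problem: without an acceleration factor you wait most of a year to see half your samples fail, and with an intermittent failure mode you may need many more than 10 samples to see the one you care about.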
Another challenge is working without full documentation. No drawings to see what a measurement should be, no schematics or source code to understand the circuit functionality. For this reason, forensic engineering is very closely related to reverse engineering, which we also do. Oftentimes doing a full teardown (which we also do a lot of) is one of the first steps toward understanding the product. We may even try to recreate basic documentation (block diagrams, BOM, etc.) and in extreme cases do full schematic-level reverse engineering and/or 3D scans of the mechanicals. The goal in these cases is not to reproduce the design but to understand what is going on that might be precipitating the failure modes observed.
It’s also dang hard to pin such work down onto a timeline. Kinda like asking a detective to provide her schedule as to when she will solve the crime. Actually more than just kinda like that…
We’re currently working on such a challenge, trying to identify the root cause of a number of different failure modes on a complex electro-mechanical product. No details, for this is a very sensitive project with a lot of money at stake, not to mention the potential for a lot of lawyers. We are looking at many aspects of the product (gears, motors, control circuits, firmware, consumables) and many subsystems. We’ve found several red herrings already, symptoms masquerading as problems, but we’ve also identified a few contributing factors that can be, and are being, fixed. Not done yet, not even close, and it’s equal parts fun and frustration, working from installation documents instead of drawings, but as an old boss of mine once said, “Chuck, if it wasn’t difficult we wouldn’t need guys like you.”