Next Generation HALT and HASS


Robust Design of Electronics and Systems

Kirk A. Gray, John J. Paschkewitz (Authors)

Wiley (Publisher)

Published on 11 March 2016

296 pages

E-Book

ePUB with Adobe DRM

978-1-118-70021-1 (ISBN)

€89.99 incl. 7% VAT


Description

Next Generation HALT and HASS presents a major paradigm shift from reliability prediction-based methods to discovery of electronic systems reliability risks. This is achieved by integrating highly accelerated life test (HALT) and highly accelerated stress screen (HASS) into a physics-of-failure-based robust product and process development methodology. The new methodologies challenge the misleading and sometimes costly misapplication of probabilistic failure prediction methods (FPM) and provide a new deterministic map for reliability development. The authors clearly explain the new approach with a logical progression of problem statement and solutions.
The book helps engineers employ HALT and HASS by illustrating why the misleading assumptions used for FPM are invalid. Next, the application of HALT and HASS empirical discovery methods to quickly find unreliable elements in electronics systems gives readers practical insight into the techniques.
The physics of HALT and HASS methodologies are highlighted, illustrating how they uncover and isolate software failures due to hardware-software interactions in digital systems. The use of empirical operational stress limits for the development of future tools and reliability discriminators is described.
Key features:
* Provides a clear basis for moving from statistical reliability prediction models to practical methods of ensuring and improving reliability.
* Challenges existing failure prediction methodologies by highlighting their limitations using real field data.
* Explains a practical approach to why and how HALT and HASS are applied to electronics and electromechanical systems.
* Presents opportunities to develop reliability test discriminators for prognostics using empirical stress limits.
* Guides engineers and managers on the benefits of the deterministic and more efficient methods of HALT and HASS.
* Integrates the empirical limit discovery methods of HALT and HASS into a physics-of-failure-based robust product and process development process.


Contents

Series Editor's Foreword xi

Preface xiv

List of Acronyms xvi

Introduction 1

1 Basis and Limitations of Typical Current Reliability Methods and Metrics 5

1.1 The Life Cycle Bathtub Curve 7

1.1.1 Real Electronics Life Cycle Curves 9

1.2 HALT and HASS Approach 11

1.3 The Future of Electronics: Higher Density and Speed and Lower Power 13

1.3.1 There is a Drain in the Bathtub Curve 14

1.4 Use of MTBF as a Reliability Metric 16

1.5 MTBF: What is it Good For? 17

1.5.1 Introduction 17

1.5.2 Examples 18

1.5.3 Conclusion 24

1.5.4 Alternatives to MTBF for Specifying Reliability 25

1.6 Reliability of Systems is Complex 26

1.7 Reliability Testing 28

1.8 Traditional Reliability Development 33

Bibliography 34

2 The Need for Reliability Assurance Reference Metrics to Change 36

2.1 Wear-Out and Technology Obsolescence of Electronics 36

2.2 Semiconductor Life Limiting Mechanisms 37

2.2.1 Overly Optimistic and Misleading Estimates 42

2.3 Lack of Root Cause Field Unreliability Data 43

2.4 Predicting Reliability 48

2.5 Reliability Predictions - Continued Reliance on a Misleading Approach 50

2.5.1 Introduction 51

2.5.2 Prediction History 52

2.5.3 Technical Limitations 53

2.5.4 Keeping Handbooks Up-to-Date 54

2.5.5 Technical Studies - Past and Present 59

2.5.6 Reliability Assessment 62

2.5.7 Efforts to Improve Tools and Their Limitations 63

2.6 Stress-Strength Diagram and Electronics Capability 63

2.7 Testing to Discover Reliability Risks 68

2.8 Stress-Strength Normal Assumption 69

2.8.1 Notation 70

2.8.2 Three Cases 71

2.8.3 Two Normal Distributions 73

2.8.4 Probability of Failure Calculation 73

2.9 A Major Challenge - Distributions Data 73

2.10 HALT Maximizes the Design's Mean Strength 75

2.11 What Does the Term HALT Actually Mean? 78

Bibliography 83

3 Challenges to Advancing Electronics Reliability Engineering 86

3.1 Disclosure of Real Failure Data is Rare 86

3.2 Electronics Materials and Manufacturing Evolution 89

Bibliography 91

4 A New Deterministic Reliability Development Paradigm 92

4.1 Introduction 92

4.2 Understanding Customer Needs and Expectations 95

4.3 Anticipating Risks and Potential Failure Modes 98

4.4 Robust Design for Reliability 104

4.5 Diagnostic and Prognostic Considerations and Features 110

4.6 Knowledge Capture for Reuse 110

4.7 Accelerated Test to Failure to Find Empirical Design Limits 112

4.8 Design Confirmation Testing: Quantitative Accelerated Life Test 113

4.9 Limitations of Success Based Compliance Test 114

4.10 Production Validation Testing 115

4.11 Failure Analysis and Design Review Based on Test Results 116

Bibliography 120

5 Common Understanding of HALT Approach is Critical for Success 122

5.1 HALT - Now a Very Common Term 123

5.2 HALT - Change from Failure Prediction to Failure Discovery 124

5.2.1 Education on the HALT Paradigm 125

5.3 Serial Education of HALT May Increase Fear, Uncertainty and Doubt 130

5.3.1 While You Were Busy in the Lab 132

5.3.2 Product Launch Time - Too Late, But Now You May Get the Field Failure Data 132

6 The Fundamentals of HALT 134

6.1 Discovering System Stress Limits 134

6.2 HALT is a Simple Concept - Adaptation is the Challenge 135

6.3 Cost of Reliable vs Unreliable Design 136

6.4 HALT Stress Limits and Estimates of Failure Rates 137

6.4.1 What Level of Assembly Should HALT be Applied? 137

6.4.2 HALT of Supplier Subsystems 138

6.5 Defining Operational Limit and Destruct Limits 138

6.6 Efficient Cooling and Heating in HALT 139

6.6.1 Stress Monitoring Instrumentation 139

6.6.2 Single and Combined Stresses 140

6.7 Applying HALT 142

6.7.1 Order of HALT Stress Application 143

6.8 Thermal HALT Process 144

6.8.1 Disabling Thermal Overstress Protection Circuits 145

6.8.2 HALT Limit Comparisons 146

6.8.3 Cold Thermal HALT 148

6.8.4 Hot Thermal HALT 150

6.8.5 Post Thermal HALT 151

6.9 Random Vibration HALT 152

6.10 Product Configurations for HALT 155

6.10.1 Other Configuration Considerations for HALT 156

6.11 Lessons Learned from HALT 157

6.12 Failure Analysis after HALT 159

7 Highly Accelerated Stress Screening (HASS) and Audits (HASA) 161

7.1 The Use of Stress Screening on Electronics 161

7.2 'Infant Mortality' Failures are Reliability Issues 163

7.2.1 HASS is a Production Insurance Process 164

7.3 Developing a HASS 167

7.3.1 Precipitation and Detection Screens 168

7.3.2 Stresses Applied in HASS 172

7.3.3 Verification of HASS Safety for Defect Free Products 173

7.3.4 Applying the SOS to Validate the HASS Process 174

7.3.5 HASS and Field Life 177

7.4 Unique Pneumatic Multi-axis RS Vibration Characteristics 177

7.5 HALT and HASS Case History 179

7.5.1 Background 179

7.5.2 HALT 180

7.5.3 HASS (HASA) 181

7.5.4 Cost avoidance 183

Bibliography 184

7.6 Benefits of HALT and HASS with Prognostics and Health Management (PHM) 184

7.6.1 Stress Testing for Diagnosis and Prognosis 185

7.6.2 HALT, HASS and Relevance to PHM 186

Bibliography 189

8 HALT Benefits for Software/Firmware Performance and Reliability 190

8.1 Software - Hardware Interactions and Operational Reliability 190

8.1.1 Digital Signal Quality and Reliability 193

8.1.2 Temperature and Signal Propagation 194

8.1.3 Temperature Operational Limits and Destruct Limits in Digital Systems 197

8.2 Stimulation of Systematic Parametric Variations 198

8.2.1 Parametric Failures of ICs 199

8.2.2 Stimulation of Systematic Parametric Variations 201

Bibliography 205

9 Design Confirmation Test: Quantitative Accelerated Life Test (ALT) 207

9.1 Introduction to Accelerated Life Test 207

9.2 Accelerated Degradation Testing 211

9.3 Accelerated Life Test Planning 212

9.4 Pitfalls of Accelerated Life Testing 215

9.5 Analysis Considerations 216

Bibliography 217

10 Failure Analysis and Corrective Action 218

10.1 Failure Analysis and Knowledge Capture 218

10.2 Review of Test Results and Failure Analysis 220

10.3 Capture Test and Failure Analysis Results for Access on Follow-on Projects 221

10.4 Analyzing Production and Field Return Failures 222

Bibliography 222

11 Additional Applications of HALT Methods 223

11.1 Future of Reliability Engineering and HALT Methodology 223

11.2 Winning the Hearts and Minds of the HALT Skeptics 225

11.2.1 Analysis of Field Failures 225

11.3 Test of No Fault Found Units 226

11.4 HALT for Reliable Supplier Selection 226

11.5 Comparisons of Stress Limits for Reliability Assessments 228

11.6 Multiple Stress Limit Boundary Maps 230

11.7 Robustness Indicator Figures 235

11.8 Focusing on Deterministic Weakness Discovery Will Lead to New Tools 235

11.9 Application of Limit Tests, AST and HALT Methodology to Products Other Than Electronics 236

Bibliography 238

Appendix: HALT and Reliability Case Histories 239

A.1 HALT Program at Space Systems Loral 240

A.2 Software Fault Isolation Using HALT and HASS 243

A.3 Watlow HALT and HASS Application 253

A.4 HALT and HASS Application in Electric Motor Control Electronics 256

A.5 A HALT to HASS Case Study - Power Conversion Systems 261

Index 268

1
Basis and Limitations of Typical Current Reliability Methods and Metrics

Reliability cannot be achieved by adhering to detailed specifications. Reliability cannot be achieved by formula or by analysis. Some of these may help to some extent, but there is only one road to reliability. Build it, test it and fix the things that go wrong. Repeat the process until the desired reliability is achieved. It is a feedback process and there is no other way.

David Packard, 1972

In the field of electronics reliability, it is still very much a Dilbert world, as we see in the comic from Scott Adams in Figure 1.1. Reliability engineers are still making reliability predictions based on dubious assumptions about the future, and management does not really care whether they are valid. Management just needs a 'number' for reliability, regardless of the fact that it may have no basis in reality.

Figure 1.1 Dilbert, management and reliability.

The classical definition of reliability is the probability that a component, subassembly, instrument, or system will perform its specified function for a specified period of time under specified environmental and use conditions. In the history of electronics reliability engineering, a central activity and deliverable from reliability engineers has been to make reliability predictions that provide a quantification of the lifetime of an electronics system.
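In symbols, this definition can be written compactly. The notation below is assumed here for illustration and is not taken from the book: $T$ is the random time to failure under the specified environmental and use conditions, and $t$ is the specified period of time. The condition $R(0) = 1$ additionally assumes that the item works when it is put into service.

```latex
R(t) = P(T > t), \qquad 0 \le R(t) \le 1, \qquad R(0) = 1
```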

Even though the assumed causes of unreliability used to make reliability predictions have not been shown to be based on data from common causes of field failures, and there has been no data showing a correlation to field failure rates, the practice still continues at many electronics systems companies due to the sheer momentum of decades of belief. Many traditional reliability engineers argue that even though predictions do not provide an accurate estimate of life, they can be used for comparisons of alternative designs. Unfortunately, prediction models that are not based on valid causes of field failures, or on valid models, cannot provide valid comparisons of reliability.

Of course there is value in predictions, valid or invalid, if they are required to retain one's employment as a reliability engineer. But the benefit of continued employment pales in comparison to the cost of misleading assumptions that may force invalid design changes, which in turn may result in higher field failures and warranty costs.

For most electronics systems the specific environments and use conditions are widely distributed. It is very difficult, if not impossible, to know specific values and distributions of the environmental conditions and use conditions that future electronics systems will be subjected to. Compounding the challenge of not knowing the distribution of stresses in the end-use environments is the fact that the number of potential physical interactions, and the strengths or weaknesses of potential failure mechanisms, in systems of hundreds or thousands of components are phenomenologically complex.

Tracing back to the first electronics prediction guide, we find the RCA release of TR-1100, titled Reliability Stress Analysis for Electronic Equipment, in 1956, which presented models for computing rates of component failures. It was the first of the electronics prediction 'cookbooks' that became formalized with the publishing of reliability handbook MIL-HDBK-217A and continued to 1991, with the last version, MIL-HDBK-217F, released in December of that year. It was formally removed as a government reference document in 1995.

1.1 The Life Cycle Bathtub Curve

A classic diagram used to show the life cycle of electronics devices is the life cycle bathtub curve. The bathtub curve is a graph of time versus the number of units failing.

Just as medical science has done much to extend our lives in the past century, electronic components and assemblies have also had a significant increase in expected life since the beginning of electronics when vacuum tube technologies were used. Vacuum tubes had inherent wear-out failure modes, such as filaments burning out and vacuum seal leakage, that were a significant limiting factor in the life of an electronics system.

Figure 1.2 The life cycle bathtub curve

The life cycle bathtub curve, which is modeled after human life cycle death rates and is shown in Figure 1.2, is actually a combination of two curves. The first curve is the initial declining failure rate, traditionally referred to as the period of 'infant mortality', and the second curve is the increasing failure rate from wear-out failures. The intersection of the two curves is a more or less flat area of the curve, which may appear to be a constant failure rate region. It is actually very rare that electronics components fail at a constant rate, and so the 'flat' portion of the curve is not really flat but instead a low rate of failure with some peaks and valleys due to variations in use and manufacturing quality.
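The two-curve construction described above can be sketched numerically. The following is a minimal illustration, not the authors' model: it assumes each curve is a Weibull hazard rate (shape below 1 for the declining early-life rate, shape above 1 for wear-out), and all parameter values are hypothetical. It reproduces only the idealized textbook shape, not the peaks and valleys seen in real field data.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Idealized bathtub curve as the sum of two hazard curves (t in months)."""
    # Hypothetical parameters, chosen only to make the shape visible.
    early = weibull_hazard(t, shape=0.5, scale=200.0)    # shape < 1: declining rate
    wearout = weibull_hazard(t, shape=4.0, scale=240.0)  # shape > 1: increasing rate
    return early + wearout


if __name__ == "__main__":
    for months in (1, 6, 24, 60, 120, 240):
        print(f"{months:4d} months: hazard = {bathtub_hazard(months):.5f} per month")
```

With these assumed parameters the summed rate falls over the first few years and rises again late in life; the region in between is low but not constant.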

The electronics life cycle bathtub curve was derived from human life cycle curves and may have been more relevant back in the day of vacuum tube electronics systems. In human life cycles we have a high rate of death due to the risks of birth and the fragility of life during human infancy. As we grow, the rates of death decline to a steady state level until we age and our bodies start to fail. Human infant mortality is defined as the number of deaths in the first year of life. Infant mortality in electronics has been the term used for the failures that occur after shipping or in the first months or first year of use.

The term 'infant mortality' applied to the life of electronics is a misnomer. The vast majority of human infant mortality occurs in poorer third world countries, and the main cause is dehydration from diarrhea, which is a preventable disease. There are many other factors that contribute to the rate of infant deaths, such as limited access to health services, education of the mother and access to clean drinking water. The lack of healthcare facilities or skilled health workers is also a contributing factor.

An electronic component or system is not weaker when fabricated. Instead, if manufactured correctly, components have their highest inherent life and strength when manufactured, and then decline in strength, or total fatigue life, during use.

The term 'infant mortality', which is used to describe failures of electronics or systems that occur in the early part of the use life cycle, seems to imply that the failure of some devices and systems is intrinsic to the manufacturing process and should be expected. Many traditional reliability engineers dismiss these early life failures, or 'infant mortality' failures, as due to 'quality control' and therefore do not see them as the responsibility of the reliability engineering department. Manufacturing quality variations are likely to be the largest cause of early life failures, especially for designs with narrow environmental stress capabilities that could be found in HALT. But it makes little difference to the customer or end-user: they lose use of the product, and the company whose name is on it is ultimately to blame.

So why use the dismissive term 'infant mortality' to describe failures from latent defects in electronics as if they were intrinsic to manufacturing? The time period that is used to define the region of infant mortality in electronics is arbitrary. It could be the first 30 days or the first 18 months or longer. Since the vast majority of latent (hidden) defects are from unintentional process excursions or misapplications, and since they are not controlled, they are likely to have a wide distribution of times to failure. Often the same failure mechanism that causes the weakest units to fail within 30 to 90 days will continue, through the stronger latent defects, to contribute to the failure rate throughout the entire period of use before technological obsolescence.

1.1.1 Real Electronics Life Cycle Curves

Of course the life cycle bathtub curves are represented as idealized and simplistic smooth curves. In reality, monitoring the field reliability would result in a dynamically changing curve with many variations in the failure rates for each type of electronics system over time, as shown in Figure 1.3. As failing units are removed from the population, the failure rate of the remaining field population decreases and, in a large population, may appear to settle at a low, constant rate.

Figure 1.3 Realistic field life cycle bathtub curve
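The effect of failing units leaving the population can be illustrated with a small calculation. This is a hedged sketch under simplifying assumptions that are not from the book: the fielded population is a mix of a small 'weak' subpopulation with latent defects and a 'strong' remainder, and each subpopulation is given a constant failure rate purely to keep the arithmetic simple. All numbers are hypothetical.

```python
import math


def population_hazard(t, weak_fraction, weak_rate, strong_rate):
    """Failure rate among units still surviving at time t in a two-subpopulation mix."""
    # Share of the original population still working at time t, by subpopulation.
    weak_surviving = weak_fraction * math.exp(-weak_rate * t)
    strong_surviving = (1.0 - weak_fraction) * math.exp(-strong_rate * t)
    # Failures per unit time, divided by the number of units still in the field.
    failing_now = weak_rate * weak_surviving + strong_rate * strong_surviving
    return failing_now / (weak_surviving + strong_surviving)


if __name__ == "__main__":
    # Hypothetical values: 5% of units carry a latent defect; rates are per month.
    for months in (0, 3, 6, 12, 24, 48):
        h = population_hazard(months, weak_fraction=0.05, weak_rate=0.2, strong_rate=0.001)
        print(f"{months:3d} months: {h:.5f} failures per unit-month")
```

Even though neither subpopulation's rate changes in this sketch, the observed population rate falls as the weak units fail and are removed, and it levels off near the strong units' rate.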

In the real tracking of failure rates, the peaks and valleys of the curve extend to the wear-out portion of the life cycle curve. For most electronics, the wear-out portion of the curve extends well beyond technological obsolescence and will never actually contribute significantly to the unreliability of the product.

Without detailed root cause analysis of failures that make up the peaks of the middle portion of the bathtub curve, or what is termed the useful life period, any increase in failure rates can be mistaken for the intrinsic wear-out phase of a system's life cycle. It may be discovered in failure analysis that what at first appears to be a wear-out mode in a component is actually due to it being overstressed from a...
