Abstract
We present an approach for overcoming run-time computational errors that originate from static hardware faults in digital processors. The approach is based on embedded machine-learning stages that learn and model the statistics of the computational outputs in the presence of errors, resulting in an error-aware model for embedded analysis. We demonstrate, in hardware, two systems for analyzing sensor data: 1) an EEG-based seizure detector and 2) an ECG-based cardiac arrhythmia detector. The systems use a small kernel of fault-free hardware (constituting <7.0% and <31% of the total area, respectively) to construct and apply the error-aware model. The systems construct their own error-aware models with minimal overhead through an embedded active-learning framework. For the hardware experiments, which use a field-programmable gate array (FPGA) implementation, stuck-at faults are injected at controllable rates into the synthesized gate-level netlists to permit characterization. The seizure detector demonstrates restored performance despite faults on 0.018% of the circuit nodes [causing bit error rates (BERs) up to 45%], and the arrhythmia detector demonstrates restored performance despite faults on 2.7% of the circuit nodes (causing BERs up to 50%).
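The central idea is that a model trained on fault-affected outputs can still separate classes when a model trained on fault-free outputs cannot. A toy Python sketch on synthetic data can illustrate this. It is not the authors' implementation. The stuck-at faults are applied directly to output bits rather than injected into a gate-level netlist. A simple nearest-centroid classifier stands in for whatever classifier the systems actually use, and active learning is not modeled. All names, bit positions, and data ranges are hypothetical.

```python
import random

WORD_BITS = 16  # hypothetical width of each computed feature word

# Hypothetical static faults: bits 15 and 13 stuck at 1. This simplification
# places the faults on output bits, not on internal netlist nodes.
STUCK_MASK = (1 << 15) | (1 << 13)
STUCK_VALUES = STUCK_MASK  # every masked bit is stuck at 1


def inject_stuck_at(word, mask=STUCK_MASK, values=STUCK_VALUES):
    """Force the bits selected by mask to the corresponding bits of values."""
    return (word & ~mask) | (values & mask)


def bit_error_rate(clean, faulty, bits=WORD_BITS):
    """Fraction of output bits that differ between fault-free and faulty words."""
    flipped = sum(bin(c ^ f).count("1") for c, f in zip(clean, faulty))
    return flipped / (bits * len(clean))


def train_centroids(samples, labels):
    """Nearest-centroid model: the mean feature value of each class."""
    centroids = {}
    for cls in set(labels):
        members = [x for x, y in zip(samples, labels) if y == cls]
        centroids[cls] = sum(members) / len(members)
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda cls: abs(x - centroids[cls]))


def accuracy(centroids, samples, labels):
    hits = sum(predict(centroids, x) == y for x, y in zip(samples, labels))
    return hits / len(labels)


def make_data(rng, n):
    """Synthetic 1-D features: class 0 in [0x1000, 0x2000), class 1 in [0x5000, 0x6000)."""
    labels = [i % 2 for i in range(n)]
    clean = [rng.randrange(0x1000, 0x2000) if y == 0 else rng.randrange(0x5000, 0x6000)
             for y in labels]
    return clean, labels


rng = random.Random(0)
train_clean, train_y = make_data(rng, 200)
test_clean, test_y = make_data(rng, 200)
train_faulty = [inject_stuck_at(w) for w in train_clean]
test_faulty = [inject_stuck_at(w) for w in test_clean]

ber = bit_error_rate(test_clean, test_faulty)
fault_free_model = train_centroids(train_clean, train_y)    # unaware of the faults
error_aware_model = train_centroids(train_faulty, train_y)  # learns faulty statistics

print(f"BER = {ber:.3f}")                                   # BER = 0.125
print(accuracy(fault_free_model, test_faulty, test_y))      # 0.5
print(accuracy(error_aware_model, test_faulty, test_y))     # 1.0
```

Here the two stuck bits are always 0 in the fault-free data, so exactly 2 of 16 bits flip per word (BER = 0.125). The stuck-at mapping is deterministic, and it leaves the discriminating bit 14 intact. A model retrained on the faulty outputs therefore recovers full accuracy. The fault-free model, by contrast, assigns every faulty sample to class 1.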
Original language | English (US) |
---|---|
Article number | 6874569 |
Pages (from-to) | 1459-1470 |
Number of pages | 12 |
Journal | IEEE Transactions on Very Large Scale Integration (VLSI) Systems |
Volume | 23 |
Issue number | 8 |
State | Published - Aug 1 2015 |
All Science Journal Classification (ASJC) codes
- Software
- Hardware and Architecture
- Electrical and Electronic Engineering
Keywords
- Embedded sensing
- fault tolerance
- hardware resiliency
- machine learning
- run-time error correction