algorithm-based fault tolerance, weighted checksum method, linear algebra algorithms
A modified weighted checksum method is proposed that can be used to derive fault-tolerant versions of most linear algebra algorithms. Its purpose is to detect and correct calculation errors caused by transient hardware faults during algorithm execution. Using the proposed method, fault-tolerant versions of the Jordan-Gauss and Faddeeva algorithms are designed. Compared with the original algorithms, the computational complexity of the new algorithms increases by approximately O(N^2) multiply-add operations. In return, the new algorithms can detect and correct a single error in an arbitrary row or column of the input data matrices at each algorithm step. Hence, up to N^2 and (N^2/2 + N · P) single errors can be corrected over a complete run of the Jordan-Gauss and Faddeeva algorithms, respectively. Finally, the results of an experimental verification of the proposed algorithms are presented.
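To illustrate the general idea behind checksum-based detection and correction, the following minimal sketch shows a classical weighted checksum scheme applied to one elimination step. It is not the modified method proposed here, whose details are not given in this abstract. The weights 1..N, the function names (`encode_row`, `row_update`, `check_and_correct`) and the sample data are assumptions made for illustration only. Exact integer arithmetic is used; a floating-point version would need a tolerance when comparing checksums.

```python
from fractions import Fraction

def encode_row(row):
    """Append a plain checksum and a weighted checksum (assumed weights 1..N)."""
    s1 = sum(row)
    s2 = sum((j + 1) * x for j, x in enumerate(row))
    return list(row) + [s1, s2]

def row_update(target, source, factor):
    """Elimination-style update target - factor * source.

    The same linear operation is applied to the data and the checksum
    columns, so a fault-free result keeps consistent checksums.
    """
    return [t - factor * s for t, s in zip(target, source)]

def check_and_correct(encoded):
    """Detect and correct a single erroneous data element in an encoded row.

    Returns (corrected_row, position), with position None if no error is found.
    """
    n = len(encoded) - 2
    data, s1, s2 = list(encoded[:n]), encoded[n], encoded[n + 1]
    d1 = sum(data) - s1                                        # error magnitude
    d2 = sum((j + 1) * x for j, x in enumerate(data)) - s2     # weight * magnitude
    if d1 == 0 and d2 == 0:
        return data + [s1, s2], None
    if d1 != 0:
        ratio = Fraction(d2) / Fraction(d1)                    # weight of faulty column
        if ratio.denominator == 1 and 1 <= ratio <= n:
            pos = int(ratio) - 1
            data[pos] -= d1
            return data + [s1, s2], pos
    # Error in a checksum element, or more than one error: not handled here.
    raise ValueError("not a single data-element error")

# Hypothetical example: one elimination step on two encoded rows.
r1 = encode_row([2, 1, -1])
r2 = encode_row([4, 5, 3])
step = row_update(r2, r1, 2)      # checksums remain consistent after the update

faulty = list(step)
faulty[1] += 7                    # simulate a transient fault in one element

fixed, pos = check_and_correct(faulty)
print(pos, fixed[:3])             # prints: 1 [0, 3, 5]
```

Because the checksum columns undergo the same linear row operation as the data, they can be verified after every step. This is consistent with the claim above that one error per row or column can be handled at each algorithm step, at the cost of a few extra multiply-add operations per row.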