PSU Neutron Beam Group (NBG)

 

Soft Error Analysis Toolset (SEAT) Development

Participants: V. Narayanan, Professor of Computer Science and Engineering
  M. J. Irwin, Professor of Computer Science and Engineering
  Kenan Ünlü, Professor of Mechanical and Nuclear Engineering
  Y. Xie, Professor of Computer Science and Engineering
  S. Çetiner, Ph.D. Student of Mechanical and Nuclear Engineering
  V. Degalahal, Ph.D. Student of Computer Science and Engineering
  F. Alim, Ph.D. Student of Mechanical and Nuclear Engineering
   
Services Provided: Neutron Beam Laboratory
   
Sponsors: National Science Foundation
  Radiation Science and Engineering Center
  Department of Computer Science and Engineering

 

Introduction

Soft errors, or single event effects (SEE), are transient circuit errors caused by excess charge carriers induced primarily by external radiation. Radiation may, directly or indirectly, induce localized ionization that can flip the internal values of memory cells. The major radiation source behind this temporary malfunction in semiconductor devices is cosmic rays.

Figure 1. A 65-nm DRAM (left) and a schematic that conceptualizes the soft error phenomenon (right): Electron-hole pairs created through ionization by radiation might get drawn to node terminals before they recombine in the substrate, causing a transient glitch in the device node. This temporary pulse might flip the internal state of the memory bit.


Figure 2. Integrated circuits (ICs) are becoming a major component of modern societies.

Cosmic ray particles can either toggle the state of memory elements or create unwanted glitches in combinational logic that may be latched by memory elements. As supply voltages drop and feature sizes shrink in future technologies, soft error tolerance is considered a significant challenge for designing future electronic systems. For example, a 1 GB memory system based on 64-Mbit DRAMs has a combined error rate of 3435 FIT (failures in 10^9 hours of operation) when using single error correction and double error detection. An even higher soft error rate of 4000 FIT was reported for a typical processor, with approximately half of the errors affecting the processor core and the rest affecting the cache. Such errors also affect the fast-growing FPGA (Field Programmable Gate Array) segment.
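The FIT figures quoted above are easier to interpret once converted to a mean time between failures. The following is a minimal sketch of that arithmetic, assuming a constant failure rate; the function names and the fleet size of 1000 units are illustrative assumptions and are not part of the SEAT.

```python
HOURS_PER_YEAR = 24 * 365.25  # average year length, including leap years


def fit_to_mtbf_hours(fit: float) -> float:
    """Mean time between failures in hours; 1 FIT = 1 failure per 1e9 hours."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return 1e9 / fit


def expected_failures(fit: float, hours: float, units: int = 1) -> float:
    """Expected number of failures across `units` identical systems."""
    return fit * 1e-9 * hours * units


if __name__ == "__main__":
    # FIT values quoted in the text; the fleet size is a hypothetical example.
    for label, fit in [("1 GB memory system", 3435.0),
                       ("typical processor", 4000.0)]:
        mtbf_years = fit_to_mtbf_hours(fit) / HOURS_PER_YEAR
        fleet_per_year = expected_failures(fit, HOURS_PER_YEAR, units=1000)
        print(f"{label}: MTBF about {mtbf_years:.1f} years, "
              f"about {fleet_per_year:.1f} failures per year in 1000 units")
```

Under these assumptions a single system fails rarely, but a large installed base sees failures regularly, which is why such rates matter for widely deployed systems.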

Because the earth’s atmosphere shields most cosmic ray particles from reaching the ground, and because the charge per circuit node used to be large, SEE in terrestrial devices were not an important concern until recently. The galactic flux of primary cosmic rays (mainly protons) is very large, about 100,000 particles/(m^2 s), compared with the much lower final flux (mainly neutrons) at sea level of about 360 particles/(m^2 s) [1]. Only a few of the galactic particles have adequate energy to penetrate the earth’s atmosphere. However, with continued scaling of feature sizes and the use of more complex systems, soft errors in terrestrial applications are becoming an increasing concern and have drawn attention since the late 1990s.

The issue of SEE was first studied in the context of scaling trends of microelectronics in 1962 [2]. Interestingly, the forecast from this study, that SEE will impose the lower limit on supply voltage reduction, is shared by a recent work from researchers at Intel [3]. However, most work on radiation effects since 1962 has focused on space applications rather than terrestrial ones.

There have been various documented failures due to soft errors, ranging from memories used in large servers and aircraft to implantable medical devices such as cardiac defibrillators [4]. A widely cited soft error episode involves L2 caches with no error correction or protection that caused Sun Microsystems’ flagship servers to crash suddenly and mysteriously [5]. This problem cost Sun Microsystems a number of customers. More ominous than this failure are errors in embedded devices such as cardiac defibrillators, which are becoming an integral part of our society. As computing systems become an indispensable part of critical applications ranging from medical implants to fly-by-wire aircraft, immunity against soft errors becomes more critical for society as a whole.

The importance of the soft error problem is evidenced by the large number of papers and articles published over the last decades. However, most researchers are impeded by the lack of access to realistic fault models and real soft error data. This limitation results from the confidentiality of soft error data for chips tested by semiconductor companies and from the limited access academics have to accelerated soft error testing facilities. Most commercial soft error testing in the U.S.A. is performed at the Los Alamos test facility, access to which is expensive and cumbersome because of the security clearances required.

 

The Impetus Behind The SEAT

Radiation-induced SEE may seem to be easily solved through techniques such as radiation-hardened processing. Such countermeasures have traditionally and successfully been adopted to remedy radiation effects in space applications. However, they are not suitable for commercial manufacturers of terrestrial devices, as many of the solutions consume more power, reduce manufacturability, and severely degrade IC performance [6]. Even space applications are moving away from radiation-hardened process technology; for cost and performance reasons, they are using commercial off-the-shelf components that employ soft error protection techniques at the software and architecture levels. As a result, many researchers have been focusing on new soft error countermeasures at levels ranging from process to software.

Advances in process technology, such as the adoption of silicon-on-insulator (SOI) and the elimination of boron-10 impurities, are expected to mitigate the soft error problem to a certain extent. However, solutions at higher levels are still essential for reliable operation of the computing system. Two gaps hinder researchers in their quest to tame the soft error problem: the lack of fault models that abstract the physical phenomena of soft errors accurately and in a form accessible to computer engineers, and the absence of tools that analyze the effectiveness of soft error countermeasures.

There is an obvious need for a community resource for researchers and industrial practitioners studying radiation-induced SEE in computing systems. Existing tools either do not address the problem to its full extent or are kept confidential as the property of commercial entities, and are therefore not available to the research community. The SEAT will serve a critical purpose by providing researchers with electrical, computer, information sciences, or nuclear backgrounds with an open, modular, and flexible yet comprehensive tool.

The SEAT has emerged as a complementary tool that furnishes a theoretical foundation for the experimental radiation-induced soft error research conducted at the Penn State Breazeale Nuclear Reactor by the Mechanical and Nuclear Engineering and the Computer Science and Engineering Departments. More details can be found in [7] in this annual report. The experiments performed are compiled into an “accelerated soft error testing dataset”. Researchers can then seek to reproduce these observations with the SEAT, or vice versa.

The strength of the SEAT is that it is built upon the combined expertise of computer and nuclear engineers. The SEAT hierarchy starts with modeling the ionization effects of particle strikes on semiconductor devices, and then creates higher-level abstractions of these effects for analysis at the circuit and architecture levels. This infrastructure will enable researchers working on circuit, architectural, and software countermeasures for soft errors to obtain a better perspective of the physical phenomena, and will help them tune their techniques accordingly. If the fault model used at the architecture or circuit level fails to model SEE accurately, the solutions proposed at higher abstractions lose their underlying value. In this report, we present details of SEAT-DA, the device-level abstraction of the toolset.

 

SEAT-DA Tool

Soft-error-induced transient pulse generation depends on the exact charge deposited by the neutron-Si interaction and on its subsequent collection. SEAT-DA is a tool flow built on top of three different tools, as shown in Figure 3. It models both charge deposition and charge collection, as described in the following subsections.

Figure 3. SEAT-DA Simulation Tool Flow

 

Charge deposition by neutron induced soft errors

To study n-Si interactions, we use the Monte Carlo N-Particle (MCNP) toolset. MCNP is used for studying neutron, photon, electron, or coupled neutron/photon/electron transport, and has traditionally been used in nuclear engineering for applications such as reactor design and radionuclide-based imaging. Input to MCNP includes a model of the silicon substrate and a description of the neutron flux. MCNP can be run with the appropriate reaction codes and neutron data files to model various reactions. This feature is particularly useful because the neutron flux depends on location and altitude: we may set up MCNP with the distribution of neutron flux at a given place to calculate the n-Si interactions there. We have also created customized scripts that parse the MCNP output to identify the different reactions and their products.

Neutron-Si reactions can be classified into two main groups, elastic and inelastic, and MCNP can model both. Elastic scattering, due to the low mass of the neutrons, does not produce significant ionization. In contrast, inelastic reactions occur when the neutron enters the nucleus and the unstable nucleus disintegrates into smaller particles. Many reactions are possible and various particles may be emitted (see Equation 1; we refer to these reactions by the numbers given there).

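\[
\begin{aligned}
n + {}^{28}\mathrm{Si} &\rightarrow p + {}^{28}\mathrm{Al} && \text{(reaction 1)} \\
&\rightarrow n + \alpha + {}^{24}\mathrm{Mg} && \text{(reaction 2)} \\
&\rightarrow n + p + {}^{27}\mathrm{Al} && \text{(reaction 3)} \\
&\rightarrow \alpha + {}^{25}\mathrm{Mg} && \text{(reaction 4)} \\
&\rightarrow {}^{3}\mathrm{He} + {}^{26}\mathrm{Mg} && \text{(reaction 5)} \\
&\rightarrow 2\alpha + {}^{21}\mathrm{Ne} && \text{(reaction 6)} \\
&\rightarrow \gamma + {}^{29}\mathrm{Si} && \text{(reaction 7)} \\
&\rightarrow n + {}^{28}\mathrm{Si} && \text{(reaction 8)} \\
&\rightarrow \text{etc.}
\end{aligned}
\tag{1}
\]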
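As a quick consistency check on the charged-particle channels of Equation 1 (reactions 1 through 6), the sketch below verifies that each one conserves mass number and charge. It is an illustrative bookkeeping aid only, not one of the MCNP parsing scripts mentioned above; the dictionary names and particle labels are hypothetical.

```python
# (mass number A, charge number Z) for each particle or nuclide involved.
PARTICLES = {
    "n": (1, 0),
    "p": (1, 1),
    "alpha": (4, 2),
    "He3": (3, 2),
    "Si28": (28, 14),
    "Al28": (28, 13),
    "Al27": (27, 13),
    "Mg24": (24, 12),
    "Mg25": (25, 12),
    "Mg26": (26, 12),
    "Ne21": (21, 10),
}

REACTANTS = ["n", "Si28"]

# Products of reactions 1 through 6 as numbered in Equation 1.
CHANNELS = {
    1: ["p", "Al28"],
    2: ["n", "alpha", "Mg24"],
    3: ["n", "p", "Al27"],
    4: ["alpha", "Mg25"],
    5: ["He3", "Mg26"],
    6: ["alpha", "alpha", "Ne21"],
}


def totals(names):
    """Return the summed (A, Z) of a list of particle labels."""
    mass = sum(PARTICLES[name][0] for name in names)
    charge = sum(PARTICLES[name][1] for name in names)
    return mass, charge


def is_balanced(products):
    """True if the products carry the same total A and Z as n + Si-28."""
    return totals(products) == totals(REACTANTS)


if __name__ == "__main__":
    for number, products in CHANNELS.items():
        status = "balanced" if is_balanced(products) else "NOT balanced"
        print(f"reaction {number}: {' + '.join(products)} -> {status}")
```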

Once the different reaction products are obtained, we use the Transport of Ions in Matter (TRIM) simulator [8] to calculate the charge deposited by these ions. Interfacing MCNP with TRIM enables an accurate analysis of charge creation. TRIM calculates the stopping power of the ions and, from it, their range and the charge they are capable of depositing. Once the ion distribution resulting from a particle strike is known, TRIM gives its range and charge generation rate, and this generation rate is fed to a 3D device simulator to calculate the charge collected in a given region of the device.

Inelastic reactions produce byproducts that are heavier than the original neutrons, so they deposit more charge as they travel through silicon. The transient pulse caused by an inelastic reaction is therefore of higher magnitude than one caused by elastic scattering. For this reason, it can cause errors even on nodes with large capacitance, and it is not easily attenuated by the electrical and latching-window masking effects. It should be noted that inelastic reactions occur less often than elastic scattering. Nevertheless, in this work we present results only for inelastic reactions, as we believe these represent the worst-case upper bound that must be addressed to ensure reliable circuit operation. A circuit designed for these conditions will be immune to errors due to elastic scattering.
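For a rough sense of scale, the charge an ion can deposit follows from the energy it loses to ionization in silicon. The sketch below is a back-of-the-envelope conversion, not the TRIM calculation itself. It assumes the commonly used figure of about 3.6 eV per electron-hole pair in silicon, which does not come from this report, and the function names are illustrative.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per elementary charge
EV_PER_PAIR_SI = 3.6  # assumed mean energy per electron-hole pair in Si (eV)


def pairs_from_energy(energy_mev: float) -> float:
    """Approximate number of electron-hole pairs for a given ionizing energy loss."""
    if energy_mev < 0:
        raise ValueError("energy must be non-negative")
    return energy_mev * 1e6 / EV_PER_PAIR_SI


def deposited_charge_fc(energy_mev: float) -> float:
    """Approximate deposited charge in femtocoulombs (one carrier polarity)."""
    return pairs_from_energy(energy_mev) * ELEMENTARY_CHARGE_C * 1e15


if __name__ == "__main__":
    # Hypothetical energy losses, chosen only to illustrate the scaling.
    for energy_mev in (0.5, 1.0, 5.0):
        print(f"{energy_mev:.1f} MeV -> about "
              f"{deposited_charge_fc(energy_mev):.1f} fC")
```

With this assumption, each MeV lost to ionization corresponds to roughly 44.5 fC, which gives a feel for why heavier reaction products, which lose more energy over a short range, are the worst case.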

 

Charge collection

After the reaction products of n-Si interactions deposit charge, this charge may either recombine or be collected at a device terminal to generate current. For modeling charge collection, we use the Synopsys TCAD Davinci 3D device simulator. Davinci uses the physical model and equation interface (PMEI) to perform simulations that incorporate user-defined physical models and equations. The input to the 3D simulator includes the device structure, device parameters, and device-level equations. The charge may be collected at the device terminals by either drift or diffusion processes.

When the ion track is sufficiently far from the space charge zone of the drain junction, the carriers generated in the track move mainly by diffusion. For charge collection, however, the most sensitive regions are the reverse-biased p/n junctions of the transistor. The high field present in a reverse-biased junction depletion region can collect the charge generated by the ion track through drift, leading to a transient current at the junction. An important phenomenon associated with charge collection is field funneling. Charge generated along the ion track can locally collapse the junction electric field because of the highly conductive nature of the charge track and the separation of charge by the depletion region. Figure 4 shows the field in a device after the field has collapsed. The funneling effect can increase charge collection at the struck node by extending the junction electric field away from the junction and deep into the substrate, so that charge deposited some distance from the junction can be collected through the efficient drift process.

In deep-sub-micron technology, another phenomenon, termed the alpha-particle source-drain penetration effect (ALPEN), also contributes to charge collection. With ALPEN, if a particle passes through both the source and the drain at near-grazing incidence, a significant but short-lived source-drain conduction current is generated that mimics the “on” state of the transistor. In sub-100 nm devices, however, when electron-hole pairs are generated there is a high probability that the generation spans a region greater than the gate length. Hence, we expand the definition of ALPEN to include these effects. In addition, we refer to both funneling and ALPEN as drift processes.

The simulator was set up to use standard drift-diffusion laws and classical physical models. The carrier-carrier scattering mobility model (CCSMOB) accounts for the large carrier concentrations present in the charge column; CCSMOB also includes the effects of doping and temperature on mobility. The field-dependent mobility model (FLDMOB) accounts for the reverse-biased junction and the high electric fields in the depletion region. The Shockley-Read-Hall and Auger recombination models account for recombination of the carriers. The band-gap-narrowing (BGN) model is used to model the pn junction as a bipolar device. The device was loaded with lumped resistance and capacitance models to ensure realistic conditions.

The electron-hole pairs are introduced into the simulation as a charge column, which is assumed to have a Gaussian profile. The charge is generated over a period of about 6 picoseconds using a Gaussian waveform. The structure was set up to compute a time-dependent solution lasting up to 5 ns. This is sufficient to resolve the drift and diffusion components of the charge collection process. Diffusion charge collection may continue for a longer period, but its contribution to the total charge collected is negligible. The output of the 3D simulation is used to generate current profiles for the different particle strikes. The current is integrated over time to calculate the charge collected as a result of the particle strike.
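The final integration step can be sketched as follows, assuming the simulated current profile is available as paired lists of time and current samples. The sample waveform below is a made-up triangular pulse for illustration, not Davinci output, and the function name is hypothetical.

```python
def collected_charge(times_s, currents_a):
    """Integrate a sampled current waveform I(t) with the trapezoidal rule.

    times_s: sample times in seconds, strictly increasing.
    currents_a: terminal current in amperes at each sample time.
    Returns the collected charge in coulombs.
    """
    if len(times_s) != len(currents_a):
        raise ValueError("time and current lists must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt <= 0:
            raise ValueError("sample times must be strictly increasing")
        total += 0.5 * (currents_a[i] + currents_a[i - 1]) * dt
    return total


if __name__ == "__main__":
    # Hypothetical pulse: rises to 1 mA in 10 ps, then decays to zero by 110 ps.
    times = [0.0, 10e-12, 110e-12]
    currents = [0.0, 1e-3, 0.0]
    print(f"collected charge: {collected_charge(times, currents) * 1e15:.1f} fC")
```

Because the transient rises within picoseconds and decays over a much longer time, non-uniform sampling (dense near the peak, sparse in the tail) keeps the integral accurate without an excessive number of points.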

Hence, a typical transient current generated by a soft error has a high drift component, which lasts for a few picoseconds; after the collapsed field is re-established, charge collection is predominantly due to diffusion. For glitch-based circuit-level analysis, it is important to model both the drift and diffusion components accurately, as the drift process is responsible for the peak and the diffusion process is responsible for the long tail of the fast-rising, slowly decaying current pulse.
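A fast-rising, slowly decaying pulse of this kind is often approximated at the circuit level by a double-exponential current source. The sketch below illustrates that general approximation; it is not the model produced by SEAT-DA, and the total charge and the two time constants are hypothetical values chosen only to show the shape.

```python
import math

# Hypothetical parameters for illustration only.
TOTAL_CHARGE_C = 50e-15  # total collected charge
TAU_RISE_S = 5e-12       # fast time constant (sets the rising edge)
TAU_FALL_S = 200e-12     # slow time constant (sets the long tail)


def pulse_current(t_s, charge_c, tau_rise_s, tau_fall_s):
    """Double-exponential current whose integral from 0 to infinity is charge_c."""
    if t_s < 0:
        return 0.0
    scale = charge_c / (tau_fall_s - tau_rise_s)
    return scale * (math.exp(-t_s / tau_fall_s) - math.exp(-t_s / tau_rise_s))


def peak_time(tau_rise_s, tau_fall_s):
    """Time at which the double-exponential current reaches its maximum."""
    return (tau_rise_s * tau_fall_s / (tau_fall_s - tau_rise_s)
            * math.log(tau_fall_s / tau_rise_s))


def charge_collected_by(t_s, charge_c, tau_rise_s, tau_fall_s):
    """Closed-form integral of the pulse from 0 to t_s."""
    scale = charge_c / (tau_fall_s - tau_rise_s)
    return scale * (tau_fall_s * (1.0 - math.exp(-t_s / tau_fall_s))
                    - tau_rise_s * (1.0 - math.exp(-t_s / tau_rise_s)))


if __name__ == "__main__":
    t_peak = peak_time(TAU_RISE_S, TAU_FALL_S)
    i_peak = pulse_current(t_peak, TOTAL_CHARGE_C, TAU_RISE_S, TAU_FALL_S)
    fraction = charge_collected_by(5e-9, TOTAL_CHARGE_C,
                                   TAU_RISE_S, TAU_FALL_S) / TOTAL_CHARGE_C
    print(f"peak at {t_peak * 1e12:.1f} ps, peak current {i_peak * 1e6:.1f} uA")
    print(f"fraction of charge collected within 5 ns: {fraction:.6f}")
```

In this approximation the short time constant controls the peak and the long one controls the tail, so a circuit-level fault model that keeps only the peak would underestimate the collected charge.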

 

Figure 4. Funneling

 

Conclusions And Future Work

The SEAT appears to fill a critical need, particularly in the research community. Even at its current stage, it has received a great deal of interest from both academia and industry. We received many positive comments at the conference where we first presented the SEAT [9]. Many industry affiliates stated their interest in the tool.

Even though neutrons account for the majority of the cosmic particles at sea level, the contribution of other particles, particularly protons, becomes dominant in the soft error problem at higher altitudes. For a more thorough analysis and wider applicability, other particle interactions should also be incorporated into the simulation. At this stage of the tool, we have included the proton flux in cosmic rays in the analysis. This accounts for the direct ionization effect of protons but not for their nuclear-level interactions with the medium. For better modeling of the physics, nuclear interactions of protons with the host nuclei must be accounted for, since these dominate proton-related single-event effects.

 

References

[1]  J. F. Ziegler, Terrestrial Cosmic Ray Intensities, IBM Journal of Research and Development, v. 40, no. 1, pp. 19-39, 1996.

[2]  J. Wallmark, S. Marcus, Minimum Size and Maximum Packaging Density of Non-redundant Semiconductor Devices, Proceedings of IRE, 50, 1962.

[3]  S. Borkar, T. Karnik, V. De, Design and Reliability Challenges in Nanometer Technologies, Proceedings of 41st Design Automation Conference, San Diego, 2004.

[4]  P. D. Bradley, E. Normand, Single Event Upsets in Implantable Cardioverter Defibrillators, IEEE Trans. Nucl. Sci., v. 45, pp. 2929-2940, Dec. 1998.

[5]  Sun Microsystems, Soft Memory Errors and Their Effect on Sun Fire Systems, 2002.

[6]  P. E. Dodd, L. W. Massengill, Basic Mechanisms and Modeling of Single-event Upset in Digital Microelectronics, IEEE Trans. Nucl. Sci., v. 50, pp. 583-602, June 2003.

[7]  N. Vijaykrishnan, M. J. Irwin, K. Ünlü, V. Degalahal, S. M. Çetiner, Testing Neutron-induced Soft Errors in Semiconductor Memories, Breazeale Nuclear Annual Report, December 2004.

[8]  J. F. Ziegler, http://www.srim.org/

[9]  V. Degalahal, S. M. Cetiner, F. Alim, N. Vijaykrishnan, M. J. Irwin, K. Ünlü, SESEE: A Soft Error Simulation and Estimation Engine, 7th Int. Conf. on Military and Aerospace Programmable Logic Devices (MAPLD), September 8-10, 2004.