Saturday, September 27, 2008

Developing fault-tolerant software

“Fault-tolerance is the property that enables a system to continue operating properly in the event of the failure of some of its components.” [2]


Many systems need to be fault tolerant and reliable, mainly safety-critical applications such as patient monitoring systems, transaction processing, etc. For more details, see the resources. In this post, I present some practical approaches to building fault tolerance into software applications.

As a team of four, we implemented software fault tolerance during our undergraduate studies, with guidance from Asst. Prof. Rajesh Purohit and our alumni (working at Sun Microsystems). Here I show our initial work on implementing three fault-tolerance techniques: process pair, self-checking and correction, and multiple process backup. Let us look at a simple example, a sequence generator.

The sequence generator is a process that performs a specified task or operation without affecting other processes, files, or databases. Its current state depends on its previous state. Examples of sequence generators include random number generators and series generators. We used a Fibonacci series generator.
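To make "current state depends on previous state" concrete, here is a minimal Java sketch of a Fibonacci generator with an explicit state object. The names FibState and FibDemo are my own, chosen for illustration; they are not taken from our project code.

```java
import java.math.BigInteger;

// Hypothetical state holder for a Fibonacci sequence generator.
final class FibState {
    final int index;          // number of steps taken so far
    final BigInteger current; // F(index)
    final BigInteger next;    // F(index + 1)

    FibState(int index, BigInteger current, BigInteger next) {
        this.index = index;
        this.current = current;
        this.next = next;
    }

    static FibState initial() {
        return new FibState(0, BigInteger.ZERO, BigInteger.ONE);
    }

    // The new state is computed from the previous state alone.
    FibState step() {
        return new FibState(index + 1, next, current.add(next));
    }
}

class FibDemo {
    public static void main(String[] args) {
        FibState s = FibState.initial();
        for (int i = 0; i < 10; i++) {
            System.out.print(s.current + " ");
            s = s.step();
        }
        System.out.println(); // prints: 0 1 1 2 3 5 8 13 21 34
    }
}
```

Because the whole state fits in three values, saving it to stable storage after every step is cheap, which is what makes this generator a convenient test subject for recovery techniques.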

1. Fault tolerance for the sequence generator using the process pair technique – The process pair technique executes every process as a pair: a primary process and a backup process. This technique is used in Tandem NonStop systems, but Tandem runs the pair on two different processors, whereas we ran two instances of a process simultaneously on a single processor. One instance, the primary process, executes in the active state and stores its state information in stable storage, while the other instance, the backup process, executes in a monitoring state. If the primary process fails, the backup process retrieves the state information and resumes processing, completing the remaining task from where the primary process left off.
Fig 1: Before failure and after failure
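The process pair idea can be sketched in Java roughly as follows. This is only an illustration under simplifying assumptions, not our original implementation: the primary and backup are threads in one JVM rather than separate processes, a plain text file stands in for stable storage, the failure is simulated by the primary stopping early, and the backup detects it by waiting for the primary thread to end (a real backup would more likely use heartbeats or timeouts). The class name, the "n a b" checkpoint format, and the failAt parameter are all hypothetical. It needs Java 11 or later for Files.writeString.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ProcessPairSketch {
    // Checkpoint format (an assumption of this sketch): "n a b" with a = F(n), b = F(n+1).
    static void save(Path store, long n, long a, long b) {
        try {
            Files.writeString(store, n + " " + a + " " + b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long[] load(Path store) {
        try {
            String[] p = Files.readString(store).trim().split(" ");
            return new long[] { Long.parseLong(p[0]), Long.parseLong(p[1]), Long.parseLong(p[2]) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Advances the generator from the last checkpoint until n == total.
    // failAt >= 0 simulates a crash: the worker stops once n reaches failAt.
    static void work(Path store, int total, int failAt) {
        long[] s = load(store);
        while (s[0] < total) {
            if (s[0] == failAt) {
                return; // simulated failure of the primary
            }
            s = new long[] { s[0] + 1, s[2], s[1] + s[2] };
            save(store, s[0], s[1], s[2]); // checkpoint after every step
        }
    }

    // Runs primary and backup; returns F(total) read back from the checkpoint file.
    static long runPair(Path store, int total, int failAt) {
        save(store, 0, 0, 1);
        Thread primary = new Thread(() -> work(store, total, failAt), "primary");
        Thread backup = new Thread(() -> {
            try {
                primary.join(); // monitoring state: wait until the primary stops
            } catch (InterruptedException e) {
                return;
            }
            if (load(store)[0] < total) {
                work(store, total, -1); // task unfinished: take over from the checkpoint
            }
        }, "backup");
        primary.start();
        backup.start();
        try {
            backup.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return load(store)[1];
    }

    public static void main(String[] args) throws IOException {
        Path store = Files.createTempFile("fib", ".ckpt");
        System.out.println(runPair(store, 10, 4)); // primary fails after 4 steps; prints 55
    }
}
```

The important point is that the backup never recomputes from the beginning: it reads the last checkpoint and continues from there.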

2. Fault tolerance for the sequence generator using the self-checking and correction technique – A self-checking process automatically checks for errors and restarts itself until its designated task is completed. It uses stable storage as a means of checking for errors. If the last execution of the process terminated abnormally, without completing the designated task, the newly started instance detects the error and resumes execution from where the task was left, using the saved state information.
Fig 2: Cycle by self-correcting process
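A rough Java sketch of this restart cycle is shown below. It is an illustration under assumptions of my own, not our project code: each "incarnation" of the process is a method call, an abnormal termination is simulated by throwing an exception after maxSteps steps, and the checkpoint file format ("n a b") is hypothetical. It needs Java 11 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SelfCheckingSketch {
    // One incarnation of the process. It first checks stable storage for unfinished
    // work and resumes from it. After maxSteps steps it throws, simulating a crash.
    static void runOnce(Path store, int total, int maxSteps) {
        long n = 0, a = 0, b = 1; // a = F(n), b = F(n+1)
        try {
            if (Files.exists(store) && Files.size(store) > 0) {
                String[] p = Files.readString(store).trim().split(" ");
                n = Long.parseLong(p[0]);
                a = Long.parseLong(p[1]);
                b = Long.parseLong(p[2]);
            }
            for (int steps = 0; n < total; steps++) {
                if (steps == maxSteps) {
                    throw new IllegalStateException("simulated crash at n=" + n);
                }
                long t = a + b;
                a = b;
                b = t;
                n++;
                Files.writeString(store, n + " " + a + " " + b); // checkpoint
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restarts the process until the task is complete.
    // Returns { F(total), number of restarts }. Assumes total > 0 and maxSteps > 0.
    static long[] runUntilDone(Path store, int total, int maxSteps) {
        int restarts = 0;
        while (true) {
            try {
                runOnce(store, total, maxSteps);
                break; // completed normally
            } catch (IllegalStateException crash) {
                restarts++; // abnormal termination detected: start a new instance
            }
        }
        try {
            String[] p = Files.readString(store).trim().split(" ");
            return new long[] { Long.parseLong(p[1]), restarts };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path store = Files.createTempFile("fib", ".ckpt");
        long[] r = runUntilDone(store, 10, 4);
        System.out.println(r[0] + " after " + r[1] + " restarts"); // prints: 55 after 2 restarts
    }
}
```

With 10 steps to do and a crash after every 4, the three runs cover steps 1 to 4, 5 to 8, and 9 to 10, so two restarts are enough.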

3. Fault tolerance for the sequence generator using the multiple process backup technique – The multiple process backup technique implements the multiple-process, single-monitor paradigm. It is a refinement of the process pair technique that reduces process redundancy. The parent backup process monitors all active child processes so that it can provide backup if one of them fails, but it can back up only one child process at a time, whichever fails first. When the designated task of the failed process is complete, the parent returns to the monitoring state. The parent backup process and all child processes communicate with each other by passing messages. The parent backup process also keeps track of the number of active child processes; it identifies the failed child process and provides it with full backup.
Fig 3: Before failure and after failure
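Here is a small Java sketch of the single-monitor idea. Again, this is an illustration and not our actual code: the children are threads rather than processes, messages travel through a BlockingQueue, and a failing child announces its own failure with a FAILED message, which stands in for whatever detection a real system would use (timeouts, exit status, and so on). The names Msg, child, parent, and the failAt parameter are hypothetical. Child number id computes the Fibonacci number F(base + id).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class MonitorSketch {
    // Message sent from a child to the parent.
    static final class Msg {
        final String kind;  // "PROGRESS", "DONE" or "FAILED"
        final int child;
        final long n, a, b; // a = F(n), b = F(n+1)

        Msg(String kind, int child, long n, long a, long b) {
            this.kind = kind;
            this.child = child;
            this.n = n;
            this.a = a;
            this.b = b;
        }
    }

    // Child worker: reports every step. failAt < 0 means it never fails.
    static void child(int id, int total, int failAt, BlockingQueue<Msg> inbox) {
        long n = 0, a = 0, b = 1;
        while (n < total) {
            if (n == failAt) {
                inbox.add(new Msg("FAILED", id, 0, 0, 0)); // simulated failure
                return;
            }
            long t = a + b;
            a = b;
            b = t;
            n++;
            inbox.add(new Msg("PROGRESS", id, n, a, b));
        }
        inbox.add(new Msg("DONE", id, n, a, b));
    }

    // Parent monitor: tracks active children and finishes the task of a failed one.
    static Map<Integer, Long> parent(int children, int base, int failingChild, int failAt) {
        BlockingQueue<Msg> inbox = new LinkedBlockingQueue<>();
        for (int id = 0; id < children; id++) {
            final int cid = id;
            final int f = (id == failingChild) ? failAt : -1;
            new Thread(() -> child(cid, base + cid, f, inbox)).start();
        }
        Map<Integer, long[]> last = new HashMap<>();    // last reported state per child
        Map<Integer, Long> results = new TreeMap<>();
        int active = children;
        while (active > 0) {
            Msg m;
            try {
                m = inbox.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (m.kind.equals("PROGRESS")) {
                last.put(m.child, new long[] { m.n, m.a, m.b });
            } else if (m.kind.equals("DONE")) {
                results.put(m.child, m.a);
                active--;
            } else { // FAILED: resume from the last state this child reported
                long[] s = last.getOrDefault(m.child, new long[] { 0, 0, 1 });
                while (s[0] < base + m.child) {
                    s = new long[] { s[0] + 1, s[2], s[1] + s[2] };
                }
                results.put(m.child, s[1]);
                active--;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Three children compute F(10), F(11), F(12); child 1 fails after 5 steps.
        System.out.println(parent(3, 10, 1, 5)); // prints {0=55, 1=89, 2=144}
    }
}
```

Since the parent handles one message at a time, it naturally backs up only one failed child at a time, in the order the failures arrive.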

Finally, as future work for this project, we proposed a new layer, called the Fault Tolerant Layer, which sits between the operating system layer and the application software. Its main goal is to provide an inherently fault-tolerant environment to any generic process or application that runs over it. The layer also serves as a means of communication between the application and the operating system. It may use one or more fault-tolerance techniques, such as process pair, self-checking and correction, and generic backup.
Figure 4: The proposed fault tolerant layer between application program and operating system layer
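Since this layer was only a proposal, the following Java sketch is purely speculative: it shows one way an application could hand a generic task to such a layer. The run method and its parameters are hypothetical names of my own. The layer here keeps the checkpoint in memory and treats an exception from a step as a fault; a real layer would have to persist checkpoints to stable storage and cooperate with the operating system.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

class FaultTolerantLayerSketch {
    // Runs a generic task given as (initial state, step function, completion test).
    // The layer keeps the last good state as a checkpoint and retries a step that
    // throws, giving up after maxRestarts faults.
    static <S> S run(S initial, UnaryOperator<S> step, Predicate<S> done, int maxRestarts) {
        S checkpoint = initial;
        int restarts = 0;
        while (!done.test(checkpoint)) {
            try {
                checkpoint = step.apply(checkpoint); // commit only after success
            } catch (RuntimeException fault) {
                if (++restarts > maxRestarts) {
                    throw fault;
                }
                // otherwise: retry from the unchanged checkpoint
            }
        }
        return checkpoint;
    }

    public static void main(String[] args) {
        AtomicBoolean crashed = new AtomicBoolean(false);
        // State is {n, F(n), F(n+1)}; the step fails once, at n == 3.
        long[] result = run(
            new long[] { 0, 0, 1 },
            s -> {
                if (s[0] == 3 && crashed.compareAndSet(false, true)) {
                    throw new IllegalStateException("simulated fault");
                }
                return new long[] { s[0] + 1, s[2], s[1] + s[2] };
            },
            s -> s[0] >= 10,
            3);
        System.out.println(result[1]); // prints 55
    }
}
```

The application only describes its state and its step; the recovery policy lives entirely in the layer, which is the separation the proposal aims for.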

In Java, you can use serialization to save and restore a process's state, which helps achieve fault tolerance [3].
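As a minimal sketch of that idea, the generator state below is written to a file with ObjectOutputStream and read back with ObjectInputStream, so a restarted process can continue from the checkpoint. The class names and the file name are my own choices for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SerializationCheckpoint {
    // Serializable generator state: a = F(n), b = F(n+1).
    static final class State implements Serializable {
        private static final long serialVersionUID = 1L;
        final int n;
        final long a;
        final long b;

        State(int n, long a, long b) {
            this.n = n;
            this.a = a;
            this.b = b;
        }

        State next() {
            return new State(n + 1, b, a + b);
        }
    }

    static void save(Path file, State s) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static State load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (State) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("fib", ".ser");
        State s = new State(0, 0, 1);
        for (int i = 0; i < 4; i++) {
            s = s.next();
        }
        save(file, s);            // checkpoint taken before a (hypothetical) crash
        State r = load(file);     // a restarted process recovers the saved state
        while (r.n < 10) {
            r = r.next();
        }
        System.out.println(r.a);  // prints 55
    }
}
```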

Resources:
