lamssi_cr  Section: LAM SSI CR OVERVIEW (7)  Updated: May, 2004
LAM/MPI can involuntarily checkpoint and restart parallel MPI jobs. Doing so requires that LAM/MPI be compiled with thread support and that a back-end checkpointing system be available at run-time. MPI jobs must run with at least MPI_THREAD_SERIALIZED support. If a job elects to run with checkpoint/restart support and an available cr module is found, the job's thread level is automatically promoted to MPI_THREAD_SERIALIZED. See the User's Guide for more details.
LAM currently has only one cr module: blcr. For an MPI job to be checkpointed and restarted, all of its MPI SSI modules must support checkpoint/restart. Currently, this means using the crtcp RPI module.
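As a rough illustration of the module selection described above, the following sketch builds an mpirun command line that requests the crtcp RPI module and the blcr cr module through SSI parameters. The executable name my_mpi_program is hypothetical, and the exact invocation may differ between LAM versions; consult the User's Guide for the authoritative form.

```shell
# Hypothetical executable name; substitute your own MPI program.
prog=./my_mpi_program

# "-ssi rpi crtcp" selects the checkpoint-capable RPI module;
# "-ssi cr blcr" selects the blcr cr module.
# "C" asks mpirun for one process per CPU in the LAM universe.
cmd="mpirun C -ssi rpi crtcp -ssi cr blcr $prog"

# Printed rather than executed so the line can be inspected first;
# run it once the LAM universe has been booted with lamboot.
echo "$cmd"
```

The command is only printed here, so the sketch can be tried on a machine without a running LAM universe.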
The blcr module has one SSI parameter: