Parallel execution in Abaqus/Standard:
is available for shared memory computers and computer clusters for the element operations, direct sparse solver, and iterative linear equation solver; and
can use compute-capable GPGPU hardware on shared memory computers for the direct sparse solver, AMS eigensolver, and modal frequency response solver.
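On supported configurations, GPGPU acceleration is typically requested together with CPU parallelization through the gpus option of the abaqus execution procedure; the job name and counts below are only illustrative:

    abaqus job=frame cpus=8 gpus=1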
Abaqus/Standard supports both shared memory computers and computer clusters for parallelization.
Parallelization is invoked using the cpus option in the
abaqus execution procedure. The type of parallelization that is
executed depends on the computer resources configured by the job submission system. The
configured computer resources are reflected in the environment variable
mp_host_list (see Environment File Settings). If mp_host_list consists of a single machine host,
thread-based parallelization is used within that host if it has more than one processor
available. If mp_host_list consists of multiple hosts,
MPI-based parallelization is used. In addition, if each host has more than one
processor, thread-based parallelization is used within each host. This type of
parallelization is referred to as hybrid parallelization of MPI and threads.
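For illustration, a hypothetical job could be submitted as

    abaqus job=beam cpus=8

where the job name and processor count are placeholders. Whether this runs in thread or MPI mode depends on mp_host_list. For example, an environment file setting such as

    mp_host_list=[['node1',4],['node2',4]]

(host names and processor counts are assumptions) describes two hosts with four processors each; the job would then use MPI-based parallelization between the hosts and, because each host has more than one processor, thread-based parallelization within each host.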
Thread-Based Parallelization
Abaqus/Standard can be executed in thread mode within one node of a compute cluster and takes advantage
of the shared memory available to the threads that are running on different processors. In
most cases thread-based parallelization is fully supported in Abaqus/Standard. It is not supported, however, for the older, less efficient implementation of the linear dynamic
analysis procedures invoked with the parameter setting SIM=NO. This branch of the
code is still required in some cases as a workaround for current limitations of the newer
high-performance implementation of the linear dynamic analysis procedures.
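If necessary, the parallelization mode can be requested explicitly with the mp_mode option of the abaqus execution procedure; a minimal sketch, with an illustrative job name and processor count:

    abaqus job=beam cpus=8 mp_mode=threads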
MPI-Based Parallelization
Abaqus/Standard can also be executed in MPI mode, which uses the message passing interface to communicate
between machine hosts. MPI-based parallelization is fully supported in Abaqus/Standard except in the following workflows and features:
Cavity radiation analyses where parallel decomposition of the cavity is not allowed
and writing of restart data is requested (Cavity Radiation in Abaqus/Standard).
Heat transfer analyses where average-temperature radiation conditions are specified
(Thermal Loads).
You can further improve performance with hybrid parallelization, in which MPI-based
parallelization is used between hosts and thread-based parallelization is used within each host.
The threads_per_mpi_process option can be used in
conjunction with the cpus option to reconfigure the
parallelization. The value of threads_per_mpi_process
should be a divisor of the number of processors on each host and, consequently, a divisor of the
number of cpus when all hosts have the same number of processors.
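For example, on two hypothetical hosts with eight processors each, the following invocation (job name and counts are illustrative) would start four MPI processes, each running four threads, for a total of 16 processors:

    abaqus job=beam cpus=16 threads_per_mpi_process=4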
Some physical systems (systems that, for example, undergo buckling,
material failure, or delamination) can be highly sensitive to small
perturbations. For example, it is well known that the experimentally measured
buckling loads and final configurations of a set of seemingly identical
cylindrical shells can show significant scatter due to small differences in
boundary conditions, loads, initial geometries, etc. When simulating such
systems, the physical sensitivities seen in an experiment can be manifested as
sensitivities to small numerical differences caused by finite precision
effects. Finite precision effects can lead to small numerical differences when
running jobs on different numbers of processors. Therefore, when simulating
physically sensitive systems, you may see differences in the numerical results
(reflecting the differences seen in experiments) between jobs run on different
numbers of processors. To obtain consistent simulation results from run to run,
the number of processors should be constant.