About Distributed Resource Management

Distributed resource management (DRM) provides job scheduling for simulation process work items executing through the 3DOrchestrate infrastructure. Groups of 3DOrchestrate Stations can be configured to operate in the default built-in DRM mode, in IBM Spectrum LSF mode, or with a customized open DRM system.

See Also
Configuring a Group of Stations
SMAExeServer.properties File

Using a DRM job scheduler system is especially helpful for remote execution of compute-intensive applications such as the Abaqus physics simulation solvers and physics results visualization engine. See Configuring a Station for Remote Execution of Physics Solvers and Visualization.

Built-in DRM

The default DRM mode distributes execution work items to the available stations based on affinity matching requested by the end-user. One advantage of using the built-in DRM is very low scheduling overhead, which is particularly valuable for fast-running work items. Built-in DRM mode was previously called Fiper DRM in older versions of 3DOrchestrate.

However, built-in DRM does not support queues or job prioritization, nor does it consider relative machine speed or dynamic scheduling parameters (such as machine load, available memory, or available disk space) when selecting a station. As a result, built-in DRM mode does not always provide ideal scheduling for workflows containing long-running, resource-intensive work items. For such workflows, the LSF or customized open DRM mode is preferable.

Open DRM

3DOrchestrate provides two APIs that let you customize the interface between it and your DRM job scheduler. You can customize the interface using either scripts or a Java plug-in. For more details, see Customizing the 3DOrchestrate-DRM Interface.

IBM Spectrum LSF DRM

IBM Spectrum LSF (Load Sharing Facility) is a third-party job scheduler product. If you have an LSF cluster available, you can have the 3DOrchestrate Distribution Server submit work items to the LSF scheduler, giving you better control over the execution of your workload.

Using LSF DRM mode can significantly enhance the scheduling capabilities of 3DOrchestrate, particularly for workflows with time-consuming, resource-intensive work items. In the default built-in DRM mode, the system requires that stations be running and awaiting work items sent from the 3DOrchestrate Distribution Server. When using LSF DRM mode, the 3DOrchestrate Distribution Server uses LSF to launch station processes as needed on LSF compute nodes. Each process is then connected to the 3DOrchestrate Distribution Server, runs a single work item, and is terminated. Each work item dispatched with the LSF DRM corresponds to a single LSF job. This configuration gives LSF direct control over the station processes that are actually doing work, both for resource management and accounting purposes, and allows the 3DOrchestrate Distribution Server to use LSF’s sophisticated scheduling capabilities to select the optimal node for each piece of work.

Unlike built-in DRM mode, LSF DRM imposes some scheduling and process-launching overhead on each work item to be executed. However, for compute-intensive, long-running work items, the improved scheduling and job management that LSF DRM provides greatly outweigh the overhead. For workflows composed of large numbers of small, short-running work items, LSF DRM mode may reduce job throughput.

Before configuring 3DOrchestrate to work with your IBM Spectrum LSF system, create a group of stations in the Station Administration app with the same group name as your LSF host group.
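
For example, assuming your lsb.hosts file defines an LSF host group named sim_hosts (the group and host names here are purely illustrative), the group of stations created in the Station Administration app would also be named sim_hosts:

    Begin HostGroup
    GROUP_NAME       GROUP_MEMBER
    sim_hosts        (lsfnode01 lsfnode02 lsfnode03)
    End HostGroup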

To configure the 3DOrchestrate Distribution Server for LSF, do the following in the SMAExeServer.properties file (a consolidated example excerpt of these settings follows the steps):

  1. Uncomment the following line to activate the corresponding property (remove the leading # character):

    #fiper.system.drm.2=Lsf
  2. By default, the 3DOrchestrate Distribution Server initializes all LSF commands it uses internally. As long as the LSF environment (for example, profile.lsf) is sourced and available to the OS user that the 3DOrchestrate Distribution Server runs as, the properties listed below in this step are not required.

    The following properties are available to override the default LSF commands with your own:

    fiper.system.bsubpath
    fiper.system.bkillpath
    fiper.system.bjobspath
    fiper.system.lshostspath
    fiper.system.bqueuespath
    fiper.system.bmgrouppath
  3. Specify the default LSF queue that the 3DOrchestrate Distribution Server should submit to by editing the server property fiper.system.lsfQueueName.

  4. Optionally, specify default LSF resource strings that should be applied to all job submissions by setting the 3DOrchestrate Distribution Server property fiper.system.lsfExtraResources.

  5. Optionally, specify the 3DOrchestrate Distribution Server property fiper.system.lsfStationCommand. By default, this is set to "SMAExeTranstation". This command must exist in the executable PATH on all LSF compute hosts. If you have homogeneous compute nodes, you can specify an absolute path to the command using this property rather than ensuring it is in the PATH. If you have a heterogeneous environment (for example, a mix of Windows and Linux hosts), the value of this setting must be in the PATH and be executable on all LSF compute hosts.

  6. If you have configured the 3DOrchestrate Distribution Server for run-as security, ensure that the <COS_server_install_dir>/acstemp/ directory is writable by all OS users that will be submitting jobs to LSF.

  7. The 3DOrchestrate Station code needs to be accessible to all LSF compute nodes. The command that should be launched is:

    <station install_dir>/<platform>/code/command/SMAExeTranstation

    This command differs from the SMAExeStation command in that it forces the station to start with fiper.station.nogui=true and fiper.logon.prompt=no. In all other respects, your SMAExeStation.properties file must be configured as for a normal station, with, at a minimum, a fiper.station.tempdir that is writable by all LSF OS users and valid fiper.login.profile, fiper.logon.prop.user, and fiper.logon.prop.pw settings (an example excerpt follows the steps).

  8. Restart the 3DOrchestrate Distribution Server in the Java application server.
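
The following is a minimal sketch of the LSF-related portion of SMAExeServer.properties after steps 1 through 5. Only the property names come from the steps above; the queue name, resource string, and command paths are illustrative placeholders for your own values:

    fiper.system.drm.2=Lsf
    fiper.system.lsfQueueName=normal
    fiper.system.lsfExtraResources=select[mem>4000] rusage[mem=4000]
    fiper.system.lsfStationCommand=SMAExeTranstation
    # Optional overrides of the LSF command paths; not needed if the LSF
    # environment is already sourced for the server's OS user.
    #fiper.system.bsubpath=/opt/lsf/bin/bsub
    #fiper.system.bkillpath=/opt/lsf/bin/bkill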
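
Similarly, a sketch of the minimum SMAExeStation.properties settings described in step 7; the directory and credential values are placeholders (SMAExeTranstation itself forces fiper.station.nogui=true and fiper.logon.prompt=no at startup):

    # Must be writable by all LSF OS users
    fiper.station.tempdir=/scratch/3dorchestrate/tmp
    fiper.login.profile=<profile_name>
    fiper.logon.prop.user=<os_user>
    fiper.logon.prop.pw=<password>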