Applying Profiling

You can use profiling to determine where the time is spent in evaluating the model equations.

This page discusses:

Basic Profiling

You can use basic profiling to determine where the time is spent in evaluating the model equations, evaluating the steps in the simulation.

To be able to use any profiling, precision timing is needed. This can be accomplished setting the flag:

Advanced.Define.PrecisionTiming=true

This functionality uses the frequency counter of the processor to provide high-resolution time measurements.

Note: For information about how to set flags, see Displaying or Changing Flags by Scripting.

To turn on basic profiling, set the flag:

Advanced.GenerateBlockTimers=true

Then translate the model, and run the simulation. At the end of the log file a table contains the profiling information.

We exemplify the usage of block timers by simulating the model Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6_analytic (which takes about 0.1 second to simulate). Below follows a timing of such a simulation.

Profiling information for the blocks.
Estimated overhead per call 0.01[us] total 0.001[s]
the estimated overhead has been subtracted below.
Name of block     , Block, Total CPU[s],  Mean[us]  ( Min[us] to Max[us] ),  Called
Entire model               : 0,            0.139,      10.88 ( 0.53    to   464.91),   12740
Event handling             : 1,            0.003,      13.59 ( 11.18   to    49.33),     252
Empty timer                : 2,            0.000,       0.01 ( 0.00    to     0.05),   12744
Outside of model           : 3,            0.034,       2.68 ( 0.07    to  2140.39),   12742
Parameters                 : 4,            0.000,     203.73 ( 203.73  to   203.73),       1
   Parameter code          : 5,            0.000,     197.46 ( 197.46  to   197.46),       1
   Parameter code          : 6,            0.000,       6.12 ( 6.12    to     6.12),       1
InitialSection             : 7,            0.000,     292.15 ( 292.15  to   292.15),       1
   Initial code            : 8,            0.000,     232.99 ( 232.99  to   232.99),       1
   initialization.linear[1]: 9,            0.000,      57.24 ( 57.24   to    57.24),       1
   Initial code            : 10,           0.000,       0.07 ( 0.07    to     0.07),       1
OutputSection              : 11,           0.123,      10.26 ( 8.29    to   463.70),   11990
   Output code             : 12,           0.044,       3.67 ( 2.92    to   265.53),   11990
   simulation.linear[1]    : 13,           0.078,       6.46 ( 5.07    to   458.66),   11990
   Output code             : 14,           0.000,       0.02 ( 0.01    to     0.73),   11990
DynamicsSection            : 15,           0.001,       0.06 ( 0.03    to    16.61),   11991
   Dynamics code           : 16,           0.000,       0.01 ( 0.00    to     0.12),   11991
AcceptedSection2           : 17,           0.007,       1.62 ( 1.21    to    28.02),    4337
   Auxiliary2 code         : 18,           0.007,       1.58 ( 1.15    to    27.94),    4337

The first lines state that we have estimated the overhead of the timers to 0.01 microseconds and subtracted them from all timing estimates, thus making it easier to find the problematic block. The total overhead is also included for comparison.

It is then necessary to examine the file dsmodel.c in order to determine what block number corresponds to what variable(s). The start of block 6 is marked by DymolaStartTimer(6) and the end by DymolaEndTimerName(6). The first blocks are special:

Block # Task
0 The total time for all model evaluations.
1 The total time for model evaluations during event handling.
2 An empty block, included just for comparison.
3 The total time spent between model evaluations.

For each of these blocks we have only subtracted the overhead for its own timer and not for additional timer calls that occur during its call. Thus the total time for all model evaluations should more accurately be around 0.139 s - 0.001s (total overhead) = 0.138 s.

In this example the event iterations are insignificant compared to the rest, even though there are 125 event iterations.

The remaining blocks are code-blocks, either system of equations (including a trailing part of torn equations) or blocks of non-equations.

Among these blocks the most time is spent in block 13. This linear system of equations accounts for about 0.08 s of the models total 0.14 s. It represents the global kinematic loop that is defined by the six pistons through their connection to the crankshaft. However, note that each cylinder in the EngineV6_analytic model contains a loop breaker joint that analytically solves all kinematic loops. Thus, simulation.linear[1] is entirely solved during translation. During simulation it only consists of assignment statements (as can be seen under Statistics in the Translation log, where the equation´s size is listed as zero).

Note that block 8 has the largest average time of the normal blocks, but it has no significant influence on the total CPU-time. This block consists of variables that are parameter-dependent and thus are constant during continuous integration, and is thus evaluated only once.

Remember to reset the flag:

Advanced.GenerateBlockTimers=false

in order to avoid the overhead for the next model. This flag is not cleared by Clear or Clear All commands.

The timer overhead also affects the total simulation time. Also note that the accurate timers measure wall-clock time and not CPU-time in the simulation process. Thus simple statements can have large maximum times if another process interrupts them between the start of the timer and the stop of the timer. This should not affect the average and minimum times.

Fine Grained Profiling

In some cases, the basic profiling indicates that one non-equation block is the cause of the problem. In order to determine what is the cause within that block it is possible to turn on a finer grained profiling.

To activate finer grained profiling, set the flag:

Advanced.GenerateTimers=true
Notes:
  • Note that any profiling demands the flag Advanced.Define.PrecisionTiming = true, as described in the first section above.
  • For information about how to set flags, see Displaying or Changing Flags by Scripting.

Then translate the model, and run the simulation. At the end of the log file a large table contains the profiling information. Note that it might be necessary to open the log file (dslog.txt) in an editor to view the entire table.

By running the EngineV6_analytic model again with fine grained profiling, you get results such as

Profiling information for the blocks.
Estimated overhead per call 0.01[us] total 0.001[s]
the estimated overhead has been subtracted below.
Name of block             , Block, Total CPU[s],  Mean[us]  ( Min[us] to  Max[us] ),  Called
Entire model              : 0,            0.538,      42.22 ( 1.33    to    569.62),   12743
Event handling            : 1,            0.017,      65.65 ( 51.94   to    465.13),     252
Empty timer               : 2,            0.000,       0.01 ( 0.00    to      0.13),   12744
Outside of model          : 3,            0.150,      11.74 ( 0.13    to 100523.00),   12743
…
OutputSection             : 3550,         0.413,      34.47 ( 28.34   to   567.45),    11990
…
   Output code            : 3648,         0.000,       0.01 ( 0.00    to     0.13),    11990
   Output code            : 3649,         0.002,       0.20 ( 0.10    to    30.42),    11990
   Output code            : 3650,         0.000,       0.01 ( 0.00    to    19.31),    11990
…
   Output code            : 3748,         0.000,       0.01 ( 0.00    to     0.14),    11990
   Output code            : 3749,         0.002,       0.17 ( 0.10    to    24.16),    11991
   Output code            : 3750,         0.000,       0.01 ( 0.00    to    16.59),    11991
…
   simulation.linear[1]   : 4157,         0.088,       7.30 ( 5.70    to   288.06),    11990
      Lin sys total       : 4158,         0.087,       7.26 ( 5.65    to   287.97),    11990
      Lin sys torn        : 4163,         0.026,       2.18 ( 1.64    to   264.76),    11990

The dominating block 4157 is the equation block found already with the coarser grained profiling. Note that more details about the linear system are given here. However, since the system is solved already during translation, no time is spent during simulation to construct right-hand-side vectors and system matrices. Nor is any time spent solving the resulting linear equations. Thus only the time for the torn assignments are listed above.

The blocks 3649 and 3749 (as well as 3849, 3949, 4049, and 4146) are complex equations giving the gas force in each cylinder. Since these equations have event generating ‘>’ and ‘<‘ one might try to use the noEvent operator. When using noEvent one must first check that the equation is sufficiently smooth. Since part of the equation is a piece-wise polynomial, we evaluate both pieces at the switch and observe that they do not give the same result. Thus, noEvent must not be used here. Additionally, the Event Log tab of the Simulation Analysis command tells us that these events only happen once during simulation. They are therefore not a big issue. (For information about the Event Log, see Analyzing Events.)

On the other hand, rewriting the polynomials with Horner’s rule decreased the average from about 0.2 to 0.05 microseconds. This had marginal influence on the total CPU-time in this example.

One should remember to reset the flag

Advanced.GenerateTimers=false

in order to avoid the overhead for the next model. Note that this flag is not cleared by clear or Clear All commands.

Profiling of Function Calls

You can activate profiling of function calls. This can be used to, for example, investigate external function calls, especially FMUs. Another usage can be to find time-consuming media functions.

To activate profiling of function calls, set the flag

Advanced.GenerateFunctionTimers = true

(The flag is by default false.)

Note: For information about how to set flags, see Displaying or Changing Flags by Scripting.

This profiling can be used together with profiling described in previous sections, but can also be used on its own. Note however that any profiling demands the flag Advanced.Define.PrecisionTiming = true, as described in the first section above.