Plotting output from code profilers

GCPy contains functions to plot output from the gprofng and Intel VTune code profilers, which can help you to identify computational bottlenecks (aka “hotspots”) in GEOS-Chem Classic, GCHP, and HEMCO.

gprofng

This example demonstrates how you can plot function profiles generated by the gprofng performance profiler.

Source code

Script location: gcpy/profile/gprofng_functions.py

Usage

First, generate a function profile with gprofng:

$ gprofng collect app /path/to/executable/file

where /path/to/executable/file is the path to the program that you wish to profile. For example, to profile GEOS-Chem Classic, you would use this command:

$ gprofng collect app ./gcclassic

Gprofng will send profiling output to a folder named test.N.er, where N is an integer index and er stands for “experiment record”.

Next, send function profiling information to a file:

$ echo functions | gprofng display text test.N.er > functions_profile.txt

Here is a sample functions_profile.txt for GEOS-Chem Classic.

(gp-display-text) Functions sorted by metric: Exclusive Total CPU Time

Excl. Total     Incl. Total      Name
CPU             CPU
   sec.      %     sec.      %
508.406 100.00  508.406 100.00   <Total>
 62.594  12.31   62.594  12.31   __unitconv_mod_MOD_convertbox_kgm2_to_kg
 61.653  12.13   61.653  12.13   __unitconv_mod_MOD_convertbox_kg_to_kgm2
 58.401  11.49   58.401  11.49   <static>@0x7696c (<libm-2.28.so>)
 46.833   9.21   64.075  12.60   __tomas_mod_MOD_mnfix
 41.609   8.18   41.609   8.18   __gckpp_linearalgebra_MOD_kppdecomp
 29.941   5.89   29.941   5.89   __gckpp_linearalgebra_MOD_kppsolve
 26.609   5.23   26.609   5.23   __gckpp_function_MOD_fun_split
 ... etc ...

The Excl. Total (total Exclusive Time) metric is useful for identifying computational bottlenecks. This represents the amount of time spent in a subroutine, excluding time spent in any subroutines called by the subroutine.

Make sure that you have specified the proper Matplotlib backend for your system. Then run the example script with the following command:

$ python -m gcpy.profile.gprofng_functions functions_profile.txt 1 40

This will create a plot similar to that shown above, which shows the top 40 functions sorted by exclusive time. To see the next 40 functions sorted by exclusive time, use this command:

$ python -m gcpy.profile.gprofng_functions functions_profile.txt 41 80

etc. You should display fewer than 50 functions in order to prevent the time and percent labels from overlapping.

Intel VTune

These examples demonstrate how you can display output from the Intel VTune profiler in an easy-to-read format.

Source code

Script locations

Usage

First, use Intel VTune to collect information about hotspots:

$ vtune -collect hotspots -- /path/to/executable/file

where /path/to/executable/file is the path to the program that you wish to profile. For example, to profile GEOS-Chem Classic, you would use this command:

$ vtune -collect hotspots -- ./gcclassic

Intel VTune will send profiling output to a folder named rNNNhs, where NNN is a 3-digit integer (e.g. r000hs, r001hs, .etc).

List hotspots by function

Generate a hotspot report that shows the amount of CPU time that each function takes to execute:

$ vtune -report        "hotspots"                \
        -result-dir    "rNNNhs"                  \
        -format        "csv"                     \
        -group-by      "function"                \
        -report-output "hotspots.by-function.csv"

The report will be in comma-separated-variable (CSV) format, using the horizontal tab (\t) character as the separator. Use this command to display the list:

$ python -m gcpy.profile.vtune_list_hotspots -f hotspots.by-function.csv

You will see output similar to this:

Rank      Function                                                              CPU Time [s]
       gomp_simple_barrier_wait                                              24632.527261
       gomp_team_barrier_wait_end                                             9641.441448
       do_spin                                                                2989.702093
       do_spin                                                                1192.336839
       __gckpp_integrator_MOD_forwardeuler                                    1127.055528
       __carbon_gases_mod_MOD_chem_carbon_gases._omp_fn.1                      867.404384
       __hco_calc_mod_MOD_get_current_emissions._omp_fn.1                      746.427654
       gomp_iter_dynamic_next                                                  459.442222
       gomp_mutex_lock_slow                                                    386.295001
      pow                                                                     374.182018
      __memmove_evex_unaligned_erms                                           281.754054
      __memset_evex_unaligned_erms                                            219.870543
      gomp_team_end                                                           210.792107
      getvertindx                                                             199.950757
      __hco_calc_mod_MOD_get_current_emissions._omp_fn.0                      196.184759
      gomp_team_barrier_wait_final                                            177.007387
      apply_scale_factor                                                      171.323264
      get_value_from_datacont                                                 160.885377
      __hco_calc_mod_MOD_hco_calcemis                                         138.678973
      __calc_met_mod_MOD_airqnt._omp_fn.1                                     118.066415
      do_cloud_convection                                                     112.604329
      get_current_emissions                                                   103.695072
      gomp_loop_dynamic_start                                                 100.352473
      __gckpp_function_MOD_fun                                                 99.050549
      __hco_tidx_mod_MOD_tidx_getindx                                          96.415549
      __vdiff_mod_MOD_vdiffdr._omp_fn.0                                        93.715739
      expf64                                                                   93.333107
      gomp_simple_barrier_wait                                                 89.524049
      fzppm                                                                    87.241202
      lmtppm                                                                   82.941413

Press ENTER to continue, or Q/q then ENTER to quit >>>

Tip

Use the -l argument to display a different number of lines per screen. For example:

$  $ python -m gcpy.profile.vtune_list_hotspots -f hotspots.by-function.csv -l 40

will display 40 lines per screen, etc.

List hotspots by source code line

You may also generate a hotspot report that shows the module name and line number of each hotspot:

$ vtune -report        "hotspots"                \
        -result-dir    "rNNNhs"                  \
        -format        "csv"                     \
        -group-by      "source-line"             \
        -report-output "hotspots.by-line.csv"

The report will be in comma-separated-variable (CSV) format, using the horizontal tab (\t) character as the separator. Use this command to display the list:

$ python -m gcpy.profile.vtune_list_hotspots -f hotspots.by-line.csv

You will see output similar to this:

Rank      Source File                      Source Line          CPU Time [s]
       simple-bar.h                              60          24722.051310
       bar.c                                    112           9435.561971
       wait.h                                    56           3215.631958
       [Unknown source file]              [Unknown]           1276.917637
       gckpp_Integrator.F90                     186           1122.006955
       carbon_gases_mod.F90                     536            833.929713
       hco_calc_mod.F90                        1248            636.648621
       wait.h                                    56            499.983260
       iter.c                                   197            420.879619
      mutex.c                                   41            345.307192
      wait.h                                    57            331.221428
      team.c                                   956            209.822250
      bar.c                                    112            203.079367
      wait.h                                    57            182.085065
      bar.c                                    133            176.957387
      hco_calc_mod.F90                        2216            138.241661
      hco_calc_mod.F90                        1651            123.538416
      loop.c                                   130             98.411270
      gckpp_Function.F90                        67             95.191678
      futex.h                                  123             90.769803
      hco_calc_mod.F90                        1418             74.508612
      hco_calc_mod.F90                        1541             70.619145
      hco_tidx_mod.F90                         366             67.885338
      hco_calc_mod.F90                        1019             67.284800
      mixing_mod.F90                           780             62.574882
      hco_calc_mod.F90                        1475             60.544203
      calc_met_mod.F90                         697             57.911982
      hco_interface_common.F90                 165             57.280402
      history_mod.F90                         2670             54.087905
      hco_calc_mod.F90                        1046             53.123816

Press ENTER to continue, or Q/q then ENTER to quit >>>

Comparing hotspots

Let’s say you have used Intel VTune to generate hotspot reports for runs before and after a given fix was applied. You can compare a hotspot by name to see how much time it took to execute in both runs. Use the following command:

python -m gcpy.profile.vtune_compare_hotspots       \
  --ref-file     "hotspots.by-function.before.csv"  \
  --ref-label    "Before"                           \
  --dev-file     "hotspots.by-function.after.csv"   \
  --dev-label    "After"                            \
  --hotspot-name "get_current_emissions"

You will then see output similar to this:

Hotspot                          Before         After      Abs Diff    % Diff
get_current_emissions         113.13094     103.69507      -9.43587     -8.34

Plotting hotspots

Make sure that you have specified the proper Matplotlib backend for your system. Then run the example script with the following command:

$ python -m gcpy.profile.vtune_plot_hotspots hotspots.by-function.csv 1 40

or

$ python -m gcpy.profile.vtune_plot_hotspots hotspots.by-line.csv 1 40

This will show the first 40 hotspots plotted in decreasing order of CPU time, similar to the example shown in our gprofng section above.