Plotting output from code profilers

GCPy contains functions to plot output from the gprofng and Intel VTune code profilers, which can help you to identify computational bottlenecks (aka “hotspots”) in GEOS-Chem Classic, GCHP, and HEMCO.

gprofng

This example demonstrates how you can plot function profiles generated by the gprofng performance profiler.

_images/gprofng_functions.png

Source code

Script location: gcpy/profile/gprofng_functions.py

Usage

First, generate a function profile with gprofng:

$ gprofng collect app /path/to/executable/file

where /path/to/executable/file is the path to the program that you wish to profile. For example, to profile GEOS-Chem Classic, you would use this command:

$ gprofng collect app ./gcclassic

Gprofng will send profiling output to a folder named test.N.er, where N is an integer index and er stands for “experiment record”.

Next, send function profiling information to a file:

$ echo functions | gprofng display text test.N.er > functions_profile.txt

Here is a sample functions_profile.txt for GEOS-Chem Classic.

(gp-display-text) Functions sorted by metric: Exclusive Total CPU Time

Excl. Total     Incl. Total      Name
CPU             CPU
   sec.      %     sec.      %
508.406 100.00  508.406 100.00   <Total>
 62.594  12.31   62.594  12.31   __unitconv_mod_MOD_convertbox_kgm2_to_kg
 61.653  12.13   61.653  12.13   __unitconv_mod_MOD_convertbox_kg_to_kgm2
 58.401  11.49   58.401  11.49   <static>@0x7696c (<libm-2.28.so>)
 46.833   9.21   64.075  12.60   __tomas_mod_MOD_mnfix
 41.609   8.18   41.609   8.18   __gckpp_linearalgebra_MOD_kppdecomp
 29.941   5.89   29.941   5.89   __gckpp_linearalgebra_MOD_kppsolve
 26.609   5.23   26.609   5.23   __gckpp_function_MOD_fun_split
 ... etc ...

The Excl. Total (total Exclusive Time) metric is useful for identifying computational bottlenecks. This represents the amount of time spent in a subroutine, excluding time spent in any subroutines called by the subroutine.

Make sure that you have specified the proper Matplotlib backend for your system. Then run the example script with the following command:

$ python -m gcpy.profile.gprofng_functions functions_profile.txt 1 40

This will create a plot similar to that shown above, which shows the top 40 functions sorted by exclusive time. To see the next 40 functions sorted by exclusive time, use this command:

$ python -m gcpy.profile.gprofng_functions functions_profile.txt 41 80

etc. You should display fewer than 50 functions in order to prevent the time and percent labels from overlapping.

Intel VTune

These examples demonstrate how you can display output from the Intel VTune profiler in an easy-to-read format.

Source code

Script locations

Usage

First, use Intel VTune to collect information about hotspots:

$ vtune -collect hotspots -- /path/to/executable/file

where /path/to/executable/file is the path to the program that you wish to profile. For example, to profile GEOS-Chem Classic, you would use this command:

$ vtune -collect hotspots -- ./gcclassic

Intel VTune will send profiling output to a folder named rNNNhs, where NNN is a 3-digit integer (e.g. r000hs, r001hs, .etc).

List hotspots by function

Generate a hotspot report that shows the amount of CPU time that each function takes to execute:

$ vtune -report        "hotspots"                \
        -result-dir    "rNNNhs"                  \
        -format        "csv"                     \
        -group-by      "function"                \
        -report-output "hotspots.by-function.csv"

The report will be in comma-separated-variable (CSV) format, using the horizontal tab (\t) character as the separator. Use this command to display the list:

$ python -m gcpy.profile.vtune_list_hotspots -f hotspots.by-function.csv

You will see output similar to this:

Rank      Function                                                              CPU Time [s]
1         gomp_simple_barrier_wait                                              24632.527261
2         gomp_team_barrier_wait_end                                             9641.441448
3         do_spin                                                                2989.702093
4         do_spin                                                                1192.336839
5         __gckpp_integrator_MOD_forwardeuler                                    1127.055528
6         __carbon_gases_mod_MOD_chem_carbon_gases._omp_fn.1                      867.404384
7         __hco_calc_mod_MOD_get_current_emissions._omp_fn.1                      746.427654
8         gomp_iter_dynamic_next                                                  459.442222
9         gomp_mutex_lock_slow                                                    386.295001
10        pow                                                                     374.182018
11        __memmove_evex_unaligned_erms                                           281.754054
12        __memset_evex_unaligned_erms                                            219.870543
13        gomp_team_end                                                           210.792107
14        getvertindx                                                             199.950757
15        __hco_calc_mod_MOD_get_current_emissions._omp_fn.0                      196.184759
16        gomp_team_barrier_wait_final                                            177.007387
17        apply_scale_factor                                                      171.323264
18        get_value_from_datacont                                                 160.885377
19        __hco_calc_mod_MOD_hco_calcemis                                         138.678973
20        __calc_met_mod_MOD_airqnt._omp_fn.1                                     118.066415
21        do_cloud_convection                                                     112.604329
22        get_current_emissions                                                   103.695072
23        gomp_loop_dynamic_start                                                 100.352473
24        __gckpp_function_MOD_fun                                                 99.050549
25        __hco_tidx_mod_MOD_tidx_getindx                                          96.415549
26        __vdiff_mod_MOD_vdiffdr._omp_fn.0                                        93.715739
27        expf64                                                                   93.333107
28        gomp_simple_barrier_wait                                                 89.524049
29        fzppm                                                                    87.241202
30        lmtppm                                                                   82.941413

Press ENTER to continue, or Q/q then ENTER to quit >>>

Tip

Use the -l argument to display a different number of lines per screen. For example:

$  $ python -m gcpy.profile.vtune_list_hotspots -f hotspots.by-function.csv -l 40

will display 40 lines per screen, etc.

List hotspots by source code line

You may also generate a hotspot report that shows the module name and line number of each hotspot:

$ vtune -report        "hotspots"                \
        -result-dir    "rNNNhs"                  \
        -format        "csv"                     \
        -group-by      "source-line"             \
        -report-output "hotspots.by-line.csv"

The report will be in comma-separated-variable (CSV) format, using the horizontal tab (\t) character as the separator. Use this command to display the list:

$ python -m gcpy.profile.vtune_list_hotspots -f hotspots.by-line.csv

You will see output similar to this:

Rank      Source File                      Source Line          CPU Time [s]
1         simple-bar.h                              60          24722.051310
2         bar.c                                    112           9435.561971
3         wait.h                                    56           3215.631958
4         [Unknown source file]              [Unknown]           1276.917637
5         gckpp_Integrator.F90                     186           1122.006955
6         carbon_gases_mod.F90                     536            833.929713
7         hco_calc_mod.F90                        1248            636.648621
8         wait.h                                    56            499.983260
9         iter.c                                   197            420.879619
10        mutex.c                                   41            345.307192
11        wait.h                                    57            331.221428
12        team.c                                   956            209.822250
13        bar.c                                    112            203.079367
14        wait.h                                    57            182.085065
15        bar.c                                    133            176.957387
16        hco_calc_mod.F90                        2216            138.241661
17        hco_calc_mod.F90                        1651            123.538416
18        loop.c                                   130             98.411270
19        gckpp_Function.F90                        67             95.191678
20        futex.h                                  123             90.769803
21        hco_calc_mod.F90                        1418             74.508612
22        hco_calc_mod.F90                        1541             70.619145
23        hco_tidx_mod.F90                         366             67.885338
24        hco_calc_mod.F90                        1019             67.284800
25        mixing_mod.F90                           780             62.574882
26        hco_calc_mod.F90                        1475             60.544203
27        calc_met_mod.F90                         697             57.911982
28        hco_interface_common.F90                 165             57.280402
29        history_mod.F90                         2670             54.087905
30        hco_calc_mod.F90                        1046             53.123816

Press ENTER to continue, or Q/q then ENTER to quit >>>

Comparing hotspots

Let’s say you have used Intel VTune to generate hotspot reports for runs before and after a given fix was applied. You can compare a hotspot by name to see how much time it took to execute in both runs. Use the following command:

python -m gcpy.profile.vtune_compare_hotspots       \
  --ref-file     "hotspots.by-function.before.csv"  \
  --ref-label    "Before"                           \
  --dev-file     "hotspots.by-function.after.csv"   \
  --dev-label    "After"                            \
  --hotspot-name "get_current_emissions"

You will then see output similar to this:

Hotspot                          Before         After      Abs Diff    % Diff
get_current_emissions         113.13094     103.69507      -9.43587     -8.34

Plotting hotspots

Make sure that you have specified the proper Matplotlib backend for your system. Then run the example script with the following command:

$ python -m gcpy.profile.vtune_plot_hotspots hotspots.by-function.csv 1 40

or

$ python -m gcpy.profile.vtune_plot_hotspots hotspots.by-line.csv 1 40

This will show the first 40 hotspots plotted in decreasing order of CPU time, similar to the example shown in our gprofng section above.