.. |br| raw:: html
.. _code-profiling: ################################### Plotting output from code profilers ################################### GCPy contains functions to plot output from the :program:`gprofng` and :program:`Intel VTune` code profilers, which can help you to identify computational bottlenecks (aka "hotspots") in GEOS-Chem Classic, GCHP, and HEMCO. .. _code-profiling-gprofng: ======= gprofng ======= This example demonstrates how you can plot function profiles generated by the :program:`gprofng` performance profiler. .. image:: _static/images/gprofng\_functions.png :align: center .. _code-profiling-gprofng-source: Source code ----------- **Script location:** `gcpy/profile/gprofng_functions.py `_ .. _code-profiling-gprofng-usage: Usage ----- First, generate a function profile with :program:`gprofng`: .. code-block:: console $ gprofng collect app /path/to/executable/file where :code:`/path/to/executable/file` is the path to the program that you wish to profile. For example, to profile GEOS-Chem Classic, you would use this command: .. code-block:: console $ gprofng collect app ./gcclassic :program:`Gprofng` will send profiling output to a folder named :file:`test.N.er`, where :code:`N` is an integer index and :code:`er` stands for "experiment record". Next, send function profiling information to a file: .. code-block:: console $ echo functions | gprofng display text test.N.er > functions_profile.txt Here is a sample :file:`functions_profile.txt` for GEOS-Chem Classic. .. code-block:: text (gp-display-text) Functions sorted by metric: Exclusive Total CPU Time Excl. Total Incl. Total Name CPU CPU sec. % sec. % 508.406 100.00 508.406 100.00 62.594 12.31 62.594 12.31 __unitconv_mod_MOD_convertbox_kgm2_to_kg 61.653 12.13 61.653 12.13 __unitconv_mod_MOD_convertbox_kg_to_kgm2 58.401 11.49 58.401 11.49 @0x7696c () 46.833 9.21 64.075 12.60 __tomas_mod_MOD_mnfix 41.609 8.18 41.609 8.18 __gckpp_linearalgebra_MOD_kppdecomp 29.941 5.89 29.941 5.89 __gckpp_linearalgebra_MOD_kppsolve 26.609 5.23 26.609 5.23 __gckpp_function_MOD_fun_split ... etc ... The :program:`Excl. Total` (total Exclusive Time) metric is useful for identifying computational bottlenecks. This represents the amount of time spent in a subroutine, excluding time spent in any subroutines called by the subroutine. Make sure that you have :ref:`specified the proper Matplotlib backend ` for your system. Then run the example script with the following command: .. code-block:: console $ python -m gcpy.profile.gprofng_functions functions_profile.txt 1 40 This will create a plot similar to that shown above, which shows the top 40 functions sorted by exclusive time. To see the next 40 functions sorted by exclusive time, use this command: .. code-block:: console $ python -m gcpy.profile.gprofng_functions functions_profile.txt 41 80 etc. You should display fewer than 50 functions in order to prevent the time and percent labels from overlapping. .. _code-profiling-vtune: =========== Intel VTune =========== These examples demonstrate how you can display output from the Intel VTune profiler in an easy-to-read format. .. _code-profiling-vtune-source: Source code ----------- **Script locations** - `gcpy/profile/vtune_utils.py `_ - `gcpy/profile/vtune_list_hotspots.py `_ - `gcpy/profile/vtune_compare_hotspots.py `_ - `gcpy/profile/vtune_plot_hotspots.py `_ .. _code-profiling-vtune-usage: Usage ----- First, use :program:`Intel VTune` to collect information about hotspots: .. code-block:: console $ vtune -collect hotspots -- /path/to/executable/file where :code:`/path/to/executable/file` is the path to the program that you wish to profile. For example, to profile GEOS-Chem Classic, you would use this command: .. code-block:: console $ vtune -collect hotspots -- ./gcclassic :program:`Intel VTune` will send profiling output to a folder named :file:`rNNNhs`, where :code:`NNN` is a 3-digit integer (e.g. :file:`r000hs`, :file:`r001hs`, .etc). List hotspots by function ~~~~~~~~~~~~~~~~~~~~~~~~~ Generate a hotspot report that shows the amount of CPU time that each function takes to execute: .. code-block:: console $ vtune -report "hotspots" \ -result-dir "rNNNhs" \ -format "csv" \ -group-by "function" \ -report-output "hotspots.by-function.csv" The report will be in comma-separated-variable (CSV) format, using the horizontal tab (:literal:`\\t`) character as the separator. Use this command to display the list: .. code-block:: console $ python -m gcpy.profile.vtune_list_hotspots -f hotspots.by-function.csv You will see output similar to this: .. code-block:: text Rank Function CPU Time [s] 1 gomp_simple_barrier_wait 24632.527261 2 gomp_team_barrier_wait_end 9641.441448 3 do_spin 2989.702093 4 do_spin 1192.336839 5 __gckpp_integrator_MOD_forwardeuler 1127.055528 6 __carbon_gases_mod_MOD_chem_carbon_gases._omp_fn.1 867.404384 7 __hco_calc_mod_MOD_get_current_emissions._omp_fn.1 746.427654 8 gomp_iter_dynamic_next 459.442222 9 gomp_mutex_lock_slow 386.295001 10 pow 374.182018 11 __memmove_evex_unaligned_erms 281.754054 12 __memset_evex_unaligned_erms 219.870543 13 gomp_team_end 210.792107 14 getvertindx 199.950757 15 __hco_calc_mod_MOD_get_current_emissions._omp_fn.0 196.184759 16 gomp_team_barrier_wait_final 177.007387 17 apply_scale_factor 171.323264 18 get_value_from_datacont 160.885377 19 __hco_calc_mod_MOD_hco_calcemis 138.678973 20 __calc_met_mod_MOD_airqnt._omp_fn.1 118.066415 21 do_cloud_convection 112.604329 22 get_current_emissions 103.695072 23 gomp_loop_dynamic_start 100.352473 24 __gckpp_function_MOD_fun 99.050549 25 __hco_tidx_mod_MOD_tidx_getindx 96.415549 26 __vdiff_mod_MOD_vdiffdr._omp_fn.0 93.715739 27 expf64 93.333107 28 gomp_simple_barrier_wait 89.524049 29 fzppm 87.241202 30 lmtppm 82.941413 Press ENTER to continue, or Q/q then ENTER to quit >>> .. tip:: Use the :literal:`-l` argument to display a different number of lines per screen. For example: .. code-block:: console $ $ python -m gcpy.profile.vtune_list_hotspots -f hotspots.by-function.csv -l 40 will display 40 lines per screen, etc. List hotspots by source code line ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You may also generate a hotspot report that shows the module name and line number of each hotspot: .. code-block:: console $ vtune -report "hotspots" \ -result-dir "rNNNhs" \ -format "csv" \ -group-by "source-line" \ -report-output "hotspots.by-line.csv" The report will be in comma-separated-variable (CSV) format, using the horizontal tab (:literal:`\\t`) character as the separator. Use this command to display the list: .. code-block:: console $ python -m gcpy.profile.vtune_list_hotspots -f hotspots.by-line.csv You will see output similar to this: .. code-block:: text Rank Source File Source Line CPU Time [s] 1 simple-bar.h 60 24722.051310 2 bar.c 112 9435.561971 3 wait.h 56 3215.631958 4 [Unknown source file] [Unknown] 1276.917637 5 gckpp_Integrator.F90 186 1122.006955 6 carbon_gases_mod.F90 536 833.929713 7 hco_calc_mod.F90 1248 636.648621 8 wait.h 56 499.983260 9 iter.c 197 420.879619 10 mutex.c 41 345.307192 11 wait.h 57 331.221428 12 team.c 956 209.822250 13 bar.c 112 203.079367 14 wait.h 57 182.085065 15 bar.c 133 176.957387 16 hco_calc_mod.F90 2216 138.241661 17 hco_calc_mod.F90 1651 123.538416 18 loop.c 130 98.411270 19 gckpp_Function.F90 67 95.191678 20 futex.h 123 90.769803 21 hco_calc_mod.F90 1418 74.508612 22 hco_calc_mod.F90 1541 70.619145 23 hco_tidx_mod.F90 366 67.885338 24 hco_calc_mod.F90 1019 67.284800 25 mixing_mod.F90 780 62.574882 26 hco_calc_mod.F90 1475 60.544203 27 calc_met_mod.F90 697 57.911982 28 hco_interface_common.F90 165 57.280402 29 history_mod.F90 2670 54.087905 30 hco_calc_mod.F90 1046 53.123816 Press ENTER to continue, or Q/q then ENTER to quit >>> Comparing hotspots ~~~~~~~~~~~~~~~~~~ Let's say you have used :program:`Intel VTune` to generate hotspot reports for runs before and after a given fix was applied. You can compare a hotspot by name to see how much time it took to execute in both runs. Use the following command: .. code-block:: console python -m gcpy.profile.vtune_compare_hotspots \ --ref-file "hotspots.by-function.before.csv" \ --ref-label "Before" \ --dev-file "hotspots.by-function.after.csv" \ --dev-label "After" \ --hotspot-name "get_current_emissions" You will then see output similar to this: .. code-block:: text Hotspot Before After Abs Diff % Diff get_current_emissions 113.13094 103.69507 -9.43587 -8.34 Plotting hotspots ~~~~~~~~~~~~~~~~~ Make sure that you have :ref:`specified the proper Matplotlib backend ` for your system. Then run the example script with the following command: .. code-block:: console $ python -m gcpy.profile.vtune_plot_hotspots hotspots.by-function.csv 1 40 or .. code-block:: console $ python -m gcpy.profile.vtune_plot_hotspots hotspots.by-line.csv 1 40 This will show the first 40 hotspots plotted in decreasing order of CPU time, similar to the :ref:`example shown in our gprofng section above `.