Plotting

This page describes in depth the general plotting capabilities of GCPy, including possible argument values for every plotting function.

For information about GCPy functions that are specific to the GEOS-Chem benchmark workflow, please see our Benchmarking chapter.

Six-panel comparison plots

The functions listed below generate six-panel plots comparing variables between two datasets:

Plotting function       Located in module
compare_single_level    gcpy.plot.compare_single_level
compare_zonal_mean      gcpy.plot.compare_zonal_mean

Both compare_single_level() and compare_zonal_mean() generate a six-panel plot for each variable passed. These plots can either be saved to a PDF or displayed one at a time in the Matplotlib GUI using matplotlib.pyplot.show(). Each plot uses data from a reference (Ref) dataset and a development (Dev) dataset. The two functions overlap substantially in both output appearance and code implementation.

You can import these routines into your code with these statements:

from gcpy.plot.compare_single_level import compare_single_level
from gcpy.plot.compare_zonal_mean import compare_zonal_mean

Each panel has a title describing the type of panel, a colorbar for the values plotted in that panel, and the units of the data plotted in that panel. The upper two panels of each plot show actual values from the Ref (left) and Dev (right) datasets for a given variable. The middle two panels show the difference (Dev - Ref) between the values in the Dev dataset and the values in the Ref dataset. The left middle panel uses a full dynamic color map, while the right middle panel caps the color map at the 5th and 95th percentiles. The bottom two panels show the ratio (Dev/Ref) between the values in the Dev dataset and the values in the Ref Dataset. The left bottom panel uses a full dynamic color map, while the right bottom panel caps the color map at 0.5 and 2.0.
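
The color limits described above can be illustrated with a small, dependency-free sketch. This is not GCPy's internal code: the helper names (percentile, panel_limits) are hypothetical, the data are made up, and plain Python lists stand in for the arrays that GCPy operates on. It assumes linear interpolation between ranks when computing percentiles.

```python
def percentile(values, pct):
    """Return the pct-th percentile of values (linear interpolation)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def panel_limits(ref, dev):
    """Compute illustrative color limits for the four lower panels."""
    diff = [d - r for r, d in zip(ref, dev)]    # Dev - Ref
    ratio = [d / r for r, d in zip(ref, dev)]   # Dev / Ref (assumes no zeros in ref)
    return {
        "diff_full": (min(diff), max(diff)),
        "diff_restricted": (percentile(diff, 5), percentile(diff, 95)),
        "ratio_full": (min(ratio), max(ratio)),
        "ratio_restricted": (0.5, 2.0),
    }


# Made-up sample values standing in for one variable on five grid cells
ref = [1.0, 2.0, 4.0, 8.0, 16.0]
dev = [1.5, 2.0, 3.0, 8.0, 40.0]
limits = panel_limits(ref, dev)
```

In this sample the single large Dev value dominates the full-range difference panel, while the percentile-capped panel keeps the smaller differences visible.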

Function compare_single_level

The compare_single_level function accepts the following arguments:

def compare_single_level(
        refdata,
        refstr,
        devdata,
        devstr,
        varlist=None,
        ilev=0,
        itime=0,
        refmet=None,
        devmet=None,
        weightsdir='.',
        pdfname="",
        cmpres=None,
        match_cbar=True,
        normalize_by_area=False,
        enforce_units=True,
        convert_to_ugm3=False,
        flip_ref=False,
        flip_dev=False,
        use_cmap_RdBu=False,
        verbose=False,
        log_color_scale=False,
        extra_title_txt=None,
        extent=None,
        n_job=-1,
        sigdiff_list=None,
        second_ref=None,
        second_dev=None,
        spcdb_dir=os.path.dirname(__file__),
        sg_ref_path='',
        sg_dev_path='',
        ll_plot_func='imshow',
        **extra_plot_args
):
    """
    Create single-level 3x2 comparison map plots for variables common
    in two xarray Datasets. Optionally save to PDF.

    Args:
        refdata: xarray dataset
            Dataset used as reference in comparison
        refstr: str
            String description for reference data to be used in plots
        devdata: xarray dataset
            Dataset used as development in comparison
        devstr: str
            String description for development data to be used in plots

    Keyword Args (optional):
        varlist: list of strings
            List of xarray dataset variable names to make plots for
            Default value: None (will compare all common variables)
        ilev: integer
            Dataset level dimension index using 0-based system.
            Indexing is ambiguous when plotting differing vertical grids
            Default value: 0
        itime: integer
            Dataset time dimension index using 0-based system
            Default value: 0
        refmet: xarray dataset
            Dataset containing ref meteorology
            Default value: None
        devmet: xarray dataset
            Dataset containing dev meteorology
            Default value: None
        weightsdir: str
            Directory path for storing regridding weights
            Default value: '.' (will create/store weights in
            current directory)
        pdfname: str
            File path to save plots as PDF
            Default value: Empty string (will not create PDF)
        cmpres: str
            String description of grid resolution at which
            to compare datasets
            Default value: None (will compare at highest resolution
            of ref and dev)
        match_cbar: bool
            Set this flag to True if you wish to use the same colorbar
            bounds for the Ref and Dev plots.
            Default value: True
        normalize_by_area: bool
            Set this flag to True if you wish to normalize the Ref
            and Dev raw data by grid area. Input ref and dev datasets
            must include AREA variable in m2 if normalizing by area.
            Default value: False
        enforce_units: bool
            Set this flag to True to force an error if Ref and Dev
            variables have different units.
            Default value: True
        convert_to_ugm3: bool
            Whether to convert data units to ug/m3 for plotting.
            Default value: False
        flip_ref: bool
            Set this flag to True to flip the vertical dimension of
            3D variables in the Ref dataset.
            Default value: False
        flip_dev: bool
            Set this flag to True to flip the vertical dimension of
            3D variables in the Dev dataset.
            Default value: False
        use_cmap_RdBu: bool
            Set this flag to True to use a blue-white-red colormap
            for plotting the raw data in both the Ref and Dev datasets.
            Default value: False
        verbose: bool
            Set this flag to True to enable informative printout.
            Default value: False
        log_color_scale: bool
            Set this flag to True to plot data (not diffs)
            on a log color scale.
            Default value: False
        extra_title_txt: str
            Specifies extra text (e.g. a date string such as "Jan2016")
            for the top-of-plot title.
            Default value: None
        extent: list
            Defines the extent of the region to be plotted in form
            [minlon, maxlon, minlat, maxlat].
            Default value: None (will plot the full extent of
            the input grids)
        n_job: int
            Defines the number of simultaneous workers for parallel
            plotting.  Set to 1 to disable parallel plotting.
            Value of -1 allows the application to decide.
            Default value: -1
        sigdiff_list: list of str
            Returns a list of all quantities having significant
            differences (where |max(fractional difference)| > 0.1).
            Default value: None
        second_ref: xarray Dataset
            A dataset of the same model type / grid as refdata,
            to be used in diff-of-diffs plotting.
            Default value: None
        second_dev: xarray Dataset
            A dataset of the same model type / grid as devdata,
            to be used in diff-of-diffs plotting.
            Default value: None
        spcdb_dir: str
            Directory containing species_database.yml file.
            Default value: Path of GCPy code repository
        sg_ref_path: str
            Path to NetCDF file containing stretched-grid info
            (in attributes) for the ref dataset
            Default value: '' (will not be read in)
        sg_dev_path: str
            Path to NetCDF file containing stretched-grid info
            (in attributes) for the dev dataset
            Default value: '' (will not be read in)
        ll_plot_func: str
            Function to use for lat/lon single level plotting with
            possible values 'imshow' and 'pcolormesh'. imshow is much
            faster but is slightly displaced when plotting from
            dateline to dateline and/or pole to pole.
            Default value: 'imshow'
        extra_plot_args: various
            Any extra keyword arguments are passed through the
            plotting functions to be used in calls to pcolormesh() (CS)
            or imshow() (Lat/Lon).
    """

and generates a comparison plot such as:

_images/six_panel_single_level.png
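
As a hedged usage sketch, the code below plots two variables at one model level over a regional extent and writes the pages to a PDF. The file paths, the variable names in varlist, the extent, and the helper name plot_single_level are placeholder assumptions; the keyword names come from the signature above. The xarray and GCPy imports are deferred into the function so that the option dictionary can be inspected without either package installed.

```python
# Keyword options taken from the compare_single_level signature.
# All values are illustrative placeholders.
single_level_options = {
    "varlist": ["SpeciesConcVV_O3", "SpeciesConcVV_CO"],
    "ilev": 0,                       # 0-based level index
    "itime": 0,                      # 0-based time index
    "extent": [-130, -60, 20, 55],   # [minlon, maxlon, minlat, maxlat]
    "pdfname": "single_level.pdf",   # non-empty string: save pages to this PDF
    "n_job": 2,                      # limit the number of parallel workers
}


def plot_single_level(ref_path, dev_path, options):
    """Open two files and create single-level comparison plots."""
    # Imported here so this module loads even without xarray or GCPy
    import xarray as xr
    from gcpy.plot.compare_single_level import compare_single_level

    refdata = xr.open_dataset(ref_path)
    devdata = xr.open_dataset(dev_path)
    compare_single_level(refdata, "Ref run", devdata, "Dev run", **options)


# Example call (requires GCPy and real input files):
# plot_single_level("/path/to/ref", "/path/to/dev", single_level_options)
```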

Function compare_zonal_mean

The compare_zonal_mean function accepts the following arguments:

def compare_zonal_mean(
        refdata,
        refstr,
        devdata,
        devstr,
        varlist=None,
        itime=0,
        refmet=None,
        devmet=None,
        weightsdir='.',
        pdfname="",
        cmpres=None,
        match_cbar=True,
        pres_range=None,
        normalize_by_area=False,
        enforce_units=True,
        convert_to_ugm3=False,
        flip_ref=False,
        flip_dev=False,
        use_cmap_RdBu=False,
        verbose=False,
        log_color_scale=False,
        log_yaxis=False,
        extra_title_txt=None,
        n_job=-1,
        sigdiff_list=None,
        second_ref=None,
        second_dev=None,
        spcdb_dir=os.path.dirname(__file__),
        sg_ref_path='',
        sg_dev_path='',
        ref_vert_params=None,
        dev_vert_params=None,
        **extra_plot_args
):
    """
    Creates 3x2 comparison zonal-mean plots for variables
    common in two xarray Datasets. Optionally save to PDF.

    Args:
        refdata: xarray dataset
            Dataset used as reference in comparison
        refstr: str
            String description for reference data to be used in plots
        devdata: xarray dataset
            Dataset used as development in comparison
        devstr: str
            String description for development data to be used in plots

    Keyword Args (optional):
        varlist: list of strings
            List of xarray dataset variable names to make plots for
            Default value: None (will compare all common 3D variables)
        itime: integer
            Dataset time dimension index using 0-based system
            Default value: 0
        refmet: xarray dataset
            Dataset containing ref meteorology
            Default value: None
        devmet: xarray dataset
            Dataset containing dev meteorology
            Default value: None
        weightsdir: str
            Directory path for storing regridding weights
            Default value: '.' (will create/store weights in
            current directory)
        pdfname: str
            File path to save plots as PDF
            Default value: Empty string (will not create PDF)
        cmpres: str
            String description of grid resolution at which
            to compare datasets
            Default value: None (will compare at highest resolution
            of Ref and Dev)
        match_cbar: bool
            Set this flag to True to use the same colorbar bounds
            for both Ref and Dev plots.
            Default value: True
        pres_range: list of two integers
            Pressure range of levels to plot [hPa]. The vertical axis
            will span the outer pressure edges of levels that contain
            pres_range endpoints.
            Default value: None (will use [0, 2000])
        normalize_by_area: bool
            Set this flag to True to normalize raw data in both
            Ref and Dev datasets by grid area. Input ref and dev
            datasets must include AREA variable in m2 if normalizing
            by area.
            Default value: False
        enforce_units: bool
            Set this flag to True to force an error if the variables in
            the Ref and Dev datasets have different units.
            Default value: True
        convert_to_ugm3: bool
            Whether to convert data units to ug/m3 for plotting.
            Default value: False
        flip_ref: bool
            Set this flag to True to flip the vertical dimension of
            3D variables in the Ref dataset.
            Default value: False
        flip_dev: bool
            Set this flag to True to flip the vertical dimension of
            3D variables in the Dev dataset.
            Default value: False
        use_cmap_RdBu: bool
            Set this flag to True to use a blue-white-red colormap for
            plotting raw reference and development datasets.
            Default value: False
        verbose: bool
            Set this flag to True to enable informative printout.
            Default value: False
        log_color_scale: bool
            Set this flag to True to enable plotting data (not diffs)
            on a log color scale.
            Default value: False
        log_yaxis: bool
            Set this flag to True if you wish to create zonal mean
            plots with a log-pressure Y-axis.
            Default value: False
        extra_title_txt: str
            Specifies extra text (e.g. a date string such as "Jan2016")
            for the top-of-plot title.
            Default value: None
        n_job: int
            Defines the number of simultaneous workers for parallel
            plotting.  Set to 1 to disable parallel plotting.
            Value of -1 allows the application to decide.
            Default value: -1
        sigdiff_list: list of str
            Returns a list of all quantities having significant
            differences (where |max(fractional difference)| > 0.1).
            Default value: None
        second_ref: xarray Dataset
            A dataset of the same model type / grid as refdata,
            to be used in diff-of-diffs plotting.
            Default value: None
        second_dev: xarray Dataset
            A dataset of the same model type / grid as devdata,
            to be used in diff-of-diffs plotting.
            Default value: None
        spcdb_dir: str
            Directory containing species_database.yml file.
            Default value: Path of GCPy code repository
        sg_ref_path: str
            Path to NetCDF file containing stretched-grid info
            (in attributes) for the ref dataset
            Default value: '' (will not be read in)
        sg_dev_path: str
            Path to NetCDF file containing stretched-grid info
            (in attributes) for the dev dataset
            Default value: '' (will not be read in)
        ref_vert_params: list(AP, BP) of list-like types
            Hybrid grid parameter A in hPa and B (unitless).
            Needed if ref grid is not 47 or 72 levels.
            Default value: None
        dev_vert_params: list(AP, BP) of list-like types
            Hybrid grid parameter A in hPa and B (unitless).
            Needed if dev grid is not 47 or 72 levels.
            Default value: None
        extra_plot_args: various
            Any extra keyword arguments are passed through the
            plotting functions to be used in calls to pcolormesh()
            (CS) or imshow() (Lat/Lon).
    """

and generates a comparison plot such as:

_images/six_panel_zonal_mean.png
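
The sketch below shows one way to request two sets of zonal-mean plots from the same pair of datasets: one over the full column and one restricted to a pressure range with a log-pressure Y-axis. The helper names, the output file names, and the [1, 100] hPa range are illustrative assumptions; the keyword names come from the signature above. The xarray and GCPy imports are deferred so that the option sets can be built without either package installed.

```python
def zonal_mean_option_sets(varlist):
    """Return keyword options for a full-column and a stratospheric plot."""
    common = {"varlist": varlist, "itime": 0}
    full_column = dict(common, pdfname="zonal_mean_full_column.pdf")
    stratosphere = dict(
        common,
        pres_range=[1, 100],   # pressure range in hPa (placeholder values)
        log_yaxis=True,        # log-pressure Y-axis
        pdfname="zonal_mean_stratosphere.pdf",
    )
    return [full_column, stratosphere]


def plot_zonal_means(ref_path, dev_path, option_sets):
    """Open two files and create one PDF per option set."""
    # Imported here so this module loads even without xarray or GCPy
    import xarray as xr
    from gcpy.plot.compare_zonal_mean import compare_zonal_mean

    refdata = xr.open_dataset(ref_path)
    devdata = xr.open_dataset(dev_path)
    for options in option_sets:
        compare_zonal_mean(refdata, "Ref run", devdata, "Dev run", **options)


option_sets = zonal_mean_option_sets(["SpeciesConcVV_O3"])

# Example call (requires GCPy and real input files):
# plot_zonal_means("/path/to/ref", "/path/to/dev", option_sets)
```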

Shared structure

Both compare_single_level() and compare_zonal_mean() have four positional (required) arguments.

refdata : xarray.Dataset

Dataset used as reference in comparison

refstr : str OR list of str

String description for reference data to be used in plots OR list containing [ref1str, ref2str] for diff-of-diffs plots

devdata : xarray.Dataset

Dataset used as development in comparison

devstr : str OR list of str

String description for development data to be used in plots OR list containing [dev1str, dev2str] for diff-of-diffs plots

refstr and devstr title the top two panels of each six-panel plot.

Functions compare_single_level() and compare_zonal_mean() share many arguments. Some of these arguments are plotting options that change the format of the plots.

For example, you may wish to convert units to ug/m3 when generating comparison plots of aerosol species. Activate this option by setting the keyword argument convert_to_ugm3=True.

Other arguments are needed to produce a correct plot, depending on the format of refdata and devdata, and require you to know certain traits of your input data. For example, if the Z coordinates in a dataset do not denote decreasing pressure as the Z index increases, you must specify that the dataset be flipped vertically. Otherwise the vertical coordinates of your two datasets may be misaligned, producing an incorrect plot. To do this, set the boolean options flip_ref=True and/or flip_dev=True.
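
A minimal sketch of the vertical-ordering check described above follows. The helper name needs_vertical_flip is hypothetical and not part of GCPy, and it assumes you already have the level pressures of a dataset as a simple sequence ordered by Z index.

```python
def needs_vertical_flip(level_pressures):
    """Return True if pressure increases as the Z index increases.

    level_pressures: pressures (for example in hPa) ordered by Z index.
    The plotting functions expect pressure to decrease with Z index,
    so an increasing sequence suggests setting the flip option.
    """
    return level_pressures[-1] > level_pressures[0]


# Surface-first ordering: pressure decreases with index, no flip needed
flip_ref = needs_vertical_flip([1000.0, 500.0, 100.0])

# Top-first ordering: pressure increases with index, flip needed
flip_dev = needs_vertical_flip([100.0, 500.0, 1000.0])
```

The resulting booleans could then be passed as flip_ref=flip_ref and flip_dev=flip_dev to either comparison function.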

The n_job argument governs the parallel plotting settings of compare_single_level() and compare_zonal_mean(). GCPy uses the JobLib library to create plots in parallel. Due to limitations with matplotlib, this parallelization creates plots (pages) in parallel rather than individual panels on a single page. Parallel plot creation is enabled only when you save to a PDF. The default value of n_job=-1 allows the function call to automatically scale up to, at most, the number of cores available on your system.

Note

On systems with higher (12+) core counts, the maximum number of cores is not typically reached because of the process handling mechanics of JobLib. However, on lower-end systems with lower core counts or less available memory, it is advantageous to use n_job to limit the max number of processes.

Due to how Python handles memory management on Linux systems, using more cores may result in memory not returned to the system after the plots are created. Requesting fewer cores with n_job may help to avoid this situation.
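
One hedged way to pick a conservative n_job value is to cap the worker count at both the number of available cores and a limit of your choosing. The helper choose_n_job and the limit of 4 are illustrative assumptions, not part of GCPy.

```python
import os


def choose_n_job(max_workers=4):
    """Return a worker count capped at max_workers and the available cores."""
    available = os.cpu_count() or 1   # os.cpu_count() may return None
    return max(1, min(available, max_workers))


n_job = choose_n_job(max_workers=4)

# The value can then be passed to either comparison function, for example:
# compare_zonal_mean(ds1, 'Ref run', ds2, 'Dev run',
#                    pdfname='zonal_mean.pdf', n_job=n_job)
```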

Example script

Here is a basic script that calls both compare_zonal_mean() and compare_single_level():

#!/usr/bin/env python

import xarray as xr
import matplotlib.pyplot as plt
from gcpy.plot.compare_single_level import compare_single_level
from gcpy.plot.compare_zonal_mean import compare_zonal_mean

file1 = '/path/to/ref'
file2 = '/path/to/dev'
ds1 = xr.open_dataset(file1)
ds2 = xr.open_dataset(file2)
compare_zonal_mean(ds1, 'Ref run', ds2, 'Dev run')
plt.show()
compare_single_level(ds1, 'Ref run', ds2, 'Dev run')
plt.show()

Single panel plots

Function single_panel() (contained in GCPy module gcpy.plot.single_panel) is used to create plots containing only one panel of GEOS-Chem data. This function is used within compare_single_level() and compare_zonal_mean() to generate each panel plot. It can also be called directly on its own to quickly plot GEOS-Chem data in zonal mean or single level format.

Function single_panel

Function single_panel() accepts the following arguments:

def single_panel(
        plot_vals,
        ax=None,
        plot_type="single_level",
        grid=None,
        gridtype="",
        title="fill",
        comap=WhGrYlRd,
        norm=None,
        unit="",
        extent=None,
        masked_data=None,
        use_cmap_RdBu=False,
        log_color_scale=False,
        add_cb=True,
        pres_range=None,
        pedge=np.full((1, 1), -1),
        pedge_ind=np.full((1, 1), -1),
        log_yaxis=False,
        xtick_positions=None,
        xticklabels=None,
        proj=ccrs.PlateCarree(),
        sg_path='',
        ll_plot_func="imshow",
        vert_params=None,
        pdfname="",
        weightsdir='.',
        vmin=None,
        vmax=None,
        return_list_of_plots=False,
        **extra_plot_args
):
    """
    Core plotting routine -- creates a single plot panel.

    Args:
        plot_vals: xarray.DataArray, numpy.ndarray, or dask.array.Array
            Single data variable GEOS-Chem output to plot

    Keyword Args (Optional):
        ax: matplotlib axes
            Axes object to plot information
            Default value: None (Will create a new axes)
        plot_type: str
            Either "single_level" or "zonal_mean"
            Default value: "single_level"
        grid: dict
            Dictionary mapping plot_vals to plottable coordinates
            Default value: None (will attempt to read grid from plot_vals)
        gridtype: str
            "ll" for lat/lon or "cs" for cubed-sphere
            Default value: "" (will automatically determine from grid)
        title: str
            Title to put at top of plot
            Default value: "fill" (will use name attribute of plot_vals
            if available)
        comap: matplotlib Colormap
            Colormap for plotting data values
            Default value: WhGrYlRd
        norm: list
            List with range [0..1] normalizing color range for matplotlib
            methods. Default value: None (will determine from plot_vals)
        unit: str
            Units of plotted data
            Default value: "" (will use units attribute of plot_vals
            if available)
        extent: tuple (minlon, maxlon, minlat, maxlat)
            Describes minimum and maximum latitude and longitude of input
            data.  Default value: None (Will use full extent of plot_vals
            if plot is single level).
        masked_data: numpy array
            Masked area for avoiding near-dateline cubed-sphere plotting
            issues.  Default value: None (will attempt to determine from
            plot_vals)
        use_cmap_RdBu: bool
            Set this flag to True to use a blue-white-red colormap
            Default value: False
        log_color_scale: bool
            Set this flag to True to use a log-scale colormap
            Default value: False
        add_cb: bool
            Set this flag to True to add a colorbar to the plot
            Default value: True
        pres_range: list(int)
            Range from minimum to maximum pressure for zonal mean
            plotting. Default value: None (will use [0, 2000], i.e. the
            entire atmosphere)
        pedge: numpy array
            Edge pressures of vertical grid cells in plot_vals
            for zonal mean plotting.  Default value: np.full((1, 1), -1)
            (will determine automatically)
        pedge_ind: numpy array
            Index of edge pressure values within pressure range in
            plot_vals for zonal mean plotting.
            Default value: np.full((1, 1), -1) (will determine
            automatically)
        log_yaxis: bool
            Set this flag to True to enable log scaling of pressure in
            zonal mean plots.  Default value: False
        xtick_positions: list(float)
            Locations of lat/lon or lon ticks on plot
            Default value: None (will place automatically for
            zonal mean plots)
        xticklabels: list(str)
            Labels for lat/lon ticks
            Default value: None (will determine automatically from
            xtick_positions)
        proj: cartopy projection
            Projection for plotting data
            Default value: ccrs.PlateCarree()
        sg_path: str
            Path to NetCDF file containing stretched-grid info
            (in attributes) for plot_vals.
            Default value: '' (will not be read in)
        ll_plot_func: str
            Function to use for lat/lon single level plotting with
            possible values 'imshow' and 'pcolormesh'. imshow is much
            faster but is slightly displaced when plotting from dateline
            to dateline and/or pole to pole.  Default value: 'imshow'
        vert_params: list(AP, BP) of list-like types
            Hybrid grid parameter A in hPa and B (unitless). Needed if
            grid is not 47 or 72 levels.  Default value: None
        pdfname: str
            File path to save plots as PDF
            Default value: "" (will not create PDF)
        weightsdir: str
            Directory path for storing regridding weights
            Default value: "." (will store regridding files in
            current directory)
        vmin: float
            minimum for colorbars
            Default value: None (will use plot value minimum)
        vmax: float
            maximum for colorbars
            Default value: None (will use plot value maximum)
        return_list_of_plots: bool
            Return plots as a list. This is helpful if you are using
            a cubed-sphere grid and would like access to all 6 plots
            Default value: False
        extra_plot_args: various
            Any extra keyword arguments are passed to calls to
            pcolormesh() (CS) or imshow() (Lat/Lon).

    Returns:
        plot: matplotlib plot
            Plot object created from input
    """

Function single_panel() expects data with a 1-length (or non-existent) T (time) dimension, as well as a 1-length or non-existent Z (vertical level) dimension.
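
The dimension requirement above can be checked before plotting. The helper below is a hypothetical sketch, not part of GCPy. It assumes the dimensions are named time and lev, and that you pass a mapping of dimension names to lengths, such as dict(data_array.sizes) for an xarray.DataArray.

```python
def ready_for_single_panel(sizes):
    """Return True if the time and lev dimensions are absent or of length 1.

    sizes: mapping of dimension name to length.
    """
    return all(sizes.get(dim, 1) == 1 for dim in ("time", "lev"))


# A 2-D lat/lon slice passes the check
ok_2d = ready_for_single_panel({"lat": 46, "lon": 72})

# A field that still has 72 vertical levels does not
ok_3d = ready_for_single_panel({"time": 1, "lev": 72, "lat": 46, "lon": 72})
```

If the check fails, selecting a single index with isel(time=...) or isel(lev=...) reduces the data to the expected shape.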

single_panel() contains a few amenities to help with plotting GEOS-Chem data, including automatic grid detection for lat/lon or standard cubed-sphere xarray DataArray objects. You can also pass NumPy arrays to plot, though you’ll need to manually pass grid info in this case (with the gridtype, pedge, and pedge_ind keyword arguments).

The sample script below shows how you can plot data at a single level and timestep from an xarray.DataArray object.

#!/usr/bin/env python

import xarray as xr
import matplotlib.pyplot as plt
from gcpy.plot.single_panel import single_panel

# Read data from a file into an xr.Dataset object
dset = xr.open_dataset('GEOSChem.SpeciesConc.20160701_0000z.nc4')

# Extract ozone (v/v) from the xr.Dataset object,
# for time=0 (aka first timestep) and lev=0 (aka surface)
sfc_o3 = dset['SpeciesConcVV_O3'].isel(time=0).isel(lev=0)

# Plot the data!
single_panel(sfc_o3)
plt.show()