GCPy: GEOS-Chem Python toolkit¶
Welcome to the GCPy ReadTheDocs documentation! This site provides documentation on the functionality of GCPy and instructions for common use cases.
GCPy is a Python-based toolkit containing useful functions for working specifically with the GEOS-Chem model of atmospheric chemistry and composition. GCPy is primarily used for plotting/tabling GEOS-Chem output and regridding input files in special cases.
For documentation on setting up and running GEOS-Chem please see our list of manuals for GEOS-Chem and related software.
About GCPy¶
GCPy is a Python-based toolkit containing useful functions for working specifically with the GEOS-Chem model of atmospheric chemistry and composition.
GCPy aims to build on the well-established scientific Python technical stack, leveraging tools like cartopy and xarray to simplify the task of working with model output and performing atmospheric chemistry analyses.
What GCPy was intended to do¶
Produce plots and tables from GEOS-Chem output using simple function calls.
Generate the standard evaluation plots and tables from GEOS-Chem benchmark output.
Obtain GEOS-Chem’s horizontal/vertical grid information.
Implement GCHP-specific regridding functionalities (e.g. cubed-sphere to lat-lon regridding).
Provide example scripts for creating specific types of plots or analysis from GEOS-Chem output.
What GCPY was not intended to do¶
General NetCDF file modification: (crop a domain, extract some variables):
Use xarray instead.
Also see Work with netCDF data files at the GEOS-Chem ReadTheDocs site.
Statistical analysis:
Use scipy and scikit-learn tools instead.
Machine Learning:
Use the standard machine learning utilities (pytorch, tensorflow, julia, etc.).
Installing GCPy¶
GCPy and its dependencies can be installed using Conda in either standard user mode
(allow Conda to handle installation without Git support) or
development mode using Conda and Conda-build (install from a Git
clone). You can also manually install GCPy using a clone of the source
code, but this option requires you to add the package to your
PYTHONPATH
manually and to install properly versioned
dependencies on your own.
Requirements for GCPy¶
Software Prerequisites¶
GCPy is currently supported for Linux and MacOS operating systems. Due to a reliance on several packages without Windows support, GCPy is not currently supported for Windows. You will receive an error message if you attempt to install GCPy through Conda on Windows.
The only essential software package you need before installing GCPy is a distribution of the Conda package manager, which is used to install GCPy and its dependencies. It is also highly recommended that you create an environment in Conda for working with GCPy. Steps to setup Conda are described below:
Install Miniconda or Anaconda. Miniconda is much more lightweight and functions perfectly well for GCPy purposes, while Anaconda automatically includes many extra packages that are not directly relevant to GCPy.
After installing Miniconda or Anaconda, create a Conda environment for using GCPy. The basic usage (also found on the Conda Github hompeage) is:
# Navigate to the top-level GCPy folder cd /path/to/gcpy # Create a Conda environment for working with GCPy conda env create -n gcpy_env --file=environment.yml # Activate (enter) your new Conda environment $ conda activate gcpy_env # Deactivate (exit) your Conda environment $ conda deactivate
From within your Conda environment, you can follow the instructions on Installing normally through Conda (if you don’t plan on modifying GCPy source code) or Installing in development mode through Conda-build (for developers).
Python dependencies¶
Conda handles the installation of all dependencies for GCPy automatically. Most dependencies have minimum version requirements. We recommend using GCPy with Python 3.9. The list of dependencies (not including sub-dependencies) that are installed by Conda includes:
A full list of package version requirements may be found in
docs/source/environment.yml
. There is also a symbolic link to
this file from the top-level gcpy folder.
Installing GCPy for non-developers using Conda¶
GCPy is available through the conda-forge
channel under the
name geoschem-gcpy
. Installing GCPy in your Conda environment
requires two commands:
$ conda config --add channels conda-forge
$ conda install geoschem-gcpy
Conda will handle the installation of all dependencies and sub-dependencies for GCPy, which includes many Python packages and several non-Python libraries.
Installing GCPy for developers¶
If you wish to make changes to the GCPy source code with the goal of contributing to GCPy development, you will need to install GCPy from a clone of the GCPy Git repository:
$ git clone https://github.com/geoschem/gcpy.git
$ cd gcpy
$ conda config --add channels conda-forge
$ conda install geoschem-gcpy --only-deps
$ pip install -e .
Conda will handle the installation of dependencies when you install from this clone, and pip will point all GCPy links to this directory.
Overview of Capabilities¶
This page outlines the capabilities of GCPy with links to detailed function documentation.
Spatial Plotting¶
One hallmark of GCPy is easy-to-use spatial plotting of GEOS-Chem data. Available plotting falls into two layouts: single panel (one map of one variable from a dataset) and six panel (six maps comparing a variable between two datasets). The maps in these plots can display data at a single vertical level of your input dataset or in a zonal mean for all layers of the atmosphere.
Single Panel Plots¶
Single panel plots are generated through the
plot.single_panel()
function. plot.single_panel()
uses
Matplotlib and Cartopy plotting capabilities while handling certain
behind the scenes operations that are necessary for plotting GEOS-Chem
data, particularly for cubed-sphere and/or zonal mean data.
import xarray as xr
import gcpy.plot as gcplot
import matplotlib.pyplot as plt
# Read data
ds = xr.open_dataset(
'GEOSChem.Restart.20160701_0000z.nc4'
)
# plot surface Ozone over the North Pacific
gcplot.single_panel(
ds['SpeciesRst_O3'].isel(lev=0),
title='Surface Ozone over the North Pacific',
extent=[80, -90, -10, 60]
)
plt.show()

#plot global zonal mean of Ozone
gcplot.single_panel(
ds['SpeciesRst_O3'],
plot_type='zonal_mean',
title='Global Zonal Mean of Ozone'
)
plt.show()

Click here for an example single panel plotting script.
Click here for detailed documentation for single_panel()
.
Six Panel Comparison Plots¶
Six panel plots are used to compare results across two different model runs. Single level and zonal mean plotting options are both available. The two model runs do not need to be the same resolution or even the same grid type (GEOS-Chem Classic and GCHP output can be mixed at will).
import xarray as xr
import gcpy.plot as gcplot
import matplotlib.pyplot as plt
# Read data
gcc_ds = xr.open_dataset(
'GEOSChem.SpeciesConc.20160701_0000z.nc4'
)
gchp_ds = xr.open_dataset(
'GCHP.SpeciesConc.20160716_1200z.nc4'
)
#Plot comparison of surface ozone over the North Pacific
gcplot.compare_single_level(
gcc_ds,
'GEOS-Chem Classic',
gchp_ds,
'GCHP',
varlist=['SpeciesConc_O3'],
extra_title_txt='Surface'
)
plt.show()

#Plot comparison of global zonal mean ozone
gcplot.compare_zonal_mean(
gcc_ds,
'GEOS-Chem Classic',
gchp_ds,
'GCHP',
varlist=['SpeciesConc_O3']
)
plt.show()

Click here for an example six panel plotting script.
Click here
for complete documentation for compare_single_level()
and compare_zonal_mean()
.
Comprehensive Benchmark Plotting¶
The GEOS-Chem Support Team uses comprehensive plotting functions from
benchmark.py
to generate full plots of benchmark
diagnostics. Functions like
benchmark.make_benchmark_conc_plots
by default create plots
for every variable in a given collection
(e.g. SpeciesConc
) at multiple vertical levels (surface,
500hPa, zonal mean) and divide plots into separate folders based on
category (e.g. Chlorine, Aerosols). The GCST uses full benchmark
plotting / table scripts similar to this example to produce plots and tables for official
model benchmarks. Full documentation for the benchmark plotting
functions can be found
here.
Table Creation¶
GCPy has several dedicated functions for tabling GEOS-Chem output data in text file format. These functions and their outputs are primarily used for model benchmarking purposes.
Budget Tables¶
Currently, budget tables can be created for “operations” (table shows
change in mass after each category of model operation, as contained in
the GEOS-Chem Budget
diagnostics) or in overall averages for
different aerosols or the Transport Tracers simulation.
Operations budget tables are created using the
benchmark.make_benchmark_operations_budget
function and appear as
follows:

Full documentation for operations budget table creation can be found here.
Mass Tables¶
The benchmark.make_benchmark_mass_tables
function uses species
concentrations and info from meteorology files to generate the total
mass of species in certain segments of the atmosphere (currently
global or only the troposphere). An example table is shown below:

Full documentation for mass table creation can be found here.
Regridding¶
General Regridding Rules¶
GCPy supports regridding between all horizontal GEOS-Chem grid types, including latitude/longitude grids (the grid format of GEOS-Chem Classic), standard cubed-sphere (the standard grid format of GCHP), and stretched-grid (an optional grid format in GCHP). GCPy contains several horizontal regridding functions built off of xESMF. GCPy automatically handles most regridding needs when plotting GEOS-Chem data.
gcpy.file_regrid
allows you to regrid NetCDF files between
different grid types / resolutions and can be called from the
command line or as a function.
The 72-level and 47-level vertical grids are pre-defined in GCPy. Other vertical grids can also be defined if you provide the A and B coefficients of the hybrid vertical grid.
When plotting data of differing grid types or horizontal resolutions
using compare_single_level
or compare_zonal_mean
, you
can specify a comparison resolution using the cmpres
argument. This resolution will be used for the difference panels in
each plot (the bottom four panels rather than the top two raw data
panels). If you do not specify a comparison resolution, GCPy will
automatically choose one.
For more extensive regridding information, visit the detailed regridding documentation.
Plotting¶
This page describes in depth the plotting capabilities of GCPy, including possible argument values for every plotting function.
compare_single_level and compare_zonal_mean¶
gcpy.plot.compare_single_level()
and
gcpy.plot.compare_zonal_mean()
both generate six panel
plots comparing variables between two datasets. They share significant
structural overlap both in output appearance and code
implementation. This section gives an overview of the components of
the plots generated by these functions, their shared arguments, and
features unique to each function.
compare_single_level¶
def compare_single_level(refdata, refstr, devdata, devstr,
varlist=None, ilev=0, itime=0,
refmet=None, devmet=None, weightsdir='.',
pdfname="", cmpres=None, match_cbar=True,
normalize_by_area=False, enforce_units=True,
convert_to_ugm3=False, flip_ref=False, flip_dev=False,
use_cmap_RdBu=False, verbose=False, log_color_scale=False,
extra_title_txt=None, extent = [-1000, -1000, -1000, -1000],
n_job=-1, sigdiff_list=[], second_ref=None, second_dev=None,
spcdb_dir=os.path.dirname(__file__), sg_ref_path='', sg_dev_path='',
ll_plot_func='imshow', **extra_plot_args
):
compare_single_level()
features several keyword arguments that
are not relevant to compare_zonal_mean()
, including specifying
which level to plot, the lat/lon extent of the plots, and which
underlying matplotlib.plot
function to use for plotting.
Function-specific keyword arguments:¶
-
ilev
: int
¶ Dataset level dimension index using 0-based system
Default value: 0
-
extent
: list of float
¶ Defines the extent of the region to be plotted in form [minlon, maxlon, minlat, maxlat]. Default value plots extent of input grids.
Default value: [-1000, -1000, -1000, -1000]
-
ll_plot_func
: str
¶ Function to use for lat/lon single level plotting with possible values ‘imshow’ and ‘pcolormesh’. imshow is much faster but is slightly displaced when plotting from dateline to dateline and/or pole to pole.
Default value: ‘imshow’
-
**extra_plot_args
¶
Any extra keyword arguments are passed through the plotting functions to be used in calls to
pcolormesh()
(CS) orimshow()
(Lat/Lon).
compare_zonal_mean¶
def compare_zonal_mean(refdata, refstr, devdata, devstr,
varlist=None, itime=0, refmet=None, devmet=None,
weightsdir='.', pdfname="", cmpres=None,
match_cbar=True, pres_range=[0, 2000],
normalize_by_area=False, enforce_units=True,
convert_to_ugm3=False, flip_ref=False, flip_dev=False,
use_cmap_RdBu=False, verbose=False, log_color_scale=False,
log_yaxis=False, extra_title_txt=None, n_job=-1, sigdiff_list=[],
second_ref=None, second_dev=None, spcdb_dir=os.path.dirname(__file__),
sg_ref_path='', sg_dev_path='', ref_vert_params=[[],[]],
dev_vert_params=[[],[]], **extra_plot_args
):
compare_zonal_mean()
features several keyword arguments that
are not relevant to compare_single_level()
, including
specifying the pressure range to plot (defaulting to the complete
atmosphere), whether the y-axis of the plots (pressure) should be in
log format, and hybrid vertical grid parameters to pass if one or more
of Ref and Dev do not use the typical 72-level or 47-level grids.
Function-specific keyword arguments:¶
-
pres_range
: list of ints
¶ Pressure range of levels to plot [hPa]. The vertical axis will span the outer pressure edges of levels that contain pres_range endpoints.
Default value: [0,2000]
-
log_yaxis
: bool
¶ Set this flag to True if you wish to create zonal mean plots with a log-pressure Y-axis.
Default value: False
-
ref_vert_params
: list of list-like types
¶ Hybrid grid parameter A in hPa and B (unitless). Needed if ref grid is not 47 or 72 levels.
Default value: [[], []]
-
dev_vert_params
: list of list-like types
¶ Hybrid grid parameter A in hPa and B (unitless). Needed if dev grid is not 47 or 72 levels.
Default value: [[], []]
-
**extra_plot_args
¶
Any extra keyword arguments are passed through the plotting functions to be used in calls to
pcolormesh()
.
Single_panel¶
def single_panel(plot_vals, ax=None, plot_type="single_level",
grid={}, gridtype="", title="fill",comap=WhGrYlRd,
norm=[],unit="",extent=(None, None, None, None),
masked_data=None,use_cmap_RdBu=False,
log_color_scale=False, add_cb=True,
pres_range=[0, 2000], pedge=np.full((1, 1), -1),
pedge_ind=np.full((1,1), -1), log_yaxis=False,
xtick_positions=[], xticklabels=[], proj=ccrs.PlateCarree(),
sg_path='', ll_plot_func="imshow", vert_params=[[],[]],
pdfname="", return_list_of_plots=False **extra_plot_args
):
gcpy.plot.single_panel()
is used to create plots containing
only one panel of GEOS-Chem data. This function is used within
compare_single_level()
and compare_zonal_mean()
to
generate each panel plot. It can also be called directly on its own to
quickly plot GEOS-Chem data in zonal mean or single level format.
#!/usr/bin/env python
import xarray as xr
import gcpy.plot as gcplot
import matplotlib.pyplot as plt
ds = xr.open_dataset('GEOSChem.SpeciesConc.20160701_0000z.nc4')
#get surface ozone
plot_data = ds['SpeciesConc_O3'].isel(lev=0)
gcplot.single_panel(plot_data)
plt.show()
Currently single_panel()
expects data with a 1-length ( or
non-existent) time dimension, as well as a 1-length or non-existent Z
dimension for single level plotting, so you’ll need to do some
pre-processing of your input data as shown in the above code snippet.
single_panel()
contains a few amenities to help with plotting
GEOS-Chem data, including automatic grid detection for lat/lon or
standard cubed-sphere xarray DataArray
-s. You can also pass NumPy
arrays to plot, though you’ll need to manually pass grid info in this
case.
Arguments:¶
In addition to the specific arguments listed below, any other keyword
arguments will be forwarded to matplotlib.pyplot.imshow()
/
matplotlib.pyplot.pcolormesh()
.
-
plot_vals
: xarray.DataArray or numpy array
¶ Single data variable GEOS-Chem output to plot
-
ax
: matplotlib axes
¶ Axes object to plot information
Default value: None (Will create a new axes)
-
plot_type
: str
¶ Either “single_level” or “zonal_mean”
Default value: “single_level”
-
grid
: dict
¶ Dictionary mapping plot_vals to plottable coordinates
Default value: {} (will attempt to read grid from plot_vals)
-
gridtype
: str
¶ “ll” for lat/lon or “cs” for cubed-sphere
Default value: “” (will automatically determine from grid)
-
title
: str
¶ Title to put at top of plot
Default value: “fill” (will use name attribute of plot_vals if available)
-
comap
: matplotlib Colormap
¶ Colormap for plotting data values
Default value: WhGrYlRd
-
norm
: list
¶ List with range [0..1] normalizing color range for matplotlib methods
Default value: [] (will determine from plot_vals)
-
unit
: str
¶ Units of plotted data
Default value: “” (will use units attribute of plot_vals if available)
-
extent
: tuple (minlon
,
maxlon
,
minlat
,
maxlat)
¶
Describes minimum and maximum latitude and longitude of input data
Default value: (None, None, None, None) (Will use full extent of plot_vals if plot is single level.
-
masked_data
: numpy array
¶ Masked area for avoiding near-dateline cubed-sphere plotting issues
Default value: None (will attempt to determine from plot_vals)
-
use_cmap_RdBu
: bool
¶ Set this flag to True to use a blue-white-red colormap
Default value: False
-
log_color_scale
: bool
¶ Set this flag to True to use a log-scale colormap
Default value: False
-
add_cb
: bool
¶ Set this flag to True to add a colorbar to the plot
Default value: True
-
pres_range
: list of int
¶ Range from minimum to maximum pressure for zonal mean plotting
Default value: [0, 2000] (will plot entire atmosphere)
-
pedge
: numpy array
¶ Edge pressures of vertical grid cells in plot_vals for zonal mean plotting
Default value: np.full((1, 1), -1) (will determine automatically)
-
pedge_ind
: numpy array
¶ Index of edge pressure values within pressure range in plot_vals for zonal mean plotting
Default value: np.full((1, 1), -1) (will determine automatically)
-
log_yaxis
: bool
¶ Set this flag to True to enable log scaling of pressure in zonal mean plots
Default value: False
-
xtick_positions
: list of float
¶ Locations of lat/lon or lon ticks on plot
Default value: [] (will place automatically for zonal mean plots)
-
xticklabels
: list of str
¶ Labels for lat/lon ticks
Default value: [] (will determine automatically from xtick_positions)
-
sg_path
: str
¶ Path to NetCDF file containing stretched-grid info (in attributes) for plot_vals
Default value: ‘’ (will not be read in)
-
ll_plot_func
: str
¶ Function to use for lat/lon single level plotting with possible values ‘imshow’ and ‘pcolormesh’. imshow is much faster but is slightly displaced when plotting from dateline to dateline and/or pole to pole.
Default value: ‘imshow’
-
vert_params
: list(AP
,
BP)
of list-like types
¶ Hybrid grid parameter A in hPa and B (unitless). Needed if grid is not 47 or 72 levels.
Default value: [[], []]
-
pdfname
: str
¶ File path to save plots as PDF
Default value: “” (will not create PDF)
-
extra_plot_args
: various
¶ Any extra keyword arguments are passed to calls to pcolormesh() (CS) or imshow() (Lat/Lon).
Benchmark Plotting Functions¶
gcpy.benchmark
contains several functions for plotting
GEOS-Chem output in formats requested by the GEOS-Chem Steering
Committee. The primary use of these functions is to create plots of
most GEOS-Chem output variables divided into specific categories,
e.g. species categories such as Aerosols or Bromine for the
SpeciesConc diagnostic. In each category, these functions create
single level PDFs for the surface and 500hPa and zonal mean PDFs for
the entire a tmosphere and only the stratosphere (defined a 1-100hPa).
For make_benchmark_emis_plots()
, only single level plots at
the surface are produced.
All of these plotting functions include bookmarks within the generated PDFs that point to the pages containing each plotted quantity.
Thus these functions serve as tools for quickly creating comprehensive plots comparing two GEOS-Chem runs. These functions are used to create
the publicly available plots for 1-month and 1-year benchmarks of new versions of GEOS-Chem.
Many of these functions use pre-defined (via YAML files included in GCPy) lists of variables. If one dataset includes a variable but the other dataset does not, the data for that variable in the latter dataset will be considered to be NaN and will be plotted as such.
make_benchmark_aod_plots¶
def make_benchmark_aod_plots(ref, refstr, dev, devstr, varlist=None,
dst="./benchmark", subdst=None, overwrite=False, verbose=False,
log_color_scale=False, sigdiff_files=None, weightsdir='.', n_job=-1,
spcdb_dir=os.path.dirname(__file__)
):
"""
Creates PDF files containing plots of column aerosol optical
depths (AODs) for model benchmarking purposes.
"""
Function-specific keyword args:
-
varlist
: list of str
¶ List of AOD variables to plot. If not passed, then all AOD variables common to both Dev and Ref will be plotted. Use the varlist argument to restrict the number of variables plotted to the pdf file when debugging.
Default value: None
This function creates column optical depth plots using the Aerosols diagnostic output.
make_benchmark_conc_plots¶
def make_benchmark_conc_plots(ref, refstr, dev, devstr, dst="./benchmark",
subdst=None, overwrite=False, verbose=False, collection="SpeciesConc",
benchmark_type="FullChemBenchmark", plot_by_spc_cat=True, restrict_cats=[],
plots=["sfc", "500hpa", "zonalmean"], use_cmap_RdBu=False, log_color_scale=False,
sigdiff_files=None, normalize_by_area=False, cats_in_ugm3=["Aerosols", "Secondary_Organic_Aerosols"],
areas=None, refmet=None, devmet=None, weightsdir='.', n_job=-1, second_ref=None
second_dev=None, spcdb_dir=os.path.dirname(__file__)
):
"""
Creates PDF files containing plots of species concentration
for model benchmarking purposes.
"""
Function-specific keyword arguments:¶
-
collection
: str
¶ Name of collection to use for plotting.
Default value: “SpeciesConc”
-
benchmark_type:
str
¶ A string denoting the type of benchmark output to plot, either FullChemBenchmark or TransportTracersBenchmark.
Default value: “FullChemBenchmark”
-
plot_by_spc_cat:
logical
¶ Set this flag to False to send plots to one file rather than separate file per category.
Default value: True
-
restrict_cats
: list of str
¶ List of benchmark categories in benchmark_categories.yml to make plots for. If empty, plots are made for all categories.
Default value: empty
-
plots
: list of str
¶ List of plot types to create.
Default value: [‘sfc’, ‘500hpa’, ‘zonalmean’]
-
normalize_by_area:
bool
¶ Set this flag to true to enable normalization of data by surfacea area (i.e. kg s-1 –> kg s-1 m-2).
Default value: False
-
cats_in_ugm3:
list of str
¶ List of benchmark categories to to convert to ug/m3
Default value: [“Aerosols”, “Secondary_Organic_Aerosols”]
-
areas
: dict of xarray DataArray:
¶ Grid box surface areas in m2 on Ref and Dev grids.
Default value: None
-
refmet
: str
¶ Path name for ref meteorology
Default value: None
-
devmet
: str
¶ Path name for dev meteorology
Default value: None
-
second_ref:
str
¶ Path name for a second “Ref” (aka “Reference”) data set for diff-of-diffs plotting. This dataset should have the same model type and grid as ref.
Default value: None
-
second_dev:
str
¶ Path name for a second “Ref” (aka “Reference”) data set for diff-of-diffs plotting. This dataset should have the same model type and grid as ref.
Default value: None
This function creates species concentration plots using the
SpeciesConc
diagnostic output by default. This function is the
only benchmark plotting function that supports diff-of-diffs plotting,
in which 4 datasets are passed and the differences between two groups
of Ref datasets vs. two groups of Dev datasets is plotted (typically
used for comparing changes in GCHP vs. changes in GEOS-Chem Classic
across model versions). This is also the only benchmark plotting
function that sends plots to separate folders based on category
(as denoted by the plot_by_spc_cat
flag). The full list of
species categories is denoted in benchmark_categories.yml
(included in GCPy) as follows:
"""
FullChemBenchmark:
Aerosols:
Dust: DST1, DST2, DST3, DST4
Inorganic: NH4, NIT, SO4
OC_BC: BCPI, BCPO, OCPI, OCPO
SOA: Complex_SOA, Simple_SOA
Sea_Salt: AERI, BrSALA, BrSALC, ISALA, ISALC, NITs,
SALA, SALAAL, SALACL, SALC, SALCAL, SALCCL, SO4s
Bromine: Bry, BrOx, Br, Br2, BrCl, BrNO2, BrNO3, BrO,
CH3Br, CH2Br2, CHBr3, HOBr, HBr
Chlorine: Cly, ClOx, Cl, ClO, Cl2, Cl2O2, ClOO, ClNO2, ClNO3,
CCl4, CFCs, CH3Cl, CH2Cl2, CH3CCl3, CHCl3, HOCl, HCl, Halons, HCFCs, OClO
Iodine: Iy, IxOy, I, I2, IBr, ICl, IO, ION, IONO2, CH3I, CH2I2,
CH2ICl, CH2IBr, HI, HOI, OIO
Nitrogen: NOy, NOx, HNO2, HNO3, HNO4, MPAN, NIT, 'NO', NO2, NO3,
N2O5, MPN, PAN, PPN, N2O, NHx, NH3, NH4, MENO3, ETNO3, IPRNO3, NPRNO3
Oxidants: O3, CO, OH, NOx
Primary_Organics:
Alcohols: EOH, MOH
Biogenics: ISOP, MTPA, MTPO, LIMO
HCs: ALK4, BENZ, CH4, C2H6, C3H8, PRPE, TOLU, XYLE
ROy: H2O2, H, H2, H2O, HO2, O1D, OH, RO2
Secondary_Organic_Aerosols:
Complex_SOA: TSOA0, TSOA1, TSOA2, TSOA3, ASOA1, ASOA2, ASOA3,
ASOAN, TSOG0, TSOG1, TSOG2, TSOG3, ASOG1, ASOG2, ASOG3
Isoprene_SOA: INDIOL, LVOCOA, SOAIE, SOAGX
Simple_SOA: SOAP, SOAS
Secondary_Organics:
Acids: ACTA
Aldehydes: ALD2, CH2O, HPALDs, MACR
Epoxides: IEPOX
Ketones: ACET, MEK, MVK
Nitrates: ISOPN
Other: GLYX, HCOOH, MAP, RCHO
Peroxides: MP
Sulfur: SOx, DMS, OCS, SO2, SO4
TransportTracersBenchmark:
RnPbBeTracers: Rn222, Pb210, Pb210Strat, Be7, Be7Strat, Be10, Be10Strat
PassiveTracers: PassiveTracer, SF6Tracer, CH3ITracer, COAnthroEmis25dayTracer,
COAnthroEmis50dayTracer, COUniformEmis25dayTracer, GlobEmis90dayTracer,
NHEmis90dayTracer, SHEmis90dayTracer
"""
make_benchmark_emis_plots
def make_benchmark_emis_plots(ref, refstr, dev, devstr, dst="./benchmark",
subdst=None, plot_by_spc_cat=False, plot_by_hco_cat=False, overwrite=False,
verbose=False, flip_ref=False, flip_dev=False, log_color_scale=False,
sigdiff_files=None, weightsdir='.', n_job=-1, spcdb_dir=os.path.dirname(__file__)
):
"""
Creates PDF files containing plots of emissions for model
benchmarking purposes. This function is compatible with benchmark
simulation output only. It is not compatible with transport tracers
emissions diagnostics.
Remarks:
--------
(1) If both plot_by_spc_cat and plot_by_hco_cat are
False, then all emission plots will be placed into the
same PDF file.
(2) Emissions that are 3-dimensional will be plotted as
column sums.
"""
Function-specific keyword args:¶
-
plot_by_spc_cat
: bool
¶ - Set this flag to True to separate plots into PDF files
according to the benchmark species categories (e.g. Oxidants, Aerosols, Nitrogen, etc.) These categories are specified in the YAML file benchmark_species.yml.
Default value: False
-
plot_by_hco_cat
: bool
¶ Set this flag to True to separate plots into PDF files according to HEMCO emissions categories (e.g. Anthro, Aircraft, Bioburn, etc.)
Default value: False
-
flip_ref
: bool
¶ Set this flag to True to reverse the vertical level ordering in the “Ref” dataset (in case “Ref” starts from the top of atmosphere instead of the surface).
Default value: False
-
flip_dev
: bool
¶ Set this flag to True to reverse the vertical level ordering in the “Dev” dataset (in case “Dev” starts from the top of atmosphere instead of the surface).
Default value: False
This function generates plots of total emissions using output from HEMCO_diagnostics
(for GEOS-Chem Classic) and/or GCHP.Emissions
output files.
make_benchmark_jvalue_plots¶
def make_benchmark_jvalue_plots(ref, refstr, dev, devstr, varlist=None,
dst="./benchmark", subdst=None, local_noon_jvalues=False,
plots=["sfc", "500hpa", "zonalmean"],overwrite=False, verbose=False,
flip_ref=False, flip_dev=False, log_color_scale=False, sigdiff_files=None,
weightsdir='.', n_job=-1, spcdb_dir=os.path.dirname(__file__)
):
"""
Creates PDF files containing plots of J-values for model
benchmarking purposes.
Remarks:
--------
Will create 4 files containing J-value plots:
(1 ) Surface values
(2 ) 500 hPa values
(3a) Full-column zonal mean values.
(3b) Stratospheric zonal mean values
These can be toggled on/off with the plots keyword argument.
At present, we do not yet have the capability to split the
plots up into separate files per category (e.g. Oxidants,
Aerosols, etc.). This is primarily due to the fact that
we archive J-values from GEOS-Chem for individual species
but not family species. We could attempt to add this
functionality later if there is sufficient demand.
"""
Function-specific keyword args:¶
-
varlist
: list of str
¶ List of J-value variables to plot. If not passed, then all J-value variables common to both dev and ref will be plotted. The varlist argument can be a useful way of restricting the number of variables plotted to the pdf file when debugging.
Default value: None
-
local_noon_jvalues
: bool
¶ Set this flag to plot local noon J-values. This will divide all J-value variables by the JNoonFrac counter, which is the fraction of the time that it was local noon at each location.
Default value: False
-
plots
: list of strings
¶ List of plot types to create.
Default value: [‘sfc’, ‘500hpa’, ‘zonalmean’]
-
flip_ref
: bool
¶ Set this flag to True to reverse the vertical level ordering in the “Ref” dataset (in case “Ref” starts from the top of atmosphere instead of the surface).
Default value: False
-
flip_dev
: bool
¶ Set this flag to True to reverse the vertical level ordering in the “Dev” dataset (in case “Dev” starts from the top of atmosphere instead of the surface).
Default value: False
This function generates plots of J-values using the
JValues
GEOS-Chem output files.
make_benchmark_wetdep_plots¶
def make_benchmark_wetdep_plots(ref, refstr, dev, devstr, collection,
dst="./benchmark", datestr=None, overwrite=False, verbose=False,
benchmark_type="TransportTracersBenchmark", plots=["sfc", "500hpa", "zonalmean"],
log_color_scale=False, normalize_by_area=False, areas=None, refmet=None,
devmet=None, weightsdir='.', n_job=-1, spcdb_dir=os.path.dirname(__file__)
):
"""
Creates PDF files containing plots of species concentration
for model benchmarking purposes.
"""
Function-specific keyword args:¶
-
datestr
: str
¶ A string with date information to be included in both the plot pdf filename and as a destination folder subdirectory for writing plots
Default value: None
-
benchmark_type:
str
¶ A string denoting the type of benchmark output to plot, either FullChemBenchmark or TransportTracersBenchmark.
Default value: “FullChemBenchmark”
-
plots
: list of strings
¶ List of plot types to create.
Default value: [‘sfc’, ‘500hpa’, ‘zonalmean’]
-
normalize_by_area:
bool
¶ Set this flag to true to enable normalization of data by surfacea area (i.e. kg s-1 –> kg s-1 m-2).
Default value: False
-
areas
: dict of xarray DataArray:
¶ Grid box surface areas in m2 on Ref and Dev grids.
Default value: None
-
refmet
: str
¶ Path name for ref meteorology
Default value: None
-
devmet
: str
¶ Path name for dev meteorology
Default value: None
This function generates plots of wet deposition using
WetLossConv
and WetLossLS
GEOS-Chem output files.
It is currently primarily used for 1-Year Transport Tracer benchmarks,
plotting values for the following species as defined in
benchmark_categories.yml
:
"""
WetLossConv: Pb210, Pb210Strat, Be7, Be7Strat, Be10, Be10Strat
WetLossLS: Pb210, Pb210Strat, Be7, Be7Strat, Be10, Be10Strat
"""
Tabling¶
This page describes the tabling capabilities of GCPy, including possible argument values for every tabling function. These functions are primarily used for model benchmarking purposes. All tables are printed to text files.
Emissions tables¶
def make_benchmark_emis_tables(reflist, refstr, devlist,
devstr, dst="./benchmark", refmet=None, devmet=None,
overwrite=False, ref_interval=[2678400.0], dev_interval=[2678400.0],
spcdb_dir=os.path.dirname(__file__)
):
"""
Creates a text file containing emission totals by species and
category for benchmarking purposes.
"""
Arguments:¶
-
reflist:
list of str
¶ List with the path names of the emissions file or files (multiple months) that will constitute the “Ref” (aka “Reference”) data set.
-
refstr
: str
¶ A string to describe ref (e.g. version number)
-
devlist
: list of str
¶ List with the path names of the emissions file or files (multiple months) that will constitute the “Dev” (aka “Development”) data set
-
devstr
: str
¶ A string to describe dev (e.g. version number)
Keyword arguments:¶
-
dst
: str
¶ A string denoting the destination folder where the file containing emissions totals will be written.
Default value: ./benchmark
-
refmet
: str
¶ Path name for ref meteorology
Default value: None
-
devmet
: str
¶ Path name for dev meteorology
Default value: None
-
overwrite
: bool
¶ Set this flag to True to overwrite files in the destination folder (specified by the dst argument).
Default value: False
-
ref_interval
: list of float
¶ The length of the ref data interval in seconds. By default, interval is set to [2678400.0], which is the number of seconds in July (our 1-month benchmarking month).
Default value: [2678400.0]
-
dev_interval
: list of float
¶ The length of the dev data interval in seconds. By default, interval is set to [2678400.0], which is the number of seconds in July (our 1-month benchmarking month).
Default value: [2678400.0]
-
spcdb_dir
: str
¶ Directory of species_datbase.yml file
Default value: Directory of GCPy code repository
gcpy.benchmark.make_benchmark_emis_tables()
generates tables
of total emissions categorized by species or by inventory. These
tables contain total global emissions over the lengths of the Ref and
Dev datasets, as well as the differences between totals across the two
datasets. Passing a list of datasets as Ref or Dev (e.g. multiple
months of emissions files) will result in printing totals emissions
summed across all files in the list. Make sure to update the
literal:ref_interval and/or dev_interval
arguments if you
pass input that does not correspond with 1 31 day month.
Mass Tables¶
def make_benchmark_mass_tables(ref, refstr, dev, devstr,
varlist=None, dst="./benchmark", subdst=None, overwrite=False,
verbose=False, label="at end of simulation",
spcdb_dir=os.path.dirname(__file__),
ref_met_extra='', dev_met_extra=''
):
"""
Creates a text file containing global mass totals by species and
category for benchmarking purposes.
"""
Arguments:¶
-
reflist
: str
¶ Pathname that will constitute the “Ref” (aka “Reference”) data set.
-
refstr
: str
¶ A string to describe ref (e.g. version number)
-
dev
: list of str
¶ Pathname that will constitute the “Dev” (aka “Development”) data set. The “Dev” data set will be compared against the “Ref” data set.
-
devstr
: str
¶ A string to describe dev (e.g. version number)
Keyword arguments:¶
-
varlist
: list of str
¶ List of variables to include in the list of totals. If omitted, then all variables that are found in either “Ref” or “Dev” will be included. The varlist argument can be a useful way of reducing the number of variables during debugging and testing.
Default value: None
-
dst
: str
¶ A string denoting the destination folder where the file containing emissions totals will be written.
Default value: ./benchmark
-
subdst
: str
¶ A string denoting the sub-directory of dst where PDF files containing plots will be written. In practice, subdst is only needed for the 1-year benchmark output, and denotes a date string (such as “Jan2016”) that corresponds to the month that is being plotted.
Default value: None
-
overwrite
: bool
¶ Set this flag to True to overwrite files in the destination folder (specified by the dst argument).
Default value: False
-
verbose
: bool
¶ Set this flag to True to print extra informational output.
Default value: False.
-
spcdb_dir
: str
¶ Directory of species_datbase.yml file
Default value: Directory of GCPy code repository
-
ref_met_extra
: str
¶ Path to ref Met file containing area data for use with restart files which do not contain the Area variable. Default value : ‘’
-
dev_met_extra
: str
¶ Path to dev Met file containing area data for use with restart files which do not contain the Area variable.
Default value: ‘’
gcpy.benchmark.make_benchmark_mass_tables
is used to create
global mass tables of GEOS-Chem species from a
Restart
file. This function will create one table of total
mass by species from the earth’s surface to the top of the
stratosphere and one table for only the troposphere.
The tables contain total mass for each of the ref and dev datasets in
Gg, as well as absolute and percentage difference between the two
datasets. If your restart files do not contain an Area variable
(AREA
for GEOS-Chem Classic or Met_AREAM2
for
GCHP) then you will need to use the ref_met_extra
and/or
dev_met_extra
arguments to pass the paths of NetCDF files
containing the corresponding area variables (usually contained in
meteorology diagnostic output).
Operations Budget Tables¶
def make_benchmark_operations_budget(refstr, reffiles, devstr,
devfiles, ref_interval, dev_interval, benchmark_type=None,
label=None, col_sections=["Full", "Trop", "PBL", "Strat"],
operations=["Chemistry","Convection","EmisDryDep","Mixing",
"Transport","WetDep"], compute_accum=True,
require_overlap=False, dst='.', species=None, overwrite=True
):
"""
Prints the "operations budget" (i.e. change in mass after
each operation) from a GEOS-Chem benchmark simulation.
"""
Arguments:¶
-
refstr
: str
¶ Labels denoting the “Ref” versions
-
reffiles
: list of str
¶ Lists of files to read from the “Ref” version.
-
devstr
: str
¶ Labels denoting the “Dev” versions
-
devfiles
: list of str
¶ Lists of files to read from “Dev” version.
-
interval
: float
¶ Number of seconds in the diagnostic interval.
Keyword arguments:¶
-
benchmark_type
: str
¶ “TransportTracersBenchmark” or “FullChemBenchmark”.
Default value: None
-
label
: str
¶ Contains the date or date range for each dataframe title.
Default value: None
-
col_sections
: list of str
¶ List of column sections to calculate global budgets for. May include Strat eventhough not calculated in GEOS-Chem, but Full and Trop must also be present to calculate Strat.
Default value: [“Full”, “Trop”, “PBL”, “Strat”]
-
operations
: list of str
¶ List of operations to calculate global budgets for. Accumulation should not be included. It will automatically be calculated if all GEOS-Chem budget operations are passed and optional arg compute_accum is True.
Default value: [“Chemistry”,”Convection”,”EmisDryDep”, “Mixing”,”Transport”,”WetDep”]
-
compute_accum
: bool
¶ Optionally turn on/off accumulation calculation. If True, will only compute accumulation if all six GEOS-Chem operations budgets are computed. Otherwise a message will be printed warning that accumulation will not be calculated.
Default value: True
-
require_overlap
: bool
¶ Whether to calculate budgets for only species that are present in both Ref or Dev.
Default value: False
-
dst
: str
¶ Directory where plots & tables will be created.
Default value: ‘.’ (directory in which function is called)
-
species
: list of str
¶ List of species for which budgets will be created.
Default value: None (all species)
-
overwrite
: bool
¶ Denotes whether to overwrite existing budget file.
Default value: True
gcpy.benchmark.make_benchmark_operations_budget()
creates
tables of budgets for species separated by model operation. The tables
show budgets for each of the ref and dev datasets in Gg, as well as
absolute and percentage difference between the two datasets.
Note that total accumulation across all operations will only be
printed if you set compute_accum==True
and
all operations are included in operations
. Note also that
when using the non-local mixing scheme (default), 'Mixing'
includes emissions and dry deposition applied below the
PBL. 'EmisDryDep'
therefore only captures fluxes above the
PBL. When using full mixing, 'Mixing'
and
'EmisDryDep'
are fully separated.
Aerosol Budgets and Burdens¶
def make_benchmark_aerosol_tables(devdir, devlist_aero, devlist_spc,
devlist_met, devstr, year, days_per_mon, dst='./benchmark',
overwrite=False, is_gchp=False, spcdb_dir=os.path.dirname(__file__)
):
"""
Compute FullChemBenchmark aerosol budgets & burdens
"""
Arguments:¶
-
devdir:
str
¶ Path to development (“Dev”) data directory
-
devlist_aero
: list of str
¶ List of Aerosols collection files (different months)
-
devlist_spc
: list of str
¶ List of SpeciesConc collection files (different months)
-
devlist_met
: list of str
¶ List of meteorology collection files (different months)
-
devstr
: str
¶ Descriptive string for datasets (e.g. version number)
-
year
: str
¶ The year of the benchmark simulation (e.g. ‘2016’).
-
days_per_mon
: list of int
¶ List of number of days per month for all months
Keyword arguments:¶
-
dst
: str
¶ Directory where budget tables will be created.
Default value: ‘./benchmark’
-
overwrite
: bool
¶ Overwrite burden & budget tables? (default=True)
Default value: False
-
is_gchp
: bool
¶ Whether datasets are for GCHP
Default value: False
-
spcdb_dir
: str
¶ Directory of species_datbase.yml file
Default value: Directory of GCPy code repository
gcpy.benchmark.make_benchmark_aerosol_tables()
generates two
different tables using output from a single dataset. One
contains annual mean aerosol burdens in Tg in the stratosphere,
troposphere, and combined stratosphere and troposphere. The other
table shows annual global mean AOD in the stratosphere, troposphere,
and combined stratosphere and troposphere. Aerosol species used are
pre-defined in aod_species.yml
: BCPI, OCPI, SO4, DST1, SALA,
and SALC.
Regridding¶
This page describes the regridding capabilities of GCPy. GCPy currently supports regridding of data from GEOS-Chem restarts and output NetCDF files. Regridding is supported across any horizontal resolution and any grid type available in GEOS-Chem, including lat/lon (global or non-global), global standard cubed-sphere, and global stretched-grid. GCPy also supports arbitrary vertical regridding across different vertical resolutions. .. _regrid-plot:
Regridding for Plotting in GCPy¶
When plotting in GCPy (e.g. through compare_single_level()
or
compare_zonal_mean()
), the vast majority of regridding is
handled internally. You can optionally request a specific
horizontal comparison resolution in compare_single_level()`
and compare_zonal_mean()
. Note that all regridding in these
plotting functions only applies to the comparison panels (not the top
two panels which show data directly from each dataset). There are only
two scenarios where you will need to pass extra information to GCPy to
help it determine grids and to regrid when plotting.
Pass stretched-grid file paths¶
Stretched-grid parameters cannot currently be automatically determined
from grid coordinates. If you are plotting stretched-grid data in
compare_single_level()
or compare_zonal_mean()
(even
if regridding to another format), you need to use the
sg_ref_path
or sg_dev_path
arguments to pass the path
of your original stretched-grid restart file to GCPy.
If using single_panel()
, pass the file path using
sg_path
. Stretched-grid restart files created using GCPy
contain the specified stretch factor, target longitude, and
target latitude in their metadata. Currently, output files from
stretched-grid runs of GCHP do not contain any metadata that specifies
the stretched-grid used.
Pass vertical grid parameters for non-72/47-level grids¶
GCPy automatically handles regridding between different vertical grids
when plotting except when you pass a dataset that is not on the
typical 72-level or 47-level vertical grids. If using a different
vertical grid, you will need to pass the corresponding grid
parameters
using the ref_vert_params
or dev_vert_params
keyword
arguments.
Automatic regridding decision process¶
When you do not specify a horizontal comparison resolution using the
cmpres
argument in compare_single_level()
and
compare_zonal_mean()
, GCPy follows several steps to determine
what comparison resolution it should use:
If both input grids are lat/lon, use the highest resolution between them (don’t regrid if they are the same resolution).
Else if one grid is lat/lon and the other is cubed-sphere (standard or stretched-grid), use a 1x1.25 lat/lon grid.
Else if both grids are cubed-sphere and you are plotting zonal means, use a 1x1.25 lat/lon grid.
Else if both grids are standard cubed-sphere, use the highest resolution between them (don’t regrid if they are the same resolution).
Else if one or more grids is a stretched-grid, use the grid of the ref dataset.
For differing vertical grids, the smaller vertical grid is currently used for comparisons.
Regridding Files¶
You can regrid existing GEOS-Chem restart or output diagnostic files
between lat/lon and cubed-sphere formats using
gcpy.file_regrid
. gcpy.file_regrid
can either be
called directly from the command line using python -m
gcpy.file_regrid
or as a function
(gcpy.file_regrid.file_regrid()
) from a Python script or
interpreter. The syntax of file_regrid
is as follows:
def file_regrid(fin, fout, dim_format_in, dim_format_out,
cs_res_out=0, ll_res_out='0x0',
sg_params_in=[1.0, 170.0, -90.0], sg_params_out=[1.0, 170.0, -90.0]
):
"""
Regrids an input file to a new horizontal grid specification and saves it
as a new file.
"""
Required Arguments:¶
-
fin
: str
¶ The input filename
-
fout
: str
¶ The output filename (file will be overwritten if it already exists)
-
dim_format_in
: str
¶ Format of the input file’s dimensions (choose from: classic, checkpoint, diagnostic), where classic denotes lat/lon and checkpoint / diagnostic are cubed-sphere formats
-
dim_format_out
: str
¶ Format of the output file’s dimensions (choose from: classic, checkpoint, diagnostic), where classic denotes lat/lon and checkpoint / diagnostic are cubed-sphere formats
Optional arguments:¶
-
cs_res_out
: int
¶ The cubed-sphere resolution of the output dataset. Not used if dim_format_out is classic.
Default value: 0
-
ll_res_out
: str
¶ The lat/lon resolution of the output dataset. Not used if dim_format_out is not classic/
Default value: ‘0x0’
-
sg_params_in
: list[float
,
float
,
float]
¶
Input grid stretching parameters [stretch-factor, target longitude, target latitude]. Not used if dim_format_in is classic
Default value: [1.0, 170.0, -90.0] (No stretching)
-
sg_params_out
: list[float
,
float
,
float]
¶
Output grid stretching parameters [stretch-factor, target longitude, target latitude]. Not used if dim_format_out is classic.
Default value: [1.0, 170.0, -90.0] (No stretching)
There are three dimension formats available for regridding: classic
(GEOS-Chem Classic lat/lon format), checkpoint
(GCHP restart file
format), and diagnostic
(GCHP output file format). You can
regrid between any of these formats using file_regrid
, as well as
between different resolutions and/or grid-types within each dimension
format (e.g. standard cubed-sphere checkpoint to stretched-grid
checkpoint). Note that although the cs_res_out
and
ll_res_out
parameters are technically optional in the
function, you must specify at least one of these in your call to
file_regrid
.
As stated previously, you can either call
file_regrid.file_regrid()
directly or call it from the command
line using python -m gcpy.file_regrid ARGS
. An example command
line call (separated by line for readability) for regridding a C90
cubed-sphere restart file to a C48 stretched-grid with a stretch
factor of 3, a target longitude of 260.0, and a target latitude of
40.0 looks like:
python -m gcpy.file_regrid \
-i initial_GEOSChem_rst.c90_standard.nc \
--dim_format_in checkpoint \
-o sg_restart_c48_3_260_40.nc \
--cs_res_out 48 \
--sg_params_out 3.0 260.0 40.0 \
--dim_format_out checkpoint
Regridding with gridspec and sparselt¶
GCPy 1.3.0 and later supports regridding with the gridspec and sparselt utilities.
First-time setup¶
Install command line tool gridspec in your bin directory
$ pip install git+https://github.com/LiamBindle/gridspec.git
Make sure location of installation is added to path in your bashrc (or equivalent)
$ export PATH=/path/to/home/.local/bin:$PATH
Install sparselt as a python package.
$ conda install -c conda-forge sparselt==0.1.3
One-time setup per grid resolution combination¶
Create a directory structure to store files that you will use in regridding. Ideally this would be in a shared location where all of the GCPy users at your institution coud access it.
Navigate to this directory.
$ mkdir /path/to/RegridInfo
Within this top level directory, create two directories that will store grid information and regridding weights. Navigate to the grid information folder.
$ mkdir Grids $ mkdir Weights $ cd Grids
Create tilefiles (if cubed-sphere) and grid spec file for each input and output grid resolution (see also gridspec README):
For uniform cubed-sphere global grid, specify face side length.
For simplicity, keep all cubed-sphere data in subdirectories of the Grids folder.
$ mkdir c24 $ gridspec-create gcs 24 $ mv c24*.nc c24 $ mkdir c48 $ gridspec-create gcs 48 $ mv c48*.nc c48 ... etc for other grids ...
For cubed-sphere stretched grid, specify face side length, stretch factor, and target latitude and longitude:
$ mkdir sc24 $ gridspec-create sgcs 24 -s 2 -t 40 -100 $ mv *c24*.nc sc24
For uniform global lat-lon grid, specify the number of latitude and longitude grid boxes. For a list of optional settings, run the command gridspec-create latlon --help.
Create a subdirectory named latlon and move all of your latlon grid specification files there.
$ gridspec-create latlon 90 180 # Generic 1 x 1 grid $ gridspec-create latlon 46 72 -dc -pc -hp # GEOS-Chem Classic 4 x 5 $ gridspec-create latlon 91 144 -dc -pc -hp # GEOS-Chem Classic 2 x 2.5 $ gridspec-create latlon 361 576 -dc -pc -hp # MERRA-2 0.5 x 0.625 $ gridspec-create latlon 721 1172 -dc -pc -hp # GEOS-FP 0.25 x 0.3125 $ mkdir latlon $ mv regular_lat_lon*.nc latlon
(Optional) View contents of grid spec file:
$ gridspec-dump c24/c24_gridspec.nc ... etc. for other grids ...
Initialize your GCPy conda environmnt (which includes ESMF as a dependency):
$ conda activate gcpy_env
Navigate to the directory that will store the regridding weights. (Recall that we created this in created this in step #2.
$ cd /path/to/RegridInfo/Weights
Generate regridding weights (see also sparselt sample data files README), specifying the following:
Path to input file horizontal resolution grid spec netcdf file
Path to output file horizontal resolution grid spec netcdf file
Regridding type, e.g. conserve for conservative (string)
Name of output regridding weights file (include input and output resolutions)
Name of directory containing grid spec tilefiles
(gcpy_env) $ /ESMF_RegridWeightGen \ -s /path/to/RegridInfo/Grids/c48/c48_gridspec.nc \ -d /path/to/RegridInfo/Grids/regular_lat_lon_90x180.nc \ -m conserve \ -w ./regrid_weights_c48_to_latlon90x180.nc \ --tilefile_path /path/to/RegridInfo/Grids/c48 ... etc. for other grid combinations ...
(Optional) Consider using a bash script such as the one shown below if you need to create regridding weights to/from several grids.
#!/bin/bash # Generates regridding weights with ESMF_RegridWeightGen # The top-level directory containing Grids and Weights subdirectories # (EDIT AS NEEDED) main_dir="/path/to/RegridInfo" # Subdirectories for grid specifications and regridding weights grids_dir="${main_dir}/Grids" weights_dir="${main_dir}/Weights" # GCHP cubed-sphere grids (EDIT AS NEEDED) cs_list=(c24 c48 c90 c180 c360) # GCClassic lat-lon grids (EDIT AS NEEDED) ll_list=(46x72 91x144 361x576 721x1172) # Loop over cubed-sphere grids for cs in ${cs_list[@]}; do # Cubed-sphere gridspec file cs_grid_info="${grids_dir}/${cs}/${cs}_gridspec.nc" if [[ ! -f ${cs_grid_info} ]]; then echo "Could not find ${cs_grid_info}!" exit 1 fi # Loop over latlon grids for ll in ${ll_list[@]}; do # Latlon gridspec file ll_grid_info="${grids_dir}/latlon/regular_lat_lon_${ll}.nc" if [[ ! -f ${ll_grid_info} ]]; then echo "Could not find ${ll_grid_info}!" exit 1 fi # Cubed-sphere -> latlon regridding echo "----" echo "Regridding from ${cs} to ${ll}" weightfile="${weights_dir}/regrid_weights_${cs}_to_latlon${ll}.nc" ESMF_RegridWeightGen \ -s ${cs_grid_info} \ -d ${ll_grid_info} \ -m conserve \ -w ${weightfile} \ --tilefile_path ${grids_dir}/${cs} unset weightfile # Latlon -> cubed-sphere regridding echo "----" echo "Regridding from ${ll} to ${cs}" weightfile="${weights_dir}/regrid_weights_latlon${ll}_to_${cs}.nc" ESMF_RegridWeightGen \ -s ${ll_grid_info} \ -d ${cs_grid_info} \ -m conserve \ -w ${weightfile} \ --tilefile_path ${grids_dir}/${cs} unset weightfile done done
Sample regridding script¶
Once you have created the tilefiles and regridding weights, you can use them to regrid data files. Shown below is a sample Python script that you can modify.
#!/usr/bin/env python
# Imports
import xarray as xr
import sparselt.esmf
import sparselt.xr
# Create a linear transform object from the regridding weights file
# for the combination of source and target horizontal resolutions.
transform = sparselt.esmf.load_weights(
'path/to/RegridInfo/Weights/regrid_weights_c48_to_latlon90x180.nc',
input_dims=[('nf', 'Ydim', 'Xdim'), (6, 48, 48)]
output_dims=[('lat', 'lon'), (90, 180)],
)
# Open file to regrid as xarray DataSet.
ds = xr.open_dataset('my_data_c48.nc')
# Regrid the DataSet using the transform object.
ds = sparselt.xr.apply(transform, ds)
# Write xarray DataSet contents to netcdf file.
ds.to_netcdf("my_data_latlon90x180.nc")
Six Panel Plotting¶
#!/usr/bin/env python
"""
Six Panel Comparison Plots
--------------------------------------
This example script demonstrates the comparitive plotting capabilities of GCPy,
including single level plots as well as global zonal mean plots.
These comparison plots are frequently used to evaluate results from different runs / versions
of GEOS-Chem, but can also be used to compare results from different points in one run that
are stored in separate xarray datasets.
The example data described here is in lat/lon format, but the same code works equally
well for cubed-sphere (GCHP) data.
"""
#xarray allows us to read in any NetCDF file, the format of most GEOS-Chem diagnostics,
#as an xarray Dataset
import xarray as xr
ref_ds = xr.open_dataset('first_run/GEOSChem.Restart.20160801_0000z.nc4')
dev_ds = xr.open_dataset('second_run/GEOSChem.Restart.20160801_0000z.nc4')
import gcpy.plot as gcplot
"""
Single level plots
------------------
"""
#compare_single_level generates sets of six panel plots for data at a specified level in your datasets.
#By default, the level at index 0 (likely the surface) is plotted. Here we will plot data at ~500 hPa,
#which is located at index 21 in the standard 72-level and 47-level GMAO vertical grids.
ilev=21
#You likely want to look at the same variables across both of your datasets. If a variable is in
#one dataset but not the other, the plots will show NaN values for the latter.
#You can pass variable names in a list to these comparison plotting functions (otherwise all variables will plot).
varlist = ['SpeciesRst_O3', 'SpeciesRst_CO2']
#compare_single_level has many arguments which can be optionally specified. The first four arguments are required.
#They specify your first xarray Dataset, the name of your first dataset, your second xarray Dataset, and the name of
#your second dataset. Here we will also pass a specific level and the names of the variables you want to plot.
import matplotlib.pyplot as plt
gcplot.compare_single_level(ref_ds, 'Dataset 1', dev_ds, 'Dataset 2', ilev=ilev, varlist=varlist)
plt.show()
#Using plt.show(), you can view the plots interactively. You can also save out the plots to a PDF.
gcplot.compare_single_level(ref_ds, 'Dataset 1', dev_ds, 'Dataset 2', ilev=ilev, varlist=varlist, pdfname='single_level.pdf')
"""
Zonal Mean Plotting
-------------------
"""
#compare_zonal_mean generates sets of six panel plots containing zonal mean data across your dataset.
#compare_zonal_mean shares many of the same arguments as compare_single_level.
#You can specify pressure ranges in hPa for zonal mean plotting (by default every vertical level is plotted)
gcplot.compare_zonal_mean(ref_ds, 'Dataset 1', dev_ds, 'Dataset 2', pres_range=[0, 100], varlist=varlist, pdfname='zonal_mean.pdf')
Single Panel Plotting¶
#!/usr/bin/env python
"""
Global and Regional Single Panel Plots
--------------------------------------
This example script demonstrates the core single panel plotting capabilities of GCPy,
including global and regional single level plots as well as global zonal mean plots.
The example data described here is in lat/lon format, but the same code works equally
well for cubed-sphere (GCHP) data.
"""
#xarray allows us to read in any NetCDF file, the format of most GEOS-Chem diagnostics,
#as an xarray Dataset
import xarray as xr
ds = xr.open_dataset('GEOSChem.Restart.20160701_0000z.nc4')
#You can easily view the variables available for plotting using xarray.
#Each of these variables has its own xarray DataArray within the larger Dataset container.
print(ds.data_vars)
#Most variables have some sort of prefix; in this example all variables are
#prefixed with 'SpeciesRst_'. We'll select the DataArray for ozone.
da = ds.SpeciesRst_O3
#Printing a DataArray gives a summary of the dimensions and attributes of the data.
print(da)
#This Restart file has a time dimension of size 1, with 72 vertical levels,
#46 latitude indicies, and 72 longitude indices.
import gcpy.plot as gcplot
"""
Single level plots
------------------
"""
#gcpy.single_panel is the core plotting function of GCPy, able to create a one panel zonal mean or
#single level plot. Here we will create a single level plot of ozone at ~500 hPa.
#We must manually index into the level that we want to plot (index 22 in the standard 72-layer
#and 47-layer GMAO vertical grids).
slice_500 = da.isel(lev=22)
#single_panel has many arguments which can be optionally specified. The only argument you must always
#pass to a call to single_panel is the DataArray that you want to plot.
#By default, the created plot includes a colorbar with units read from the DataArray, an automatic title
#(the data variable name in the DataArray), and an extent equivalent to the full lat/lon extent of the DataArray
import matplotlib.pyplot as plt
gcplot.single_panel(slice_500)
plt.show()
#You can specify a specific area of the globe you would like plotted using the 'extent' argument,
#which uses the format [min_longitude, max_longitude, min_latitude, max_latitude] with bounds [-180, 180, -90, 90]
gcplot.single_panel(slice_500, extent=[50, -90, -10, 60])
plt.show()
#Other commonly used arguments include specifying a title and a colormap (defaulting to a White-Green-Yellow-Red colormap)
#You can find more colormaps at https://matplotlib.org/tutorials/colors/colormaps.html
gcplot.single_panel(slice_500, title='500mb Ozone over the North Pacific', comap = plt.cm.viridis,
log_color_scale=True, extent=[80, -90, -10, 60])
plt.show()
"""
Zonal Mean Plotting
-------------------
"""
#Use the plot_type argument to specify zonal_mean plotting
gcplot.single_panel(da, plot_type="zonal_mean")
plt.show()
#You can specify pressure ranges in hPa for zonal mean plot (by default every vertical level is plotted)
gcplot.single_panel(da, pres_range=[0, 100], log_yaxis=True, log_color_scale=True)
plt.show()
Benchmark Plotting / Tabling¶
Below is an example configuration file used to input the desired
options for the comprehensive benchmark comparison script
run_benchmark.py
. Additional configuration file examples can
be found in the benchmarks
directory of GCpy.
The run_benchmark.py
script allows one to perform benchmark
comparisons between any simulation duration supplied in the
configuration file provided the ref and dev simulations time periods
match. Additionally, if the durations specified are exactly one year,
then the corresponding bmk_type
specialty comparison script
will be run (either run_1yr_fullchem_benchmark.py
or
run_1yr_tt_benchmark.py
). Any other duration will run the standard
suite of benchmark comparisons.
To generate plots from a 1-month benchmark simulation, you would call
run_benchmark.py
as follows:
(gcpy_env) $ run_benchmark.py 1mo_benchmark.yml
Where 1mo_benchmark.yml
contains the following inputs:
&---
# =====================================================================
# Benchmark configuration file (**EDIT AS NEEDED**)
# customize in the following manner:
# (1) Edit the path variables so that they point to folders w/ model data
# (2) Edit the version strings for each benchmark simulation
# (3) Edit the switches that turn on/off creating of plots and tables
# (4) If necessary, edit labels for the dev and ref versions
# Note: When doing GCHP vs GCC comparisions gchp_dev will be compared
# to gcc_dev (not gcc_ref!). This ensures consistency in version names
# when doing GCHP vs GCC diff-of-diffs (mps, 6/27/19)
# =====================================================================
#
# Configuration for 1 month FullChemBenchmark
#
# paths:
# main_dir: High-level directory containing ref & dev rundirs
# results_dir: Directory where plots/tables will be created
# weights_dir: Path to regridding weights
# spcdb_dir: Folder in which the species_database.yml file is
# located. If set to "default", then will look for
# species_database.yml in one of the Dev rundirs.
#
paths:
main_dir: /n/holyscratch01/external_repos/GEOS-CHEM/gcgrid/geos-chem/validation/gcpy_test_data/1mon
results_dir: /path/to/BenchmarkResults
weights_dir: /n/holyscratch01/external_repos/GEOS-CHEM/gcgrid/gcdata/ExtData/GCHP/RegriddingWeights
spcdb_dir: default
#
# data: Contains configurations for ref and dev runs
# version: Version string (must not contain spaces)
# dir: Path to run directory
# outputs_subdir: Subdirectory w/ GEOS-Chem diagnostic files
# restarts_subdir: Subdirectory w/ GEOS-Chem restarts
# bmk_start: Simulation start date (YYYY-MM-DDThh:mm:ss)
# bmk_end: Simulation end date (YYYY-MM-DDThh:mm:ss)
# resolution: GCHP resolution string
#
data:
ref:
gcc:
version: GCC_ref
dir: GCC_ref
outputs_subdir: OutputDir
restarts_subdir: Restarts
bmk_start: "2019-07-01T00:00:00"
bmk_end: "2019-08-01T00:00:00"
gchp:
version: GCHP_ref
dir: GCHP_ref
outputs_subdir: OutputDir
restarts_subdir: Restarts
bmk_start: "2019-07-01T00:00:00"
bmk_end: "2019-08-01T00:00:00"
is_pre_13.1: False
is_pre_14.0: False
resolution: c24
dev:
gcc:
version: GCC_dev
dir: GCC_dev
outputs_subdir: OutputDir
restarts_subdir: Restarts
bmk_start: "2019-07-01T00:00:00"
bmk_end: "2019-08-01T00:00:00"
gchp:
version: GCHP_dev
dir: GCHP_dev
outputs_subdir: OutputDir
restarts_subdir: Restarts
bmk_start: "2019-07-01T00:00:00"
bmk_end: "2019-08-01T00:00:00"
is_pre_13.1: False
is_pre_14.0: False
resolution: c24
#
# options: Specify the types of comparisons to perform
#
options:
bmk_type: FullChemBenchmark
gcpy_test: True # Specify if this is a gcpy test validation run
comparisons:
gcc_vs_gcc:
run: True # True to run this comparison
dir: GCC_version_comparison
tables_subdir: Tables
gchp_vs_gcc:
run: True
dir: GCHP_GCC_comparison
tables_subdir: Tables
gchp_vs_gchp:
run: True
dir: GCHP_version_comparison
tables_subdir: Tables
gchp_vs_gcc_diff_of_diffs:
run: True
dir: GCHP_GCC_diff_of_diffs
#
# outputs: Types of output to generate (plots/tables)
#
outputs:
plot_conc: True
plot_emis: True
emis_table: True
plot_jvalues: True
plot_aod: True
mass_table: True
ops_budget_table: False
OH_metrics: True
ste_table: True # GCC only
plot_options: # Plot concentrations and emissions by category?
by_spc_cat: True
by_hco_cat: True
YAML configuration files for 1-year benchmarks
(1yr_fullchem_benchmark.yml
, 1yr_tt_benchmark.yml
) are
also provided in the benchmarks
folder.
Plot Timeseries¶
#!/usr/bin/env python
'''
Example of plotting timeseries data from GEOS-Chem and saving
the output to a PDF file. You can modify this for your particular
diagnostic output. This also contains a good overview of
This example script creates a PDF file with 2 pages.
Page 1:
-------
O3 from the first model layer (from the "SpeciesConc"
diagnostic collection is) plotted in blue.
O3 at 10 meter height (from the "SpeciesConc_10m"
diagnostic collection) is plotted in red.
Page 2:
-------
HNO3 from the first model layer (from the SpeciesConc
diagnostic collection is) plotted in blue.
HNO3 at 10 meter height (from the SpeciesConc_10m
diagnostic collection) is plotted in red.
You can of course modify this for your own particular applications.
Author:
-------
Bob Yantosca
yantosca@seas.harvard.edu
23 Aug 2019
'''
# Imports
import gcpy.constants as gcon
import os
import numpy as np
import matplotlib.dates as mdates
import matplotlib.ticker as mticker
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import xarray as xr
import warnings
# Tell matplotlib not to look for an X-window, as we are plotting to
# a file and not to the screen. This will avoid some warning messages.
os.environ['QT_QPA_PLATFORM'] = 'offscreen'
# Suppress harmless run-time warnings (mostly about underflow in division)
warnings.filterwarnings('ignore', category=RuntimeWarning)
warnings.filterwarnings('ignore', category=UserWarning)
def find_files_in_dir(path, substrs):
'''
Returns a list of all files in a directory that match one or more
substrings.
Args:
-----
path : str
Path to the directory in which to search for files.
substrs : list of str
List of substrings used in the search for files.
Returns:
--------
file_list : list of str
List of files in the directory (specified by path)
that match all substrings (specified in substrs).
'''
# Initialize
file_list = []
# Walk through the given data directory. Then for each file found,
# add it to file_list if it matches text in search_list.
for root, directory, files in os.walk(path):
for f in files:
for s in substrs:
if s in f:
file_list.append(os.path.join(root, f))
# Return an alphabetically sorted list of files
file_list.sort()
return file_list
def find_value_index(seq, val):
'''
Finds the index of a numpy array that is close to a value.
Args:
-----
seq : numpy ndarray
An array of numeric values.
val : number
The value to search for in seq.
Returns:
--------
result : integer
The index of seq that has a value closest to val.
Remarks:
--------
This algorithm was found on this page:
https://stackoverflow.com/questions/48900977/find-all-indexes-of-a-numpy-array-closest-to-a-value
'''
r = np.where(np.diff(np.sign(seq - val)) != 0)
idx = r + (val - seq[r]) / (seq[r + np.ones_like(r)] - seq[r])
idx = np.append(idx, np.where(seq == val))
idx = np.sort(idx)
result = np.round(idx)
# NOTE: xarray needs integer values, so convert here!
return int(result[0])
def read_geoschem_data(path, collections):
'''
Returns an xarray Dataset containing timeseries data.
Args:
-----
path : str
Directory path where GEOS-Chem diagnostic output
files may be found.
collections: list of str
List of GEOS-Chem collections. Files for these
collections will be read into the xarray Dataset.
Returns:
--------
ds : xarray Dataset
A Dataset object containing the GEOS-Chem diagnostic
output corresponding to the collections that were
specified.
'''
# Get a list of variables that GCPy should not read.
# These are mostly variables introduced into GCHP with the MAPL v1.0.0
# update. These variables contain either repeated or non-standard
# dimensions that can cause problems in xarray when combining datasets.
skip_vars = gcon.skip_these_vars
# Find all files in the given
file_list = find_files_in_dir(path, collections)
# Return a single xarray Dataset containing data from all files
# NOTE: Need to add combine="nested" for xarray 0.15 and higher
v = xr.__version__.split(".")
if int(v[0]) == 0 and int(v[1]) >= 15:
return xr.open_mfdataset(file_list,
drop_variables=skip_vars,
combine="nested",
concat_dim=None)
else:
return xr.open_mfdataset(file_list,
drop_variables=skip_vars)
def plot_timeseries_data(ds, site_coords):
'''
Plots a timseries of data at a given (lat,lon) location.
Args:
-----
ds : xarray Dataset
Dataset containing GEOS-Chem timeseries data.
site_coords : tuple
Contains the coordinate (lat, lon) of a site location
at which the timeseries data will be plotted.
'''
# ----------------------------------------------------------------------
# Get the GEOS-Chem data for O3 and HNO3 corresponding to the
# location of the observational station. We will save these into
# xarray DataArray objects, which we'll need for plotting.
#
# YOU CAN EDIT THIS FOR YOUR OWN PARTICULAR APPLICATION!
# ----------------------------------------------------------------------
# Find the indices corresponding to the site lon and lat
lat_idx = find_value_index(ds.lat.values, site_coords[0])
lon_idx = find_value_index(ds.lon.values, site_coords[1])
# Save O3 from the first level (~60m height) (ppb) into a DataArray
O3_L1 = ds['SpeciesConc_O3'].isel(lon=lon_idx, lat=lat_idx, lev=0)
O3_L1 *= 1.0e9
O3_L1.attrs['units'] = 'ppbv'
# Save O3 @ 10m height into a DataArray
O3_10m = ds['SpeciesConc10m_O3'].isel(lon=lon_idx, lat=lat_idx)
O3_10m *= 1.0e9
O3_10m.attrs['units'] = 'ppbv'
# Save HNO3 from the first level (~60m height) into a DataArray
HNO3_L1 = ds['SpeciesConc_HNO3'].isel(lon=lon_idx, lat=lat_idx, lev=0)
HNO3_L1 *= 1.0e9
HNO3_L1.attrs['units'] = 'ppbv'
# Save HNO3 @ 10m height into a DataArray
HNO3_10m = ds['SpeciesConc10m_HNO3'].isel(lon=lon_idx, lat=lat_idx)
HNO3_10m *= 1.0e9
HNO3_10m.attrs['units'] = 'ppbv'
# ----------------------------------------------------------------------
# Create a PDF file of the plots
# ----------------------------------------------------------------------
# Get min & max days of the plot span (for setting the X-axis range).
# To better center the plot, add a cushion of 12 hours on either end.
time = ds['time'].values
datemin = np.datetime64(time[0]) - np.timedelta64(12, 'h')
datemax = np.datetime64(time[-1]) + np.timedelta64(12, 'h')
# Define a PDF object so that we can save the plots to PDF
pdf = PdfPages('O3_and_HNO3.pdf')
# Loop over number of desired pages (in this case, 2)
for i in range(0, 2):
# Create a new figure: 1 plot per page, 2x as wide as high
figs, ax0 = plt.subplots(1, 1, figsize=[12, 6])
# -----------------------------
# Plot O3 on the first page
# -----------------------------
if i == 0:
# 1st model level
O3_L1.plot.line(ax=ax0, x='time', color='blue',
marker='o', label='O3 from 1st model level',
linestyle='-')
# 10 mheight
O3_10m.plot.line(ax=ax0, x='time', color='red',
marker='x', label='O3 at 10m height',
linestyle='-')
# Set title (has to be after the line plots are drawn)
ax0.set_title('O3 from the 1st model level and at 10m height')
# Set Y-axis minor tick marks at every 2 ppb (5 intervals)
ax0.yaxis.set_minor_locator(mticker.AutoMinorLocator(5))
# Set y-axis title
ax0.set_ylabel('O3 (ppbv)')
# -----------------------------
# Plot HNO3 on the second page
# -----------------------------
if i == 1:
# 1st model level
HNO3_L1.plot.line(ax=ax0, x='time', color='blue',
marker='o', label='HNO3 from 1st model level',
linestyle='-')
# 10m height
HNO3_10m.plot.line(ax=ax0, x='time', color='red',
marker='x', label='HNO3 at 10m height',
linestyle='-')
# Set title (has to be after the line plots are drawn
ax0.set_title('HNO3 from the 1st model level and at 10m height')
# Set Y-axis minor tick marks at every 0.05 ppb (4 intervals)
ax0.yaxis.set_minor_locator(mticker.AutoMinorLocator(4))
# Set y-axis title
ax0.set_ylabel('HNO3 (ppbv)')
# -----------------------------
# Set general plot parameters
# -----------------------------
# Add the plot legend
ax0.legend()
# Set the X-axis range
ax0.set_xlim(datemin, datemax)
# Set the X-axis major tickmarks
locator = mdates.DayLocator()
formatter = mdates.DateFormatter('%d')
ax0.xaxis.set_major_locator(locator)
ax0.xaxis.set_major_formatter(formatter)
# Set X-axis minor tick marks at noon of each day
# (i.e. split up the major interval into 2 bins)
ax0.xaxis.set_minor_locator(mticker.AutoMinorLocator(2))
# Don't rotate the X-axis jtick labels
ax0.xaxis.set_tick_params(rotation=0)
# Center the X-axis tick labels
for tick in ax0.xaxis.get_major_ticks():
tick.label1.set_horizontalalignment('center')
# Set X-axis and Y-axis labels
ax0.set_xlabel('Day of July (and August) 2016')
# -----------------------------
# Save this page to PDF
# -----------------------------
pdf.savefig(figs)
plt.close(figs)
# ----------------------------------------------------------------------
# Save the PDF file to disk
# ----------------------------------------------------------------------
pdf.close()
def main():
'''
Main program.
'''
# Path where the data files live
# (YOU MUST EDIT THIS FOR YUR OWN PARTICULAR APPLICATION!)
path_to_data = '/path/to/GEOS-Chem/diagnostic/data/files'
# Get a list of files in the ConcAboveSfc and SpeciesConc collections
# (YOU CAN EDIT THIS FOR YOUR OWN PARTICULAR APPLICATION!)
collections = ['ConcAboveSfc', 'SpeciesConc']
# Read GEOS-Chem data into an xarray Dataset
ds = read_geoschem_data(path_to_data, collections)
# Plot timeseries data at Centerville, AL (32.94N, 87.18W)
# (YOU CAN EDIT THIS FOR YOUR OWN PARTICULAR APPLICATION!)
site_coords = (32.94, -87.18)
plot_timeseries_data(ds, site_coords)
if __name__ == "__main__":
main()
Convert BPCH to NetCDF¶
#!/usr/bin/env python
'''
Example script that illustrates how to create a netCDF file
from an old GEOS-Chem binary punch ("bpch") file.
'''
# Imports
import gcpy
import xarray as xr
import xbpch as xb
import warnings
# Suppress harmless run-time warnings (mostly about underflow in division)
warnings.filterwarnings('ignore', category=RuntimeWarning)
warnings.filterwarnings('ignore', category=UserWarning)
# ----------------------------------------------------------------------
# User configurable settings (EDIT THESE ACCORDINGLY)
# ----------------------------------------------------------------------
# Name of Bpch file
bpchfile = '/path/to/bpch/file'
# tracerinfo.dat and diaginfo,dat fiels
tinfo_file = '/path/to/tracerinfo.dat'
dinfo_file = '/path/to/diaginfo.dat'
# Name of netCDF file
ncfile = '/path/to/netcdf/file'
# Date string for the time:units attribute
datestr = 'YYYY-MM-DD'
# Number of seconds in the diagnostic interval (assume 1-month)
interval = 86400.0 * 31.0
# ----------------------------------------------------------------------
# Open the bpch file and save it into an xarray Dataset object
# NOTE: For best results, also specify the corresponding
# tracerinfo.dat diaginfo.dat metadata files.
# ----------------------------------------------------------------------
try:
ds = xb.open_bpchdataset(filename=bpchfile,
tracerinfo_file=tinfo_file,
diaginfo_file=dinfo_file)
except FileNotFoundError:
print('Could not find file {}'.format(bpchfile))
raise
# ----------------------------------------------------------------------
# Further manipulate the Dataset
# ----------------------------------------------------------------------
# Transpose the order of the xarray Dataset object read by
# xbpch so that its dimensions will be in the same order as
# Dataset objects read from netCDF files.
ds = ds.transpose()
# Convert the bpch variable names to the same naming
# convention as the netCDF ("History") diagnostics.
ds = gcpy.convert_bpch_names_to_netcdf_names(ds)
# xbpch does not include a time dimension, so we'll add one here
coords = ds.coords
coords['time'] = 0.0
# ------------------------------------------------------------------
# Further edit variable attributes
# ------------------------------------------------------------------
for v in ds.data_vars.keys():
# Append time to the data array
ds[v] = xr.concat([ds[v]], 'time')
# Add long_name attribute for COARDS netCDF compliance
ds[v].attrs['long_name'] = ds[v].attrs['full_name']
# Remove some extraneous attributes that xbpch sets
del ds[v].attrs['name']
del ds[v].attrs['full_name']
del ds[v].attrs['scale_factor']
del ds[v].attrs['hydrocarbon']
del ds[v].attrs['tracer']
del ds[v].attrs['category']
del ds[v].attrs['chemical']
del ds[v].attrs['original_shape']
del ds[v].attrs['origin']
del ds[v].attrs['number']
del ds[v].attrs['molwt']
del ds[v].attrs['C']
# Make the units attribute consistent with the units
# attribute from the GEOS-Chem History diagnostics
# NOTE: There probably is a more Pythonic way to code
# this, but this will work for sure.
if 'ug/m3' in ds[v].units:
ds[v].attrs['units'] = 'ug m-3'
if 'ug Celsius/m3' in ds[v].units:
ds[v].attrs['units'] = 'ug C m-3'
if 'count/cm3' in ds[v].units:
ds[v].attrs['units'] = 'molec m-3'
if 'cm/s' in ds[v].units:
ds[v].attrs['units'] = 'cm s-1'
if 'count/cm2/s' in ds[v].units:
ds[v].attrs['units'] = 'molec cm-2 s-1'
if 'kg/m2s' in ds[v].units:
ds[v].attrs['units'] = 'kg m-2 s-1'
if 'kg/m2/s' in ds[v].units:
ds[v].attrs['units'] = 'kg m-2 s-1'
if 'kg/s' in ds[v].units:
ds[v].attrs['units'] = 'kg s-1'
if 'W/m2' in ds[v].units:
ds[v].attrs['units'] = 'W m-2'
if 'm/s' in ds[v].units:
ds[v].attrs['units'] = 'm s-1'
if 'Pa/s' in ds[v].units:
ds[v].attrs['units'] = 'Pa s-1'
if 'g/kg' in ds[v].units:
ds[v].attrs['units'] = 'g kg-1'
if v.strip() == 'TotalOC':
ds[v].attrs['units'] = 'ug m-3'
if v.strip() in [ 'HO2concAfterChem']:
ds[v].attrs['units'] = 'ppb'
if v.strip() in ['O1DconcAfterChem',
'O3PconcAfterChem',
'OHconcAfterChem']:
ds[v].attrs['units'] = 'molec cm-3'
if v.strip() in ['Loss_CO', 'Prod_CO',
'Loss_Ox', 'Prod_Ox', 'Prod_SO4']:
ds[v].attrs['units'] = 'molec/cm3/s'
if v.strip() in 'Met_CLDTOPS':
ds[v].attrs['units'] = 'level'
if v.strip() in 'Met_PHIS':
ds[v].attrs['units'] = 'm2 s-1'
if v.strip() in ['Met_PRECCON', 'Met_PRECTOT']:
ds[v].attrs['units'] = 'kg m-2 s-1'
if v.strip() in 'Met_AVGW':
ds[v].attrs['units'] = 'vol vol-1'
if v.strip() in 'Met_AIRNUMDEN':
ds[v].attrs['units'] = 'molec cm-3'
if v.strip() in ['ProdCOfromCH4', 'ProdCOfromNMVOC']:
ds[v].attrs['units'] = 'molec cm-3 s-1'
# Convert these prodloss diagnostics from kg (bpch) to kg/s
# to be consistent with the GEOS-Chem History diagnostics
# NOTE: Assume a 1-month interval (
if v.strip() in ['ProdSO4fromH2O2inCloud', 'ProdSO4fromO3inCloud',
'ProdSO4fromO2inCloudMetal', 'ProdSO4fromO3inSeaSalt',
'ProdSO4fromHOBrInCloud', 'ProdSO4fromSRO3',
'ProdSO4fromSRHObr', 'ProdSO4fromO3s']:
ds[v].attrs['units'] = 'kg S s-1'
ds[v] = ds[v] / interval
if v.strip() in ['LossHNO3onSeaSalt']:
ds[v].attrs['units'] = 'kg s-1'
ds[v] = ds[v] / interval
# ------------------------------------------------------------------
# Edit attributes for coordinate dimensions
# ------------------------------------------------------------------
# Time
ds['time'].attrs['long_name'] = 'time'
ds['time'].attrs['units'] = \
'hours since {} 00:00:00.00 UTC'.format(datestr)
ds['time'].attrs['calendar'] = 'standard'
ds['time'].attrs['axis'] = 'T'
# "lon", "lat", "lev"
ds['lon'].attrs['axis'] = 'X'
ds['lat'].attrs['axis'] = 'Y'
ds['lev'].attrs['axis'] = 'Z'
ds['lev'].attrs['units'] = 'level'
# Global title
ds.attrs['title'] = 'Created by bpch2nc.py'
ds.attrs['conventions'] = 'COARDS'
ds.attrs['references'] = 'www.geos-chem.org; wiki.geos-chem.org'
# ------------------------------------------------------------------
# Create the netCDF file
# ------------------------------------------------------------------
ds.to_netcdf(ncfile)
Report a Problem or Request a Feature¶
If you encounter an error when using GCPy or if any documentation is unclear, you should open a new issue on the GCPy Github page. Pre-defined templates exist for asking a question or reporting a bug / issue.
We are open to adding new functionality to GCPy as requested by its userbase. Some requested functionality may be better suited to example scripts rather than direct code additions to GCPy. In that case, we can add examples to the Example Scripts section of this ReadTheDocs site.
Contribute to GCPy¶
We welcome new code additions to GCPy in the form of pull requests. If you have an example you
would like to add to this ReadTheDocs site, you can add it to the
examples
folder in the GCPy repository and submit a pull
request with this added file. If you would like to suggest
changes to the documentation on this site, you can do so by
describing your changes in a Github issue or by directly editing
the source ReST files included in the GCPy repository and
submitting a pull request with your changes.
We do not currently have an automated testing pipeline operational for
GCPy. We ask that you test any changes by plotting / tabling relevant
diagnostics using the run_benchmark.py
plotting scripts
included in the benchmark
folder of the repository, then
verifying your results against the results of the same script using an
unchanged version of GCPy. Any further testing before finalizing your
pull request is greatly appreciated.
Editing these docs¶
This documentation is generated with Sphinx. This page describes how to contribute to the GCPy documentation.
Quick start¶
You need the Sphinx Python to build (and therefore edit) this documentation. Assuming you already have Python installed, install Sphinx:
$ pip install sphinx
To build the documentation, navigate to gcpy/docs
and make the html target:
gcuser:~$ cd gcpy/docs
gcuser:~/gcpy/docs$ make html
This will generate the HTML documentation in gcpy/docs/build/html
from the reST files in
gcpy/docs/source
. You can view this local HTML documentation by opening
index.html
in your web-browser.
Note
You can clean the documentation with make clean
.
Learning reST¶
Writing reST can be a bit tricky at first. Whitespace matters (just like in Python), and some directives can be easily miswritten. Two important things you should know right away are:
Indents are 3-spaces
“Things” are separated by 1 blank line (e.g., a list or code-block following a paragraph should be separated from the paragraph by 1 blank line)
You should keep these in mind when you’re first getting started. Dedicating an hour to learning reST will save you time in the long-run. Below are some good resources for learning reST.
reStructuredText primer: (single best resource; however, it’s better read than skimmed)
Official reStructuredText reference (there is a lot of information here)
Presentation by Eric Holscher (co-founder of Read The Docs) at DjangoCon US 2015 (the entire presentation is good, but reST is described from 9:03 to 21:04)
A good starting point would be Eric Holscher’s presentations followed by reading the reStructuredText primer.
Style guidelines¶
Important
This documentation is written in semantic markup. This is important so that the documentation remains maintainable by the GEOS-Chem Support Team. Before contributing to this documentation, please review our style guidelines. When editing the documentation, please refrain from using elements with the wrong semantic meaning for aesthetic reasons. Aesthetic issues should be addressed by changes to the theme (not changes to reST files).
For titles and headers:
H1 titles should be underlined by
#
charactersH2 headers should be underlined by
-
charactersH3 headers should be underlined by
^
charactersH4 headers should be avoided, but if necessary, they should be underlined by
"
characters
File paths occuring in the text should use the :literal:
role.
Inline code, or references to variables in code, occuring in the text should use the :code:
role.
Code snippets should use .. code-block:: <language>
directive like so
.. code-block:: python
import gcpy
print("hello world")
The language can be “none” to omit syntax highlighting.
For command line instructions, the “console” language should be used. The $
should be used
to denote the console’s prompt. If the current working directory is relevant to the instructions,
a prompt like gcuser:~/path1/path2$
should be used.
Inline literals (such as the $
above) should use the :literal:
role.
Releasing new versions¶
This page describes some of the steps required for releasing new versions of GCPy on Github, PyPi, and conda-forge.
For clarity, update version numbers to the new release in the following locations:
setup.py
_version.py
docs/source/conf.py
benchmark/run_benchmark.py
benchmark/modules/run_1yr_fullchem_benchmark.py
benchmark/modules/run_1yr_tt_benchmark.py
Update
CHANGELOG.md
Merge dev into main
Publish the release on Github.
Install
twine
usingpip install twine
(if you haven’t done this before).
To package GCPy for publication to PyPi, run the following from the root of your local GCPy repository:
$ run python setup.py sdist bdist_wheel $ run twine check dist/* $ run twine upload --repository-url https://test.pypi.org/legacy/ dist/*
Enter your login credentials for
test.pypi.org
as requested. Publishing to test.pypi ensures there are no issues with packaging the new release before publication to the primary PyPi database.
Publish to PyPi by running
run twine upload dist/*
, and enter your login information for pypi.org as requested.
Verify the new release is visible at https://pypi.org/project/geoschem-gcpy/ (may take a few minutes).
After a period of time (around an hour), you will be notified of a new PR at https://github.com/conda-forge/geoschem-gcpy-feedstock indicating conda-forge has detected a new release on PyPi. You should be able to merge this PR without any additinal interference once all checks have passed.
Once the feedstock PR has been merged and after another period of waiting, you should see builds for the new release when running
conda search -f geoschem-gcpy
. This indicates the new version is publicly available for installation through conda-forge.