gcpy.util

Internal utilities for managing the xarray and numpy objects used throughout GCPy.

Functions

add_bookmarks_to_pdf(pdfname, varlist[, ...])

Adds bookmarks to an existing PDF file.

add_missing_variables(refdata, devdata[, ...])

Compares two xarray Datasets, "Ref" and "Dev".

add_nested_bookmarks_to_pdf(pdfname, ...[, ...])

Add nested bookmarks to PDF.

all_zero_or_nan(dset)

Return whether dset is all zeros, or all NaNs.

array_equals(refdata, devdata[, dtype])

Tests two arrays for equality.

check_for_area(dset[, gcc_area_name, ...])

Makes sure that a dataset has a surface area variable contained within it.

compare_stats(refdata, refstr, devdata, ...)

Prints out global statistics (array sizes, mean, min, max, sum) from two xarray Dataset objects.

compare_varnames(refdata, devdata[, ...])

Finds variables that are common to two xarray Dataset objects.

convert_bpch_names_to_netcdf_names(dset[, ...])

Function to convert the non-standard bpch diagnostic names to names used in the GEOS-Chem netCDF diagnostic outputs.

convert_lon(data[, dim, fmt, neg_dateline])

Convert longitudes from -180..180 to 0..360, or vice-versa.

copy_file_to_dir(ifile, dest)

Convenience wrapper for shutil.copyfile, used to copy a file to a directory.

create_blank_dataarray(name, sizes, coords, ...)

Given an xarray DataArray dr, returns a DataArray object with the same dimensions, coordinates, attributes, and name, but with its data set to missing values (default=NaN) everywhere.

create_display_name(diagnostic_name)

Converts a diagnostic name to a more easily digestible name that can be used as a plot title or in a table of totals.

dataset_mean(dset[, dim, skipna])

Convenience wrapper for taking the mean of an xarray Dataset.

dataset_reader(multi_files[, verbose])

Returns a function to read an xarray Dataset.

dict_diff(dict0, dict1)

Function to take the difference of two dict objects.

divide_dataset_by_dataarray(dset, darr[, ...])

Divides variables in an xarray Dataset object by a single DataArray object.

extract_pathnames_from_log(filename[, ...])

Returns a list of pathnames from a GEOS-Chem log file.

filter_names(names[, text])

Returns elements in a list that match a given substring.

format_number_for_table(number[, ...])

Returns a format string for use in the "print_totals" routine.

get_area_from_dataset(dset)

Convenience routine to return the area variable (which is usually called "AREA" for GEOS-Chem "Classic" or "Met_AREAM2" for GCHP) from an xarray Dataset object.

get_diff_of_diffs(ref, dev)

Generate datasets containing differences between two datasets.

get_element_of_series(series, element)

Returns a specified element of a pd.Series object.

get_emissions_varnames(commonvars[, template])

Returns a list of emissions diagnostic variable names that contain a particular search string.

get_filepath(datadir, col, date[, is_gchp, ...])

Routine to return file path for a given GEOS-Chem "Classic" (aka "GCC") or GCHP diagnostic collection and date.

get_filepaths(datadir, collections, dates[, ...])

Routine to return filepaths for a given GEOS-Chem "Classic" (aka "GCC") or GCHP diagnostic collection.

get_gcc_filepath(outputdir, collection, day, ...)

Routine for getting filepath of GEOS-Chem Classic output.

get_gchp_filepath(outputdir, collection, ...)

Routine for getting filepath of GCHP output.

get_molwt_from_metadata(metadata, spc_name)

Extracts molecular weight [g/mol] from a dictionary containing species metadata.

get_nan_mask(data)

Create a mask with NaN values removed from an input array.

get_shape_of_data(data[, vertical_dim, ...])

Convenience routine to return the shape (and dimensions, if requested) of an xarray Dataset, or xarray DataArray.

get_variables_from_dataset(dset, varlist)

Convenience routine to return multiple selected DataArray variables from an xarray Dataset.

insert_text_into_file(filename, search_text, ...)

Convenience routine to insert text into a file.

make_directory(dir_name, overwrite)

Creates a directory where benchmark plots/tables will be placed.

print_totals(ref, dev, ofile, diff_list[, masks])

Computes and prints Ref and Dev totals (as well as the difference Dev - Ref) for two xarray DataArray objects.

read_config_file(config_file[, quiet])

Reads configuration information from a YAML file.

read_species_metadata(files[, quiet])

Reads species metadata from multiple files and returns a dict containing metadata for the union of species.

rename_and_flip_gchp_rst_vars(dset)

Transforms a GCHP restart dataset to match GCClassic names and level conventions.

replace_whitespace(string[, repl_char])

Replaces whitespace in a string with a replacement character (an underscore by default).

reshape_MAPL_CS(darr)

Reshapes data if its dimensions indicate MAPL v1.0.0+ output (i.e. reshapes from "diagnostic" to "checkpoint" dimension format).

slice_by_lev_and_time(dset, varname, itime, ...)

Given a Dataset, returns a DataArray sliced by desired time and level.

trim_cloud_benchmark_label(label)

Removes the first part of the cloud benchmark label string (e.g. "gchp-c24-1Hr", "gcc-4x5-1Mon", etc.) to avoid clutter.

unique_values(this_list[, drop])

Given a list, returns a sorted list of unique values.

verify_variable_type(var, var_type)

Convenience routine that will raise a TypeError if a variable's type does not match a list of expected types.

wrap_text(text[, width])

Wraps text so that it fits within a certain line width.

gcpy.util.convert_lon(data, dim='lon', fmt='atlantic', neg_dateline=True)[source]

Convert longitudes from -180..180 to 0..360, or vice-versa.

Parameters:
  • data (xarray DataArray or Dataset) – The container holding the data to be converted; the dimension indicated by ‘dim’ must be associated with this container.

  • dim (str, optional) – Name of dimension holding the longitude coordinates. Default value: ‘lon’

  • fmt (str, optional) – Selects the direction of the shift: ‘pacific’ shifts from -180..180 to 0..360, and ‘atlantic’ shifts from 0..360 to -180..180. Default value: ‘atlantic’

  • neg_dateline (bool, optional) – If True, then the international dateline is set to -180 instead of 180. Default value: True

Returns:

data – Data with dimension ‘dim’ altered according to conversion rule.

Return type:

xarray DataArray or Dataset
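
The conversion rule can be illustrated on plain Python floats. This is a minimal standalone sketch of the arithmetic only, not GCPy's implementation: the helper name shift_longitudes is hypothetical, and the way neg_dateline is applied is an assumption based on the parameter description above.

```python
def shift_longitudes(lons, fmt="atlantic", neg_dateline=True):
    """Shift longitudes between the -180..180 and 0..360 conventions."""
    shifted = []
    for lon in lons:
        if fmt == "pacific":
            # Map the value into the half-open range [0, 360)
            value = lon % 360.0
        elif fmt == "atlantic":
            # Map the value into the half-open range [-180, 180)
            value = ((lon + 180.0) % 360.0) - 180.0
            # Assumption: neg_dateline=False places the dateline at +180
            if not neg_dateline and value == -180.0:
                value = 180.0
        else:
            raise ValueError(f"Unknown fmt: {fmt!r}")
        shifted.append(value)
    return shifted


atlantic = shift_longitudes([0.0, 90.0, 180.0, 270.0], fmt="atlantic")
pacific = shift_longitudes([-180.0, -90.0, 0.0, 90.0], fmt="pacific")
```

With real data, the same shift would be applied to the coordinate named by dim rather than to a list.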

gcpy.util.get_emissions_varnames(commonvars, template=None)[source]

Returns a list of emissions diagnostic variable names that contain a particular search string.

Parameters:
  • commonvars (list of str) – A list of common variable names from two data sets. (This can be obtained with method gcpy.util.compare_varnames)

  • template (str, optional) – String template for matching variable names corresponding to emission diagnostics by sector. Default value: None

Returns:

varnames – A list of variable names corresponding to emission diagnostics for a given species and sector.

Return type:

list of str

gcpy.util.create_display_name(diagnostic_name)[source]

Converts a diagnostic name to a more easily digestible name that can be used as a plot title or in a table of totals.

Parameters:

diagnostic_name (str) – Name of the diagnostic to be formatted.

Returns:

display_name – Formatted name that can be used as plot titles or in tables of emissions totals.

Return type:

str

Notes

Assumes that diagnostic names will start with either “Emis” (for emissions by category) or “Inv” (for emissions by inventory). This should be an OK assumption to make since this routine is specifically geared towards model benchmarking.

gcpy.util.format_number_for_table(number, max_thresh=100000000.0, min_thresh=1e-06, f_fmt='18.6f', e_fmt='18.8e')[source]

Returns a format string for use in the “print_totals” routine. If the number is greater than a maximum threshold or smaller than a minimum threshold, then use scientific notation format. Otherwise use floating-point format.

Special case: do not convert 0.0 to exponential notation.

Parameters:
  • number (float) – Number to be printed.

  • max_thresh (float, optional) – If |number| > max_thresh, use scientific notation. Default value: 1.0e8

  • min_thresh (float, optional) – If |number| < min_thresh, use scientific notation. Default value: 1.0e-6

  • f_fmt (str, optional) – The default floating point format string. Default value: ‘18.6f’

  • e_fmt (str, optional) – The default scientific notation format string. Default value: ‘18.8e’

Returns:

fmt_str – Formatted string that can be inserted into the print statement in print_totals.

Return type:

str
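
The threshold rule described above can be sketched in a few lines. This is a standalone illustration written from the description, not the GCPy source: the name format_number_sketch is hypothetical, and it assumes the routine returns the number already rendered with the chosen format.

```python
def format_number_sketch(number, max_thresh=1.0e8, min_thresh=1.0e-6,
                         f_fmt="18.6f", e_fmt="18.8e"):
    """Render a number in fixed-point or scientific notation."""
    abs_number = abs(number)
    # Very large or very small magnitudes use scientific notation.
    # The strict "0.0 <" test keeps exact zeros in fixed-point format.
    use_exponent = abs_number > max_thresh or 0.0 < abs_number < min_thresh
    return format(number, e_fmt if use_exponent else f_fmt)


zero = format_number_sketch(0.0)
large = format_number_sketch(1.5e9)
small = format_number_sketch(2.0e-7)
typical = format_number_sketch(123.456)
```

Both default formats are 18 characters wide, so the columns of a table stay aligned whichever notation is chosen.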

gcpy.util.print_totals(ref, dev, ofile, diff_list, masks=None)[source]

Computes and prints Ref and Dev totals (as well as the difference Dev - Ref) for two xarray DataArray objects.

Parameters:
  • ref (xarray DataArray) – The first DataArray to be compared (aka “Reference”).

  • dev (xarray DataArray) – The second DataArray to be compared (aka “Development”).

  • ofile (file) – File object denoting a text file where output will be directed.

  • diff_list (list) – List to which species names with nonzero differences will be appended.

  • masks (dict of xarray DataArray, optional) – Dictionary containing the tropospheric mask arrays for Ref and Dev. If this keyword argument is passed, then print_totals will print tropospheric totals. Default value: None (i.e. print whole-atmosphere totals)

Returns:

diff_list – Updated list with species names appended where differences were found.

Return type:

list

Notes

This is an internal method. It is meant to be called from method create_total_emissions_table or create_global_mass_table instead of being called directly.

gcpy.util.add_bookmarks_to_pdf(pdfname, varlist, remove_prefix='', verbose=False)[source]

Adds bookmarks to an existing PDF file.

Parameters:
  • pdfname (str) – Name of an existing PDF file of species or emission plots to which bookmarks will be attached.

  • varlist (list) – List of variables, which will be used to create the PDF bookmark names.

  • remove_prefix (str, optional) – Specifies a prefix to remove from each entry in varlist when creating bookmarks. For example, if varlist contains the variable name “SpeciesConcVV_NO” and you specify remove_prefix=“SpeciesConcVV_”, then the bookmark for that variable will be just “NO”. Default value: ‘’

  • verbose (bool, optional) – Set this flag to True to print extra informational output. Default value: False

gcpy.util.add_nested_bookmarks_to_pdf(pdfname, category, catdict, warninglist, remove_prefix='')[source]

Add nested bookmarks to PDF.

Parameters:
  • pdfname (str) – Path of PDF to add bookmarks to.

  • category (str) – Top-level key name in catdict that maps to contents of PDF.

  • catdict (dict) – Dictionary containing key-value pairs where one top-level key matches category and has value fully describing pages in PDF. The value is a dictionary where keys are level 1 bookmark names, and values are lists of level 2 bookmark names, with one level 2 name per PDF page. Level 2 names must appear in catdict in the same order as in the PDF.

  • warninglist (list of str) – Level 2 bookmark names to skip since not present in PDF.

  • remove_prefix (str, optional) – Prefix to be removed from warninglist names before comparing with level 2 bookmark names in catdict. Default value: empty string (warninglist names match names in catdict)
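
The nesting expected in catdict may be easier to see in a small example. The category and bookmark names below are hypothetical placeholders chosen for illustration; only the shape of the dictionary follows the description above.

```python
# Top-level key that is passed as the "category" argument
category = "Primary"

# Level 1 bookmark names map to lists of level 2 bookmark names
catdict = {
    "Primary": {
        "Nitrogen": ["NO", "NO2"],
        "Ozone": ["O3"],
    },
}

# Level 2 names that are known to be absent from the PDF (none here)
warninglist = []

# One level 2 name per PDF page, listed in page order
page_names = [
    name
    for level2_names in catdict[category].values()
    for name in level2_names
    if name not in warninglist
]
```

In this sketch the PDF would have three pages, with "NO" and "NO2" nested under "Nitrogen" and "O3" nested under "Ozone".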

gcpy.util.add_missing_variables(refdata, devdata, verbose=False, **kwargs)[source]

Compares two xarray Datasets, “Ref” and “Dev”. For each variable that is present in “Ref” but not in “Dev”, a DataArray of missing values (i.e. NaN) will be added to “Dev”. Similarly, for each variable that is present in “Dev” but not in “Ref”, a DataArray of missing values will be added to “Ref”. This routine is mostly intended for benchmark purposes, so that we can represent variables that were removed from a new GEOS-Chem version by missing values in the benchmark plots. NOTE: This function assumes that the incoming datasets have the same sizes and dimensions, which is not true when comparing datasets with different grid resolutions or types.

Parameters:
  • refdata (xarray Dataset) – The “Reference” (aka “Ref”) dataset.

  • devdata (xarray Dataset) – The “Development” (aka “Dev”) dataset.

  • verbose (bool, optional) – Toggles extra debug print output. Default value: False

Returns:

  • refdata (xarray Dataset) – The returned “Ref” dataset, with placeholder missing value variables added.

  • devdata (xarray Dataset) – The returned “Dev” dataset, with placeholder missing value variables added.

gcpy.util.reshape_MAPL_CS(darr)[source]

Reshapes data if its dimensions indicate MAPL v1.0.0+ output (i.e. reshapes from “diagnostic” to “checkpoint” dimension format).

Parameters:

darr (xarray DataArray) – The input data array.

Returns:

darr – The modified data array (w/ dimensions renamed & transposed).

Return type:

xarray DataArray

Notes

Currently only used for GCPy plotting code.

gcpy.util.get_diff_of_diffs(ref, dev)[source]

Generate datasets containing differences between two datasets.

Parameters:
  • ref (xarray Dataset) – The Ref (aka “Reference”) dataset.

  • dev (xarray Dataset) – The Dev (aka “Development”) dataset.

Returns:

  • absdiffs (xarray Dataset) – Absolute differences (Dev - Ref).

  • fracdiffs (xarray Dataset) – Fractional differences (Dev / Ref).

gcpy.util.slice_by_lev_and_time(dset, varname, itime, ilev, flip)[source]

Given a Dataset, returns a DataArray sliced by desired time and level.

Parameters:
  • dset (xarray Dataset) – Dataset containing GEOS-Chem data.

  • varname (str) – Variable name for data variable to be sliced.

  • itime (int) – Index of time by which to slice.

  • ilev (int) – Index of level by which to slice.

  • flip (bool) – Whether to flip ilev to be indexed from ground or top of atmosphere.

Returns:

darr – DataArray of data variable sliced according to ilev and itime.

Return type:

xarray DataArray

gcpy.util.rename_and_flip_gchp_rst_vars(dset)[source]

Transforms a GCHP restart dataset to match GCClassic names and level conventions.

Parameters:

dset (xarray Dataset) – The input dataset.

Returns:

dset – If the input dataset is from a GCHP restart file, then dset will contain the original data with variables renamed to match the GEOS-Chem Classic naming conventions, and with levels indexed as lev:positive=“up”. Otherwise, the original data will be returned.

Return type:

xarray Dataset

gcpy.util.dict_diff(dict0, dict1)[source]

Function to take the difference of two dict objects. Assumes that both objects have the same keys.

Parameters:
  • dict0 (dict) – The first dictionary (to be subtracted from dict1).

  • dict1 (dict) – The second dictionary (dict1 - dict0).

Returns:

result – Key-by-key difference of dict1 - dict0.

Return type:

dict
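
A key-by-key difference of this kind can be sketched with a dict comprehension. This is an illustrative stand-in rather than the GCPy source; the name dict_diff_sketch and the sample values are hypothetical.

```python
def dict_diff_sketch(dict0, dict1):
    """Return dict1 - dict0 for every key (both dicts share the same keys)."""
    return {key: dict1[key] - dict0[key] for key in dict0}


ref_totals = {"NO": 1.0, "O3": 5.0}
dev_totals = {"NO": 4.0, "O3": 5.0}
result = dict_diff_sketch(ref_totals, dev_totals)
```

Because the sketch iterates over the keys of dict0, a key missing from dict1 raises a KeyError, which is consistent with the stated assumption that both dictionaries have the same keys.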

gcpy.util.compare_varnames(refdata, devdata, refonly=None, devonly=None, quiet=False)[source]

Finds variables that are common to two xarray Dataset objects.

Parameters:
  • refdata (xarray Dataset) – The first Dataset to be compared. (This is often referred to as the “Reference” Dataset.)

  • devdata (xarray Dataset) – The second Dataset to be compared. (This is often referred to as the “Development” Dataset.)

  • quiet (bool, optional) – Set this flag to True if you wish to suppress printing informational output to stdout. Default value: False

Returns:

vardict – Dictionary containing several lists of variable names:

  • commonvars (list of str) – List of variables that are common to both refdata and devdata.

  • commonvarsOther (list of str) – List of variables that are common to both refdata and devdata, but do not have lat, lon, and/or level dimensions (e.g. index variables).

  • commonvars2D (list of str) – List of variables that are common to both refdata and devdata, and that have lat and lon dimensions, but not level.

  • commonvars3D (list of str) – List of variables that are common to refdata and devdata, and that have lat, lon, and level dimensions.

  • commonvarsData (list of str) – List of all common 2D or 3D data variables, excluding index variables. This is the list of “plottable” variables.

  • refonly (list of str) – List of 2D or 3D variables that are only present in refdata.

  • devonly (list of str) – List of 2D or 3D variables that are only present in devdata.

Return type:

dict of lists of str

gcpy.util.compare_stats(refdata, refstr, devdata, devstr, varname)[source]

Prints out global statistics (array sizes, mean, min, max, sum) from two xarray Dataset objects.

Parameters:
  • refdata (xarray Dataset) – The first Dataset to be compared. (This is often referred to as the “Reference” Dataset.)

  • refstr (str) – Label for refdata to be used in the printout.

  • devdata (xarray Dataset) – The second Dataset to be compared. (This is often referred to as the “Development” Dataset.)

  • devstr (str) – Label for devdata to be used in the printout.

  • varname (str) – Variable name for which global statistics will be printed out.

gcpy.util.convert_bpch_names_to_netcdf_names(dset, verbose=False)[source]

Function to convert the non-standard bpch diagnostic names to names used in the GEOS-Chem netCDF diagnostic outputs.

Parameters:
  • dset (xarray Dataset) – The xarray Dataset object whose names are to be replaced.

  • verbose (bool, optional) – Set this flag to True to print informational output. Default value: False

Returns:

ds_new – A new xarray Dataset object with all of the bpch-style diagnostic names replaced by GEOS-Chem netCDF names.

Return type:

xarray Dataset

Notes

To add more diagnostic names, edit the dictionary contained in the bpch_to_nc_names.yml.

gcpy.util.filter_names(names, text='')[source]

Returns elements in a list that match a given substring. Can be used in conjunction with compare_varnames to return a subset of variable names pertaining to a given diagnostic type or species.

Parameters:
  • names (list of str) – Input list of names.

  • text (str, optional) – Target text string for restricting the search. Default value: ‘’

Returns:

filtered_names – Returns all elements of names that contain the substring specified by the ‘text’ argument. If ‘text’ is omitted, then the original contents of names will be returned.

Return type:

list of str
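
The substring match can be sketched as a list comprehension. This is an illustrative stand-in, not the GCPy source; the name filter_names_sketch and the sample variable names are hypothetical.

```python
def filter_names_sketch(names, text=""):
    """Return the elements of names that contain the substring text."""
    # An empty string is contained in every string, so the default
    # text="" returns the original contents unchanged.
    return [name for name in names if text in name]


names = ["EmisCO_Anthro", "EmisCO_BioBurn", "EmisNO_Anthro"]
co_names = filter_names_sketch(names, text="EmisCO")
all_names = filter_names_sketch(names)
```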

gcpy.util.divide_dataset_by_dataarray(dset, darr, varlist=None)[source]

Divides variables in an xarray Dataset object by a single DataArray object. Will also make sure that the Dataset variable attributes are preserved. This method can be useful for certain types of model diagnostics that have to be divided by a counter array. For example, local noontime J-value variables in a Dataset can be divided by the fraction of time it was local noon in each grid box, etc.

Parameters:
  • dset (xarray Dataset) – The Dataset object containing variables to be divided.

  • darr (xarray DataArray) – The DataArray object that will be used to divide the variables of dset.

  • varlist (list of str, optional) – If passed, then only those variables of dset that are listed in varlist will be divided by darr. Otherwise, all variables of dset will be divided by darr. Default value: None

Returns:

dset – A new xarray Dataset object with its variables divided by darr.

Return type:

xarray Dataset

gcpy.util.get_shape_of_data(data, vertical_dim='lev', return_dims=False)[source]

Convenience routine to return the shape (and dimensions, if requested) of an xarray Dataset, or xarray DataArray. Can also take as input a dictionary of sizes (e.g. {‘time’: 1, ‘lev’: 72, …}) from an xarray Dataset or xarray DataArray object.

Parameters:
  • data (xarray Dataset, xarray DataArray, or dict) – The data for which the size is requested.

  • vertical_dim (str, optional) – Specify the vertical dimension that you wish to return: ‘lev’ or ‘ilev’. Default value: ‘lev’

  • return_dims (bool, optional) – Set this switch to True if you also wish to return a list of dimensions in the same order as the tuple of dimension sizes. Default value: False

Returns:

  • shape (tuple of int) – Tuple containing the sizes of each dimension of data in order: (time, lev|ilev, nf, lat|Ydim, lon|Xdim).

  • dims (list of str) – List of dimension names in the same order as shape ([‘time’, ‘lev’, ‘lat’, ‘lon’] for GEOS-Chem “Classic”, or [‘time’, ‘lev’, ‘nf’, ‘Ydim’, ‘Xdim’] for GCHP). Only returned if return_dims is True.
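
For the dictionary-of-sizes input, the ordering rule can be sketched without xarray. This is a standalone illustration of the documented ordering, not the GCPy source; the name shape_from_sizes and the sample grid sizes are hypothetical.

```python
def shape_from_sizes(sizes, vertical_dim="lev", return_dims=False):
    """Order dimension sizes as (time, lev|ilev, nf, lat|Ydim, lon|Xdim)."""
    # Candidate names in the documented order; absent ones are skipped
    candidates = ["time", vertical_dim, "nf", "lat", "Ydim", "lon", "Xdim"]
    dims = [dim for dim in candidates if dim in sizes]
    shape = tuple(sizes[dim] for dim in dims)
    if return_dims:
        return shape, dims
    return shape


gcc_sizes = {"lon": 72, "lat": 46, "lev": 72, "time": 1}
gchp_sizes = {"Xdim": 24, "Ydim": 24, "nf": 6, "lev": 72, "time": 1}

gcc_shape = shape_from_sizes(gcc_sizes)
gchp_shape, gchp_dims = shape_from_sizes(gchp_sizes, return_dims=True)
```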

gcpy.util.get_area_from_dataset(dset)[source]

Convenience routine to return the area variable (which is usually called “AREA” for GEOS-Chem “Classic” or “Met_AREAM2” for GCHP) from an xarray Dataset object.

Parameters:

dset (xarray Dataset) – The input dataset.

Returns:

area_m2 – The surface area in m², as found in dset.

Return type:

xarray DataArray

gcpy.util.get_variables_from_dataset(dset, varlist)[source]

Convenience routine to return multiple selected DataArray variables from an xarray Dataset. All variables must be found in the Dataset, or else an error will be raised.

Parameters:
  • dset (xarray Dataset) – The input dataset.

  • varlist (list of str) – List of DataArray variables to extract from dset.

Returns:

dset_subset – A new dataset containing only the variables that were requested.

Return type:

xarray Dataset

Notes

Use this routine if you absolutely need all of the requested variables to be returned. Otherwise use standard Dataset indexing.

gcpy.util.create_blank_dataarray(name, sizes, coords, attrs, fill_value=nan, fill_type=<class 'numpy.float64'>, vertical_dim='lev')[source]

Given an xarray DataArray dr, returns a DataArray object with the same dimensions, coordinates, attributes, and name, but with its data set to missing values (default=NaN) everywhere. This is useful if you need to plot or compare two DataArray variables, and need to represent one as missing or undefined.

Parameters:
  • name (str) – The name for the DataArray object that will contain NaNs.

  • sizes (dict of int) – Dictionary of the dimension names and their sizes (e.g. {‘time’: 1, ‘lev’: 72, …}) that will be used to create the DataArray of NaNs. This can be obtained from an xarray Dataset as ds.sizes.

  • coords (dict of lists of float) – Dictionary containing the coordinate variables that will be used to create the DataArray of NaNs. This can be obtained from an xarray Dataset with ds.coords.

  • attrs (dict of str) – Dictionary containing the DataArray variable attributes (such as “units”, “long_name”, etc.). This can be obtained from an xarray DataArray with dr.attrs.

  • fill_value (float or NaN, optional) – Value with which the DataArray object will be filled. Default value: np.nan

  • fill_type (numeric type, optional) – Specifies the numeric type of the DataArray object. Default value: np.float64 (aka “double”)

  • vertical_dim (str, optional) – Specifies the name of the vertical dimension (e.g. “lev”, “ilev”). Default value: “lev”

Returns:

dr – The output DataArray object, which will be set to the value specified by the fill_value argument everywhere.

Return type:

xarray DataArray

gcpy.util.check_for_area(dset, gcc_area_name='AREA', gchp_area_name='Met_AREAM2')[source]

Makes sure that a dataset has a surface area variable contained within it. GEOS-Chem Classic files all contain surface area as variable AREA. GCHP files do not and area must be retrieved from the met-field collection from variable Met_AREAM2. To simplify comparisons, the GCHP area name will be appended to the dataset under the GEOS-Chem “Classic” area name if it is present.

Parameters:
  • dset (xarray Dataset) – The Dataset object that will be checked.

  • gcc_area_name (str, optional) – Specifies the name of the GEOS-Chem “Classic” surface area variable. Default value: “AREA”

  • gchp_area_name (str, optional) – Specifies the name of the GCHP surface area variable. Default value: “Met_AREAM2”

Returns:

dset – The modified Dataset object.

Return type:

xarray Dataset

gcpy.util.get_filepath(datadir, col, date, is_gchp=False, gchp_res='c00', gchp_is_pre_14_0=False)[source]

Routine to return file path for a given GEOS-Chem “Classic” (aka “GCC”) or GCHP diagnostic collection and date.

Parameters:
  • datadir (str) – Path name of the directory containing GCC or GCHP data files.

  • col (str) – Name of collection (e.g. Emissions, SpeciesConc, etc.) for which file path will be returned.

  • date (numpy.datetime64) – Date for which file paths are requested.

  • is_gchp (bool, optional) – Set this switch to True to obtain file pathnames to GCHP diagnostic data files. If False, assumes GEOS-Chem “Classic”. Default value: False

  • gchp_res (str, optional) – Cubed-sphere resolution of GCHP data grid. Only needed for restart files. Default value: “c00”

  • gchp_is_pre_14_0 (bool, optional) – Set this switch to True to obtain GCHP file pathnames used in versions before 14.0. Only needed for restart files. Default value: False

Returns:

path – Pathname for the specified collection and date.

Return type:

str

gcpy.util.get_filepaths(datadir, collections, dates, is_gchp=False, gchp_res='c00', gchp_is_pre_14_0=False)[source]

Routine to return filepaths for a given GEOS-Chem “Classic” (aka “GCC”) or GCHP diagnostic collection.

Parameters:
  • datadir (str) – Path name of the directory containing GCC or GCHP data files.

  • collections (list of str) – Names of collections (e.g. Emissions, SpeciesConc, etc.) for which file paths will be returned.

  • dates (array of numpy.datetime64) – Array of dates for which file paths are requested.

  • is_gchp (bool, optional) – Set this switch to True to obtain file pathnames to GCHP diagnostic data files. If False, assumes GEOS-Chem “Classic”. Default value: False

  • gchp_res (str, optional) – Cubed-sphere resolution of GCHP data grid. Only needed for restart files. Default value: “c00”

  • gchp_is_pre_14_0 (bool, optional) – Set this switch to True to obtain GCHP file pathnames used in versions before 14.0. Only needed for diagnostic files. Default value: False

Returns:

paths – A list of pathnames for each specified collection and date. First dimension is collection, and second is date.

Return type:

2D list of str

gcpy.util.extract_pathnames_from_log(filename, prefix_filter='')[source]

Returns a list of pathnames from a GEOS-Chem log file. This can be used to get a list of files that should be downloaded from gcgrid or from Amazon S3.

Parameters:
  • filename (str) – GEOS-Chem standard log file.

  • prefix_filter (str, optional) – Restricts the output to file paths starting with this prefix (e.g. “/home/ubuntu/ExtData/HEMCO/”). Default value: ‘’

Returns:

data_list – List of full pathnames of data files found in the log file.

Return type:

list of str

Notes

Author: Jiawei Zhuang (jiaweizhuang@g.harvard.edu)

gcpy.util.get_gcc_filepath(outputdir, collection, day, time)[source]

Routine for getting filepath of GEOS-Chem Classic output.

Parameters:
  • outputdir (str) – Path of the OutputDir directory.

  • collection (str) – Name of output collection, e.g. Emissions or SpeciesConc.

  • day (str) – Number day of output, e.g. 31.

  • time (str) – Z time of output, e.g. 1200z.

Returns:

filepath – Path of requested file.

Return type:

str

gcpy.util.get_gchp_filepath(outputdir, collection, day, time)[source]

Routine for getting filepath of GCHP output.

Parameters:
  • outputdir (str) – Path of the OutputDir directory.

  • collection (str) – Name of output collection, e.g. Emissions or SpeciesConc.

  • day (str) – Number day of output, e.g. 31.

  • time (str) – Z time of output, e.g. 1200z.

Returns:

filepath – Path of requested file.

Return type:

str

gcpy.util.get_nan_mask(data)[source]

Create a mask with NaN values removed from an input array.

Parameters:

data (numpy array) – Input array possibly containing NaNs.

Returns:

new_data – Original array with NaN values removed.

Return type:

numpy array

gcpy.util.all_zero_or_nan(dset)[source]

Return whether dset is all zeros, or all NaNs.

Parameters:

dset (numpy array) – Input GEOS-Chem data.

Returns:

  • all_zero (bool) – Whether dset is all zeros.

  • all_nan (bool) – Whether dset is all NaNs.

gcpy.util.dataset_mean(dset, dim='time', skipna=True)[source]

Convenience wrapper for taking the mean of an xarray Dataset.

Parameters:
  • dset (xarray Dataset or None) – Input data.

  • dim (str, optional) – Dimension over which the mean will be taken. Default value: “time”

  • skipna (bool, optional) – Flag to omit missing values from the mean. Default value: True

Returns:

ds_mean – Dataset containing mean values. Will return None if dset is not defined.

Return type:

xarray Dataset or None

gcpy.util.dataset_reader(multi_files, verbose=False)[source]

Returns a function to read an xarray Dataset.

Parameters:
  • multi_files (bool) – Denotes whether we will be reading multiple files into an xarray Dataset.

  • verbose (bool, optional) – Set this flag to True to print extra informational output. Default value: False

Returns:

reader – Either xr.open_mfdataset or xr.open_dataset.

Return type:

callable

gcpy.util.read_config_file(config_file, quiet=False)[source]

Reads configuration information from a YAML file.

gcpy.util.unique_values(this_list, drop=None)[source]

Given a list, returns a sorted list of unique values.

Parameters:
  • this_list (list) – Input list (may contain duplicate values).

  • drop (list of str, optional) – List of variable names to exclude. Default value: None

Returns:

unique – List of unique values from this_list.

Return type:

list
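
One way to obtain this behavior is with a set. The sketch below is written from the description only; the name unique_values_sketch is hypothetical, and treating drop as a list of values to discard before sorting is an assumption.

```python
def unique_values_sketch(this_list, drop=None):
    """Return the sorted unique values of this_list, minus any in drop."""
    unique = set(this_list)
    if drop is not None:
        # Assumption: entries of drop are removed when present
        unique -= set(drop)
    return sorted(unique)


values = unique_values_sketch(["lon", "lat", "lon", "time"], drop=["time"])
```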

gcpy.util.wrap_text(text, width=80)[source]

Wraps text so that it fits within a certain line width.

Parameters:
  • text (str or list of str) – Input text to be word-wrapped.

  • width (int, optional) – Line width, in characters. Default value: 80

Returns:

text – Original text reformatted so that it fits within lines of ‘width’ characters or less.

Return type:

str
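
Word-wrapping of this kind is available in the Python standard library. The sketch below uses textwrap.fill and assumes that a list of strings is joined with single spaces before wrapping; the name wrap_text_sketch is hypothetical, and this is not necessarily how GCPy implements the routine.

```python
import textwrap


def wrap_text_sketch(text, width=80):
    """Word-wrap text (a str or a list of str) to the given line width."""
    if isinstance(text, list):
        # Assumption: list entries are joined with single spaces
        text = " ".join(text)
    return textwrap.fill(text, width=width)


species = ["NO", "NO2", "O3", "CO"] * 10
wrapped = wrap_text_sketch(species, width=40)
line_lengths = [len(line) for line in wrapped.splitlines()]
```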

gcpy.util.insert_text_into_file(filename, search_text, replace_text, width=80)[source]

Convenience routine to insert text into a file. The best way to do this is to read the contents of the file, manipulate the text, and then overwrite the file.

Parameters:
  • filename (str) – The file with text to be replaced.

  • search_text (str) – Text string in the file that will be replaced.

  • replace_text (str or list of str) – Text that will replace ‘search_text’.

  • width (int, optional) – Will “word-wrap” the text in ‘replace_text’ to this width. Default value: 80
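
The read, manipulate, and overwrite pattern described above can be sketched as follows. This is a standalone illustration, not the GCPy source: the name insert_text_sketch, the placeholder string, and the joining of list entries with spaces are all assumptions.

```python
import os
import tempfile
import textwrap


def insert_text_sketch(filename, search_text, replace_text, width=80):
    """Replace search_text in a file with word-wrapped replace_text."""
    # Read the whole file into memory
    with open(filename, "r", encoding="utf-8") as ifile:
        contents = ifile.read()
    # Assumption: list entries are joined with single spaces
    if isinstance(replace_text, list):
        replace_text = " ".join(replace_text)
    wrapped = textwrap.fill(replace_text, width=width)
    # Overwrite the file with the modified text
    with open(filename, "w", encoding="utf-8") as ofile:
        ofile.write(contents.replace(search_text, wrapped))


# Demonstrate on a temporary file
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "table.txt")
    with open(path, "w", encoding="utf-8") as ofile:
        ofile.write("Header\n{PLACEHOLDER}\nFooter\n")
    insert_text_sketch(path, "{PLACEHOLDER}", ["NO", "O3"])
    with open(path, "r", encoding="utf-8") as ifile:
        result = ifile.read()
```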

gcpy.util.array_equals(refdata, devdata, dtype=<class 'numpy.float64'>)[source]

Tests two arrays for equality. Useful for checking which species have nonzero differences in benchmark output.

Parameters:
  • refdata (xarray DataArray or numpy ndarray) – The first array to be checked.

  • devdata (xarray DataArray or numpy ndarray) – The second array to be checked.

  • dtype (numpy dtype, optional) – The precision that will be used to make the evaluation. Default value: np.float64

Returns:

result – True if both arrays are equal; False if not.

Return type:

bool

gcpy.util.make_directory(dir_name, overwrite)[source]

Creates a directory where benchmark plots/tables will be placed.

Parameters:
  • dir_name (str) – Name of the directory to be created.

  • overwrite (bool) – Set to True if you wish to overwrite prior contents in the directory ‘dir_name’.

gcpy.util.trim_cloud_benchmark_label(label)[source]

Removes the first part of the cloud benchmark label string (e.g. “gchp-c24-1Hr”, “gcc-4x5-1Mon”, etc.) to avoid clutter.

gcpy.util.verify_variable_type(var, var_type)[source]

Convenience routine that will raise a TypeError if a variable’s type does not match a list of expected types.

Parameters:
  • var (any) – The variable to check.

  • var_type (type or tuple of types) – A single type definition (list, str, pandas.Series, etc.) or a tuple of type definitions.
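
A check of this kind is typically built on isinstance, which accepts either a single type or a tuple of types. The sketch below is an illustration under that assumption; the name verify_type_sketch and the wording of the error message are hypothetical.

```python
def verify_type_sketch(var, var_type):
    """Raise a TypeError if var is not an instance of var_type."""
    if not isinstance(var, var_type):
        raise TypeError(
            f"Expected an object of type {var_type}, got {type(var)}"
        )


# Passes silently: a str matches one of the accepted types
verify_type_sketch("SpeciesConc", (str, list))

# Fails: a float is neither a str nor a list
try:
    verify_type_sketch(1.0, (str, list))
    raised = False
except TypeError:
    raised = True
```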

gcpy.util.copy_file_to_dir(ifile, dest)[source]

Convenience wrapper for shutil.copyfile, used to copy a file to a directory.

Parameters:
  • ifile (str) – Input file in original location.

  • dest (str) – Destination folder where ifile will be copied.
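
shutil.copyfile requires the full path of the destination file, so a wrapper of this kind has to build that path from the destination folder. The sketch below shows one way to do so; the name copy_file_sketch is hypothetical, and returning the new path is an addition made for illustration.

```python
import os
import shutil
import tempfile


def copy_file_sketch(ifile, dest):
    """Copy ifile into the folder dest, keeping its base name."""
    ofile = os.path.join(dest, os.path.basename(ifile))
    shutil.copyfile(ifile, ofile)
    # Returned for convenience in this sketch only
    return ofile


# Demonstrate with temporary files
with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "config.yml")
    dest_dir = os.path.join(tmpdir, "output")
    os.makedirs(dest_dir)
    with open(src, "w", encoding="utf-8") as ofile:
        ofile.write("key: value\n")
    copied = copy_file_sketch(src, dest_dir)
    copied_exists = os.path.isfile(copied)
    copied_name = os.path.basename(copied)
```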

gcpy.util.replace_whitespace(string, repl_char='_')[source]

Replaces whitespace in a string with a replacement character (an underscore by default). Useful for removing spaces in filename strings.

Parameters:
  • string (str) – The input string.

  • repl_char (str, optional) – Replacement character. Default value: “_”

Returns:

string – String with whitespace replaced.

Return type:

str
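
A common way to do this in Python is to split on whitespace and rejoin with the replacement character. The sketch below takes that approach, which collapses runs of whitespace into a single replacement character; whether GCPy does the same is an assumption, and the name replace_whitespace_sketch is hypothetical.

```python
def replace_whitespace_sketch(string, repl_char="_"):
    """Replace each run of whitespace in string with repl_char."""
    # str.split() with no arguments splits on any run of whitespace
    return repl_char.join(string.split())


label = replace_whitespace_sketch("GCHP c24 1Hr")
dashed = replace_whitespace_sketch("GCC  4x5\t1Mon", repl_char="-")
```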

gcpy.util.get_element_of_series(series, element)[source]

Returns a specified element of a pd.Series object.

Parameters:
  • series (pd.Series) – A pd.Series object.

  • element (int) – Element of the pd.Series object to return.

Returns:

value – The returned element.

Return type:

various

gcpy.util.read_species_metadata(files, quiet=True)[source]

Reads species metadata from multiple files and returns a dict containing metadata for the union of species.

Parameters:
  • files (str or list) – Species database file(s) to read.

  • quiet (bool, optional) – Quiet (True) or verbose (False) printout. Default value: True

Returns:

  • ref_spcdb (dict) – Species metadata for the Ref model.

  • dev_spcdb (dict) – Species metadata for the Dev model.

gcpy.util.get_molwt_from_metadata(metadata, spc_name)[source]

Extracts molecular weight [g/mol] from a dictionary containing species metadata.

Parameters:
  • metadata (dict) – Metadata for GEOS-Chem species.

  • spc_name (str) – Name of the desired species.

Returns:

spc_mw_g – Species molecular weight [g/mol].

Return type:

float or None
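
The lookup can be sketched with nested dictionary access. This illustration assumes that the metadata dictionary is keyed by species name and that the molecular weight is stored under the key "MW_g"; the helper name molwt_sketch and the sample values are hypothetical.

```python
def molwt_sketch(metadata, spc_name):
    """Return the molecular weight [g/mol] of spc_name, or None."""
    species = metadata.get(spc_name)
    if species is None:
        return None
    # Assumption: the molecular weight is stored under "MW_g"
    return species.get("MW_g")


# Illustrative metadata, not taken from a real species database
metadata = {"O3": {"MW_g": 48.0}, "CO": {"MW_g": 28.0}}

ozone_mw = molwt_sketch(metadata, "O3")
missing_mw = molwt_sketch(metadata, "XYZ")
```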