gcpy.community.format_hemco_data

Functions that ensure data files read by HEMCO adhere to the COARDS netCDF conventions.

Functions

check_hemco_variables(dset)

Checks that all variables in a dataset have the attributes required for HEMCO compliance (units and long_name).

format_hemco_dimensions(dset[, start_time, ...])

Formats time, lat, lon, and lev attributes for COARDS compliance (HEMCO compatibility).

format_hemco_variable(dset, var[, ...])

Formats attributes for a non-standard variable for COARDS compliance (HEMCO compatibility).

save_hemco_netcdf(dset, save_dir, save_name)

Saves a COARDS-compliant (HEMCO-compatible) netCDF file.

gcpy.community.format_hemco_data.format_hemco_dimensions(dset, start_time='2000-01-01 00:00:00', lev_long_name='level', lev_units='level', lev_formula_terms=None, gchp=False)[source]

Formats time, lat, lon, and lev attributes for COARDS compliance (HEMCO compatibility).

Parameters:
  • dset (xarray.Dataset) – Dataset containing at least latitude and longitude variables, which must be named lat and lon respectively.

  • start_time (str, optional) – Start time of the dataset, used to encode the time dimension and formatted as "YYYY-MM-DD HH:mm:ss". For GCHP compliance, the first time value must be 0 time units from the reference time in the time units, so start_time should match the first time in the dataset. Default: "2000-01-01 00:00:00"

  • lev_long_name (str, optional) – Descriptive name for the level attribute. Examples include "level", "GEOS-Chem levels", "Eta centers", or "Sigma centers". Default: "level"

  • lev_units (str, optional) – Units of the vertical levels. Should be one of "level", "eta_level", or "sigma_level". Setting both lev_units and lev_long_name to "level" allows HEMCO to regrid between vertical grids. Default: "level"

  • lev_formula_terms (str or None, optional) – Formula terms for the vertical coordinate (e.g. "ap: hyam b: hybm ps: PS"). Required if the data is not on the model vertical grid, in which case the surface pressure values and hybrid coefficients of the coordinate system must also be provided. Default: None

  • gchp (bool, optional) – Whether this file is intended for use in GCHP (True) or GEOS-Chem Classic (False). Primarily used to set the lev attributes. Default: False

Returns:

dset – Updated dataset with encoding and attributes set for COARDS/HEMCO compliance.

Return type:

xarray.Dataset
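
The GCHP requirement on start_time described above can be illustrated with a small standard-library sketch. It is not the gcpy implementation: the helper name time_units_and_offsets is hypothetical, and the "hours since" unit is an assumption made for illustration only.

```python
from datetime import datetime


def time_units_and_offsets(timestamps, start_time="2000-01-01 00:00:00"):
    """Return a COARDS-style units string and offsets relative to start_time.

    Hypothetical helper, not part of gcpy.
    """
    # Parse the reference time written as "YYYY-MM-DD HH:mm:ss".
    reference = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
    # Assumption: hours are used as the time unit in this sketch.
    units = f"hours since {start_time}"
    # Each time value is stored as an offset from the reference time.
    offsets = [(t - reference).total_seconds() / 3600.0 for t in timestamps]
    return units, offsets


times = [
    datetime(2019, 7, 1, 0),
    datetime(2019, 7, 1, 6),
    datetime(2019, 7, 2, 0),
]

# start_time equal to the first timestamp: the first offset is 0.
units, offsets = time_units_and_offsets(times, start_time="2019-07-01 00:00:00")

# Default start_time: the first offset is far from 0.
default_units, default_offsets = time_units_and_offsets(times)
```

When start_time matches the first timestamp, the first encoded value is 0, which is the condition stated for GCHP. With the default start_time, the same data would begin at a large non-zero offset.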

gcpy.community.format_hemco_data.check_hemco_variables(dset)[source]

Checks that all variables in a dataset have the attributes required for HEMCO compliance (units and long_name).

Parameters:

dset (xarray.Dataset) – Dataset whose variables are to be checked.

Raises:

ValueError – If any variable is missing the units or long_name attribute.
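
The kind of check described above can be sketched with plain dictionaries standing in for variable attributes. This is a minimal illustration, not the gcpy source: the names REQUIRED_ATTRS, find_missing_attrs, and check_variables are hypothetical, as are the example variable names, and the real function operates on an xarray.Dataset.

```python
# Attributes that HEMCO requires on every variable, per the description above.
REQUIRED_ATTRS = ("units", "long_name")


def find_missing_attrs(variables):
    """Map each variable name to the required attributes it lacks."""
    missing = {}
    for name, attrs in variables.items():
        absent = [attr for attr in REQUIRED_ATTRS if attr not in attrs]
        if absent:
            missing[name] = absent
    return missing


def check_variables(variables):
    """Raise ValueError if any variable lacks a required attribute."""
    missing = find_missing_attrs(variables)
    if missing:
        details = "; ".join(
            f"{name}: missing {', '.join(attrs)}"
            for name, attrs in sorted(missing.items())
        )
        raise ValueError(f"Variables lack required attributes ({details})")


# Hypothetical variables: the second one has no long_name.
example = {
    "EMIS_CO": {"units": "kg/m2/s", "long_name": "CO emissions"},
    "EMIS_NO": {"units": "kg/m2/s"},
}
missing = find_missing_attrs(example)
```

Calling check_variables(example) would raise a ValueError naming EMIS_NO, because that variable has no long_name.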

gcpy.community.format_hemco_data.format_hemco_variable(dset, var, long_name=None, units=None, **kwargs)[source]

Formats attributes for a non-standard variable for COARDS compliance (HEMCO compatibility).

Parameters:
  • dset (xarray.Dataset) – Dataset containing HEMCO input data.

  • var (str) – Name of the variable to be formatted.

  • long_name (str, optional) – Descriptive name for var. Required by HEMCO unless already present as a variable attribute. Default: None

  • units (str, optional) – Units of var. Required by HEMCO unless already present as a variable attribute. See the HEMCO input file format documentation for more information. Default: None

  • **kwargs (dict) – Additional attributes to set on the variable.

Returns:

dset – Updated dataset with COARDS/HEMCO-conforming variable attributes.

Return type:

xarray.Dataset

Raises:

ValueError – If long_name or units is not provided and cannot be found in the existing variable attributes.
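
The rule that long_name and units must come either from the arguments or from the existing attributes can be sketched as follows. The helper resolve_variable_attrs is hypothetical and works on a plain dict rather than an xarray.Dataset. Letting an explicit argument override an existing attribute is an assumption of this sketch, not something the documentation above states.

```python
def resolve_variable_attrs(existing, long_name=None, units=None, **extra):
    """Return the final attribute dict for one variable.

    Hypothetical helper, not part of gcpy.
    """
    attrs = dict(existing)  # copy so the caller's dict is left unchanged
    for key, value in (("long_name", long_name), ("units", units)):
        if value is not None:
            # Assumption: an explicit argument replaces an existing attribute.
            attrs[key] = value
        elif key not in attrs:
            # Neither supplied nor already present: HEMCO cannot use the variable.
            raise ValueError(f"{key} must be provided or already set")
    # Any additional attributes are applied last.
    attrs.update(extra)
    return attrs


# units is already present; long_name and one extra attribute are supplied.
result = resolve_variable_attrs(
    {"units": "kg/m2/s"},
    long_name="CO emissions",
    source="hypothetical inventory",
)
```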

gcpy.community.format_hemco_data.save_hemco_netcdf(dset, save_dir, save_name, dtype='float', **kwargs)[source]

Saves a COARDS-compliant (HEMCO-compatible) netCDF file.

Parameters:
  • dset (xarray.Dataset) – Dataset containing HEMCO input data.

  • save_dir (str) – Directory in which the file will be saved.

  • save_name (str) – Filename for the output file. A .nc extension will be appended if not already present.

  • dtype (str or numpy.dtype, optional) – Data type used when writing data to disk. Defaults to "float" (float32) to minimise file size.

  • **kwargs (dict) – Additional keyword arguments passed to xarray.Dataset.to_netcdf().

Notes

The time encoding (units and calendar) is preserved explicitly to prevent xarray from overwriting it with its own defaults during the save step.
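
The handling of save_name described in the parameter list can be sketched with the standard library. The helper build_output_path is hypothetical, and the exact-match check on the ".nc" suffix is an assumption. The gcpy implementation may treat other extensions differently.

```python
import os


def build_output_path(save_dir, save_name):
    """Join directory and filename, appending ".nc" when it is missing.

    Hypothetical helper, not part of gcpy.
    """
    # Assumption: only a literal ".nc" suffix counts as already present.
    if not save_name.endswith(".nc"):
        save_name = f"{save_name}.nc"
    return os.path.join(save_dir, save_name)


with_extension = build_output_path("hemco_inputs", "emissions.nc")
without_extension = build_output_path("hemco_inputs", "emissions")
```

Both calls produce the same path, so callers can pass the filename with or without the extension.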