gcpy.examples.dry_run.download_data

This Python script (which assumes Python 3) reads a GEOS-Chem or HEMCO-standalone log file containing dry-run output and does the following:

  1. Creates a list of unique files that are required for the GEOS-Chem or HEMCO-standalone simulation;

  2. Creates a bash script to download missing files from either the ComputeCanada server (default) or the AWS s3://gcgrid bucket;

  3. Executes the bash script to download the necessary data;

  4. Removes the bash script upon successful download.

Examples

Downloads data from a GEOS-Chem dry run simulation.

$ conda activate gcpy_env
(gcpy_env) $ python -m gcpy.examples.dry_run.download_data log MIRROR-NAME

Prints the unique log file name and exits.

$ conda activate gcpy_env
(gcpy_env) $ python -m gcpy.examples.dry_run.download_data log --skip-download

Notes

  1. This script only requires the “os”, “sys”, and “subprocess” modules, which are part of the Python standard library. Therefore, this script can be shipped with GEOS-Chem run directories. It requires only Python 3, not a full Anaconda/Miniconda environment (although you can run it in an Anaconda environment if you have one).

  2. Jiawei Zhuang found that it is much faster to issue aws s3 cp commands from a bash script than a Python script. Therefore, in this routine we create a bash script with all of the download commands that will be executed by the main routine.
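The approach in note 2 can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names write_download_script and run_and_remove are hypothetical, and the sketch assumes that bash is available on the system.

```python
import os
import stat
import subprocess


def write_download_script(commands, script_path):
    """Write the given shell commands, one per line, to a bash script.

    Hypothetical helper for illustration only.
    """
    with open(script_path, "w", encoding="utf-8") as script:
        script.write("#!/bin/bash\n\n")
        for command in commands:
            script.write(command + "\n")
    # Make the script executable by its owner.
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR)


def run_and_remove(script_path):
    """Run the script with bash and delete it only if it exits with status 0."""
    status = subprocess.call(["bash", script_path])
    if status == 0:
        os.remove(script_path)
    return status
```

Keeping the script on disk after a failed run makes it possible to inspect or rerun the download commands by hand.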

Functions

create_download_script(paths, args)

Creates a data download script to obtain missing files from the ComputeCanada data archive (default) or the GEOS-Chem s3://gcgrid bucket on the AWS cloud.

download_the_data(args)

Downloads GEOS-Chem data files from the ComputeCanada server or the AWS s3://gcgrid bucket.

expand_restart_file_names(paths, args, run_info)

Tests if the GEOS-Chem restart file is a symbolic link to ExtData.

extract_pathnames_from_log(args)

Returns a list of pathnames from a GEOS-Chem log file.

get_run_info()

Searches through the input.geos file for GEOS-Chem run parameters.

main()

Main program.

parse_args()

Reads global settings from the download_data.yml configuration file.

write_unique_paths(paths, unique_log)

Writes unique data paths from dry-run output to a file.

gcpy.examples.dry_run.download_data.extract_pathnames_from_log(args)[source]

Returns a list of pathnames from a GEOS-Chem log file.

Parameters:

args (dict) – Contains output from function parse_args.

Returns:

paths (dict) – Contains the following keys:

  • paths["comments"]: Dry-run comment lines.

  • paths["found"]: List of file paths found on disk.

  • paths["missing"]: List of file paths that are missing.

  • paths["local_prefix"]: Local data directory root.

Author:

Jiawei Zhuang (jiaweizhuang@g.harvard.edu). Modified by Bob Yantosca (yantosca@seas.harvard.edu).
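A rough sketch of how log lines might be sorted into the "comments", "found", and "missing" lists is shown below. The function name classify_log_lines and the marker strings "FILE FOUND" and "FILE NOT FOUND" are assumptions made for illustration; they are not taken from the actual dry-run log format, and the sketch does not compute "local_prefix".

```python
def classify_log_lines(lines,
                       found_marker="FILE FOUND",
                       missing_marker="FILE NOT FOUND"):
    """Sort log lines into comments, found paths, and missing paths.

    The marker strings are hypothetical placeholders. The path is
    assumed to be the last whitespace-separated token on the line.
    """
    paths = {"comments": [], "found": [], "missing": []}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if missing_marker in line:
            key = "missing"
        elif found_marker in line:
            key = "found"
        else:
            # Anything without a marker is kept as a comment line.
            paths["comments"].append(line)
            continue
        path = line.split()[-1]
        # Keep each path only once, in order of first appearance.
        if path not in paths[key]:
            paths[key].append(path)
    return paths
```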

gcpy.examples.dry_run.download_data.get_run_info()[source]

Searches through the input.geos file for GEOS-Chem run parameters.

Returns:

run_info – Contains the GEOS-Chem run parameters: start_date, start_time, end_date, end_time, met, grid, and sim.

Return type:

dict

gcpy.examples.dry_run.download_data.expand_restart_file_names(paths, args, run_info)[source]

Tests if the GEOS-Chem restart file is a symbolic link to ExtData. If it is, the link to the remote file is appended to the line in which the restart file name is found.

Parameters:
  • paths (dict) – Contains output from function extract_pathnames_from_log.

  • args (dict) – Contains output from function parse_args.

  • run_info (dict) – Contains output from function get_run_info.
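The symbolic-link test can be illustrated with the standard os.path functions. This is only a sketch of the check itself, under the assumption that resolving the link target is what is needed; the helper name resolve_if_symlink is hypothetical and does not appear in the module.

```python
import os


def resolve_if_symlink(path):
    """Return the resolved target if path is a symbolic link, else None."""
    if os.path.islink(path):
        # realpath follows the link (and any nested links) to the real file.
        return os.path.realpath(path)
    return None
```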

gcpy.examples.dry_run.download_data.write_unique_paths(paths, unique_log)[source]

Writes unique data paths from dry-run output to a file.

Parameters:
  • paths (dict) – Contains output from function extract_pathnames_from_log.

  • unique_log (str) – Log file that will hold unique data paths.
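As a minimal sketch of this step, the following hypothetical helper (write_unique is not the module's function) combines the "found" and "missing" lists, removes duplicates, and writes one path per line. Combining the two lists and sorting the output are assumptions made here for illustration.

```python
def write_unique(paths, unique_log):
    """Write the de-duplicated, sorted data paths to unique_log.

    Hypothetical helper; returns the list that was written.
    """
    unique = sorted(set(paths["found"] + paths["missing"]))
    with open(unique_log, "w", encoding="utf-8") as log_file:
        for path in unique:
            log_file.write(path + "\n")
    return unique
```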

gcpy.examples.dry_run.download_data.create_download_script(paths, args)[source]

Creates a data download script to obtain missing files from the ComputeCanada data archive (default) or the GEOS-Chem s3://gcgrid bucket on the AWS cloud.

Parameters:
  • paths (dict) – Contains output from function extract_pathnames_from_log.

  • args (dict) – Contains output from function parse_args.
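One way to turn the list of missing files into download commands is sketched below for the AWS case. The helper name build_s3_commands is hypothetical, and the sketch assumes that the remote path mirrors the local path below the local data directory root. Any extra aws options that the real script may pass are omitted.

```python
import shlex


def build_s3_commands(missing, local_prefix, bucket="s3://gcgrid"):
    """Build one 'aws s3 cp' command per missing local file.

    Assumes (for illustration) that the bucket layout mirrors the
    directory layout below local_prefix.
    """
    commands = []
    for local_path in missing:
        if not local_path.startswith(local_prefix):
            # Skip files that are not under the local data root.
            continue
        relative = local_path[len(local_prefix):].lstrip("/")
        remote = bucket.rstrip("/") + "/" + relative
        commands.append(
            "aws s3 cp {} {}".format(shlex.quote(remote), shlex.quote(local_path))
        )
    return commands
```

The resulting strings can then be written, one per line, into the bash script that performs the download.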

gcpy.examples.dry_run.download_data.download_the_data(args)[source]

Downloads GEOS-Chem data files from the ComputeCanada server or the AWS s3://gcgrid bucket.

Parameters:

args (dict) – Output of function parse_args.

gcpy.examples.dry_run.download_data.parse_args()[source]

Reads global settings from the download_data.yml configuration file. Also parses command-line arguments and returns a dictionary containing all of these settings.

Returns:

args – Contains the following keys:

  • args["config"]: Dict with global settings from download_data.yml.

  • args["dryrun_log"]: Name of the GEOS-Chem dry-run log file.

  • args["mirror"]: Name of the remote mirror for download.

  • args["skip_download"]: Whether to skip the download (True/False).

Return type:

dict
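The command-line part of this function can be sketched as follows. The helper name parse_command_line is hypothetical, the "--skip-download" spelling of the flag is an assumption, and reading download_data.yml is left out. A mirror of None stands for "use the default mirror".

```python
def parse_command_line(argv):
    """Parse a list of command-line arguments (without the program name).

    Assumes the first positional argument is the dry-run log file and
    the optional second one is the mirror name.
    """
    args = {"dryrun_log": None, "mirror": None, "skip_download": False}
    # The flag spelling is an assumption made for this sketch.
    args["skip_download"] = "--skip-download" in argv
    positional = [arg for arg in argv if not arg.startswith("-")]
    if positional:
        args["dryrun_log"] = positional[0]
    if len(positional) > 1:
        args["mirror"] = positional[1]
    return args
```

In a script this would typically be called as parse_command_line(sys.argv[1:]).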

gcpy.examples.dry_run.download_data.main()[source]

Main program. Gets command-line arguments and calls function download_the_data to initiate a data-downloading process.