gcpy.examples.dry_run.download_data
This Python 3 script reads a GEOS-Chem or HEMCO-standalone log file containing dry-run output and does the following:
Creates a list of unique files that are required for the GEOS-Chem or HEMCO-standalone simulation;
Creates a bash script to download missing files from either the ComputeCanada server (default) or the AWS s3://gcgrid bucket;
Executes the bash script to download the necessary data;
Removes the bash script upon successful download.
Examples
Downloads data from a GEOS-Chem dry-run simulation.
$ conda activate gcpy_env
(gcpy_env) $ python -m gcpy.examples.dry_run.download_data log MIRROR-NAME
Prints the name of the unique log file and exits without downloading any data (i.e. when args["skip_download"] is True).
$ conda activate gcpy_env
(gcpy_env) $ python -m gcpy.examples.dry_run.download_data log MIRROR-NAME
Notes
This script only requires the os, sys, and subprocess modules, which are part of the Python standard library. It can therefore be shipped with GEOS-Chem run directories, because it needs only Python 3 and not a full Anaconda/Miniconda environment (although you can run it in an Anaconda environment if you have one).
Jiawei Zhuang found that it is much faster to issue aws s3 cp commands from a bash script than from a Python script. This routine therefore creates a bash script containing all of the download commands, which the main routine then executes.
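A minimal sketch of that approach, using a hypothetical helper named run_download_script() that is not part of this module. It assumes the download commands are already available as a list of shell command strings. The commands are written to a single bash script, the script is run with one subprocess call, and it is deleted only if it exits successfully.

```python
import os
import subprocess


def run_download_script(commands, script_path):
    """Write download commands to a bash script, run it, and delete it on success.

    Hypothetical helper (not part of gcpy): `commands` is a list of shell
    command strings, such as aws s3 cp lines. Returns the script's exit status.
    """
    with open(script_path, "w", encoding="utf-8") as ofile:
        ofile.write("#!/bin/bash\n")
        ofile.write("set -e\n")  # stop at the first failing command
        for cmd in commands:
            ofile.write(cmd + "\n")
    os.chmod(script_path, 0o755)

    # A single subprocess call runs every command inside one bash process,
    # rather than launching each command separately from Python
    status = subprocess.run(["bash", script_path], check=False).returncode

    # Keep the script on failure so it can be inspected or re-run
    if status == 0:
        os.remove(script_path)
    return status
```

Leaving the script on disk after a failure is a design choice of this sketch: it lets you see which command failed and re-run the script by hand.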
Functions
create_download_script(paths, args) | Creates a data download script to obtain missing files from the ComputeCanada data archive (default) or the GEOS-Chem s3://gcgrid bucket on the AWS cloud.
download_the_data(args) | Downloads GEOS-Chem data files from the ComputeCanada server or the AWS s3://gcgrid bucket.
expand_restart_file_names(paths, args, run_info) | Tests if the GEOS-Chem restart file is a symbolic link to ExtData.
extract_pathnames_from_log(args) | Returns a list of pathnames from a GEOS-Chem log file.
get_run_info() | Searches through the input.geos file for GEOS-Chem run parameters.
main() | Main program.
parse_args() | Reads global settings from the download_data.yml configuration file.
write_unique_paths(paths, unique_log) | Writes unique data paths from dry-run output to a file.
- gcpy.examples.dry_run.download_data.extract_pathnames_from_log(args)[source]
Returns a list of pathnames from a GEOS-Chem log file.
- Parameters:
args (dict) – Contains output from function parse_args.
- Returns:
paths (dict) – A dictionary with these entries:
paths["comments"]: Dry-run comment lines.
paths["found"]: List of file paths found on disk.
paths["missing"]: List of file paths that are missing.
paths["local_prefix"]: Local data directory root.
- Author:
Jiawei Zhuang (jiaweizhuang@g.harvard.edu)
Modified by Bob Yantosca (yantosca@seas.harvard.edu)
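The sketch below shows one way to split the log into the comments/found/missing entries described above. The log markers it looks for are assumptions made for illustration, not confirmed features of the real dry-run output: a "!!!" prefix for comment lines, ": Opening " for files found on disk, and "REQUIRED FILE NOT FOUND" for missing files. The function name extract_paths_sketch is also hypothetical.

```python
def extract_paths_sketch(log_lines, local_prefix):
    """Sort dry-run log lines into comments, found paths, and missing paths.

    Assumed (hypothetical) log conventions for this sketch:
      - comment lines start with "!!!"
      - found files appear on lines containing ": Opening "
      - missing files appear on lines containing "REQUIRED FILE NOT FOUND"
    The file path is taken to be the last whitespace-separated token.
    """
    paths = {"comments": [], "found": [], "missing": [],
             "local_prefix": local_prefix}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("!!!"):
            paths["comments"].append(line)
        elif "REQUIRED FILE NOT FOUND" in line:
            paths["missing"].append(line.split()[-1])
        elif ": Opening " in line:
            paths["found"].append(line.split()[-1])

    # Drop duplicate paths while keeping the order they first appeared in
    for key in ("found", "missing"):
        paths[key] = list(dict.fromkeys(paths[key]))
    return paths
```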
- gcpy.examples.dry_run.download_data.get_run_info()[source]
Searches through the input.geos file for GEOS-Chem run parameters.
- Returns:
run_info – Contains the GEOS-Chem run parameters: start_date, start_time, end_date, end_time, met, grid, and sim.
- Return type:
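The following sketch illustrates the kind of parsing get_run_info() performs. It assumes that input.geos uses "label : value" lines and that the start and end lines each hold a date and a time separated by whitespace. The helper name and the label prefixes supplied in labels are hypothetical.

```python
def get_run_info_sketch(lines, labels):
    """Pull run parameters out of 'label : value' lines.

    `labels` maps an output key (e.g. "met") to the text that begins the
    matching label in the configuration file. Both this mapping and the
    'label : value' layout are assumptions made for this sketch.
    """
    run_info = {}
    for line in lines:
        if ":" not in line:
            continue
        label, value = line.split(":", 1)
        label = label.strip()
        for key, prefix in labels.items():
            if label.startswith(prefix):
                run_info[key] = value.strip()

    # Assumed layout: "Start YYYYMMDD, hhmmss : 20190701 000000"
    for when in ("start", "end"):
        if when in run_info:
            yyyymmdd, hhmmss = run_info.pop(when).split()
            run_info[f"{when}_date"] = yyyymmdd
            run_info[f"{when}_time"] = hhmmss
    return run_info
```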
- gcpy.examples.dry_run.download_data.expand_restart_file_names(paths, args, run_info)[source]
Tests whether the GEOS-Chem restart file is a symbolic link to ExtData. If so, appends the link to the remote file to the line in which the restart file name is found.
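One way to perform the symbolic-link test is with os.path.islink() and os.readlink() from the Python standard library. In this sketch, the helper names and the " --> " separator used to append the link target are hypothetical.

```python
import os


def restart_link_target(restart_path):
    """Return the target of a restart-file symlink, or None if it is not a link."""
    if os.path.islink(restart_path):
        return os.readlink(restart_path)
    return None


def expand_restart_line(line, restart_path):
    """Append the link target to a line that names the restart file.

    The " --> " separator is a hypothetical choice for this sketch.
    """
    target = restart_link_target(restart_path)
    if target is not None and restart_path in line:
        return f"{line} --> {target}"
    return line
```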
- gcpy.examples.dry_run.download_data.write_unique_paths(paths, unique_log)[source]
Writes unique data paths from dry-run output to a file.
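A minimal sketch of writing the unique paths. It assumes the paths dictionary returned by extract_pathnames_from_log() and a plain-text output file with one entry per line. The function name, the sorting, and the layout (comment lines first) are assumptions for illustration.

```python
def write_unique_paths_sketch(paths, unique_log):
    """Write comment lines, then sorted unique found/missing paths, to a file."""
    unique = sorted(set(paths["found"] + paths["missing"]))
    with open(unique_log, "w", encoding="utf-8") as ofile:
        for comment in paths["comments"]:
            ofile.write(comment + "\n")
        for path in unique:
            ofile.write(path + "\n")
    return unique
```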
- gcpy.examples.dry_run.download_data.create_download_script(paths, args)[source]
Creates a data download script to obtain missing files from the ComputeCanada data archive (default) or the GEOS-Chem s3://gcgrid bucket on the AWS cloud.
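Building the script largely comes down to mapping each missing local path onto the mirror. The sketch below assumes that the directory tree under local_prefix matches the tree under the remote root. It emits aws s3 cp for the s3 bucket and wget for an HTTP mirror. The function name, its parameters, and the use of wget are assumptions, not the module's confirmed behavior.

```python
import os
import shlex


def build_download_commands(missing, local_prefix, remote_root, use_s3):
    """Turn missing local paths into one download command per file.

    Assumptions for this sketch: `remote_root` is the mirror's base location
    (an s3:// path or an https:// URL), and the layout below `local_prefix`
    matches the layout below `remote_root`.
    """
    prefix = local_prefix.rstrip("/") + "/"
    commands = []
    for local_path in missing:
        if not local_path.startswith(prefix):
            # Outside the local data root, so there is no mirror path for it
            continue
        rel_path = local_path[len(prefix):]
        remote_path = remote_root.rstrip("/") + "/" + rel_path
        if use_s3:
            cmd = ["aws", "s3", "cp", remote_path, local_path]
        else:
            cmd = ["wget", "-q", "-P", os.path.dirname(local_path), remote_path]
        # Quote each word so paths with spaces survive in the bash script
        commands.append(" ".join(shlex.quote(part) for part in cmd))
    return commands
```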
- gcpy.examples.dry_run.download_data.download_the_data(args)[source]
Downloads GEOS-Chem data files from the ComputeCanada server or the AWS s3://gcgrid bucket.
- Parameters:
args (dict) – Output of function parse_args.
- gcpy.examples.dry_run.download_data.parse_args()[source]
Reads global settings from the download_data.yml configuration file. Also parses command-line arguments and returns a dictionary containing all of these settings.
- Returns:
args – A dictionary with these entries:
args["config"]: Dict with global settings from download_data.yml.
args["dryrun_log"]: Name of the GEOS-Chem dry-run log file.
args["mirror"]: Name of the remote mirror for download.
args["skip_download"]: Whether to skip the download (True/False).
- Return type:
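The following sketch covers the command-line half of parse_args(). It assumes three things: config["mirrors"] holds the valid mirror names, any argument containing "skip" requests skip_download, and a caller-supplied default mirror is used when none is named. The function name and the default_mirror parameter are hypothetical, and the real script's option names may differ.

```python
def parse_args_sketch(argv, config, default_mirror):
    """Build an args dict from a command line such as
    ["download_data.py", "log", "MIRROR-NAME"].

    Assumptions for this sketch: config["mirrors"] lists valid mirror names,
    an argument containing "skip" sets skip_download, and the first other
    argument is the dry-run log file.
    """
    args = {"config": config, "dryrun_log": None, "mirror": None,
            "skip_download": False}
    mirrors = {name.lower() for name in config.get("mirrors", {})}
    for arg in argv[1:]:
        lowered = arg.lower()
        if "skip" in lowered:
            args["skip_download"] = True
        elif lowered in mirrors:
            args["mirror"] = lowered
        elif args["dryrun_log"] is None:
            args["dryrun_log"] = arg

    if args["dryrun_log"] is None:
        raise ValueError("The name of the dry-run log file is required")
    if args["mirror"] is None:
        # Fall back to the caller-supplied default mirror
        args["mirror"] = default_mirror
    return args
```

In a script, this would be called as parse_args_sketch(sys.argv, config, default_mirror) after the YAML settings have been loaded into config.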