CPT-DL
A python library for downloading climate model data and observations from the IRI data library
Usage
CPT-DL stores pre-made Ingrid url templates in several python dictionaries, organized by purpose.
cptdl.hindcasts
stores urls designed for accessing GCM hindcasts and CPT training data
cptdl.observations
stores urls designed for downloading observations data
cptdl.forecasts
stores urls designed for downloading individual GCM forecasts. You can view these template URLs here.
If you look closely, these templates are designed to be python “F-Strings”. F-Strings were introduced in Python 3.6 and allow for the dynamic evaluation of python code during string formatting. However, dynamically evaluating f-strings themselves is not a simple task, so cptdl
implements a function called cptdl.evaluate_url
. cptdl.evaluate_url
accepts arbitrary keyword arguments, and inserts them dynamically into the url template provided as the first positional argument, url.
Example:
formatted_url = cptdl.evaluate_url(url, predictor_extent={'east':130, ...}, ... )
cptdl.hindcasts
, cptdl.forecasts
and cptdl.observations
store urls indexed by the names of the datasets they access. For example, if one wanted CanSIPSv2 Precipitation, the key would be “CanSIPSv2.PRCP”. You can view the available variables in each dictionary by typing cptdl.hindcasts.keys()
into a python terminal.
Whereas PyCPT Legacy used an external curl
command to download data, cptdl
uses a custom pure-python download function called cptdl.simple_download
, which simply downloads the provided url to the provided destination file path. It can be used in any python context, almost like curl
. curl
has a “feature” which allows it to download whatever is served by a website, even if its a 404 not-found message. This led PyCPT Legacy to mistakenly download Error 404 HTML pages, for example when months of data didnt exist. It would try to proceed without checking. cptdl
fixes this problem.
def simple_download(url, dest, verbose=False, use_dlauth=False)
Example:
import cptdl as dl
docspage = dl.simple_download("https://raw.githubusercontent.com/kjhall-iri/cpt-dl/master/src/utilities.py", "/Path/To/Destination/utilities/py")
if use_dlauth is set to True
, cptdl
will attempt to read ~/.pycpt_dlauth
and pass it as a cookie in the request to the URL. You can set up DLAuth with cptdl
by making an account here, and then passing the email and password you set up to the cptdl.setup_dlauth("email@email.com")
function
Example:
cptdl.setup_dlauth("pycpt.dev@gmail.com")
PASSWORD: [enter your password here- it will be invisible]
You’ll then have a ~/.pycpt_dlauth
file, which will be used to authenticate with the IRI data library in future requests using use_dlauth=True
Since the rest of the PyCPT stack is built on top of Xarray, cptdl
has a convenient function for rolling the evaluation of a template URL, the download of the file, and the opening of that file with cptio
all in one: dataarray = cptdl.download(url, destfile, **kwargs)
which returns an Xarray.DataArray
with the desired data.
def download(baseurl, dest, verbose=False, format='cptv10.tsv', use_dlauth=True, **kwargs)
Example:
import cptdl as dl
import datetime as dt
template = dl.hindcasts['CanSIPSv2.PRCP']
destination_file = "cansips_prcp.tsv"
kwargs = {
'fdate': dt.datetime.now(),
'first_year': 1982,
'final_year': 2018,
'predictor_extent': {
'east': 90,
'west': 0,
'north': 90,
'south': 0
},
'lead_low': 1.5,
'lead_high': 3.5,
'target': 'Jul-Sep',
'filetype': 'cptv10.tsv'
}
# this returns an xarray dataarray:
da = dl.download(template, destination_file, verbose=True, **kwargs) # verbose controls a progressbar
# and, if you have trouble with DLAuth setup and want to try to skip it, you can turn it off like this:
da = dl.download(template, destination_file, use_dlauth=False, verbose=True, **kwargs)