zequihg50 / ghcnd-dsg

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

GHCNd as DSG

Binder

The Global Historical Climatology Network - Daily (GHCN-Daily) dataset integrates daily climate observations from approximately 30 different data sources.

Download by_station data from here. gunzip using find -type f | parallel gunzip.

wget https://www.ncei.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt
awk '{print "https://www.ncei.noaa.gov/pub/data/ghcn/daily/by_station/"$1".csv.gz"}' ghcnd-stations.txt | parallel -j5 wget -nc -q -P by_station/ {}

Takes 5 hours in serial for 1 variable and file size is 600MB. Takes 8 hours and 2,5GB for 5 variables.

ID = 11 character station identification code
YEAR/MONTH/DAY = 8 character date in YYYYMMDD format (e.g. 19860529 = May 29, 1986)
ELEMENT = 4 character indicator of element type 
DATA VALUE = 5 character data value for ELEMENT 
M-FLAG = 1 character Measurement Flag 
Q-FLAG = 1 character Quality Flag 
S-FLAG = 1 character Source Flag 
OBS-TIME = 4-character time of observation in hour-minute format (i.e. 0700 =7:00 am)

ToDo

In the full dataset:

dimensions:
	name_strlen = 11 ;
	timeseries = UNLIMITED ; // (125987 currently)
	obs = UNLIMITED ; // (1122439562 currently)

$ time python field.py 
Processing axis time.
Processing axis lat.
Processing axis lon.
Processing axis alt.
Processing axis station.
[pr: Station: 125987 Lat: 125987, Lon: 125987, Time: 100183 (1750-02-01 00:00:00, 2024-05-17 00:00:00)
]
Subspacing 1916-06-03 00:00:00-2018-06-08 00:00:00 at lat=-29.0.
Done ((23, 37261)).
Subspacing 1916-06-03 00:00:00-2018-06-08 00:00:00 at lat=slice(-29.0, -20.0, None).
Done ((7549, 37261)).
Subspacing 1916-06-03 00:00:00-2018-06-08 00:00:00 at lat=slice(-29.0, -12.0, None),lon=slice(117.0, 170.0, None).
Done ((5367, 37261)).
Subspacing stations ['ASN00001028', 'ASN00003002', 'ASN00006049', 'ASN00007064', 'ASN00004064'].
Done ((5, 100183)).
Subspacing stations ['ASN00001028', 'ASN00003002', 'ASN00006049', 'ASN00007064', 'ASN00004064'] at 1916-06-03 00:00:00-2018-06-08 00:00:00.
Done ((5, 37261)).

real	1m8,879s
user	1m6,967s
sys	0m2,427s

About


Languages

Language:Jupyter Notebook 96.9%Language:Python 3.1%Language:nesC 0.0%