Unidata / siphon

Siphon - A collection of Python utilities for retrieving atmospheric and oceanic data from remote sources, focusing on being able to retrieve data from Unidata data technologies, such as the THREDDS data server.

Home Page: https://unidata.github.io/siphon

Server Error (403: Forbidden)

rvalent opened this issue · comments

--Problem Description:
I am trying to access Wyoming’s publicly available sounding data from Jupyter Notebooks on NCAR's Casper computer, but I get a 403 error when accessing it through a Python package called “siphon”, which I set up as a custom package in my own Conda environment. The same code works elsewhere, but on the Casper JupyterHub it fails with the 403 error.

--Python Code: (Minimal, Complete and Verifiable)
from siphon.simplewebservice.wyoming import WyomingUpperAir
from datetime import datetime
date = datetime(2022, 8, 11, 0)
station = 'DTX'
print(WyomingUpperAir.request_data(date, station))

-- Expected output
     pressure  height  temperature  dewpoint  direction  speed      u_wind         v_wind  station  station_number        time  latitude  longitude  elevation     pw
  0     979.0     329         25.4      15.4      235.0    4.0    3.276608   2.294306e+00      DTX           72632  2022-08-11      42.7     -83.46      329.0  21.08
  1     944.0     647         21.6      13.6      270.0    7.0    7.000000   1.285879e-15      DTX           72632  2022-08-11      42.7     -83.46      329.0  21.08
  2     925.0     823         20.0      13.0      290.0    9.0    8.457234  -3.078181e+00      DTX           72632  2022-08-11      42.7     -83.46      329.0  21.08
  3     899.0    1069         17.8      12.8      305.0   10.0    8.191520  -5.735764e+00      DTX           72632  2022-08-11      42.7     -83.46      329.0  21.08
  4     890.0    1155         17.4       9.4      310.0   10.0    7.660444  -6.427876e+00      DTX           72632  2022-08-11      42.7     -83.46      329.0  21.08
 ..       ...     ...          ...       ...        ...    ...         ...            ...      ...             ...         ...       ...        ...        ...    ...
141       8.2   33000        -35.9     -78.9       89.0   21.0  -20.996802  -3.665005e-01      DTX           72632  2022-08-11      42.7     -83.46      329.0  21.08
142       8.0   33172        -35.9     -78.9       85.0   22.0  -21.916283  -1.917426e+00      DTX           72632  2022-08-11      42.7     -83.46      329.0  21.08
143       7.0   34099        -35.9     -78.9       85.0   22.0  -21.916283  -1.917426e+00      DTX           72632  2022-08-11      42.7     -83.46      329.0  21.08
144       6.9   34199        -35.9     -78.9        NaN    NaN         NaN            NaN      DTX           72632  2022-08-11      42.7     -83.46      329.0  21.08
145       6.5   34613        -36.3     -79.3        NaN    NaN         NaN            NaN      DTX           72632  2022-08-11      42.7     -83.46      329.0  21.08

[146 rows x 15 columns]

  • Which platform: JupyterHub is running on Casper, Linux version 3.10.0-1127.18.2.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC))

  • Versions. Including the output of:

    • python --version
      Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)
    • python -c 'import siphon; print(siphon.__version__)'
      .9

Can you confirm with NCAR's HPC support that this is even expected to work? Many HPC environments disable access to external networks.

Can you test and see if this works?

import urllib.request

resp = urllib.request.urlopen('http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=2022&MONTH=08&FROM=1100&TO=1100&STNM=72632')
print(resp.read())

Thank you, dopplershift -- your suggestion works! I am very grateful.

If possible, I would like to see NCAR JupyterHub be able to handle such data transfers by default.

Would this entail changes on the Unidata side or on the NCAR/UCAR side?

@rvalent Wait, are you saying that sample code worked on Casper? Because I was expecting it to fail given your previous problem.

@dopplershift You are correct! I just checked it now from the JH window. That sample code fails in the same way:

Traceback (most recent call last):
  File "", line 3, in <module>
  File "/glade/work/valent/conda-envs/casper-rozoff-siphon/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/glade/work/valent/conda-envs/casper-rozoff-siphon/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/glade/work/valent/conda-envs/casper-rozoff-siphon/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/glade/work/valent/conda-envs/casper-rozoff-siphon/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/glade/work/valent/conda-envs/casper-rozoff-siphon/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/glade/work/valent/conda-envs/casper-rozoff-siphon/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

@rvalent OK, that sample code will work on any machine with an unimpeded network configuration, and it does not involve Siphon in any way.

In other words, the only way this works is if NCAR tweaks something, but I want to be clear: it is very common, if not expected, that you do not have arbitrary network access in an HPC environment.

Either way, there's no problem with Siphon here and there's nothing we can do to solve this on our end.
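
For anyone debugging something similar, here is a minimal sketch of how one might tell "the request never reached a server" apart from "something answered and refused". It uses only the standard library. The helper names `classify_failure` and `probe` are hypothetical and not part of Siphon. Note that an HTTP status such as 403 only shows that some HTTP endpoint replied; that could be the data server itself or a proxy in between.

```python
import urllib.error
import urllib.request


def classify_failure(exc):
    """Return a rough label for why a urllib request failed (hypothetical helper)."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # An HTTP status came back, so something on the path answered.
        return f"server responded with HTTP {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response at all: DNS failure, refused connection, timeout, etc.
        return f"no response from server: {exc.reason}"
    return f"unexpected error: {exc!r}"


def probe(url, timeout=10):
    """Make one request and describe the outcome (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"ok: HTTP {resp.status}"
    except Exception as exc:
        return classify_failure(exc)
```

Calling `probe(...)` with the Wyoming URL from the earlier snippet on both a working machine and the failing one would show whether the two fail differently.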

@dopplershift Thank you. This is very helpful.

The network on this cluster is not bound by any outgoing rules; it is architected more like a cloud environment. I believe it might be a data server issue. Running this several times on various machines, I would sometimes get a 403 on Casper systems, and sometimes a "too many requests" or "system is too busy" response. This suggests a rate limiter implementation on the server side that may have blocked our system at some point (possibly permanently, because of lots of requests). It is hard to say without getting in touch with the data server administrator.

@jbaksta That's interesting. I regularly get a 503 from the Wyoming server when they're overloaded, but I have never experienced a 403.
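
If the server is rate limiting, one client-side mitigation is to retry with exponential backoff on the "busy" responses and to stop immediately on a 403. The sketch below is only an illustration under assumptions: it treats HTTP 429 as the "too many requests" case and 503 as the overloaded case, the function name `fetch_with_backoff` and its parameters are hypothetical, and nothing here is part of Siphon or a documented policy of the Wyoming server.

```python
import time
import urllib.error
import urllib.request

# Assumed mapping: 429 = too many requests, 503 = server overloaded.
RETRYABLE = {429, 503}


def fetch_with_backoff(url, attempts=4, base_delay=2.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying with exponential backoff on retryable statuses."""
    for attempt in range(attempts):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            # A 403 is never retried here: repeating requests against a
            # server that may have blocked us is unlikely to help.
            if exc.code not in RETRYABLE or attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The `opener` and `sleep` parameters exist only so the retry logic can be exercised without touching the network; in normal use they can be left at their defaults.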