500 Internal Server Error on the first day of the month

clowncracker opened this issue 4 months ago

Describe the issue

I have installed the integration via HACS. Starting today, I've hit an error I haven't seen before. I first configured the integration the way I want, then deleted it and reinstalled it with a bare-minimum default configuration. When it loads, the Integrations page shows the following:

Failed setup, will retry: 500, message='Internal Server Error', url=URL('https://api.pirateweather.net/forecast/xxxx/xxxx,xxxxunits=si&extend=hourly&version=2')

I tried uninstalling and reinstalling via HACS, creating a new API key, and querying the web URL directly with my latitude/longitude, and none of it worked. When I made both numbers positive in the web URL, it returned a location in China, so it looks like the issue is specific to my location.
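For reference, the request in the error message has the form `https://api.pirateweather.net/forecast/<API key>/<latitude>,<longitude>` followed by query parameters. The minimal Python sketch below rebuilds such a URL; the function name `build_forecast_url` is hypothetical, and the `?` between the path and the query string is assumed, since it is missing from the redacted URL above. Coordinates are signed decimal degrees (south and west are negative), so making both numbers positive moves a North American point into the eastern hemisphere, which is consistent with the positive-only query landing in China.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.pirateweather.net/forecast"


def build_forecast_url(api_key: str, lat: float, lon: float, **params: str) -> str:
    """Build a forecast URL from signed decimal-degree coordinates (south/west negative)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    url = f"{BASE_URL}/{api_key}/{lat},{lon}"
    query = urlencode(params)  # keeps the keyword-argument order
    return f"{url}?{query}" if query else url


# Same query parameters as in the error message; "MY_KEY" is a placeholder.
print(build_forecast_url("MY_KEY", 40.7128, -74.006, units="si", extend="hourly", version="2"))
# https://api.pirateweather.net/forecast/MY_KEY/40.7128,-74.006?units=si&extend=hourly&version=2
```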

Home Assistant version

2024.6.4

Integration version

1.5.2

Troubleshooting steps

I have updated my Home Assistant installation to the latest version.
I have updated the Pirate Weather Integration to the latest version.
I have gone through the documentation before opening this issue.
I have searched this repository and API Repository to see if the issue has already been reported.
I have restarted my Home Assistant installation.
I have queried the API in my browser to confirm the issue is not with the API.
I have written an informative title.

Aaron · Answer 1 · Mon Jul 01 2024 12:15:40 GMT+0800 (China Standard Time)

I'm also seeing this behavior, though I'm on version 1.5.3. I rolled back to 1.5.2 and had the same issue on that version as well.

Adam Mercer · Answer 2 · Mon Jul 01 2024 17:02:31 GMT+0800 (China Standard Time)

I was seeing this and was in the middle of writing an update to this issue saying as much, then I reloaded HA (for the n’th time whilst encountering this) and it seems to be back now… So it looks like this may be fixed?

Aaron · Answer 3 · Mon Jul 01 2024 23:25:33 GMT+0800 (China Standard Time)

Like skymoo said, after refreshing my HA dashboard this morning, Pirate Weather is returning data, so I'm no longer having the issue myself.

Kev · Answer 4 · Mon Jul 01 2024 23:55:57 GMT+0800 (China Standard Time)

This is an API issue that occurs on the first of every month, and it seems to fix itself after a number of hours (see #242 and #208). I've transferred this issue to the API repository and will leave it open for @alexander0042 to look into and fix.

Alexander Rey · Answer 5 · Tue Jul 02 2024 03:42:26 GMT+0800 (China Standard Time)

Shoot, I thought I fixed this last time, but clearly not! It’s a bug with the date time conversion, but really should have been fixed, so this is frustrating!

Regardless, this sort of downtime isn’t acceptable! Let’s keep this issue open and high priority until I get the test working again.

Kev · Answer 6 · Tue Jul 02 2024 23:43:32 GMT+0800 (China Standard Time)

> Shoot, I thought I fixed this last time, but clearly not! It’s a bug with the date time conversion, but really should have been fixed, so this is frustrating!

What I find weird is that even though the fix didn't stop the issue from popping up again, it still seemed able to recover from the issue on its own.

Alexander Rey · Answer 7 · Wed Jul 03 2024 01:41:13 GMT+0800 (China Standard Time)

Yea, it has to do with the datetime conversion when going back to the start of the day for the high/low values. Shouldn't be difficult to fix, but irritating to get working correctly.
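To illustrate why a "start of the day" calculation can misbehave around the first of the month, here is a minimal Python sketch. It is not the Pirate Weather code (the actual bug isn't shown in this thread), the helper name `local_day_start_utc` is hypothetical, and it assumes a fixed UTC offset for simplicity. Shortly after midnight UTC on July 1, a location at UTC-4 is still on June 30, so its local day starts in the previous month; code that truncates in UTC, or mixes the UTC date with local time, gets a different day during exactly that window. That would also be consistent with the problem clearing up on its own a few hours later.

```python
from datetime import datetime, timedelta, timezone


def local_day_start_utc(now_utc: datetime, utc_offset_hours: float) -> datetime:
    """Return local midnight of the local day containing now_utc, expressed in UTC.

    Assumes a fixed UTC offset; real code should use the location's time zone
    so that daylight-saving changes are handled.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_now = now_utc.astimezone(local_tz)  # convert to local time first...
    local_midnight = local_now.replace(hour=0, minute=0, second=0, microsecond=0)  # ...then truncate
    return local_midnight.astimezone(timezone.utc)


now = datetime(2024, 7, 1, 2, 0, tzinfo=timezone.utc)
# Truncating in UTC gives 2024-07-01 00:00 UTC, which is 20:00 on June 30 at UTC-4.
naive_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
correct_start = local_day_start_utc(now, -4)
print(naive_start.isoformat())    # 2024-07-01T00:00:00+00:00
print(correct_start.isoformat())  # 2024-06-30T04:00:00+00:00
```

A regression test pinned to a timestamp just after a UTC month boundary, like `now` above, is one way to catch this class of bug before the first of the next month.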

Kev · Answer 8 · Wed Jul 03 2024 02:28:48 GMT+0800 (China Standard Time)

Ah, that would explain why it would eventually fix itself after a period of time.

Kev · Answer 9 · Wed Jul 10 2024 02:07:02 GMT+0800 (China Standard Time)

@alexander0042 I'm seeing an Internal Server Error for my location again. It seems to only affect locations in the HRRR domain at the moment.

Alexander Rey · Answer 10 · Wed Jul 10 2024 02:09:36 GMT+0800 (China Standard Time)

Seeing it too; nothing related to ingest, so looking into other causes now.

Kev · Answer 11 · Wed Jul 10 2024 02:37:52 GMT+0800 (China Standard Time)

Seems to be working again on my end now. I know there was some downtime yesterday evening around 6pm EDT that sorted itself out shortly after I noticed it; maybe the same thing happened here as well?

Alexander Rey · Answer 12 · Wed Jul 10 2024 02:53:28 GMT+0800 (China Standard Time)

Yea, same root cause, and it was actually ingest. Every so often one of the forecast files doesn't download, so I end up with the wrong sized file. I thought I'd added checks to every script, but missed the 0-18h HRRR. Added it now and checked the others, so this particular glitch should hopefully be closed for good! A couple of other thoughts:

I'm updating the status page to query a different location. It's currently querying somewhere outside of HRRR (0,0), which means it misses things like this.
Going to push out a 2.0.11 with a new fallback to GFS instead of 500 if there's anything wrong with HRRR
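The missing check described above, rejecting a forecast file whose size doesn't match what was expected, might look like the minimal Python sketch below. `download_verified` and the injected `fetch` callable are hypothetical, and where the expected size comes from (for example an HTTP `Content-Length` header or an index of the source files) is an assumption, since the thread doesn't say how the ingest scripts determine it.

```python
import os
from typing import Callable


def is_complete_download(path: str, expected_bytes: int) -> bool:
    """Return True only if the file exists and has exactly the expected size."""
    try:
        return os.path.getsize(path) == expected_bytes
    except OSError:  # a missing or unreadable file counts as incomplete
        return False


def download_verified(fetch: Callable[[str], None], path: str,
                      expected_bytes: int, attempts: int = 3) -> None:
    """Call fetch(path) until the file has the expected size, or give up."""
    for _ in range(attempts):
        fetch(path)
        if is_complete_download(path, expected_bytes):
            return
    # Fail loudly instead of letting a truncated file reach the forecast code.
    raise RuntimeError(f"{path}: size mismatch after {attempts} attempts")
```

Raising here keeps a partially downloaded file from being served; the caller can then keep the previous good run or fall back to another model.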

Kev · Answer 13 · Wed Jul 10 2024 03:04:26 GMT+0800 (China Standard Time)

Good to know this should be fixed going forward. I'll leave this issue open since it pertains to the problem at the start of the month.

> I'm updating the status page to query a different location. It's currently querying somewhere outside of HRRR (0,0), which means it misses things like this.

Would it make sense, then, to have the status page query multiple locations? One in the NBM domain but not in the HRRR domain, one in the HRRR domain, and one in the GFS domain.
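The multi-location idea could be sketched as below: probe one point per model domain and report each separately, so an HRRR-only failure like this one no longer hides behind a healthy (0,0) check. `check_probes`, the probe coordinates, and the injected `query` callable (assumed to return the HTTP status code of a forecast request) are all hypothetical; the coordinates in particular are illustrative and should be checked against each model's real coverage.

```python
from typing import Callable, Dict, Tuple

# Hypothetical probe points, one per model domain (illustrative assumptions).
PROBES: Dict[str, Tuple[float, float]] = {
    "hrrr": (39.0, -98.0),           # central CONUS, assumed inside HRRR
    "nbm_not_hrrr": (21.3, -157.8),  # Hawaii, assumed inside NBM but outside HRRR
    "gfs_only": (0.0, 0.0),          # the point the status page used, per the thread
}


def check_probes(query: Callable[[float, float], int],
                 probes: Dict[str, Tuple[float, float]] = PROBES) -> Dict[str, bool]:
    """Query every probe point; True means the API answered with HTTP 200."""
    results: Dict[str, bool] = {}
    for name, (lat, lon) in probes.items():
        try:
            results[name] = query(lat, lon) == 200
        except Exception:  # a timeout or connection error also counts as down
            results[name] = False
    return results


# Simulated outage: only the HRRR probe gets a 500.
def fake_query(lat: float, lon: float) -> int:
    return 500 if (lat, lon) == PROBES["hrrr"] else 200


print(check_probes(fake_query))
# {'hrrr': False, 'nbm_not_hrrr': True, 'gfs_only': True}
```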

EDIT: Looking at the status page, it seems to show the development endpoint as being down since June 15. Maybe whatever location it's using is having issues? I think you said you use 0,0?

> Going to push out a 2.0.11 with a new fallback to GFS instead of 500 if there's anything wrong with HRRR

In this case, wouldn't a better fallback be NBM, and then GFS if there are issues with both NBM and HRRR? Or would that be too complicated, making a straight fallback to GFS easier?

Alexander Rey · Answer 14 · Wed Jul 10 2024 03:38:06 GMT+0800 (China Standard Time)

Yea, ideally it falls back to HRRR/NBM, and in most cases it should, this is just a massive try catch around a bunch of code as a backup!
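The fallback order discussed here (HRRR, then NBM, then GFS) can be expressed as a chain of sources tried in order, with a broad catch acting as the last line of defence. This is a minimal Python sketch, not how the API is actually structured: `first_successful` and the `read_*` functions are hypothetical stand-ins for the real model readers.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_successful(sources: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Try each (name, fetch) pair in order and return the first result that doesn't raise."""
    errors: List[str] = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # deliberately broad: any failure moves on to the next model
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def read_hrrr() -> str:
    raise OSError("truncated HRRR file")  # simulates the ingest failure from this thread


def read_nbm() -> str:
    return "NBM forecast"


def read_gfs() -> str:
    return "GFS forecast"


print(first_successful([("hrrr", read_hrrr), ("nbm", read_nbm), ("gfs", read_gfs)]))
# ('nbm', 'NBM forecast')
```

Returning the name of the source that answered makes it possible to flag degraded responses, rather than silently serving GFS data where HRRR was expected.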

Testing out the auto update approach, so 2.0.11 should propagate slowly later today!

Kev · Answer 15 · Thu Aug 01 2024 00:04:33 GMT+0800 (China Standard Time)

@alexander0042 Since today is the last day of July, just checking in to see whether this has been fixed. Will the fallback added in 2.0.11 solve this issue in the short term while a long-term fix is worked on?

jhemak · Answer 16 · Thu Aug 01 2024 11:18:02 GMT+0800 (China Standard Time)

I think it's back

Kev · Answer 17 · Thu Aug 01 2024 11:29:20 GMT+0800 (China Standard Time)

Yup, just checked and the API is down currently. The workaround to exclude HRRR still works and the issue will sort itself out in a few hours.
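The workaround amounts to adding `exclude=hrrr` to the request's query string. Below is a minimal Python sketch that adds it to an existing request URL while keeping the other parameters. The helper name `with_excluded` is hypothetical, and it assumes, as the workaround implies, that the API's `exclude` parameter takes a comma-separated list that accepts `hrrr`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_excluded(url: str, block: str) -> str:
    """Return url with `block` added to its comma-separated `exclude` parameter."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Collect anything already excluded so it is preserved.
    blocks = [b for k, v in params if k == "exclude" for b in v.split(",") if b]
    if block not in blocks:
        blocks.append(block)
    params = [(k, v) for k, v in params if k != "exclude"]
    params.append(("exclude", ",".join(blocks)))
    # safe="," keeps the list readable instead of encoding commas as %2C.
    return urlunsplit(parts._replace(query=urlencode(params, safe=",")))


url = "https://api.pirateweather.net/forecast/MY_KEY/40.7128,-74.006?units=si&extend=hourly&version=2"
print(with_excluded(url, "hrrr"))
# https://api.pirateweather.net/forecast/MY_KEY/40.7128,-74.006?units=si&extend=hourly&version=2&exclude=hrrr
```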

I guess the fix in 2.0.11 didn't fix this issue. Also checked the dev endpoint which is running 2.1 and it's also down.

Kev · Answer 18 · Thu Aug 01 2024 22:56:56 GMT+0800 (China Standard Time)

Just commenting that the API is back up and running again this morning. Will ping @alexander0042 to look into this issue to hopefully solve it by the time September rolls around.

Kev · Answer 19 · Thu Aug 15 2024 22:03:29 GMT+0800 (China Standard Time)

With the release of V2.1 this should finally be fixed. I'll close this for now, but we can always reopen it if the problem recurs.