StevenWeathers / thunderdome-planning-poker

⚑ Thunderdome is an open source agile planning poker, sprint retro, and story mapping tool

Home Page:https://thunderdome.dev

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Bug: My Games overview has an error

Adrian-Grimm opened this issue Β· comments

List of /games is and stays empty.

ERR poker/client.go:87 > | ts=1711586159.74078 msg=close error version=3.8.3 error=close tcp 172.17.0.6:8080->172.17.0.1:42525: use of closed network connection poker_id=22fc1095-527c-429d-823b-caa99150cfc1 session_user_id=d3ce7a1d-9bfa-422c-b05d-225380e7cc0e stacktrace=github.com/StevenWeathers/thunderdome-planning-poker/internal/http/poker.subscription.readPump.func1
	/home/runner/work/thunderdome-planning-poker/thunderdome-planning-poker/internal/http/poker/client.go:87
github.com/StevenWeathers/thunderdome-planning-poker/internal/http/poker.subscription.readPump
	/home/runner/work/thunderdome-planning-poker/thunderdome-planning-poker/internal/http/poker/client.go:153 

The error you mentioned is for the Poker game itself not the list of games, do you have any other errors?

with version 3.9 the error message in docker stdout log is gone πŸ‘πŸ».

But still no games in my users list. The list for teams instead is working fine...

I see some error messages in JS console:

image

Sorry, the page you are looking for is not found. HTTP 502 from reverse proxy as response.

https://****.com/api/users/d3ce7a1d-9bfa-422c-b05d-225380e7cc0e/battles?limit=10&offset=0

no further logs...

Is your reverse proxy forwarding /api endpoints correctly?

No further logs makes me suspect the call isn't making it to Thunderdome.

Could be but this would then be api endpoint specific... may an exception in the endpoint?
Other API endpoints are working fine e.g. retros, teams..:

image

But I see it's somthing specific to my user's data... Tested with a new user - there it's working.

I try to be fairly verbose with logs when errors occur, however a silent error could still be happening. Let me know if you figure out what data caused the issue.

With disabled custom error pages on reverse proxy I get only this:
image
not further logs ...

I'll take a look in DB if I could see some strage data or something.

Is the Thunderdome server restarting at that time? 502 could mean the server crashed during that request which would leave me to believe a possible panic happened.

No it's not crashing I think.
Docker container runs normal when I trigger this.

... I'm not familiar with GO code...
But I think the breaking code is in GetGamesByUser() in /internal/db/poker
/poker.go

In my case the code should get following data:

  1. my user has 50 battles
  2. data of all games should also get loaded ...
  3. Easy test case "/api/users/d3ce7a1d-9bfa-422c-b05d-225380e7cc0e/battles?limit=1&offset=1" --> HTTP-502
  4. DB Query should look like:
SELECT b.id, b.name, b.voting_locked, COALESCE(b.active_story_id::text, ''), b.point_values_allowed, b.auto_finish_voting,
		  b.point_average_rounding, b.created_date, b.updated_date,
		  CASE WHEN COUNT(p) = 0 THEN '[]'::json ELSE array_to_json(array_agg(row_to_json(p))) END AS stories,
		  CASE WHEN COUNT(bl) = 0 THEN '[]'::json ELSE array_to_json(array_agg(bl.user_id)) END AS facilitators,
		  min(COALESCE(t.name, '')) as team_name
		FROM thunderdome.poker b
		LEFT JOIN thunderdome.poker_story p ON b.id = p.poker_id
		LEFT JOIN thunderdome.poker_facilitator bl ON b.id = bl.poker_id
		LEFT JOIN thunderdome.poker_user bw ON b.id = bw.poker_id
		LEFT JOIN thunderdome.team t ON t.id = b.team_id
		LEFT JOIN thunderdome.team_user tu ON tu.team_id = t.id
		WHERE (bw.user_id = 'd3ce7a1d-9bfa-422c-b05d-225380e7cc0e' AND bw.abandoned = false) OR (tu.user_id = 'd3ce7a1d-9bfa-422c-b05d-225380e7cc0e')
		GROUP BY b.id ORDER BY b.created_date DESC
		LIMIT 1 OFFSET 1
  1. Result got respond after 5secs - quite slow DB server XD .. Timeouts?:
id                                  |name|voting_locked|coalesce                            |point_values_allowed                                                                |auto_finish_voting|point_average_rounding|created_date                 |updated_date                 |stories                                                                                                                                                                                                                                                        |facilitators                            |team_name|
------------------------------------+----+-------------+------------------------------------+------------------------------------------------------------------------------------+------------------+----------------------+-----------------------------+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+---------+
2b63612b-83c3-4dc2-9551-7ec0192d4252|test|true         |73521815-5154-4159-adea-4ee9ac2e17b6|["0", " 1", " 2", " 3", " 5", " 8", " 13", " 21", " 34", " 55", " ?", " WTF", " β˜•οΈ"]|true              |ceil                  |2024-03-28 01:44:31.288 +0100|2024-03-28 01:45:18.197 +0100|[{"id":"73521815-5154-4159-adea-4ee9ac2e17b6","name":"test","points":"","active":false,"poker_id":"2b63612b-83c3-4dc2-9551-7ec0192d4252","votes":[{"vote": " 55", "warriorId": "d3ce7a1d-9bfa-422c-b05d-225380e7cc0e"}],"created_date":"2024-03-28T01:44:58.328|["d3ce7a1d-9bfa-422c-b05d-225380e7cc0e"]|         |
  1. may I find time to debug this further.. but I need to dig into GO first XP..

Success Case:

  1. Test user with just 1 game
  2. /api/users/7c01fde2-08bb-432c-941b-4490b0c4ac47/battles?limit=1&offset=0 --> HTTP200 with data
  3. DB query runs way faster... quite similar data

So may realy a timeout issue?

I try to tune my PostgreSQL DB in the meanwhile.. πŸ™ˆπŸ˜… ... 5seconds for 50 records is quite bad.
But may it would be nice to check if this fit's to your timeout's in the GO code.. I think up to min. 10 sec's should be acceptable before failing...

without the LEFT JOIN thunderdome.team_user tu ON tu.team_id = t.id it's also fast in my case:

SELECT b.id, b.name, b.voting_locked, COALESCE(b.active_story_id::text, ''), b.point_values_allowed, b.auto_finish_voting,
		  b.point_average_rounding, b.created_date, b.updated_date,
		  CASE WHEN COUNT(p) = 0 THEN '[]'::json ELSE array_to_json(array_agg(row_to_json(p))) END AS stories,
		  CASE WHEN COUNT(bl) = 0 THEN '[]'::json ELSE array_to_json(array_agg(bl.user_id)) END AS facilitators,
		  min(COALESCE(t.name, '')) as team_name
		FROM thunderdome.poker b
		LEFT JOIN thunderdome.poker_story p ON b.id = p.poker_id
		LEFT JOIN thunderdome.poker_facilitator bl ON b.id = bl.poker_id
		LEFT JOIN thunderdome.poker_user bw ON b.id = bw.poker_id
		LEFT JOIN thunderdome.team t ON t.id = b.team_id

		WHERE (bw.user_id = 'd3ce7a1d-9bfa-422c-b05d-225380e7cc0e' AND bw.abandoned = false)
		GROUP BY b.id ORDER BY b.created_date DESC
		LIMIT 10 OFFSET 1

was executed in 0,012 sec 🧐

even also with the join and tu.user_id = null it's same fast...

Ok so that the team_user table does not have an index on the team_id column, that could improve performance for a larger table of users.

Good news is you can increase the http timeouts in the short term, currently the default is 5 seconds.

viper.SetDefault("http.write_timeout", 5)
viper.SetDefault("http.read_timeout", 5)

Could you may expose those timeout values as a DB-Timeout env-var in docker?
But also an index on this sounds like a good improvement πŸ‘πŸ»

So there is no enforced DB timeout on the application, the http handler timeout is what your usecase is running into. Those are configurable already so I would suggest increasing them from 5 seconds to 10.

As far as index goes, at least testing with over 1000 team users locally (28ms query time) I'm not seeing any noticeable performance gains.

I'm just digging deeper before I start tweaking DB server performance ..:

image

some strange numbers:

image

I've 1031 stories in poker_story table

Those are some great details to dig into, I'm always open to improving query performance. One trade-off we could make right away is to remove the count of stories pointed vs not pointed in the games listing page.

there is a bottlenec right:
image

I think I definetly need to give my database server more ressources for this DB.

But also this is quite a lot of data just for the List of games of a single user - over 50mb for just 50 games where the paging by default shows only 10 I think... 🧐

I think there's a few things going on with that query, looking at it I'm confused why this scan is going on
image

Ok I see what's going on, its the poker_user join thats actually increasing the query scans.

I'm playing with the Query and have the number of rows way down, still trying to optimize further.

I also had success with improving this DB server instance
Helped a lot to increase the shared_buffers and set the work_mem to 60MB.
With that in place the list loads - but query is still takes 3 secs in my case.

But now I get this when trying to get on the second page XP

image

.. so I think this requires still more ressources to get into the "green zone":

image

(network connection get's cut off - but data was getting through... I think this is the 5sec timeout you mentioned...)

image I think I've got the query to a super optimal state...

If you could give this a shot replace USER_ID_HERE and let me know your performance.

WITH user_teams AS (
	SELECT t.id, t.name FROM thunderdome.team_user tu
	LEFT JOIN thunderdome.team t ON t.id = tu.team_id
	WHERE tu.user_id = 'USER_ID_HERE'
),
team_games AS (
	SELECT id FROM thunderdome.poker WHERE team_id IN (SELECT id FROM user_teams)
),
user_games AS (
	SELECT u.poker_id AS id FROM thunderdome.poker_user u
	WHERE u.user_id = 'USER_ID_HERE' AND u.abandoned = false
),
games AS (
	SELECT id from user_games UNION ALL SELECT id FROM team_games
),
stories AS (
	SELECT poker_id, points FROM thunderdome.poker_story WHERE poker_id IN (SELECT poker_id FROM games)
)
SELECT p.id, p.name, p.voting_locked, COALESCE(p.active_story_id::text, ''), p.point_values_allowed, p.auto_finish_voting,
		  p.point_average_rounding, p.created_date, p.updated_date,
		  CASE WHEN COUNT(s) = 0 THEN '[]'::json ELSE array_to_json(array_agg(row_to_json(s))) END AS stories,
		  CASE WHEN COUNT(bl) = 0 THEN '[]'::json ELSE array_to_json(array_agg(bl.user_id)) END AS facilitators,
		  min(COALESCE(t.name, '')) as team_name
		FROM thunderdome.poker p
		LEFT JOIN stories AS s ON s.poker_id = p.id
		LEFT JOIN (
			SELECT poker_id, user_id FROM thunderdome.poker_facilitator WHERE poker_id IN (SELECT poker_id FROM games)
		) AS bl ON bl.poker_id = p.id
		LEFT JOIN user_teams t ON t.id = p.team_id
		WHERE p.id IN (SELECT id FROM games)
		GROUP BY p.id ORDER BY p.created_date DESC;

Quite impressive result! πŸŽ‰πŸ‘πŸ»
image

looks also great in detail:
image

... this will also safe a lot on scale πŸ˜…πŸ‘πŸ»

@Adrian-Grimm Thank you for bringing this to my attention and aiding in hunting this pesky rabbit down, I will have the fix out shortly!

Thx to you, for the great support and maintaining all of this with so much πŸ’ͺ🏻