Destruction of object while method is in `yield` state
tavurth opened this issue · comments
When the task is in a `yield`ing state and the task object is `queue_free`d, there can be weird errors and crashes.
Any pointers as to where you check for task state?
Hi @tavurth!
When you say "task object is `queue_free`d", do you mean the actual `Task` class that is returned from the `dispatch` method?
Since `Task` extends `Reference`, it doesn't have a `queue_free` method. Also, you should never `free` `Reference` objects directly; it's best to let Godot manage the reference counting and free the objects when it's the right time.
Now, regarding `yield`: the current implementation does not take it into account. It gets the result from the method call and assumes right away that the task is completed (see `Task.execute`). To be honest, I've never tried using `yield` in these tasks, so that's probably why I didn't implement support for it. Thanks for pointing that out!
Anyway, do you have a reproduction project where I could see how you are using `yield` in the tasks? This way I could get a better idea of what you're trying to do and test any fixes against this reproduction project.
Hi @gilzoide!
Sorry for the misunderstanding, by `task object` I mean the object on which the task is calling the `method`.
Actually, it seems like it's not even a `queue_free` issue; something weird is going on with yielding inside the function that's called by the thread. I created a sample repository.
Running this project causes a crash for me 1 time in every 3 or so. Details below:
```
~/src/dispatch-queue-test ⍉
▶ godot --verbose
arguments
0: /Applications/Godot.app/Contents/MacOS/Godot
1: --verbose
Current path: /Users/will/src/dispatch-queue-test
Godot Engine v3.4.3.stable.official.242c05d12 - https://godotengine.org
Using GLES2 video driver
OpenGL debugging not supported!
OpenGL ES 2.0 Renderer: Intel(R) Iris(TM) Plus Graphics 655
OpenGL ES Batching: ON
OPTIONS
max_join_item_commands 16
colored_vertex_format_threshold 0.25
batch_buffer_size 16384
light_scissor_area_threshold 1
item_reordering_lookahead 4
light_max_join_items 32
single_rect_fallback False
debug_flash False
diagnose_frame False
CoreAudio: detected 2 channels
CoreAudio: audio buffer frames: 512 calculated latency: 11ms
Registered camera FaceTime HD Camera (Built-in) with id 1 position 0 at index 0
CORE API HASH: 15296446336143176771
EDITOR API HASH: 4915204304684122520
Loading resource: res://default_env.tres
Loading resource: res://addons/dispatch_queue/dispatch_queue_node.gd
Loading resource: res://addons/dispatch_queue/dispatch_queue.gd
Loaded builtin certs
Loading resource: res://Main.tscn
Loading resource: res://Main.gd
Loading resource: res://TaskWhichIsFreed.gd
ERROR: Condition "p_I->data != this" is true. Returned: false
   at: erase (./core/list.h:150)
Godot(18521,0x118f90600) malloc: Heap corruption detected, free list is damaged at 0x6000015e0240
*** Incorrect guard value: 140704381972240
Godot(18521,0x118f90600) malloc: *** set a breakpoint in malloc_error_break to debug
[1] 18521 abort /Applications/Godot.app/Contents/MacOS/Godot --verbose
```
Other times I get the following:
```
ERROR: Error calling method from signal 'idle_frame': 'GDScriptFunctionState::': Method not found..
   at: emit_signal (core/object.cpp:1236)
ERROR: Disconnecting nonexistent signal 'idle_frame', slot: 1302:.
   at: _disconnect (core/object.cpp:1538)
ERROR: Condition "p_I->data != this" is true. Returned: false
   at: erase (./core/list.h:150)
```
Could this be a bug with the Godot threading side of things?
Interestingly, if I change the code in `Main.gd` to delay even `500ms`, the project runs every time, perhaps because the `_ready` function is now also yielding to the parent.
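```gdscript
extends Node2D

var TaskWhichIsFreed = preload("res://TaskWhichIsFreed.gd")

func _ready():
	# Create the objects whose "task" method will be dispatched.
	for _i in range(20):
		self.add_child(TaskWhichIsFreed.new())

	# Wait 0.5 s before dispatching; with this delay the crash becomes rare.
	yield(get_tree().create_timer(0.5), "timeout")

	for child in self.get_children():
		Threads.dispatch(child, "task")
```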
Btw, I love this plugin it's super helpful and clean. Great job!
Ok, now I got it!
I ran some tests here and both problems happened, though not reproducibly every time: sometimes it works, sometimes not, just as you mentioned.
For the crashes, it's most likely a race condition, since in multithreaded code a lot can go wrong if more than one thread accesses the same memory. I can't tell yet whether the problem occurs when yielding or after resuming; probably the latter.
For the second problem (`Method not found..`), I think that when the Task object is destroyed, the GDScriptFunctionState is not valid anymore (the GDScriptFunctionState.is_valid docs mention the object must exist for the resume to happen) and the error occurs.
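As a rough illustration of that validity check, here is a minimal Godot 3.x sketch. The `Worker` class and its `go` signal are hypothetical names made up for this example and are not part of the plugin; the sketch only shows where `is_valid(true)` (the extended check, which also looks at the owning object) could be queried before and after the object is freed.

```gdscript
extends Node

# Hypothetical stand-in for an object whose method yields (not plugin code).
class Worker:
	extends Object
	signal go

	func task():
		# Suspends here; the caller receives a GDScriptFunctionState.
		yield(self, "go")
		return 42

func _ready():
	var worker = Worker.new()
	var state = worker.task()
	# The call returned a suspended function state rather than the final value.
	print(state is GDScriptFunctionState)
	# Extended check: also verifies that the script and object still exist.
	print(state.is_valid(true))
	worker.free()
	# Query again after the owner is gone; resuming now would be an error.
	print(state.is_valid(true))
```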
So if we are going to support yielding from Tasks, we'll need to keep a reference to the Task alive until it completes.
This shouldn't be too hard: first we check if `Task.execute` returns a GDScriptFunctionState (here and here). If so, we keep it in an Array or Dictionary or something like that and listen to its `finished` signal to remove it from there afterwards.
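The idea above could be sketched roughly as follows for Godot 3.x. This is not the plugin's actual code: the `run` helper, the `_pending` dictionary, and the `object`/`method`/`args` fields on the task are assumptions made for illustration. It relies on the `completed` signal that GDScriptFunctionState emits, and forwards the result through a `finished` signal assumed to exist on the task.

```gdscript
extends Reference

# Hypothetical registry: instance id -> task. Holding the task here keeps a
# strong reference alive while its method is suspended in a yield.
var _pending := {}
var _mutex := Mutex.new()

# `task` is assumed to expose `object`, `method`, `args` and a `finished` signal.
func run(task) -> void:
	var result = task.object.callv(task.method, task.args)
	if result is GDScriptFunctionState and result.is_valid():
		# The method yielded: remember the task until the coroutine completes.
		_mutex.lock()
		_pending[task.get_instance_id()] = task
		_mutex.unlock()
		# Note: this sketch ignores the race where the state completes on
		# another thread before the connection is made.
		result.connect("completed", self, "_on_completed", [task], CONNECT_ONESHOT)
	else:
		# No yield happened, so the returned value is the final result.
		task.emit_signal("finished", result)

func _on_completed(result, task) -> void:
	# The coroutine returned: drop the keep-alive entry, then report the result.
	_mutex.lock()
	_pending.erase(task.get_instance_id())
	_mutex.unlock()
	task.emit_signal("finished", result)
```

The mutex is there because the dictionary may be touched from both a worker thread (in `run`) and whichever thread resumes the coroutine (in `_on_completed`).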
> if I change the code in Main.gd to delay even 500ms the project runs every time
Hmm, that's really weird... It happens a lot less, but if you try enough, it may still crash. I've been able to reproduce it with a 0.5s delay after lots of runs.
> Btw, I love this plugin it's super helpful and clean. Great job!
Thanks! I'm really glad you like it ^^
Thank you for the pointers! With your help I managed to write a solution which seems to function well.
I've tested it on my sample repository using yield inside the child (and no yield) and everything seems to be running well 🥳
Please see the attached PR if you would like to merge it with your repo. I tried to keep the code isolated so you can easily change it for 4.x.