TGtkGadget Free() from Delete() has problems.

Question

woollybah opened this issue 6 years ago

I am getting the following crash when starting MaxIDE on Linux :

#0  0x0000000000700390 in _maxgui_gtk3maxgui_gtkgadget_TGTKGadget_Free ()
#1  0x000000000070b839 in _maxgui_gtk3maxgui_gtkgadget_TGTKDesktop_Delete ()
#2  0x0000000000793b0e in GC_invoke_finalizers ()
#3  0x0000000000793d0c in GC_notify_or_invoke_finalizers ()
#4  0x0000000000795067 in GC_generic_malloc ()
#5  0x000000000079579f in GC_malloc_atomic ()
#6  0x0000000000783328 in bbGCAllocObject ()
#7  0x0000000000780ee6 in allocateArray.constprop.2 ()
#8  0x0000000000781b45 in bbArrayNew1D ()
#9  0x000000000071be47 in maxgui_localization_TMaxGUILocalizationEngine_LocalizeString_S_S.part.0 ()
#10 0x00000000007155e5 in _maxgui_maxgui_driver_TMaxGUIDriver_ApplyLocalization_TTGadget ()
#11 0x0000000000715365 in _maxgui_maxgui_driver_TMaxGUIDriver_ApplyLanguage ()
#12 0x0000000000715199 in _maxgui_maxgui_driver_TMaxGUIDriver_SetLanguage_TTMaxGuiLanguage ()
#13 0x00000000007151d7 in maxgui_maxgui_driver_driver_SetLocalizationLanguage ()
#14 0x000000000071c1ca in maxgui_localization_SetLocalizationLanguage ()
#15 0x0000000000427b96 in __m_maxide_TOptionsRequester_Read_TTStream ()
#16 0x000000000041c594 in __m_maxide_TCodePlay_ReadConfig ()
#17 0x0000000000438881 in __m_maxide_TCodePlay_Initialize ()
#18 0x000000000043a9ee in _bb_main ()

The backtrace implies that the TGTKDesktop instance is being collected, which is somewhat concerning, and implies that its parent "driver" is up for collection too.

I believe this relates to the recent changes that were made to the GC, and will need further investigation to work out what exactly is going on.

Ronny Otto · Answer 1 · Tue Mar 12 2019 18:55:54 GMT+0800 (China Standard Time)

Just to make sure: did you recompile all modules (gtk.mod and the like)? I hope it isn't too time-consuming to track down. I only tested my game TVTower for a very short while and it seemed to work. My retro comp project seems to work too, but your issue here leaves me with a bad feeling in my stomach.

Brucey · Answer 2 · Tue Mar 12 2019 19:31:07 GMT+0800 (China Standard Time)

Well, it might just be a bug somewhere... something not being set when it should.
The GC should now be correctly recognising when a particular instance of something is no longer reachable - and GC'ing it accordingly.

There are probably two issues here, at least:

1. The main problem as observed in the backtrace.
2. The fact that TGTKDesktop may not have various properties set that TGTKGadget.Free() is attempting to clean up - hence the crash.

Ronny Otto · Answer 3 · Tue Mar 12 2019 19:56:26 GMT+0800 (China Standard Time)

When compiling MaxIDE in debug I also get this:

Executing:maxide.debug
DebugLog:WARNING: Toolbars should *only* be parented to window gadgets.
Segmentation fault

Maybe it is related

HurryStarfish · Answer 4 · Tue Mar 12 2019 20:06:59 GMT+0800 (China Standard Time)

Yeah, this could be connected to the recent GC changes.
I remember there was a problem with globals being collected before... although back then the problem was that they weren't added as GC roots and that had been solved iirc?

Ronny Otto · Answer 5 · Tue Mar 12 2019 20:13:41 GMT+0800 (China Standard Time)

For me it segfaults here:

Type TGadget
	...
	Method Free:Int()
		If TTypeId.ForObject(Self).name = "TGTKDesktop" Then DebugStop
		Local gadget:TGTKGadget
		Local rkids:TList
		If Not kids Then Print "no kids"
		rkids=kids.Reversed() ' <===== segfaults here

"no kids" was not printed, so kids was available.

The next time I tried, the DebugStop was not executed before the segfault, so the segfault happened somewhere else. I ran it via GDB and it segfaulted again within Free() of TGTKDesktop.

So it seems to be related not to a "property" of TGTKDesktop but to some GC issue. To make sure, I wrapped all property accesses with "If property...". The exception would be if the underlying object is no longer "valid" without NG being aware of it.

Ronny Otto · Answer 6 · Tue Mar 12 2019 20:21:07 GMT+0800 (China Standard Time)

I disabled the kids.Reversed() + free part and it builds - so something is freed beforehand.

Interesting output:

Executing:maxide.debug
DebugLog:WARNING: Toolbars should *only* be parented to window gadgets.
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

Ronny Otto · Answer 7 · Tue Mar 12 2019 20:32:41 GMT+0800 (China Standard Time)

In more detail: kids.Reversed() is what segfaults here - something within kids has already been collected.

So I replaced:

		rkids=kids.Reversed()

with:

		if not kids._head then print "kids._head is empty"
		Local list:TList=New TList
		list._count = 0
		Local link:TLink=kids._head._succ
		While link<>kids._head
			if not link then print "link is empty"
			if not link._succ then print "link._succ is empty"
			list.AddFirst link._value
			link=link._succ
		Wend
		rkids = list

And it runs (without any of the Print lines producing output) ... so it seems the method call is what fails here. Maybe that helps a bit in finding the culprit?

Ronny Otto · Answer 8 · Tue Mar 12 2019 20:47:50 GMT+0800 (China Standard Time)

Meanwhile I have to revise what I said above about my game tests: I get segfaults here and there on startup. Sometimes it works, sometimes TVTower segfaults during the startup phase. So it seems more and more not to be a GTK issue.

Brucey · Answer 9 · Tue Mar 12 2019 21:14:50 GMT+0800 (China Standard Time)

Of course, it may just be that there's a load of crap code...

Executing:maxide
TGTKDesktop:New
TGTKDesktop:New
TGTKDesktop:New
TGTKDesktop:New
TGTKDesktop:New
TGTKDesktop:New
TGTKDesktop:New
TGTKDesktop:New
TGTKDesktop:Delete
TGTKDesktop:Delete
TGTKGadget:TGTKDesktop:Delete
Segmentation fault (core dumped)

which seems wrong in so many ways.

Brucey · Answer 10 · Tue Mar 12 2019 21:17:26 GMT+0800 (China Standard Time)

On a plus point though, now that the GC appears to be freeing more stuff up (that it maybe should be freeing up), I'm sure we'll come across lots of little issues like this along the way.

Yay! :-p

Brucey · Answer 11 · Tue Mar 12 2019 21:35:03 GMT+0800 (China Standard Time)

re: reversed()
The problem with calling Reversed() in a method that may have been called by Delete(), is that Reversed() creates objects, which may result in the collection of other objects, which may result in various children of the original object being Freed whilst the reversed list to hold them is being created.

Which all seems like a horrible mess...

HurryStarfish · Answer 12 · Tue Mar 12 2019 22:02:05 GMT+0800 (China Standard Time)

Yeah, this is one of the reasons why complex logic in finalizers is usually a bad idea...
The way I understood it, the JAVA_FINALIZATION flag should guarantee that objects do not get collected as long as they're still accessible by yet-unfinalized objects. If that's correct, then while an object is running its Delete method, all other objects it has references to should still exist - but those objects may be about to be collected and may already have been finalized.
Considering TGTKGadget.Delete calls Free, which in turn explicitly Frees more gadgets, some of them may be in an inconsistent state at that point or get Freed twice. Which certainly seems bad, but I'm not sure if that can explain the segfaults.
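
The double-Free concern can be illustrated with a minimal sketch. It is written in Python purely as a model, not as MaxGUI code: the Gadget class, its freed flag and the log list are hypothetical names. The idea is that Free becomes a no-op on a second call and walks the children last-to-first by index instead of building a reversed copy of the list.

```python
class Gadget:
    """Hypothetical stand-in for a gadget that owns child gadgets."""

    def __init__(self, name):
        self.name = name
        self.kids = []
        self.freed = False

    def add(self, kid):
        self.kids.append(kid)
        return kid

    def free(self, log):
        # Guard: a second call (explicit free followed by a finalizer) does nothing.
        if self.freed:
            return
        self.freed = True
        # Visit children last-to-first by index rather than via a reversed copy.
        for i in range(len(self.kids) - 1, -1, -1):
            self.kids[i].free(log)
        self.kids.clear()
        log.append(self.name)


log = []
desktop = Gadget("desktop")
window = desktop.add(Gadget("window"))
window.add(Gadget("button"))
desktop.add(Gadget("toolbar"))

window.free(log)    # freed explicitly first
desktop.free(log)   # must not free window or button a second time
desktop.free(log)   # repeated call is a no-op
print(log)          # ['button', 'window', 'toolbar', 'desktop']
```

This only addresses the "Freed twice" part. It says nothing about whether allocation inside a finalizer is safe in NG, which is the other half of the problem discussed above.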

Ronny Otto · Answer 13 · Tue Mar 12 2019 23:02:09 GMT+0800 (China Standard Time)

duplicate ...

Ronny Otto · Answer 14 · Tue Mar 12 2019 23:05:24 GMT+0800 (China Standard Time)

(Excuse me if the following post appears again - I sent it by mail but so far it has not appeared here):

Sorry if I am derailing the issue now: how do you properly "auto-free" complex types then?
Manual CleanUp() methods, and the calls to them, require knowledge on the part of the end developer / user of your sources.
E.g. I have objects which can register their methods (and themselves) as event listeners. The event manager holds a list of all listeners, and so, via the listener, also a reference to the object.

Now the collections/objects containing this object get set to Null (in the hope of "freeing/deleting" it, as it is no longer "of use").

As the event manager still references the object, it won't get freed.

So for now I have this manual CleanUp() method which unregisters previously registered listeners, nulls object links, and so on.

This requires knowing about CleanUp() and that you need to call it. I would of course prefer to have such things automated.

How to tackle such a thing? Maybe introduce something like "weak references" (something which says "this reference is not important; if only such references exist then the GC can collect the object")?
Couldn't this be emulated with some C magic (so circumventing BMX and GC)?

Adding weak references would mean no longer guaranteeing the existence of something - so in my case the "event listener" would still be registered, just the reference to the object would now be invalid. So the registered listeners would still need to be checked for "validity" here and there (and invalid ones unregistered/removed).

HurryStarfish · Answer 15 · Wed Mar 13 2019 22:11:19 GMT+0800 (China Standard Time)

> E.g. I have objects which can register their methods (and themselves) as event listeners. The event manager holds a list of all listeners, and so, via the listener, also a reference to the object.

> How to tackle such a thing? Maybe introduce something like "weak references" (something which says "this reference is not important; if only such references exist then the GC can collect the object")?

I think the regular approach is to just explicitly remove your event listener again once you don't need it anymore. To do that, your event manager could provide an Unregister method that takes the same argument as you previously passed to the Register method. Or the Register method could return an object that represents the event subscription and has a method for removing it.
Alternatively, weak references can indeed be a solution. You're basically looking for something like this in BlitzMax.
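
The second variant (Register returning an object that represents the subscription) might look roughly like this. It is a sketch in Python rather than BlitzMax, and EventManager, Subscription and the event name are made-up names for illustration, not an existing API.

```python
class Subscription:
    """Handle returned by register(); remove() undoes that one registration."""

    def __init__(self, manager, event, callback):
        self._manager = manager
        self._event = event
        self._callback = callback

    def remove(self):
        self._manager.unregister(self._event, self._callback)


class EventManager:
    def __init__(self):
        self._listeners = {}  # event name -> list of callbacks

    def register(self, event, callback):
        self._listeners.setdefault(event, []).append(callback)
        return Subscription(self, event, callback)

    def unregister(self, event, callback):
        callbacks = self._listeners.get(event, [])
        if callback in callbacks:   # safe to call more than once
            callbacks.remove(callback)

    def emit(self, event, *args):
        # Iterate over a copy so callbacks may unregister while being called.
        for callback in list(self._listeners.get(event, [])):
            callback(*args)


manager = EventManager()
clicks = []
sub = manager.register("onclick", clicks.append)

manager.emit("onclick", "button1")   # delivered
sub.remove()                         # explicit unsubscribe via the handle
manager.emit("onclick", "button2")   # no listener left
print(clicks)                        # ['button1']
```

The caller still has to remember to call remove(), so this is tidier bookkeeping rather than anything automatic.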

> Couldn't this be emulated with some C magic (so circumventing BMX and GC)?

No, how would you do that when the GC manages your objects? Of course you can acquire a pointer to an object, which won't affect garbage collection (at least in theory; in practice the GC is currently conservative afaik), but that doesn't help you here, since you won't know when the object gets collected. BoehmGC does however seem to have support for weak pointers (GC_general_register_disappearing_link looks promising), so using that in NG should be possible.

Ronny Otto · Answer 16 · Wed Mar 13 2019 22:41:02 GMT+0800 (China Standard Time)

> I think the regular approach is to just explicitly remove your event listener again once you don't need it anymore. To do that, your event manager could provide an Unregister method that takes the same argument as you previously passed to the Register method. Or the Register method could return an object that represents the event subscription and has a method for removing it.
> Alternatively, weak references can indeed be a solution. You're basically looking for something like this in BlitzMax.

My event manager already has multiple ways to remove/unregister listeners (by the listener itself or by various identifiers/filters - for "anonymous" cases).
What I was looking for is some kind of "automatic" way. So for example a TGUIPanel listens to "onClick" events of a TGUIScroller - which listens to its TGUIButtons' onClick events (a scroller has two buttons and some scroll area). Once I do not need the TGUIPanel anymore I need to explicitly inform the TGUIScroller - which informs the TGUIButtons ...
So once I set my "guiPanel" to "null" I would like to see some automatic "event listener unregister" call. I think "Delete()" is called only if there is nothing referencing it anymore. With vanilla BlitzMax the "TGUIScroller" would not get collected as the parent still references it - and so even the buttons would keep existing. All event listeners would stay active and so on.
I don't know if the latest GC changes will lead to a "Delete()" call within the TGUIScroller and then also in its TGUIButtons. If that behaviour changed, the issue is at least partially solved (and if not, it could be overcome by "direct coupling", like registering custom callbacks with the child widgets, though I prefer the loose-coupling approach of the event listeners).
What remains open is that the event manager would still have references to the TGUIScroller (listening to button clicks) and the TGUIPanel (listening to events of the TGUIScroller) - leading to neither the scroller nor the panel being collected.

> BoehmGC does however seem to have support for weak pointers (GC_general_register_disappearing_link looks promising), so using that in NG should be possible.

Dunno if it is worth the hassle - I am not sure but I think weak references would most often be misused instead of rethinking your code structure.
What are your thoughts on weak references?

Brucey · Answer 17 · Thu Mar 14 2019 00:09:22 GMT+0800 (China Standard Time)

Wouldn't you be better with some kind of Remove() method which sets to Null and unregisters?

Ronny Otto · Answer 18 · Thu Mar 14 2019 01:39:01 GMT+0800 (China Standard Time)

This is how I handle it now. Just wanted to know about some semi-auto-magic approach.

HurryStarfish · Answer 19 · Thu Mar 14 2019 04:02:28 GMT+0800 (China Standard Time)

> So once I set my "guiPanel" to "null" I would like to see some automatic "event listener unregister" call.

I'm not entirely sure what you mean by that. Setting something to Null is just a variable assignment like any other. It has no side effects and does nothing to the object that was being referenced. What are you suggesting should happen there?

> I think "Delete()" is called only if there is nothing referencing it anymore. With vanilla BlitzMax the "TGUIScroller" would not get collected as the parent still references it - and so even the buttons would keep existing. All event listeners would stay active and so on.

Delete (aka the finalizer) is called when the garbage collector is about to reclaim the object. This can, in principle, happen soon after the object becomes unreachable, or later, or never; that's the GC's decision to make. It will however only ever happen after the object becomes unreachable; in other words when there are no more variables accessible to the running program that hold references leading to it. However, at that point references to the object may still exist, but only in other unreachable objects. That last sentence is the crucial difference to reference counting systems like the one employed by vanilla BlitzMax (in single threaded mode): vanilla would not collect any object that was still referenced at all, even if it was only by another object that was unreachable; hence objects connected by circular references would never get collected. The same was, until recently, true in NG for objects that had Delete methods - this is what the recent GC change fixed. Unreachable objects can now always be finalized and collected, no matter if and how they're connected between each other.
This means that if you have some sort of "event manager" that holds references to "listener" objects, then the listener objects (given there are no more references to them anywhere else) will stay reachable for exactly as long as the event manager does.
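
The difference between "still referenced" and "still reachable" can be watched in a small experiment. The sketch below uses Python only as a convenient model, assuming CPython (reference counting plus a tracing cycle collector); Node and probe are illustrative names, and none of this is BlitzMax or NG code.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None


gc.disable()  # keep the cycle collector from running on its own during the demo

a = Node("a")
b = Node("b")
a.other = b
b.other = a               # circular reference: a <-> b

probe = weakref.ref(a)    # observes the object without keeping it alive

del a, b                  # no variable leads to the pair any more
alive_after_del = probe() is not None      # reference counts alone never reach zero

gc.collect()              # the tracing collector decides by reachability
alive_after_collect = probe() is not None

gc.enable()
print(alive_after_del, alive_after_collect)   # True False on CPython
```

The first result corresponds to the pure reference counting behaviour described above for vanilla: the pair is unreachable but each object is still referenced by the other. The second corresponds to a reachability-based collector.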

> So for example a TGUIPanel listens to "onClick" events of a TGUIScroller - which listens to its TGUIButtons' onClick events (a scroller has two buttons and some scroll area). Once I do not need the TGUIPanel anymore I need to explicitly inform the TGUIScroller - which informs the TGUIButtons ...

If I understand this correctly, then you're saying that you have "managers" stacked onto each other: the panel is a listener with the scroller as its manager, but the scroller is also a listener for a button etc.? In that case you should only need to disconnect the "top-most" manager's listeners and if you have no more references to any of those objects elsewhere in the program, the GC should do the rest.

> Dunno if it is worth the hassle - I am not sure but I think weak references would most often be misused instead of rethinking your code structure.
> What are your thoughts on weak references?

I don't know, I don't remember ever really using weak references explicitly. But this sort of problem would be one of their uses. If a "manager" held its listeners only via weak references, then those listeners could be collected once they're not referenced anywhere else in the program.
Ignoring the question of whether/how we should add this feature to the standard modules, I looked into it and it actually seems surprisingly easy to implement. I'll give it a try.
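
To make the idea concrete, here is a sketch of a manager that holds its listeners only weakly. It is written in Python using the standard weakref module, since BlitzMax has no such facility at this point; WeakEventManager and Panel are hypothetical names, and the immediate collection after dropping the last reference is CPython behaviour.

```python
import gc
import weakref


class WeakEventManager:
    def __init__(self):
        self._listeners = []   # list of weakref.WeakMethod objects

    def register(self, bound_method):
        # WeakMethod keeps neither the method nor its instance alive.
        self._listeners.append(weakref.WeakMethod(bound_method))

    def emit(self, *args):
        live = []
        for ref in self._listeners:
            method = ref()     # None once the listener object is gone
            if method is not None:
                method(*args)
                live.append(ref)
        self._listeners = live  # prune entries whose listener was collected

    def count(self):
        return len(self._listeners)


class Panel:
    def __init__(self, seen):
        self.seen = seen

    def on_click(self, sender):
        self.seen.append(sender)


seen = []
manager = WeakEventManager()
panel = Panel(seen)
manager.register(panel.on_click)

manager.emit("scroller")   # delivered to the panel
panel = None               # drop the last strong reference
gc.collect()               # not needed on CPython, harmless elsewhere
manager.emit("scroller")   # nothing to deliver; the dead entry is pruned
print(seen, manager.count())   # ['scroller'] 0
```

Note that the manager still has to check each entry for validity and drop the dead ones, which is the cost Ronny pointed out earlier.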

Ronny Otto · Answer 20 · Thu Mar 14 2019 04:12:11 GMT+0800 (China Standard Time)

Thanks for your elaborate answer and "excursus" into the GC behaviour - which is exactly how I assumed it behaves now (and behaved before).

> If I understand this correctly, then you're saying that you have "managers" stacked onto each other: the panel is a listener with the scroller as its manager, but the scroller is also a listener for a button etc.? In that case you should only need to disconnect the "top-most" manager's listeners and if you have no more references to any of those objects elsewhere in the program, the GC should do the rest.

Nah, did not nail it :-)
I have widgets which emit events to "anyone" - so they e.g. emit a "guiobject.onclick" with themselves as sender (but no "target").
Each object in the source can then register with the event manager that it wants to listen to "guiobject.onclick" (and it may even add either a specific type name - then resolved by reflection - or a specific object if it is only interested in that object's events). Along with the event name/identifier, the object must provide a callback (a function, or even instance + method name) to the event manager.

When an event occurs (delayed or immediately), the event manager iterates over all listeners and, if the filters/limits allow it, the registered callbacks/handlers are executed (and maybe even evaluated - in the sense of "IsVeto()").

This means that the registration may bind listener instances (if they asked for methods to be called) or filters (if you only want to listen to a certain object's events).
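
As a rough model of the scheme described above (event name, optional sender filter, callback), something like the following Python sketch would do. All names are invented for illustration and this is not the actual TVTower event manager; the veto handling is left out.

```python
class Event:
    def __init__(self, name, sender):
        self.name = name
        self.sender = sender


class EventManager:
    def __init__(self):
        self._listeners = []   # tuples of (event name, sender filter, callback)

    def register(self, name, callback, sender=None):
        # sender=None means "any sender"; a type or a specific object narrows it.
        # The tuple holds strong references to both the callback and the filter.
        self._listeners.append((name, sender, callback))

    @staticmethod
    def _matches(wanted, sender):
        if wanted is None:
            return True
        if isinstance(wanted, type):
            return isinstance(sender, wanted)
        return wanted is sender

    def emit(self, name, sender):
        event = Event(name, sender)
        for wanted_name, wanted_sender, callback in list(self._listeners):
            if wanted_name == name and self._matches(wanted_sender, sender):
                callback(event)


class Button:
    pass


ok, cancel = Button(), Button()
calls = []
manager = EventManager()
manager.register("guiobject.onclick", lambda e: calls.append("any"))
manager.register("guiobject.onclick", lambda e: calls.append("type"), sender=Button)
manager.register("guiobject.onclick", lambda e: calls.append("ok only"), sender=ok)

manager.emit("guiobject.onclick", cancel)   # matches "any" and "type"
manager.emit("guiobject.onclick", ok)       # matches all three
print(calls)   # ['any', 'type', 'any', 'type', 'ok only']
```

The listener tuples are where the binding happens: as long as the manager is reachable, so are the callbacks and any object used as a filter.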

I know that I am pretty ... overwhelmed ... when trying to explain my stuff in English - so if it is still a bit unclear, feel free to write me a mail at ron @ gamezworld.de and I will explain it there in German (which seems to be the mother tongue of both of us).

> I'll give it a try.

Cool, let's see what you come up with.

HurryStarfish · Answer 21 · Thu Mar 14 2019 04:18:31 GMT+0800 (China Standard Time)

By the way, there is a nice blog entry by Eric Lippert on finalizers in .NET. While not all of that applies to BlitzMax - things are a bit less complicated here - I think it, and the comments below, are worth a read nonetheless. Finalizers should be used very carefully. 🙂

Ronny Otto · Answer 22 · Thu Mar 14 2019 06:59:18 GMT+0800 (China Standard Time)

Thanks for the link - I have read it but did not learn anything new, as you had already explained it :-)