"locking" broken

Question

tejacques opened this issue 9 years ago

From my testing, using locking over localStorage as a mechanism is fundamentally broken due to the way the localStorage data propagation model works. An example is this:

1) Open two windows on the same domain and start a busy loop in each.
2) In window 1, attempt to acquire the lock about 1 second after both windows are open and hold it for 2 seconds before releasing it. In window 2, attempt to acquire the lock after ~2 seconds and also hold it for 2 seconds before releasing it.
3) Both windows will hold the 'lock' at the same time, from ~2 seconds in to ~3 seconds in.
This happens because localStorage propagates changes with events, and if the event loop is not yielded to, localStorage does not get updated. This may not be true of ALL browsers, but it's true of most.
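
The failure mode described above can be sketched without a browser. The following is a toy model only, assuming (as the report claims) that a window keeps seeing a stale view of storage until it yields to its event loop. ToyWindow, yieldToEventLoop, tryLock and bothAcquire are hypothetical names made up for this illustration; none of them come from the library or from any browser API.

```javascript
// Toy model, NOT real browser behavior: every window reads from a private
// snapshot of the shared store, refreshed only when the window yields.
class ToyWindow {
  constructor(name, shared) {
    this.name = name;
    this.shared = shared;            // Map standing in for the real store
    this.snapshot = new Map(shared); // this window's possibly stale view
  }

  // Stands in for returning control to the event loop.
  yieldToEventLoop() {
    this.snapshot = new Map(this.shared);
  }

  // Naive check-then-set lock built on the (possibly stale) snapshot.
  tryLock(key) {
    if (this.snapshot.has(key)) return false; // lock appears to be taken
    this.snapshot.set(key, this.name);
    this.shared.set(key, this.name);
    return true;
  }
}

// Returns true when both windows believe they hold the lock.
function bothAcquire(yieldBeforeSecondAttempt) {
  const shared = new Map();
  const w1 = new ToyWindow('window1', shared);
  const w2 = new ToyWindow('window2', shared);
  const first = w1.tryLock('lock');
  if (yieldBeforeSecondAttempt) w2.yieldToEventLoop();
  const second = w2.tryLock('lock');
  return first && second;
}

console.log(bothAcquire(false)); // true: window2 never saw window1's write
console.log(bothAcquire(true));  // false: window2 refreshed its view first
```

Whether a real browser behaves like this model depends on the browser and version, as the replies below show for Chrome 44.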

Andrew Wakeling · Answer 1 · Tue Aug 11 2015 15:57:53 GMT+0800 (China Standard Time)

Edit: It looks as though this library does yield after calling setItem. However I did manage to break it (see my reply below).

The code snippet below helps explain @tejacques's concerns.

// Run this in the 1st tab
localStorage['foo'] = '1';
var startDate = new Date().valueOf();
while (new Date().valueOf() - startDate < 10000) {}
console.log(localStorage['foo']);

// Within 10 seconds, run this code in the 2nd tab.
localStorage['foo'] = '2';

You'll observe that:

in the context of the first script, localStorage['foo'] is still observed to be '1'
even though localStorage['foo'] = '1' runs before localStorage['foo'] = '2', the value '1' never appears to be persisted to localStorage

Andrew Wakeling · Answer 2 · Wed Aug 12 2015 11:00:08 GMT+0800 (China Standard Time)

I managed to break it using: https://gist.github.com/andrewwakeling/430db7720a4393c4324b

It eventually fails when run across several tabs in Chrome 44.

It looks as though 2 tabs manage to get hold of the lock simultaneously. Please let me know if something looks wrong with my test.

Andrew Wakeling · Answer 3 · Tue Sep 01 2015 14:52:26 GMT+0800 (China Standard Time)

For Chrome 44, in my example, it appears that the value is now written to local storage immediately (i.e., if you go to a 2nd tab, localStorage['foo'] evaluates to '1').

Jasmin Muharemovic · Answer 4 · Fri Oct 27 2017 17:49:27 GMT+0800 (China Standard Time)

Hi, I think I've found what's wrong with the locking logic.

First, I have to say this is a fantastic idea and great work. I wonder whether this bug, which causes really weird behavior, is the reason it isn't more popular. I started using IWC-SignalR, and with livereload in my dev environment and several tabs open, the bug is not that uncommon. To reproduce and troubleshoot it, I added a bunch of console.log lines with exact times to your library, opened 4 tabs, and watched what happens every time I refresh the tab that owns the SignalR connection.

I quickly determined that the problem is not in IWC-SignalR but in IWC. I was afraid that it was somehow related to the base of the locking logic, your InterlockedCall with its complex timer-based synchronization, but that proved to be rock solid in my testing. The interlocked calls among different tabs were always sequential, so it was very reassuring to see the foundation of it all being healthy. The problem turned out to be in the clearJunkLocks call: both its logic and the fact that it is not synchronized with the lock-obtaining code. I detected 2 separate problems:

1) The first problem I noticed was that the refreshed tab's clearJunkLocks clears a valid lock established by another open tab a moment before. The reason is the logic in WindowMonitor's updateDataFromStorage. It updates the openWindows variable with the new state from local storage AFTER firing the onWindowsChanged event. That event triggers clearJunkLocks, which checks whether each lock it finds belongs to a closed window and relies on openWindows for that check. This causes 2 issues:

a) the clearJunkLocks fired immediately in all tabs upon detection of the current tab's unload/reload doesn't do anything, as the unloaded tab is still reported to be open
b) the clearJunkLocks fired in the reloaded tab treats a lock obtained by another tab as a junk lock, since that tab is not yet present in the reloaded tab's openWindows and hence is reported as closed

I fixed this by moving the code that updates openWindows before the code that fires the onWindowsChanged event. This fixes 1-b, but 1-a leads to a new issue, described below:
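
To make the ordering problem concrete, here is a minimal sketch of it. This is not the IWC source: makeMonitor, simulateReload and the shapes of the lock and window records are assumptions invented for the illustration, and only the names openWindows, updateDataFromStorage and clearJunkLocks are borrowed from the description above.

```javascript
// Hypothetical miniature of a window monitor; not the library's real code.
function makeMonitor(updateBeforeNotify) {
  return {
    openWindows: [],
    listeners: [],
    updateDataFromStorage(windowsInStorage) {
      // Fixed order: refresh openWindows first, then notify listeners.
      if (updateBeforeNotify) this.openWindows = windowsInStorage.slice();
      this.listeners.forEach((listener) => listener());
      // Buggy order: listeners above still saw the old openWindows.
      if (!updateBeforeNotify) this.openWindows = windowsInStorage.slice();
    },
  };
}

// Keeps only the locks whose owner is currently reported as open.
function clearJunkLocks(locks, monitor) {
  return locks.filter((lock) => monitor.openWindows.includes(lock.owner));
}

// Simulates a freshly reloaded tab (tabA) whose openWindows is still empty
// while storage already lists tabA and tabB; tabB holds a valid lock.
// Returns how many locks survive the cleanup.
function simulateReload(updateBeforeNotify) {
  const monitor = makeMonitor(updateBeforeNotify);
  let locks = [{ name: 'signalr', owner: 'tabB' }];
  monitor.listeners.push(() => {
    locks = clearJunkLocks(locks, monitor);
  });
  monitor.updateDataFromStorage(['tabA', 'tabB']);
  return locks.length;
}

console.log(simulateReload(false)); // 0: tabB's valid lock was cleared (1-b)
console.log(simulateReload(true));  // 1: tabB's lock survives
```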

2) After 1) is fixed, a tab reload causes clearJunkLocks to be fired immediately in the other tabs, but that doesn't mean it happens at exactly the same time in all of them. One tab may have done clearJunkLocks and proceeded to obtain the now free lock, while another one does clearJunkLocks right about that same time, sees the lock still belonging to the unloaded tab, and clears it. There's no guarantee about which sequence of clearJunkLocks and lock-obtaining calls will play out. The only fix I could figure out is to rewrite clearJunkLocks so that it is also interlocked, with the same id as the lock-obtaining code. After that, I didn't notice any more duplicate SignalR connections being opened. It's always possible that my manual testing didn't cover all the possible scenarios, but I tested it probably a hundred times and it never happened after the fix.
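
The race in 2) can be replayed deterministically with a small scheduler. The sketch below is an illustration under stated assumptions, not the code from the pull request: each tab is a generator, a `yield` marks a point where another tab may run, and `interlocked` merely models mutual exclusion by running a step to completion without letting anyone in between. The names simulate, obtainLock and interlocked, the storage shape, and the schedule are all hypothetical.

```javascript
// Check-then-remove cleanup; another tab may run between the two halves.
function* clearJunkLocks(storage, openWindows) {
  const owner = storage.lock;
  const isJunk = owner !== null && !openWindows.includes(owner);
  yield; // interleaving point: the decision above may now be stale
  if (isJunk) storage.lock = null;
}

// Models an interlocked call: drains the step with no interleaving.
function* interlocked(step) {
  for (const _ of step) { /* run to completion */ }
}

function obtainLock(storage, self, holders) {
  if (storage.lock === null) {
    storage.lock = self;
    holders.push(self);
  }
}

// Returns the tabs that believe they obtained the lock.
function simulate(useInterlock) {
  const storage = { lock: 'closedTab' }; // lock left by the unloaded tab
  const openWindows = ['tabX', 'tabY'];
  const holders = [];
  const makeTab = (self) => (function* () {
    const clear = clearJunkLocks(storage, openWindows);
    yield* (useInterlock ? interlocked(clear) : clear);
    yield; // the other tab may run before this tab tries to obtain the lock
    obtainLock(storage, self, holders);
  })();
  const tabs = { tabX: makeTab('tabX'), tabY: makeTab('tabY') };
  // One unlucky interleaving: tabY checks early but removes late.
  for (const name of ['tabX', 'tabY', 'tabX', 'tabX', 'tabY', 'tabY']) {
    tabs[name].next();
  }
  return holders;
}

console.log(simulate(false)); // [ 'tabX', 'tabY' ]: tabY wiped tabX's lock
console.log(simulate(true));  // [ 'tabX' ]
```

In the real library the exclusion would have to come from the cross-tab interlocked call itself; the generator trick here only stands in for it inside a single process.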

I made a pull request (#8) with everything above. I leave it to you to decide whether to use it or find a better way to handle all this.

Finally, having said all this, the additional interlocked clearJunkLocks calls do make everything slightly slower. I'm not sure why clearJunkLocks is even needed; I assume it is for some complex, dynamically triggered locking scenarios. For my purposes, I intend to use only a single lock, the one for SignalR, and it is obtained on every tab load in my case. So I changed your code so that it simply never calls clearJunkLocks, since the junk detection is already embedded in the lock-obtaining code. That also works: I haven't noticed any duplicate connections, and it's faster. For people who decide to do this, the only thing related to clearJunkLocks that must not be commented out is the setLocksInitialized code, which always needs to be called at the library's start.
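
As a rough illustration of what "junk detection embedded in the lock-obtaining code" could look like, here is a minimal sketch. It is an assumption about the general shape of such a check, not the library's implementation: tryObtainLock and its parameters are hypothetical, and a plain Map stands in for localStorage. It also ignores the cross-tab synchronization that the interlocked call provides in the real library.

```javascript
// Hypothetical lock-obtaining step that treats a lock owned by a closed
// window as free, so no separate cleanup pass is needed for correctness.
function tryObtainLock(storage, lockName, self, openWindows) {
  const owner = storage.get(lockName);
  const ownerIsOpen = owner !== undefined && openWindows.includes(owner);
  if (ownerIsOpen && owner !== self) return false; // held by a live tab
  storage.set(lockName, self); // free, junk, or already ours: take it
  return true;
}

const storage = new Map([['signalr', 'closedTab']]); // stale lock left behind
const openWindows = ['tabA', 'tabB'];

const gotA = tryObtainLock(storage, 'signalr', 'tabA', openWindows);
const gotB = tryObtainLock(storage, 'signalr', 'tabB', openWindows);
console.log(gotA, gotB); // true false
```

The trade-off is that stale entries for locks nobody requests again are never removed from storage.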

Yevhenii Khudyi · Answer 5 · Wed Nov 01 2017 20:06:01 GMT+0800 (China Standard Time)

@jasmh Thank you for your investigation. The purpose of clearJunkLocks is to avoid polluting localStorage (its size is limited to several megabytes). But you are right that in most cases this is not a problem. If you want to eliminate clearJunkLocks, be aware that setLocksInitialized must be called after WindowMonitor is ready.

Krzysztof Chodak · Answer 6 · Fri Jun 26 2020 18:52:06 GMT+0800 (China Standard Time)

Hi, could someone confirm that this issue is still valid or could be closed thanks to pull request (#8)?