jacksondunstan / UnityNativeScripting

Unity Scripting in C++

Home Page:https://jacksondunstan.com/articles/3938

Performance question - Having lots of C++ MonoBehaviours is slower than expected.

AustinSmith13 opened this issue

Hello,

I am planning on using your project for Lua bindings instead of MoonSharp. As a quick test to see how fast it runs, I spawned 1,000 GameObjects with the BaseBall script.

I also changed the object store in both C# and C++ to allow up to 10,000 objects.

The result was poor: each script was taking 0.078 ms in the Editor and 0.070 ms in the IL2CPP build.

I then implemented the BaseBall script in C# and performed the same test.

My results were much better: each script was taking about 0.01 ms to complete in the Editor.

This is my first time trying something like this in Unity3D. I read your blog and it looked like you were getting much better results: "C++ can still make 13,140 Unity API calls in a single millisecond." Executing the C++ scripts is taking me 70 - 80 ms, which seems wrong to me.

Is it better to only have one monobehaviour that manages multiple game objects?

That is indeed quite poor performance. In the article you're referring to, I ran on an LG Nexus 5X which wasn't even a fast Android phone in 2017. I was also using Unity 2017.1, which is now quite an old version. Despite these challenges, I still got an average of 0.0000761 ms per call into C++. This means you're getting about 1000x worse performance.

So I wonder what the difference is between our two tests. There's no way your test device is 1000x slower than mine, so the cause is likely software: Unity/IL2CPP, OS/Android, UnityNativeScripting, and the benchmark code itself could all be at fault. Could you post your fork of the repo so that I can take a look?
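The per-call figure and the roughly 1000x gap quoted above follow directly from the numbers in this thread. A minimal sketch of that arithmetic is below; the function names are made up for illustration, and the comparison takes the reported per-script time at face value even though one script update makes several API calls.

```cpp
#include <cstdio>

// Convert a throughput in calls per millisecond into a cost in ms per call.
constexpr double MsPerCall(double callsPerMs)
{
	return 1.0 / callsPerMs;
}

// How many times slower `measuredMs` is than `baselineMs`.
constexpr double Slowdown(double measuredMs, double baselineMs)
{
	return measuredMs / baselineMs;
}

// Prints the comparison using the figures quoted in this thread.
inline void PrintComparison()
{
	const double articleMs = MsPerCall(13140.0); // 13,140 calls per ms, from the article
	const double editorMs = 0.078;               // reported Editor time per script
	std::printf("article: %.7f ms per call, slowdown: about %.0fx\n",
		articleMs, Slowdown(editorMs, articleMs));
}
```

Calling `PrintComparison()` reproduces the 0.0000761 ms figure and a slowdown on the order of 1000x.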

Sure, I appreciate the help. I'll post a fork with the setup.

@jacksondunstan I came back to this again and gave it another shot. I haven't forked yet because it's tied heavily into our project.

Uses .NET 4.0 and IL2CPP
Unity version 2019.2.9f1

Each test spawns 1000 objects that go back and forth.

The C# implementation runs in 8 ms.

The C++ implementation runs in 90 ms.

The Moonsharp Lua implementation runs in 26 ms.

I attached the implementations in case you're interested. I'll try to fork this sometime this week; I'm just posting my results for now.

C#

public class BallBehavior : MonoBehaviour
{
    public static float ballDir = -1;

    // Start is called before the first frame update
    void Start ()
    {

    }

    // Update is called once per frame
    void Update ()
    {
        Transform transform = this.gameObject.transform;
        Vector3 pos = transform.position;

        float speed = 3.2f;
        float min = -1.5f;
        float max = 1.5f;
        float distance = UnityEngine.Time.deltaTime * speed * ballDir;
        Vector3 offset = new Vector3 (distance, 0, 0);
        Vector3 newPos = pos + offset;
        if (newPos.x > max)
        {
            ballDir *= -1.0f;
            newPos.x = max - (newPos.x - max);
            if (newPos.x < min)
            {
                newPos.x = min;
            }
        }
        else if (newPos.x < min)
        {
            ballDir *= -1.0f;
            newPos.x = min + (min - newPos.x);
            if (newPos.x > max)
            {
                newPos.x = max;
            }
        }

        transform.position = newPos;
    }
}

Lua

ballDir = -1

function ENTITY:Start ()
    Log('hello world')
end

function ENTITY:Update (deltatime)
    local pos = self:GetPosition()

    local speed = 2.2
    local min = -3.5
    local max = 3.5
    local distance = deltatime * speed * ballDir
    local offset = Vector3(distance, 0, 0)
    local newPos = pos + offset

    if newPos.x > max then
        ballDir = ballDir * -1
        newPos.x = max - (newPos.x - max)
        if newPos.x < min then
            newPos.x = min
        end
    elseif newPos.x < min then
        ballDir = ballDir * -1
        newPos.x = min + (min - newPos.x)
        if newPos.x > max then
            newPos.x = max
        end
    end

    self:SetPosition(newPos)
end

The C++ implementation

#include "Bindings.h"
#include "Game.h"

extern "C"
{
#include "lua.h"
#include "lauxlib.h"
#include "lualib.h"
}

using namespace System;
using namespace UnityEngine;

namespace
{
struct GameState
{
	float BallDir;
};

GameState *gameState;
} // namespace

namespace MyGame
{
void BallScript::Update()
{
	Transform transform = GetTransform();
	Vector3 pos = transform.GetPosition();

	const float speed = 3.2f;
	const float min = -1.5f;
	const float max = 1.5f;
	float distance = Time::GetDeltaTime() * speed * gameState->BallDir;
	Vector3 offset(distance, 0, 0);
	Vector3 newPos = pos + offset;
	if (newPos.x > max)
	{
		gameState->BallDir *= -1.0f;
		newPos.x = max - (newPos.x - max);
		if (newPos.x < min)
		{
			newPos.x = min;
		}
	}
	else if (newPos.x < min)
	{
		gameState->BallDir *= -1.0f;
		newPos.x = min + (min - newPos.x);
		if (newPos.x > max)
		{
			newPos.x = max;
		}
	}
	transform.SetPosition(newPos);
}
} // namespace MyGame

// Called when the plugin is initialized
// This is mostly full of test code. Feel free to remove it all.
void PluginMain(
	void *memory,
	int32_t memorySize,
	bool isFirstBoot)
{
	gameState = (GameState *)memory;
	if (isFirstBoot)
	{
		lua_State *L = luaL_newstate();
		luaL_dostring(L, "return 'lua is working! 2'");

		const char *str = lua_tostring(L, -1);

		Debug::Log(String(str));

		lua_close(L);

		String message("Game booted up");
		Debug::Log(message);

		// The ball initially goes right
		gameState->BallDir = 1.0f;

		for (int32_t i = 0; i < 1000; i++)
		{
			// Create the ball game object out of a cube primitive
			GameObject go = GameObject::CreatePrimitive(PrimitiveType::Cube);

			String name("GameObject with a BallScript");
			go.SetName(name);

			// Attach the ball script to make it bounce back and forth
			go.AddComponent<MyGame::BaseBallScript>();
		}
	}
}

I figured out what the issue was. Setting const int BaseMaxSimultaneous = 5000; to a higher value causes poor performance.

// Look up the object in the hash table
int initialIndex = (int)(
	((uint)obj.GetHashCode()) % maxObjects);
int index = initialIndex;
do
{
	if (object.ReferenceEquals(keys[index], obj))
	{
		return values[index];
	}
	index = (index + 1) % maxObjects;
}
while (index != initialIndex);

It's spending too much time trying to find an object's handle.
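Going only by the excerpt above, the loop stops on a match or after wrapping all the way around, and it has no early exit when it reaches an empty slot. That would make every lookup of an object that is not yet stored cost `maxObjects` probes, which could explain why raising the limit hurts. The C++ sketch below mirrors that loop and counts probes. `ProbeTable` and `CountProbes` are hypothetical names, not part of UnityNativeScripting, and the table is assumed to have at least one slot.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the object store's key table: a fixed-size
// array of pointers searched with linear probing, like the loop above.
// Assumes maxObjects > 0.
struct ProbeTable
{
	std::vector<const void*> keys;

	explicit ProbeTable(std::size_t maxObjects)
		: keys(maxObjects, nullptr)
	{
	}

	// Returns how many slots are inspected while looking for `obj`,
	// starting at `hash % maxObjects`. Like the loop above, it only stops
	// on a match or after wrapping around to the starting index.
	std::size_t CountProbes(const void* obj, std::size_t hash) const
	{
		const std::size_t maxObjects = keys.size();
		const std::size_t initialIndex = hash % maxObjects;
		std::size_t index = initialIndex;
		std::size_t probes = 0;
		do
		{
			probes++;
			if (keys[index] == obj)
			{
				return probes;
			}
			index = (index + 1) % maxObjects;
		}
		while (index != initialIndex);
		return probes;
	}
};
```

With this sketch, a miss in a table of 10,000 slots inspects all 10,000 of them, while a hit at the hashed slot inspects just one.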

The Fix

So I changed this to cache handles in a dictionary instead and to use the stack to keep track of free handles. Now I can set the object store's max size to 10,000 without the performance issues I saw previously.

I'll submit a pull request to show the changes I made if you're interested.
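The approach described above, a dictionary for object-to-handle lookups plus a stack of free handles, can be sketched as follows. This is a C++ illustration of the idea only, not the code from the pull request (the change in this thread was made in the project's own object store). `HandleStore` and its members are hypothetical names, and reserving handle 0 to mean "no object" is an assumption of the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical handle store; names and layout are illustrative only.
class HandleStore
{
public:
	// Assumes maxObjects >= 0. Handle 0 is reserved to mean "no object".
	explicit HandleStore(int32_t maxObjects)
		: objects(static_cast<std::size_t>(maxObjects) + 1, nullptr)
	{
		// Push handles in descending order so that handle 1 is popped first.
		for (int32_t handle = maxObjects; handle >= 1; handle--)
		{
			freeHandles.push_back(handle);
		}
	}

	// Returns the existing handle for `obj`, or allocates a new one.
	// Returns 0 if `obj` is null or the store is full.
	int32_t Store(const void* obj)
	{
		if (obj == nullptr) return 0;
		const auto found = handleByObject.find(obj); // average O(1) lookup
		if (found != handleByObject.end()) return found->second;
		if (freeHandles.empty()) return 0;
		const int32_t handle = freeHandles.back();
		freeHandles.pop_back();
		objects[static_cast<std::size_t>(handle)] = obj;
		handleByObject.emplace(obj, handle);
		return handle;
	}

	// Returns the object for `handle`, or nullptr if the handle is unused.
	const void* Get(int32_t handle) const
	{
		if (handle <= 0) return nullptr;
		const std::size_t index = static_cast<std::size_t>(handle);
		return index < objects.size() ? objects[index] : nullptr;
	}

	// Releases `handle` and returns it to the free stack for reuse.
	void Remove(int32_t handle)
	{
		const void* obj = Get(handle);
		if (obj == nullptr) return;
		handleByObject.erase(obj);
		objects[static_cast<std::size_t>(handle)] = nullptr;
		freeHandles.push_back(handle);
	}

private:
	std::vector<const void*> objects;                        // handle -> object
	std::unordered_map<const void*, int32_t> handleByObject; // object -> handle
	std::vector<int32_t> freeHandles;                        // used as a stack
};
```

In this design the cost of `Store` and `Remove` no longer depends on the store's capacity, so raising the maximum only costs memory.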

Hello!
So, did the C++ script outperform the C# equivalent?

@Dimous Depending on what you're making, it seems it can be faster. In my case that was embedding Lua.

My results: Spawns 1000 cubes that go back and forth

  • C# 11.1 ms
  • C++ 11.1 ms

Unfortunately they were capped at 90 fps when I tested.

C++ did outperform C# MoonSharp for embedding Lua. It was several times faster.

@jacksondunstan How would I have C# call C++, or have some sort of two-way communication using the generated bindings?

@AustinSmith13 Thanks for posting all of your findings and for your PR! It does look like you ended up capped at 90 FPS, which is 1000ms / 90 ≈ 11.1ms, with both versions. Glad to hear it at least beat MoonSharp. 👍

As for two-way communication using the generated bindings, all of the bound C# functionality is accessible to C++. This is how you're able to call functions like Debug::Log. Calls from C# into C++ work the way BallScript does in the example. See this article for full details of how it works behind the scenes and for some examples. For more raw function call access, feel free to use P/Invoke directly.
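For the raw P/Invoke route mentioned above, the C++ side usually comes down to exporting plain C functions from the native plugin. The sketch below is one way that could look; it is not part of the generated bindings. `AddInts`, `SetLogCallback`, `SayHello`, the `EXAMPLE_EXPORT` macro, and the library name `NativeScript` in the comments are all hypothetical.

```cpp
#include <cstdint>

// Export with C linkage so that C# can find the symbols by name.
#if defined(_WIN32)
	#define EXAMPLE_EXPORT extern "C" __declspec(dllexport)
#else
	#define EXAMPLE_EXPORT extern "C" __attribute__((visibility("default")))
#endif

namespace
{
// Callback supplied by C#, stored as a plain function pointer.
void (*logCallback)(const char* message) = nullptr;
} // namespace

// C# -> C++. The matching C# declaration would look roughly like:
//   [DllImport("NativeScript")] static extern int AddInts(int a, int b);
EXAMPLE_EXPORT int32_t AddInts(int32_t a, int32_t b)
{
	return a + b;
}

// C# passes a delegate here; it arrives in C++ as a function pointer.
EXAMPLE_EXPORT void SetLogCallback(void (*callback)(const char* message))
{
	logCallback = callback;
}

// C++ -> C#: invoke the stored callback, if one has been registered.
EXAMPLE_EXPORT void SayHello()
{
	if (logCallback != nullptr)
	{
		logCallback("Hello from C++");
	}
}
```

On IL2CPP, the C# method passed to a function like `SetLogCallback` generally has to be static and marked with the `MonoPInvokeCallback` attribute, so check Unity's documentation for your version before relying on this pattern.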