Performance question - Having lots of C++ MonoBehaviours is slower than expected.
AustinSmith13 opened this issue · comments
Hello,
I am planning on using your project for Lua bindings instead of MoonSharp. As a quick test to see how fast it runs, I spawned 1,000 GameObjects with the BaseBall script.
I also changed the object store in both C# and C++ to allow up to 10,000 objects.
The result was poor: each script was taking 0.078 ms in the Editor and 0.070 ms in the IL2CPP build.
I then implemented the BaseBall script in C# and performed the same test.
The results were much better: each script was taking about 0.01 ms to complete in the Editor.
This is my first time trying something like this in Unity3D. I read your blog and it looked like you were getting much better results: "C++ can still make 13,140 Unity API calls in a single millisecond." Executing the C++ scripts is taking me 70-80 ms, which seems wrong to me.
Is it better to have only one MonoBehaviour that manages multiple game objects?
That is indeed quite poor performance. In the article you're referring to, I ran on an LG Nexus 5X which wasn't even a fast Android phone in 2017. I was also using Unity 2017.1, which is now quite an old version. Despite these challenges, I still got an average of 0.0000761 ms per call into C++. This means you're getting about 1000x worse performance.
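The arithmetic behind that estimate can be checked with a short sketch. The inputs are the figures quoted in this thread (13,140 calls per millisecond and 0.078 ms per script in the Editor); the function names are hypothetical. One script update makes several API calls, so the ratio is only a rough order of magnitude.

```cpp
// Per-call cost in milliseconds, given a throughput in calls per millisecond.
constexpr double MsPerCall(double callsPerMs)
{
    return 1.0 / callsPerMs;
}

// How many times slower a measured cost is than a reference cost.
constexpr double Slowdown(double measuredMs, double referenceMs)
{
    return measuredMs / referenceMs;
}
```

Here `MsPerCall(13140.0)` is roughly 0.0000761 ms, and `Slowdown(0.078, MsPerCall(13140.0))` is roughly 1,025.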
So I wonder what the difference is between our two tests. There's no way your test device is 1000x slower than mine, so the cause is likely software: Unity/IL2CPP, OS/Android, UnityNativeScripting, and the benchmark code itself could all be at fault. Could you post your fork of the repo so that I can take a look?
Sure, I appreciate the help. I'll post a fork with the setup.
@jacksondunstan I came back to this again and gave it another shot. I haven't forked yet because it's tied heavily into our project.
Uses .NET 4.0 and IL2CPP
Unity version 2019.2.9f1
Each test spawns 1000 objects that go back and forth.
The C# implementation runs in 8 ms.
The C++ implementation runs in 90 ms.
The MoonSharp Lua implementation runs in 26 ms.
I attached the implementations in case you're interested. I'll try to fork this sometime this week; I'm just posting my results for now.
C#
using UnityEngine;

public class BallBehavior : MonoBehaviour
{
    // Shared by every ball, so all balls reverse direction together
    public static float ballDir = -1;

    // Start is called before the first frame update
    void Start ()
    {
    }

    // Update is called once per frame
    void Update ()
    {
        Transform transform = this.gameObject.transform;
        Vector3 pos = transform.position;
        float speed = 3.2f;
        float min = -1.5f;
        float max = 1.5f;
        float distance = UnityEngine.Time.deltaTime * speed * ballDir;
        Vector3 offset = new Vector3 (distance, 0, 0);
        Vector3 newPos = pos + offset;
        if (newPos.x > max)
        {
            // Bounce off the right edge
            ballDir *= -1.0f;
            newPos.x = max - (newPos.x - max);
            if (newPos.x < min)
            {
                newPos.x = min;
            }
        }
        else if (newPos.x < min)
        {
            // Bounce off the left edge
            ballDir *= -1.0f;
            newPos.x = min + (min - newPos.x);
            if (newPos.x > max)
            {
                newPos.x = max;
            }
        }
        transform.position = newPos;
    }
}
Lua
ballDir = -1

function ENTITY:Start ()
    Log('hello world')
end

function ENTITY:Update (deltatime)
    local pos = self:GetPosition()
    local speed = 2.2
    local min = -3.5
    local max = 3.5
    local distance = deltatime * speed * ballDir
    local offset = Vector3(distance, 0, 0)
    local newPos = pos + offset
    if newPos.x > max then
        ballDir = ballDir * -1
        newPos.x = max - (newPos.x - max)
        if newPos.x < min then
            newPos.x = min
        end
    elseif newPos.x < min then
        ballDir = ballDir * -1
        newPos.x = min + (min - newPos.x)
        if newPos.x > max then
            newPos.x = max
        end
    end
    self:SetPosition(newPos)
end
The C++ implementation
#include "Bindings.h"
#include "Game.h"

extern "C"
{
#include "lua.h"
#include "lauxlib.h"
#include "lualib.h"
}

using namespace System;
using namespace UnityEngine;

namespace
{
    struct GameState
    {
        float BallDir;
    };

    GameState *gameState;
} // namespace

namespace MyGame
{
    void BallScript::Update()
    {
        Transform transform = GetTransform();
        Vector3 pos = transform.GetPosition();
        const float speed = 3.2f;
        const float min = -1.5f;
        const float max = 1.5f;
        float distance = Time::GetDeltaTime() * speed * gameState->BallDir;
        Vector3 offset(distance, 0, 0);
        Vector3 newPos = pos + offset;
        if (newPos.x > max)
        {
            gameState->BallDir *= -1.0f;
            newPos.x = max - (newPos.x - max);
            if (newPos.x < min)
            {
                newPos.x = min;
            }
        }
        else if (newPos.x < min)
        {
            gameState->BallDir *= -1.0f;
            newPos.x = min + (min - newPos.x);
            if (newPos.x > max)
            {
                newPos.x = max;
            }
        }
        transform.SetPosition(newPos);
    }
} // namespace MyGame

// Called when the plugin is initialized
// This is mostly full of test code. Feel free to remove it all.
void PluginMain(
    void *memory,
    int32_t memorySize,
    bool isFirstBoot)
{
    gameState = (GameState *)memory;
    if (isFirstBoot)
    {
        // Smoke test that the embedded Lua runtime works
        lua_State *L = luaL_newstate();
        luaL_dostring(L, "return 'lua is working! 2'");
        const char *str = lua_tostring(L, -1);
        Debug::Log(String(str));
        lua_close(L);

        String message("Game booted up");
        Debug::Log(message);

        // The ball initially goes right
        gameState->BallDir = 1.0f;

        for (int32_t i = 0; i < 1000; i++)
        {
            // Create the ball game object out of a cube primitive
            GameObject go = GameObject::CreatePrimitive(PrimitiveType::Cube);
            String name("GameObject with a BallScript");
            go.SetName(name);

            // Attach the ball script to make it bounce back and forth
            go.AddComponent<MyGame::BaseBallScript>();
        }
    }
}
I figured out what the issue was: const int BaseMaxSimultaneous = 5000;
Setting this to a higher value causes poor performance.
// Look up the object in the hash table
int initialIndex = (int)(
    ((uint)obj.GetHashCode()) % maxObjects);
int index = initialIndex;
do
{
    if (object.ReferenceEquals(keys[index], obj))
    {
        return values[index];
    }
    index = (index + 1) % maxObjects;
}
while (index != initialIndex);
It's spending too much time trying to find an object's handle.
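One plausible reading of why a larger table hurts (an assumption; the thread does not confirm it) is that the quoted loop has no early exit on an empty slot, so looking up an object that is not in the table visits every slot. The sketch below is a simplified C++ model of that loop, not the project's code: `ProbeTable`, `Insert`, and `ProbesFor` are hypothetical names, and integer keys stand in for object references.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of the quoted lookup: open addressing with linear probing
// and no early exit on an empty slot. Key 0 means "empty", so only non-zero
// keys may be stored or searched for. maxObjects must be greater than zero.
struct ProbeTable
{
    std::vector<std::uint32_t> keys;

    explicit ProbeTable(std::size_t maxObjects) : keys(maxObjects, 0) {}

    // Store a key in the first free slot at or after its home slot.
    // Returns false when the table is full.
    bool Insert(std::uint32_t key)
    {
        std::size_t index = key % keys.size();
        for (std::size_t i = 0; i < keys.size(); ++i)
        {
            if (keys[index] == 0)
            {
                keys[index] = key;
                return true;
            }
            index = (index + 1) % keys.size();
        }
        return false;
    }

    // Count the slots visited while searching for a key.
    std::size_t ProbesFor(std::uint32_t key) const
    {
        const std::size_t initialIndex = key % keys.size();
        std::size_t index = initialIndex;
        std::size_t probes = 0;
        do
        {
            ++probes;
            if (keys[index] == key)
            {
                return probes; // found
            }
            index = (index + 1) % keys.size();
        } while (index != initialIndex);
        return probes; // miss: every slot was visited
    }
};
```

In this model a hit on an uncontended slot costs one probe, while a miss costs one probe per slot, so doubling the table size doubles the cost of every miss.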
The Fix
So I changed this to cache handles in a dictionary instead and to use a stack to keep track of free handles. Now I can set the object store's max size to 10,000 without the performance issues I had previously.
I'll submit a pull request to show the changes I made if you're interested.
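The idea described above (a dictionary from object to handle plus a stack of free handles) can be sketched as follows. This is a C++ analogue written for illustration, not the code from the pull request: `HandleStore` and its methods are hypothetical names, raw pointers stand in for C# object references, and handle 0 is assumed to mean "no object".

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical handle store: maps objects to small integer handles.
// Lookup by object is a hash-map query, so its cost does not grow with
// the configured maximum. maxObjects must not be negative.
class HandleStore
{
public:
    explicit HandleStore(std::int32_t maxObjects)
        : objects_(static_cast<std::size_t>(maxObjects) + 1, nullptr)
    {
        // Push in reverse so the lowest handles are handed out first.
        for (std::int32_t h = maxObjects; h >= 1; --h)
        {
            freeHandles_.push_back(h);
        }
    }

    // Returns the existing handle for obj, or assigns a new one.
    // Returns 0 for a null object or when the store is full.
    std::int32_t Store(const void *obj)
    {
        if (obj == nullptr)
        {
            return 0;
        }
        auto it = handles_.find(obj);
        if (it != handles_.end())
        {
            return it->second;
        }
        if (freeHandles_.empty())
        {
            return 0;
        }
        const std::int32_t handle = freeHandles_.back();
        freeHandles_.pop_back();
        handles_.emplace(obj, handle);
        objects_[handle] = obj;
        return handle;
    }

    // Returns the object for a handle, or nullptr if the handle is unused.
    const void *Get(std::int32_t handle) const
    {
        if (handle <= 0 || handle >= static_cast<std::int32_t>(objects_.size()))
        {
            return nullptr;
        }
        return objects_[handle];
    }

    // Releases a handle so that it can be reused.
    void Remove(std::int32_t handle)
    {
        const void *obj = Get(handle);
        if (obj == nullptr)
        {
            return;
        }
        handles_.erase(obj);
        objects_[handle] = nullptr;
        freeHandles_.push_back(handle);
    }

private:
    std::unordered_map<const void *, std::int32_t> handles_; // object -> handle
    std::vector<const void *> objects_;                      // handle -> object
    std::vector<std::int32_t> freeHandles_;                  // used as a stack
};
```

The trade-off in this sketch is extra memory for the map in exchange for lookups whose cost is independent of the store's capacity.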
Hello!
So, did the C++ script outperform the C# equivalent?
@Dimous Depending on what you're making, it seems it can be faster; in my case that was embedding Lua.
My results: Spawns 1000 cubes that go back and forth
- C# 11.1 ms
- C++ 11.1 ms
Unfortunately they were capped at 90 FPS when I tested.
C++ did outperform C# MoonSharp for embedding Lua. It was several times faster.
@jacksondunstan How would I have C# call C++, or have some sort of two-way communication using the generated bindings?
@AustinSmith13 Thanks for posting all of your findings and for your PR! It does look like you ended up capped at 90 FPS, which is 1000ms / 90 ≈ 11.1ms, with both versions. Glad to hear it at least beat MoonSharp. 👍
As for two-way communication using the generated bindings, all of the bound C# functionality is accessible to C++. This is how you're able to call functions like Debug::Log. For calls from C# into C++, it's done like with BallScript in the example. See this article for full details of how it works behind the scenes and for some examples. For more raw function call access, feel free to use P/Invoke directly.
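For the raw P/Invoke route, the native side of a two-way setup might look like the minimal sketch below. It is an illustration under stated assumptions, not the project's generated bindings: `ExampleInit`, `ExampleUpdate`, and `LogFn` are hypothetical names. The host (C# in practice) would pass in a function pointer for C++ to call back, and would call the exported C functions through matching P/Invoke declarations.

```cpp
#include <cstdint>

// Callback type supplied by the host (hypothetical signature).
using LogFn = void (*)(const char *message);

namespace
{
    LogFn hostLog = nullptr;      // set by the host during initialization
    std::int32_t updateCount = 0; // state owned by the native side
}

// C linkage keeps the exported names unmangled so the host can find them.
extern "C"
{
    // Hypothetical export: the host hands C++ a function it can call back.
    void ExampleInit(LogFn log)
    {
        hostLog = log;
        updateCount = 0;
    }

    // Hypothetical export: the host calls into C++, and C++ calls back
    // into the host through the stored function pointer.
    std::int32_t ExampleUpdate()
    {
        ++updateCount;
        if (hostLog != nullptr)
        {
            hostLog("update from C++");
        }
        return updateCount;
    }
}
```

On some platforms the functions also need an export attribute (for example `__declspec(dllexport)` on Windows) before they are visible outside the shared library.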