Rajawali / Rajawali

Android OpenGL ES 2.0/3.0 Engine

Home Page: https://rajawali.github.io/Rajawali/

Rajawali 2.0

jwoolston opened this issue

@ToxicBakery @MasDennis Your input here is highly desired. I am preparing myself for a large undertaking here and while your assistance would be greatly appreciated, I would settle for some design review. As always, users are welcome and encouraged to comment as well.

This is a summary of my current thoughts so far. I will update with more details as they are formulated/discussed.

Development is happening in the v2.0-development branch here

What It Will Be

Version 2.0 of Rajawali will be a ground-up overhaul of the core of the engine. This will include data management, threading, and the render process as a whole.

What It Won't Be

Rajawali has always been primarily a rendering engine. This rewrite will not change that, though it will make it easier to integrate other features necessary for game engines such as AI, sound and physics. There are great libraries for all of this out there, and we will not be undertaking their development.

Why?

Rajawali got its start in 2011, following in the footsteps of Min3D. Min3D was designed for OpenGL 1.1, and for a variety of entirely valid reasons, Rajawali's design was based on Min3D's. As features have been added to Rajawali over the years, its complexity and power have grown, and these features have been crammed into that same initial design. This has reached a point where trying to fix some of the issues with the library has become an unpleasant and monumental task. I wish to change that.

Some Basic Goals

There are a number of feature requests and bugs still out there. I wish to eventually be able to address all of these, but the primary development focus will be getting the core engine back to its current state (or better), with the following design points:

  • Simplified rendering pipeline
  • Scenes control everything
  • All rendering happens through a system similar to the current post processing system
  • Interface and composition driven
  • Simplified version of ATransformable3D
  • Multi-threaded design

Details Of Each Goal

Simplified rendering pipeline

The current render pipeline and lifecycle are fragmented and difficult to follow. Post processing and color picking use nearly identical methods, yet are controlled in two places. Scenes and Renderers have methods with the same name. I know the Rajawali code inside and out, and even I find it difficult to follow.

  • Renderers should do nothing more than delegate the methods they are required to implement for Android.
  • Each GL context Rajawali handles should have a global object attached to it. All GL calls (or at least the ones which change GL state) will be made through this object, allowing it to track GL state.
    • When an object is rendered, it provides a structure describing the GL state it needs, and only the difference from the current state is applied.
    • This means objects no longer need to prepare/unwind the GL state for their render pass.
    • The state manager can be configured, through a simple call, to skip difference evaluation and always apply the full state request (a rough sketch follows below).
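To make the diffing idea concrete, here is a minimal sketch of what such a per-context state tracker could look like. All names here (GlStateManager, StateRequest, applyState) are hypothetical, and only two pieces of state are covered:

```java
// Hypothetical sketch of a per-context GL state tracker. All names are
// illustrative; the real design may differ.
import android.opengl.GLES20;

public class GlStateManager {

    /** Snapshot of the GL state a draw call cares about. */
    public static class StateRequest {
        public boolean depthTest;
        public boolean blending;
        public int blendSrc = GLES20.GL_SRC_ALPHA;
        public int blendDst = GLES20.GL_ONE_MINUS_SRC_ALPHA;
    }

    private final StateRequest current = new StateRequest();
    private boolean alwaysApplyFullState = false; // skip diffing when true

    /** Apply only the differences between the current and requested state. */
    public void applyState(StateRequest requested) {
        if (alwaysApplyFullState || requested.depthTest != current.depthTest) {
            if (requested.depthTest) GLES20.glEnable(GLES20.GL_DEPTH_TEST);
            else GLES20.glDisable(GLES20.GL_DEPTH_TEST);
            current.depthTest = requested.depthTest;
        }
        if (alwaysApplyFullState || requested.blending != current.blending
                || requested.blendSrc != current.blendSrc
                || requested.blendDst != current.blendDst) {
            if (requested.blending) {
                GLES20.glEnable(GLES20.GL_BLEND);
                GLES20.glBlendFunc(requested.blendSrc, requested.blendDst);
            } else {
                GLES20.glDisable(GLES20.GL_BLEND);
            }
            current.blending = requested.blending;
            current.blendSrc = requested.blendSrc;
            current.blendDst = requested.blendDst;
        }
    }

    /** Configure the manager to skip diff evaluation entirely. */
    public void setAlwaysApplyFullState(boolean always) {
        alwaysApplyFullState = always;
    }
}
```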

Scenes control everything

Right now scenes are somewhat equal partners with the Renderer classes. This has resulted in some things being handled by scenes, while other things are handled by Renderers, and a lot of data is shared between the two. This is a major source of headaches for new and old users alike.

  • Objects (including lights and cameras) can only be added to one scene
    • Geometry data (VBOs and the like) can still be shared
  • Scenes are composed of scene nodes (this means scene graphs will be a core feature now, with a default single-node graph for simple scenes)
    • These nodes will hold objects
    • No child objects. A scene node can be used as a container for a group of objects which should be handled together; this accomplishes the same goal while simplifying the Object3D class. Material batching is handled via the global state manager (see the sketch after this list).
    • Bounding volumes which are rendered for debug purposes are children of the node, not the object
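A rough sketch of the node-as-container idea follows. SceneNode is just a name for illustration, and the Object3D and BoundingVolume stubs are placeholders for whatever the real types end up being:

```java
// Illustrative only; Object3D and BoundingVolume stand in for the real
// renderable and bounding volume types.
import java.util.ArrayList;
import java.util.List;

public class SceneNode {

    interface Object3D { }        // placeholder for the renderable type
    interface BoundingVolume { }  // placeholder for a debug bounding volume

    private final List<Object3D> objects = new ArrayList<>();   // renderables owned by this node
    private final List<SceneNode> children = new ArrayList<>(); // grouping replaces child Object3Ds
    private BoundingVolume debugBounds;                         // belongs to the node, not the object

    public void addObject(Object3D object) {
        objects.add(object); // an object may belong to only one scene; the scene enforces this
    }

    public void addChildNode(SceneNode node) {
        children.add(node); // a group of objects that should be handled together
    }

    public void setDebugBounds(BoundingVolume bounds) {
        debugBounds = bounds; // rendered only when debugging is enabled
    }
}
```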

All rendering happens through a system similar to the current post processing system

By planning for multi-pass rendering from the get-go, we can simplify a lot for ourselves and users.

  • A central anti aliasing manager can be provided which can control Coverage, MSAA, and post processing anti aliasing effects on the fly
  • Simplifies the path forward for deferred shading and Vulkan integration, should we ever get to it

Interface and composition driven

The current design uses a lot of concrete type passing and inheritance. Java is much better suited to interfaces and extension through composition rather than inheritance.

  • Provides for a more flexible design using the type of OOD Java was built for
  • Provides an easier path to testing
  • Simplifies the public API

Simplified version of ATransformable3D

ATransformable3D right now tries to do too much. It attempts to handle concerns for cameras, objects, and lights all at once.

Multi-threaded design

While the current design does a decent job of making operations thread safe with respect to confining GL calls to the correct thread, it was a bit of a band-aid solution and has its flaws.

  • Android lifecycle must be accounted for with operations, particularly on cleanup
  • All GL calls must still be on the GL thread
  • Animations, loaders, etc. should be on different threads.
  • Scenes will still have an initialization method, but it won't be run on the GL thread.
    • This should keep the UI from hanging when TextureView is being used and scene initialization takes a long time.

How will animations be offloaded to another thread? Won't that require locking on the orientation, which will end up slowing everything down more than it is right now? I'm not against it, just wondering if you have a better idea for this, as we know immutable quats are not an option due to memory pressure, and we can't perform partial updates of quats as that would have obviously undesirable results.

Moving loaders to their own thread is a great idea, but I believe this has design considerations similar to the animations. Loaders, like AWD, sometimes need to create multiple models, textures, etc. Some of these operations require interaction with the GL thread, such as creating a texture (right?), and other operations that AWD supports, like shaders, also require the GL thread, I believe.

These aren't things that cannot be overcome, but it does mean the 3D object class, or the scene, or whatever, will need some way to take these requests that must be performed on the GL thread and then post back that the operation has been completed so the loader can continue its work.

That is what I am in the process of working through. I am not sure it can be done in a performant way yet, but it is my hope. My current thinking is something along the lines of a ReentrantReadWriteLock, where each render pass (which includes any queued GL calls such as material creation, texture pushes, etc.) acquires a read lock and each animation step, object add/remove/modify, etc. must acquire a write lock. The theory here is that at each modification step, a waiting render pass would have an opportunity to acquire the lock, whereas all modifications would have to wait until a render pass has finished. Animation timing could be checked at the start of each step rather than each frame. (A rough sketch of this locking idea follows the list below.) What I have yet to solidify in my head is:

  1. Does this actually help, or is it just a bunch of optimistic overhead? Perhaps animations should not be included here, but the question of user code on other threads remains, so I would like to find a possible solution to let other threads manipulate objects.
  2. Does the read-write lock help, or would a simple ReentrantLock, or even just a synchronized block or concurrent queue structure, work better?
  3. Is there a better way? Other options I have considered are some sort of transactional or snapshot system. Obviously the complexity here can explode. One thing I have ruled out is partial tree locking, such as the render pass only needing half the tree, so it acquires a lock for the nodes of that half while an animation touches the other half. The issue there is that it would be possible to have a render pass which shows an object, a pass which does not, followed by another pass which does.
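To make the ReentrantReadWriteLock idea above concrete, here is a bare-bones sketch; the class and method names are illustrative only and not existing code:

```java
// Rough sketch of the proposed locking scheme; purely illustrative.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SceneLock {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair ordering

    /** Called on the GL thread: a render pass holds the read lock for the whole pass. */
    public void renderPass(Runnable drawFrame) {
        lock.readLock().lock();
        try {
            drawFrame.run(); // queued GL tasks plus the actual draw calls
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Called from any thread: one animation step, object add/remove/modify, etc. */
    public void modifyScene(Runnable modification) {
        lock.writeLock().lock();
        try {
            modification.run(); // keep this short so a waiting render pass isn't starved
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```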

Moving loaders to their own thread is a great idea, but I believe this has design considerations similar to the animations. Loaders, like AWD, sometimes need to create multiple models, textures, etc. Some of these operations require interaction with the GL thread, such as creating a texture (right?), and other operations that AWD supports, like shaders, also require the GL thread, I believe.

These aren't things that cannot be overcome, but it does mean the 3D object class, or the scene, or whatever, will need some way to take these requests that must be performed on the GL thread and then post back that the operation has been completed so the loader can continue its work.

To address this, my thinking was as follows (we have discussed variations on this periodically):

All objects which require touching GL for their initialization to complete will be created by whatever thread wishes to create them, and added to a work queue for the GL thread, similar, though not identical, to the current system. Each of these tasks will have weights assigned to them. For example, creating geometry VBOs is generally speaking a lightweight operation and could be assigned a weight of 1. Texture pushes are very heavy and could be assigned a weight of 10. Other operations could fall in between. Through testing, we could determine a reasonable number of each task that can be performed per frame for different classes of devices, and optimistically assign a maximum weight sum. The render pass would execute tasks from the queue until it hit this threshold, then move on to rendering until the next pass.
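As a hypothetical illustration of that weighted queue (the weights, the per-frame budget, and all names are invented for the example):

```java
// Hypothetical weighted GL task queue; not part of the current code base.
import java.util.concurrent.ConcurrentLinkedQueue;

public class GlTaskQueue {

    /** A unit of GL work with a relative cost. */
    public interface GlTask {
        int weight();   // e.g. 1 for a small VBO upload, 10 for a large texture push
        void run();     // executed on the GL thread only
    }

    private final ConcurrentLinkedQueue<GlTask> tasks = new ConcurrentLinkedQueue<>();
    private final int maxWeightPerFrame;

    public GlTaskQueue(int maxWeightPerFrame) {
        this.maxWeightPerFrame = maxWeightPerFrame; // tuned per device class
    }

    /** Any thread may enqueue work; FIFO order is preserved. */
    public void enqueue(GlTask task) {
        tasks.add(task);
    }

    /** Called once per render pass, before drawing, on the GL thread. */
    public void drainForFrame() {
        int spent = 0;
        GlTask next;
        while ((next = tasks.peek()) != null) {
            // Always run at least one task per frame so an over-budget task cannot stall forever.
            if (spent > 0 && spent + next.weight() > maxWeightPerFrame) break;
            tasks.poll();
            next.run();
            spent += next.weight();
        }
    }
}
```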

To this end, no final GL initialization will be done on objects until the render pass has processed their add/remove/update. However, this is not to say that certain changes cannot be immediate. In the case of destroying an object, it can be removed from the scene immediately by any thread which is able to acquire the lock. This will prevent a render pass from attempting to draw it. As part of the removal operation, the commands to destroy its geometry data (if appropriate) can be queued.

In order to preserve the FIFO needs of these tasks with minimal overhead, no attempt will be made to fix the situation of a developer adding an object which uses a texture prior to adding the texture. We can, however, use initialization flags to indicate that not all of the object's resources are ready yet. Alternatively, we can simply allow it to happen in the interests of efficiency, as in my experience OpenGL will just drop those render attempts and set an error code without crashing.

As I see it, in the vast majority of cases the queue will be empty or nearly empty. Even when adding large models, I expect the queue to be able to drain within a handful of frames.

each animation step, object add/remove/modify, etc must acquire a write lock

To clarify, when I say each animation step, I mean each time step of an individual animation (or group if they must be locked together). So Animation 1 will acquire a lock, make its step, release the lock. Animation 2 must then acquire a lock, make its step, release, etc.

Regarding animations: As a (probably extreme) use case data point, my app choreographs upwards of 15 simultaneous (customized) animations, some of which might update upwards of 30 objects each, with multiple properties animated per object/group, and with ramping ("2nd derivative") loop durations (to provide smooth transitions between steady state animations).

I would be very surprised if the sync overhead/non-determinism of running all of those in separate threads didn't make something glitch badly, or even break, especially the ramping. Right now, it works fine... Maybe I don't understand how it might work, but I'm not sure what the upside would be either.

The idea with the new system is that when anything (say, an animation in this case) wants to modify the position/scale/rotation of objects, and hence the scene structure, it must request the ability to do so. This is implemented as a visitor-type pattern where you pass an object, in this case an instance of a new interface SceneModifier, which will have its doModifications(SceneGraph) method called. The intention here is that the caller will batch multiple changes into one sync call IF THEY ARE RELATED. In your case, since your animations run in lock step with each other, they are related and should all be run with one call. This would result in no difference from the existing system. The advantage of doing things this way is to allow for more complex animation logic without forcing it to be tied to the renderer (think streaming data playback, AI processing, multiplayer network transfer).

Now, in cases where render time and animation time are coupled this becomes a bit of an issue, and for that case, and yours, I have yet to come up with a completely thought-through solution. You are the third person to raise the sync overhead concerns regarding this rewrite and I make no claims that they aren't legitimate, but I think it is too hard to tell at this point. My focus right now is on getting back to a rendering state so I can then write some threading stress tests to see what happens, at which point I will be soliciting additional feedback from people.

If you have some pseudo-code, real code or anything like that you are willing/able to share that gives me a better idea of your use case, I am more than happy to take it into account. I am happy to keep it private as well if need be, or work on rolling it into a stress test if permissible. Also, is your app in the marketplace? I would love to see an example of Rajawali being used to a high degree...it always helps keep the motivation level high.
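For illustration, here is one way the SceneModifier visitor could look in code. SceneModifier and doModifications(SceneGraph) are the names mentioned above, while the SceneGraph stub and the ChoreographedAnimations caller are purely hypothetical:

```java
// SceneGraph here is only a stub so the sketch compiles; the real type will
// expose whatever mutation API the new scene graph ends up with.
interface SceneGraph { }

interface SceneModifier {
    void doModifications(SceneGraph graph);
}

// Hypothetical caller: several lock-stepped animations batched into a single
// sync call, so a render pass never observes a partially updated set of objects.
class ChoreographedAnimations implements SceneModifier {
    @Override
    public void doModifications(SceneGraph graph) {
        // stepAnimationA(deltaTime);  // each related animation advances here,
        // stepAnimationB(deltaTime);  // inside one acquisition of the write lock
    }
}
```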

That is a good use case to be concerned about. Thank you for commenting @rpicolet

In your case, since your animations run in lock step with each other, they are related and should all be run with one call.

I think I get the basic drift, even though I'm not sure how the Visitor pattern would work in detail. I think you are saying that all registered animation update()s that currently run synchronously at the start of each frame's Scene.render() could still be run as a group in a different synchronous method (SceneModifier.doModifications()) still on the GL thread, but not in the render() method itself? If so, that seems like it addresses one concern.

Another concern is that animation update()s are currently all driven by the deltaTime input parameter. While it seems straightforward to guarantee that all the doModifications() updates for a frame would access/see the same value, if the animation frame itself is decoupled from the render (presumably driven by a separate timer thread that would not be in "lock-step" with render()?), it seems like there would be no guarantee that deltaTimes are not skipped. And that seems very likely to glitch if a prior render with an unusually long delta gets skipped, and is then likely followed by a similarly shortened delta for the unskipped frame, especially at transform, loop, and ramp edges (which can all occur independently in my custom animations). I also post a lot of update methods to the GL thread (due to lots of user interactivity), which can obviously make render() more jittery/less than monotonic, and exacerbate the skipping just when it is most noticeable.

Also, driving the animation timing from another thread seems like it would just add overhead, for the timer thread itself and for queuing up the resulting Visitor call to the GL thread, as well as increasing latency and jitter. As the threads would likely have the same or similar update rates, they could end up "beating" or "phasing" against each other, again causing additional friction, even if deltaTime is calculated by the timer thread.

Ultimately, it just seems to me like having the most accurate and up-to-date deltas possible at update() time would allow the least glitching, the smoothest results, and the best performance for animations.

Of course, for reading/parsing model files or other non-critical "background" tasks with no hard deadline or obvious visual consequences (other than it takes a fraction of a second longer to complete), sure, offloading/queueing/prioritizing syncs makes a lot more sense to me. By comparison, it seems to me like animation updates should be relatively lightweight, even if conditional/math heavy, so multi-threading those just to leverage multi-processing doesn't seem like a big win, performance-wise. OTOH, my 3D model has a pretty low triangle count (by design), so others might well disagree and prefer to offload the GL thread AMAP.

If you have some pseudo-code... [elided] ... rolling it into a stress test if permissible.

I'm quite willing to share my app source code with you (and Tox if interested) given (at least for now) the proviso for privacy. I have been mulling making the whole thing open source as a demo/example for my droidACK MVC library anyway, but haven't quite gotten there yet, so it's not on github just yet. Until then I could zip up the project and send you a link for that via private email if you think that would be useful. I'm not sure it would be very easy to extract test cases, but I'll let you decide that. I will of course try to answer any questions or help with any test dev.

Also, is your app in the marketplace? I would love to see an example of Rajawali being used to a high degree...it always helps keep the motivation level high.

I kinda have it in a closed Alpha on the Play store (it's mostly code complete, but needs lots of resource assets), but again I can just share a link to the apk so you can sideload it, if you like...
I'm not sure it meets the "high degree" mark for Rajawali proper, as most of the complexity is in the user interactivity, scene changes, continuous animations, and 2D integration, while the traditional 3D rendering part is pretty basic (except for object picking, of course ;-) ). However, it may be a good candidate for testing scene graphs (once it is converted, of course), as I use a deep, dynamic, and interactive container hierarchy...

still on the GL thread, but not in the render() method itself

Not quite...on their own thread, but at a time guaranteed to have the GL thread waiting on it.

As for the time deltas, I think this is the biggest potential issue here, and if we don't come up with a novel solution, then animations will stay on the GL thread.

Everything you mention is exactly what I am looking for. I've tried adding some higher level features for things that aren't necessarily the 3D artistic stuff and would love to see a use case of how someone used them, that way I can better improve them. If you want to email me links, that would be great. My address is on my github profile. Confidentiality will be respected to the extreme. If I am able to come up with any testing based on any of it, I will run all of it past you first before committing it.

Not quite...on their own thread, but at a time guaranteed to have the GL thread waiting on it.

If the GL thread is waiting/blocked anyway, what's the advantage to doing the work on a different thread? I'm clearly missing something... But don't feel obliged to educate me. I'm just asking questions in case it helps.

As for the time deltas, I think this is the biggest potential issue here, and if we don't come up with a novel solution, then animations will stay on the GL thread.

Some experiments might be needed to see if there is actually any noticeable glitching. I'm speculating as well here.

Everything you mention is exactly what I am looking for...

OK, sorry it took a while for me to get around to the links, but you should have them in your email now.

BTW, I really like the overall theme of cleaner separation of concerns you are pursuing here for 2.0. As things start to gel for you design-wise and you think there is some spec/coding sub-task or other that I might be qualified to help with, please feel free to ask, no matter how small.

If the GL thread is waiting/blocked anyway, what's the advantage to doing the work on a different thread? I'm clearly missing something... But don't feel obliged to educate me. I'm just asking questions in case it helps.

This is precisely why I created this ticket. I prefer not to operate in a vacuum, so everything from (constructive) criticism to playing devil's advocate is welcome. The advantage, in my mind, would be to structure the animation in such a way that if animation computation takes longer than a single frame for any reason, be it math, networking, whatever, it wouldn't lag the rendering. In light of our discussion, I am beginning to suspect that this is engineering a solution for a non-existent problem. I would like @ToxicBakery to give some thought on this as well.

As things start to gel for you design-wise and you think there is some spec/coding sub-task or other that I might be qualified to help with, please feel free to ask, no matter how small.

I'll be in touch via email.

@rpicolet Regarding contributing to this - to review:

All rendering happens through a system similar to the current post processing system

By planning for multi-pass rendering from the get-go, we can simplify a lot for ourselves and users.

  • A central anti aliasing manager can be provided which can control Coverage, MSAA, and post processing anti aliasing effects on the fly
  • Simplifies the path forward for deferred shading and Vulkan integration, should we ever get to it

Some Details

  • Each scene should have a post processing manager
    • By default it will just contain a render pass to screen (see the sketch after this list).
  • Need to develop an anti aliasing manager
    • It will need to support several modes:
      • Hardware AA: Effects like MSAA, Coverage AA, TXAA which require flags to be set in the EGL configuration and cannot be changed after surface configuration
      • Software AA: Effects like FXAA, SSAA
      • Mixed: A combination of the two
    • These modes should have XML attributes for the views and the ability to be used as OR'd flags, similar to layout gravity rules in FrameLayout
    • The attribute processing in the views will handle bad combinations, which will be determined by the anti aliasing manager
  • The renderer implementation will hold the singular anti aliasing manager. The default anti aliasing mode will be none, however if a view has it configured in XML, the manager will be initialized with this configuration.
  • Scenes will have a method the renderer implementation calls when switching scenes, passing along the view-configured AA configuration. Any software AA effects will need to be added in this method by the scene to its post processing manager. At this time, the scene may make any changes to the software anti aliasing it desires.
  • Color picking should be adapted to this system
    • For an initial overview, the color picking process from the old Scene class should be extracted into an effect which the post processing manager should be aware of so it can be sure to pause rendering after completion of this pass to call any necessary callbacks.
    • I have few details on this at the moment and am open to discussion on them.
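The details above might translate into something like the following sketch; PostProcessingManager, RenderPass, the flag names, and their values are all hypothetical placeholders:

```java
// Hypothetical per-scene pass manager with OR'able AA flags; not existing API.
import java.util.ArrayList;
import java.util.List;

public class PostProcessingManager {

    // AA modes as OR'able flags, analogous to layout gravity in FrameLayout.
    public static final int AA_NONE     = 0;
    public static final int AA_HARDWARE = 1;      // MSAA/Coverage/TXAA, fixed at EGL config time
    public static final int AA_SOFTWARE = 1 << 1; // FXAA/SSAA, added as passes at runtime
    public static final int AA_MIXED    = AA_HARDWARE | AA_SOFTWARE;

    public interface RenderPass {
        void render(); // each pass binds its own material/FBO as needed
    }

    private final List<RenderPass> passes = new ArrayList<>();

    public PostProcessingManager() {
        // By default a scene just renders straight to the screen.
        passes.add(new RenderPass() {
            @Override public void render() { /* draw the scene to the default framebuffer */ }
        });
    }

    /** Called when switching scenes, with the view-configured AA mode. */
    public void applyAntiAliasing(int aaFlags) {
        if ((aaFlags & AA_SOFTWARE) != 0) {
            // Insert an FXAA/SSAA pass before the final screen pass.
        }
    }
}
```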

Ok, so I'm in research and learn mode for a few days... I'll ask the required dumb and edge-exploration questions and raise issues as I go. If you see my understanding/focus/priorities as going the wrong way, please throw flags. Here's a few to get started:

  1. Given all the current and potential-future use cases, might this effort be better thought of as/named unified render-pass management, whether for post-processing, color-picking, deferred shaders, etc.? I have some conceptual difficulties with, e.g., calling color-picking or early passes of deferred shaders "effects" or "post" processing, just because they are not the primary screen render pass.
  2. Will the AA manager interact directly with the new GL state at all? I'm guessing no, since EGL config is one-shot, and probably not included in the GL State properties, and that the software AA render passes will encapsulate any GL state updates. Do I have the right overall picture?
  3. I'm guessing the main reason to raise Vulkan integration as a potential goal is to try to keep all of the rendering details encapsulated in the specific render pass implementations (which would inevitably be impacted), so render pass management itself can better survive/support a migration to/integration with Vulkan? I would guess that the efficiency of Vulkan will enable/spawn new types of multi-pass strategies, and the goal here is to avoid building-in assumptions about how many/what kinds/orders of passes might be useful, while offering some reasonable pre-defined configurations for the common existing use cases.

Am I on the right page, at least?

  1. Absolutely. For an initial discussion I was trying to stick with existing nomenclature, but you are 100% on track here. I think the term "pass" is appropriate in all cases it might be used, and in the case of post processing effects, so is "effect". I am not sure what a good alternative to "effect" would be for the non-post-processing cases.
  2. Correct again. The only GL state updating the software passes do is binding their material/FBO as needed, which the render pass manager is responsible for.
  3. Yes, sort of. To be honest, I have no idea if Vulkan integration will ever happen, but I am optimistic. Comparing Vulkan and OpenGL is a bit like comparing a house cat and a lion. They are both felines, but their capabilities are nothing alike. Vulkan's memory model alone is exceptionally different, and then you throw in things like multi-threaded support, no error checking, etc. and things really start to change fast. Throwing Vulkan integration out there as a goal is more about giving us a target that forces a good interface-driven/modular design than anything. If we come up with something that eases the path to Vulkan, excellent. If all we get is a really modular OpenGL implementation, that's fantastic too. Assuming Vulkan does happen though, your assessment is correct...we want to avoid assumptions about the number/kind/order of passes, and we want to assume at a minimum that a render-to-screen pass is needed (though in Vulkan this isn't strictly true, as everything is done to FBOs and you just tell the driver which FBO to display, but we will assume it is for now). A more certain goal this supports, though, is integration with geometry/tessellation/compute shaders, because particularly with compute shaders there is no requirement that visible data be used/output. When looking at how deferred shading might work, environmental lighting is typically done as a compute shader pass, taking a rendered FBO as input, doing a shit load of lighting math/physics, then outputting a bunch of data that another render pass will use to compute colors. This would represent a pass whose output is something other than an FBO (and thus a GL 3-only feature).

I've prowled around the v2.0-development branch, and your overall approach is starting to make sense, even though it is work-in-progress. I was mostly interested in what's changing with Scene.render(), since that's where the render passes will originate. But it made me remember some additional time-critical render interactions in my app, notably 3D swipe/fling gestures. In a nutshell, these update scene objects in onDrawFrame(), since they are not animations, but they are similar in terms of glitch potential. There are also follow-on updates, as I use the resulting rotation angle of a swipe/fling to scale objects as well, by monitoring the orientation in a separately registered onDrawFrame(). So, in general, I guess interactivity is another performance concern with using separate threads for all scene updates...

we want to avoid assumptions about number/kind/order and we want it to assume at a minimum a render to screen pass is needed

I'm thinking maybe the assumption of a render-to-screen pass can be part of a default render-pass configuration, rather than hard-coded into the render-pass iteration logic itself. And both the input/output types of a render pass seem like they could be key abstractions for the passes themselves as well as for rules to configure/combine/sequence the passes for a scene (i.e. required input has to be available for a pass)? Maybe I'm still too ignorant of the details, where the devil always hides out somewhere, just brainstorming...

Frame callbacks are still available, as would be the ability to override onDrawFrame in a custom scene implementation, so those use cases should be fine.

As for what you are proposing, I am on board, so long as it doesn't explode into complexity. In particular, users should not need to worry about these details unless they really want to start fiddling with post processing/multi-pass rendering.

Frame callbacks are still available... so those use cases should be fine.

OK, cool. BTW, I really like the SceneGraph stuff. It seems like it cleans things up a bunch.

As for what you are proposing, I am on board, so long as it doesn't explode into complexity

I'll leave any enforcing of configuration rules for a later enhancement (if ever), but it seems like the render pass manager will need to know what to do with pass outputs, as it doesn't seem like a strict serial output-to-input-ending-at-the-screen-render assumption will cover everything. The structure of passes in a scene seems more like it should be a tree or maybe even a DAG, or maybe just multiple independent sequences, rather than one single sequence?

I really like the SceneGraph stuff. It seems like it cleans things up a bunch.

Thanks, that's the goal.

I would agree. While there is nothing in the library where this complication would arise, I hope that shortly after this rewrite we will be able to work on adding some of this. While I prefer the simplicity of just multiple independent sequences, I think we can't assume that to be the case. In general traversing a tree from leaf to trunk would do it, though I suspect there may be situations where branch ordering matters, which I guess leaves us with something to the effect of a DAG. Be dutiful in your research into a library to use for this, unless you have expertise already.

Be dutiful in your research into a library to use for this, unless you have expertise already.

Hmm. Hadn't really thought about the implementation, just about what the logical arrangement would be. Good suggestion anyway, I will try to avoid inventing wheels here or elsewhere. Right now, I'm kind of hoping/thinking that simply listing any prerequisite passes for a given pass will take care of it, and put the onus on the developer to make sure it is a legal/efficient graph (once we decide what that means)...

Right now, I'm kind of hoping/thinking that simply listing any prerequisite passes for a given pass will take care of it, and put the onus on the developer to make sure it is a legal/efficient graph (once we decide what that means)

I like it. So long as the automatic stuff like software AA doesn't require any thought on their part other than saying "yes, I want it", then I think it is completely reasonable to assume that other situations require the developer to not do something foolish.
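A rough sketch of how prerequisite-based ordering could be resolved, under the assumption that each pass simply lists the passes it depends on. All names are illustrative, and cycle detection is deliberately omitted since, per the discussion, keeping the graph legal is on the developer:

```java
// Rough sketch of ordering passes from declared prerequisites; illustrative only.
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PassScheduler {

    public interface Pass {
        List<Pass> prerequisites(); // passes whose output this pass consumes
        void render();
    }

    /** Returns the passes in an order where every prerequisite runs first. */
    public List<Pass> order(List<Pass> passes) {
        Set<Pass> visited = new LinkedHashSet<>();
        for (Pass pass : passes) {
            visit(pass, visited);
        }
        return new ArrayList<>(visited);
    }

    private void visit(Pass pass, Set<Pass> visited) {
        if (visited.contains(pass)) return;
        for (Pass prerequisite : pass.prerequisites()) {
            visit(prerequisite, visited); // depth-first: prerequisites first
        }
        visited.add(pass);
        // Note: no cycle detection; a cyclic graph is considered developer error here.
    }
}
```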

I am trying hard to get things to a rendering state again. There is still a fair bit of work required to do this, I'm afraid, but I'll keep you in the loop. Right now I'm trying to get arbitrary frustum culling working right.