kaveh808 / kons-9

Common Lisp 3D Graphics Project

Optimize OpenGL Drawing

kaveh808 opened this issue · comments

Use vertex arrays and the like to speed up the current naive drawing code in opengl.lisp.
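For context, a minimal sketch of the direction being proposed, assuming cl-opengl and an active GL context; `draw-points-with-vbo` and its flat `verts` list are hypothetical names for illustration:

```lisp
;; Sketch only: replaces gl:begin/gl:vertex immediate-mode loops with a
;; VBO upload and a single draw call. Assumes cl-opengl and a live context.
(defun draw-points-with-vbo (verts)
  "VERTS is a flat list of x y z coordinates."
  (let* ((n (length verts))
         (arr (gl:alloc-gl-array :float n))
         (vbo (first (gl:gen-buffers 1))))
    ;; Copy the CL data into a foreign gl-array.
    (loop for v in verts
          for i from 0
          do (setf (gl:glaref arr i) (float v 1.0)))
    (gl:bind-buffer :array-buffer vbo)
    (gl:buffer-data :array-buffer :static-draw arr)
    (gl:free-gl-array arr)
    ;; Describe the vertex layout once, then draw everything in one call.
    (gl:enable-vertex-attrib-array 0)
    (gl:vertex-attrib-pointer 0 3 :float nil 0 (cffi:null-pointer))
    (gl:draw-arrays :points 0 (/ n 3))
    (gl:delete-buffers (list vbo))))
```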

It's been over 5 years since the last release of OpenGL, so there probably isn't any reason to target anything but the latest version. But since the writing is on the wall, perhaps some thought should be put into how to abstract over both OpenGL and Vulkan, though I'm thinking that might need somebody familiar with Vulkan.
If not, then there's the choice of adopting an existing abstraction for modern GL, or writing yet another one.

Regarding the need to keep our eyes on the next graphics API, as @JMC-design was saying: piet-gpu is a good project to follow. They have a 2D/font focus, but they are pushing the envelope on doing as much of the compute for a UI as possible on the GPU:
https://github.com/linebender/piet-gpu
project vision:
https://github.com/linebender/piet-gpu/blob/main/doc/vision.md
Raph Levien also has some fantastic articles about doing graphics/compute on modern GPUs and GPU APIs:
https://github.com/linebender/piet-gpu/blob/main/doc/blogs.md

We might also want to set a bar for minimum GPU memory. I guess that's something that needs to be tracked; such a weird concept.
I ran across Piet when looking for ideas for a rich-text sort of API. I'm not sold on specifying ranges, though it is nice that it leaves the text unmodified. I'm still leaning towards something I can read from or write to a stream, so a list of objects and lists that change attributes.

Yet another OpenGL abstraction for Lisp:
https://github.com/jl2/simple-gl

I am very keen to maximize use of the GPU as well as SIMD and multiple cores. I really want our system to be able to handle production-level datasets with the same (or better) speed as commercial packages.

How we architect this (improved OpenGL interface, Vulkan, compute on GPU) is something we should discuss.

If we do have a Vulkan enthusiast, a first step could be to implement the equivalent of the code in opengl.lisp.

Also, one of my goals is to develop a cross-platform GUI toolkit. Currently we're building it on OpenGL, using the text engine by @awolven and font rasterizer by @JMC-design .

So I've just drawn my first triangle using vertex arrays, and here are some of my initial thoughts.
I'm assuming we'd like to fill buffers by just sending a list of points? What I've done for a test is fill up a CL array, grab the vector-sap, and use that to fill buffers. With points we have to pack them. Do we pack into a CL array, pin it and use it, or pack directly into a foreign array and then free it or keep it around?
Does any packing we do into CL arrays have any effect on packing into SIMD packs?
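One hedged option that sidesteps the pin-versus-foreign question is the static-vectors library: the data lives at a stable foreign address but is still an ordinary CL array for packing. A sketch, with `pack-points` as a hypothetical name:

```lisp
;; Sketch: pack a list of 3D points into a static-vector, which is
;; simultaneously a (simple-array single-float) and non-moving foreign memory.
(defun pack-points (points)
  "POINTS is a list of (x y z) lists. Returns the buffer and its pointer."
  (let ((buf (static-vectors:make-static-vector
              (* 3 (length points)) :element-type 'single-float)))
    (loop for (x y z) in points
          for i from 0 by 3
          do (setf (aref buf i)       (float x 1.0)
                   (aref buf (+ i 1)) (float y 1.0)
                   (aref buf (+ i 2)) (float z 1.0)))
    ;; The pointer can be handed to gl:buffer-data and friends directly,
    ;; with no pinning needed and no per-implementation SAP behavior.
    (values buf (static-vectors:static-vector-pointer buf))))
```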

Writing GLSL in a string in a Lisp buffer is a nightmare of formatting. In the long run it doesn't matter what a person uses to get a string for a shader program, but maybe there should be a default shader DSL, or a formatting convention to make code and examples easier to read?
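Short of a full shader DSL, even a tiny line-joining helper keeps GLSL readable in a Lisp buffer. A sketch (the `glsl` helper is hypothetical, not an existing library function):

```lisp
;; Sketch: build a shader source string from one line per argument,
;; so the GLSL stays indented and diffable inside Lisp code.
(defun glsl (&rest lines)
  "Join LINES with newlines into a single shader source string."
  (format nil "~{~a~%~}" lines))

(defvar *vertex-shader-source*
  (glsl "#version 330 core"
        "layout (location = 0) in vec3 pos;"
        "void main () {"
        "  gl_Position = vec4(pos, 1.0);"
        "}"))
```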

It seems like it might be nice to encapsulate these buffers into structs that can be passed around easily. But then you have to build a bunch of functions to use those structs, and years later you have CEPL... or something similar. I wonder if anybody has made a comparison of the different layers on top of GL?

I'm not even sure if SBCL system-area pointers work the same way on Windows or macOS. So maybe packing directly into foreign memory is required? And definitely so if there are any plans to support another implementation.
If anybody is interested, this is the code I used to test: https://plaster.tymoon.eu/view/3408#3408 (just replace the surface:update with whatever your window needs to swap buffers).

I tried, but it reads like c and I don't see any lispy abstraction. The only thing I see is direct writing of individual bytes to foreign memory.
I'm not bright enough to understand other languages.

These are good questions, and there are a lot of moving parts in how we encode geometry: ease of editing in CL, optimized OpenGL display, SIMD, threading.

One possibility I have been mulling over is whether we should keep a low-level C representation that can act like an old-school display list for our geometry classes. We would need to sync the CL point arrays with these C-type vectors after modeling operations, and they would be optimized for OpenGL and such.

Or we could have C-level structs for the internal geometry, which we access and modify from GL. That might make CL editing a bit slower, but could result in faster rendering.
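A hedged sketch of the first option: each geometry object keeps a foreign-side cache that modeling operations mark dirty, and that gets refilled lazily before display. Class and slot names here are hypothetical, not kons-9's actual classes:

```lisp
;; Sketch: CL-side points remain the editing representation; a gl-array
;; acts as the display-list-like cache, re-synced only when dirty.
(defclass cached-geometry ()
  ((points   :initarg :points :accessor points)   ; vector of #(x y z)
   (gl-cache :initform nil    :accessor gl-cache)
   (dirty-p  :initform t      :accessor dirty-p)))

(defun sync-gl-cache (geom)
  "Refill GEOM's foreign cache from its CL points if modeling touched them."
  (when (dirty-p geom)
    (let* ((pts (points geom))
           (n (* 3 (length pts))))
      (when (gl-cache geom)
        (gl:free-gl-array (gl-cache geom)))
      (setf (gl-cache geom) (gl:alloc-gl-array :float n))
      (loop for p across pts
            for i from 0 by 3
            do (dotimes (k 3)
                 (setf (gl:glaref (gl-cache geom) (+ i k))
                       (float (aref p k) 1.0))))
      (setf (dirty-p geom) nil)))
  (gl-cache geom))
```

Modeling operations would only `(setf (dirty-p geom) t)`, so editing speed is unaffected and the sync cost is paid once per change rather than once per frame.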

I am very keen to maximize use of the GPU as well as SIMD and multiple cores. I really want our system to be able to handle production-level datasets with the same (or better) speed as commercial packages.

Does it include distributed computing as a goal? :-)

Down the road, why not? :)

Down the road, why not? :)

Because there would be a ~30MB SBCL runtime per node? I really wish there were something like MirageOS (which uses OCaml) for Common Lisp or Scheme.

good eats

I'm a bit full from their 130-page slide deck on optimization. It looks like OpenGL 4.2+ only, which caused a stomach rumble. Sometimes I wonder, "Why can't we just implement OpenGL in pure Common Lisp and be done with it?"

I think the approach is still interesting. Today I'm going to try to test whether it makes any difference packing arrays from different types of points into CL arrays that are pinned and sent, as well as into foreign arrays that are sent.
In my head it doesn't seem like there'd be much difference.
Besides, un/packing structured bits to be sent is on my todo list; I'm calling it "pipeline", for use with a new CLX and Wayland.
The thing with 4.2 is that 4.1 might have the same features, just as extensions. Whether it's like that on Mac I don't know. That, or maybe MGL isn't hard to install/use? I have no Mac to test that.

I think the approach is still interesting.

I agree, especially given the potential performance improvement. (I don't like vinegar on my salad, but wouldn't suggest other people shouldn't enjoy it, if you can tolerate one more food joke.) Thank you for posting the link and doing the testing.

I don't have a (capable enough) Mac to try it out on either, but if you do have success I wonder if it would help for you to post a simplified gist somewhere so someone who does could try it out.

Trying to come up with a good test for display as well. But so far, with just 333,333 points there's no time difference in packing CL arrays from either origin vectors or 3d-vectors structs. Packing from vectors uses slightly less CPU, but I probably need more points, since this is all taking ~0.004 seconds (0.020 using generic functions).
Submitting CL arrays to GL by pinning them and passing the pointer is, well, just passing a pointer. I guess I should probably throw in some static-vectors stuff.

So here's some basic testing. If you make smaller arrays, then origin's lead widens; whether it's worth the trade-off of not being able to dispatch on...
But the surprising thing is the foreign array being slower. If we can depend on just using SBCL to send pointers, then I'm not sure what the benefit is.

https://plaster.tymoon.eu/view/3413#3413
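For anyone reproducing the comparison, the SBCL-only pinning path under discussion looks roughly like this (a sketch; `upload-pinned` is a hypothetical name, and it assumes a single-float vector plus an already-bound GL buffer):

```lisp
;; Sketch: SBCL-specific. Pin the CL array so the GC cannot move it,
;; take its system-area pointer, and hand that straight to glBufferData.
(defun upload-pinned (cl-array byte-count)
  "CL-ARRAY is a (simple-array single-float); a VBO must be bound."
  (sb-sys:with-pinned-objects (cl-array)
    (%gl:buffer-data :array-buffer byte-count
                     (sb-sys:vector-sap cl-array)
                     :static-draw)))
```

Note that the pointer is only valid inside the `with-pinned-objects` body, which is exactly why this approach ties us to SBCL; a foreign or static-vector buffer has no such scoping restriction.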

Nice work. Is the cost of sending sbcl pointers and ffi arrays to OpenGL (and GPUs) the same?

On a slight tangent, should we bite the bullet and go with double-float as our default? Or is the performance hit a serious one?

I can't see why it would be different, as they're both just pointers to memory, unless being in SBCL's memory space somehow affects it. That's why I think an actual drawing test might elucidate further, at least in terms of packing/repacking something over and over.

I don't know if I've been reading outdated stuff, but what I've seen is that lots of OpenGL drivers will just convert doubles to single-float as their internal format. Support for doubles in GP compute is relatively new, requires versions above 4.1, and in some cases a newer card. I've seen figures of half to a third of the performance of singles.
For anything like CAD I'd think a fixed-point format would probably be better.

Perhaps someone who doesn't necessarily have vulkan experience could volunteer.

I volunteer to make an attempt this month. What do I need to know to start off in the right direction? (Either in absolute terms or based on the tiny start I made in #109 a ways back.)

I'm interested in trying to write this. I will try to build on what @JMC-design has proposed and the text-rendering engine @awolven has written.

It would probably make sense to reuse parts of the code of the text-rendering engine. In order to do so I would have a lot of questions, since there are a lot of things whose purpose I don't understand. It seems like a pretty advanced implementation to me, one which takes a lot of nitty-gritty details of OpenGL into consideration; am I right?

Anyway, I'll start by proposing something, and hopefully we can improve on it incrementally with your feedback.

unless you live in a cold cabin and need your PC to double as a toaster oven.

I used to render movies on my dual-G4 Mac only in the winter in Colorado, because it drew nearly 1500W, like a hair dryer (which would have been quieter).

in the long run one will want to support retained mode paradigms

Retained mode caching in OpenGL-based scene graphs usually used "display lists". What method exists to do that now?
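In modern GL, the closest analogue to a display list is a vertex array object plus a static VBO: record the buffer and layout state once, then replay it with a bind and a draw call. A sketch, assuming cl-opengl and vertex data already packed into a gl-array; the function names are hypothetical:

```lisp
;; Sketch: build once (the "display list"), draw many times.
(defun make-retained-mesh (gl-array vertex-count)
  "Record buffer and layout state into a VAO; returns (vao count)."
  (let ((vao (gl:gen-vertex-array))
        (vbo (first (gl:gen-buffers 1))))
    (gl:bind-vertex-array vao)
    (gl:bind-buffer :array-buffer vbo)
    (gl:buffer-data :array-buffer :static-draw gl-array)
    (gl:enable-vertex-attrib-array 0)
    (gl:vertex-attrib-pointer 0 3 :float nil 0 (cffi:null-pointer))
    (gl:bind-vertex-array 0)
    (list vao vertex-count)))

(defun draw-retained-mesh (mesh)
  "Replay the recorded state: one bind, one draw call."
  (destructuring-bind (vao count) mesh
    (gl:bind-vertex-array vao)
    (gl:draw-arrays :triangles 0 count)
    (gl:bind-vertex-array 0)))
```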

I see. I could also join the effort of porting kons-9 to krma then, if this makes more sense. I'm mostly interested in having a rendering engine I can understand and modify on the fly. If krma can fulfill this role, I'm in.

About the modularity of krma: how would you do things like offscreen rendering and multiple passes? How would you create and load custom pipelines? Having some simple examples would be nice.

Kaveh rejected the vulkan branch and continued to make changes to the main branch until the vulkan branch bit rotted.

I have the feeling anything I say here is going to get me in trouble with someone. Adieu.

Could krma evolve to become something like CEPL for Vulkan? Because in the end that's what I am looking for: a CL interface to a graphics API. Not just the bindings, of course, but an interface that makes programming OpenGL or Vulkan in CL more natural.

CEPL

+1

Adieu to this topic.