AI-Generated Documentation Experiment for libcoro

Question

plops opened this issue a month ago · comments

I'm quite intrigued by libcoro and wanted to experiment with using a large language model (Gemini 1.5 Pro) to generate documentation for it.

The project seemed like a good fit for this experiment because:

It's small enough to fit entirely within Gemini's context window, allowing the AI to potentially grasp the full scope of the codebase.
The main source of documentation appears to be the README, and while there are comments in the code, it doesn't seem like there's a Doxygen setup for a detailed API reference.

My process was simple: I fed Gemini the entirety of the libcoro source code, examples, and tests (about 170k tokens). Then, I prompted it with instructions like "propose an outline" or "write section 4.1" and let it generate content. It took about an hour of copy and paste. The requests typically finished in 1 or 2 minutes. I assembled the results into markdown files in the doc/ folder of my fork: https://github.com/plops/libcoro/tree/main/doc

My primary goal wasn't to produce production-ready documentation. Instead, I was curious to see what the AI would come up with when given access to the entire codebase.

Here are some observations:

The generated content is surprisingly structured and coherent, including a nice PlantUML diagram.
It's likely riddled with inaccuracies and potentially useless examples.

I haven't thoroughly vetted the output, but I'm sharing it in case you find it interesting or potentially useful as a starting point for more formal documentation.

Let me know if you have any thoughts or questions about this experiment!

Josh Baldwin · Answer 1 · Sat Jun 22 2024 06:09:04 GMT+0800 (China Standard Time)

Hey @plops this is a really cool idea! Thanks for sharing. I will definitely dig in and see how it turned out when I get some time.

Curious to see how it compares to the handwritten README examples too.

Josh Baldwin · Answer 2 · Wed Jul 03 2024 05:21:44 GMT+0800 (China Standard Time)

I've got some time to read through some of these. I'll write up some notes / thoughts as I go through the generated docs.

Josh Baldwin · Answer 3 · Wed Jul 03 2024 05:55:00 GMT+0800 (China Standard Time)

31_tasks

The coro::task<T> documentation is very low level and shows how to use a task outside of the normal usage with sync_wait and executors (e.g. coro::thread_pool and coro::io_scheduler). This is maybe useful? But it's probably not how I would expect most users to end up using coro::task<T>.
I think the example usage of std::suspend_always{} should be discouraged since it doesn't work with libcoro's executors. I'm wondering if that is in my existing docs and it pulled it from there (in which case I should scrub it), or if it got that from std::coroutine code?

32_executors

coro::thread_pool: awesome, this is really easy to follow I think. The shutdown() call isn't necessary since it will be called automatically by the pool's destructor, so the docs are a little off on this. It's optionally available if a user wants to call it sooner. I wonder if this comes from the wording I've used on the shutdown() function, which should say it's optional?
coro::io_scheduler: the first example is pretty interesting... It makes an io_scheduler to process a task that polls an eventfd, but then it resumes the task manually, which is probably something nobody would ever do. The io_scheduler should just drive the task from start to finish in this example.
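The "shutdown() is optional because the destructor calls it" point can be sketched with a small standard-library pool. Everything here (tiny_pool, post, run_jobs_without_explicit_shutdown) is a hypothetical stand-in written for this illustration and is not libcoro's coro::thread_pool implementation. It only demonstrates the general RAII pattern: shutdown() is safe to call more than once, and the destructor calls it, so an explicit call matters only when the user wants to stop sooner.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical pool for illustration only; NOT coro::thread_pool.
class tiny_pool {
public:
    explicit tiny_pool(std::size_t thread_count) {
        for (std::size_t i = 0; i < thread_count; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    // The destructor shuts down, so callers never have to.
    ~tiny_pool() { shutdown(); }

    tiny_pool(const tiny_pool&) = delete;
    tiny_pool& operator=(const tiny_pool&) = delete;

    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

    // Optional early stop: drains already queued jobs, then joins the workers.
    // Calling it a second time is harmless (the threads are no longer joinable).
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& worker : workers_) {
            if (worker.joinable()) worker.join();
        }
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Posts job_count jobs and never calls shutdown(); the destructor does it.
int run_jobs_without_explicit_shutdown(int job_count) {
    std::atomic<int> done{0};
    {
        tiny_pool pool{2};
        for (int i = 0; i < job_count; ++i) {
            pool.post([&done] { done.fetch_add(1); });
        }
    }  // ~tiny_pool() drains the queue and joins the workers here
    return done.load();
}
```

In this sketch, run_jobs_without_explicit_shutdown(100) returns 100 even though shutdown() is never called by the user. A caller who wants the workers gone before the pool goes out of scope can still call pool.shutdown() early, and the later destructor call is then a no-op.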

33_syncprimitives

These examples are short and sweet and really get the point across. It seems to have done a better job of generating here.
There are some interesting missing items in the examples though, like a comment saying "Create and start the worker tasks..." with no code shown for it.

... more to come. This is really cool though, I'm enjoying reading through what it's generated...