git: Add support for sparse checkout
krish2718 opened this issue · comments
When west is used to pull in dependent files, we currently need to do a full repo clone; this is unnecessary and slow. If we add support for git's sparse checkout feature, we can extend west.yml to take a list of files/folders (with regex) and check out only those, saving time and bandwidth.
E.g., manifest project A wants a specific shared header file from project B and nothing else.
Did you try --fetch-opt=...? See #638
https://github.com/zephyrproject-rtos/west/issues?q=label%3Aperformance+
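To make the suggestion concrete, here is a minimal Python sketch that only assembles a `west update` command line with extra fetch options. It assumes a west version that supports `--narrow` and `--fetch-opt`; the helper name `west_update_argv` and the option values are illustrative and do not come from this thread.

```python
import shlex


def west_update_argv(fetch_opts, narrow=True):
    """Build an argv list for `west update` with extra `git fetch` options.

    Hypothetical helper: each entry in fetch_opts is forwarded to
    `git fetch` through one --fetch-opt=... argument.
    """
    argv = ["west", "update"]
    if narrow:
        # Assumption: --narrow limits what west fetches for each project.
        argv.append("--narrow")
    argv.extend(f"--fetch-opt={opt}" for opt in fetch_opts)
    return argv


# Shallow fetch: history truncated to one commit per project.
shallow = west_update_argv(["--depth=1"])
# Partial-clone style fetch; whether the server honors the filter is an assumption.
treeless = west_update_argv(["--filter=tree:0"])

print(shlex.join(shallow))
print(shlex.join(treeless))
```

Neither variant reduces what gets checked out into the working tree; they only reduce what is fetched, which is a different saving from sparse checkout.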
EDIT: also take a look at:
Refs
I haven't tried it but I looked at the docs; path support to filter has been removed:
Note that the form --filter=sparse:path= that wants to read from an arbitrary path on the filesystem has been dropped for security reasons.
path support to filter has been removed
Bummer.
I was curious so I took a quick look at what this would take. This would be tricky because the sparse-checkout configuration is specific to each git repo and should happen between git fetch and git checkout inside west update. So the sparse-checkout configuration would have to be preset in some configuration file, maybe in some new west config section?
west does not use git clone for... other optimization reasons (!) so git clone --no-checkout is not an option. git update-ref HEAD is more or less equivalent to git checkout --no-checkout (couldn't resist) so there's that.
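The ordering described above (fetch, then configure sparse-checkout, then check out) can be sketched as a list of plain git commands. This is an untested illustration, not how west works today: the function names, the `main` revision, and the `include` directory are assumptions, and the paths are directories because that is what cone-mode sparse-checkout expects.

```python
import subprocess


def sparse_update_commands(url, rev, sparse_dirs):
    """Return git argv lists: fetch, configure sparse-checkout, then check out.

    All parameters are illustrative; sparse_dirs are directory paths as
    expected by cone-mode sparse-checkout.
    """
    return [
        ["git", "init", "--quiet"],
        ["git", "remote", "add", "origin", url],
        # Step 1: fetch objects without touching the working tree.
        ["git", "fetch", "--depth=1", "origin", rev],
        # Step 2: restrict the working tree before anything is checked out.
        ["git", "sparse-checkout", "set", *sparse_dirs],
        # Step 3: only now populate the (sparse) working tree.
        ["git", "checkout", "--detach", "FETCH_HEAD"],
    ]


def run_all(commands, cwd):
    """Run each command in cwd, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Print instead of running, so the sketch has no side effects.
    for argv in sparse_update_commands(
        "https://github.com/zephyrproject-rtos/zephyr", "main", ["include"]
    ):
        print(" ".join(argv))
```

Checking out a single header file rather than a directory would need non-cone patterns, which this sketch does not cover.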
this is unnecessary and slow. [...] saving time and bandwidth.
How much time saved? As with every other performance problem, you don't know until you have measured it. And then you only know for the particular cases you measured. According to https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/, sparse checkout saves significant time only for large "monorepos". Is that your case?
#319 has some measurements. Please take a look at it and see how much other optimizations that are already available can help.
BTW: https://git-scm.com/docs/git-sparse-checkout
THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE.
For measuring/testing/prototyping here's a starting point:
--- a/src/west/app/project.py
+++ b/src/west/app/project.py
@@ -1228,7 +1228,7 @@ class Update(_ProjectCommand):
         # out the new detached HEAD, then print some helpful context.
         if take_stats:
             start = perf_counter()
-        project.git(['checkout', '--detach', sha])
+        # project.git(['checkout', '--detach', sha])
         if take_stats:
             stats['checkout new manifest-rev'] = perf_counter() - start
         self.post_checkout_help(project, current_branch,
@@ -1505,7 +1505,7 @@ class Update(_ProjectCommand):
         # it avoids a spammy detached HEAD warning from Git.
         if take_stats:
             start = perf_counter()
-        project.git('checkout --detach ' + QUAL_MANIFEST_REV)
+        project.git('update-ref HEAD ' + QUAL_MANIFEST_REV)
         if take_stats:
             stats['checkout new manifest-rev'] = perf_counter() - start
This (tested) HACK stops west update from checking out any code. After that you can use west forall -h to "manually" perform sparse checkouts.
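For the measuring part, a small timing helper in the spirit of the `take_stats` / `perf_counter()` pattern visible in the patch may be enough. This is a standalone sketch; `timed` and `slowest_first` are hypothetical names and not part of west.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timed(stats, label):
    """Record the wall-clock duration of the enclosed block in stats[label]."""
    start = perf_counter()
    try:
        yield
    finally:
        stats[label] = perf_counter() - start


def slowest_first(stats):
    """Return (label, seconds) pairs sorted from slowest to fastest."""
    return sorted(stats.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    stats = {}
    with timed(stats, "placeholder step"):
        sum(range(100_000))  # stand-in for a git fetch or checkout call
    for label, seconds in slowest_first(stats):
        print(f"{label}: {seconds:.3f}s")
```

Wrapping the fetch and the checkout in separate `timed` blocks would show which of the two dominates for a given repository, which is the number that decides whether sparse checkout is worth the complexity.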
Newcomer comment:
I started going through the "Getting started" guide today (Windows 10 + WSL2) and this step takes ages. I expected that only the latest state of the Zephyr repo would be required, but it looks like a deep clone.
$ west init ~/zephyrproject/
=== Initializing in /home/akauppi/zephyrproject
--- Cloning manifest repository from https://github.com/zephyrproject-rtos/zephyr
Cloning into '/home/akauppi/zephyrproject/.west/manifest-tmp'...
remote: Enumerating objects: 958200, done.
remote: Counting objects: 100% (24324/24324), done.
remote: Compressing objects: 100% (1216/1216), done.
Receiving objects: 4% (43854/958200), 13.62 MiB | 176.00 KiB/s
Receiving objects: 4% (44464/958200), 13.73 MiB | 178.00 KiB/s
Receiving objects: 45% (437575/958200), 369.92 MiB | 210.00 KiB/s
...
There are likely two separate concerns: getting people onboarded fast (not happening to me, at least; the clone is still ongoing...) and the west update. Is this the right place for the comment?
Edit: There was something wrong with my WLAN. Was able to raise the 176 KiB/s to ~2 Mbps but that's missing the point. Still time-and-space taking, to get started.. 1h gone
With some simple optimizations, the SOF CI clones the zephyr repo and a few others from scratch for every SOF PR in 40s:
https://github.com/thesofproject/sof/actions/workflows/daily-tests.yml
https://github.com/thesofproject/sof/actions/runs/7080442491/job/19268340897
Is this the right place for the comment?
There are many different optimizations possible and many are not mutually exclusive. Here's a tentative list:
https://github.com/zephyrproject-rtos/west/issues?q=+label%3Aperformance+
None is perfect which is why none is enabled by default.
Was able to raise the 176 KiB/s to ~2 Mbps but that's missing the point. Still time-and-space taking, to get started.. 1h gone
2Mb/s does not qualify as "broadband". Without any optimization, cloning everything from scratch typically takes 5 minutes max for me.