Allow using a local cached repository to speed up the build
inochisa opened this issue · comments
Although this script uses a shallow clone to reduce data transfer, it still needs to clone a lot of data before building. If we use a local cached repository, data usage is reduced because only the delta needs to be fetched.
I have already created a prototype patch that allows fetching the repository from a local path, but it's hard for me to replace the version checking. I hope this can inspire you.
diff --git a/qbittorrent-nox-static.sh b/qbittorrent-nox-static.sh
index c47bc1d..b3a2025 100644
--- a/qbittorrent-nox-static.sh
+++ b/qbittorrent-nox-static.sh
@@ -891,9 +891,14 @@ download_folder() {
folder_name="${qbt_install_dir}/${1}"
folder_inc="${qbt_install_dir}/include/${1}"
[[ -d "${folder_name}" ]] && rm -rf "${folder_name}"
- [[ "${1}" == 'libtorrent' && -d "${folder_inc}" ]] && rm -rf "${folder_inc}"
- git config --global advice.detachedHead false
- _cmd git clone --no-tags --single-branch --branch "${!github_tag}" --shallow-submodules --recurse-submodules -j"$(nproc)" --depth 1 "${url_github}" "${folder_name}"
+ if [[ -d "${folder_name}" ]]; then
+ _cmd git -C "${folder_name}" pull --all -p
+ else
+ [[ "${1}" == 'libtorrent' && -d "${folder_inc}" ]] && rm -rf "${folder_inc}"
+ git config --global advice.detachedHead false
+ # _cmd git clone --no-tags --single-branch --branch "${!github_tag}" --shallow-submodules --recurse-submodules -j"$(nproc)" --depth 1 "${url_github}" "${folder_name}"
+ _cmd git clone --no-tags --single-branch --branch "${!github_tag}" --shallow-submodules --recurse-submodules -j"$(nproc)" --depth 1 "file://${qbt_working_dir}/source/${1}" "${folder_name}"
+ fi
mkdir -p "${folder_name}${subdir}"
[[ -d "${folder_name}${subdir}" ]] && _cd "${folder_name}${subdir}"
echo "${2}" > "${qbt_install_dir}/logs/${1}_github_url.log"
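The `file://` clone in the prototype above assumes local mirrors already exist under `${qbt_working_dir}/source/`. A minimal sketch of maintaining such mirrors might look like this; the cache path, helper name, and example invocations are assumptions, not part of the script:

```shell
# Hypothetical helper for keeping local bare mirrors that a file:// clone can use.
cache_root="${HOME}/qbt-mirrors"
mkdir -p "${cache_root}"

mirror_repo() {
    local name="$1" url="$2"
    if [[ -d "${cache_root}/${name}" ]]; then
        # An existing mirror only fetches the delta since the last update.
        git -C "${cache_root}/${name}" remote update --prune
    else
        # --mirror creates a bare copy of all refs, suitable as a clone source.
        git clone --mirror "${url}" "${cache_root}/${name}"
    fi
}

# Example invocations (network access assumed):
# mirror_repo libtorrent "https://github.com/arvidn/libtorrent.git"
# mirror_repo qbittorrent "https://github.com/qbittorrent/qBittorrent.git"
```

Re-running the helper is cheap: an up-to-date mirror fetches nothing new.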
You don't need to do this. Just use the workflow files.
https://userdocs.github.io/qbittorrent-nox-static/#/build-help?id=env-settings
export qbt_workflow_files=yes
./qbittorrent-nox-static.sh
And it will just use this https://github.com/userdocs/qbt-workflow-files/releases/tag/3825212312351213117721321308181012182085158642452
Thanks, but this is not what I mean.
Most of the time it is hard for me to access GitHub, so I set up a mirror to collect the sources. Meanwhile, I prefer to use a git repository, since I usually compile this code from the master branch of the mirror. So it's hard for me to just use artifacts.
I understand what you're trying to do. I'll need to think about it.
Thanks, waiting for your reply.
Which platform are you building on?
Do you cache all dependencies?
> Which platform are you building on?

Usually x86_64, native build.

> Do you cache all dependencies?

I cache all dependencies except boost (it has too many submodules). Most of the dependencies are fetched from their mainline repositories.
Which OS?
Alpine Linux in a systemd-container.
Ok, thanks.
So I don't want to be responsible for downloading the cached stuff. What I might do is expand an existing feature of the script, since I can already do caching with workflow archives + artifacts.
The version issue falls under the download issue, as the point of caching is that the script does not download anything that's already there.
So with a git repo I could probably cd into master and check out the branch to solve it.
cd libtorrent
# git checkout "$(git tag -l --sort=-v:refname "v2*" | head -n 1)" # always checkout the latest release of libtorrent v2
# git checkout "$(git tag -l --sort=-v:refname "v1*" | head -n 1)" # always checkout the latest release of libtorrent v1
git checkout RC_2_0
With an archive it's more complicated.
So I can do something simple for this, but I mostly expect the end user to organise their cached files properly.
> So I don't want to be responsible for downloading the cached stuff. What I might do is expand an existing feature of the script, since I can already do caching with workflow archives + artifacts.

Of course, the cache should be set up by the user.
> So with a git repo I could probably cd into master and check out the branch to solve it.

I think it's OK to just clone it to the build dir as the script already does. This allows the mirror to use a bare repository and we can minimise the code change. Moreover, just checking out may pollute the existing tree. The clone command, which is already in this script, could look like this:
git clone --no-tags --single-branch --branch "${!github_tag}" --shallow-submodules --recurse-submodules ... $mirror/libtorrent libtorrent
> With an archive it's more complicated.

I think archives can be left as they are.
Can you try this attempt and see how it goes?
https://github.com/userdocs/qbittorrent-nox-static/pull/112/files
You must provide a path to the cache dir via an env var or a switch:
export qbt_cache_dir=~/path
# or as a switch
-cd ~/path
You must name the folders in the cache dir the same as the module names:
~/path/libtorrent
~/path/qbittorrent
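A minimal sketch of populating such a cache dir with the folder-name convention above; the helper name and example branches are illustrative, not part of the PR:

```shell
# Hypothetical helper: fill the cache dir with clones named after the modules.
qbt_cache_dir="${HOME}/path"
mkdir -p "${qbt_cache_dir}"

clone_module() {
    # Skip modules already present so re-running only fills gaps.
    local name="$1" branch="$2" url="$3"
    [[ -d "${qbt_cache_dir}/${name}" ]] && return 0
    git clone --no-tags --single-branch --branch "${branch}" "${url}" "${qbt_cache_dir}/${name}"
}

# Example invocations (network access assumed):
# clone_module libtorrent RC_2_0 "https://github.com/arvidn/libtorrent.git"
# clone_module qbittorrent master "https://github.com/qbittorrent/qBittorrent.git"
```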
Thanks, I have tested this. Here are my results:
- I compiled successfully without caching iconv and qttools.
- I needed to patch the icu module of this script, since it has the wrong work path and git tag. iconv is hard to cache directly; I suggest removing caching for it.
- I also got an error while cloning qttools, since it could not fetch its submodules during building. Maybe my proxy has some errors.
Here is my patch
diff --git a/qbittorrent-nox-static.sh b/qbittorrent-nox-static.sh
index 5487bf6..9085d35 100644
--- a/qbittorrent-nox-static.sh
+++ b/qbittorrent-nox-static.sh
@@ -607,7 +607,8 @@ set_module_urls() {
iconv_github_tag="$(git_git ls-remote -q -t --refs https://git.savannah.gnu.org/git/libiconv.git | awk '{sub("refs/tags/", "");sub("(.*)(-[^0-9].*)(.*)", ""); print $2 }' | awk '!/^$/' | sort -rV | head -n 1)"
iconv_url="https://ftp.gnu.org/gnu/libiconv/$(grep -Eo 'libiconv-([0-9]{1,3}[.]?)([0-9]{1,3}[.]?)([0-9]{1,3}?)\.tar.gz' <(curl https://ftp.gnu.org/gnu/libiconv/) | sort -V | tail -1)"
- icu_github_tag="$(git_git ls-remote -q -t --refs https://github.com/unicode-org/icu.git | awk '/\/release-/{sub("refs/tags/release-", "");sub("(.*)(-[^0-9].*)(.*)", ""); print $2 }' | awk '!/^$/' | sort -rV | head -n 1)"
+ icu_version_tag="$(git_git ls-remote -q -t --refs https://github.com/unicode-org/icu.git | awk '/\/release-/{sub("refs/tags/release-", "");sub("(.*)(-[^0-9].*)(.*)", ""); print $2 }' | awk '!/^$/' | sort -rV | head -n 1)"
+ icu_github_tag=release-${icu_version_tag}
icu_url="https://github.com/unicode-org/icu/releases/download/release-${icu_github_tag}/icu4c-${icu_github_tag/-/_}-src.tgz"
double_conversion_github_tag="$(git_git ls-remote -q -t --refs https://github.com/google/double-conversion.git | awk '/v/{sub("refs/tags/", "");sub("(.*)(v6|rc|alpha|beta)(.*)", ""); print $2 }' | awk '!/^$/' | sort -rV | head -n1)"
@@ -2056,7 +2057,7 @@ if [[ "${!app_name_skip:-yes}" == 'no' || "${1}" == "${app_name}" ]]; then
custom_flags_reset
if [[ -n "${qbt_cache_dir}" && -d "${qbt_cache_dir}/${app_name}" ]]; then
- download_folder "${app_name}" "${!app_github_url}" "/source"
+ download_folder "${app_name}" "${!app_github_url}" "/icu4c/source"
else
download_file "${app_name}" "${!app_url}" "/source"
fi
It will not use missing folders, so if you don't cache iconv it won't fail and will just default to normal non-cached mode.
You said you cached all except boost, but me not caching iconv is the same as you not caching it.
I'll take a look at icu.
> It will not use missing folders, so if you don't cache iconv it won't fail and will just default to normal non-cached mode. You said you cached all except boost, but me not caching iconv is the same as you not caching it.

I just tested it to check that it is workable, so never mind.

> I'll take a look at icu.

Thanks
I think there is some room for me to tweak when iconv is activated to begin with. I think I can make it skipped 100% for any qt6 + RC_2_0 combos.
I'll have to double-check in case there was one app outside them that needs it.
But iconv should have worked if you cloned their git? Unless it has no configure file. I'll need to double-check it later.
> I'll have to double-check in case there was one app outside them that needs it.

These repos have submodules, which may be useful for you:
- openssl
- boost
- libtorrent
- qttools

> But iconv should have worked if you cloned their git? Unless it has no configure file. I'll need to double-check it later.

The git repo has no configure script, so the only way is to use autogen.sh, but autogen needs GNULIB_SRCDIR to be defined.
iconv and icu from cache should be fixed.
So I probably hit some of the same obstacles you mentioned, and it made me think to change the approach slightly.
I used this bootstrap script to cache files in a directory relative to where it runs from: cache_dir="$(pwd)/cache_dir"
This git clones all dependencies and the submodules for the tags specified. Currently they are dynamically set like the main script.
Now you can't clone boost a second time, as you get submodule errors. So I was just thinking: why bother to clone? Just copy them.
I already have them fully cloned; I just need to copy them where I need them. Doing this solved the boost issue. Like this:
if [[ -n "${qbt_cache_dir}" && -d "${qbt_cache_dir}/${1}" ]]; then
cp -rf "${qbt_cache_dir}/${1}"/. "${folder_name}"
else
_cmd git clone --no-tags --single-branch --branch "${!github_tag}" --shallow-submodules --recurse-submodules -j"$(nproc)" --depth 1 "${github_url}" "${folder_name}"
fi
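One detail worth noting about the `cp -rf "${qbt_cache_dir}/${1}"/. "${folder_name}"` form above: the trailing `/.` copies the directory's contents, including dotfiles such as `.git`, instead of nesting the source folder inside the destination. A tiny self-contained demonstration:

```shell
# The trailing "/." copies contents (dotfiles included) into the destination.
src="$(mktemp -d)"
dst="$(mktemp -d)"
mkdir -p "${src}/.git"
echo data > "${src}/file"
cp -rf "${src}/." "${dst}"
ls -A "${dst}"    # lists both .git and file
```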
> iconv and icu from cache should be fixed.

Yes, I confirmed this is fixed.

> So I probably hit some of the same obstacles you mentioned, and it made me think to change the approach slightly. I used this bootstrap script to cache files in a directory relative to where it runs from: cache_dir="$(pwd)/cache_dir". This git clones all dependencies and the submodules for the tags specified. Currently they are dynamically set like the main script. Now you can't clone boost a second time, as you get submodule errors. So I was just thinking: why bother to clone? Just copy them. I already have them fully cloned; I just need to copy them where I need them. Doing this solved the boost issue. Like this:
>
> if [[ -n "${qbt_cache_dir}" && -d "${qbt_cache_dir}/${1}" ]]; then
>     cp -rf "${qbt_cache_dir}/${1}"/. "${folder_name}"
> else
>     _cmd git clone --no-tags --single-branch --branch "${!github_tag}" --shallow-submodules --recurse-submodules -j"$(nproc)" --depth 1 "${github_url}" "${folder_name}"
> fi

It seems we need to find a new way to keep the build space clean and workable. I prefer to preserve the git clone, since we don't know the actual reason.
I think we need to find a new way to handle these dependencies and make the build process modular and easy to change. But that is another new issue, and things related to this one are fine. So feel free to merge the pull request and close this issue. I will open a new issue when I find a better way.
They are all cloned to a dir as git repositories. You manage these by their upstream.
All this build script needs is a copy of that. Using git to clone the clone makes no real sense here. You can make a new branch in your cache and it will just copy that and build it.
Cloning the clone is what presents the issue. I have already cloned everything + submodules; I don't need to clone it again. This way it takes 4 seconds to copy boost to the build dir.
This is the cleanest way. You manage the cache however you want (new branches) and it gets copied based on whatever branch you have set.
The next tweak would be to add patching to all modules; then the whole thing is conceptually resolved by copying and having patches in the patch dir.
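The per-module patching idea could be sketched roughly like this, assuming a patches/&lt;module&gt;/&lt;version&gt; layout; the helper name and the `.patch` extension are assumptions, not the script's actual implementation:

```shell
# Hypothetical per-module patch application, layout: patches/<module>/<version>/*.patch
apply_module_patches() {
    local module="$1" version="$2" build_dir="$3"
    local patch_dir="patches/${module}/${version}"
    [[ -d "${patch_dir}" ]] || return 0   # no patches for this module: not an error
    local p
    for p in "${patch_dir}"/*.patch; do
        [[ -e "${p}" ]] || continue       # glob matched nothing
        patch -d "${build_dir}" -p1 < "${p}"
    done
}
```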
OK, this is sensible and seems like a better way. But please add a note that the cloned dir should be kept clean, or some mistakes may happen.
So let's consider this script the baseline:
- It can clone, or remove the clone.
- It updates repos based on the tags used, or clones anew and backs up the old folder.
I will tweak the PR to match this script, so you can test the same thing when I push the changes.
So, as stage two of this local cache feature (which I appreciate is valuable to people with bandwidth/connection limitations): if I add full module patching in the same way we patch qbittorrent and libtorrent, users don't have to be careful with the cache. All they need to do is place the patch in the patch folder, like patches/icu/72_1, and they are good to go.
> So let's consider this script the baseline:
> - It can clone, or remove the clone.
> - It updates repos based on the tags used, or clones anew and backs up the old folder.
>
> I will tweak the PR to match this script, so you can test the same thing when I push the changes.
I have already tested this script and it works well. There are some things I found to change:
- qtbase upstream: https://code.qt.io/qt/qtbase.git
- qttools upstream: https://code.qt.io/qt/qttools.git
Another suggestion: git_tags_array and the URLs in the script need to be managed together, as I mentioned before. Maybe use a JSON file and jq to parse it, and the GitHub bot can provide automatic updates.
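The JSON + jq suggestion could look something like this sketch; the manifest file name, shape, and values are purely illustrative:

```shell
# Hypothetical per-module manifest; a bot updating tags only has to edit the JSON.
cat > modules.json <<'EOF'
{
  "libtorrent": { "url": "https://github.com/arvidn/libtorrent.git", "tag": "RC_2_0" },
  "qttools":    { "url": "https://code.qt.io/qt/qttools.git",        "tag": "5.15"   }
}
EOF

# jq -r prints raw strings without JSON quoting.
url="$(jq -r '.libtorrent.url' modules.json)"
tag="$(jq -r '.libtorrent.tag' modules.json)"
echo "libtorrent: ${url} @ ${tag}"
```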
> So, as stage two of this local cache feature (which I appreciate is valuable to people with bandwidth/connection limitations): if I add full module patching in the same way we patch qbittorrent and libtorrent, users don't have to be careful with the cache. All they need to do is place the patch in the patch folder, like patches/icu/72_1, and they are good to go.

A good idea.
I decided to refactor the URL section, which was really required to do this properly. It's quite a significant change that I won't go into detail on, other than all that info is now in associative arrays instead of variables.
You can see it by doing:
./qbittorrent-nox-static.sh -sdu
It's not 100% done yet; it needs some tweaking and integrations fixed, but it mostly works for testing here.
To use the cache as a built-in you use the -cd PATH option. The options are:
- rm - remove the cache dir and exit.
- bs - download the modules and exit, for all or a single module like zlib.
Paths passed are relative or full: my_cache will be made relative to the script dir; ~/my_cache will be in the home of the user.
To just download everything but not build:
./qbittorrent-nox-static.sh all -cd my_cache bs
To update git dirs and build:
./qbittorrent-nox-static.sh all -cd my_cache
This should work with no errors.
@inochisa Can you try it now?
So, I am getting close to finalising these changes, and right now it works like this.
This will cache all modules:
./qbittorrent-nox-static.sh all -cd my_cache_dir bs
This will cache just the icu module:
./qbittorrent-nox-static.sh icu -cd my_cache_dir bs
To build, drop bs:
./qbittorrent-nox-static.sh all -cd my_cache_dir
Now it will only copy from the cache if it exists, and use the provided or default method otherwise.
So let's say I use this: it will pull and switch the cache branch from the default to v1.2.18, and this becomes the cache default:
./qbittorrent-nox-static.sh libtorrent v1.2.18 -cd cache bs
All modules should support patching now; bootstrap the script to get the folders:
./qbittorrent-nox-static.sh libtorrent -bs
You will see this:
● Script version: 1.0.6
● Using the defaults, these directories have been created:
~/qbt-build/patches/zlib/1.2.13
~/qbt-build/patches/iconv/1.17
~/qbt-build/patches/icu/72-1
~/qbt-build/patches/openssl/3.1.0
~/qbt-build/patches/boost/1.81.0
~/qbt-build/patches/libtorrent/2.0.8
~/qbt-build/patches/double_conversion/3.2.1
~/qbt-build/patches/qtbase/5.15.8
~/qbt-build/patches/qttools/5.15.8
~/qbt-build/patches/qbittorrent/4.5.2
So you can manage the cache and provide patches.
Still bug fixing until the refactoring is complete, but it should be working, mostly.
@userdocs I have already tested this script. Here are my suggestions.
- The branch checkout needs to be properly handled when different tags or branches are used. For example, the following is the result when I check out a tag in a clone on the master branch:
● qbittorrent - Updating directory /build/qbittorrent-nox-static/sources/qbittorrent
remote: Enumerating objects: 310, done.
remote: Counting objects: 100% (310/310), done.
remote: Compressing objects: 100% (158/158), done.
remote: Total 162 (delta 147), reused 7 (delta 2), pack-reused 0
Receiving objects: 100% (162/162), 82.12 KiB | 243.00 KiB/s, done.
Resolving deltas: 100% (147/147), completed with 141 local objects.
From https://github.com/qbittorrent/qBittorrent
* tag release-4.5.2 -> FETCH_HEAD
error: pathspec 'release-4.5.2' did not match any file(s) known to git
- The command switch -cd should throw an error when the second argument is not bs, rm, or null. This would avoid undesirable results if one adds a switch to the end.
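For reference, the `error: pathspec 'release-4.5.2' did not match` in the first point occurs because a `--single-branch --no-tags` clone has no local ref for the tag. One possible way to switch such a clone to an unfetched tag, as a sketch rather than the script's actual code (the helper name is hypothetical):

```shell
# Fetch a tag that a --single-branch/--no-tags clone never received,
# then check it out; ending on a detached HEAD is expected.
checkout_tag() {
    local repo_dir="$1" tag="$2"
    git -C "${repo_dir}" fetch origin "refs/tags/${tag}:refs/tags/${tag}" &&
        git -C "${repo_dir}" checkout "${tag}"
}
```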
1: should be fixed
2: I don't really understand this, as if the third argument is not bs or rm it's ignored. Unless there's some example I'm missing?
2: should also be fixed now.
Missed some issues with 1 that should now be sorted.
I can clone to a branch, then update the branch or switch to a new tag or branch.
I can clone to a tag, then update the branch or switch to a new tag or branch.
Did this for both libtorrent and qbittorrent in the current commit version.
> 1: should be fixed
./qbittorrent-nox-static.sh qbittorrent release-4.5.2 -p http://127.0.0.1:11081 -i -s -cd "${PWD}/sources" bs
Failed
./qbittorrent-nox-static.sh qbittorrent -p http://127.0.0.1:11081 -i -s -cd "${PWD}/sources" bs
Success
# Run after the previous
./qbittorrent-nox-static.sh qbittorrent -p http://127.0.0.1:11081 -m -i -s -cd "${PWD}/sources" bs
# get
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 7 (delta 6), reused 1 (delta 0), pack-reused 0
Unpacking objects: 100% (7/7), 1.16 KiB | 148.00 KiB/s, done.
From https://github.com/qbittorrent/qBittorrent
! [rejected] master -> master (non-fast-forward)
+ d41a778...a450a7c master -> origin/master (forced update)
Previous HEAD position was 97853f3 Bump to 4.5.2
Switched to branch 'master'
Your branch and 'origin/master' have diverged,
and have 1 and 1 different commits each, respectively.
(use "git pull" to merge the remote branch into yours)
git reset --hard origin/${branch_name} is needed to fix this conflict.
I suggest using git fetch to update the branch and git reset to force the checkout, which avoids problems when upstream rebases.
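The suggested fetch + hard reset update could be sketched as follows, under the assumption that discarding local divergence is acceptable (which it is for a pristine cache); the helper name is hypothetical:

```shell
# Update a branch by fetching and hard-resetting to the remote tip,
# discarding any local divergence caused by upstream rebases or force pushes.
update_branch() {
    local repo_dir="$1" branch="$2"
    git -C "${repo_dir}" fetch origin "${branch}" &&
        git -C "${repo_dir}" checkout -q "${branch}" &&
        git -C "${repo_dir}" reset --hard "origin/${branch}"
}
```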
> 2: should also be fixed now.

Confirmed, tested with no problem.
The first one will fail since you did not provide the switch: -qt release-4.5.2
I'll have to look at the second issue, as I am not actually modifying any files in the git repo, and if I run your commands on a fresh cache I have no issue.
> The first one will fail since you did not provide the switch: -qt release-4.5.2

OK, I forgot this.

> I'll have to look at the second issue, as I am not actually modifying any files in the git repo, and if I run your commands on a fresh cache I have no issue.

The state of my repo:
- Already used for building
- Switched to the tag release-4.5.2 and back to master

And I confirmed that git reset --hard origin/master fixed this problem.
So right now the cache works for folders (git) by default, but also for workflow archives. Basically, if you use a cache, it either downloads to it or copies/extracts from it.
This, for example, will use archives, or download them, from or to the cache dir:
./qbittorrent-nox-static.sh all -cd cache -wf
I have also made an undocumented tweak to the patch function that will copy files like a mirror from patches/app_name/version/source.
So you can store modified files in the same structure as the source, and it will copy them in and overwrite.
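The mirror-style copy from patches/app_name/version/source might look roughly like this; the helper name is hypothetical and the real patch function may differ:

```shell
# Overlay files from the patch source dir onto the build tree, overwriting,
# mirroring the directory structure of the source.
overlay_sources() {
    local patch_src="$1" build_dir="$2"
    [[ -d "${patch_src}" ]] || return 0   # nothing to overlay: not an error
    cp -rf "${patch_src}/." "${build_dir}/"
}
```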
I'm not 100% sure about the git reset thing, as I don't think you should be modifying the cache.
Sorry for the late reply; I have tested the new code.
The broken branch still existed after switching branches, but everything else I tested is quite OK. I am not sure why this happens, as I did not change anything. Anyway, I always use master, so this is acceptable for me.
The most bandwidth-efficient method is now to use workflow files (-cd cache -wf), as this downloads xz-compressed files for everything.
I have redone things like:
- crossbuild toolchains, reduced 70% in compressed size
- ninja, prebuilt and under 2 MB in size
With the overall changes, the way I see this done best is to not modify the cache files directly. I'm not going as deep as hash-checking them or anything, but if they are left alone we should use the build_dir/patches directory to manage patches.
Since we can now patch all apps, and also copy source code from the patch/source directory, it's better to have a system to manage your changes and copy them in, even if you use the GitHub folders for the cache dir.
So I want to take the approach that I don't change the cache dir files unless we are bootstrapping; then I assume you want to remove them and redownload.
The final changes I've been making are more related to the workflows and CI, but I think the main script changes are complete and 99% bug-tested.
Yes, I agree. As everything is workable, I think it's OK to close this issue and merge the code.
The code is pretty good, thanks.