[Build] Remove large files from repository
mc-nv opened this issue · comments
Misha Chornyi commented
Describe the issue
Checking out the repository can take a long time because example model files are stored in it.
Some of those files may be outdated relative to the current project branch, yet they still increase checkout time for the repository.
Urgency
I would say it's urgent, as it impacts many users and will also block/impact #12081.
Target platform
any
Build script
reproduce steps
$ git clone https://github.com/microsoft/onnxruntime.git
$ cd onnxruntime
$ git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sed -n 's/^blob //p' | sort --numeric-sort --key=2 | cut -c 1-12,41- | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
Error / output
...
4b82f1d9cd30 19MiB onnxruntime/python/tools/quantization/E2E_example_model/object_detection/trt/yolov3/annotations/instances_val2017.json
618e8a8acc50 20MiB orttraining/orttraining/models/bert_tiny/bert-tiny_1-layer_noloss.onnx
64d138c6d30a 20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
718b0d93c1a5 20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
67936bb7b3d2 20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
28d50361c4e2 20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
1dd70726be37 20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
295c165101f1 20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
bb0f72efd2ea 20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
150184ba7698 20MiB onnxruntime/test/testdata/transform/bert_toy_opset14.onnx
ba50963637c6 27MiB onnxruntime/test/testdata/bart_tiny.onnx
e22f27348f9a 31MiB onnx_test_runner_armv8a_flag.zip
070d5d4f066a 31MiB winml/test/scenario/models/coreml_Resnet50_ImageNet-dq.onnx
f3e82ef30be6 33MiB images/bert-excel.gif
2af37a459364 34MiB onnxruntime/python/tools/transformers/benchmark_autosuggest_LM/dlis/cublasLt64_10.dll
20bcfbfc2184 35MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.1.3.ort.pt
a68114bc465d 35MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.2.3.ort.pt
bf978748d4b5 35MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.0.3.ort.pt
5f13eebf892e 39MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.3.3.ort.pt
13678207f109 71MiB onnxruntime/python/tools/transformers/benchmark_autosuggest_LM/dlis/cublas64_10.dll
5afd272b5fff 74MiB onnxruntime/test/contrib_ops/qordered_python_test/my_model/const16_longformer.embeddings.word_embeddings.weight.npy
38e731a65948 87MiB test/ssd/ssd.onnx
4315eb99ea53 97MiB onnxruntime/python/tools/quantization/E2E_example_model/resnet50_v1.onnx
53ac9b3d567a 98MiB onnxruntime/python/tools/quantization/E2E_example_model/image_classification/cpu/resnet50-v1-9.onnx
bbed42bb5ea3 98MiB onnxruntime/python/tools/quantization/E2E_example_model/image_classification/cpu/resnet50-v1-13.onnx
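For anyone who wants to rerun the listing above, the same pipeline can be written as a small shell function. This is a sketch, not the reporter's exact command: the function name `list_large_blobs` and its count argument are hypothetical, sizes are printed in raw bytes instead of going through `numfmt`, and the output is sorted largest first.

```shell
# Hypothetical helper: print the N largest blobs in the full history of the
# repository in the current directory, as "<short hash> <bytes> <path>".
list_large_blobs() {
  count="${1:-20}"   # number of entries to show (assumed default: 20)

  # 1. List every object reachable from any ref, together with its path.
  # 2. Ask git for the type, hash, and size of each object.
  # 3. Keep only blobs (file contents) and drop the leading "blob " label.
  # 4. Sort by the size column, largest first.
  # 5. Keep the top N and shorten the 40-character hash to 12 characters.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k2,2nr |
    head -n "$count" |
    cut -c 1-12,41-
}
```

Calling `list_large_blobs 25` inside a clone should print the 25 largest blobs. The same path can appear several times, as `bert_toy_postprocessed.onnx` does above, because every committed revision of a file is stored as a separate blob.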
Visual Studio Version
No response
GCC / Compiler Version
No response
George Wu commented
Can you do a shallow clone (i.e. --depth 1) to reduce the time?
But agreed, we should do more on our side to avoid checking large objects into the repo.
+@snnn @pranavsharma FYI
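The shallow clone suggested above could look like the following minimal sketch. The wrapper name `shallow_clone` and its two arguments are hypothetical; the only part taken from the thread is the `--depth 1` option.

```shell
# Hypothetical wrapper: clone only the most recent commit of a repository.
# $1 = repository URL, $2 = destination directory.
shallow_clone() {
  url="$1"
  dest="$2"
  # --depth 1 truncates history to the tip commit, so blobs that exist only
  # in older commits are not downloaded.
  git clone --quiet --depth 1 "$url" "$dest"
}
```

For this repository the call would be `shallow_clone https://github.com/microsoft/onnxruntime.git onnxruntime`. Note that `--depth 1` also implies `--single-branch`, and that large files still present in the tip commit are downloaded regardless, so this reduces the cost of old history rather than the cost of large files in the current tree.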