microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Home Page:https://onnxruntime.ai

[Build] Remove large files from repository

mc-nv opened this issue · comments

Describe the issue

Repository checkout can take a long time because example model files are stored in the repository.
Some of these files may be outdated relative to the project branch, but they still increase checkout time.

Urgency

I would say it's urgent, as it impacts many users and will also block/impact #12081.

Target platform

any

Build script

Steps to reproduce:

$ git clone https://github.com/microsoft/onnxruntime.git
$ cd onnxruntime
$ git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort --numeric-sort --key=2 |
    cut -c 1-12,41- |
    $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

Error / output

...
4b82f1d9cd30   19MiB onnxruntime/python/tools/quantization/E2E_example_model/object_detection/trt/yolov3/annotations/instances_val2017.json
618e8a8acc50   20MiB orttraining/orttraining/models/bert_tiny/bert-tiny_1-layer_noloss.onnx
64d138c6d30a   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
718b0d93c1a5   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
67936bb7b3d2   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
28d50361c4e2   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
1dd70726be37   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
295c165101f1   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
bb0f72efd2ea   20MiB onnxruntime/test/testdata/bert_toy_postprocessed.onnx
150184ba7698   20MiB onnxruntime/test/testdata/transform/bert_toy_opset14.onnx
ba50963637c6   27MiB onnxruntime/test/testdata/bart_tiny.onnx
e22f27348f9a   31MiB onnx_test_runner_armv8a_flag.zip
070d5d4f066a   31MiB winml/test/scenario/models/coreml_Resnet50_ImageNet-dq.onnx
f3e82ef30be6   33MiB images/bert-excel.gif
2af37a459364   34MiB onnxruntime/python/tools/transformers/benchmark_autosuggest_LM/dlis/cublasLt64_10.dll
20bcfbfc2184   35MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.1.3.ort.pt
a68114bc465d   35MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.2.3.ort.pt
bf978748d4b5   35MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.0.3.ort.pt
5f13eebf892e   39MiB onnxruntime/test/testdata/ort_ckpt/bert_toy_lamb.ZeRO.3.3.ort.pt
13678207f109   71MiB onnxruntime/python/tools/transformers/benchmark_autosuggest_LM/dlis/cublas64_10.dll
5afd272b5fff   74MiB onnxruntime/test/contrib_ops/qordered_python_test/my_model/const16_longformer.embeddings.word_embeddings.weight.npy
38e731a65948   87MiB test/ssd/ssd.onnx
4315eb99ea53   97MiB onnxruntime/python/tools/quantization/E2E_example_model/resnet50_v1.onnx
53ac9b3d567a   98MiB onnxruntime/python/tools/quantization/E2E_example_model/image_classification/cpu/resnet50-v1-9.onnx
bbed42bb5ea3   98MiB onnxruntime/python/tools/quantization/E2E_example_model/image_classification/cpu/resnet50-v1-13.onnx
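
Several paths in the listing above appear more than once with different hashes (for example bert_toy_postprocessed.onnx), which means multiple historical versions of the same file are kept in the object store. A minimal sketch for counting versions per path, assuming the three-column "hash size path" format shown above; count_blob_versions is a hypothetical helper name, not a git command:

```shell
# count_blob_versions (hypothetical helper): reads "<short-hash> <size> <path>"
# lines on stdin and prints "<count> <path>" for every path that has more
# than one blob in history, most-duplicated paths first.
count_blob_versions() {
  awk '{
         path = $0
         # Drop the hash and size columns; keep the rest as the path,
         # so paths containing spaces are not truncated.
         sub(/^[^ ]+ +[^ ]+ +/, "", path)
         count[path]++
       }
       END {
         for (p in count)
           if (count[p] > 1) print count[p], p
       }' |
    sort -k1,1nr -k2
}

# Usage: append "| count_blob_versions" to the pipeline from the
# reproduce steps above.
```

Paths with a high count are likely the first candidates to look at, since each version adds to the download for a full clone.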

Visual Studio Version

No response

GCC / Compiler Version

No response

Can you do a shallow clone (i.e. --depth 1) to reduce the time?
But agreed, we should do more on our side to avoid checking large objects into the repo.
+@snnn @pranavsharma FYI
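
For reference, the shallow clone suggested above could look roughly like this. It is only a sketch: shallow_clone is a hypothetical wrapper name, and it just forwards its arguments to git clone with --depth 1 added.

```shell
# shallow_clone (hypothetical wrapper): clone only the most recent commit,
# so blobs that exist only in older history are not downloaded.
# Arguments are passed through to "git clone" unchanged.
shallow_clone() {
  git clone --depth 1 "$@"
}

# Usage:
#   shallow_clone https://github.com/microsoft/onnxruntime.git
```

A shallow clone only avoids old versions of files; large files that are still present in the latest commit are downloaded either way. If the full history is needed later, git fetch --unshallow retrieves it.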