JuliaGPU / CUDA.jl

CUDA programming in Julia.

Home Page: https://juliagpu.org/cuda/

Multiplying `CuSparseMatrixCSC` by `CuMatrix` results in `Out of GPU memory`

lpawela opened this issue

Describe the bug

Some products of a dense and a sparse matrix fail with an out-of-memory error, even when both operands are 1×1:

ERROR: Out of GPU memory trying to allocate 113.295 TiB
Effective GPU memory usage: 19.63% (1.916 GiB/9.759 GiB)
Memory pool usage: 72 bytes (32.000 MiB reserved)

Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/CUDA/35NC6/src/pool.jl:435 [inlined]
  [2] macro expansion
    @ ./timing.jl:395 [inlined]
  [3] #_alloc#991
    @ ~/.julia/packages/CUDA/35NC6/src/pool.jl:424 [inlined]
  [4] _alloc
    @ ~/.julia/packages/CUDA/35NC6/src/pool.jl:419 [inlined]
  [5] #alloc#990
    @ ~/.julia/packages/CUDA/35NC6/src/pool.jl:409 [inlined]
  [6] alloc
    @ ~/.julia/packages/CUDA/35NC6/src/pool.jl:403 [inlined]
  [7] CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}(::UndefInitializer, dims::Tuple{Int64})
    @ CUDA ~/.julia/packages/CUDA/35NC6/src/array.jl:93
  [8] CuArray
    @ ~/.julia/packages/CUDA/35NC6/src/array.jl:176 [inlined]
  [9] CuArray
    @ ~/.julia/packages/CUDA/35NC6/src/array.jl:183 [inlined]
 [10] with_workspace(f::CUDA.CUSPARSE.var"#1330#1332"{}, eltyp::Type{…}, size::CUDA.CUSPARSE.var"#bufferSize#1331"{}, fallback::Nothing; keep::Bool)
    @ CUDA.APIUtils ~/.julia/packages/CUDA/35NC6/lib/utils/call.jl:67
 [11] with_workspace
    @ ~/.julia/packages/CUDA/35NC6/lib/utils/call.jl:58 [inlined]
 [12] with_workspace (repeats 2 times)
    @ ~/.julia/packages/CUDA/35NC6/lib/utils/call.jl:55 [inlined]
 [13] mm!(transa::Char, transb::Char, alpha::Bool, A::CUDA.CUSPARSE.CuSparseMatrixCSC{…}, B::CuArray{…}, beta::Bool, C::CuArray{…}, index::Char, algo::CUDA.CUSPARSE.cusparseSpMMAlg_t)
    @ CUDA.CUSPARSE ~/.julia/packages/CUDA/35NC6/lib/cusparse/generic.jl:237
 [14] mm!
    @ ~/.julia/packages/CUDA/35NC6/lib/cusparse/generic.jl:197 [inlined]
 [15] mm_wrapper(transa::Char, transb::Char, alpha::Bool, A::CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, B::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, beta::Bool, C::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer})
    @ CUDA.CUSPARSE ~/.julia/packages/CUDA/35NC6/lib/cusparse/interfaces.jl:46
 [16] generic_matmatmul!(C::CuArray{…}, tA::Char, tB::Char, A::CUDA.CUSPARSE.CuSparseMatrixCSC{…}, B::CuArray{…}, _add::LinearAlgebra.MulAddMul{…})
    @ CUDA.CUSPARSE ~/.julia/packages/CUDA/35NC6/lib/cusparse/interfaces.jl:76
 [17] mul!
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:263 [inlined]
 [18] mul!
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:237 [inlined]
 [19] *(A::CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, B::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:106
 [20] top-level scope
    @ REPL[17]:1
 [21] top-level scope
    @ ~/.julia/packages/CUDA/35NC6/src/initialization.jl:190

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA
using SparseArrays

dense32 = CUDA.rand(1, 1)
sparse32csc = cu(sprand(Float32, 1, 1, 1.))

dense64 = CUDA.rand(Float64, 1, 1)
sparse64csc = CUSPARSE.CuSparseMatrixCSC{Float64, Int32}(sparse32csc.colPtr, sparse32csc.rowVal, sparse32csc.nzVal, (1, 1))

sparse64csr = CUSPARSE.CuSparseMatrixCSR(dense64)

sparse32csc * dense32 # ERROR
dense32 * sparse32csc # NO ERROR
(sparse32csc' * dense32')' # ERROR

sparse64csc * dense64 # ERROR
dense64 * sparse64csc # NO ERROR
(dense64' * sparse64csc')' # NO ERROR

sparse64csr * dense64 # NO ERROR
dense64 * sparse64csr # ERROR
(sparse64csr' * dense64')' # NO ERROR
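As a point of comparison, the same 1×1 products can be formed on the CPU with the SparseArrays standard library. The sketch below is not part of the original report: the names `dense_cpu` and `sparse_cpu` are hypothetical, and it assumes that the GPU products are expected to agree with these CPU results.

```julia
using LinearAlgebra
using SparseArrays

# CPU stand-ins for the 1×1 operands of the MWE (hypothetical names)
dense_cpu = rand(Float32, 1, 1)
sparse_cpu = sprand(Float32, 1, 1, 1.0)  # density 1.0, so the single entry is stored

# Sparse * dense and dense * sparse both return ordinary 1×1 matrices
left = sparse_cpu * dense_cpu
right = dense_cpu * sparse_cpu

# Dense reference values to compare against
left_ref = Matrix(sparse_cpu) * dense_cpu
right_ref = dense_cpu * Matrix(sparse_cpu)

println(size(left), " ", left ≈ left_ref)
println(size(right), " ", right ≈ right_ref)
```

Neither product needs more than a few bytes of storage on the CPU, which is the behaviour one would expect from the GPU versions as well.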

Manifest.toml

# This file is machine-generated - editing it directly is not advised

julia_version = "1.10.1"
manifest_format = "2.0"
project_hash = "aa98b1f95df17ebf5569f91383899271dde8d36d"

[[deps.AbstractFFTs]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "d92ad398961a3ed262d8bf04a1a2b8340f915fef"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.5.0"
weakdeps = ["ChainRulesCore", "Test"]

    [deps.AbstractFFTs.extensions]
    AbstractFFTsChainRulesCoreExt = "ChainRulesCore"
    AbstractFFTsTestExt = "Test"

[[deps.Adapt]]
deps = ["LinearAlgebra", "Requires"]
git-tree-sha1 = "cde29ddf7e5726c9fb511f340244ea3481267608"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "3.7.2"
weakdeps = ["StaticArrays"]

    [deps.Adapt.extensions]
    AdaptStaticArraysExt = "StaticArrays"

[[deps.ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"
version = "1.1.1"

[[deps.ArnoldiMethod]]
deps = ["LinearAlgebra", "Random", "StaticArrays"]
git-tree-sha1 = "62e51b39331de8911e4a7ff6f5aaf38a5f4cc0ae"
uuid = "ec485272-7323-5ecc-a04f-4719b315124d"
version = "0.2.0"

[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[deps.Atomix]]
deps = ["UnsafeAtomics"]
git-tree-sha1 = "c06a868224ecba914baa6942988e2f2aade419be"
uuid = "a9b6321e-bd34-4604-b9c9-b65b8de01458"
version = "0.1.0"

[[deps.AutoHashEquals]]
git-tree-sha1 = "45bb6705d93be619b81451bb2006b7ee5d4e4453"
uuid = "15f4f7f2-30c1-5605-9d31-71845cf9641f"
version = "0.2.0"

[[deps.BFloat16s]]
deps = ["LinearAlgebra", "Printf", "Random", "Test"]
git-tree-sha1 = "dbf84058d0a8cbbadee18d25cf606934b22d7c66"
uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
version = "0.4.2"

[[deps.Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[deps.BitFlags]]
git-tree-sha1 = "2dc09997850d68179b69dafb58ae806167a32b1b"
uuid = "d1d4a3ce-64b1-5f1a-9ba4-7e7e69966f35"
version = "0.1.8"

[[deps.CEnum]]
git-tree-sha1 = "eb4cb44a499229b3b8426dcfb5dd85333951ff90"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.4.2"

[[deps.CSTParser]]
deps = ["Tokenize"]
git-tree-sha1 = "b544d62417a99d091c569b95109bc9d8c223e9e3"
uuid = "00ebfdb7-1f24-5e51-bd34-a7502290713f"
version = "3.4.2"

[[deps.CSV]]
deps = ["Dates", "Mmap", "Parsers", "PooledArrays", "SentinelArrays", "Tables", "Unicode"]
git-tree-sha1 = "b83aa3f513be680454437a0eee21001607e5d983"
uuid = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
version = "0.8.5"

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "Preferences", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "SpecialFunctions", "UnsafeAtomicsLLVM"]
git-tree-sha1 = "968c1365e2992824c3e7a794e30907483f8469a9"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "4.4.1"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"]
git-tree-sha1 = "498f45593f6ddc0adff64a9310bb6710e851781b"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "0.5.0+1"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "2cb12f6b2209f40a4b8967697689a47c50485490"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "0.2.3"

[[deps.CUDA_Runtime_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "5248d9c45712e51e27ba9b30eebec65658c6ce29"
uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
version = "0.6.0+0"

[[deps.CUTENSOR_jll]]
deps = ["Artifacts", "CUDA_Runtime_jll", "CompilerSupportLibraries_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "e231d9b8894558e22bb35910a2c5e7458655744f"
uuid = "35b6c64b-1ee1-5834-92a3-3f624899209a"
version = "1.7.0+1"

[[deps.ChainRulesCore]]
deps = ["Compat", "LinearAlgebra"]
git-tree-sha1 = "575cd02e080939a33b6df6c5853d14924c08e35b"
uuid = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
version = "1.23.0"
weakdeps = ["SparseArrays"]

    [deps.ChainRulesCore.extensions]
    ChainRulesCoreSparseArraysExt = "SparseArrays"

[[deps.CodecZlib]]
deps = ["TranscodingStreams", "Zlib_jll"]
git-tree-sha1 = "59939d8a997469ee05c4b4944560a820f9ba0d73"
uuid = "944b1d66-785c-5afd-91f1-9de20f533193"
version = "0.7.4"

[[deps.CommonMark]]
deps = ["Crayons", "JSON", "PrecompileTools", "URIs"]
git-tree-sha1 = "532c4185d3c9037c0237546d817858b23cf9e071"
uuid = "a80b9123-70ca-4bc0-993e-6e3bcb318db6"
version = "0.8.12"

[[deps.Compat]]
deps = ["TOML", "UUIDs"]
git-tree-sha1 = "c955881e3c981181362ae4088b35995446298b80"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "4.14.0"
weakdeps = ["Dates", "LinearAlgebra"]

    [deps.Compat.extensions]
    CompatLinearAlgebraExt = "LinearAlgebra"

[[deps.CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"
version = "1.1.0+0"

[[deps.ConcurrentUtilities]]
deps = ["Serialization", "Sockets"]
git-tree-sha1 = "9c4708e3ed2b799e6124b5673a712dda0b596a9b"
uuid = "f0e56b4a-5159-44fe-b623-3e5288b988bb"
version = "2.3.1"

[[deps.Crayons]]
git-tree-sha1 = "249fe38abf76d48563e2f4556bebd215aa317e15"
uuid = "a8cc5b0e-0ffa-5ad4-8c14-923d3ee1735f"
version = "4.1.1"

[[deps.DataAPI]]
git-tree-sha1 = "abe83f3a2f1b857aac70ef8b269080af17764bbe"
uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
version = "1.16.0"

[[deps.DataStructures]]
deps = ["Compat", "InteractiveUtils", "OrderedCollections"]
git-tree-sha1 = "0f4b5d62a88d8f59003e43c25a8a90de9eb76317"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.18.18"

[[deps.DataValueInterfaces]]
git-tree-sha1 = "bfc1187b79289637fa0ef6d4436ebdfe6905cbd6"
uuid = "e2d170a0-9d28-54be-80f0-106bbe20a464"
version = "1.0.0"

[[deps.Dates]]
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"

[[deps.DelimitedFiles]]
deps = ["Mmap"]
git-tree-sha1 = "9e2f36d3c96a820c678f2f1f1782582fcf685bae"
uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab"
version = "1.9.1"

[[deps.Distances]]
deps = ["LinearAlgebra", "Statistics", "StatsAPI"]
git-tree-sha1 = "66c4c81f259586e8f002eacebc177e1fb06363b0"
uuid = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
version = "0.10.11"
weakdeps = ["ChainRulesCore", "SparseArrays"]

    [deps.Distances.extensions]
    DistancesChainRulesCoreExt = "ChainRulesCore"
    DistancesSparseArraysExt = "SparseArrays"

[[deps.Distributed]]
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"

[[deps.DocStringExtensions]]
deps = ["LibGit2"]
git-tree-sha1 = "2fb1e02f2b635d0845df5d7c167fec4dd739b00d"
uuid = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
version = "0.9.3"

[[deps.Downloads]]
deps = ["ArgTools", "FileWatching", "LibCURL", "NetworkOptions"]
uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
version = "1.6.0"

[[deps.ExceptionUnwrapping]]
deps = ["Test"]
git-tree-sha1 = "dcb08a0d93ec0b1cdc4af184b26b591e9695423a"
uuid = "460bff9d-24e4-43bc-9d9f-a8973cb893f4"
version = "0.1.10"

[[deps.ExprTools]]
git-tree-sha1 = "27415f162e6028e81c72b82ef756bf321213b6ec"
uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
version = "0.1.10"

[[deps.FFTW]]
deps = ["AbstractFFTs", "FFTW_jll", "LinearAlgebra", "MKL_jll", "Preferences", "Reexport"]
git-tree-sha1 = "4820348781ae578893311153d69049a93d05f39d"
uuid = "7a1cc6ca-52ef-59f5-83cd-3a7055c09341"
version = "1.8.0"

[[deps.FFTW_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "c6033cc3892d0ef5bb9cd29b7f2f0331ea5184ea"
uuid = "f5851436-0d7a-5f13-b9de-f02708fd171a"
version = "3.3.10+0"

[[deps.FileIO]]
deps = ["Pkg", "Requires", "UUIDs"]
git-tree-sha1 = "c5c28c245101bd59154f649e19b038d15901b5dc"
uuid = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
version = "1.16.2"

[[deps.FileWatching]]
uuid = "7b1f6079-737a-58dc-b8bc-7a2ca5c1b5ee"

[[deps.Future]]
deps = ["Random"]
uuid = "9fa8497b-333b-5362-9e8d-4d0656e87820"

[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "Serialization", "Statistics"]
git-tree-sha1 = "2e57b4a4f9cc15e85a24d603256fe08e527f48d1"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "8.8.1"

[[deps.GPUArraysCore]]
deps = ["Adapt"]
git-tree-sha1 = "2d6ca471a6c7b536127afccfa7564b5b39227fe0"
uuid = "46192b85-c4d5-4398-a991-12ede77f4527"
version = "0.1.5"

[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "Scratch", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "72b2e3c2ba583d1a7aa35129e56cf92e07c083e3"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.21.4"

[[deps.GitHub]]
deps = ["Base64", "Dates", "HTTP", "JSON", "MbedTLS", "Sockets", "SodiumSeal", "URIs"]
git-tree-sha1 = "7ee730a8484d673a8ce21d8536acfe6494475994"
uuid = "bc5e4493-9b4d-5f90-b8aa-2b2bcaad7a26"
version = "5.9.0"

[[deps.Glob]]
git-tree-sha1 = "97285bbd5230dd766e9ef6749b80fc617126d496"
uuid = "c27321d9-0574-5035-807b-f59d2c89b15c"
version = "1.3.1"

[[deps.Graphs]]
deps = ["ArnoldiMethod", "Compat", "DataStructures", "Distributed", "Inflate", "LinearAlgebra", "Random", "SharedArrays", "SimpleTraits", "SparseArrays", "Statistics"]
git-tree-sha1 = "899050ace26649433ef1af25bc17a815b3db52b7"
uuid = "86223c79-3864-5bf0-83f7-82e725a168b6"
version = "1.9.0"

[[deps.HDF5]]
deps = ["Compat", "HDF5_jll", "Libdl", "MPIPreferences", "Mmap", "Preferences", "Printf", "Random", "Requires", "UUIDs"]
git-tree-sha1 = "26407bd1c60129062cec9da63dc7d08251544d53"
uuid = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
version = "0.17.1"

    [deps.HDF5.extensions]
    MPIExt = "MPI"

    [deps.HDF5.weakdeps]
    MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"

[[deps.HDF5_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "LazyArtifacts", "LibCURL_jll", "Libdl", "MPICH_jll", "MPIPreferences", "MPItrampoline_jll", "MicrosoftMPI_jll", "OpenMPI_jll", "OpenSSL_jll", "TOML", "Zlib_jll", "libaec_jll"]
git-tree-sha1 = "e4591176488495bf44d7456bd73179d87d5e6eab"
uuid = "0234f1f7-429e-5d53-9886-15a909be8d59"
version = "1.14.3+1"

[[deps.HTTP]]
deps = ["Base64", "CodecZlib", "ConcurrentUtilities", "Dates", "ExceptionUnwrapping", "Logging", "LoggingExtras", "MbedTLS", "NetworkOptions", "OpenSSL", "Random", "SimpleBufferStream", "Sockets", "URIs", "UUIDs"]
git-tree-sha1 = "db864f2d91f68a5912937af80327d288ea1f3aee"
uuid = "cd3eb016-35fb-5094-929b-558a96fad6f3"
version = "1.10.3"

[[deps.Hwloc_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl"]
git-tree-sha1 = "ca0f6bf568b4bfc807e7537f081c81e35ceca114"
uuid = "e33a78d0-f292-5ffc-b300-72abe9b543c8"
version = "2.10.0+0"

[[deps.Inflate]]
git-tree-sha1 = "ea8031dea4aff6bd41f1df8f2fdfb25b33626381"
uuid = "d25df0c9-e2be-5dd7-82c8-3ad0b3e990b9"
version = "0.1.4"

[[deps.IntelOpenMP_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "ad37c091f7d7daf900963171600d7c1c5c3ede32"
uuid = "1d5cc7b8-4909-519e-a0f8-d0f5ad9712d0"
version = "2023.2.0+0"

[[deps.InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"

[[deps.IrrationalConstants]]
git-tree-sha1 = "630b497eafcc20001bba38a4651b327dcfc491d2"
uuid = "92d709cd-6900-40b7-9082-c6be49f344b6"
version = "0.2.2"

[[deps.IteratorInterfaceExtensions]]
git-tree-sha1 = "a3f24677c21f5bbe9d2a714f95dcd58337fb2856"
uuid = "82899510-4779-5014-852e-03e436cf321d"
version = "1.0.0"

[[deps.JLD2]]
deps = ["FileIO", "MacroTools", "Mmap", "OrderedCollections", "Pkg", "PrecompileTools", "Printf", "Reexport", "Requires", "TranscodingStreams", "UUIDs"]
git-tree-sha1 = "5ea6acdd53a51d897672edb694e3cc2912f3f8a7"
uuid = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
version = "0.4.46"

[[deps.JLLWrappers]]
deps = ["Artifacts", "Preferences"]
git-tree-sha1 = "7e5d6779a1e09a36db2a7b6cff50942a0a7d0fca"
uuid = "692b3bcd-3c85-4b1f-b108-f13ce0eb3210"
version = "1.5.0"

[[deps.JSON]]
deps = ["Dates", "Mmap", "Parsers", "Unicode"]
git-tree-sha1 = "31e996f0a15c7b280ba9f76636b3ff9e2ae58c9a"
uuid = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
version = "0.21.4"

[[deps.JuliaFormatter]]
deps = ["CSTParser", "CommonMark", "DataStructures", "Glob", "Pkg", "PrecompileTools", "Tokenize"]
git-tree-sha1 = "1954b04bf7ce17ed708ce9059d05881f58f07845"
uuid = "98e50ef6-434e-11e9-1051-2b60c6c9e899"
version = "1.0.52"

[[deps.KernelAbstractions]]
deps = ["Adapt", "Atomix", "InteractiveUtils", "LinearAlgebra", "MacroTools", "PrecompileTools", "Requires", "SparseArrays", "StaticArrays", "UUIDs", "UnsafeAtomics", "UnsafeAtomicsLLVM"]
git-tree-sha1 = "ed7167240f40e62d97c1f5f7735dea6de3cc5c49"
uuid = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
version = "0.9.18"

    [deps.KernelAbstractions.extensions]
    EnzymeExt = "EnzymeCore"

    [deps.KernelAbstractions.weakdeps]
    EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Requires", "Unicode"]
git-tree-sha1 = "ddab4d40513bce53c8e3157825e245224f74fae7"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "6.6.0"
weakdeps = ["BFloat16s"]

    [deps.LLVM.extensions]
    BFloat16sExt = "BFloat16s"

[[deps.LLVMExtra_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "88b916503aac4fb7f701bb625cd84ca5dd1677bc"
uuid = "dad2f222-ce93-54a1-a47d-0025e8a3acab"
version = "0.0.29+0"

[[deps.LRUCache]]
git-tree-sha1 = "b3cc6698599b10e652832c2f23db3cab99d51b59"
uuid = "8ac3fa9e-de4c-5943-b1dc-09c6b5f20637"
version = "1.6.1"
weakdeps = ["Serialization"]

    [deps.LRUCache.extensions]
    SerializationExt = ["Serialization"]

[[deps.LabelledGraphs]]
deps = ["Graphs", "MetaGraphs"]
git-tree-sha1 = "436f40ecb7360aed88ed69893c659b35d738919b"
uuid = "605abd48-4d17-4660-b914-d4df33194460"
version = "0.4.4"

[[deps.LazilyInitializedFields]]
git-tree-sha1 = "8f7f3cabab0fd1800699663533b6d5cb3fc0e612"
uuid = "0e77f7df-68c5-4e49-93ce-4cd80f5598bf"
version = "1.2.2"

[[deps.LazyArtifacts]]
deps = ["Artifacts", "Pkg"]
uuid = "4af54fe1-eca0-43a8-85a7-787d91b784e3"

[[deps.LazyStack]]
deps = ["ChainRulesCore", "Compat", "LinearAlgebra"]
git-tree-sha1 = "aff621f1f49e9262a34aaf0d57d02ea3b35aec60"
uuid = "1fad7336-0346-5a1a-a56f-a06ba010965b"
version = "0.1.3"

[[deps.LibCURL]]
deps = ["LibCURL_jll", "MozillaCACerts_jll"]
uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21"
version = "0.6.4"

[[deps.LibCURL_jll]]
deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"]
uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0"
version = "8.4.0+0"

[[deps.LibGit2]]
deps = ["Base64", "LibGit2_jll", "NetworkOptions", "Printf", "SHA"]
uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"

[[deps.LibGit2_jll]]
deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll"]
uuid = "e37daf67-58a4-590a-8e99-b0245dd2ffc5"
version = "1.6.4+0"

[[deps.LibSSH2_jll]]
deps = ["Artifacts", "Libdl", "MbedTLS_jll"]
uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8"
version = "1.11.0+1"

[[deps.Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

[[deps.LicenseCheck]]
deps = ["licensecheck_jll"]
git-tree-sha1 = "e98bc9e1f773123cfdb1fa3d7a4c898e1f030341"
uuid = "726dbf0d-6eb6-41af-b36c-cd770e0f00cc"
version = "0.2.2"

[[deps.LinearAlgebra]]
deps = ["Libdl", "OpenBLAS_jll", "libblastrampoline_jll"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

[[deps.LocalRegistry]]
deps = ["Random", "RegistryInstances", "RegistryTools", "TOML", "UUIDs"]
git-tree-sha1 = "d01c44f41135b4e39656201ae77a95aeba9fe395"
uuid = "89398ba2-070a-4b16-a995-9893c55d93cf"
version = "0.5.6"

[[deps.LogExpFunctions]]
deps = ["DocStringExtensions", "IrrationalConstants", "LinearAlgebra"]
git-tree-sha1 = "18144f3e9cbe9b15b070288eef858f71b291ce37"
uuid = "2ab3a3ac-af41-5b50-aa03-7779005ae688"
version = "0.3.27"

    [deps.LogExpFunctions.extensions]
    LogExpFunctionsChainRulesCoreExt = "ChainRulesCore"
    LogExpFunctionsChangesOfVariablesExt = "ChangesOfVariables"
    LogExpFunctionsInverseFunctionsExt = "InverseFunctions"

    [deps.LogExpFunctions.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    ChangesOfVariables = "9e997f8a-9a97-42d5-a9f1-ce6bfc15e2c0"
    InverseFunctions = "3587e190-3f89-42d0-90ee-14403ec27112"

[[deps.Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"

[[deps.LoggingExtras]]
deps = ["Dates", "Logging"]
git-tree-sha1 = "c1dd6d7978c12545b4179fb6153b9250c96b0075"
uuid = "e6f89c97-d47a-5376-807f-9c37f3926c36"
version = "1.0.3"

[[deps.LowRankApprox]]
deps = ["FFTW", "LinearAlgebra", "LowRankMatrices", "Nullables", "Random", "SparseArrays"]
git-tree-sha1 = "031af63ba945e23424815014ba0e59c28f5aed32"
uuid = "898213cb-b102-5a47-900c-97e73b919f73"
version = "0.5.5"

[[deps.LowRankMatrices]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "7c8664b2f3d5c3d9b77605c03d53b18813e79b0f"
uuid = "e65ccdef-c354-471a-8090-89bec1c20ec3"
version = "1.0.1"

    [deps.LowRankMatrices.extensions]
    LowRankMatricesFillArraysExt = "FillArrays"

    [deps.LowRankMatrices.weakdeps]
    FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"

[[deps.MKL]]
deps = ["Artifacts", "Libdl", "LinearAlgebra", "MKL_jll", "PackageCompiler"]
git-tree-sha1 = "e8434bb06f4f5695111c8bb9b54ed2f4b8d3b30d"
uuid = "33e6dc65-8f57-5167-99aa-e5a354878fb2"
version = "0.4.3"

[[deps.MKL_jll]]
deps = ["Artifacts", "IntelOpenMP_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"]
git-tree-sha1 = "2ce8695e1e699b68702c03402672a69f54b8aca9"
uuid = "856f044c-d86e-5d09-b602-aeab76dc8ba7"
version = "2022.2.0+0"

[[deps.MPICH_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Hwloc_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "MPIPreferences", "TOML"]
git-tree-sha1 = "656036b9ed6f942d35e536e249600bc31d0f9df8"
uuid = "7cb0a576-ebde-5e09-9194-50597f1243b4"
version = "4.2.0+0"

[[deps.MPIPreferences]]
deps = ["Libdl", "Preferences"]
git-tree-sha1 = "8f6af051b9e8ec597fa09d8885ed79fd582f33c9"
uuid = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"
version = "0.1.10"

[[deps.MPItrampoline_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Hwloc_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "MPIPreferences", "TOML"]
git-tree-sha1 = "77c3bd69fdb024d75af38713e883d0f249ce19c2"
uuid = "f1f71cc9-e9ae-5b93-9b94-4fe0e1ad3748"
version = "5.3.2+0"

[[deps.MacroTools]]
deps = ["Markdown", "Random"]
git-tree-sha1 = "2fa9ee3e63fd3a4f7a9a4f4744a52f4856de82df"
uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
version = "0.5.13"

[[deps.Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"

[[deps.MbedTLS]]
deps = ["Dates", "MbedTLS_jll", "MozillaCACerts_jll", "NetworkOptions", "Random", "Sockets"]
git-tree-sha1 = "c067a280ddc25f196b5e7df3877c6b226d390aaf"
uuid = "739be429-bea8-5141-9913-cc70e7f3736d"
version = "1.1.9"

[[deps.MbedTLS_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1"
version = "2.28.2+1"

[[deps.Memoization]]
deps = ["MacroTools"]
git-tree-sha1 = "073f080e733bc6697411901224ed4fd15fefaffa"
uuid = "6fafb56a-5788-4b4e-91ca-c0cea6611c73"
version = "0.2.1"

[[deps.MetaGraphs]]
deps = ["Graphs", "JLD2", "Random"]
git-tree-sha1 = "1130dbe1d5276cb656f6e1094ce97466ed700e5a"
uuid = "626554b9-1ddb-594c-aa3c-2596fe9399a5"
version = "0.7.2"

[[deps.MicrosoftMPI_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "f12a29c4400ba812841c6ace3f4efbb6dbb3ba01"
uuid = "9237b28f-5490-5468-be7b-bb81f5f5e6cf"
version = "10.1.4+2"

[[deps.Mmap]]
uuid = "a63ad114-7e13-5084-954f-fe012c677804"

[[deps.Mocking]]
deps = ["Compat", "ExprTools"]
git-tree-sha1 = "4cc0c5a83933648b615c36c2b956d94fda70641e"
uuid = "78c3b35d-d492-501b-9361-3d52fe80e533"
version = "0.7.7"

[[deps.MozillaCACerts_jll]]
uuid = "14a3606d-f60d-562e-9121-12d972cd8159"
version = "2023.1.10"

[[deps.NNlib]]
deps = ["Adapt", "Atomix", "ChainRulesCore", "GPUArraysCore", "KernelAbstractions", "LinearAlgebra", "Pkg", "Random", "Requires", "Statistics"]
git-tree-sha1 = "877f15c331337d54cf24c797d5bcb2e48ce21221"
uuid = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
version = "0.9.12"

    [deps.NNlib.extensions]
    NNlibAMDGPUExt = "AMDGPU"
    NNlibCUDACUDNNExt = ["CUDA", "cuDNN"]
    NNlibCUDAExt = "CUDA"
    NNlibEnzymeCoreExt = "EnzymeCore"

    [deps.NNlib.weakdeps]
    AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
    CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
    EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"
    cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"

[[deps.NetworkOptions]]
uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908"
version = "1.2.0"

[[deps.Nullables]]
git-tree-sha1 = "8f87854cc8f3685a60689d8edecaa29d2251979b"
uuid = "4d1e1d77-625e-5b40-9113-a560ec7a8ecd"
version = "1.0.0"

[[deps.OpenBLAS_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"]
uuid = "4536629a-c528-5b80-bd46-f80d51c5b363"
version = "0.3.23+4"

[[deps.OpenLibm_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "05823500-19ac-5b8b-9628-191a04bc5112"
version = "0.8.1+2"

[[deps.OpenMPI_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "MPIPreferences", "TOML"]
git-tree-sha1 = "e25c1778a98e34219a00455d6e4384e017ea9762"
uuid = "fe0851c0-eecd-5654-98d4-656369965a5c"
version = "4.1.6+0"

[[deps.OpenSSL]]
deps = ["BitFlags", "Dates", "MozillaCACerts_jll", "OpenSSL_jll", "Sockets"]
git-tree-sha1 = "af81a32750ebc831ee28bdaaba6e1067decef51e"
uuid = "4d8831e6-92b7-49fb-bdf8-b643e874388c"
version = "1.4.2"

[[deps.OpenSSL_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl"]
git-tree-sha1 = "60e3045590bd104a16fefb12836c00c0ef8c7f8c"
uuid = "458c3c95-2e84-50aa-8efc-19380b2a3a95"
version = "3.0.13+0"

[[deps.OpenSpecFun_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "13652491f6856acfd2db29360e1bbcd4565d04f1"
uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e"
version = "0.5.5+0"

[[deps.OrderedCollections]]
git-tree-sha1 = "dfdf5519f235516220579f949664f1bf44e741c5"
uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
version = "1.6.3"

[[deps.PackageCompiler]]
deps = ["Artifacts", "LazyArtifacts", "Libdl", "Pkg", "RelocatableFolders", "UUIDs"]
git-tree-sha1 = "a16924b37299cc7d6106fac255b44a8c79c7c21f"
uuid = "9b87118b-4619-50d2-8e1e-99f35a4d4d9d"
version = "1.7.7"

[[deps.PackageExtensionCompat]]
git-tree-sha1 = "fb28e33b8a95c4cee25ce296c817d89cc2e53518"
uuid = "65ce6f38-6b18-4e1d-a461-8949797d7930"
version = "1.0.2"
weakdeps = ["Requires", "TOML"]

[[deps.Parsers]]
deps = ["Dates"]
git-tree-sha1 = "bfd7d8c7fd87f04543810d9cbd3995972236ba1b"
uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0"
version = "1.1.2"

[[deps.Pkg]]
deps = ["Artifacts", "Dates", "Downloads", "FileWatching", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs", "p7zip_jll"]
uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
version = "1.10.0"

[[deps.PooledArrays]]
deps = ["DataAPI", "Future"]
git-tree-sha1 = "36d8b4b899628fb92c2749eb488d884a926614d3"
uuid = "2dfb63ee-cc39-5dd5-95bd-886bf059d720"
version = "1.4.3"

[[deps.PrecompileTools]]
deps = ["Preferences"]
git-tree-sha1 = "03b4c25b43cb84cee5c90aa9b5ea0a78fd848d2f"
uuid = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
version = "1.2.0"

[[deps.Preferences]]
deps = ["TOML"]
git-tree-sha1 = "9306f6085165d270f7e3db02af26a400d580f5c6"
uuid = "21216c6a-2e73-6563-6e65-726566657250"
version = "1.4.3"

[[deps.Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"

[[deps.ProgressMeter]]
deps = ["Distributed", "Printf"]
git-tree-sha1 = "763a8ceb07833dd51bb9e3bbca372de32c0605ad"
uuid = "92933f4c-e287-5a05-a399-4b506db050ca"
version = "1.10.0"

[[deps.REPL]]
deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"]
uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"

[[deps.Random]]
deps = ["SHA"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[[deps.Random123]]
deps = ["Random", "RandomNumbers"]
git-tree-sha1 = "4743b43e5a9c4a2ede372de7061eed81795b12e7"
uuid = "74087812-796a-5b5d-8853-05524746bad3"
version = "1.7.0"

[[deps.RandomNumbers]]
deps = ["Random", "Requires"]
git-tree-sha1 = "043da614cc7e95c703498a491e2c21f58a2b8111"
uuid = "e6cf234a-135c-5ec9-84dd-332b85af5143"
version = "1.5.3"

[[deps.RecipesBase]]
deps = ["PrecompileTools"]
git-tree-sha1 = "5c3d09cc4f31f5fc6af001c250bf1278733100ff"
uuid = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
version = "1.3.4"

[[deps.Reexport]]
git-tree-sha1 = "45e428421666073eab6f2da5c9d310d99bb12f9b"
uuid = "189a3867-3050-52da-a836-e630ba90ab69"
version = "1.2.2"

[[deps.RegistryCI]]
deps = ["Base64", "Dates", "GitHub", "HTTP", "JSON", "LibGit2", "LicenseCheck", "Pkg", "Printf", "Random", "RegistryTools", "SHA", "StringDistances", "TOML", "Tar", "Test", "TimeZones", "VisualStringDistances"]
git-tree-sha1 = "f2719e7e9ddbb8137531f93599222ca0a103c18f"
uuid = "0c95cc5f-2f7e-43fe-82dd-79dbcba86b32"
version = "10.0.1"

[[deps.RegistryInstances]]
deps = ["LazilyInitializedFields", "Pkg", "TOML", "Tar"]
git-tree-sha1 = "ffd19052caf598b8653b99404058fce14828be51"
uuid = "2792f1a3-b283-48e8-9a74-f99dce5104f3"
version = "0.1.0"

[[deps.RegistryTools]]
deps = ["AutoHashEquals", "LibGit2", "Pkg", "SHA", "UUIDs"]
git-tree-sha1 = "3dd9eaa965a2925b0a34d994b4d886d797f54b20"
uuid = "d1eb7eb1-105f-429d-abf5-b0f65cb9e2c4"
version = "2.2.3"

[[deps.RelocatableFolders]]
deps = ["SHA", "Scratch"]
git-tree-sha1 = "cdbd3b1338c72ce29d9584fdbe9e9b70eeb5adca"
uuid = "05181044-ff0b-4ac5-8273-598c1e38db00"
version = "0.1.3"

[[deps.Requires]]
deps = ["UUIDs"]
git-tree-sha1 = "838a3a4188e2ded87a4f9f184b4b0d78a1e91cb7"
uuid = "ae029012-a4dd-5104-9daa-d747884805df"
version = "1.3.0"

[[deps.SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
version = "0.7.0"

[[deps.Scratch]]
deps = ["Dates"]
git-tree-sha1 = "3bac05bc7e74a75fd9cba4295cde4045d9fe2386"
uuid = "6c6a2e73-6563-6170-7368-637461726353"
version = "1.2.1"

[[deps.SentinelArrays]]
deps = ["Dates", "Random"]
git-tree-sha1 = "0e7508ff27ba32f26cd459474ca2ede1bc10991f"
uuid = "91c51154-3ec4-41a3-a24f-3f23e20d615c"
version = "1.4.1"

[[deps.Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"

[[deps.SharedArrays]]
deps = ["Distributed", "Mmap", "Random", "Serialization"]
uuid = "1a1011a3-84de-559e-8e89-a11a2f7dc383"

[[deps.SimpleBufferStream]]
git-tree-sha1 = "874e8867b33a00e784c8a7e4b60afe9e037b74e1"
uuid = "777ac1f9-54b0-4bf8-805c-2214025038e7"
version = "1.1.0"

[[deps.SimpleTraits]]
deps = ["InteractiveUtils", "MacroTools"]
git-tree-sha1 = "5d7e3f4e11935503d3ecaf7186eac40602e7d231"
uuid = "699a6c99-e7fa-54fc-8d76-47d257e15c1d"
version = "0.9.4"

[[deps.Sockets]]
uuid = "6462fe0b-24de-5631-8697-dd941f90decc"

[[deps.SodiumSeal]]
deps = ["Base64", "Libdl", "libsodium_jll"]
git-tree-sha1 = "80cef67d2953e33935b41c6ab0a178b9987b1c99"
uuid = "2133526b-2bfb-4018-ac12-889fb3908a75"
version = "0.1.1"

[[deps.SparseArrays]]
deps = ["Libdl", "LinearAlgebra", "Random", "Serialization", "SuiteSparse_jll"]
uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
version = "1.10.0"

[[deps.SpecialFunctions]]
deps = ["IrrationalConstants", "LogExpFunctions", "OpenLibm_jll", "OpenSpecFun_jll"]
git-tree-sha1 = "e2cfc4012a19088254b3950b85c3c1d8882d864d"
uuid = "276daf66-3868-5448-9aa4-cd146d93841b"
version = "2.3.1"
weakdeps = ["ChainRulesCore"]

    [deps.SpecialFunctions.extensions]
    SpecialFunctionsChainRulesCoreExt = "ChainRulesCore"

[[deps.SpinGlassEngine]]
deps = ["CUDA", "DocStringExtensions", "Graphs", "LabelledGraphs", "LinearAlgebra", "MKL", "Memoization", "MetaGraphs", "NNlib", "ProgressMeter", "SpinGlassNetworks", "SpinGlassTensors", "Statistics", "TensorCast", "TensorOperations"]
path = "../../../new-zksi-repo/SpinGlassEngine.jl"
uuid = "0563570f-ea1b-4080-8a64-041ac6565a4e"
version = "1.0.0"

[[deps.SpinGlassNetworks]]
deps = ["CSV", "CUDA", "DocStringExtensions", "Graphs", "HDF5", "JLD2", "LabelledGraphs", "LinearAlgebra", "MKL", "MetaGraphs", "SparseArrays", "SpinGlassTensors", "TensorCast"]
path = "../../../new-zksi-repo/SpinGlassNetworks.jl"
uuid = "b7f6bd3e-55dc-4da6-96a9-ef9dbec6ac19"
version = "1.0.0"

[[deps.SpinGlassTensors]]
deps = ["CUDA", "DocStringExtensions", "LinearAlgebra", "LowRankApprox", "MKL", "Memoization", "NNlib", "SparseArrays", "TSVD", "TensorCast", "TensorOperations", "TransmuteDims", "cuTENSOR"]
path = "../../../new-zksi-repo/SpinGlassTensors.jl"
uuid = "7584fc6a-5a23-4eeb-8277-827aab0146ea"
version = "1.0.0"

[[deps.StaticArrays]]
deps = ["LinearAlgebra", "PrecompileTools", "Random", "StaticArraysCore"]
git-tree-sha1 = "bf074c045d3d5ffd956fa0a461da38a44685d6b2"
uuid = "90137ffa-7385-5640-81b9-e52037218182"
version = "1.9.3"
weakdeps = ["ChainRulesCore", "Statistics"]

    [deps.StaticArrays.extensions]
    StaticArraysChainRulesCoreExt = "ChainRulesCore"
    StaticArraysStatisticsExt = "Statistics"

[[deps.StaticArraysCore]]
git-tree-sha1 = "36b3d696ce6366023a0ea192b4cd442268995a0d"
uuid = "1e83bf80-4336-4d27-bf5d-d5a4f845583c"
version = "1.4.2"

[[deps.Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
version = "1.10.0"

[[deps.StatsAPI]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "1ff449ad350c9c4cbc756624d6f8a8c3ef56d3ed"
uuid = "82ae8749-77ed-4fe6-ae5f-f523153014b0"
version = "1.7.0"

[[deps.Strided]]
deps = ["LinearAlgebra", "StridedViews", "TupleTools"]
git-tree-sha1 = "40c69be0e1b72ee2f42923b7d1ff13e0b04e675c"
uuid = "5e0ebb24-38b0-5f93-81fe-25c709ecae67"
version = "2.0.4"

[[deps.StridedViews]]
deps = ["LinearAlgebra", "PackageExtensionCompat"]
git-tree-sha1 = "5b765c4e401693ab08981989f74a36a010aa1d8e"
uuid = "4db3bf67-4bd7-4b4e-b153-31dc3fb37143"
version = "0.2.2"
weakdeps = ["CUDA"]

    [deps.StridedViews.extensions]
    StridedViewsCUDAExt = "CUDA"

[[deps.StringDistances]]
deps = ["Distances", "StatsAPI"]
git-tree-sha1 = "ceeef74797d961aee825aabf71446d6aba898acb"
uuid = "88034a9c-02f8-509d-84a9-84ec65e18404"
version = "0.11.2"

[[deps.SuiteSparse_jll]]
deps = ["Artifacts", "Libdl", "libblastrampoline_jll"]
uuid = "bea87d4a-7f5b-5778-9afe-8cc45184846c"
version = "7.2.1+1"

[[deps.TOML]]
deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
version = "1.0.3"

[[deps.TSVD]]
deps = ["Adapt", "LinearAlgebra"]
git-tree-sha1 = "c39caef6bae501e5607a6caf68dd9ac6e8addbcb"
uuid = "9449cd9e-2762-5aa3-a617-5413e99d722e"
version = "0.4.4"

[[deps.TableTraits]]
deps = ["IteratorInterfaceExtensions"]
git-tree-sha1 = "c06b2f539df1c6efa794486abfb6ed2022561a39"
uuid = "3783bdb8-4a98-5b6b-af9a-565f29a5fe9c"
version = "1.0.1"

[[deps.Tables]]
deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "OrderedCollections", "TableTraits"]
git-tree-sha1 = "cb76cf677714c095e535e3501ac7954732aeea2d"
uuid = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
version = "1.11.1"

[[deps.Tar]]
deps = ["ArgTools", "SHA"]
uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e"
version = "1.10.0"

[[deps.TensorCast]]
deps = ["ChainRulesCore", "Compat", "LazyStack", "LinearAlgebra", "MacroTools", "Random", "StaticArrays", "TransmuteDims"]
git-tree-sha1 = "88423a9e2a1eb7fb2e8c4dd7ede52e28bc5769eb"
uuid = "02d47bb6-7ce6-556a-be16-bb1710789e2b"
version = "0.4.6"

[[deps.TensorOperations]]
deps = ["LRUCache", "LinearAlgebra", "PackageExtensionCompat", "Strided", "StridedViews", "TupleTools", "VectorInterface"]
git-tree-sha1 = "59bcc1e51aa8c7489fa64d4cab0c4b0202edfc0e"
uuid = "6aa20fa7-93e2-5fca-9bc0-fbd0db3c71a2"
version = "4.1.0"
weakdeps = ["CUDA", "ChainRulesCore", "cuTENSOR"]

    [deps.TensorOperations.extensions]
    TensorOperationsChainRulesCoreExt = "ChainRulesCore"
    TensorOperationscuTENSORExt = ["cuTENSOR", "CUDA"]

[[deps.Test]]
deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[deps.TimeZones]]
deps = ["Dates", "Future", "LazyArtifacts", "Mocking", "Pkg", "Printf", "RecipesBase", "Serialization", "Unicode"]
git-tree-sha1 = "a5688ffdbd849a98503c6650effe79fe89a41252"
uuid = "f269a46b-ccf7-5d73-abea-4c690281aa53"
version = "1.5.9"

[[deps.TimerOutputs]]
deps = ["ExprTools", "Printf"]
git-tree-sha1 = "f548a9e9c490030e545f72074a41edfd0e5bcdd7"
uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
version = "0.5.23"

[[deps.Tokenize]]
git-tree-sha1 = "5b5a892ba7704c0977013bd0f9c30f5d962181e0"
uuid = "0796e94c-ce3b-5d07-9a54-7f471281c624"
version = "0.5.28"

[[deps.TranscodingStreams]]
git-tree-sha1 = "3caa21522e7efac1ba21834a03734c57b4611c7e"
uuid = "3bb67fe8-82b1-5028-8e26-92a6c54297fa"
version = "0.10.4"
weakdeps = ["Random", "Test"]

    [deps.TranscodingStreams.extensions]
    TestExt = ["Test", "Random"]

[[deps.TransmuteDims]]
deps = ["Adapt", "ChainRulesCore", "GPUArraysCore", "LinearAlgebra", "Requires", "Strided"]
git-tree-sha1 = "5b6f1f2ba5e91983eabc47cb362f92d9a96b579f"
repo-rev = "strided2"
repo-url = "https://github.com/mcabbott/TransmuteDims.jl.git"
uuid = "24ddb15e-299a-5cc3-8414-dbddc482d9ca"
version = "0.1.16"

[[deps.TupleTools]]
git-tree-sha1 = "41d61b1c545b06279871ef1a4b5fcb2cac2191cd"
uuid = "9d95972d-f1c8-5527-a6e0-b4b365fa01f6"
version = "1.5.0"

[[deps.URIs]]
git-tree-sha1 = "67db6cc7b3821e19ebe75791a9dd19c9b1188f2b"
uuid = "5c2747f8-b7ea-4ff2-ba2e-563bfd36b1d4"
version = "1.5.1"

[[deps.UUIDs]]
deps = ["Random", "SHA"]
uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"

[[deps.UnbalancedOptimalTransport]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "9af2572277f564035f452ec82c56de795f523c45"
uuid = "6f61b460-fd45-461a-bdf7-98edd72e362f"
version = "0.2.1"

[[deps.Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

[[deps.UnsafeAtomics]]
git-tree-sha1 = "6331ac3440856ea1988316b46045303bef658278"
uuid = "013be700-e6cd-48c3-b4a1-df204f14c38f"
version = "0.2.1"

[[deps.UnsafeAtomicsLLVM]]
deps = ["LLVM", "UnsafeAtomics"]
git-tree-sha1 = "323e3d0acf5e78a56dfae7bd8928c989b4f3083e"
uuid = "d80eeb9a-aca5-4d75-85e5-170c8b632249"
version = "0.1.3"

[[deps.VectorInterface]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "ed8f91274e744e5030c349e76fa98bf68236766f"
uuid = "409d34a3-91d5-4945-b6ec-7529ddf182d8"
version = "0.4.4"

[[deps.VisualStringDistances]]
deps = ["DelimitedFiles", "LinearAlgebra", "StaticArrays", "UnbalancedOptimalTransport"]
git-tree-sha1 = "0b2c7d9d5c16629f165d03b8769a07ffd44c6ad7"
uuid = "089bb0c6-1854-47b9-96f7-327dbbe09dca"
version = "0.1.1"

[[deps.Zlib_jll]]
deps = ["Libdl"]
uuid = "83775a58-1f1d-513f-b197-d71354ab007a"
version = "1.2.13+1"

[[deps.cuTENSOR]]
deps = ["CEnum", "CUDA", "CUTENSOR_jll", "LinearAlgebra"]
git-tree-sha1 = "c5c63b96b5fb0c7133145eb967882142420e6272"
uuid = "011b41b2-24ef-40a8-b3eb-fa098493e9e1"
version = "1.1.0"

[[deps.libaec_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl"]
git-tree-sha1 = "46bf7be2917b59b761247be3f317ddf75e50e997"
uuid = "477f73a3-ac25-53e9-8cc3-50b2fa2566f0"
version = "1.1.2+0"

[[deps.libblastrampoline_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"
version = "5.8.0+1"

[[deps.libsodium_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "848ab3d00fe39d6fbc2a8641048f8f272af1c51e"
uuid = "a9144af2-ca23-56d9-984f-0d03f7b5ccf8"
version = "1.0.20+0"

[[deps.licensecheck_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "b790ad21ac235c39c0eb34214ccf3d5f5ea60efa"
uuid = "4ecb348a-8b88-51ea-b912-4c460483ee91"
version = "0.3.101+0"

[[deps.nghttp2_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d"
version = "1.52.0+1"

[[deps.p7zip_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "3f19e933-33d8-53b3-aaab-bd5110c3b7a0"
version = "17.4.0+2"

Expected behavior

Correct 1x1 matrix multiplication.

Version info

Details on Julia:

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700KF
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 20 virtual cores)

Details on CUDA:

julia> CUDA.versioninfo()
CUDA runtime 12.1, artifact installation
CUDA driver 12.0
NVIDIA driver 525.147.5

CUDA libraries: 
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+525.147.5

Julia packages: 
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 7.963 GiB / 10.000 GiB available)

I also tried reproducing on some different systems, and the same errors occur on:

julia> CUDA.versioninfo()
CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 545.23.6

CUDA libraries: 
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+545.23.6

Julia packages: 
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

2 devices:
  0: NVIDIA RTX A6000 (sm_86, 44.256 GiB / 44.988 GiB available)
  1: NVIDIA RTX A6000 (sm_86, 44.548 GiB / 44.988 GiB available)

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13900K
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)

Strangely, on this system

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 36 × Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake-avx512)
  Threads: 1 on 36 virtual cores

julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 11.4
NVIDIA driver 470.182.3

CUDA libraries: 
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+470.182.3

Julia packages: 
- CUDA.jl: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0
- CUDA_Runtime_Discovery: 0.2.3

Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1

2 devices:
  0: NVIDIA TITAN RTX (sm_75, 23.124 GiB / 23.653 GiB available)
  1: NVIDIA TITAN RTX (sm_75, 23.647 GiB / 23.650 GiB available)

the CSC examples pass, the CSR one fails. After upgrading to a newer Julia and CUDA.jl

julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 11.4
NVIDIA driver 470.182.3

CUDA libraries: 
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+470.182.3

Julia packages: 
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

2 devices:
  0: NVIDIA TITAN RTX (sm_75, 23.124 GiB / 23.653 GiB available)
  1: NVIDIA TITAN RTX (sm_75, 23.647 GiB / 23.650 GiB available)

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 36 × Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 36 virtual cores)

the behavior stays the same: CSC works, CSR does not. It seems that the CSC examples work with CUDA 11 and fail with CUDA 12.

@lpawela
I suspect the matrices are so small that the CUDA routine that reports the buffer size never writes to the variable that stores it, because an empty buffer is all that is needed.
We already observed this issue with another CUSPARSE routine:
https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cusparse/conversions.jl#L206.

A hotfix is to specify an initial value of 0 for the size of the buffer here.
If we don't specify an initial value, the variable holds whatever happens to be in memory, and that could represent 113 TiB...
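
To illustrate the suspected mechanism without a GPU, here is a minimal sketch in plain Julia. `fake_buffer_size!` is a hypothetical stand-in, not a CUSPARSE function; the assumption that it skips writing its output for tiny inputs mirrors the guess above and is not confirmed behavior of the NVIDIA routine.

```julia
# Hypothetical stand-in for a C-style size query; not a CUSPARSE function.
# Assumption: for inputs of at most one element it returns without writing `out`.
function fake_buffer_size!(out::Ref{Csize_t}, nelements::Integer)
    nelements <= 1 && return nothing
    out[] = Csize_t(nelements * sizeof(Float32))
    return nothing
end

# Uninitialized: the stored value is whatever bits were already in memory.
garbage = Ref{Csize_t}()
fake_buffer_size!(garbage, 1)
println("uninitialized size: ", garbage[], " bytes")  # unpredictable value

# Initialized: an untouched output still holds a well-defined 0.
zeroed = Ref{Csize_t}(0)
fake_buffer_size!(zeroed, 1)
println("initialized size: ", zeroed[], " bytes")

# When the query does write, the initial value no longer matters.
fake_buffer_size!(zeroed, 4)
println("size for 4 elements: ", zeroed[], " bytes")
```

If the unpredictable value is then passed to an allocator, a request of many TiB like the one in the error message is a plausible outcome.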

@amontoison setting out = Ref{Csize_t}(UInt64(0)::Csize_t) results in the same errors.

@amontoison I went through the code and set the buffer sizes to zero in some places (master...lpawela:CUDA.jl:lp/sparse-buffer-size). Now I get the following errors:

julia> sparse32csc * dense32 # ERROR
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
 [2] nonblocking_synchronize(val::CuContext)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:174
 [3] device_synchronize(; blocking::Bool, spin::Bool)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:185
 [4] device_synchronize
   @ ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:180 [inlined]
 [5] maybe_synchronize_cuda()
   @ CUDA ~/lib/CUDA.jl/src/initialization.jl:217
 [6] top-level scope
   @ ~/lib/CUDA.jl/src/initialization.jl:208

julia> dense32 * sparse32csc # NO ERROR
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
 [2] isdone
   @ ~/lib/CUDA.jl/lib/cudadrv/stream.jl:111 [inlined]
 [3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:79
 [4] device_synchronize(; blocking::Bool, spin::Bool)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:182
 [5] device_synchronize
   @ ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:180 [inlined]
 [6] maybe_synchronize_cuda()
   @ CUDA ~/lib/CUDA.jl/src/initialization.jl:217
 [7] top-level scope
   @ ~/lib/CUDA.jl/src/initialization.jl:208

caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
  [2] check
    @ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
  [3] cuMemAllocFromPoolAsync
    @ ~/lib/CUDA.jl/lib/utils/call.jl:30 [inlined]
  [4] #alloc#1
    @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:81 [inlined]
  [5] alloc
    @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:71 [inlined]
  [6] actual_alloc(bytes::Int64; async::Bool, stream::CuStream, pool::CuMemoryPool)
    @ CUDA ~/lib/CUDA.jl/src/pool.jl:66
  [7] actual_alloc
    @ ~/lib/CUDA.jl/src/pool.jl:59 [inlined]
  [8] #1060
    @ ~/lib/CUDA.jl/src/pool.jl:453 [inlined]
  [9] retry_reclaim
    @ ~/lib/CUDA.jl/src/pool.jl:370 [inlined]
 [10] macro expansion
    @ ~/lib/CUDA.jl/src/pool.jl:452 [inlined]
 [11] macro expansion
    @ ./timing.jl:395 [inlined]
 [12] #_alloc#1059
    @ ~/lib/CUDA.jl/src/pool.jl:448 [inlined]
 [13] _alloc
    @ ~/lib/CUDA.jl/src/pool.jl:444 [inlined]
 [14] #alloc#1058
    @ ~/lib/CUDA.jl/src/pool.jl:434 [inlined]
 [15] alloc
    @ ~/lib/CUDA.jl/src/pool.jl:428 [inlined]
 [16] CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(::UndefInitializer, dims::Tuple{Int64, Int64})
    @ CUDA ~/lib/CUDA.jl/src/array.jl:74
 [17] CuArray
    @ ~/lib/CUDA.jl/src/array.jl:147 [inlined]
 [18] CuArray
    @ ~/lib/CUDA.jl/src/array.jl:162 [inlined]
 [19] *(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32})
    @ CUDA.CUSPARSE ~/lib/CUDA.jl/lib/cusparse/interfaces.jl:129
 [20] top-level scope
    @ REPL[9]:1
 [21] top-level scope
    @ ~/lib/CUDA.jl/src/initialization.jl:206

julia> (sparse32csc' * dense32')' # ERROR
WARNING: Error while freeing DeviceBuffer(4 bytes at 0x0000000302001800):
CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), details=CUDA.Optional{String}(data=nothing))

Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
  [2] check
    @ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
  [3] cuMemFreeAsync
    @ ~/lib/CUDA.jl/lib/utils/call.jl:30 [inlined]
  [4] free(buf::CUDA.Mem.DeviceBuffer; stream::CuStream)
    @ CUDA.Mem ~/lib/CUDA.jl/lib/cudadrv/memory.jl:97
  [5] free
    @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:92 [inlined]
  [6] #actual_free#1042
    @ ~/lib/CUDA.jl/src/pool.jl:78 [inlined]
  [7] actual_free
    @ ~/lib/CUDA.jl/src/pool.jl:75 [inlined]
  [8] #_free#1067
    @ ~/lib/CUDA.jl/src/pool.jl:523 [inlined]
  [9] _free
    @ ~/lib/CUDA.jl/src/pool.jl:510 [inlined]
 [10] macro expansion
    @ ~/lib/CUDA.jl/src/pool.jl:495 [inlined]
 [11] macro expansion
    @ ./timing.jl:395 [inlined]
 [12] #free#1066
    @ ~/lib/CUDA.jl/src/pool.jl:494 [inlined]
 [13] free
    @ ~/lib/CUDA.jl/src/pool.jl:483 [inlined]
 [14] (::CUDA.var"#1073#1074"{CUDA.Mem.DeviceBuffer, Bool})()
    @ CUDA ~/lib/CUDA.jl/src/array.jl:101
 [15] #context!#954
    @ ~/lib/CUDA.jl/lib/cudadrv/state.jl:170 [inlined]
 [16] context!
    @ ~/lib/CUDA.jl/lib/cudadrv/state.jl:165 [inlined]
 [17] _free_buffer(buf::CUDA.Mem.DeviceBuffer, early::Bool)
    @ CUDA ~/lib/CUDA.jl/src/array.jl:89
 [18] release(rc::GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer}, args::Bool)
    @ GPUArrays ~/.julia/packages/GPUArrays/Hd5Sk/src/host/abstractarray.jl:42
 [19] unsafe_free!
    @ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/abstractarray.jl:91 [inlined]
 [20] unsafe_finalize!(xs::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ CUDA ~/lib/CUDA.jl/src/array.jl:113
 [21] top-level scope
    @ REPL[10]:1
 [22] top-level scope
    @ ~/lib/CUDA.jl/src/initialization.jl:206
 [23] eval
    @ ./boot.jl:385 [inlined]
 [24] eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
    @ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
 [25] repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
    @ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
 [26] start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
    @ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
 [27] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::Any)
    @ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
 [28] run_repl(repl::REPL.AbstractREPL, consumer::Any)
    @ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
 [29] (::Base.var"#1013#1015"{Bool, Bool, Bool})(REPL::Module)
    @ Base ./client.jl:432
 [30] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [31] invokelatest
    @ ./essentials.jl:889 [inlined]
 [32] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
    @ Base ./client.jl:416
 [33] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:333
 [34] _start()
    @ Base ./client.jl:552
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
 [2] isdone
   @ ~/lib/CUDA.jl/lib/cudadrv/stream.jl:111 [inlined]
 [3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:79
 [4] device_synchronize(; blocking::Bool, spin::Bool)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:182
 [5] device_synchronize
   @ ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:180 [inlined]
 [6] maybe_synchronize_cuda()
   @ CUDA ~/lib/CUDA.jl/src/initialization.jl:217
 [7] top-level scope
   @ ~/lib/CUDA.jl/src/initialization.jl:208

caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
  [2] check
    @ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
  [3] cuMemAllocFromPoolAsync
    @ ~/lib/CUDA.jl/lib/utils/call.jl:30 [inlined]
  [4] #alloc#1
    @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:81 [inlined]
  [5] alloc
    @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:71 [inlined]
  [6] actual_alloc(bytes::Int64; async::Bool, stream::CuStream, pool::CuMemoryPool)
    @ CUDA ~/lib/CUDA.jl/src/pool.jl:66
  [7] actual_alloc
    @ ~/lib/CUDA.jl/src/pool.jl:59 [inlined]
  [8] #1060
    @ ~/lib/CUDA.jl/src/pool.jl:453 [inlined]
  [9] retry_reclaim
    @ ~/lib/CUDA.jl/src/pool.jl:370 [inlined]
 [10] macro expansion
    @ ~/lib/CUDA.jl/src/pool.jl:452 [inlined]
 [11] macro expansion
    @ ./timing.jl:395 [inlined]
 [12] #_alloc#1059
    @ ~/lib/CUDA.jl/src/pool.jl:448 [inlined]
 [13] _alloc
    @ ~/lib/CUDA.jl/src/pool.jl:444 [inlined]
 [14] #alloc#1058
    @ ~/lib/CUDA.jl/src/pool.jl:434 [inlined]
 [15] alloc
    @ ~/lib/CUDA.jl/src/pool.jl:428 [inlined]
 [16] CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(::UndefInitializer, dims::Tuple{Int64, Int64})
    @ CUDA ~/lib/CUDA.jl/src/array.jl:74
 [17] similar
    @ ~/lib/CUDA.jl/src/array.jl:196 [inlined]
 [18] similar
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/adjtrans.jl:361 [inlined]
 [19] *(A::LinearAlgebra.Adjoint{Float32, CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32}}, B::LinearAlgebra.Adjoint{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:106
 [20] top-level scope
    @ REPL[10]:1
 [21] top-level scope
    @ ~/lib/CUDA.jl/src/initialization.jl:206

Can you try with a default buffer size of 10000 instead of 0?
Just to check whether we still get an error.
It seems that we still need a buffer, but the size of the buffer is not updated for these small matrices.
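
The idea of seeding the size with a nonzero default can be sketched as follows. This is only an illustration under stated assumptions: `buffer_size_with_default`, `silent_query!` and `writing_query!` are hypothetical names, not CUDA.jl or CUSPARSE APIs, and the value 10000 is simply the one suggested above.

```julia
# Hypothetical helper, not part of CUDA.jl: run a size query with a seeded output.
# `query!` is any function that may or may not write to the Ref it receives.
function buffer_size_with_default(query!; default::Integer = 10_000)
    out = Ref{Csize_t}(default)  # the default survives if `query!` writes nothing
    query!(out)
    return out[]
end

# Stand-in for the small-matrix case: the query never writes its output.
silent_query!(out) = nothing

# Stand-in for the normal case: the query reports 64 bytes.
writing_query!(out) = (out[] = 64; nothing)

println(buffer_size_with_default(silent_query!))
println(buffer_size_with_default(writing_query!))
```

With this pattern the default only takes effect when the query leaves the output untouched; a size that the query actually reports overrides it.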

@amontoison Yes, this works, thank you. I started a PR with these changes (#2298). Maybe some other places also need updating?

It's a bug in the NVIDIA routine, so we should treat this only as a workaround for now.
I will submit an issue to NVIDIA.