Richard (Rick) Zamora's repositories
distributed
A distributed task scheduler for Dask
NVTabular
A library built on top of the RAPIDS cuDF library for processing extremely large tabular datasets, particularly those that do not fit in GPU or CPU memory. NVTabular's capabilities include fast terabyte-scale data preparation and accelerated tabular data loading, all on GPU, which streamline the first step of both training and inference in deep recommender system pipelines.
arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
cugraph
cuGraph - RAPIDS Graph Analytics Library
cuxfilter
GPU-accelerated cross-filtering with cuDF.
design-docs
Experimental repo for proposals of future work
fastparquet
Python implementation of the Parquet columnar file format.
filesystem_spec
A specification that Python filesystems should adhere to.
Morpheus
Morpheus SDK
NeMo-Curator
Scalable toolkit for data curation
pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
pynvml-feedstock
A conda-smithy repository for pynvml.
ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
rjzamora.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
systems
Merlin Systems provides tools for combining recommendation models with other elements of production recommender systems (like feature stores, nearest neighbor search, and exploration strategies) into end-to-end recommendation pipelines that can be served with Triton Inference Server.
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Dask, Flink and DataFlow.