Marianna's repositories

urls2dataset

Convert webpage URLs to a dataset

Language: Python | License: MIT | Stargazers: 3 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | License: MIT | Stargazers: 2 | Issues: 0 | Issues: 0

doc2dataset

A tool to extract text (and images) from documents (like PDFs)

Language: Python | License: MIT | Stargazers: 2 | Issues: 1 | Issues: 6

audio-dataset

Audio Dataset for training CLAP and other models

Language: Python | Stargazers: 1 | Issues: 0 | Issues: 0

laion-datasets

Descriptions of and pointers to LAION datasets

Language: HTML | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 1 | Issues: 1 | Issues: 0

openbioml-datasets

bio-datasets

Language: Python | Stargazers: 1 | Issues: 0 | Issues: 0

tiktok_scraper

TikTok video URL scraper

Language: Python | Stargazers: 0 | Issues: 2 | Issues: 1
Language: JavaScript | Stargazers: 0 | Issues: 0 | Issues: 0

any2dataset

Turn any collection of files into a dataset

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

audio2dataset

Easily turn large sets of audio urls to an audio dataset. More info - TBD

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

cc2dataset

Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text, ...

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

CLAP

Contrastive Language-Audio Pretraining

License: CC0-1.0 | Stargazers: 0 | Issues: 0 | Issues: 0

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0
Language: JavaScript | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

open_clip

An open source implementation of CLIP.

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

pii-data

Base data structures for PII management

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

pii-transform

Perform transformations on PII instances detected in documents

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

PointNeXt

[NeurIPS'22] PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies

Language: Shell | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

riffusion

Stable diffusion for real-time music generation

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

video2dataset

Easily create large video dataset from video urls

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0