There are 28 repositories under the large-dataset topic.
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
Detecting fraud in online transactions using anomaly detection and class-rebalancing techniques such as oversampling and undersampling; since the ratio of frauds is below 0.00005, simply applying a classification algorithm may result in overfitting.
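The undersampling idea mentioned above can be sketched in a few lines of Python (a generic illustration of random undersampling, not this repository's code; `undersample` and its inputs are hypothetical):

```python
import random

def undersample(rows, labels, seed=0):
    """Balance a binary dataset by randomly dropping majority-class rows.

    With fraud ratios near 0.00005, this keeps all rare positives and an
    equally sized random subset of the negatives."""
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == 1]
    majority = [r for r, y in zip(rows, labels) if y == 0]
    kept = rng.sample(majority, k=len(minority))  # as many negatives as positives
    balanced = [(r, 1) for r in minority] + [(r, 0) for r in kept]
    rng.shuffle(balanced)
    return balanced
```

Libraries such as imbalanced-learn provide production-grade versions of this (and of oversampling variants like SMOTE); the sketch just shows the core mechanic.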
Tools and code samples for solving large network analysis problems in ArcGIS Pro
TensorFlow Input Pipeline Examples based on multi-thread and FIFOQueue
Fast and lightweight event-driven streaming XML parser in pure JavaScript
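That repository is pure JavaScript, but the event-driven streaming model it describes can be sketched with Python's standard `xml.sax` module (a generic illustration, unrelated to the repository's API; `count_tags` is a hypothetical helper):

```python
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Count elements as parse events arrive, without building a DOM."""
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the input.
        self.counts[name] = self.counts.get(name, 0) + 1

def count_tags(xml_text):
    handler = TagCounter()
    xml.sax.parseString(xml_text.encode(), handler)
    return handler.counts
```

Because the handler reacts to events instead of holding a tree, memory use stays flat no matter how large the document is.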
This repository contains the code and data of the paper titled "SPEC5G: A Dataset for 5G Cellular Network Protocol Analysis" published at AACL 2023.
:fish: A streaming ETL for fish
💧A stream based csv aggregator for limiting RAM usage while processing large data sets.
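The stream-based aggregation pattern that keeps RAM usage flat can be sketched in Python (a generic illustration in a different language than the repository; `aggregate_csv` and the field names are hypothetical):

```python
import csv
from collections import defaultdict

def aggregate_csv(lines, key_field, value_field):
    """Stream CSV rows one at a time, holding only running totals in memory.

    `lines` can be any iterable of text lines (e.g. an open file handle),
    so the full dataset never has to be loaded at once."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key_field]] += float(row[value_field])
    return dict(totals)
```

Memory grows with the number of distinct keys, not with the number of rows, which is what makes this viable for very large files.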
Virtual Slider/Carousel for React
Delete large HSET keys on Redis.
Some components for internal, line-of-business Angular apps
Raw array (RA) file format for simple, robust, and user-friendly N-dimensional array storage
Processed Amazon Review Dataset for Language Generation (Character Level)
Is it feasible to train a model on 100 million ratings using nothing more than a common laptop? Let's find out.
OrthoSLC: A pipeline to obtain orthologous genes using Reciprocal Best BLAST Hit (RBBH) single-linkage clustering, independent of any relational database management system
Project for CMU 15-780 Graduate Artificial Intelligence
A platform for the world's largest open datasets, stored on a decentralized network
Dynamically load and refresh detail data (master-detail).
The objective of this competition is to use historical loan-application data to predict whether or not an applicant will be able to repay a loan.
Finding the median in large sets of numbers split across N servers, using ZeroMQ and Node.js (experimental)
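One standard way to find a median without ever merging the servers' data is to binary-search the value and have each server answer only count-below queries. The sketch below (plain Python lists standing in for servers, integers assumed; not the repository's ZeroMQ implementation) shows the idea:

```python
def distributed_median(partitions):
    """Median of integers split across partitions, via counting + binary search.

    Each 'server' (partition) only ever reports how many of its values are
    <= a candidate; no data is shipped or merged."""
    n = sum(len(p) for p in partitions)
    lo = min(min(p) for p in partitions)
    hi = max(max(p) for p in partitions)
    target = n // 2  # 0-based index of the median element (odd n assumed)
    while lo < hi:
        mid = (lo + hi) // 2
        below = sum(sum(1 for x in p if x <= mid) for p in partitions)
        if below > target:
            hi = mid   # enough values at or below mid: answer is <= mid
        else:
            lo = mid + 1
    return lo
```

Each round costs one count query per server, so the network traffic is O(servers * log(value range)) rather than O(total data).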
An experiment to produce a custom-designed table for presenting large data
Machine Learning models for large datasets
VLSI with CAD: a Python program that accepts file input and determines the minimum cutset
All the code required to reproduce the results in our paper "Scaling Up Structural Clustering to Large Probabilistic Graphs Using Lyapunov Central Limit Theorem"
Detect whether the text is AI-generated by training a new tokenizer and combining it with tree classification models or by training language models on a large dataset of human & AI-generated texts.
Memory-mapping made easy.
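The core of memory-mapping can be sketched with Python's standard `mmap` module (a generic illustration, unrelated to this repository's API; `sum_bytes_mmapped` is a hypothetical helper):

```python
import mmap

def sum_bytes_mmapped(path, chunk=4096):
    """Sum every byte of a file through a memory map.

    The OS pages data in on demand, so the file is addressed like an
    in-memory buffer without read() copies or loading it all at once."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing an mmap yields bytes; iterate in chunks to stay cheap.
            return sum(sum(m[i:i + chunk]) for i in range(0, len(m), chunk))
```

The same `m[i:j]` slicing works for random access anywhere in a multi-gigabyte file, which is why mmap is a common substrate for larger-than-RAM tooling.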
🧬 large scale genes data analysis software
A real-time, web-based log watching solution akin to UNIX's tail -f command, built on Django Channels. It avoids page refreshes, streams updates efficiently, supports multiple connections, and shows the last 10 lines of the log.