data-transformation

There are 15 repositories under the data-transformation topic.

mahmoud / glom
☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
declarative data recursion python utilities cli nested-structures data-transformation apis dictionaries
Language:Python 1854
hi-primus / optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
big-data-cleaning bigdata cudf dask dask-cudf data-analysis data-cleaner data-cleaning data-cleansing data-exploration data-extraction data-preparation data-profiling data-science data-transformation data-wrangling machine-learning pyspark spark
Language:Python 1463
2ndQuadrant / pglogical
Logical Replication extension for PostgreSQL 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
postgresql replication logical-decoding database-replication subscription publish-subscribe data-transformation data-transport etl cdc zero-downtime
Language:C 964
zinggAI / zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
fuzzymatch fuzzy-matching deduplication dedupe masterdata dataengineering data-transformation analytics-engineering entity-resolution identity-resolution data-transformations data-science spark ml etl dataquality identity modern-data-stack analytics datalake
Language:Java 922
mattt / TransformerKit
A block-based API for NSValueTransformer, with a growing collection of useful examples.
data-transformation nsvaluetransformer objective-c swift
Language:Objective-C 845
raystack / optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
airflow etl workflows automation golang bigquery data-warehouse analytics data-modelling analytics-engineering data-transformation data-pipelines elt business-intelligence dataops
Language:Go 743
SebKrantz / collapse
Advanced and Fast Data Transformation in R
cran data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data r rstats scientific-computing statistics time-series weighted weights
Language:C 627
microsoft / prose
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.
csharp data-transformation data-wrangling dotnet examples microsoft program-synthesis prose sdk synthesis
Language:C# 613
ScriptFUSION / Porter
💄 Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.
porter data-import framework data-transformation php-development abstraction scalability durability asynchronous library fibers
Language:PHP 611
dbohdan / sqawk
Like awk but with SQL and table joins
awk sql data-wrangling cli csv tsv delimited-files data-transformation converter json
Language:Tcl 310
jupyter-naas / naas
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
ai binder data data-science data-transformation engine etl integration jupyter jupyterlab notebooks open-source pipeline
Language:Python 280
feichao93 / temme
📄 Concise selector to extract JSON from HTML.
css-selector data-transformation html json temme-selector
Language:TypeScript 273
fastverse / fastverse
An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
high-performance statistical-computing data-manipulation data-transformation low-dependency rstats r c cpp time-series panel-data weights data-aggregation data-science
Language:R 236
mahmoudparsian / data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
spark pyspark data algorithms transformations partitioning-algorithms machine-learning design-patterns data-algorithms data-abstractions python design monoid mapreduce mappers reducers data-transformation dataframes rdd bigdata
Language:Python 198
SETL-Framework / setl
A simple Spark-powered ETL framework that just works 🍺
big-data data-analysis data-engineering data-science data-transformation dataset etl etl-pipeline framework machine-learning modularization pipeline scala setl spark
Language:Scala 177
simongray / clojure-dsl-resources
A curated list of Clojure resources for dealing with domain-specific languages.
data-transformation domain-specific-language dsl nlp parsing
174
markus-wa / cq
Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more
cli clojure command-line csv data-processing data-transformation edn hacktoberfest json msgpack transformation xml yaml
Language:Clojure 160
strengejacke / sjmisc
Data transformation and utility functions for R
data-transformation r data-wrangling labelled-data recoding
Language:R 157
mahmoudparsian / big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
pyspark-algorithms-book mapreduce santa-clara-university pyspark data-algorithms data-transformation data-partition partitioning-algorithms algorithms mapreduce-python mapreduce-algorithm apache-hadoop apache-spark big-data data-analysis data-engineering glossary monoid spark-dataframes spark-rdd
Language:HTML 150
jim-schwoebel / allie
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
machine-learning deep-learning machine-learning-library machine-learning-api automl tpot data-augmentation data-cleaning datasets machine-learning-models ludwig voice-computing model-compression model-deployment data-visualization data-cleaning-pipeline data-transformation autokeras autopytorch
Language:Python 141
ToucanToco / weaverbird
A visual data pipeline builder with various backends
mongodb pandas vuejs mysql postgresql redshift snowflake sql data-transformation
Language:TypeScript 94
data-integrations / wrangler
Wrangler Transform: A DMD system for transforming Big Data
avro big-data cdap cdap-plugin data-cleansing data-prep data-science data-transform data-transformation manipulate-data parsing preparation project transform transform-data wrangle
Language:Java 85
galliaproject / gallia-core
A schema-aware Scala library for data transformation
scala data-transformation json spark etl nesting feature-engineering data-science data-engineering data-manipulation
Language:Scala 83
aws-samples / aws-dbs-refarch-datalake
Reference Architectures for Datalakes on AWS
data-lake data-analytics amazon-emr ingest-data emr-cluster glue hive-metastore data-catalog data-transformation
Language:HTML 75
dry-rb / dry-transformer
Data transformation toolkit
dry-rb ruby rubygem library data-transformation data-mapping function-composition functional
Language:Ruby 72
devsgnr / breadroll
breadroll 🥟 is a simple, lightweight library for data processing operations, written in TypeScript and powered by Bun.
csv csv-parser data-engineering data-science tsv tsv-parser eda exploratory-data-analysis bun data-transformation
Language:TypeScript 68
developerforce / DataWeaveInApex
Examples for working with DataWeave scripts from Apex.
apex data-transformation dataweave
Language:Apex 59
bruin-data / bruin
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
analytics bigquery data-modeling data-pipelines python snowflake sql data-analysis data-transformation
Language:Python 58
bhrnjica / daany
Daany (.NET DAta ANalYtics): a .NET library implementing DataFrame, time series decomposition, and the linear algebra routines BLAS and LAPACK.
data-frames series-decomposition data-transformation calculated-columns dataframe mlnet linear-algebra-routines series iris ssa data-frame mkl daany-library
Language:C# 56
assemblee-virtuelle / Semantic-Bus
object flow treatment, data transformation
data-transformation data-mining semantic-data-transformation workflow-sharing worflows
Language:JavaScript 55
scopashq / typestream
⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first
auto-reload data-extraction data-pipeline data-transformation developer-experience typescript
Language:TypeScript 53
nilportugues / php-serializer
Serialize PHP variables, including objects, in any format. Supports unserializing them too.
json-transformation xml-transformation data-transformation json-api array-transformer php php7 yaml-transformer serialization marshaller transformer api json xml yaml yml jsend-transformer hal hal-api jsonapi
Language:PHP 49
hopsoft / pipe_envy
Elixir style pipe operator for Ruby
ruby data-transformation elixir
Language:Ruby 46
bloomberg / pycsvw
A tool to read CSV files with CSVW metadata and transform them into other formats.
csv csvw rdf data-transformation
Language:Python 32
fiddlerwoaroof / data-lens
Functional utilities for Common Lisp
lisp data-transformation data functional-programming transducers
Language:Common Lisp 30
tsantos84 / serializer
A PHP serialization component focused on performance
php7 php-library serialization-library data-transformation
Language:PHP 28