ttungl

Tung Thanh Le's starred repositories

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | License: MIT | 18886 296 1311

data-science-interviews

Data science interview questions and answers

Language: HTML | License: CC-BY-4.0 | 8297 217 16

dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

Language: Python | License: MIT | 6833 137 450

chatgpt-advanced

WebChatGPT: A browser extension that augments your ChatGPT prompts with web results.

Language: TypeScript | License: MIT | 5289 89 108

EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.

Language: Jupyter Notebook | License: NOASSERTION | 2812 70 446
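
The double machine learning idea mentioned in the EconML description can be illustrated with a minimal sketch. This is not the econml API: the function names `double_ml_effect` and `simulate_data` are hypothetical, the nuisance models are plain least-squares fits standing in for arbitrary ML regressors, and a partially linear model (y = theta * t + g(x) + noise) is assumed. The sketch residualizes both t and y on x with cross-fitting, then regresses the outcome residuals on the treatment residuals.

```python
import numpy as np


def _fit_predict(x_train, target_train, x_test):
    """Fit ordinary least squares with an intercept and predict on x_test.

    Stand-in for any ML regressor used as a nuisance model.
    """
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    a_test = np.column_stack([np.ones(len(x_test)), x_test])
    coef, *_ = np.linalg.lstsq(a_train, target_train, rcond=None)
    return a_test @ coef


def double_ml_effect(x, t, y, n_folds=2, seed=0):
    """Estimate the effect of t on y, controlling for x (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    t_res = np.empty(len(y))
    y_res = np.empty(len(y))
    for k, test_idx in enumerate(folds):
        # Cross-fitting: nuisance models are trained on the other folds only.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        t_res[test_idx] = t[test_idx] - _fit_predict(x[train_idx], t[train_idx], x[test_idx])
        y_res[test_idx] = y[test_idx] - _fit_predict(x[train_idx], y[train_idx], x[test_idx])
    # Final stage: regress outcome residuals on treatment residuals.
    return float(t_res @ y_res / (t_res @ t_res))


def simulate_data(n=2000, true_effect=2.0, seed=1):
    """Generate synthetic data where x confounds both t and y (illustration only)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))
    t = x @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)
    y = true_effect * t + x @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)
    return x, t, y


if __name__ == "__main__":
    x, t, y = simulate_data()
    print(double_ml_effect(x, t, y))
```

On the synthetic data the estimate should land close to the simulated effect of 2.0. For real work, the econml package itself provides tested estimators with flexible nuisance models and confidence intervals.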

stat_rethinking_2023

Statistical Rethinking Course for Jan-Mar 2023

Language: R | License: CC0-1.0 | 2155 149 15

pymc-resources

PyMC educational resources

Language: Jupyter Notebook | License: MIT | 1895 65 75

pytorchTutorial

PyTorch Tutorials from my YouTube channel

Language: Python | License: MIT | 1692 25 17

tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.

Language: Python | License: Apache-2.0 | 584 13 76

causal-inference-tutorial

Repository with code and slides for a tutorial on causal inference.

Language: Jupyter Notebook | 555 21 9

upliftml

UpliftML: A Python Package for Scalable Uplift Modeling

Language: Python | License: Apache-2.0 | 307 13 7

pymatch

Language: Jupyter Notebook | License: MIT | 265 11 45

awesome-causal-inference

A (concise) curated list of awesome Causal Inference resources.

207 90

tableone

R package to create "Table 1", description of baseline characteristics with or without propensity score weighting

Language: R | 206 19 83

window_funcs

A Rust web app to teach SQL window functions

Language: Rust | License: Apache-2.0 | 128 8 9

amazonqa

Evidence-based QA system for community question answering.

Language: Jupyter Notebook | 101 7 6

Cracking_The_Machine_Learning_Interview

(Under Construction) I am currently writing solutions to the questions in the Medium article "Cracking the Machine Learning Interview," written by Subhrajit Roy. In the year since the article went public, Subhrajit has only written down the questions, with no update on the solutions. I plan on finishing the war. I may add more questions outside of the article's domain. No one else on the internet has written down solutions for the machine learning interview, an opportunity I want to take advantage of.

85 50

awesome-Marketing-Analytics

🚨 Resources 💼 to learn/practice 🎯 Marketing analytics 💹 🚨

51 40

Cracking-The-Machine-Learning-Interview

Code snippets for our Book solutions

Language: Python | 40 20

Scanned-document-classification-deep-learning

BFSI sectors deal with lots of unstructured scanned documents, which are archived in document management systems for further use. For example, in the insurance sector, when a policy goes for underwriting, underwriters attach several raw notes to the policy, and insureds also attach various kinds of scanned documents such as identity cards, bank statements, and letters. In later parts of the policy life cycle, if claims are made on a policy, related scanned documents are also archived. It then becomes a tedious job to identify a particular document in this vast repository. The goal of this case study is to develop a deep learning based solution that can automatically classify scanned documents.

Language: Jupyter Notebook | License: MIT | 35 3 5

Berkeley-Spark

edX:Berkeley:Spark

Language: HTML | 2100

Document-Image-Classification-with-Intra-Domain-Transfer-Learning-and-Stacked-Generalization-of-Deep

RVL-CDIP could be looked at as the equivalent of ImageNet for the document image community. It’s certainly the largest we’ve seen in the literature. There are 400,000 total document images in the dataset. The dataset contains much noise and variance in the composition of each document class. Uncompressed, the dataset is ~100GB in size and comprises 16 classes of document types, with 25,000 samples per class. Example classes include email, resume, and invoice. Achieved an accuracy of over 93%, which beat the benchmark score of 92% based on https://paperswithcode.com/sota/document-image-classification-on-rvl-cdip

Language: Jupyter Notebook | 1600

sample-size

This Python project is a helper package that uses power analysis to calculate the required sample size for any experiment

Language: Python | License: MIT | 12 11 1
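
As a rough illustration of what a power-analysis sample size calculation involves, here is a minimal sketch. It does not use the sample-size package's own interface: the function name `sample_size_per_group` is hypothetical, and the sketch assumes a two-sided, two-sample comparison of means under a normal approximation with equal group sizes and a standardized effect size (Cohen's d).

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample test of means.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size. Hypothetical helper for illustration.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    print(sample_size_per_group(0.5))  # medium effect: 63 per group
    print(sample_size_per_group(0.2))  # small effect: 393 per group
```

Halving the effect size roughly quadruples the required sample, which is why the minimum detectable effect is usually the most important input to fix before an experiment starts.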

Deep-Learning

Implemented deep learning techniques using Google TensorFlow, covering: deep neural networks with a fully connected network using SGD and ReLUs; regularization of a multi-layer neural network using ReLUs, L2-regularization, and dropout to prevent overfitting; Convolutional Neural Networks (CNNs) with learning rate decay and dropout; and Recurrent Neural Networks (RNNs) for text and sequences with Long Short-Term Memory (LSTM) networks.

Language: Jupyter Notebook | 11 40

coursera-causality-crash-course

A Crash Course in Causality: Inferring Causal Effects from Observational Data

Language: Jupyter Notebook | 800

HeteroArchGen4M2S

HeteroArchGen4M2S: An automatic software tool for configuring and running heterogeneous CPU-GPU architectures on the Multi2Sim simulator. The tool is built on top of the M2S simulator and allows us to configure various heterogeneous CPU-GPU architectures (e.g., number of CPU cores, GPU cores, L1$, L2$, memory (size and latency (via CACTI 6.5)), network topologies (currently supporting 2D-Mesh, customized 2D-Mesh, and Torus networks)...). The output files include the results for network throughput and latency, cache/memory access time, and dynamic power of the cores (which can be collected after running McPAT).

Language: GLSL | License: MIT | 6 30