Tung Thanh Le's starred repositories
data-science-interviews
Data science interview questions and answers
chatgpt-advanced
WebChatGPT: A browser extension that augments your ChatGPT prompts with web results.
EconML
ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
stat_rethinking_2023
Statistical Rethinking Course for Jan-Mar 2023
pymc-resources
PyMC educational resources
pytorchTutorial
PyTorch Tutorials from my YouTube channel
tfcausalimpact
Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
causal-inference-tutorial
Repository with code and slides for a tutorial on causal inference.
awesome-causal-inference
A (concise) curated list of awesome Causal Inference resources.
window_funcs
A Rust web app to teach SQL window functions
Cracking_The_Machine_Learning_Interview
(Under Construction) I am currently writing solutions to the questions in the Medium article "Cracking the Machine Learning Interview," written by Subhrajit Roy. In the year since the article went public, Subhrajit has only written down the questions, with no update on the solutions. I plan on finishing the war. I may add more questions outside of the article's domain. No one else on the internet has written down solutions to these machine learning interview questions, an opportunity I want to take advantage of.
awesome-Marketing-Analytics
🚨 Resources 💼 to learn/practice 🎯 Marketing analytics 💹 🚨
Cracking-The-Machine-Learning-Interview
Code snippets for our Book solutions
Scanned-document-classification-deep-learning
BFSI (banking, financial services, and insurance) sectors deal with large volumes of unstructured scanned documents, which are archived in document management systems for further use. For example, in the insurance sector, when a policy goes for underwriting, underwriters attach several raw notes to the policy, and insureds also attach various kinds of scanned documents such as identity cards, bank statements, and letters. Later in the policy life cycle, if claims are made on a policy, related scanned documents are also archived. It then becomes a tedious job to identify a particular document in this vast repository. The goal of this case study is to develop a deep-learning-based solution that can automatically classify scanned documents.
Berkeley-Spark
edX:Berkeley:Spark
Document-Image-Classification-with-Intra-Domain-Transfer-Learning-and-Stacked-Generalization-of-Deep
RVL-CDIP can be seen as the equivalent of ImageNet for the document image community. It’s certainly the largest we’ve seen in the literature. There are 400,000 total document images in the dataset. The dataset contains much noise and variance in the composition of each document class. Uncompressed, the dataset is ~100GB and comprises 16 classes of document types, with 25,000 samples per class. Example classes include email, resume, and invoice. Achieved an accuracy of over 93%, which beats the benchmark score of 92% based on https://paperswithcode.com/sota/document-image-classification-on-rvl-cdip
sample-size
This Python project is a helper package that uses power analysis to calculate the required sample size for any experiment
Deep-Learning
Implementations of deep learning techniques in Google TensorFlow: fully connected deep neural networks trained with SGD and ReLUs; regularization of a multi-layer neural network with ReLUs, L2 regularization, and dropout to prevent overfitting; Convolutional Neural Networks (CNNs) with learning rate decay and dropout; and Recurrent Neural Networks (RNNs) for text and sequences with Long Short-Term Memory (LSTM) networks.
coursera-causality-crash-course
A Crash Course in Causality: Inferring Causal Effects from Observational Data
HeteroArchGen4M2S
HeteroArchGen4M2S: automatic software for configuring and running heterogeneous CPU-GPU architectures on the Multi2Sim simulator. Built on top of the M2S simulator, the tool lets users configure various heterogeneous CPU-GPU architectures: the number of CPU cores and GPU cores, L1 and L2 caches (L1$, L2$), memory size and latency (via CACTI 6.5), and network topologies (currently 2D-Mesh, customized 2D-Mesh, and Torus networks are supported), among others. The output files include network throughput and latency, cache/memory access time, and dynamic power of the cores (can be collected after running McPAT).
causal_inference
Coursera course: "A Crash Course in Causality: Inferring Causal Effects from Observational Data"
Causal-Inference-UPenn
Assignment code for the Coursera course "A Crash Course in Causality: Inferring Causal Effects from Observational Data" by UPenn
causality-course-coursera
A Crash Course in Causality: Inferring Causal Effects from Observational Data - Coursera
Coursera-A-Crash-Course-in-Causality
My notes and solutions to 'A Crash Course in Causality: Inferring Causal Effects from Observational Data' by Jason A. Roy from University of Pennsylvania.