Anthony Doan's repositories
timeseries_meetup
Using the fpp2 book and library examples.
thesis_cerp_guide_forest
Data-agnostic rewrite of the code for building a CERP-GUIDE forest.
10_CERP_GUIDE_find_optimal_number_of_ensemble_prune_default_setting
Second rewrite and fifth round of simulations.
Financial-Statements-Text-Analysis
Documentation and code for downloading, cleaning, munging, and analyzing financial statements filed by publicly traded companies with the SEC
flexible_imputation_of_missing_data_Burren
Code for the book Flexible Imputation of Missing Data by Stef van Buuren.
master_thesis
Random Forest is one of the most widely used tree-based ensemble classification algorithms. Many techniques for building tree ensembles have been introduced to reduce the correlation among decision trees within a forest. Random Forest uses the bootstrap to reduce the bias of its decision trees and to decide the splits in every decision tree. Classification by Ensembles from Random Partitions (CERP) is a different algorithm for creating an ensemble. CERP randomly partitions the data instead of using the bootstrap, and it creates multiple ensembles instead of one. A forest is an ensemble of several decision trees. While Random Forest builds a single forest, CERP builds an ensemble of forests. A base classifier in Random Forest uses an exhaustive search to find a split. The Generalized, Unbiased, Interaction Detection and Estimation (GUIDE) algorithm, on the other hand, uses statistical hypothesis testing, which is faster than exhaustive search and can detect interactions using a statistical method. This thesis investigated tree-based ensemble classification algorithms, including CERP, GUIDE, and Random Forest, for genetic data.
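The contrast between bootstrap sampling and random partitioning described above can be illustrated with a minimal sketch. It is not the thesis code: the function names are hypothetical, and it assumes the partition is taken over feature indices, which the description does not specify. Each disjoint subset would be used to train one tree, and each fresh partition yields one forest.

```python
import random


def bootstrap_sample(indices, rng):
    """Draw len(indices) items with replacement (the Random Forest style)."""
    return [rng.choice(indices) for _ in indices]


def random_partition(indices, k, rng):
    """Shuffle the indices and split them into k disjoint subsets.

    Assumption: one subset per tree, so a single partition gives one forest.
    """
    shuffled = list(indices)
    rng.shuffle(shuffled)
    # Striding by k gives subsets whose sizes differ by at most one.
    return [shuffled[i::k] for i in range(k)]


def ensemble_of_forests(indices, n_forests, k, rng):
    """Repeat the random partition n_forests times (hypothetical helper)."""
    return [random_partition(indices, k, rng) for _ in range(n_forests)]


rng = random.Random(0)
features = list(range(10))  # hypothetical feature indices
sample = bootstrap_sample(features, rng)
forests = ensemble_of_forests(features, n_forests=3, k=4, rng=rng)
```

Unlike a bootstrap sample, which can repeat or omit indices, every partition uses each index exactly once across its subsets.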
riff_admin
BaubleBox's Internal Admin Tool
thesis_prostate_cancer_CERP_GUIDE_pruned_vs_no_prune
Code to create a graph comparing pruned and unpruned CERP-GUIDE forests. Check out the PDF. The optimal number of partitions is 825 for the pruned CERP-GUIDE forest and 833 for the unpruned CERP-GUIDE forest.
tulipindicators
Technical Analysis Indicator Function Library in C