Kwarahi's repositories



Regression task: predicting fiber volume fraction across the normalized wall thickness in the cross-section of bamboo



This case illustrates how data science can help analytical chemistry, in-field analysis and ecology. It is also worth stressing that the case is a real one. The best practice for data scientists is to face the difficulties present in real cases: data cleaning, preparation, analysis of the data logic, and the strategy of exploratory analysis and modeling. Expertise in the domain (area of knowledge, professional background) greatly facilitates and enhances data interpretation. The present case is taken from the open database: The analysis and modeling were conducted using JSL (JMP Scripting Language, SAS).



The project was prepared and submitted within the Brazilian "Bootcamp Data Science na prática" by Neuron (



The project was prepared and submitted by C. Pinheiro, E. Carvalho, M. Nazarkovsky, G. Piovesan and I. Barros within the Brazilian "Bootcamp Data Science na prática" by Neuron (



Based on the article “Rhodium nanoparticles impregnated on TiO2: strong morphological effects on hydrogen production” ( authored by Brunno L. Albuquerque, Gustavo Chacón, Michael Nazarkovsky and Jairton Dupont, the size distribution profiles of all three samples (NP, NC and Oh) were analyzed (outlier analysis, classification of distributions, analysis of variance, discriminant analysis). The results allowed us to distinguish each sample type by its size distribution profile and to build predictive models that classify them. To this end, machine learning approaches such as Naive Bayes, Logistic Regression, K-Nearest Neighbors and Decision Tree were tested, validated and compared by how effectively they predict the sample type. The most effective model turned out to be Logistic Regression, whose misclassification rate at the validation stage is 12.81%, with the minimal mean -log(p) = 0.2509. As a result, an offline calculator that predicts the sample type was developed, and the prediction formula was provided.
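The two validation metrics quoted above can be illustrated with a minimal Python sketch. It is not the project's JSL code: the function names, the four validation rows and their class probabilities are invented for illustration. It assumes that the misclassification rate is the share of rows whose most probable class differs from the true one, and that mean -log(p) is the average of -log(p), where p is the probability the model assigned to the true class.

```python
import math


def misclassification_rate(true_labels, probs):
    """Share of rows whose most probable class differs from the true label."""
    wrong = 0
    for label, row in zip(true_labels, probs):
        predicted = max(row, key=row.get)  # class with the highest probability
        if predicted != label:
            wrong += 1
    return wrong / len(true_labels)


def mean_neg_log_p(true_labels, probs):
    """Average of -log(p), p being the probability given to the true class."""
    total = sum(-math.log(row[label]) for label, row in zip(true_labels, probs))
    return total / len(true_labels)


# Hypothetical validation rows (invented values, not the project's data).
true_labels = ["NP", "NC", "Oh", "NC"]
probs = [
    {"NP": 0.80, "NC": 0.15, "Oh": 0.05},
    {"NP": 0.10, "NC": 0.70, "Oh": 0.20},
    {"NP": 0.05, "NC": 0.35, "Oh": 0.60},
    {"NP": 0.50, "NC": 0.40, "Oh": 0.10},  # misclassified: NP is predicted
]

rate = misclassification_rate(true_labels, probs)
score = mean_neg_log_p(true_labels, probs)
print(f"misclassification rate: {rate:.4f}")
print(f"mean -log(p): {score:.4f}")
```

Both metrics are "lower is better". The misclassification rate only counts wrong labels, while mean -log(p) also penalizes a model that is right but unsure, which is why comparing candidate models on both can be informative.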
