hernanrazo / UCI-iris-classification

A python script that classifies iris flower species based on their various dimensions.

UCI Iris Classification

Description

A Python script that predicts iris species from sepal and petal dimensions (length and width). The species in this dataset are iris-setosa, iris-versicolor, and iris-virginica. The dataset comes from the University of California, Irvine (UCI) Machine Learning Repository.

Libraries used in this example include pandas, seaborn, matplotlib, and scikit-learn. The algorithm used is the k-nearest neighbors algorithm.

Analysis

First, we make box and whisker plots to see the range of values for petal and sepal dimensions.

[Figures: petalLengthBW, petalWidthBW, sepalLengthBW, sepalWidthBW (box and whisker plots of petal length, petal width, sepal length, and sepal width)]
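As a minimal sketch of how box and whisker plots like these could be drawn, the snippet below uses seaborn and matplotlib. It rebuilds `df` from scikit-learn's bundled copy of the iris data, which is an assumption (the original script's loading step is not shown here). Grouping by species and the 2x2 layout are also assumptions, and the bundled labels read `setosa` rather than `iris-setosa`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

FEATURES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Assumption: rebuild df from scikit-learn's bundled iris data,
# renaming the columns to the names used in this README.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df['species'] = iris.target_names[iris.target]

# One box and whisker plot per measurement, grouped by species.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, feature in zip(axes.ravel(), FEATURES):
    sns.boxplot(data=df, x='species', y=feature, ax=ax)
    ax.set_title(feature)
fig.tight_layout()
# fig.savefig('iris_boxplots.png')  # hypothetical output file name
```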

Next, plot histograms of the same data.

[Figures: petalLengthHist, petalWidthHist, sepalLengthHist, sepalWidthHist (histograms of petal length, petal width, sepal length, and sepal width)]
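The histograms could be produced along the following lines. This is a sketch, not the original script: the data loading from scikit-learn, the bin count of 20, and the 2x2 layout are all assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

FEATURES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Assumption: rebuild df from scikit-learn's bundled iris data.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
counts = {}
for ax, feature in zip(axes.ravel(), FEATURES):
    # ax.hist returns the per-bin counts, the bin edges, and the drawn patches.
    n, bins, patches = ax.hist(df[feature], bins=20)
    counts[feature] = int(n.sum())  # every row falls into exactly one bin
    ax.set_title(feature)
    ax.set_xlabel('cm')
    ax.set_ylabel('count')
fig.tight_layout()
```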

These plots give a good visual overview of the data. Violin plots condense it all into two graphs: one shows petal length and the other shows sepal length.

[Figures: petalLengthViolin, sepalLengthViolin (violin plots of petal length and sepal length)]
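A violin plot combines the summary of a box plot with the shape of a histogram. The sketch below draws the two violin plots with seaborn. Grouping by species and loading the data from scikit-learn are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

FEATURES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Assumption: rebuild df from scikit-learn's bundled iris data.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df['species'] = iris.target_names[iris.target]

# One violin plot for petal length and one for sepal length, per species.
fig, (ax_petal, ax_sepal) = plt.subplots(1, 2, figsize=(12, 5))
sns.violinplot(data=df, x='species', y='petal-length', ax=ax_petal)
ax_petal.set_title('petal-length')
sns.violinplot(data=df, x='species', y='sepal-length', ax=ax_sepal)
ax_sepal.set_title('sepal-length')
fig.tight_layout()
```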

Since we were only given one dataset, we have to split it into a training section and a testing section. With test_size = 0.3, 70% of the rows go to training and the remaining 30% are held out for testing.

from sklearn.model_selection import train_test_split

#hold out 30% of the rows for testing
train, test = train_test_split(df, test_size = 0.3)

#take data features and output for training and testing
train_x = train[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']]
train_y = train['species']

#evaluate on the held-out test split
test_x = test[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']]
test_y = test['species']
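Because the split above has no random_state, each run produces a different partition and a slightly different accuracy. A hedged variant is sketched below: it fixes the seed and stratifies on the species column so each class is equally represented in both sections. The `stratify` argument and the scikit-learn data loading are additions for illustration, not part of the original script.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

FEATURES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Assumption: rebuild df from scikit-learn's bundled iris data.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df['species'] = iris.target_names[iris.target]

# Fixed seed for repeatability; stratify keeps the class balance in both parts.
train, test = train_test_split(
    df, test_size=0.3, random_state=0, stratify=df['species']
)

train_x, train_y = train[FEATURES], train['species']
test_x, test_y = test[FEATURES], test['species']

print(len(train), len(test))
print(test_y.value_counts())
```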

This example uses the k-nearest neighbors algorithm. Use the following script to fit the model and score its predictions:

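from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

#fit a 3-nearest-neighbors classifier and score it on the test split
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_x, train_y)
prediction = model.predict(test_x)
print(metrics.accuracy_score(test_y, prediction))
print(' ')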
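The script fixes n_neighbors at 3. One way to check whether a different value would do better is k-fold cross-validation, sketched below. The candidate values and the choice of 5 folds are assumptions for illustration, and the data is loaded from scikit-learn's bundled copy rather than the original file.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: use scikit-learn's bundled iris data (all four features).
X, y = load_iris(return_X_y=True)

candidate_ks = [1, 3, 5, 7, 9]  # hypothetical grid of neighbor counts
mean_accuracy = {}
for k in candidate_ks:
    # 5-fold cross-validated accuracy for a k-nearest-neighbors classifier.
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    mean_accuracy[k] = scores.mean()

best_k = max(mean_accuracy, key=mean_accuracy.get)
for k, acc in mean_accuracy.items():
    print(f'k={k}: mean accuracy {acc:.3f}')
print('best k:', best_k)
```

Averaging over several folds gives a steadier estimate than a single 70/30 split, which matters on a dataset of only 150 rows.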

This returns pretty good results, but what happens if we separate the petal and sepal measurements? To find out, split the data into a training section and a testing section again. The only difference is that this time you do it twice: once for the petal columns and once for the sepal columns.

#split the dataset
petal = df[['petal-length', 'petal-width', 'species']]
sepal = df[['sepal-length', 'sepal-width', 'species']]

#split the data into a training and testing section again

#petals
train_petal, test_petal = train_test_split(petal, test_size = 0.3, random_state = 0)
train_petal_x = train_petal[['petal-length', 'petal-width']]
train_petal_y = train_petal['species']

test_petal_x = test_petal[['petal-length', 'petal-width']]
test_petal_y = test_petal['species']

#sepals
train_sepal, test_sepal = train_test_split(sepal, test_size = 0.3, random_state = 0)
train_sepal_x = train_sepal[['sepal-length', 'sepal-width']]
train_sepal_y = train_sepal['species']

test_sepal_x = test_sepal[['sepal-length', 'sepal-width']]
test_sepal_y = test_sepal['species']

Retrain the model for this new scenario:

print('New training session:')
#petals
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_petal_x, train_petal_y)
prediction = model.predict(test_petal_x)
print('Petal prediction: ')
print(metrics.accuracy_score(prediction, test_petal_y))
print(' ')

#sepals
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_sepal_x, train_sepal_y)
prediction = model.predict(test_sepal_x)
print('Sepal prediction: ')
print(metrics.accuracy_score(prediction, test_sepal_y))

The results show that restricting the features to the petal dimensions (length and width) gives better predictions than using the sepal dimensions alone or all four features together.
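The three scenarios above are scored on different random splits, so part of any gap between them may come from the split itself. A sketch of a like-for-like comparison follows: it scores each feature subset on the same cross-validation folds. The use of StratifiedKFold, the 5 folds, and the scikit-learn data loading are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

FEATURES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Assumption: rebuild df from scikit-learn's bundled iris data.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df['species'] = iris.target_names[iris.target]

subsets = {
    'all': FEATURES,
    'petal': ['petal-length', 'petal-width'],
    'sepal': ['sepal-length', 'sepal-width'],
}

# A seeded splitter yields the same folds for every feature subset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, cols in subsets.items():
    model = KNeighborsClassifier(n_neighbors=3)
    scores = cross_val_score(model, df[cols], df['species'], cv=cv)
    results[name] = scores.mean()

for name, acc in results.items():
    print(f'{name}: mean accuracy {acc:.3f}')
```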

Acknowledgements

This project was made with guidance from various Kaggle kernels and other tutorials. These include this tutorial on machinelearningmastery.com and this IPython Notebook by I,Coder.

Sources and Helpful Links

https://archive.ics.uci.edu/ml/datasets/iris
https://www.kaggle.com/adityabhat24/iris-data-analysis-and-machine-learning-python
https://www.kaggle.com/uciml/iris/home
https://www.kaggle.com/ash316/ml-from-scratch-with-iris
