xeneta / LeadQualifier

:dart: Qualify sales leads with machine learning

beat random forest with sgd elasticnet at 88 accuracy score

lampts opened this issue

Thanks for sharing your repo. I can beat the current TF-IDF 5K result by using SGD with an elasticnet penalty, which gets 88% accuracy.

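from sklearn import metrics
from sklearn.linear_model import SGDClassifier

# X, labels: training features and labels; X_test, y_test: held-out test split.
# Newer scikit-learn replaced n_iter with max_iter; tol=None keeps a fixed 500 epochs.
sgd = SGDClassifier(max_iter=500, tol=None, loss='modified_huber', penalty='elasticnet')
sgd.fit(X, labels)

y_pred = sgd.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))
print(metrics.classification_report(y_test, y_pred))
print(metrics.confusion_matrix(y_test, y_pred))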

Output

0.881024096386
             precision    recall  f1-score   support

        0.0       0.88      0.82      0.85       265
        1.0       0.88      0.92      0.90       399

avg / total       0.88      0.88      0.88       664

[[216  49]
 [ 30 369]]
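
For anyone checking the numbers: the accuracy and the per-class precision, recall, and f1 in the report can be recomputed from the confusion matrix alone. Here is a minimal sketch in plain Python, assuming the usual scikit-learn layout (rows are true classes, columns are predicted classes). The helper name `per_class_scores` is made up for this illustration and is not part of the repo or of scikit-learn.

```python
# Confusion matrix copied from the output above.
# Assumed layout (scikit-learn convention): rows = true class, columns = predicted class.
cm = [[216, 49],
      [30, 369]]


def per_class_scores(cm, k):
    """Return (precision, recall, f1) for class index k. Hypothetical helper."""
    n = len(cm)
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(n) if i != k)  # predicted k, truly another class
    fn = sum(cm[k][j] for j in range(n) if j != k)  # truly k, predicted another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Accuracy is the diagonal (correct predictions) over all samples.
total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total

print("accuracy:", round(accuracy, 4))
for k in range(len(cm)):
    p, r, f = per_class_scores(cm, k)
    print("class", k, "precision", round(p, 2), "recall", round(r, 2), "f1", round(f, 2))
```

The rounded values agree with the classification report, so the three printed outputs are consistent with each other.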

That's awesome! :D

And great with the f1 scores. I'll need to add that to mine as well.

Want to create a pull request and add it to the script?

However, are you sure you fit the model solely on the training data?

sgd.fit(X, labels)

It looks like X might represent the entire dataset?

Hi,

X is only the training set. I can share my script by creating a pull request.
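
To make the train-only fit explicit, one option is to split first and wrap the vectorizer and classifier in a pipeline, so that the TF-IDF vocabulary is also learned from the training split only. This is a rough sketch and not the script from this thread: the toy `texts` and `labels` are invented, `max_features=5000` is only a guess at what "tfidf 5K" means, and `max_iter=500, tol=None` stands in for the older `n_iter=500`.

```python
from sklearn import metrics
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Invented toy data, only so the sketch runs end to end.
texts = [
    "global freight forwarding and container shipping",
    "ocean logistics and supply chain services",
    "international cargo transport by sea",
    "shipping rates for freight forwarders",
    "family bakery with fresh bread and cakes",
    "local hair salon and beauty treatments",
    "yoga studio offering weekly classes",
    "pet grooming and dog walking services",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split BEFORE any fitting so the test rows never influence the model.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# The pipeline fits TF-IDF and the classifier on the training split only.
model = make_pipeline(
    TfidfVectorizer(max_features=5000),  # assumed meaning of "tfidf 5K"
    SGDClassifier(loss="modified_huber", penalty="elasticnet",
                  max_iter=500, tol=None, random_state=0),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))
print(metrics.confusion_matrix(y_test, y_pred))
```

With data this small the score itself means nothing; the point is only the order of operations: split, fit on the training part, then predict on the held-out part.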

Laam

Gotcha! I'm actually adding your code to a new script which'll make it easier for others to expand upon it. Give me ten minutes, and it'll be ready so you can have a look and provide input, ok?

Voila, that's it. Thanks.