Training with incompatible data types
Snicser opened this issue · comments
Hi,
I am working with NB, but I get this error: Fatal error: Uncaught Rubix\ML\Exceptions\InvalidArgumentException: Naive Bayes (priors: [spam: 0.3, not spam: 0.7], smoothing: 2.5) is incompatible with continuous data types. in
And I can't find anything about it on the internet.
This is my code:
`<?php
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Classifiers\NaiveBayes;
use Rubix\ML\CrossValidation\HoldOut;
use Rubix\ML\CrossValidation\Metrics\Accuracy;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Tokenizers\NGram;
use Rubix\ML\Transformers\NumericStringConverter;
use Rubix\ML\Transformers\TextNormalizer;
use Rubix\ML\Transformers\TfIdfTransformer;
use Rubix\ML\Transformers\WordCountVectorizer;
require_once 'vendor/autoload.php';
$samples = [
['dit is spam'],
['www.kanker.nl'],
['ik heb een vraag over deze shit'],
];
$labels = [
'not spam',
'spam',
'not spam'
];
$dataset = new Labeled($samples, $labels);
$dataset->apply(new TextNormalizer())
->apply(new NumericStringConverter())
// ->apply(new TfIdfTransformer())
->apply(new WordCountVectorizer(10000, 0.01, 0.9, new NGram(1, 2)));
$importedRecords = $dataset->count();
echo 'Imported: ' . $importedRecords . PHP_EOL;
$estimator = new NaiveBayes([
'spam' => 0.3,
'not spam' => 0.7,
], 2.5);
[$training, $testing] = $dataset->randomize()->split(0.8);
$estimator->train($dataset);
//$trained = $estimator->trained();
//var_dump($trained);
//$predication = $estimator->predict($testing);
//$probabilities = $estimator->proba($dataset);
//var_dump($probabilities);
//$metric = new Accuracy();
//$score = $metric->score($predication, $testing->labels());
//echo 'Score: ' . $score;
echo 'Final';`
I don't understand what I am doing wrong; I am just following the docs.
The problem is that you are training a Learner that is incompatible with continuous data on continuous features (i.e. the word count vectors produced by WordCountVectorizer). If you'd like to stick with the Naive Bayes family of algorithms, you can train a Gaussian Naive Bayes estimator instead, since it is compatible with continuous data.
https://docs.rubixml.com/1.0/classifiers/gaussian-naive-bayes.html
You can check to see which types an Estimator is compatible with in the API reference. In addition, we provide a cheat sheet in the User Guide.
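For illustration, here is a minimal sketch of the swap suggested above. It assumes Rubix ML 1.x installed through Composer. The sample texts and labels are hypothetical toy data, not taken from the original post, and the vectorizer is left on its defaults apart from the vocabulary size. Wrapping the transformers and the estimator in a Pipeline is a design choice made here so that new samples are normalized and vectorized the same way as the training data.

```php
<?php

// Assumption: Rubix ML 1.x is installed via Composer (composer require rubix/ml).
require_once 'vendor/autoload.php';

use Rubix\ML\Classifiers\GaussianNB;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Pipeline;
use Rubix\ML\Transformers\TextNormalizer;
use Rubix\ML\Transformers\WordCountVectorizer;

// Hypothetical toy data, for illustration only.
$samples = [
    ['win a free prize now'],
    ['free prize click here now'],
    ['i have a question about my order'],
    ['can you help me with my order'],
];

$labels = ['spam', 'spam', 'not spam', 'not spam'];

$dataset = new Labeled($samples, $labels);

// Word counts are continuous features, so they are paired with
// Gaussian Naive Bayes, which accepts continuous data.
$estimator = new Pipeline([
    new TextNormalizer(),
    new WordCountVectorizer(10000),
], new GaussianNB());

// The Pipeline fits the transformers and trains the classifier in one step.
$estimator->train($dataset);

// New text goes through the same fitted transformers before prediction.
$predictions = $estimator->predict(new Unlabeled([
    ['claim your free prize'],
]));

var_dump($predictions);
```

With only a handful of samples the predictions are not meaningful; the point is only that training completes without the data type exception.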