RubixML / ML

A high-level machine learning and deep learning library for the PHP language.

Home Page: https://rubixml.com

Training with incompatible data types

Snicser opened this issue

Hi,

I am working with Naive Bayes (NB), but I get this error: Fatal error: Uncaught Rubix\ML\Exceptions\InvalidArgumentException: Naive Bayes (priors: [spam: 0.3, not spam: 0.7], smoothing: 2.5) is incompatible with continuous data types. in

And I can't find anything about it on the internet.

This is my code:

```php
<?php

use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Classifiers\NaiveBayes;
use Rubix\ML\CrossValidation\HoldOut;
use Rubix\ML\CrossValidation\Metrics\Accuracy;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Tokenizers\NGram;
use Rubix\ML\Transformers\NumericStringConverter;
use Rubix\ML\Transformers\TextNormalizer;
use Rubix\ML\Transformers\TfIdfTransformer;
use Rubix\ML\Transformers\WordCountVectorizer;

require_once 'vendor/autoload.php';

$samples = [
    ['dit is spam'],
    ['www.kanker.nl'],
    ['ik heb een vraag over deze shit'],
];

$labels = [
    'not spam',
    'spam',
    'not spam',
];

$dataset = new Labeled($samples, $labels);

$dataset->apply(new TextNormalizer())
    ->apply(new NumericStringConverter())
    // ->apply(new TfIdfTransformer())
    ->apply(new WordCountVectorizer(10000, 0.01, 0.9, new NGram(1, 2)));

$importedRecords = $dataset->count();

echo 'Imported: ' . $importedRecords . PHP_EOL;

$estimator = new NaiveBayes([
    'spam' => 0.3,
    'not spam' => 0.7,
], 2.5);

[$training, $testing] = $dataset->randomize()->split(0.8);

// This is the call that throws the InvalidArgumentException.
$estimator->train($dataset);

//$trained = $estimator->trained();
//var_dump($trained);

//$predication = $estimator->predict($testing);

//$probabilities = $estimator->proba($dataset);
//var_dump($probabilities);

//$metric = new Accuracy();

//$score = $metric->score($predication, $testing->labels());

//echo 'Score: ' . $score;

echo 'Final';
```

I don't understand what I am doing wrong; I am just following the docs.

The problem is that you are training a Learner on continuous data (i.e. word count vectors), and Naive Bayes is not compatible with continuous data types. If you'd like to stick with the Naive Bayes family of algorithms, you can train a Gaussian Naive Bayes estimator instead, since it is compatible with continuous data.

https://docs.rubixml.com/1.0/classifiers/gaussian-naive-bayes.html
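As a rough illustration of that suggestion, here is a minimal sketch that swaps `NaiveBayes` for `GaussianNB`. It assumes the 1.0 API described in the linked docs, where `GaussianNB` accepts an optional array of class priors. The sample texts and labels are made up for the example, and `WordCountVectorizer` is given only a maximum vocabulary size so that the remaining arguments fall back to their defaults.

```php
<?php

use Rubix\ML\Classifiers\GaussianNB;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Transformers\TextNormalizer;
use Rubix\ML\Transformers\WordCountVectorizer;

require_once 'vendor/autoload.php';

// Illustrative samples only; they are not taken from a real dataset.
$samples = [
    ['win a free prize now'],
    ['free money click here now'],
    ['claim your free prize today'],
    ['i have a question about my order'],
    ['can you help me with my account'],
    ['what time does the store open'],
];

$labels = [
    'spam',
    'spam',
    'spam',
    'not spam',
    'not spam',
    'not spam',
];

$dataset = new Labeled($samples, $labels);

// Lowercase the text, then turn each sample into a vector of word counts.
// Word counts are continuous features, which is what triggered the error
// with the categorical Naive Bayes learner.
$dataset->apply(new TextNormalizer())
    ->apply(new WordCountVectorizer(10000));

// Gaussian Naive Bayes is compatible with continuous features.
$estimator = new GaussianNB([
    'spam' => 0.3,
    'not spam' => 0.7,
]);

$estimator->train($dataset);

// Predicting on the training data only demonstrates that training
// succeeded; it says nothing about how well the model generalizes.
$predictions = $estimator->predict($dataset);

var_dump($predictions);
```

With a real dataset you would train on the training split and predict on the testing split rather than reusing the same samples.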

You can check to see which types an Estimator is compatible with in the API reference. In addition, we provide a cheat sheet in the User Guide.

https://docs.rubixml.com/1.0/choosing-an-estimator.html
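Compatibility can also be checked from code. The sketch below assumes the 1.0 API, where every Estimator exposes a `compatibility()` method that returns a list of data type objects which can be cast to strings. If you are on a different version, treat those details as assumptions and check the API reference.

```php
<?php

use Rubix\ML\Classifiers\GaussianNB;
use Rubix\ML\Classifiers\NaiveBayes;

require_once 'vendor/autoload.php';

$estimators = [new NaiveBayes(), new GaussianNB()];

$report = [];

foreach ($estimators as $estimator) {
    // Cast each data type object to its string name
    // (assumes the type objects are stringable, as in 1.0).
    $types = array_map('strval', $estimator->compatibility());

    $report[get_class($estimator)] = $types;

    echo get_class($estimator) . ': ' . implode(', ', $types) . PHP_EOL;
}
```

Running a check like this before training shows up front whether the features produced by your transformers match what the learner accepts.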