RubixML / ML

A high-level machine learning and deep learning library for the PHP language.

Home Page: https://rubixml.com

TruncatedSVD() made PHP crash without any message

ahmaruff opened this issue

Hello, I am trying to build a chatbot using the LSA algorithm.
I am new to machine learning, so I don't know if this is correct.

Since I am using the LSA algorithm, I need to call TruncatedSVD(),
but this script makes the PHP dev server crash without any message.
When I remove TruncatedSVD() from the Pipeline, everything works.

My machine:
Debian 12
AMD A9
RAM 4GB
PHP 8.2.7
Laravel 9

I installed the Tensor extension manually from this fork/PR, RubixML/Tensor#36, since PHP 8.2 is not fully supported yet.

Here is my code:

<?php
namespace App;
use Illuminate\Support\Facades\Storage;
use Rubix\ML\Classifiers\KDNeighbors;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Graph\Trees\BallTree;
use Rubix\ML\Kernels\Distance\Cosine;
use Rubix\ML\Loggers\Screen;
use Rubix\ML\PersistentModel;
use Rubix\ML\Pipeline;
use Rubix\ML\Transformers\TextNormalizer;
use Rubix\ML\Transformers\TfIdfTransformer;
use Rubix\ML\Transformers\TruncatedSVD;
use Rubix\ML\Transformers\WordCountVectorizer;
use Rubix\ML\Persisters\Filesystem;

class Chatbot
{
    private $estimator;
    private $persister;
    private $logger;

    public function __construct() {
        $this->logger = new Screen();
        $modelPath = Storage::path('dataset/lsa.model');
        $this->persister = new Filesystem($modelPath);
        if(is_file($modelPath)){
            $this->estimator = $this->persister->load();
            $this->logger->info('model loaded');
        } else {
            $this->train();
        }
    }
    // Builds the pipeline, trains it on the sample questions, and saves the model.
    public function train() {
        $start = microtime(true);
        $this->logger->info('start load dataset');
        $samples =[
            ['Cara mengganti password', 'Bagaimana cara mengganti kata sandi wifi', 'cara ganti sandi'],
            ['cara ganti nama wifi', 'bagaimana cara mengganti nama wifi', 'gimana caranya ganti SSID'],
            ['cara bayar tagihan', 'cara membayar wifi', 'nomor rekening pembayaran'],
            ['wifi tidak stabil', 'wifi kadang terkoneksi kadang tidak', 'internet bermasalah'],
            ['indikator warna merah hidup', 'indikator los hidup', 'ada lampu warna merah menyala'],
            ['internet lemot', 'jaringan lambat', 'cara upgrade paket']
        ];

        $labels = [
            "Untuk mengganti password, atau kata sandi caranya adalah dengan masuk Google Chrome lalu Ketik 192.168.0.1 untuk Username sama password untuk login adalah user setelah masuk Klik menu Network, Klik bagian Wlan, Klik bagian Security lalu Setelah diganti klik submit",
            "Untuk mengganti nama WiFi caranya adalah dengan masuk Google Chrome lalu Ketik 192.168.0.1 untuk Username sama password untuk login adalah user setelah masuk Klik menu Network Klik bagian Wlan Klik bagian SSID Setting lalu Setelah diganti klik submit",
            "untuk pembayaran wi-fi bisa melalui transfer dengan no rekening MANDIRI : 185000416082",
            "jika internet wifi lambat atau lemot, silakan coba dikurangi jumlah penggunanya agar pemakaianya stabil dan lancar, atau upgrade pake wifi",
            "apabila indikator LOS berwarna merah menyala, kemungkinan terdapat permasalahan pada device atau terdapat kabel yang terputus sehingga diperlukan pengecekan langsung di lokasi Silakan hubungi admin melalui layanan Pengaduan Masalah.",
            "jika internet wifi lambat atau lemot, silakan coba dikurangi jumlah penggunanya agar pemakaianya stabil dan lancar, atau upgrade pake wifi",
        ];

        $dataset = new Labeled($samples, $labels);
        $nearest = 5;

        $this->estimator = new PersistentModel(
            new Pipeline([
                new TextNormalizer(),
                new WordCountVectorizer(1000, 1, 0.8),
                new TfIdfTransformer(),
                new TruncatedSVD(10), // Crashes here; when this line is removed, everything works.
            ], new KDNeighbors($nearest, true, new BallTree($nearest, new Cosine()))),
            $this->persister
        );

        $this->estimator->train($dataset);

        $this->estimator->save();

        dump((memory_get_peak_usage()/1024/1024)." MB\n");
        $time = microtime(true) - $start;
        dump("Time: ".$time." s\n");
    }

}

Hi @ahmaruff, I'm pretty sure there's a problem with the SVD call when using the Tensor extension. I remember it not working a while ago. Can you post this on the Tensor repo as well? Hopefully someone will see it and fix it; if not, I can maybe get around to it when I have some free time.

Thanks for the suggestion, @andrewdalpino. Here it is: RubixML/Tensor#38

I believe we have a fix for this now. Please try compiling the Tensor extension from the latest master branch, or wait for us to test and release Tensor 3.0.5 and download it from PECL.

RubixML/Tensor#41
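
For anyone landing on this thread later, a quick way to see which Tensor build PHP is actually loading is to compare the extension's reported version against the 3.0.5 release mentioned above. This is only a minimal sketch using PHP's built-in functions: the helper name tensorStatus() is hypothetical (it is not part of Rubix ML or Tensor), and it assumes the extension registers itself under the name "tensor" and that the fix ships in 3.0.5 as the comment above suggests. A build compiled from master may report a different version string, so treat the result as a hint rather than proof.

```php
<?php

/**
 * Hypothetical helper: classify the installed Tensor extension version.
 *
 * $installed is the version string reported by PHP, or null when the
 * extension is not loaded. The default minimum of '3.0.5' is taken from
 * the maintainer's comment in this thread and is an assumption.
 */
function tensorStatus(?string $installed, string $minimum = '3.0.5'): string
{
    if ($installed === null) {
        return 'not loaded';
    }

    // version_compare() understands dotted version strings such as "3.0.4".
    return version_compare($installed, $minimum, '>=') ? 'ok' : 'outdated';
}

// phpversion() returns false when the named extension is not available.
$version = extension_loaded('tensor') ? phpversion('tensor') : false;

echo 'Tensor extension: ' . tensorStatus($version === false ? null : $version) . PHP_EOL;
```

If this reports "outdated", rebuilding the extension from master or upgrading through PECL, as suggested above, is the next thing to try before re-adding TruncatedSVD to the Pipeline.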