sajari / regression

Multivariable regression library in Go

No change in predictions when new data is added for training.

pradykaushik opened this issue

If we use the same regression object to add additional training data (using Train(...)) and then call Run(), there seems to be no change in the predictions made.

On further inspection, it seems that Run() can be called only once. Is this expected behavior? So, if I'm constantly acquiring new data and want to increase my training data set and retrain the model, I would have to create a new object of regression.Regression and pass the union of the old and the new data?

The example code below illustrates the problem.

	r := new(regression.Regression)
	r.SetObserved("Z")
	r.SetVar(0, "X")
	r.SetVar(1, "Y")
	r.Train(
		regression.DataPoint(12.59, []float64{3, 0.25}),
		regression.DataPoint(17.54, []float64{1, 0.40}),
		regression.DataPoint(24.14, []float64{1, 0.268}),
		regression.DataPoint(21.47, []float64{2, 0.35}),
	)
	r.Run()

	fmt.Printf("Regression formula:\n%v\n", r.Formula)

	prediction, _ := r.Predict([]float64{2, 0.30})
	fmt.Println("Prediction = " + strconv.FormatFloat(prediction, 'f', 3, 64))

	fmt.Println("adding new data points...")
	r.Train(
		regression.DataPoint(15.65, []float64{3, 0.45}),
		regression.DataPoint(13.35, []float64{2, 0.65}),
	)
	r.Run() // attempt to retrain using also the newly added data.

	fmt.Printf("Regression formula:\n%v\n", r.Formula)

	prediction, _ = r.Predict([]float64{2, 0.30})
	fmt.Println("Prediction_new = " + strconv.FormatFloat(prediction, 'f', 3, 64))

The output of the above code is shown below. As we can see, neither the formula nor the prediction has changed.

Regression formula:
Predicted = 33.43 + X*-4.47 + Y*-21.06
Prediction = 18.176
adding new data points...
Regression formula:
Predicted = 33.43 + X*-4.47 + Y*-21.06
Prediction_new = 18.176

Yes, that’s correct. You’re not handling the error returned by r.Run(), which tells you exactly this.

The code in the Run function is pretty straightforward. It could probably be done differently, but that doesn’t really affect usage. You can keep an accumulating collection of data points and periodically train a new regression from it.
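
A minimal sketch of that pattern, so the two suggestions above (check the error from Run, and rebuild from retained data) are concrete. To stay dependency-free it does not import the library: `sample`, `onceModel`, `errAlreadyRun` and `refit` are hypothetical names, and `onceModel` is only a stand-in that fits an intercept (the mean of the observed values) and refuses a second Run. With the real package you would presumably do the equivalent inside `refit`: create a new regression.Regression, call SetObserved/SetVar, Train it with regression.DataPoint values built from the retained samples, and check the error from Run.

```go
package main

import (
	"errors"
	"fmt"
)

// sample is one raw observation, kept outside the model so it can be replayed.
type sample struct {
	observed float64
	vars     []float64
}

// errAlreadyRun is a hypothetical error; the library's own message may differ.
var errAlreadyRun = errors.New("model has already been run")

// onceModel is a stand-in for a run-once regression object (an assumption,
// not the library's implementation). It fits only an intercept.
type onceModel struct {
	data      []sample
	intercept float64
	ran       bool
}

// Train appends samples to the model's training set.
func (m *onceModel) Train(s ...sample) { m.data = append(m.data, s...) }

// Run fits the model once; later calls return an error and change nothing.
func (m *onceModel) Run() error {
	if m.ran {
		return errAlreadyRun
	}
	if len(m.data) == 0 {
		return errors.New("no training data")
	}
	sum := 0.0
	for _, s := range m.data {
		sum += s.observed
	}
	m.intercept = sum / float64(len(m.data))
	m.ran = true
	return nil
}

// refit builds a fresh model from every sample collected so far.
func refit(all []sample) (*onceModel, error) {
	m := new(onceModel)
	m.Train(all...)
	if err := m.Run(); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	// all is the retained data set; it only ever grows.
	all := []sample{
		{12.59, []float64{3, 0.25}},
		{17.54, []float64{1, 0.40}},
	}
	m, err := refit(all)
	if err != nil {
		fmt.Println("fit failed:", err)
		return
	}
	fmt.Printf("intercept = %.3f\n", m.intercept)

	// Calling Run again on the same object is reported as an error,
	// which is only visible if the return value is checked.
	if err := m.Run(); err != nil {
		fmt.Println("second Run:", err)
	}

	// New data arrives: extend the retained slice and fit a new model.
	all = append(all, sample{15.65, []float64{3, 0.45}})
	m, err = refit(all)
	if err != nil {
		fmt.Println("refit failed:", err)
		return
	}
	fmt.Printf("intercept after refit = %.3f\n", m.intercept)
}
```

Keeping the raw samples in your own slice, rather than inside the model, means the old model can simply be discarded whenever a refit is due.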