read_data() data validation

Question

aryavish opened this issue 2 years ago

When using the read_data() method to read data files, how do we ensure the files are read correctly, so that if a parameter has 1000 values in the file, the parameter object in Python also contains 1000 values? Is there any way to check data integrity for data files that are read in? I recently discovered that when certain sets are declared as ordered, read_data() fails to read the parameter values if the parameter data file is not in the same order. I verified this by checking get_values().to_pandas().shape on the parameter after read_data(data.file), which shows an empty pandas DataFrame even though the parameter data file has more than 1000 values.

Filipe Brandao · Answer 1 · Thu Apr 07 2022 01:33:25 GMT+0800 (China Standard Time)

From what you mention, I believe you have something like:

from amplpy import AMPL

ampl = AMPL()
ampl.eval("set I ordered; param P{I};")
ampl.eval("data;param P := 1 1 2 2 3 3;")  # data for P, but none for its indexing set I
print(ampl.get_parameter("P").get_values().to_pandas())

This results in an empty pandas DataFrame because loading the data fails silently. We are looking into this, and it should be fixed in the next release.

To see what is wrong with the data you can do the following:

from amplpy import AMPL

ampl = AMPL()
ampl.eval("set I ordered; param P{I};")
ampl.eval("data;param P := 1 1 2 2 3 3;")
ampl.eval("display P;") # since there is lazy evaluation, the data will only be loaded into P at this point

This will result in the following:

amplpy.exceptions.AMPLException: Error executing "display" command:
error processing param P[...]:
	no data for set I

The problem is that the data for the parameter's indexing set I must be provided before (or together with) the parameter data.

You can do that with:

ampl = AMPL()
ampl.eval("set I ordered; param P{I};")
ampl.eval("data; param: I: P := 1 1 2 2 3 3;")
ampl.eval("display P;")

or:

ampl = AMPL()
ampl.eval("set I ordered; param P{I};")
ampl.eval("data; set I := 1 2 3; param P := 1 1 2 2 3 3;")
ampl.eval("display P;")
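If the values come from Python rather than from a file, the first form can be generated so the set and the parameter are always supplied together. The sketch below is a hypothetical helper (indexed_param_data is not part of amplpy); it assumes numeric or simple symbolic indices that need no quoting in AMPL data mode. Because Python dicts keep insertion order, the key order should become the member order of an ordered set.

```python
def indexed_param_data(set_name, param_name, values):
    """Hypothetical helper: build an AMPL data statement that supplies the
    indexing set and the parameter values together, using the
    `param: SET: PARAM := ...` form. Assumes indices need no quoting."""
    # Each dict entry becomes "index value"; the keys also define the set members.
    pairs = " ".join(f"{index} {value}" for index, value in values.items())
    return f"data; param: {set_name}: {param_name} := {pairs};"


stmt = indexed_param_data("I", "P", {1: 1, 2: 2, 3: 3})
print(stmt)  # data; param: I: P := 1 1 2 2 3 3;
```

The resulting string would then be passed to ampl.eval(stmt), exactly like the first example above.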

Vishal A. · Answer 2 · Thu Apr 07 2022 03:59:14 GMT+0800 (China Standard Time)

It's similar. Using your example above, what I'm trying to validate is the following:

from amplpy import AMPL

ampl = AMPL()
ampl.eval("set I ordered; param P{I};")
ampl.read_data("p.data")  # read_data expects the file path as a string

where p.data is the data file for parameter P. After loading it, I retrieve the values with:

dfp = ampl.get_parameter('P').get_values().to_pandas()

dfp should contain the values of P in a pandas DataFrame, but it is sometimes empty because of the set ordering issue described above. My question is how to validate that dfp contains all the values in the p.data file that was read in.
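One way to phrase that check is to count the entries in the file independently and compare the count with the number of rows loaded. The sketch below is hypothetical (count_param_entries and check_loaded_count are illustrative names, not amplpy functions) and assumes the file uses the simple one-dimensional `param P := i1 v1 i2 v2 ...;` list format with no defaults, tables, or comments; anything richer needs a proper parser.

```python
import re


def count_param_entries(data_text, param_name):
    """Count index/value pairs in a one-dimensional `param NAME := ...;`
    statement. Assumes the simple list format only (no defaults, tables,
    or comments)."""
    pattern = rf"\bparam\s+{re.escape(param_name)}\s*:=(.*?);"
    match = re.search(pattern, data_text, re.S)
    if match is None:
        raise ValueError(f"no 'param {param_name} :=' statement found")
    tokens = match.group(1).split()
    if len(tokens) % 2:
        raise ValueError(f"odd number of tokens for param {param_name}")
    return len(tokens) // 2


def check_loaded_count(param_name, loaded_count, expected_count):
    """Raise if the number of loaded values differs from the file's count."""
    if loaded_count != expected_count:
        raise ValueError(
            f"param {param_name}: file has {expected_count} values, "
            f"but {loaded_count} were loaded"
        )


# Hypothetical usage with amplpy, assuming p.data uses the simple format:
# with open("p.data") as f:
#     expected = count_param_entries(f.read(), "P")
# dfp = ampl.get_parameter("P").get_values().to_pandas()
# check_loaded_count("P", len(dfp), expected)

print(count_param_entries("param P := 1 1 2 2 3 3;", "P"))  # 3
```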

Filipe Brandao · Answer 3 · Fri Apr 08 2022 00:21:14 GMT+0800 (China Standard Time)

We have just released amplpy v0.8.2b0, which fixes the missing exception when an operation such as Entity.get_values fails to retrieve values because of a problem with the data.

You can install it with python -m pip install "amplpy>=0.8.2b0" --upgrade --pre.

Since AMPL evaluates lazily, the data is only validated at the moment it is first needed. In the previous version, if the data was first needed by Entity.get_values, the call would fail silently due to a bug. If, for instance, the data is first needed when solving, you would get an error like the following:

>>> ampl = AMPL()
>>> ampl.eval("set I ordered; param p{I};")
>>> ampl.eval("data; set I := 1 2; param p := 2 1 1 3 4 5;")
>>> ampl.eval("maximize obj: sum{i in I} i * p[i];")
>>> ampl.solve()
Error:
	Error executing "solve" command:
	error processing param p:
		invalid subscript p[4] discarded.
No variables declared.

>>> ampl = AMPL()
>>> ampl.eval("set I ordered; param p{I};")
>>> ampl.eval("data; set I := 9 1 2; param p := 2 1 1 3 9 5;")
>>> ampl.param["p"].get_values().to_pandas()
       p
9.0  5.0
1.0  3.0
2.0  1.0
>>> ampl = AMPL()
>>> ampl.eval("set I ordered; param p{I};")
>>> ampl.eval("data; set I := 9 1 2; param p := 2 1 1 3 9 5 10 2;")
>>> ampl.param["p"].get_values().to_pandas()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/fdabrandao/github/amplpy/venv/lib/python3.9/site-packages/amplpy/entity.py", line 179, in get_values
    return DataFrame._from_data_frame_ref(self._impl.getValues())
RuntimeError: Error executing "display" command:
error processing param p:
	invalid subscript p[10] discarded.

Note, however, that, as the second example above shows, providing the parameter data in a different order from the indexing set should not cause any issues. To get a failure, I had to specify data for an invalid subscript.
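Because only membership matters and not order, a quick pre-check in Python can flag subscripts that are not in the indexing set before the data ever reaches AMPL. This is a hypothetical helper (invalid_subscripts is not an amplpy function) that assumes the set members and the parameter data are already available as Python objects; judging from the errors above, these are the subscripts AMPL reports as "invalid subscript ... discarded".

```python
def invalid_subscripts(set_members, param_values):
    """Hypothetical helper: return the subscripts in param_values that are
    not members of the indexing set, in the order they appear.
    Order of the set itself is irrelevant; only membership is checked."""
    members = set(set_members)
    return [index for index in param_values if index not in members]


# Mirrors the last example above: I = {9, 1, 2} and data for p[10].
set_I = [9, 1, 2]
data_p = {2: 1, 1: 3, 9: 5, 10: 2}
print(invalid_subscripts(set_I, data_p))  # [10]
```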

If you want to force all the data to be validated, you can do the following right after loading the data:

# get_values() forces AMPL to evaluate each entity, so any data error is raised here
for s_name, s in ampl.get_sets():
    print(s_name, len(s.get_values().to_list()))

for p_name, p in ampl.get_parameters():
    print(p_name, len(p.get_values().to_list()))
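To turn those printed sizes into an automatic check, the same loop can collect the sizes into a dict and compare them against counts you expect (for example, counts taken from the data files). This is a sketch with hypothetical helper names; it relies only on the (name, entity) iteration and the get_values().to_list() calls already used in the loop above. With amplpy 0.8.2b0 or later, a data error should surface as an exception from get_values rather than as a size of zero.

```python
def entity_sizes(entities):
    """Hypothetical helper: force evaluation of each entity and return
    {name: number of values}. `entities` is any iterable of (name, entity)
    pairs, such as what ampl.get_sets() or ampl.get_parameters() yield."""
    return {name: len(entity.get_values().to_list()) for name, entity in entities}


def compare_sizes(actual, expected):
    """Return {name: (expected, actual)} for each entity whose size differs;
    an entity missing from `actual` is reported with actual = None."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }
```

With amplpy, this might be called as compare_sizes(entity_sizes(ampl.get_parameters()), {"P": 1000}); an empty result means every listed parameter loaded the expected number of values.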