Add a factors function to get factors without multiplicity
oscarbenjamin opened this issue
This is a trivial function in terms of existing functions but I often find myself wanting this when using SymPy interactively or otherwise:
In [1]: def factors(*args, **kwargs):
   ...:     return [f for f, m in factor_list(*args, **kwargs)[1]]
   ...:
In [2]: factors((x - 1)**2*(x-2))
Out[2]: [x - 2, x - 1]
In [7]: factors(x*y - y, x)
Out[7]: [x - 1]
In [8]: factors(x*y - y, y)
Out[8]: [y]
Although this is trivial it is awkward because there is no way to write the list comprehension inline.
What do people think about adding this?
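For anyone who wants to try this outside an IPython session, here is a minimal self-contained sketch of the same helper. The name factors is the one proposed in this issue and is not an existing SymPy function; only factor_list and symbols are real SymPy API here.

```python
from sympy import factor_list, symbols

x, y = symbols("x y")


def factors(*args, **kwargs):
    """Hypothetical helper: irreducible factors without multiplicities."""
    # factor_list returns (coeff, [(factor, multiplicity), ...]);
    # drop the leading coefficient and the multiplicities.
    _coeff, pairs = factor_list(*args, **kwargs)
    return [f for f, _mult in pairs]


print(factors((x - 1)**2 * (x - 2)))
print(factors(x*y - y, x))
print(factors(x*y - y, y))
```

All positional and keyword arguments are forwarded unchanged, so anything factor_list accepts (generators, extension=..., and so on) should work the same way.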
If added then it should be a method on at least Poly and PolyElement as well, if not something that also appears throughout the dup_* etc. stack.
Internally, factorisation routines actually compute the equivalent of factors first before computing the multiplicity afterwards:
sympy/sympy/polys/factortools.py
Lines 88 to 91 in 8255cf2
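The idea of finding the distinct factors first and recovering multiplicities afterwards can be sketched with the public Poly API. This is only an illustration under assumptions: it uses Poly.div for repeated division and is not a copy of the code referenced in factortools.py. The function name multiplicities is made up for this example.

```python
from sympy import Poly, symbols

x = symbols("x")


def multiplicities(f, distinct_factors):
    """Hypothetical sketch: recover multiplicities by repeated division.

    Assumes every entry of distinct_factors is a non-constant Poly.
    """
    result = []
    for g in distinct_factors:
        k = 0
        while True:
            q, r = f.div(g)
            if not r.is_zero:
                break  # g no longer divides what is left of f
            f, k = q, k + 1
        result.append((g, k))
    return result


f = Poly((x - 1)**2 * (x - 2), x)
g1, g2 = Poly(x - 2, x), Poly(x - 1, x)
print(multiplicities(f, [g1, g2]))
```

Seen this way, the list of distinct factors is an intermediate result that already exists before the multiplicities are known.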
I think the output style deserves some discussion.
I have gone back and forth many times on whether set or list is better for this kind of thing, and it is very hard to pick one.
The mathematical definition of factors suggests a set.
However, there are some reasons to use a list instead of a set, even though a list by its nature makes people worry about duplicate elements.
Some algorithms never append duplicate elements to the list in the first place, and a list can be more efficient than a set in such narrow cases, although I am not very sure about that.
In any case, I am not sure whether earlier SymPy developers ever took such considerations into account.
There were similar old discussions about changing the return type of factor (or even of prime factorization) to dict or Counter.
I don't think they were adopted, because of some backward compatibility issues; however, if we had used dict or Counter instead of list in the first place, we would not be facing this issue.
It is possible to get the keys of a dictionary using keys, so something like factor_dict(...).keys() would be more concise.
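To make the dict/Counter idea concrete, here is a small sketch. factor_dict is the hypothetical name used in the comment above, not an existing SymPy function; it is built here on top of the real factor_list.

```python
from collections import Counter

from sympy import factor_list, symbols

x = symbols("x")


def factor_dict(expr, *gens, **kwargs):
    """Hypothetical: map each irreducible factor to its multiplicity."""
    _coeff, pairs = factor_list(expr, *gens, **kwargs)
    return dict(pairs)


d = factor_dict((x - 1)**2 * (x + 1))
distinct = list(d.keys())  # the factors without multiplicity
counts = Counter(d)        # the same data viewed as a multiset
print(d, distinct, counts[x - 1])
```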
Both set and dict are awkward for interactive use because you can't just index into them. I very often find myself doing e.g. list(expr.free_symbols)[0] for this reason. As data structures set and dict are useful if you are going to use them for hash table lookups or intersections etc. If you just want a container that holds the objects so you can access them then it is generally easier to use list or tuple.
Note that it is actually easy to convert the output of factor_list to a dict in an inline expression:
In [3]: d = dict(factor_list(x**2 - 1)[1])
In [4]: d
Out[4]: {x - 1: 1, x + 1: 1}
But now you need to convert to a list again to get the actual factors:
In [5]: list(d)[0]
Out[5]: x - 1
I suppose that is another way to get the equivalent of factors:
In [7]: list(dict(factor_list(x**2 - 1)[1]))
Out[7]: [x - 1, x + 1]
Both set and dict are awkward for interactive use because you can't just index into them.
I thought that needing to index the factors was rare.
It would be interesting to see the use cases anyway, because I have almost never encountered a strong need for this myself.
I think the only need arises when there is only one factor, or when you want to pick a factor arbitrarily; even in those cases, pop() would be sufficient.
Also, dict has become much more convenient since it started keeping a deterministic insertion order in Python 3 (guaranteed since 3.7), which may be why I have been able to use dict more widely in my own work, in places that previously used list for fairly trivial reasons.
I still agree about the nondeterminism issues with set, though.
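The ways of getting an element out of a dict or set without indexing can be sketched as follows. This only uses standard Python container behaviour plus factor_list; the variable names are illustrative.

```python
from sympy import factor_list, symbols

x = symbols("x")

# dict keeps insertion order, so the keys follow factor_list's order.
d = dict(factor_list((x - 1)**2 * (x + 1))[1])
first = next(iter(d))  # first key, no intermediate list needed

# set.pop() removes and returns an arbitrary element; which one you
# get is not something to rely on.
s = set(d)
arbitrary = s.pop()
print(first, arbitrary, s)
```

Note that pop() mutates the set, so it suits a one-off pick better than repeated access.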
Here is a simple example:
In [45]: gb = [x**2 - y, (x-3)**2*(y-4)]
In [46]: f = factors(gb[-1])
In [47]: groebner([*gb, f[0]])
Out[47]:
             ⎛⎡ 2           ⎤                           ⎞
GroebnerBasis⎝⎣x  - 4, y - 4⎦, x, y, domain=ℤ, order=lex⎠
In [48]: groebner([*gb, f[1]])
Out[48]: GroebnerBasis([x - 3, y - 9], x, y, domain=ℤ, order=lex)
The return type of factor_list is just a bit awkward if you actually want to get the expressions for the factors and do something with them:
In [49]: f = factor_list(gb[-1])
In [50]: groebner([*gb, f[1][0][0]])
Out[50]:
             ⎛⎡ 2           ⎤                           ⎞
GroebnerBasis⎝⎣x  - 4, y - 4⎦, x, y, domain=ℤ, order=lex⎠
In [51]: groebner([*gb, f[1][1][0]])
Out[51]: GroebnerBasis([x - 3, y - 9], x, y, domain=ℤ, order=lex)
Also:
In [53]: f1, f2 = factors(gb[-1])
In [54]: [(f1, _), (f2, _)] = factor_list(gb[-1])[1]
In [55]: f1, f2 = dict(factor_list(gb[-1])[1])
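The Groebner basis example can be reproduced as a standalone script. As before, factors is the hypothetical helper from this issue; groebner, factor_list and the exprs attribute of GroebnerBasis are existing SymPy API. The generators x, y are passed explicitly here, which the interactive session did not do.

```python
from sympy import factor_list, groebner, symbols

x, y = symbols("x y")


def factors(*args, **kwargs):
    """Hypothetical helper: irreducible factors without multiplicities."""
    return [f for f, _mult in factor_list(*args, **kwargs)[1]]


gb = [x**2 - y, (x - 3)**2 * (y - 4)]

# Split the system on each irreducible factor of the last polynomial
# and compute a Groebner basis for each branch.
branches = {f: groebner([*gb, f], x, y).exprs for f in factors(gb[-1])}
for f, basis in branches.items():
    print(f, "->", basis)
```

Keying the branches by factor avoids depending on the order in which factor_list happens to return them.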
Other things like roots, free_symbols etc. are awkward as well. If you actually want to do something with the expressions for the roots or with the symbols then it is better to have a list rather than a dict/set.
Having solve return dicts is different because you can use those with subs, which is typically what you want to do with the output of solve. It actually makes sense that solve returns a map, and when you use solve you already have the keys because they are the symbols like x and y. The difference with roots is that the expressions that you want are the keys, and dicts make it awkward to get the keys.
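The contrast between solve and roots can be illustrated with a short sketch. The equations and the polynomial are made up for illustration; solve(..., dict=True), subs and roots are existing SymPy API.

```python
from sympy import roots, solve, symbols

x, y = symbols("x y")

# solve: the dict keys are symbols you already know, and each dict
# can be passed straight to subs.
sols = solve([x + y - 3, x - y - 1], [x, y], dict=True)
values = [(x**2 + y).subs(s) for s in sols]

# roots: the interesting objects are the keys themselves, so you have
# to convert before you can index or unpack them.
r = roots(x**2 - 3*x + 2, x)
root_list = sorted(r)
print(sols, values, r, root_list)
```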
There is a distinction between interactive use and programmatic use here. For programmatic use it is not a big deal to loop over the output of factor_list if that is what you need to do:
for f, _ in factor_list(expr)[1]:
    ...  # do something with each factor f
In an interactive session I don't often write loops though unless it is a one liner like:
In [52]: for p in gb: print(p)
x**2 - y
6*x*y - 24*x - y**2 - 5*y + 36
y**3 - 22*y**2 + 153*y - 324
I can understand the need for a list, and I don't see big issues with it, because converting a list to a set is trivial. However, I notice another ambiguity in the specification of factor_list.
For example, the reason you often write a longer expression like factor_list(...)[1]... is that you want to skip factor_list(...)[0] (which is often 1) and only need factor_list(...)[1].
However, it is not clear to me whether the output should contain factor_list(...)[0] or not.
For example, if you expect x**2 + 2*x + 1 to give [x + 1], with no ambiguity, then would you expect the input 2*x**2 + 4*x + 2 to give [x + 1] or [2, x + 1]?
would you expect the input 2*x**2 + 4*x + 2 to give [x + 1] or [2, x + 1]?
I would expect it to give [x + 1].
The factors represent the roots of the polynomial. In the univariate case with rational coefficients each factor is the minimal polynomial for a number of conjugate roots. Any numeric multiple of the polynomial has the same roots so we don't care about those. If we view the polynomial as representing an equation to be solved then the output of factors represents the same solutions.
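A quick check of that expectation, again using the hypothetical factors helper from this issue on top of the real factor_list: the numeric content ends up in the first element of factor_list's result, so dropping it leaves the same factors for both inputs.

```python
from sympy import factor_list, symbols

x = symbols("x")


def factors(*args, **kwargs):
    """Hypothetical helper: irreducible factors without multiplicities."""
    return [f for f, _mult in factor_list(*args, **kwargs)[1]]


coeff, pairs = factor_list(2*x**2 + 4*x + 2)
monic_factors = factors(x**2 + 2*x + 1)
scaled_factors = factors(2*x**2 + 4*x + 2)
print(coeff, pairs, monic_factors, scaled_factors)
```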
there is no way to write the list comprehension inline
I am not sure what you mean by this since you can write [f for f,m in factor_list(*args, **kwargs)[1] if sym in f.free_symbols] inline. But I agree that this seems like a nice "battery" to include. It's kind of like divisors in terms of convenience.
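The filtered comprehension and the divisors analogy can be put side by side in a short sketch. factors_containing is a made-up name for illustration; divisors and factorint are existing SymPy functions.

```python
from sympy import divisors, factor_list, factorint, symbols

x, y = symbols("x y")


def factors_containing(expr, sym):
    """Hypothetical: irreducible factors of expr that involve sym."""
    return [f for f, _m in factor_list(expr)[1] if sym in f.free_symbols]


only_x = factors_containing(x*y - y, x)

# divisors is a convenience over factorint in much the same way that
# factors would be a convenience over factor_list.
prime_powers = factorint(12)
all_divisors = divisors(12)
print(only_x, prime_powers, all_divisors)
```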