sympy / sympy

A computer algebra system written in pure Python

Home Page: https://sympy.org/

Add a factors function to get factors without multiplicity

oscarbenjamin opened this issue

This is a trivial function in terms of existing functions but I often find myself wanting this when using SymPy interactively or otherwise:

In [1]: def factors(*args, **kwargs):
   ...:     return [f for f, m in factor_list(*args, **kwargs)[1]]
   ...: 

In [2]: factors((x - 1)**2*(x-2))
Out[2]: [x - 2, x - 1]

In [7]: factors(x*y - y, x)
Out[7]: [x - 1]

In [8]: factors(x*y - y, y)
Out[8]: [y]

Although this is trivial it is awkward because there is no way to write the list comprehension inline.

What do people think about adding this?

If added then it should be a method on at least Poly and PolyElement as well if not something that also appears throughout the dup_* etc stack.
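As a rough illustration of what the Poly side could look like, here is a minimal sketch built on the existing Poly.factor_list method. The name poly_factors is hypothetical and is written as a free function only for illustration; nothing like it exists in SymPy at the time of this discussion.

```python
from sympy import Poly, symbols


def poly_factors(p):
    """Hypothetical helper: distinct factors of a Poly, multiplicities dropped."""
    _coeff, pairs = p.factor_list()  # (leading coefficient, [(factor, multiplicity), ...])
    return [f for f, _mult in pairs]


x = symbols("x")
p = Poly((x - 1)**2 * (x - 2), x)
fs = poly_factors(p)
print(fs)  # two Poly objects: x - 2 and x - 1
```

The factors come back as Poly instances rather than expressions, which is presumably what a Poly method would be expected to return.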

Internally factorisation routines actually compute the equivalent of factors first before computing the multiplicity afterwards:

def dup_trial_division(f, factors, K):
    """
    Determine multiplicities of factors for a univariate polynomial
    using trial division.
    """
    ...
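The idea of recovering multiplicities from a known list of factors can be sketched at the Poly level. This is not the actual dup_trial_division implementation, which works on the low-level dense representation; trial_division below is a hypothetical stand-in that uses Poly.div and assumes every factor is non-constant.

```python
from sympy import Poly, symbols


def trial_division(f, factors):
    """Hypothetical sketch: count how many times each factor divides f."""
    result = []
    for g in factors:
        if g.degree() < 1:
            raise ValueError("constant factors would divide forever")
        k = 0
        while True:
            q, r = f.div(g)
            if not r.is_zero:
                break          # g no longer divides what is left of f
            f, k = q, k + 1    # strip one copy of g and count it
        result.append((g, k))
    return result


x = symbols("x")
f = Poly((x - 1)**2 * (x - 2), x)
facs = [Poly(x - 2, x), Poly(x - 1, x)]
mults = [k for _g, k in trial_division(f, facs)]
print(mults)  # [1, 2]
```

A factors function would simply stop before this step and return the list of factors as it is.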

I think the output style deserves some discussion.
I have often debated with myself whether a set or a list is better for this kind of thing,
and it is hard to pick one.

The mathematical definition of factors suggests a set.
However, there are reasons to use a list instead,
even though a list makes people worry about duplicate elements.
Some algorithms never append duplicates in the first place,
and for those narrow cases a list may be more efficient than a set,
although I am not sure about that.
I am also not sure whether earlier SymPy developers ever considered these points.

There were similar old discussions about changing the return type of factor (or even prime factorization) to a dict or Counter.
I believe they were not adopted because of backward compatibility,
but if dict or Counter had been used instead of list in the first place,
we would not be facing this issue.

The keys of a dictionary are available through keys(), so something like factor_dict(...).keys() would be more concise.

Both set and dict are awkward for interactive use because you can't just index into them. I very often find myself doing e.g. list(expr.free_symbols)[0] for this reason. As data structures set and dict are useful if you are going to use them for hash table lookups or intersections etc. If you just want a container that holds the objects so you can access them then it is generally easier to use list or tuple.
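For what it is worth, one way to make the list(expr.free_symbols)[0] idiom reproducible is to sort the set first. The sketch below uses default_sort_key, which SymPy exports at the top level; the expression is just an example.

```python
from sympy import default_sort_key, symbols

x, y, z = symbols("x y z")
expr = z*x + y

# A set has no defined order, so sort before indexing.
syms = sorted(expr.free_symbols, key=default_sort_key)
print(syms)     # [x, y, z]
print(syms[0])  # x
```

This is still more typing than indexing a list directly, which is the point being made above.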

Note that it is actually easy to convert the output of factor_list to a dict in an inline expression:

In [3]: d = dict(factor_list(x**2 - 1)[1])

In [4]: d
Out[4]: {x - 1: 1, x + 1: 1}

But now you need to convert to a list again to get the actual factors:

In [5]: list(d)[0]
Out[5]: x - 1

I suppose that is another way to get the equivalent of factors:

In [7]: list(dict(factor_list(x**2 - 1)[1]))
Out[7]: [x - 1, x + 1]

> Both set and dict are awkward for interactive use because you can't just index into them.

I thought it was rare to need to index the factors.
It would be interesting to see the use cases, because I have almost never encountered a strong need for this myself.

I think the only need arises when there is exactly one factor, or when you want to pick a factor arbitrarily, and even then pop() would be sufficient.

Also, dict has become much more convenient since it began preserving insertion order (guaranteed since Python 3.7), which may be why I now use dict more widely in my own work where list was previously used for trivial reasons.
I still agree about the nondeterminism issue with set, though.

Here is a simple example:

In [45]: gb = [x**2 - y, (x-3)**2*(y-4)]

In [46]: f = factors(gb[-1])

In [47]: groebner([*gb, f[0]])
Out[47]: 
             ⎛⎡ 2           ⎤                           ⎞
GroebnerBasis⎝⎣x  - 4, y - 4⎦, x, y, domain=ℤ, order=lex⎠

In [48]: groebner([*gb, f[1]])
Out[48]: GroebnerBasis([x - 3, y - 9], x, y, domain=ℤ, order=lex)

The return type of factor_list is just a bit awkward if you actually want to get the expressions for the factors and do something with them:

In [49]: f = factor_list(gb[-1])

In [50]: groebner([*gb, f[1][0][0]])
Out[50]: 
             ⎛⎡ 2           ⎤                           ⎞
GroebnerBasis⎝⎣x  - 4, y - 4⎦, x, y, domain=ℤ, order=lex⎠

In [51]: groebner([*gb, f[1][1][0]])
Out[51]: GroebnerBasis([x - 3, y - 9], x, y, domain=ℤ, order=lex)

Also:

In [53]: f1, f2 = factors(gb[-1])

In [54]: [(f1, _), (f2, _)] = factor_list(gb[-1])[1]

In [55]: f1, f2 = dict(factor_list(gb[-1])[1])

Other things like roots, free_symbols etc are awkward as well. If you actually want to do something with the expressions for the roots or with the symbols then it is better to have a list rather than dict/set.
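A small example of the roots case, using only the existing roots function: the default return value is a dict from root to multiplicity, so getting at the roots themselves takes an extra conversion.

```python
from sympy import roots, symbols

x = symbols("x")
expr = (x - 1)**2 * (x + 2)

r = roots(expr, x)        # dict mapping each root to its multiplicity
distinct = list(r)        # the distinct roots, in dict order
repeated = sorted(roots(expr, x, multiple=True))  # list with repeats

print(r == {1: 2, -2: 1})  # True
print(repeated)            # [-2, 1, 1]
```

Note that multiple=True gives a list but repeats each root by multiplicity, so it is not quite the analogue of the proposed factors either.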

Having solve return dicts is different because you can use those with subs, which is typically what you want to do with the output of solve. It actually makes sense that solve returns a map, and when you use solve you already have the keys because they are the symbols like x and y. The difference with roots is that the expressions that you want are the keys, and dicts make it awkward to get the keys.
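To illustrate why a dict is the natural shape for solve output, here is a short sketch with a made-up linear system; dict=True is the existing solve flag that forces a list of dicts.

```python
from sympy import solve, symbols

x, y = symbols("x y")

# Example system, chosen only for illustration.
sols = solve([x + y - 3, x - y - 1], [x, y], dict=True)
sol = sols[0]                  # {x: 2, y: 1}

# The dict plugs straight into subs.
value = (x**2 + y).subs(sol)
print(value)  # 5
```

Here the keys are symbols you already hold, so nothing has to be extracted from the dict before using it.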

There is a distinction between interactive use and programmatic use here. For programmatic use it is not a big deal to loop over the output of factor_list if that is what you need to do:

for f, _ in factor_list(expr)[1]:
    ...  # do something with each factor f

In an interactive session I don't often write loops though unless it is a one liner like:

In [52]: for p in gb: print(p)
x**2 - y
6*x*y - 24*x - y**2 - 5*y + 36
y**3 - 22*y**2 + 153*y - 324

I can understand the need for a list, and I don't see big issues with it because converting a list to a set is trivial. However, I notice another ambiguity in the specification of factor_list.

For example, the reason you often write a longer expression like factor_list(...)[1]... is that you want to skip
factor_list(...)[0] (which is often 1) and only need factor_list(...)[1].

However, it is not clear to me whether the output of factors should include factor_list(...)[0] or not.
For example, if you expect x**2 + 2*x + 1 to give [x + 1], with no ambiguity, then would you expect the input 2*x**2 + 4*x + 2 to give [x + 1] or [2, x + 1]?

> would you expect the input 2*x**2 + 4*x + 2 to give [x + 1] or [2, x + 1]?

I would expect it to give [x + 1].

The factors represent the roots of the polynomial. In the univariate case with rational coefficients each factor is the minimal polynomial for a number of conjugate roots. Any numeric multiple of the polynomial has the same roots so we don't care about those. If we view the polynomial as representing an equation to be solved then the output of factors represents the same solutions.
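The point about numeric multiples can be checked directly. The sketch below reuses the factors helper from the top of the thread (still a proposal, not a SymPy function) and compares a polynomial with twice itself.

```python
from sympy import factor_list, roots, symbols

x = symbols("x")


def factors(*args, **kwargs):
    # Same helper as proposed at the top of the thread.
    return [f for f, _m in factor_list(*args, **kwargs)[1]]


p = x**2 + 2*x + 1

print(factor_list(2*p))             # (2, [(x + 1, 2)])
print(factors(p) == factors(2*p))   # True, both are [x + 1]
print(roots(p, x) == roots(2*p, x)) # True, both are {-1: 2}
```

The numeric content 2 lives in factor_list(...)[0] and is dropped, so both inputs describe the same solution set.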

> there is no way to write the list comprehension inline

I am not sure what you mean by this since you can write [f for f,m in factor_list(*args, **kwargs)[1] if sym in f.free_symbols] inline. But I agree that this seems like a nice "battery" to include. It's kind of like divisors in terms of convenience.
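For reference, the inline form mentioned above runs as follows on the x*y - y example from the start of the thread. It filters the full factorization by symbol rather than passing a generator to factor_list, which for this input gives the same lists as the earlier outputs.

```python
from sympy import factor_list, symbols

x, y = symbols("x y")
expr = x*y - y

# Keep only the factors that involve a given symbol.
with_x = [f for f, m in factor_list(expr)[1] if x in f.free_symbols]
with_y = [f for f, m in factor_list(expr)[1] if y in f.free_symbols]

print(with_x)  # [x - 1]
print(with_y)  # [y]
```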