My favorites | Sign in
Logo
                
New issue | Search
for
| Advanced search | Search tips
Issue 1598: New polynomials manipulation module
5 people starred this issue and may be notified of changes. Back to list
Status:  Started
Owner:  mattpap
Type-Enhancement
Priority-High
NeedsReview
Milestone-Release0.7.0


Sign in to add a comment

Blocking:
issue 1647
 
Reported by mattpap, Aug 13, 2009
So, after 5 weeks of coding new module is done. Well, almost done because
many tests are missing and some functionalities may still be quite buggy.
However, after minor fixes, SymPy can work with new module very well, so
lets dig into new code and share your opinions.

Sixteen patches are available from:

 http://github.com/mattpap/sympy-polys/commits/polys

Please read commit message of 7f4e5466414302c3106b2ac9e2224eac2f4d75d2 to
get brief idea what was done. Some parts of new code I already showed at
EuroSciPy conference but now all this is really working. Feel free to
experiment.

Is this the end? ... no, sorry, it smells like the beginning of work on
really functional module. Plans for near future? Besides testing and
documentation writing, proper support for algebraic number fields
(isomorphism, inclusion, splitting fields and factorization).

Is this the right approach to implementing polynomials in SymPy? Don't know
but it seems to work, it's fast and does not require too much algebra
knowledge to work with (at least as a user). With some minor fixes on new
polynomials module side (or Cython side -> nested functions) it could be
Cythonized to obtain even more computational power.
Comment 1 by asmeurer, Aug 13, 2009
Wow!  This is great.  Simplification is much better with this.  For example, consider the complex expression 
from issue 1562.  In your branch:
>>> a = wronskian([sin(x), cos(x), sin(x)*x, cos(x)*x, 1], x)
>>> print a
8*cos(x)**2*sin(x)**2 + 4*cos(x)**4 + 4*sin(x)**4
>>> a = factor(a)
>>> print a
4*(cos(x)**2 + sin(x)**2)**2
>>> trigsimp(a)
4

The original wronskian is simpler (I am assuming because of better intermediate simplification), and the 
expression can be factored!  I tried running a more complex expression through simplify, but it hangs.

>>> simplify(-576*cos(x)**26*sin(x)**30/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
6336*cos(x)**24*sin(x)**32/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
31680*cos(x)**22*sin(x)**34/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
95040*cos(x)**20*sin(x)**36/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
190080*cos(x)**18*sin(x)**38/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
266112*cos(x)**16*sin(x)**40/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
266112*cos(x)**14*sin(x)**42/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
190080*cos(x)**12*sin(x)**44/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
95040*cos(x)**10*sin(x)**46/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
31680*cos(x)**8*sin(x)**48/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
6336*cos(x)**6*sin(x)**50/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 
576*cos(x)**4*sin(x)**52/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 
672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 
672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44))

It takes 218.32 s to complete.  If I substitute sin(x) and cos(x) for symbols first, that drops to 88.54 s.  On the 
other hand, cancel(together(expr)) takes 0.30 s to produce the same result.

This is from wronskian([sin(x)**2, cos(x)**2, sin(x)*cos(x), sin(x), cos(x)], x) in the current SymPy.  Fortunately, 
if you run the wronskian straight in your branch, it comes out nice:
In [1]: wronskian([sin(x)**2, cos(x)**2, sin(x)*cos(x), sin(x), cos(x)], x)
Out[1]: -72*cos(x)**6*sin(x)**2 - 108*cos(x)**4*sin(x)**4 - 72*cos(x)**2*sin(x)**6 - 18*cos(x)**8 - 
18*sin(x)**8

In [2]: trigsimp(factor(wronskian([sin(x)**2, cos(x)**2, sin(x)*cos(x), sin(x), cos(x)], x)))
Out[2]: -18

I posted some comments on the code in your branch.  I like the philosophy that you outline in your commit 
7f4e5466414302c3106b2ac9e2224eac2f4d75d2.  I don't know much about abstract algebra (yet), so I can't 
say much about the internal part, though the code looks clean.  
Comment 2 by ondrej.certik, Aug 13, 2009
Wow, Mateusz, you are a genius. 


I tried this:

In [1]: a = (x+y+z)**20

In [2]: b = a.expand()

In [6]: time factor(b)
CPU times: user 3.08 s, sys: 0.00 s, total: 3.08 s
Wall time: 3.11 s
Out[7]: 
           20
(x + y + z)  


And try maxima, it can't do it! :) So we are slowly overtaking maxima, that's exciting.




This fails for me, but it's minor:

$ bin/test sympy/utilities/
============================= test process starts ==============================
executable:   /usr/bin/python  (2.6.2-final-0)

sympy/utilities/tests/test_code_quality.py[2] ..                            [OK]
sympy/utilities/tests/test_codegen.py[10] ..........                        [OK]
sympy/utilities/tests/test_decorator.py[1] .                                [OK]
sympy/utilities/tests/test_iterables.py[5] .....                            [OK]
sympy/utilities/tests/test_lambdify.py[25] ...........f.............        [OK]
sympy/utilities/tests/test_pickling.py[28] ..........fffffff...........     [OK]
sympy/utilities/tests/test_pytest.py[1] .                                   [OK]
sympy/utilities/tests/test_source.py[2] ..                                  [OK]
sympy/utilities/tests/test_tests_names.py[1]
['/home/ondrej/repos/sympy/sympy/mpmath/tests/test_rootfinding.py',
'/home/ondrej/repos/sympy/sympy/polys/tests/test_rootfinding.py']
F                            [FAIL]

________________________________________________________________________________
______ sympy/utilities/tests/test_tests_names.py:test_no_duplicate_names _______
  File "/home/ondrej/repos/sympy/sympy/utilities/tests/test_tests_names.py", line 21,
in test_no_duplicate_names
    assert False, message % (fname)
AssertionError: Test 'test_rootfinding.py' has a duplicate name, py.test will not run it!

======= tests finished: 66 passed, 1 failed, 8 xfailed in 20.82 seconds ========
DO *NOT* COMMIT!

Comment 3 by ondrej.certik, Aug 13, 2009
(the test tests a bug in py.test, that it fails to execute two files in the sympy
tree with the same filenames, e.g. one in mpmath and one in polys)
Comment 4 by ondrej.certik, Aug 13, 2009
Mateusz, would you manage to use cython to speed up the factorization a bit? Just for
fun, so that I can show it on scipy09, as an example of using cython. Here is the
result in Sage, that uses pari:

sage: var("x y z")
(x, y, z)
sage: a = (x+y+z)**20
sage: b = a.expand()
sage: time factor(b)
CPU times: user 0.19 s, sys: 0.00 s, total: 0.19 s
Wall time: 0.30 s
(x + y + z)^20


e.g. it's only 10x faster. Pari is optimized in C, so --- I don't know, but cython
can speed up things easily by a factor of 10x in many cases. Let's try it?
Comment 5 by ondrej.certik, Aug 13, 2009
And for this:

In [5]: a = (x+y+sin(z))**20

In [6]: b = a.expand()

In [7]: time factor(b)
CPU times: user 3.38 s, sys: 0.00 s, total: 3.38 s
Wall time: 3.46 s
Out[8]: 
                20
(x + y + sin(z))  


Sage is only 4x faster:

sage: a = (x+y+sin(z))**20
sage: b = a.expand()      
sage: time factor(b)      
CPU times: user 0.13 s, sys: 0.01 s, total: 0.14 s
Wall time: 0.80 s
(x + y + sin(z))^20



Could you beat it? That'd be fun. :)
Comment 6 by smichr, Aug 13, 2009
SO NICE to see the changes...this is going to make some of my other work much easier.
Thanks!

I noticed that expressions with NumberSymbols can be made into polys, but the new
module doesn't like reals...can those just be changed internally to constants (like
cse does)?

###
>>> Poly(pi-x)
Poly(pi - x, pi, x, domain='ZZ')
>>> Poly(.1-x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Documents and Settings\chris\sympy\sympy\polys\polytools.py", line 43
5, in __new__
    result = _init_poly_from_basic(rep, *gens, **args)
  File "C:\Documents and Settings\chris\sympy\sympy\polys\polytools.py", line 39
4, in _init_poly_from_basic
    domain, coeffs = _find_out_domain(rep, **args)
  File "C:\Documents and Settings\chris\sympy\sympy\polys\polytools.py", line 12
5, in _find_out_domain
    raise NotImplementedError('inexact coefficients')
NotImplementedError: inexact coefficients
>>>
###

/c
Comment 7 by mattpap, Aug 14, 2009
Thanks guys for comments. So ...

Ondrej:

What's really slow in your examples is expand(). See the following phenomenon:

In [1]: a = (x+y+sin(z))**20

In [2]: %time b = a.expand()
CPU times: user 5.57 s, sys: 0.00 s, total: 5.57 s
Wall time: 5.68 s

In [4]: %time b = b.expand()
CPU times: user 7.77 s, sys: 0.00 s, total: 7.77 s
Wall time: 7.90 s

Expanding an expanded expression is much slower than the original one. This is slow:

In [6]: %time c = factor(b)
CPU times: user 12.94 s, sys: 0.00 s, total: 12.95 s
Wall time: 13.72 s

However you can turn off expanding in Poly or any function in new module by setting
`expand=False`, provided you know the input expression is already expanded:

In [8]: %time c = factor(b, expand=False)
CPU times: user 5.24 s, sys: 0.00 s, total: 5.24 s
Wall time: 5.37 s

This way you really test efficiency of the factoring algorithm.

btw. Creating a Poly instance without expanding is also fast:

In [12]: %time p = Poly(b, expand=False)
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s

> AssertionError: Test 'test_rootfinding.py' has a duplicate name, (...)

OK, so this is py.test bug. I can rename rootfinding to polyroots (or similar).

> would you manage to use cython to speed up the factorization a bit?

I can, I even started to write pxd declarations for dense*.py but didn't yet run it
because of lack of nested defs in Cython. However I can rewrite nested functions, to
make Cython happy and experiment, but I don't guarantee that this can be done in one
day. Another thing is that I tried your Cython usage in sympy example, but I can't
reproduce this great speedup you got. I get only like 30% not an order of magnitude.
I will write more about this on the mailing list in the appropriate thread.

> Could you beat it? That'd be fun. :)

Oh, sure I would be ;) What can be improved? Cython and pure mode is one thing but
also factoring algorithm isn't finished yet. Current implementation is focused on
correctness not speed and one big branch of Wang's algorithm isn't implemented at
all, so improvements are possible. I'm also curious how switching to recursive sparse
polynomial representation would affect performance.

asmeurer:

> Should this rather be "Returns self as if it were an Add instance."

Right ;)

> How is this routine different than radsimp?

This code was a part of Poly.cancel, which was copied from radsimp(). However,
radsimp() tries to do more and is problematic (there are issues opened for this).

> I tried running a more complex expression through simplify, but it hangs.

Yes, but this is once again expand() (or really core problem). Now I see that I
should do some hacking and replace expand() (at least partially) with an algorithm
based only on polynomials, i.e. replace p**n with Poly(p)**n if p is expanded,
replace p_1*p_2*...*p_n with Poly(p_1)*Poly(p_2)*...*Poly(p_n) if p_1,...,p_n are
expanded etc. This should be a lot faster. I will also port fast low-level expanding
algorithm to polynomials module to replace the current powering algorithm. All this
is doable because Basic.as_numer_denom() is fast so I don't have to care about
negative exponents.

smichr:

> but the new module doesn't like reals

I does not and rising this exception is intentional. The problem is that some poly
algorithms don't care about coefficients, mostly arithmetics, some can work with
reals but need special treatment, e.g. div, gcd, and some just won't work, e.g.
factorization. Consider all this a work in progress. Soon I will provide an algebra
for reals and wrap up mpmath to do computations. The idea is to specify tolerance of
computation in this algebra so that zero equivalence could be solved and which would
allow division algorithm to work (this is similar to Mathematica Tolerance argument
to polynomial manipulation functions). This way computations will be fast and safe.
Comment 8 by mattpap, Aug 14, 2009
I forgot to mention that you can easily switch between different ground types, just
specify SYMPY_GROUND_TYPES=something before isympy. So lets factor once again, this
time using gmpy's mpz and mpq types for ZZ and QQ algebras respectively:

$ SYMPY_GROUND_TYPES=gmpy isympy 
Python 2.6.2 console for SymPy 0.7.0-git (cache: off)

In [1]: a = (x+y+sin(z))**20

In [2]: b = a.expand()

In [3]: %time c = factor(b, expand=False)
CPU times: user 0.87 s, sys: 0.00 s, total: 0.87 s
Wall time: 0.95 s

and lets compare with standard Python's types:

$ SYMPY_GROUND_TYPES=python isympy 
Python 2.6.2 console for SymPy 0.7.0-git (cache: off)

In [1]: a = (x+y+sin(z))**20

In [2]: b = a.expand()

In [3]: %time c = factor(b, expand=False)
CPU times: user 5.25 s, sys: 0.01 s, total: 5.26 s
Wall time: 5.37 s

If Fraction is not available SymPy can run in a mixed mode, i.e. you can work with
int as ZZ and mpq as QQ. Note, however, this mode is not yet tested well.

btw. I thought, Ondrej, that the conference is this weekend but it starts next
Thursday so I can surely experiment with Cython and try to use it in polynomials
module before then.
Comment 9 by ondrej.certik, Aug 14, 2009
ok, first expand=False only speeds things up a bit for me (from 3.16s to 2.84s),
however, this:

$ SYMPY_GROUND_TYPES=gmpy bin/isympy
Python 2.6.2 console for SymPy 0.7.0-git

These commands were executed:
>>> from __future__ import division
>>> from sympy import *
>>> x, y, z = symbols('xyz')
>>> k, m, n = symbols('kmn', integer=True)
>>> f, g, h = map(Function, 'fgh')

Documentation can be found at http://sympy.org/

In [1]: a = (x+y+sin(z))**20

In [2]: b = a.expand()

In [3]: %time c = factor(b, expand=False)
CPU times: user 0.44 s, sys: 0.00 s, total: 0.44 s
Wall time: 0.45 s


makes it faster than sage!
sage: var("x y z")        
(x, y, z)
sage: a = (x+y+sin(z))**20
sage: b = a.expand()      
sage: %time c = factor(b) 
CPU times: user 0.15 s, sys: 0.01 s, total: 0.16 s
Wall time: 0.59 s

but that's because sage is using maxima if there is sin(x) in the expression. So
that's really awesome.

So let's rather compare this:

$ SYMPY_GROUND_TYPES=gmpy bin/isympy
Python 2.6.2 console for SymPy 0.7.0-git

These commands were executed:
>>> from __future__ import division
>>> from sympy import *
>>> x, y, z = symbols('xyz')
>>> k, m, n = symbols('kmn', integer=True)
>>> f, g, h = map(Function, 'fgh')

Documentation can be found at http://sympy.org/

In [1]: a = (x+y+z)**20

In [2]: b = a.expand()

In [3]: %time c = factor(b, expand=False)
CPU times: user 0.40 s, sys: 0.00 s, total: 0.40 s
Wall time: 0.40 s


to this:

sage: var("x y z")       
(x, y, z)
sage: a = (x+y+z)**20    
sage: b = a.expand()     
sage: %time c = factor(b)
CPU times: user 0.14 s, sys: 0.00 s, total: 0.15 s
Wall time: 0.15 s


E.g. Sage (pari) is only about 2.6x faster. My lecture is on Thursday at 4pm, so if
you can make it 3x faster with gmpy + Cython, then we can beat Sage.
Comment 10 by ondrej.certik, Aug 14, 2009
Yeah, but once we beat Sage, it's still nothing, since Mathematica can do this instantly:

In[1]:= c = Factor[Expand[(x+y+z)^100]]
        
                   100
Out[1]= (x + y + z)

so, the journey will be long.
Comment 11 by asmeurer, Aug 14, 2009
> This code was a part of Poly.cancel, which was copied from radsimp(). However,
> radsimp() tries to do more and is problematic (there are issues opened for this
OK.  I didn't know that was from Poly.cancel().  Looking at the radsimp source, it looks like the only difference 
between your code and radsimp is that radsimp tries to handle the case a + b*sqrt(c) where b, c == 0 and it tries 
to collect terms.  Maybe you could add a keyword argument to radsimp that turns this stuff off (like radsimp(expr, 
simple=False)), and then use that argument in the call in simplify().  That way, you are not duplicating code.  
Comment 12 by asmeurer, Aug 14, 2009
It looks like factor also chokes on reals, and you have a bug:
>>> factor(x*(y + z)**(1/2)*(1 + y))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "./sympy/polys/polytools.py", line 1839, in factor
    F = Poly(f, *gens, **args)
  File "./sympy/polys/polytools.py", line 435, in __new__
    result = _init_poly_from_basic(rep, *gens, **args)
  File "./sympy/polys/polytools.py", line 367, in _init_poly_from_basic
    rep, gens = dict_from_basic(ex, **args)
  File "./sympy/polys/polyutils.py", line 168, in dict_from_basic
    return _dict_from_basic_no_gens(ex, **args)
  File "./sympy/polys/polyutils.py", line 123, in _dict_from_basic_no_gens
    base, exp = _analyze_power(*factor.as_Pow())
  File "./sympy/polys/polyutils.py", line 65, in _analyze_power
    raise PolynomialError("%s is not a valid polynomial term" % term)
NameError: global name 'term' is not defined

It would be nice if factor just returned the expression if it can't do it, instead of raising PolynomialError.  That 
way, you don't have to have a try block every time you want to factor something.
Sign in to add a comment

Hosted by Google Code