| Issue 1598: | New polynomials manipulation module | |
| 5 people starred this issue and may be notified of changes. | Back to list |
Sign in to add a comment
|
So, after 5 weeks of coding new module is done. Well, almost done because many tests are missing and some functionalities may still be quite buggy. However, after minor fixes, SymPy can work with new module very well, so lets dig into new code and share your opinions. Sixteen patches are available from: http://github.com/mattpap/sympy-polys/commits/polys Please read commit message of 7f4e5466414302c3106b2ac9e2224eac2f4d75d2 to get brief idea what was done. Some parts of new code I already showed at EuroSciPy conference but now all this is really working. Feel free to experiment. Is this the end? ... no, sorry, it smells like the beginning of work on really functional module. Plans for near future? Besides testing and documentation writing, proper support for algebraic number fields (isomorphism, inclusion, splitting fields and factorization). Is this the right approach to implementing polynomials in SymPy? Don't know but it seems to work, it's fast and does not require too much algebra knowledge to work with (at least as a user). With some minor fixes on new polynomials module side (or Cython side -> nested functions) it could be Cythonized to obtain even more computational power. |
||||||||||||
,
Aug 13, 2009
Wow! This is great. Simplification is much better with this. For example, consider the complex expression from issue 1562. In your branch: >>> a = wronskian([sin(x), cos(x), sin(x)*x, cos(x)*x, 1], x) >>> print a 8*cos(x)**2*sin(x)**2 + 4*cos(x)**4 + 4*sin(x)**4 >>> a = factor(a) >>> print a 4*(cos(x)**2 + sin(x)**2)**2 >>> trigsimp(a) 4 The original wronskian is simpler (I am assuming because of better intermediate simplification), and the expression can be factored! I tried running a more complex expression through simplify, but it hangs. >>> simplify(-576*cos(x)**26*sin(x)**30/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 6336*cos(x)**24*sin(x)**32/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 31680*cos(x)**22*sin(x)**34/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 95040*cos(x)**20*sin(x)**36/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 190080*cos(x)**18*sin(x)**38/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 266112*cos(x)**16*sin(x)**40/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 266112*cos(x)**14*sin(x)**42/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 190080*cos(x)**12*sin(x)**44/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 95040*cos(x)**10*sin(x)**46/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 31680*cos(x)**8*sin(x)**48/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 6336*cos(x)**6*sin(x)**50/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44) - 576*cos(x)**4*sin(x)**52/(32*cos(x)**18*sin(x)**30 + 224*cos(x)**16*sin(x)**32 + 672*cos(x)**14*sin(x)**34 + 1120*cos(x)**12*sin(x)**36 + 1120*cos(x)**10*sin(x)**38 + 672*cos(x)**8*sin(x)**40 + 224*cos(x)**6*sin(x)**42 + 32*cos(x)**4*sin(x)**44)) It takes 218.32 s to complete. If I substitute sin(x) and cos(x) for symbols first, that drops to 88.54 s. On the other hand, cancel(together(expr)) takes 0.30 s to produce the same result. This is from wronskian([sin(x)**2, cos(x)**2, sin(x)*cos(x), sin(x), cos(x)], x) in the current SymPy. Fortunately, if you run the wronskian straight in your branch, it comes out nice: In [1]: wronskian([sin(x)**2, cos(x)**2, sin(x)*cos(x), sin(x), cos(x)], x) Out[1]: -72*cos(x)**6*sin(x)**2 - 108*cos(x)**4*sin(x)**4 - 72*cos(x)**2*sin(x)**6 - 18*cos(x)**8 - 18*sin(x)**8 In [2]: trigsimp(factor(wronskian([sin(x)**2, cos(x)**2, sin(x)*cos(x), sin(x), cos(x)], x))) Out[2]: -18 I posted some comments on the code in your branch. I like the philosophy that you outline in your commit 7f4e5466414302c3106b2ac9e2224eac2f4d75d2. I don't know much about abstract algebra (yet), so I can't say much about the internal part, though the code looks clean. |
|||||||||||||
,
Aug 13, 2009
Wow, Mateusz, you are a genius.
I tried this:
In [1]: a = (x+y+z)**20
In [2]: b = a.expand()
In [6]: time factor(b)
CPU times: user 3.08 s, sys: 0.00 s, total: 3.08 s
Wall time: 3.11 s
Out[7]:
20
(x + y + z)
And try maxima, it can't do it! :) So we are slowly overtaking maxima, that's exciting.
This fails for me, but it's minor:
$ bin/test sympy/utilities/
============================= test process starts ==============================
executable: /usr/bin/python (2.6.2-final-0)
sympy/utilities/tests/test_code_quality.py[2] .. [OK]
sympy/utilities/tests/test_codegen.py[10] .......... [OK]
sympy/utilities/tests/test_decorator.py[1] . [OK]
sympy/utilities/tests/test_iterables.py[5] ..... [OK]
sympy/utilities/tests/test_lambdify.py[25] ...........f............. [OK]
sympy/utilities/tests/test_pickling.py[28] ..........fffffff........... [OK]
sympy/utilities/tests/test_pytest.py[1] . [OK]
sympy/utilities/tests/test_source.py[2] .. [OK]
sympy/utilities/tests/test_tests_names.py[1]
['/home/ondrej/repos/sympy/sympy/mpmath/tests/test_rootfinding.py',
'/home/ondrej/repos/sympy/sympy/polys/tests/test_rootfinding.py']
F [FAIL]
________________________________________________________________________________
______ sympy/utilities/tests/test_tests_names.py:test_no_duplicate_names _______
File "/home/ondrej/repos/sympy/sympy/utilities/tests/test_tests_names.py", line 21,
in test_no_duplicate_names
assert False, message % (fname)
AssertionError: Test 'test_rootfinding.py' has a duplicate name, py.test will not run it!
======= tests finished: 66 passed, 1 failed, 8 xfailed in 20.82 seconds ========
DO *NOT* COMMIT!
|
|||||||||||||
,
Aug 13, 2009
(the test tests a bug in py.test, that it fails to execute two files in the sympy tree with the same filenames, e.g. one in mpmath and one in polys) |
|||||||||||||
,
Aug 13, 2009
Mateusz, would you manage to use cython to speed up the factorization a bit? Just for
fun, so that I can show it on scipy09, as an example of using cython. Here is the
result in Sage, that uses pari:
sage: var("x y z")
(x, y, z)
sage: a = (x+y+z)**20
sage: b = a.expand()
sage: time factor(b)
CPU times: user 0.19 s, sys: 0.00 s, total: 0.19 s
Wall time: 0.30 s
(x + y + z)^20
e.g. it's only 10x faster. Pari is optimized in C, so --- I don't know, but cython
can speed up things easily by a factor of 10x in many cases. Let's try it?
|
|||||||||||||
,
Aug 13, 2009
And for this:
In [5]: a = (x+y+sin(z))**20
In [6]: b = a.expand()
In [7]: time factor(b)
CPU times: user 3.38 s, sys: 0.00 s, total: 3.38 s
Wall time: 3.46 s
Out[8]:
20
(x + y + sin(z))
Sage is only 4x faster:
sage: a = (x+y+sin(z))**20
sage: b = a.expand()
sage: time factor(b)
CPU times: user 0.13 s, sys: 0.01 s, total: 0.14 s
Wall time: 0.80 s
(x + y + sin(z))^20
Could you beat it? That'd be fun. :)
|
|||||||||||||
,
Aug 13, 2009
SO NICE to see the changes...this is going to make some of my other work much easier.
Thanks!
I noticed that expressions with NumberSymbols can be made into polys, but the new
module doesn't like reals...can those just be changed internally to constants (like
cse does)?
###
>>> Poly(pi-x)
Poly(pi - x, pi, x, domain='ZZ')
>>> Poly(.1-x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Documents and Settings\chris\sympy\sympy\polys\polytools.py", line 43
5, in __new__
result = _init_poly_from_basic(rep, *gens, **args)
File "C:\Documents and Settings\chris\sympy\sympy\polys\polytools.py", line 39
4, in _init_poly_from_basic
domain, coeffs = _find_out_domain(rep, **args)
File "C:\Documents and Settings\chris\sympy\sympy\polys\polytools.py", line 12
5, in _find_out_domain
raise NotImplementedError('inexact coefficients')
NotImplementedError: inexact coefficients
>>>
###
/c
|
|||||||||||||
,
Aug 14, 2009
Thanks guys for comments. So ... Ondrej: What's really slow in your examples is expand(). See the following phenomenon: In [1]: a = (x+y+sin(z))**20 In [2]: %time b = a.expand() CPU times: user 5.57 s, sys: 0.00 s, total: 5.57 s Wall time: 5.68 s In [4]: %time b = b.expand() CPU times: user 7.77 s, sys: 0.00 s, total: 7.77 s Wall time: 7.90 s Expanding an expanded expression is much slower than the original one. This is slow: In [6]: %time c = factor(b) CPU times: user 12.94 s, sys: 0.00 s, total: 12.95 s Wall time: 13.72 s However you can turn off expanding in Poly or any function in new module by setting `expand=False`, provided you know the input expression is already expanded: In [8]: %time c = factor(b, expand=False) CPU times: user 5.24 s, sys: 0.00 s, total: 5.24 s Wall time: 5.37 s This way you really test efficiency of the factoring algorithm. btw. Creating a Poly instance without expanding is also fast: In [12]: %time p = Poly(b, expand=False) CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s Wall time: 0.01 s > AssertionError: Test 'test_rootfinding.py' has a duplicate name, (...) OK, so this is py.test bug. I can rename rootfinding to polyroots (or similar). > would you manage to use cython to speed up the factorization a bit? I can, I even started to write pxd declarations for dense*.py but didn't yet run it because of lack of nested defs in Cython. However I can rewrite nested functions, to make Cython happy and experiment, but I don't guarantee that this can be done in one day. Another thing is that I tried your Cython usage in sympy example, but I can't reproduce this great speedup you got. I get only like 30% not an order of magnitude. I will write more about this on the mailing list in the appropriate thread. > Could you beat it? That'd be fun. :) Oh, sure I would be ;) What can be improved? Cython and pure mode is one thing but also factoring algorithm isn't finished yet. Current implementation is focused on correctness not speed and one big branch of Wang's algorithm isn't implemented at all, so improvements are possible. I'm also curious how switching to recursive sparse polynomial representation would affect performance. asmeurer: > Should this rather be "Returns self as if it were an Add instance." Right ;) > How is this routine different than radsimp? This code was a part of Poly.cancel, which was copied from radsimp(). However, radsimp() tries to do more and is problematic (there are issues opened for this). > I tried running a more complex expression through simplify, but it hangs. Yes, but this is once again expand() (or really core problem). Now I see that I should do some hacking and replace expand() (at least partially) with an algorithm based only on polynomials, i.e. replace p**n with Poly(p)**n if p is expanded, replace p_1*p_2*...*p_n with Poly(p_1)*Poly(p_2)*...*Poly(p_n) if p_1,...,p_n are expanded etc. This should be a lot faster. I will also port fast low-level expanding algorithm to polynomials module to replace the current powering algorithm. All this is doable because Basic.as_numer_denom() is fast so I don't have to care about negative exponents. smichr: > but the new module doesn't like reals I does not and rising this exception is intentional. The problem is that some poly algorithms don't care about coefficients, mostly arithmetics, some can work with reals but need special treatment, e.g. div, gcd, and some just won't work, e.g. factorization. Consider all this a work in progress. Soon I will provide an algebra for reals and wrap up mpmath to do computations. The idea is to specify tolerance of computation in this algebra so that zero equivalence could be solved and which would allow division algorithm to work (this is similar to Mathematica Tolerance argument to polynomial manipulation functions). This way computations will be fast and safe. |
|||||||||||||
,
Aug 14, 2009
I forgot to mention that you can easily switch between different ground types, just specify SYMPY_GROUND_TYPES=something before isympy. So lets factor once again, this time using gmpy's mpz and mpq types for ZZ and QQ algebras respectively: $ SYMPY_GROUND_TYPES=gmpy isympy Python 2.6.2 console for SymPy 0.7.0-git (cache: off) In [1]: a = (x+y+sin(z))**20 In [2]: b = a.expand() In [3]: %time c = factor(b, expand=False) CPU times: user 0.87 s, sys: 0.00 s, total: 0.87 s Wall time: 0.95 s and lets compare with standard Python's types: $ SYMPY_GROUND_TYPES=python isympy Python 2.6.2 console for SymPy 0.7.0-git (cache: off) In [1]: a = (x+y+sin(z))**20 In [2]: b = a.expand() In [3]: %time c = factor(b, expand=False) CPU times: user 5.25 s, sys: 0.01 s, total: 5.26 s Wall time: 5.37 s If Fraction is not available SymPy can run in a mixed mode, i.e. you can work with int as ZZ and mpq as QQ. Note, however, this mode is not yet tested well. btw. I thought, Ondrej, that the conference is this weekend but it starts next Thursday so I can surely experiment with Cython and try to use it in polynomials module before then. |
|||||||||||||
,
Aug 14, 2009
ok, first expand=False only speeds things up a bit for me (from 3.16s to 2.84s),
however, this:
$ SYMPY_GROUND_TYPES=gmpy bin/isympy
Python 2.6.2 console for SymPy 0.7.0-git
These commands were executed:
>>> from __future__ import division
>>> from sympy import *
>>> x, y, z = symbols('xyz')
>>> k, m, n = symbols('kmn', integer=True)
>>> f, g, h = map(Function, 'fgh')
Documentation can be found at http://sympy.org/
In [1]: a = (x+y+sin(z))**20
In [2]: b = a.expand()
In [3]: %time c = factor(b, expand=False)
CPU times: user 0.44 s, sys: 0.00 s, total: 0.44 s
Wall time: 0.45 s
makes it faster than sage!
sage: var("x y z")
(x, y, z)
sage: a = (x+y+sin(z))**20
sage: b = a.expand()
sage: %time c = factor(b)
CPU times: user 0.15 s, sys: 0.01 s, total: 0.16 s
Wall time: 0.59 s
but that's because sage is using maxima if there is sin(x) in the expression. So
that's really awesome.
So let's rather compare this:
$ SYMPY_GROUND_TYPES=gmpy bin/isympy
Python 2.6.2 console for SymPy 0.7.0-git
These commands were executed:
>>> from __future__ import division
>>> from sympy import *
>>> x, y, z = symbols('xyz')
>>> k, m, n = symbols('kmn', integer=True)
>>> f, g, h = map(Function, 'fgh')
Documentation can be found at http://sympy.org/
In [1]: a = (x+y+z)**20
In [2]: b = a.expand()
In [3]: %time c = factor(b, expand=False)
CPU times: user 0.40 s, sys: 0.00 s, total: 0.40 s
Wall time: 0.40 s
to this:
sage: var("x y z")
(x, y, z)
sage: a = (x+y+z)**20
sage: b = a.expand()
sage: %time c = factor(b)
CPU times: user 0.14 s, sys: 0.00 s, total: 0.15 s
Wall time: 0.15 s
E.g. Sage (pari) is only about 2.6x faster. My lecture is on Thursday at 4pm, so if
you can make it 3x faster with gmpy + Cython, then we can beat Sage.
|
|||||||||||||
,
Aug 14, 2009
Yeah, but once we beat Sage, it's still nothing, since Mathematica can do this instantly:
In[1]:= c = Factor[Expand[(x+y+z)^100]]
100
Out[1]= (x + y + z)
so, the journey will be long.
|
|||||||||||||
,
Aug 14, 2009
> This code was a part of Poly.cancel, which was copied from radsimp(). However, > radsimp() tries to do more and is problematic (there are issues opened for this OK. I didn't know that was from Poly.cancel(). Looking at the radsimp source, it looks like the only difference between your code and radsimp is that radsimp tries to handle the case a + b*sqrt(c) where b, c == 0 and it tries to collect terms. Maybe you could add a keyword argument to radsimp that turns this stuff off (like radsimp(expr, simple=False)), and then use that argument in the call in simplify(). That way, you are not duplicating code. |
|||||||||||||
,
Aug 14, 2009
It looks like factor also chokes on reals, and you have a bug:
>>> factor(x*(y + z)**(1/2)*(1 + y))
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "./sympy/polys/polytools.py", line 1839, in factor
F = Poly(f, *gens, **args)
File "./sympy/polys/polytools.py", line 435, in __new__
result = _init_poly_from_basic(rep, *gens, **args)
File "./sympy/polys/polytools.py", line 367, in _init_poly_from_basic
rep, gens = dict_from_basic(ex, **args)
File "./sympy/polys/polyutils.py", line 168, in dict_from_basic
return _dict_from_basic_no_gens(ex, **args)
File "./sympy/polys/polyutils.py", line 123, in _dict_from_basic_no_gens
base, exp = _analyze_power(*factor.as_Pow())
File "./sympy/polys/polyutils.py", line 65, in _analyze_power
raise PolynomialError("%s is not a valid polynomial term" % term)
NameError: global name 'term' is not defined
It would be nice if factor just returned the expression if it can't do it, instead of raising PolynomialError. That
way, you don't have to have a try block every time you want to factor something.
|
|||||||||||||
|
|
|||||||||||||