to_cnf needs to be more efficient. For some expressions it takes forever and produces huge output, with tons of unnecessary clauses (lots of x | ~x).
I googled, and http://gauss.ececs.uc.edu/Courses/c626/lectures/BDD/st.pdf looks like a suitable algorithm. Any kind of optimization would probably help; right now, to_cnf just does the naive distribution, as in the example below.
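To see the blowup concretely, here is a small run against the existing SymPy API; an Or of n two-term Ands distributes into 2**n clauses:

```python
from sympy import symbols
from sympy.logic.boolalg import to_cnf

a, b, c, d, e, f = symbols('a b c d e f')

# An Or of three 2-term Ands distributes into 2**3 = 8 clauses.
expr = (a & b) | (c & d) | (e & f)
print(to_cnf(expr))
# (a | c | e) & (a | c | f) & (a | d | e) & ... (8 clauses total)
```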
Comment #1
Posted on Oct 28, 2013 by Happy Elephant

The algorithm apparently works by adding new symbols to the expression. The guarantee is that the new expression is satisfiable if and only if the original is, and any satisfying model of the new expression restricts to a model of the original. This is a little different from what to_cnf does, so it should probably be a separate function, or a flag on to_cnf. It would be surprising if to_cnf returned an expression that is not exactly logically equivalent to the input, but this is definitely what we need in the call to to_cnf in satisfiable().
The algorithm in the paper claims to be O(m), where m is the number of operators (And(x, y, z) counts as two operators). A modified algorithm that efficiently handles duplicate subexpressions is O(m*log(m)). The naive algorithm of distributing Ands over Ors is, I believe, O(2**m) or thereabouts. See for instance https://en.wikipedia.org/wiki/Conjunctive_normal_form#Conversion_into_CNF.
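For reference, here is a minimal sketch of a Tseitin-style encoding of this kind. It is hypothetical, not SymPy API: `_encode`, `tseitin_cnf`, and the `_aux` naming scheme are made up for illustration, and it assumes implications have already been eliminated (e.g. with sympy.logic.boolalg.eliminate_implications) so only And/Or/Not remain:

```python
from itertools import count
from sympy import And, Or, Not, Symbol

_fresh = count()  # counter for auxiliary variable names

def _encode(expr, clauses):
    """Return a literal standing for expr, appending CNF clauses
    that define each auxiliary variable to `clauses`."""
    if expr.is_Atom:
        return expr
    if isinstance(expr, Not):
        # the argument encodes to a literal, so Not of it is a literal
        return Not(_encode(expr.args[0], clauses))
    lits = [_encode(arg, clauses) for arg in expr.args]
    p = Symbol('_aux%d' % next(_fresh))  # may clash with user symbols
    if isinstance(expr, Or):
        # p <-> (l1 | ... | ln)
        clauses.append(Or(Not(p), *lits))            # p -> (l1 | ... | ln)
        clauses.extend(Or(p, Not(l)) for l in lits)  # li -> p
    else:  # And
        # p <-> (l1 & ... & ln)
        clauses.extend(Or(Not(p), l) for l in lits)      # p -> li
        clauses.append(Or(p, *[Not(l) for l in lits]))   # all li -> p
    return p

def tseitin_cnf(expr):
    """Equisatisfiable CNF, linear in the number of operators, at the
    cost of one auxiliary variable per non-atomic subexpression."""
    clauses = []
    top = _encode(expr, clauses)
    return And(top, *clauses)
```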
Comment #2
Posted on Oct 28, 2013 by Quick Hippo

Any conversion to CNF using Tseitin (or a related approach) will introduce auxiliary variables. The encoding time goes down, but the solving time increases (sometimes substantially). Probably good to keep that in mind...
Comment #3
Posted on Oct 28, 2013 by Happy Elephant

Oh, I didn't know that. So we would need to find a heuristic to balance the two.
Comment #4
Posted on Oct 28, 2013 by Happy Elephant

Do you know of any better references for this, aimed more toward implementation? It seems like in many (most?) cases, adding new variables is unnecessary. For instance, if I have a conjunction of a bunch of implications or double implications, that is basically already in CNF. As a somewhat less trivial example, if I have Implies(p, x), where p is in DNF and x is a variable, the conversion is not so bad: ~p is in CNF, and you then just have to distribute the x to each clause, as in the example below.
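Concretely, using the existing SymPy API:

```python
from sympy import symbols, Implies
from sympy.logic.boolalg import to_cnf

a, b, c, d, x = symbols('a b c d x')

p = (a & b) | (c & d)     # p is in DNF
expr = Implies(p, x)      # == ~p | x, and ~p is already in CNF
print(to_cnf(expr))
# (x | ~a | ~b) & (x | ~c | ~d)  -- just x distributed over ~p's clauses
```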
By the way, another question: if the input to satisfiable is an Or, would it be more efficient to just recursively call satisfiable on each argument until an answer is found (see the sketch below)? This seems like it might be more efficient than converting to CNF, especially since if the input happens to be in DNF, the answer would be found instantly. Even with a huge Or that ends up being unsatisfiable (the worst case), this still seems like it would beat converting to CNF, at least given the current naive to_cnf. But I haven't tested it, so I could be wrong.
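Something like this sketch, where `satisfiable_or` is a hypothetical wrapper around the existing `satisfiable`:

```python
from sympy import Or
from sympy.logic.inference import satisfiable

def satisfiable_or(expr):
    """An Or is satisfiable iff some disjunct is, so try the
    (hopefully smaller) arguments before converting to CNF."""
    if isinstance(expr, Or):
        for arg in expr.args:
            model = satisfiable_or(arg)
            if model:
                return model  # partial model: unmentioned symbols are free
        return False
    return satisfiable(expr)  # fall back to the normal CNF route
```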
Comment #5
Posted on Oct 28, 2013 by Quick Hippo

Better references? There should be, but I just know it as general lore around the SAT-solving community. Your best bet is to analyze the types of input you are expecting: if it leans toward DNF, then flatten to that and use model enumeration to solve your query. If it leans toward CNF, then flatten to that and use a SAT solver. If it really is stuck halfway in between (and flattening to either causes a combinatorial explosion), then what I've seen is either Tseitin encodings or circuit solvers. The former introduces the new variables, and the latter just runs DPLL on top of the arbitrary formula.
Because flattening is a recursive procedure, it shouldn't be so hard to write a dynamic program (DP) that estimates the final CNF (or DNF) size without actually computing it, as in the sketch below. That estimate could be used at run time to decide which technique is best.
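A sketch of the clause-count recurrence behind such a DP, assuming NNF input (Not only on atoms); `naive_cnf_clauses` is hypothetical, not SymPy API, and the memoization handles shared subexpressions:

```python
from functools import lru_cache  # SymPy expressions are hashable
from sympy import And, Or

@lru_cache(maxsize=None)
def naive_cnf_clauses(expr):
    """Clause count of the naive CNF: sum over an And's arguments,
    product over an Or's arguments, 1 for a literal."""
    if isinstance(expr, And):
        return sum(naive_cnf_clauses(a) for a in expr.args)
    if isinstance(expr, Or):
        n = 1
        for a in expr.args:
            n *= naive_cnf_clauses(a)
        return n
    return 1  # literal

# Compare the estimate against a threshold at run time to pick
# naive distribution vs. a Tseitin-style encoding.
```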
Comment #7
Posted on Nov 5, 2013 by Happy Elephant

Is it a bad idea for satisfiable() to always recurse across an Or (even ones that are easy to flatten)? It seems to me like the analogue of solve() always recursing across a Mul. One disadvantage I can think of is that you would no longer be guaranteed a full model, as DPLL2 seems to provide. Could it ever be far less efficient than converting to CNF first and running the normal solver, as it currently does?
Comment #8
Posted on Mar 5, 2014 by Happy Elephant

We have moved issues to GitHub: https://github.com/sympy/sympy/issues.
Comment #9
Posted on Apr 6, 2014 by Happy Rabbit

Migrated to http://github.com/sympy/sympy/issues/7174
Status: Valid
Labels:
Type-Defect
Priority-High
Assumptions
Logic
Restrict-AddIssueComment-Commit