 # SAT and MAX-SAT for the Lay-Researcher

This page contains information about satisfiability (SAT) and maximum satisfiability (MAX-SAT). Although the ideas behind SAT and MAX-SAT are simple I will make them even simpler for all those who are interested in the topic. So here goes...

First, let me be formal in defining it, then I will give examples to clarify things a bit. A SAT expression is composed from a set of m clauses in conjunction, , where each clause is formed by the disjunction of Boolean variables, , or their negation, where . This form of the conjunction of clauses and the disjunction of literals (variables and their negation) is called Conjunctive Normal Form (CNF). In SAT the problem asks if there exists a truth assignment such that the expression is true.

What this means is if you have a bunch of variables OR-ed together then you have a clause. If you have several clauses AND-ed together, then you have a formula or an expression that is referred to as CNF formula. Here's an example:

Suppose we have the following variables, We can create some clauses by ORing variables,    That was simple, now join all these clauses together with AND operators and you will have a formula that looks like this: where is the negation of or not .

Here's what researchers are concerned about. They are interested in finding out if there exists a Boolean assignments for each such that is 1. An example of a satisfying assignment would be:     1 0 1 1 1

When we substitute each of the values of into we get, The result is = 1. That sounds simple doesn't it? Well it's not. This problem can be NP-Complete, and NP-Complete problems get harder and harder as the problem becomes larger. So when the number of variables in the formula increase, the problem gets more difficult.

OK, some demystification is in order here. What does NP-Complete mean? I want to make sure that I get this point correctly because my supervisor hates it when I make this mistake. A lot of people think that NP in NP-Complete stands for Non-Polynomial time. This is not true. NP stands for Non-deterministic Polynomial time. What this means is that the problem can be solved with a non-deterministic algorithm, and what's polynomial about it is the verification part. Verifying that the solution provided is a correct one can be done in polynomial time.

Imagine if your boss had a problem that contained 1,000,000 variables and 4,300,000 clauses, and you were asked to find out whether there is an assignment that satisfies the formula. I would suggest that you pack your bags and leave. However, suppose that aliens came to earth and tramped around on their long legged tripods, but instead of herding people into nets and sucking their blood, they solved your problem with their celestially complicated undeterministic machine. You quickly donate a pint of your blood in return for the solution. Now that you have the solution you are faced with something you can easily do: verifying the correctness of the solution. It shouldn't take you very long to show that the solution the aliens gave you does, in fact, satisfy the formula.

Now that we know what SAT is, what is MAX-SAT? MAX-SAT is general form of SAT. In SAT we wanted a decision whether the problem had a satisfiying assignment or not. In MAX-SAT we go beyond that making the problem even harder. The question that MAX-SAT asks is, what is the maximum number of clauses that can be satisfied? This makes maximum satisfiability NP-hard. For one thing, since it's not a decision problem it automatically falls outside the realm of NP-Complete problems. All NP-Complete problems are decision problems. Also, MAX-SAT problems are as hard or harder than SAT problems.

One thing I need to mention before I move on from the NP-Complete realm. When I mentioned that a SAT problem can be NP-Complete I was trying to say that not all SAT problems are NP-Complete. A problem that has k variables per clause is called k-SAT, or MAX-k-SAT. Now, when k 3, then the SAT problem becomes NP-Complete. Otherwise, it is in P, and therefore easy to solve. With MAX-k-SAT, when k 2 the problem become NP-Complete. So If boss gave you a SAT problem with k = 2, your job is still secure.

# Applications of SAT and MAX-SAT

So why do we care about this problem? Really who cares if there is a satisfying assignment or not. SAT and MAX-SAT problems are more significant than you think. First and foremost, there are real world applications that depend on finding solutions to SAT problems. Second, because SAT problems can be represented easily it makes them very attractive to researchers, and since SAT is NP-Complete that means that if a polynomial time algorithm was somehow found for it (I doubt it!), then all other NP-Complete algorithms can be solved the same way. Our friendly blood loving aliens solved this problem a long long time ago, in a galaxy far far away. However, because of their prime directive, "thou shalt not interfere with internal affairs of other civilizations unless thirsty" means that we have to win the Nobel prize on our own.

The following are some of the applications of SAT:

• Electronic Design Automation (EDA) and verification
• Automatic Test Pattern Generation
• path delay faults
• Field Programmable Gate Array routing
• Logic synthesis
• Crosstalk noise analysis
• Functional vector generation

# SAT and MAX-SAT Solvers

You will probably not appreciate how difficult an NP-Complete problem such as SAT is until you understand how the solvers work. Suppose your friend, a very naive programmer who flunked his job interview with Google, wanted to take on the SAT problem. He proposed a very simple algorithm: enumerate all possible 1 and 0 configurations, and verify each one. After you smacked him on the head nearly knocking off his wisdom tooth, you explained that this would work fine for small problems, but as the number of variables becomes large, enumerating all possible combinations becomes impossible. Generally speaking, if you had n variables, then this exhaustive search requires, at the worst case, 2n tests. For example, if you had 5 variables, then there are 25 = 32 possible configurations: 00000, 00001, 00010, 00011, and so on until 11111. On the other hand, if you had 1000 variables then you need to have,

`10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376 `

tests in the worst case to find out if the formula is satisfiable. With 1,000,000 variables there are approximately 10301029 possible configurations. Compare that to the number of atoms in the universe which is only about 1080. That should make your friend feel ashamed of himself.

Attempting to salvage what's left of your friendship, John Saturn, your friend rummaged through Google's Scholar for better ideas. This is what he found out. There are two types of search algorithms: complete and incomplete methods. Complete methods perform exhaustive searches. They guarantee finding an optimal solution if there exists one. However, being complete does not mean checking every possible configuration. They use clever methods to trim the search down. Incomplete methods, on the other hand, do not guarantee finding a an optimal solution, but if they do find solutions they do it very quickly. They are sometimes referred to as local or stochastic search algorithms.

### Complete Methods

• Branch and Bound.
• DPL (Davis Putnam Loveland) algorithm.

#### Branch and Bound

To make life easier, Mr. Saturn, started off with the simplest complete solver he could get his hands on, Branch and Bound. It is based on a very simple concept: set a variable to a particular value (0 or 1), apply it to the formula. If that value satisfies all the clauses its in, then assign a value to the next variable, and so on. If, on the other hand, an assigned variable unsatisfies one of the clauses, then it is futile to maintain the state of that variable and assign bits to the rest of the variables. What this process does is clear. It prunes branches of this large tree that the different 0 or 1 assignments create. Let's try the following example.

Let's use the formula we started with as an example: Figure shows the possible assignments each of the variables through can take. So let's start from with its value being a zero. When we substitute this value into the formula we get a zero in the 2st and 4th clauses, but this does not satisfy the formula or unsatisfies it. We go to the next variable in the tree, and that's a 0 for . When we plug this value into the formula, then we have the 1st clause satisfied, but because the 4th clause has both and as zeros, then that clause in unsatisfied. Therefore, it doesn't make sense to continue down that branch for the rest of the variables. That is because there is no way we can satisfy the formula by applying the rest of the variables, , , or (think about it or try it for yourself). We try the second branch of , and assign a 1 to it. Now we have satisfied the 4th clause. Traversing the branch down to with 0 we get clause 2 satisfied. We keep performing this process until all clauses are satisfied. Notice that we didn't have to go through every possible branch there is to the end to guarantee finding the solution or solutions. Figure : All possible combinations of 0's and 1's for 5 variables.

Although we have reduced the number of checks we performed it does not mean that the problem has become any easier. It is still hard to solve, and it is NP-Complete for SAT. Also, this way of pruning the tree applies to SAT problems, but not MAX-SAT. There is a slight difference that makes the problem even harder. With MAX-SAT we are not looking to satisfy all clauses. Instead we want to satisfy the maximum number of clauses, and therefore having an unsatisfied clause when assigning a particular variable does not mean that we stop at that point and bound back to the other branch. It could be that assigning a 0 unsatisfies a single clause while satisfying many others, and it could be that if we continue down that branch we get maximum satisfiability. This means that there are a whole lot more branches that have to be verified. Bounding from a branch happens only if at any point we find that assigning a value to yields a greater than or an equal number of unsatisfied clauses than we previously had.

#### DPL (Davis Putnam Loveland) algorithm

The DPL method is the most commonly used complete method in SAT and MAX-SAT, and the majority of algorithms created to this date rely on a modified form of this algorithm. It is basically the same as Branch and bound, but there is a slight difference which makes it more powerful. The way it works is this, select a literal (a variable or its negation). Assign the literal a value. If the literal is true, and hence satisfies the clause, then remove all clauses containing that literal from the formula. On the other hand, where literal is false due to its negation, remove only that specific variable from the clause. So far this is sort of like branch and bound except that we are looking at it from the point of view of the clauses. We recursively do this until we find a solution. To make this even better, if a clause contained a single literal, then assign that literal a value that makes it true, and perform the same procedure. That is called unit clause elimination. You can imagine how this would have a cascading effect. As more variables are eliminated from clauses the more clauses turn into unit clauses. So does DPLL make the problem easier? No. That means that you still have a chance at being recognised as Nobelaureate if you discover the solution (however infinitesimally this chance may be).

Here's an example of how DPLL works. Let's start off with our formula again. Select , and assign a 0 (which makes the clause true). Eliminate the first clause from the formula, and eliminate from the 4th clause. This leaves us with the following formula: Notice now that the third clause in this formula has become a unit clause. We immediately assign a 1. It wouldn't make sense to do otherwise. Now we remove both clauses 1 and 3 from the formula. Finally, we set to 0, and eliminate it from the formula, and we are left with . Setting it to zero satisfies the formula. There is another trick that was applied to DPLL that was supposed to improve on the search. It's called pure literal elimination, but it is omitted from current implementations of the algorithm due to the limited effect it has. Basically, if a literal appears in a particular configuration, say in all the clauses that the variable appears in, then all clauses containing this variable can be satisfied by assigning a 0 to the literal. That means that all clauses containing this variable can be removed.