diff options
author | Alex Ozdemir <aozdemir@hmc.edu> | 2019-07-02 02:37:44 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2019-07-02 02:37:44 -0700 |
commit | 1c5a8e7bee22e9b154a5ac65c52cc04d3d2ba3c0 (patch) | |
tree | e174311348cb95398b3802fd043c1eec9c64e0d9 /src/proof/clausal_bitvector_proof.h | |
parent | 74fd30f3ac172ac7f1e3708a5d47df1a1018c51b (diff) |
Optimize DRAT optimization: clause matching (#3074)
* improved proof production statistics
This commit creates new statistics which will help us understand what is
so expensive about proof production.
There are already statistics for:
* Net time spent building/printing the LFSC proof.
* Size of the LFSC proof.
This commit adds statistics for:
* The time cost of DRAT optimization:
* net time
* tool time (drat-trim)
* clause matching time (matching core clauses with input clauses)
* Non-trivial because drat-trim can (and does) dedup and reorder
literals
* The advantage of DRAT optimization (proof/formula size before/after)
* The time cost of DRAT translation [to LRAT or ER] (net time, tool time)
* The time cost of skeleton traversal
* The time cost of printing declatations
* The time cost of printing CNF proofs
* The time cost of printing theory lemmas
* The time cost of printing final UNSAT proof.
There is a method, toStream, which is responsible for much of proof
production. The whole method was timed, but subsections were not. The
timings above consider subsections of it.
I also wanted to better understand the cost of DRAT optimization and
translation.
* [BV Proof] Optimize DRAT optimization
tldr: I made a bad data-structure/algorithm choice when implementing
part of DRAT/CNF-optimization, which consumed **a lot** of time on some
bechmarks. This commit fixes that choice.
Long Version:
Set-Keyed Maps Considered Harmful
=================================
Algorithmic Problem
-------------------
The DRAT optimization process spits out a unsatifiable core of the CNF.
The clauses in the core are all from the original formula, but their
literals may have been reordered and deduplicated. We must match the
old clauses with new ones, so we know which old clauses are in the core.
Old (BAD) Solution
------------------
Before I didn't really think about this process very much. I built a
solution in which clauses were canonically represented by hash sets of
literals, and made a hash map from canonical clauses to clause indices
into the original CNF.
Problem With Old Solution
-------------------------
In hindsight this was a bad idea. First, it required a new hash set to
be heap-allocated for every clause in the CNF. Second, the map lookups
required set-hashes (reasonable -- linear time, once) and hash-set
equality (not reasonable -- quadratic time, multiple times) on every
lookup.
New Solution
------------
The ideal solution is probably to have the map from clauses to clause
ids be something like a trie. STL doesn't have a trie, but representing
clauses as sorted, deduped vectors of literal in a tree based on
lexicographical comparison is pretty closed to this. On randomly chosen
examples it seems to be a huge improvement over the old
map-keyed-by-sets solution, and I'm in the process of running a full set
of bechmarks.
Also, we store pointers to the clauses already stored elsewhere in the
proof, instead of allocating new memory for them.
Future Work
-----------
It may also be reasonable to do a hash map of sorted, deduped, vector
clauses. I haven't tried this, yet (there's a TODO in the code).
* Update src/proof/clausal_bitvector_proof.h
Thanks andres!
Co-Authored-By: Andres Noetzli <andres.noetzli@gmail.com>
* Respond to Andres' Review: better use of CodeTimer
* Removed commented code (Andres)
Diffstat (limited to 'src/proof/clausal_bitvector_proof.h')
-rw-r--r-- | src/proof/clausal_bitvector_proof.h | 44 |
1 files changed, 44 insertions, 0 deletions
diff --git a/src/proof/clausal_bitvector_proof.h b/src/proof/clausal_bitvector_proof.h index a410b5b38..2047e325c 100644 --- a/src/proof/clausal_bitvector_proof.h +++ b/src/proof/clausal_bitvector_proof.h @@ -33,6 +33,7 @@ #include "prop/cnf_stream.h" #include "prop/sat_solver_types.h" #include "theory/bv/theory_bv.h" +#include "util/statistics_registry.h" namespace CVC4 { @@ -67,10 +68,53 @@ class ClausalBitVectorProof : public BitVectorProof std::ostringstream d_binaryDratProof{}; std::vector<ClauseId> d_coreClauseIndices{}; + struct DratTranslationStatistics + { + DratTranslationStatistics(); + ~DratTranslationStatistics(); + + // Total time spent doing translation (optimized binary DRAT -> in memory + // target format including IO, postprocessing, etc.) + TimerStat d_totalTime; + // Time that the external tool actually spent + TimerStat d_toolTime; + }; + + DratTranslationStatistics d_dratTranslationStatistics; + private: // Optimizes the DRAT proof stored in `d_binaryDratProof` and returns a list // of clause actually needed to check that proof (a smaller UNSAT core) void optimizeDratProof(); + + // Given reference to a SAT clause encoded as a vector of literals, puts the + // literals into a canonical order + static void canonicalizeClause(prop::SatClause& clause); + + struct DratOptimizationStatistics + { + DratOptimizationStatistics(); + ~DratOptimizationStatistics(); + + // Total time spent using drat-trim to optimize the DRAT proof/formula + // (including IO, etc.) + TimerStat d_totalTime; + // Time that drat-trim actually spent optimizing the DRAT proof/formula + TimerStat d_toolTime; + // Time that was spent matching clauses in drat-trim's output to clauses in + // its input + TimerStat d_clauseMatchingTime; + // Bytes in binary DRAT proof before optimization + IntStat d_initialDratSize; + // Bytes in binary DRAT proof after optimization + IntStat d_optimizedDratSize; + // Bytes in textual DIMACS bitblasted formula before optimization + IntStat d_initialFormulaSize; + // Bytes in textual DIMACS bitblasted formula after optimization + IntStat d_optimizedFormulaSize; + }; + + DratOptimizationStatistics d_dratOptimizationStatistics; }; /** |