1 files changed, 170 insertions, 2 deletions
diff --git a/README.md b/README.md
index ce7a9b6..8dce35f 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,171 @@
-# sifter
+# Sifter
+This repository contains code implementing the prototype analogy-making program
+Sifter, as described in our 'Onward!' 2020 paper "Analogy-Making as a Core
+Primitive in the Software Engineering Toolbox."
 
-Code for our paper [Analogy-Making as a Core Primitive in the Software Engineering Toolbox](https://arxiv.org/abs/2009.06592) will be posted here by mid-October.
+Sifter can make _analogies_ about programs, e.g., identifying data structures
+and methods which play corresponding roles in two different programs. Sifter
+can also _complete_ analogies, which allows it to automatically learn and
+generalize source code transformations from a small number of examples.
+
+#### Dependencies
+You must install [Bazel](https://bazel.build) as well as set up
+[bazel_python](https://github.com/95616ARG/bazel_python) with Python 3.7.4.
+
+#### Running Tests, Examples
+You should then be able to run
+```bash
+bazel test //... && bazel run coverage_report
+```
+to get a coverage report in `htmlcov/index.html`.
+
+To run the examples:
+```bash
+bazel run examples/letter_analogies:letter_analogy
+bazel run examples/turing_machine:turing_machine
+bazel run examples/program_analysis:program_understanding
+bazel run examples/program_analysis:api_migration
+bazel run examples/program_analysis:transform_learning
+```
+For the `program_analysis` ones, you can view the result by visiting
+`http://localhost:8001` in your browser once prompted.
+
+#### Goals, Status, and Future Work
+This repository accompanies our paper in
+[Onward! 2020](https://2020.splashcon.org/track/splash-2020-Onward-papers?#the-character-of-onward),
+with the goal of encouraging interest and future work in automated
+analogy-making for software engineering.  Ultimately, we see analogy-making as
+involving three main processes:
+1. _Raw Perception_, such as reading files (code, documentation) from the
+   filesystem and into the triplet structure workspace.
+
+   **Status:** The `LazyStructure` and `LazyTextDocument` classes in
+   [examples/program_analysis/lazy_structure.py](examples/program_analysis/lazy_structure.py)
+   read a file from the filesystem and add a representation of the file's
+   contents to the triplet structure workspace. These interfaces are 'lazy' in
+   that they allow selecting only certain parts of a file to be included in the
+   workspace. Currently we either specify ahead-of-time which portions of the
+   file to include in the workspace or just add the entire file contents.
+
+   **Future Work:** We envision the laziness being used to initially focus the
+   analogy-making on a small subset of the relevant code, then expanding to
+   larger portions as the analogy solidifies.
+2. _Semantic Rewriting_, such as grouping individual characters into a single
+   token, inlining a function, and annotating invariants found via program
+   analysis.
+
+   **Status:** Currently, the default `LazyTextDocument` interface performs a
+   light-weight tokenization pass that groups characters separated by spaces
+   and special characters (such as `+`) before encoding the document into the
+   workspace. Extra semantic information, such as program invariants, are
+   specified ahead-of-time by either directly editing the workspace or using
+   the `AnnotateFact` method of `LazyTextDocument`. There is an example of
+   annotating program invariants in
+   [examples/program_analysis/transform_learning.py](examples/program_analysis/transform_learning.py).
+
+   **Future Work:** We would like to directly connect Sifter to program
+   analysis tools so analysis results can be automatically imported into the
+   triplet structure workspace instead of needing to be manually annotated.  We
+   would like to incorporate rewrite rules that express semantic equivalence,
+   such as inlining, syntax de-sugaring, and grouping. Finally, we would like
+   to develop heuristics that can determine when to apply a program analyzer or
+   semantics-preserving rewrite rule. Such a heuristic would need to operate in
+   tandem with and be guided by the abstraction/anti-unification process.
+3. _Abstraction/Anti-Unification_, where we pair up corresponding objects in
+   the workspace to form an analogy.
+
+   **Status:** We have fairly-complete rules and heuristics for this that are
+   described in more detail in [AnalogyUtils.md](AnalogyUtils.md). These rules
+   roughly implement a syntactic anti-unification on unrooted, labelled graphs.
+
+   **Future Work:** Our existing abstraction rules work well, but depend on the
+   semantic rewriting to have already exposed most of the correspondences in
+   the syntactic representation of the workspace itself. So the primary future
+   work for this process is to provide feedback to the semantic rewriting
+   engine. For example, given programs `if x: y += 1` and `z += a ? 1 : 0;` we
+   may not be able to initially find a good syntactic abstraction, but we want
+   the abstraction process to be able to give feedback to the semantic
+   rewriting process to, e.g., desugar the ternary `?:` operator into the
+   corresponding `if` statement, which we can then find a syntactic
+   correspondance with. Beyond such better heuristics, we would also like to
+   support higher-order 'slips,' i.e., analogies where the types themselves are
+   abstracted.
+
+In addition to these three processes, there are two other main directions for
+future work:
+* The runtime does not currently support modifying rules with other rules; in
+  fact, rule-related nodes are removed from the structure completely after an
+  initial rule-parsing pass. In the future we may decide to change this to
+  assist with meta-learning.
+* We are still working on a good visualization module for triplet structures.
+  There is a rudimentary CLI interface
+  ([runtime/interactive.py](runtime/interactive.py)) as well as an interface
+  specifically for source code analogies
+  ([examples/program_analysis/ui](examples/program_analysis/ui)), however we
+  plan to introduce a more general-purpose interface in the near future.
+
+#### High-level File Overview
+- [ts_lib.py](ts_lib.py): the core library, defines the triplet structure data
+  structure and some embedded-DSL-style helpers for describing triplet
+  structures.
+- [ts_utils.py](ts_utils.py): a set of macros which make working with the
+  library (especially expressing rules) easier.
+- [tactic_utils.py](tactic_utils.py): a set of macros which make writing
+  tactics ("controllers for applying the rules") easier.
+- [mapper.py](mapper.py): rules which can be added to any triplet structure to
+  enable making abstract analogies (i.e., identifying isometries in
+  sub-structures).
+- [analogy_utils.py](analogy_utils.py): a Python interface and tactics to
+  building analogies in triplet structures that have the rules from
+  [mapper.py](mapper.py) added to them.
+- [ts_cpp/](ts_cpp/): C++ implementation of the core triplet structure data
+  structure, as well as a backtracking triplet constraint solver.
+- [runtime/](runtime/): a runtime to parse and execute rules on
+  TripletStructures.
+- [examples/](examples/): examples of using triplet structures to solve
+  analogies.
+
+#### Quickstart with the Code
+Please see [TSLang.md](TSLang.md), which documents how to write code using the
+Triplet Structure language.
+
+#### Programming Style
+We see much of the code in this repository as defining a domain-specific
+language [TSLang](TSLang.md) embedded in Python. We then write, within
+`TSLang`, the core analogy-making code for Sifter.
+
+To highlight the difference between "plumbing" code implementing the `TSLang`
+language and runtime vs. the actual analogy-making code written in `TSLang`, we
+use a distinct coding convention for each type of code:
+1. Code implementing `TSLang`, specifically `ts_lib.py` and `runtime/*`, is
+   written using Google-style Python naming (e.g., `snake_case` for methods and
+   `PascalCase` for classes).
+2. Code written in `TSLang` (everything else) is written using `PascalCase` for
+   method names.
+
+This naming convention is enforced intra-file by `.pylintrc`.
+
+#### Citing
+```
+@inproceedings{sifter:onward20,
+  author    = {Matthew Sotoudeh and
+               Aditya V. Thakur},
+  title     = {Analogy-Making as a Core Primitive in the Software Engineering Toolbox},
+  booktitle = {Proceedings of the 2020 {ACM} {SIGPLAN} International Symposium on
+               New Ideas, New Paradigms, and Reflections on Programming and Software,
+               Onward! 2020, November 18-20, Virtual, USA},
+  publisher = {{ACM}},
+  year      = {2020},
+  url       = {https://doi.org/10.1145/3426428.3426918},
+  doi       = {10.1145/3426428.3426918},
+}
+```
+
+#### People
+- [Matthew Sotoudeh](https://masot.net/): email
+  [masotoudeh@ucdavis.edu](mailto:masotoudeh@ucdavis.edu).
+- [Aditya Thakur](https://thakur.cs.ucdavis.edu/): email
+  [avthakur@ucdavis.edu](mailto:avthakur@ucdavis.edu).
+
+#### License
+Licensed under the [AGPLv3](LICENSE).