diff options
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 172 |
1 files changed, 170 insertions, 2 deletions
@@ -1,3 +1,171 @@ -# sifter +# Sifter +This repository contains code implementing the prototype analogy-making program +Sifter, as described in our 'Onward!' 2020 paper "Analogy-Making as a Core +Primitive in the Software Engineering Toolbox." -Code for our paper [Analogy-Making as a Core Primitive in the Software Engineering Toolbox](https://arxiv.org/abs/2009.06592) will be posted here by mid-October. +Sifter can make _analogies_ about programs, e.g., identifying data structures +and methods which play corresponding roles in two different programs. Sifter +can also _complete_ analogies, which allows it to automatically learn and +generalize source code transformations from a small number of examples. + +#### Dependencies +You must install [Bazel](https://bazel.build) as well as set up +[bazel_python](https://github.com/95616ARG/bazel_python) with Python 3.7.4. + +#### Running Tests, Examples +You should then be able to run +```bash +bazel test //... && bazel run coverage_report +``` +to get a coverage report in `htmlcov/index.html`. + +To run the examples: +```bash +bazel run examples/letter_analogies:letter_analogy +bazel run examples/turing_machine:turing_machine +bazel run examples/program_analysis:program_understanding +bazel run examples/program_analysis:api_migration +bazel run examples/program_analysis:transform_learning +``` +For the `program_analysis` ones, you can view the result by visiting +`http://localhost:8001` in your browser once prompted. + +#### Goals, Status, and Future Work +This repository accompanies our paper in +[Onward! 2020](https://2020.splashcon.org/track/splash-2020-Onward-papers?#the-character-of-onward), +with the goal of encouraging interest and future work in automated +analogy-making for software engineering. Ultimately, we see analogy-making as +involving three main processes: +1. _Raw Perception_, such as reading files (code, documentation) from the + filesystem and into the triplet structure workspace. + + **Status:** The `LazyStructure` and `LazyTextDocument` classes in + [examples/program_analysis/lazy_structure.py](examples/program_analysis/lazy_structure.py) + read a file from the filesystem and add a representation of the file's + contents to the triplet structure workspace. These interfaces are 'lazy' in + that they allow selecting only certain parts of a file to be included in the + workspace. Currently we either specify ahead-of-time which portions of the + file to include in the workspace or just add the entire file contents. + + **Future Work:** We envision the laziness being used to initially focus the + analogy-making on a small subset of the relevant code, then expanding to + larger portions as the analogy solidifies. +2. _Semantic Rewriting_, such as grouping individual characters into a single + token, inlining a function, and annotating invariants found via program + analysis. + + **Status:** Currently, the default `LazyTextDocument` interface performs a + light-weight tokenization pass that groups characters separated by spaces + and special characters (such as `+`) before encoding the document into the + workspace. Extra semantic information, such as program invariants, are + specified ahead-of-time by either directly editing the workspace or using + the `AnnotateFact` method of `LazyTextDocument`. There is an example of + annotating program invariants in + [examples/program_analysis/transform_learning.py](examples/program_analysis/transform_learning.py). + + **Future Work:** We would like to directly connect Sifter to program + analysis tools so analysis results can be automatically imported into the + triplet structure workspace instead of needing to be manually annotated. We + would like to incorporate rewrite rules that express semantic equivalence, + such as inlining, syntax de-sugaring, and grouping. Finally, we would like + to develop heuristics that can determine when to apply a program analyzer or + semantics-preserving rewrite rule. Such a heuristic would need to operate in + tandem with and be guided by the abstraction/anti-unification process. +3. _Abstraction/Anti-Unification_, where we pair up corresponding objects in + the workspace to form an analogy. + + **Status:** We have fairly-complete rules and heuristics for this that are + described in more detail in [AnalogyUtils.md](AnalogyUtils.md). These rules + roughly implement a syntactic anti-unification on unrooted, labelled graphs. + + **Future Work:** Our existing abstraction rules work well, but depend on the + semantic rewriting to have already exposed most of the correspondences in + the syntactic representation of the workspace itself. So the primary future + work for this process is to provide feedback to the semantic rewriting + engine. For example, given programs `if x: y += 1` and `z += a ? 1 : 0;` we + may not be able to initially find a good syntactic abstraction, but we want + the abstraction process to be able to give feedback to the semantic + rewriting process to, e.g., desugar the ternary `?:` operator into the + corresponding `if` statement, which we can then find a syntactic + correspondance with. Beyond such better heuristics, we would also like to + support higher-order 'slips,' i.e., analogies where the types themselves are + abstracted. + +In addition to these three processes, there are two other main directions for +future work: +* The runtime does not currently support modifying rules with other rules; in + fact, rule-related nodes are removed from the structure completely after an + initial rule-parsing pass. In the future we may decide to change this to + assist with meta-learning. +* We are still working on a good visualization module for triplet structures. + There is a rudimentary CLI interface + ([runtime/interactive.py](runtime/interactive.py)) as well as an interface + specifically for source code analogies + ([examples/program_analysis/ui](examples/program_analysis/ui)), however we + plan to introduce a more general-purpose interface in the near future. + +#### High-level File Overview +- [ts_lib.py](ts_lib.py): the core library, defines the triplet structure data + structure and some embedded-DSL-style helpers for describing triplet + structures. +- [ts_utils.py](ts_utils.py): a set of macros which make working with the + library (especially expressing rules) easier. +- [tactic_utils.py](tactic_utils.py): a set of macros which make writing + tactics ("controllers for applying the rules") easier. +- [mapper.py](mapper.py): rules which can be added to any triplet structure to + enable making abstract analogies (i.e., identifying isometries in + sub-structures). +- [analogy_utils.py](analogy_utils.py): a Python interface and tactics to + building analogies in triplet structures that have the rules from + [mapper.py](mapper.py) added to them. +- [ts_cpp/](ts_cpp/): C++ implementation of the core triplet structure data + structure, as well as a backtracking triplet constraint solver. +- [runtime/](runtime/): a runtime to parse and execute rules on + TripletStructures. +- [examples/](examples/): examples of using triplet structures to solve + analogies. + +#### Quickstart with the Code +Please see [TSLang.md](TSLang.md), which documents how to write code using the +Triplet Structure language. + +#### Programming Style +We see much of the code in this repository as defining a domain-specific +language [TSLang](TSLang.md) embedded in Python. We then write, within +`TSLang`, the core analogy-making code for Sifter. + +To highlight the difference between "plumbing" code implementing the `TSLang` +language and runtime vs. the actual analogy-making code written in `TSLang`, we +use a distinct coding convention for each type of code: +1. Code implementing `TSLang`, specifically `ts_lib.py` and `runtime/*`, is + written using Google-style Python naming (e.g., `snake_case` for methods and + `PascalCase` for classes). +2. Code written in `TSLang` (everything else) is written using `PascalCase` for + method names. + +This naming convention is enforced intra-file by `.pylintrc`. + +#### Citing +``` +@inproceedings{sifter:onward20, + author = {Matthew Sotoudeh and + Aditya V. Thakur}, + title = {Analogy-Making as a Core Primitive in the Software Engineering Toolbox}, + booktitle = {Proceedings of the 2020 {ACM} {SIGPLAN} International Symposium on + New Ideas, New Paradigms, and Reflections on Programming and Software, + Onward! 2020, November 18-20, Virtual, USA}, + publisher = {{ACM}}, + year = {2020}, + url = {https://doi.org/10.1145/3426428.3426918}, + doi = {10.1145/3426428.3426918}, +} +``` + +#### People +- [Matthew Sotoudeh](https://masot.net/): email + [masotoudeh@ucdavis.edu](mailto:masotoudeh@ucdavis.edu). +- [Aditya Thakur](https://thakur.cs.ucdavis.edu/): email + [avthakur@ucdavis.edu](mailto:avthakur@ucdavis.edu). + +#### License +Licensed under the [AGPLv3](LICENSE). |