1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
|
# Sifter
This repository contains code implementing the prototype analogy-making program
Sifter, as described in our 'Onward!' 2020 paper "Analogy-Making as a Core
Primitive in the Software Engineering Toolbox."
Sifter can make _analogies_ about programs, e.g., identifying data structures
and methods which play corresponding roles in two different programs. Sifter
can also _complete_ analogies, which allows it to automatically learn and
generalize source code transformations from a small number of examples.
#### Dependencies
You must install [Bazel](https://bazel.build) as well as set up
[bazel_python](https://github.com/95616ARG/bazel_python) with Python 3.7.4.
#### Running Tests, Examples
You should then be able to run
```bash
bazel test //... && bazel run coverage_report
```
to get a coverage report in `htmlcov/index.html`.
To run the examples:
```bash
bazel run examples/letter_analogies:letter_analogy
bazel run examples/turing_machine:turing_machine
bazel run examples/program_analysis:program_understanding
bazel run examples/program_analysis:api_migration
bazel run examples/program_analysis:transform_learning
```
For the `program_analysis` ones, you can view the result by visiting
`http://localhost:8001` in your browser once prompted.
#### Goals, Status, and Future Work
This repository accompanies our paper in
[Onward! 2020](https://2020.splashcon.org/track/splash-2020-Onward-papers?#the-character-of-onward),
with the goal of encouraging interest and future work in automated
analogy-making for software engineering. Ultimately, we see analogy-making as
involving three main processes:
1. _Raw Perception_, such as reading files (code, documentation) from the
filesystem and into the triplet structure workspace.
**Status:** The `LazyStructure` and `LazyTextDocument` classes in
[examples/program_analysis/lazy_structure.py](examples/program_analysis/lazy_structure.py)
read a file from the filesystem and add a representation of the file's
contents to the triplet structure workspace. These interfaces are 'lazy' in
that they allow selecting only certain parts of a file to be included in the
workspace. Currently we either specify ahead-of-time which portions of the
file to include in the workspace or just add the entire file contents.
**Future Work:** We envision the laziness being used to initially focus the
analogy-making on a small subset of the relevant code, then expanding to
larger portions as the analogy solidifies.
2. _Semantic Rewriting_, such as grouping individual characters into a single
token, inlining a function, and annotating invariants found via program
analysis.
**Status:** Currently, the default `LazyTextDocument` interface performs a
light-weight tokenization pass that groups characters separated by spaces
and special characters (such as `+`) before encoding the document into the
workspace. Extra semantic information, such as program invariants, are
specified ahead-of-time by either directly editing the workspace or using
the `AnnotateFact` method of `LazyTextDocument`. There is an example of
annotating program invariants in
[examples/program_analysis/transform_learning.py](examples/program_analysis/transform_learning.py).
**Future Work:** We would like to directly connect Sifter to program
analysis tools so analysis results can be automatically imported into the
triplet structure workspace instead of needing to be manually annotated. We
would like to incorporate rewrite rules that express semantic equivalence,
such as inlining, syntax de-sugaring, and grouping. Finally, we would like
to develop heuristics that can determine when to apply a program analyzer or
semantics-preserving rewrite rule. Such a heuristic would need to operate in
tandem with and be guided by the abstraction/anti-unification process.
3. _Abstraction/Anti-Unification_, where we pair up corresponding objects in
the workspace to form an analogy.
**Status:** We have fairly-complete rules and heuristics for this that are
described in more detail in [AnalogyUtils.md](AnalogyUtils.md). These rules
roughly implement a syntactic anti-unification on unrooted, labelled graphs.
**Future Work:** Our existing abstraction rules work well, but depend on the
semantic rewriting to have already exposed most of the correspondences in
the syntactic representation of the workspace itself. So the primary future
work for this process is to provide feedback to the semantic rewriting
engine. For example, given programs `if x: y += 1` and `z += a ? 1 : 0;` we
may not be able to initially find a good syntactic abstraction, but we want
the abstraction process to be able to give feedback to the semantic
rewriting process to, e.g., desugar the ternary `?:` operator into the
corresponding `if` statement, which we can then find a syntactic
correspondance with. Beyond such better heuristics, we would also like to
support higher-order 'slips,' i.e., analogies where the types themselves are
abstracted.
In addition to these three processes, there are two other main directions for
future work:
* The runtime does not currently support modifying rules with other rules; in
fact, rule-related nodes are removed from the structure completely after an
initial rule-parsing pass. In the future we may decide to change this to
assist with meta-learning.
* We are still working on a good visualization module for triplet structures.
There is a rudimentary CLI interface
([runtime/interactive.py](runtime/interactive.py)) as well as an interface
specifically for source code analogies
([examples/program_analysis/ui](examples/program_analysis/ui)), however we
plan to introduce a more general-purpose interface in the near future.
#### High-level File Overview
- [ts_lib.py](ts_lib.py): the core library, defines the triplet structure data
structure and some embedded-DSL-style helpers for describing triplet
structures.
- [ts_utils.py](ts_utils.py): a set of macros which make working with the
library (especially expressing rules) easier.
- [tactic_utils.py](tactic_utils.py): a set of macros which make writing
tactics ("controllers for applying the rules") easier.
- [mapper.py](mapper.py): rules which can be added to any triplet structure to
enable making abstract analogies (i.e., identifying isometries in
sub-structures).
- [analogy_utils.py](analogy_utils.py): a Python interface and tactics to
building analogies in triplet structures that have the rules from
[mapper.py](mapper.py) added to them.
- [ts_cpp/](ts_cpp/): C++ implementation of the core triplet structure data
structure, as well as a backtracking triplet constraint solver.
- [runtime/](runtime/): a runtime to parse and execute rules on
TripletStructures.
- [examples/](examples/): examples of using triplet structures to solve
analogies.
#### Quickstart with the Code
Please see [TSLang.md](TSLang.md), which documents how to write code using the
Triplet Structure language.
#### Programming Style
We see much of the code in this repository as defining a domain-specific
language [TSLang](TSLang.md) embedded in Python. We then write, within
`TSLang`, the core analogy-making code for Sifter.
To highlight the difference between "plumbing" code implementing the `TSLang`
language and runtime vs. the actual analogy-making code written in `TSLang`, we
use a distinct coding convention for each type of code:
1. Code implementing `TSLang`, specifically `ts_lib.py` and `runtime/*`, is
written using Google-style Python naming (e.g., `snake_case` for methods and
`PascalCase` for classes).
2. Code written in `TSLang` (everything else) is written using `PascalCase` for
method names.
This naming convention is enforced intra-file by `.pylintrc`.
#### Citing
```
@inproceedings{sifter:onward20,
author = {Matthew Sotoudeh and
Aditya V. Thakur},
title = {Analogy-Making as a Core Primitive in the Software Engineering Toolbox},
booktitle = {Proceedings of the 2020 {ACM} {SIGPLAN} International Symposium on
New Ideas, New Paradigms, and Reflections on Programming and Software,
Onward! 2020, November 18-20, Virtual, USA},
publisher = {{ACM}},
year = {2020},
url = {https://doi.org/10.1145/3426428.3426918},
doi = {10.1145/3426428.3426918},
}
```
#### People
- [Matthew Sotoudeh](https://masot.net/): email
[masotoudeh@ucdavis.edu](mailto:masotoudeh@ucdavis.edu).
- [Aditya Thakur](https://thakur.cs.ucdavis.edu/): email
[avthakur@ucdavis.edu](mailto:avthakur@ucdavis.edu).
#### License
Licensed under the [AGPLv3](LICENSE).
|