Lab 4: Datalog
Fall Semester 2019
Due: 21 October, 8:00 a.m. Eastern Time
Corresponding Lecture: Lesson 7 (Constraint Based Analysis)
Objective
In this lab, you will implement two static analyses using a declarative logic programming
language, Datalog. Declarative languages like Datalog are different from imperative languages
like C in that they specify what the program should accomplish but not how the program should
accomplish it.
For those without experience in declarative languages, Datalog relations are similar to SQL
(Structured Query Language) queries. The analyses you will implement are modified-reference
(mod-ref) analysis and local variable analysis. Both analyses are flow-insensitive,
context-insensitive, may (as opposed to must), and interprocedural (requiring reasoning across
method boundaries).
Note on Past Issues
This lab has seen very large numbers of students submitting work that was not their own in past
semesters. It has caused a high number of students to be submitted to the Office of Student
Integrity for Academic Integrity violations. In particular, it’s possible to find solutions to similar
analyses on the internet. Looking at these solutions in any form is likely to influence your
thinking and cause your solutions to be similar to them, especially given the condensed syntax of
Datalog, which is an Academic Integrity violation in this class. If you are unclear of our
guidelines for what is collaboration and what is cheating, we suggest reviewing that section of
the syllabus. If you have any questions about what is allowed and what is not allowed, please
reach out to Instructors via Piazza for clarification.
Students who submit solutions found to be similar to online resources or other students should
expect a 0 grade on the lab, a disciplinary record of an Academic Integrity issue through the
Office of Student Integrity, and will not be eligible to receive a final grade of A in the course.
Students who have had past Academic Integrity issues may find that OSI assigns them higher
penalties.
Important Documentation (You will need to read this to complete the lab!)
● How to write an analysis in Datalog:
https://github.com/petablox-project/petablox/wiki/Datalog-Analysis
● Predefined relations:
https://github.com/petablox-project/petablox/wiki/Predefined-Analyses
Help with Understanding Recursion in Datalog
● Learn Prolog Now website with good tutorials on recursion (Datalog is a subset of
Prolog) [chapters 1 and 3 of the online book in particular]. The example problems are
great material to prepare for exam questions on Datalog:
http://www.learnprolognow.org/lpnpage.php?pageid=online
Online Datalog Test Environment
● Allows you to practice writing Datalog in your web browser. The “Rules” section allows
you to populate relations both from data and by writing rules. The “Query” section allows
you to write a relation whose output will be shown at the bottom of the screen.
Unfortunately most of the examples on the site are in German, so we recommend starting
with the two examples where the code is in English (“Umlaufbahnen von
kosmologischen Objekten” (“Orbits of Cosmological Objects”), “Pfade in einem
Graphen” (“Paths in a Graph”), and “Computer Parts”). For a more complex example,
there are two samples covering paths between stations on parts of the Munich U-Bahn
subway’s stations, where the German isn’t too difficult to figure out:
https://datalog.db.in.tum.de/
Additional Petablox Resources
● Petablox repository:
https://github.com/petablox-project/petablox
● Petablox user guide:
https://github.com/petablox-project/petablox/wiki
Setup
1. Download Datalog.zip from Canvas. When uncompressed, it will produce the directory
Datalog/.
2. Build the example Java program to be analyzed (TestCase) by running ant in the
directory Datalog/examples/test_case/. This will produce the subdirectory classes/.
3. Write your mod-ref analysis in the file Datalog/src/modreference.dlog and your local
variables analysis in the file Datalog/src/local_vars.dlog. See below for more details
on these analyses.
4. To run your analyses, run the following commands in Datalog/:
ant -Dpetablox.work.dir=examples/test_case/ modreference
ant -Dpetablox.work.dir=examples/test_case/ local_vars
6. You can find the output of the analyses under the directory
Datalog/examples/test_case/petablox_output/.
7. Sample outputs for the test cases are provided in Datalog/sample_output/.
8. We have provided a script compare.sh that you can use to verify your analyses against
the provided sample output. From the Datalog/ directory, grant the script execute
permissions with the command chmod +x compare.sh and then use the command
./compare.sh modreference or ./compare.sh local_vars to execute the script. The
script output will indicate how closely your most recent analysis run matches the sample
solutions. Note that you may receive an error for test cases you have not yet executed.
9. If your analysis does not run correctly, you may view any errors that were encountered in
petablox_output/log.txt.
10. Both analyses will use a pre-existing, flow- and context-insensitive pointer analysis with
on-the-fly call graph construction (henceforth called pointer/call-graph analysis). This
analysis is provided in Datalog/cipa_0cfa.dlog. You can study it to learn how to write
your own analyses in Datalog, however you should not spend time trying to understand
each relation defined in this analysis. It declares all the relations (both input and output)
that you will need to write your analyses. Of particular interest are:
○ reachable methods in the relation – reachableM
○ the call graph in the relation IM
○ points-to information in the relations VH, FH, and HFH
○ the relation specifying the containing method of each call site – MI
NOTE: Any changes you make to this file will not be reflected when you run your
analyses. This file is provided as a reference only. Petablox uses an analogous file in
petablox.jar for the purpose of executing analyses.
Part 1: Modified-Reference (mod-ref) Analysis
The goal of mod-ref analysis is to determine which memory locations may be modified and
which memory locations may be referenced during the execution of a method. The execution of a
method includes the execution of any methods called by that method, directly or transitively.
Your analysis must consider each reachable method (that is, for each method in relation
reachableM). The output of your analysis must be in the form of the following four relations:
1. The relation refStatField⊆ (M × F) containing each tuple (m, f) such that
pointer-typed static field f may be read during the execution of method m.
2. The relation modStatField⊆ (M × F) containing each tuple (m, f) such that
pointer-typed static field f may be written to during the execution of method m.
3. The relation modInstField⊆ (M × H × F) containing each tuple (m, h, f) such that the
pointer-typed instance field f of an object allocated at site h may be written to during the
execution of method m. Note that f may be the element 0 in the domain 𝔽, in which case
it denotes some pointer-typed element of an array allocated at site h.
4. The relation refInstField⊆ (M × H × F) containing each tuple (m, h, f) such that
pointer-typed instance field f of an object allocated at site h may be read during the
execution of method m. As in the preceding item, f may denote a pointer-typed array
element.
Use the following relations to write your analysis. They are also used as input relations in the
pointer/call-graph analysis provided to you:
1. The relation MgetStatFldInst⊆ (M × V × F) contains each tuple (m, v, f) such that the
body of method m contains the instruction v = f, where f is a static field of pointer type.
2. The relation MputStatFldInst⊆ (M × F × V) contains each tuple (m, f, v) such that the
body of method m contains the instruction f = v, where f is a static field of pointer type.
3. The relation MgetInstFldInst⊆ (M × V × V × F) contains each tuple (m, v, b, f) such
that the body of method m contains the instruction v = b.f, where f is either an instance
field of pointer type or it is the distinguished hypothetical field (denoted by element 0 in
domain F) that designates any array element.
4. The relation MputInstFldInst⊆ (M × V × F × V) contains each tuple (m, b, f, v) such
that the body of method m contains the instruction b.f = v, where f is either an instance
field of pointer type or it is the distinguished hypothetical field (denoted by element 0 in
domain F) that designates any array element.
You can also use the output relations of the pointer/call-graph analysis provided to you. You
will need to use both call-graph information (to determine method reachability) and points-to
information (to determine the sites that may allocate objects whose instance fields or array
elements may be read/written by a particular method). You can find the descriptions of these
relations at the link provided in the Resources section above.
Datalog supports recursive relations. A relation which is used in its own definition is recursive.
Recursive relations must also have a base case. For example, a recursive relation P defined in
terms of other relations Q and R may look something like this (the second relation is the base
case):
P(i) :- P(i), Q(i).
P(i) :- R(i).
The execution of a method includes the execution of any methods that are called by it directly or
transitively. You can use relations MI and IM recursively to determine such methods.
For this lab, we will ignore reporting static fields, instance fields, and array elements of
non-pointer type (called primitive types in Java, e.g., boolean, int, double, etc.) that may be read
or written during method execution. The above relations such as MgetStatFldInst, etc. already
ignore such fields.
Part 2: Local Variable Analysis
The goal of local variables analysis is to determine for each reachable method m and for each
object allocation site h in the body of m, whether any object allocated at h in any invocation of m
may outlive that invocation. If so, the site is called method-escaping; if not, it is called
method-local and all objects created at that site at runtime can be allocated on the method
invocation stack (instead of being allocated in the heap, which is more expensive).
An object may outlive the invocation of its allocating method if any of the following conditions
hold:
● An object allocated at site h may become reachable from some global variable (i.e., a
static field in Java).
● An object allocated at site h may become reachable from some argument of method m.
● An object allocated at site h may become reachable from some return variable of method
m.
Your analysis must consider each reachable method (that is, for each method in relation
reachableM). The output of your analysis must be a relation localMH⊆ (M × H) containing each
tuple (m, h) such that allocation site h contained in the body of reachable method m is
method-local and a relation escapingMH⊆ (M × H) containing each tuple (m, h) such that
allocation site h contained in the body of reachable method m is method-escaping.
Use the following relations to write your analysis.
● The relation MmethArg⊆ (M × Z× V) contains each tuple (m, z, v) such that variable v is
the zth actual argument of method m. (Z denotes the set of integers.)
● The relation MmethRet⊆ (M × Z× V) contains each tuple (m, z, v) such that variable v is
the return value of method m (you can ignore z).
In addition, you may use the output relations of the pointer/call-graph analysis provided to you
and described above. You will need to use both call-graph information (to determine method
reachability) and points-to information (to determine the sites that may allocate objects that may
escape). You can find the descriptions of these relations at the link provided in the Resources
section above.
Note that this analysis requires reasoning about reachability in the heap (as exemplified by the
use of the word “reachable” in the above three items); use the relation HFH recursively for this
purpose. The relations VH and FH will provide the base cases for the recursion.
Items to Submit
We expect your submission to conform to the standards specified below. To ensure that your
submission is correct, you should run the provided file validator. You should not expect
submissions that have an improper folder structure, incorrect file names, or in other ways do not
conform to the specification to score full credit. The file validator will check folder structure and
names match what is expected for this lab, but won’t (and isn’t intended to) catch everything.
The command to run the tool is: python3 zip_validator.py lab4 lab4.zip
Submit the following files in a single compressed file (.zip format) named lab4.zip. For full
credit, there must not be any subfolders or extra files contained within your zip file.
1. (50 points) modreference.dlog
2. (50 points) local_vars.dlog
Grading Criteria
Generally, credit for this lab is awarded as follows:
● For each program, both analyses will be equally weighted
● All outputs of the analysis will be compared against the correct output
● Each output is equally weighted
● For each output, your score will be % of the [# expected tuples]
[# expected tuples]−[# missing tuples]−[# extra tuples]
total possible points for that output. For example, an output with 10 expected tuples
where you found all 10 expected tuples but also two additional tuples would score
% 0% of the possible credit 10
10−0−2 = 8
Your datalog analyses will be graded against several programs; the three provided benchmark
programs will constitute at least half of your grade but other programs will be run against your
analysis as well. In general, the programs used in grading but not provided as part of the lab are
of the same order of complexity as the provided programs. Note that programs will be weighted
so more complicated programs are worth more credit.
We encourage you to write your own programs and analyze them with your two analyses.
Please feel free to share your programs to be analyzed and your output with your classmates on
Piazza. The course staff will not be evaluating these programs for correctness, but you should
compare your output with your classmates to gain additional insight into if your analyses are
correct. While we encourage you to write and share these programs, please do not share your
Datalog code.
 
		 
                 
                