Pointer analysis

{{short description|Determining what or where each pointer points to in program code}}

In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex analyses such as escape analysis. A closely related technique is shape analysis.

This is the most common colloquial use of the term. A secondary use has pointer analysis be the collective name for both points-to analysis, defined as above, and alias analysis. Points-to and alias analysis are closely related but not always equivalent problems.

Example

Consider the following C program:

int *id(int* p) {

return p;

}

void main(void) {

int x;

int y;

int *u = id(&x);

int *v = id(&y);

}

A pointer analysis computes a mapping from pointer expressions to a set of allocation sites of objects they may point to. For the above program, an idealized, fully precise analysis would compute the following results:

class="wikitable"
Pointer expressionAllocation site
&xmain::x
&ymain::y
umain::x
vmain::y
pmain::x, main::y

(Where X::Y represents the stack allocation holding the local variable Y in the function X.)

However, a context-insensitive analysis such as Andersen's or Steensgaard's algorithm would lose precision when analyzing the calls to id, and compute the following result:

class="wikitable"
Pointer expressionAllocation site
&xmain::x
&ymain::y
umain::x, main::y
vmain::x, main::y
pmain::x, main::y

Introduction

As a form of static analysis, fully precise pointer analysis can be shown to be undecidable.{{Cite journal|last=Reps|first=Thomas|date=2000-01-01|title=Undecidability of context-sensitive data-dependence analysis|journal=ACM Transactions on Programming Languages and Systems|volume=22|issue=1|pages=162–186|doi=10.1145/345099.345137|s2cid=2956433|issn=0164-0925|doi-access=free}} Most approaches are sound, but range widely in performance and precision. Many design decisions impact both the precision and performance of an analysis; often (but not always) lower precision yields higher performance. These choices include:{{cite conference

| title=Dimensions of Precision in Reference Analysis of Object-Oriented Programming Languages

| author=Barbara G. Ryder

| year=2003

| book-title=Compiler Construction, 12th International Conference, CC 2003 Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2003 Warsaw, Poland, April 7–11, 2003 Proceedings

| pages=126–137

|doi = 10.1007/3-540-36579-6_10| doi-access=free

}}{{harv|Hind}}

  • Field sensitivity (also known as structure sensitivity): An analysis can either treat each field of a struct or object separately, or merge them.
  • Array sensitivity: An array-sensitive pointer analysis models each index in an array separately. Other choices include modelling just the first entry separately and the rest together, or merging all array entries.
  • Context sensitivity or polyvariance: Pointer analyses may qualify points-to information with a summary of the control flow leading to each program point.
  • Flow sensitivity: An analysis can model the impact of intraprocedural control flow on points-to facts.
  • Heap modeling: Run-time allocations may be abstracted by:
  • their allocation sites (the statement or instruction that performs the allocation, e.g., a call to malloc or an object constructor),
  • a more complex model based on a shape analysis,
  • the type of the allocation, or
  • one single allocation (this is called heap-insensitivity).
  • Heap cloning: Heap- and context-sensitive analyses may further qualify each allocation site by a summary of the control flow leading to the instruction or statement performing the allocation.
  • Subset constraints or equality constraints: When propagating points-to facts, different program statements may induce different constraints on a variable's points-to sets. Equality constraints (like those used in Steensgaard's algorithm) can be tracked with a union-find data structure, leading to high performance at the expense of the precision of a subset-constraint based analysis (e.g., Andersen's algorithm).

Context-insensitive, flow-insensitive algorithms

Pointer analysis algorithms are used to convert collected raw pointer usages (assignments of one pointer to another or assigning a pointer to point to another one) to a useful graph of what each pointer can point to.{{cite conference

| url =https://www.zyrianov.org/papers/ICPC19.pdf

| title =srcPtr: A Framework for Implementing Static Pointer Analysis Approaches

| last1 = Zyrianov | first1 = Vlas

| last2 = Newman | first2 = Christian D.

| last3 = Guarnera | first3 = Drew T.

| last4 = Collard | first4 = Michael L.

| last5 = Maletic | first5 = Jonathan I.

| year =2019

| book-title =ICPC '19: Proceedings of the 27th IEEE International Conference on Program Comprehension

| location = Montreal, Canada

| publisher =IEEE

}}

Steensgaard's algorithm and Andersen's algorithm are common context-insensitive, flow-insensitive algorithms for pointer analysis. They are often used in compilers, and have implementations in [https://github.com/SVF-tools/SVF SVF]

{{cite conference

| url = https://yuleisui.github.io/publications/cc16.pdf

| title = SVF: interprocedural static value-flow analysis in LLVM

| last1 = Sui | first1=Yulei

| last2 = Xue | first2=Jingling

| year = 2016

| book-title = CC'16: Proceedings of the 25th international conference on compiler construction

| publisher = ACM

}}

and LLVM.

Flow-insensitive approaches

Many approaches to flow-insensitive pointer analysis can be understood as forms of abstract interpretation, where heap allocations are abstracted by their allocation site (i.e., a program location).{{Cite book|last1=Smaragdakis|first1=Yannis|last2=Bravenboer|first2=Martin|last3=Lhoták|first3=Ondrej|title=Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages |chapter=Pick your contexts well |date=2011-01-26|chapter-url=https://doi.org/10.1145/1926385.1926390|series=POPL '11|location=Austin, Texas, USA|publisher=Association for Computing Machinery|pages=17–30|doi=10.1145/1926385.1926390|isbn=978-1-4503-0490-0|s2cid=6451826}}

File:Pointer Analysis - Abstracting Memory Addresses by Their Allocation Site.svg

Many flow-insensitive algorithms are specified in Datalog, including those in the Soot analysis framework for Java.{{Cite book|last1=Antoniadis|first1=Tony|last2=Triantafyllou|first2=Konstantinos|last3=Smaragdakis|first3=Yannis|title=Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis |chapter=Porting doop to Soufflé |date=2017-06-18|chapter-url=https://doi.org/10.1145/3088515.3088522|series=SOAP 2017|location=Barcelona, Spain|publisher=Association for Computing Machinery|pages=25–30|doi=10.1145/3088515.3088522|isbn=978-1-4503-5072-3|s2cid=3074689}}

Context-sensitive, flow-sensitive algorithms achieve higher precision, generally at the cost of some performance, by analyzing each procedure several times, once per context.{{harv|Smaragdakis|Balatsouras|p=29}} Most analyses use a "context-string" approach, where contexts consist of a list of entries (common choices of context entry include call sites, allocation sites, and types).{{Cite journal|last1=Thiessen|first1=Rei|last2=Lhoták|first2=Ondřej|date=2017-06-14|title=Context transformations for pointer analysis|url=https://doi.org/10.1145/3140587.3062359|journal=ACM SIGPLAN Notices|volume=52|issue=6|pages=263–277|doi=10.1145/3140587.3062359|issn=0362-1340|url-access=subscription}} To ensure termination (and more generally, scalability), such analyses generally use a k-limiting approach, where the context has a fixed maximum size, and the least recently added elements are removed as needed.{{harv|Li|Tan|Møller|Smaragdakis|pp=1:4}} Three common variants of context-sensitive, flow-insensitive analysis are:{{harv|Smaragdakis|Balatsouras}}

  • Call-site sensitivity
  • Object sensitivity
  • Type sensitivity

=Call-site sensitivity=

In call-site sensitivity, the points-to set of each variable (the set of abstract heap allocations each variable could point to) is further qualified by a context consisting of a list of callsites in the program. These contexts abstract the control-flow of the program.

The following program demonstrates how call-site sensitivity can achieve higher precision than a flow-insensitive, context-insensitive analysis.

int *id(int* p) {

return p;

}

void main(void) {

int x;

int y;

int *u = id(&x); // main.3

int *v = id(&y); // main.4

}

For this program, a context-insensitive analysis would (soundly but imprecisely) conclude that {{var|p}} can point to either the allocation holding {{var|x}} or that of {{var|y}}, so {{var|u}} and {{var|v}} may alias, and both could point to either allocation:

class="wikitable"
Pointer expressionAllocation site
&xmain::x
&ymain::y
umain::x, main::y
vmain::x, main::y
pmain::x, main::y

A callsite-sensitive analysis would analyze {{var|id}} twice, once for main.3 and once for main.4, and the points-to facts for {{var|p}} would be qualified by the call-site, enabling the analysis to deduce that when {{var|main}} returns, {{var|u}} can only point to the allocation holding {{var|x}} and {{var|v}} can only point to the allocation holding {{var|y}}:

class="wikitable"
ContextPointer expressionAllocation site
[]&xmain::x
[]&ymain::y
[]umain::x
[]vmain::y
[main.3]pmain::x
[main.4]pmain::y

=Object sensitivity=

In an object sensitive analysis, the points-to set of each variable is qualified by the abstract heap allocation of the receiver object of the method call. Unlike call-site sensitivity, object-sensitivity is non-syntactic or non-local: the context entries are derived during the points-to analysis itself.{{harv|Smaragdakis|Balatsouras|p=37}}

=Type sensitivity=

Type sensitivity is a variant of object sensitivity where the allocation site of the receiver object is replaced by the class/type containing the method containing the allocation site of the receiver object.{{harv|Smaragdakis|Balatsouras|p=39}} This results in strictly fewer contexts than would be used in an object-sensitive analysis, which generally means better performance.

References

{{Reflist}}

Bibliography

  • {{cite conference

| url =https://www.zyrianov.org/papers/ICPC19.pdf

| title =srcPtr: A Framework for Implementing Static Pointer Analysis Approaches

| last1 = Zyrianov | first1 = Vlas

| last2 = Newman | first2 = Christian D.

| last3 = Guarnera | first3 = Drew T.

| last4 = Collard | first4 = Michael L.

| last5 = Maletic | first5 = Jonathan I.

| year =2019

| book-title =ICPC '19: Proceedings of the 27th IEEE International Conference on Program Comprehension

| location = Montreal, Canada

| publisher =IEEE

}}

  • {{cite journal

|last1=Smaragdakis

|first1=Yannis

|last2=Balatsouras

|first2=George

|date=2015

|title=Pointer Analysis

|url=https://yanniss.github.io/points-to-tutorial15.pdf

|journal=Foundations and Trends in Programming Languages

|volume=2

|issue=1

|pages=1–69

|access-date=May 30, 2019

|doi=10.1561/2500000014

|s2cid=207179267

}}

  • {{cite journal

|last1=Li

|first1=Yue

|last2=Tan/

|first2=Tian

|last3=Møller

|first3=Anders

|last4=Smaragdakis

|first4=Yannis

|date=2020-05-18

|title=A Principled Approach to Selective Context Sensitivity for Pointer Analysis

|url=https://doi.org/10.1145/3381915

|journal=ACM Transactions on Programming Languages and Systems

|volume=42

|issue=2

|pages=10:1–10:40

|doi=10.1145/3381915

|s2cid=214812357

|issn=0164-0925

}}

  • {{cite conference

| url =http://www.cs.trinity.edu/~mlewis/CSCI3294-F01/Papers/p54-hind.pdf

| title =Pointer analysis: haven't we solved this problem yet?

| author =Michael Hind

| year =2001

| book-title =PASTE '01: Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering

| pages =54–61

| isbn =1-58113-413-4

| publisher =ACM

}}

  • {{cite conference

| url =http://cms.dc.uba.ar/materias/aap/2008/c2/descargas/pointsTo.pdf

| title =Points-to analysis in almost linear time

| last = Steensgaard | first = Bjarne

| year =1996

| book-title =POPL '96: Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages

| pages =32–41

| doi = 10.1145/237721.237727

| isbn =0-89791-769-3

| location = New York, NY, USA

| publisher =ACM

}}

  • {{cite thesis

| degree =PhD

| title =Program Analysis and Specialization for the C Programming Language

| url =http://www.cs.cornell.edu/courses/cs711/2005fa/papers/andersen-thesis94.pdf

| author =Andersen, Lars Ole

| year =1994

}}

{{DEFAULTSORT:Pointer Analysis}}

{{Compiler optimizations}}

Category:Static program analysis

Category:Pointers (computer programming)