symbol table

{{Short description|Data structure used by a language translator such as a compiler or interpreter}}

{{More citations needed|date=November 2012}}

In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier, symbol, constant, procedure and function in a program's source code is associated with information relating to its declaration or appearance in the source. In other words, the entries of a symbol table store the information related to the entry's corresponding symbol.{{sfn|Copper|Torczon|2011|p=253}}

Background

A symbol table may only exist in memory during the translation process, or it may be embedded in the output of the translation, such as in an ABI object file for later use. For example, it might be used during an interactive debugging session, or as a resource for formatting a diagnostic report during or after execution of a program.{{cite book|last1=Nguyen|first1=Binh|title=Linux Dictionary|date=2004|page=1482|url=https://books.google.com/books?id=vdZWBQAAQBAJ|accessdate=Apr 14, 2018}}

Description

The minimum information contained in a symbol table used by a translator and intermediate representation (IR) includes the symbol's name and its location or address. For a compiler targeting a platform with a concept of relocatability, it will also contain relocatability attributes (absolute, relocatable, etc.) and needed relocation information for relocatable symbols. Symbol tables for high-level programming languages may store the symbol's type: string, integer, floating-point, etc., its size, and its dimensions and its bounds. Not all of this information is included in the output file, but may be provided for use in debugging. In many cases, the symbol's cross-reference information is stored with or linked to the symbol table. Most compilers print some or all of this information in symbol table and cross-reference listings at the end of translation.{{sfn|Copper|Torczon|2011|p=253}}

Implementation

Numerous data structures are available for implementing tables. Trees, linear lists and self-organizing lists can all be used to implement a symbol table. The symbol table is accessed by most phases of a compiler, beginning with lexical analysis, and continuing through optimization.

A compiler may use one large symbol table for all symbols or use separated, or hierarchical symbol tables for different scopes. For example, in a scoped language such as Algol or PL/I a symbol "p" can be declared separately in several procedures, perhaps with different attributes. The scope of each declaration is the section of the program in which references to "p" resolve to that declaration. Each declaration represents a unique identifier "p". The symbol table must have some means of differentiating references to the different "p"s.

A common data structure used to implement symbol tables is the hash table. The time for searching in hash tables is independent of the number of elements stored in the table, so it is efficient for a large number of elements. It also simplifies the classification of literals in tabular format by including the classification in calculation of the hash key.{{sfn|Copper|Torczon|2011|p=254}}

As the lexical analyser spends a great proportion of its time looking up the symbol table, this activity has a crucial effect on the overall speed of the compiler. A symbol table must be organised in such a way that entries can be found as quickly as possible. Hash tables are usually used to organise a symbol table, where the keyword or identifier is 'hashed' to produce an array subscript. Collisions are inevitable in a hash table, and a common way of handling them is to store the synonym in the next available free space in the table.

Applications

An object file will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a linker will identify and resolve these symbol references. Usually all undefined external symbols will be searched for in one or more object libraries. If a module is found that defines that symbol it is linked together with the first object file, and any undefined external identifiers are added to the list of identifiers to be looked up. This process continues until all external references have been resolved. It is an error if one or more remains unresolved at the end of the process.

While reverse engineering an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been stripped or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program.

Example

Consider the following program written in C:

// Declare an external function

extern double bar(double x);

// Define a public function

double foo(int count)

{

double sum = 0.0;

// Sum all the values bar(1) to bar(count)

for (int i = 1; i <= count; i++)

sum += bar((double) i);

return sum;

}

A C compiler that parses this code will contain at least the following symbol table entries:

class="wikitable"
style="background:silver"

!style="width:8em" | Symbol name

!style="width:10em" | Type

!style="width:10em" | Scope

barfunction, doubleextern
xdoublefunction parameter
foofunction, doubleglobal
countintfunction parameter
sumdoubleblock local
iintfor-loop statement

In addition, the symbol table may also contain entries generated by the compiler for intermediate expression values (e.g., the expression that casts the i loop variable into a double, and the return value of the call to function bar()), statement labels, and so forth.

Example: SysV ABI

class="wikitable floatright"

|+Example table: SysV ABI

AddressTypeName
style="font-size:105%;font-family:monospace;;"

|00000020

style="text-align:center;" | aT_BIT
style="font-size:105%;font-family:monospace;;"

|00000040

style="text-align:center;" | aF_BIT
style="font-size:105%;font-family:monospace;;"

|00000080

style="text-align:center;" | aI_BIT
style="font-size:105%;font-family:monospace;;"

|20000004

style="text-align:center;" | tirqvec
style="font-size:105%;font-family:monospace;;"

|20000008

style="text-align:center;" | tfiqvec
style="font-size:105%;font-family:monospace;;"

|2000000c

style="text-align:center;" | tInitReset
style="font-size:105%;font-family:monospace;;"

|20000018

style="text-align:center;" | T_main
style="font-size:105%;font-family:monospace;;"

|20000024

style="text-align:center;" | tEnd
style="font-size:105%;font-family:monospace;;"

|20000030

style="text-align:center;" | TAT91F_US3_CfgPIO_useB
style="font-size:105%;font-family:monospace;;"

|2000005c

style="text-align:center;" | tAT91F_PIO_CfgPeriph
style="font-size:105%;font-family:monospace;;"

|200000b0

style="text-align:center;" | Tmain
style="font-size:105%;font-family:monospace;;"

|20000120

style="text-align:center;" | TAT91F_DBGU_Printk
style="font-size:105%;font-family:monospace;;"

|20000190

style="text-align:center;" | tAT91F_US_TxReady
style="font-size:105%;font-family:monospace;;"

|200001c0

style="text-align:center;" | tAT91F_US_PutChar
style="font-size:105%;font-family:monospace;;"

|200001f8

style="text-align:center;" | TAT91F_SpuriousHandler
style="font-size:105%;font-family:monospace;;"

|20000214

style="text-align:center;" | TAT91F_DataAbort
style="font-size:105%;font-family:monospace;;"

|20000230

style="text-align:center;" | TAT91F_FetchAbort
style="font-size:105%;font-family:monospace;;"

|2000024c

style="text-align:center;" | TAT91F_Undef
style="font-size:105%;font-family:monospace;;"

|20000268

style="text-align:center;" | TAT91F_UndefHandler
style="font-size:105%;font-family:monospace;;"

|20000284

style="text-align:center;" | TAT91F_LowLevelInit
style="font-size:105%;font-family:monospace;;"

|200002e0

style="text-align:center;" | tAT91F_DBGU_CfgPIO
style="font-size:105%;font-family:monospace;;"

|2000030c

style="text-align:center;" | tAT91F_PIO_CfgPeriph
style="font-size:105%;font-family:monospace;;"

|20000360

style="text-align:center;" | tAT91F_US_Configure
style="font-size:105%;font-family:monospace;;"

|200003dc

style="text-align:center;" | tAT91F_US_SetBaudrate
style="font-size:105%;font-family:monospace;;"

|2000041c

style="text-align:center;" | tAT91F_US_Baudrate
style="font-size:105%;font-family:monospace;;"

|200004ec

style="text-align:center;" | tAT91F_US_SetTimeguard
style="font-size:105%;font-family:monospace;;"

|2000051c

style="text-align:center;" | tAT91F_PDC_Open
style="font-size:105%;font-family:monospace;;"

|2000059c

style="text-align:center;" | tAT91F_PDC_DisableRx
style="font-size:105%;font-family:monospace;;"

|200005c8

style="text-align:center;" | tAT91F_PDC_DisableTx
style="font-size:105%;font-family:monospace;;"

|200005f4

style="text-align:center;" | tAT91F_PDC_SetNextTx
style="font-size:105%;font-family:monospace;;"

|20000638

style="text-align:center;" | tAT91F_PDC_SetNextRx
style="font-size:105%;font-family:monospace;;"

|2000067c

style="text-align:center;" | tAT91F_PDC_SetTx
style="font-size:105%;font-family:monospace;;"

|200006c0

style="text-align:center;" | tAT91F_PDC_SetRx
style="font-size:105%;font-family:monospace;;"

|20000704

style="text-align:center;" | tAT91F_PDC_EnableRx
style="font-size:105%;font-family:monospace;;"

|20000730

style="text-align:center;" | tAT91F_PDC_EnableTx
style="font-size:105%;font-family:monospace;;"

|2000075c

style="text-align:center;" | tAT91F_US_EnableTx
style="font-size:105%;font-family:monospace;;"

|20000788

style="text-align:center;" | T__aeabi_uidiv
style="font-size:105%;font-family:monospace;;"

|20000788

style="text-align:center;" | T__udivsi3
style="font-size:105%;font-family:monospace;;"

|20000884

style="text-align:center;" | T__aeabi_uidivmod
style="font-size:105%;font-family:monospace;;"

|2000089c

style="text-align:center;" | T__aeabi_idiv0
style="font-size:105%;font-family:monospace;;"

|2000089c

style="text-align:center;" | T__aeabi_ldiv0
style="font-size:105%;font-family:monospace;;"

|2000089c

style="text-align:center;" | T__div0
style="font-size:105%;font-family:monospace;;"

|200009a0

style="text-align:center;" | D_data
style="font-size:105%;font-family:monospace;;"

|200009a0

style="text-align:center;" | A_etext
style="font-size:105%;font-family:monospace;;"

|200009a4

style="text-align:center;" | A__bss_end__
style="font-size:105%;font-family:monospace;;"

|200009a4

style="text-align:center;" | A__bss_start
style="font-size:105%;font-family:monospace;;"

|200009a4

style="text-align:center;" | A__bss_start__
style="font-size:105%;font-family:monospace;;"

|200009a4

style="text-align:center;" | A_edata
style="font-size:105%;font-family:monospace;;"

|200009a4

style="text-align:center;" | A_end

An example of a symbol table can be found in the SysV Application Binary Interface (ABI) specification, which mandates how symbols are to be laid out in a binary file, so that different compilers, linkers and loaders can all consistently find and work with the symbols in a compiled object.

The SysV ABI is implemented in the GNU binutils' nm utility. This format uses a sorted memory address field, a "symbol type" field, and a symbol identifier (called "Name").{{cite web |title=nm |url=http://sourceware.org/binutils/docs-2.17/binutils/nm.html#nm|website=sourceware.org |accessdate=May 30, 2020}}

The symbol types in the SysV ABI (and nm's output) indicate the nature of each entry in the symbol table. Each symbol type is represented by a single character. For example, symbol table entries representing initialized data are denoted by the character "d" and symbol table entries for functions have the symbol type "t" (because executable code is located in the text section of an object file). Additionally, the capitalization of the symbol type indicates the type of linkage: lower-case letters indicate the symbol is local and upper-case indicates external (global) linkage.

Example: the Python symbol table

The Python programming language includes extensive support for creating and manipulating symbol tables.symtable — [https://docs.python.org/3/library/symtable.html Python documentation] Properties that can be queried include whether a given symbol is a free variable or a bound variable, whether it is block scope or global scope, whether it is imported, and what namespace it belongs to.

Example: Dynamic symbol tables

Some programming languages allow the symbol table to be manipulated at run-time, so that symbols can be added at any time. Racket is an example of such a language.Symbols - [https://docs.racket-lang.org/reference/symbols.html Racket Documentation]

Both the LISP and the Scheme programming languages allow arbitrary, generic properties to be associated with each symbol.Symbols - [https://www.gnu.org/software/guile/manual/html_node/Symbols.html Guile Documentation]

The Prolog programming language is essentially a symbol-table manipulation language; symbols are called atoms, and the relationships between symbols can be reasoned over. Similarly, OpenCog provides a dynamic symbol table, called the atomspace, which is used for knowledge representation.

See also

References

{{Reflist}}

Bibliography

  • {{cite book|title= Engineering a Compiler|edition=2|doi=10.1016/C2009-0-27982-7|publisher=Elsevier, Rice University|isbn=978-0-12-088478-0|location=Houston, Texas|first1=Keith D.|last1=Copper|first2=Linda|last2=Torczon|date=18 January 2011|s2cid=40425497|url=https://www.elsevier.com/books/engineering-a-compiler/cooper/978-0-12-088478-0}}

{{Authority control}}

Category:Compiler structures