libSBML C API
libSBML 5.8.0 C API
Abstract Syntax Tree (AST) representation of a mathematical expression.
This class of objects is defined by libSBML only and has no direct equivalent in terms of SBML components. This class is not prescribed by the SBML specifications, although it is used to implement features defined in SBML.
Abstract Syntax Trees (ASTs) are a simple kind of data structure used in libSBML for storing mathematical expressions. The ASTNode is the cornerstone of libSBML's AST representation. An AST "node" represents the most basic, indivisible part of a mathematical formula and comes in many types. For instance, there are node types to represent numbers (with subtypes to distinguish integer, real, and rational numbers), names (e.g., constants or variables), simple mathematical operators, logical or relational operators, and functions. LibSBML ASTs provide a canonical, in-memory representation for all mathematical formulas regardless of their original format (which might be MathML or might be text strings).
An AST node in libSBML is a recursive structure containing a pointer to the node's value (which might be, for example, a number or a symbol) and a list of child nodes. Each ASTNode may have zero, one, two, or more children depending on its type. The following diagram illustrates how the mathematical expression "1 + 2" is represented as an AST with one plus node having two integer child nodes for the numbers 1 and 2. The figure also shows the corresponding MathML representation:
Infix | AST | MathML |
---|---|---|
1 + 2 | (figure: a plus node with two integer child nodes, 1 and 2) | <math xmlns="http://www.w3.org/1998/Math/MathML"> <apply> <plus/> <cn type="integer"> 1 </cn> <cn type="integer"> 2 </cn> </apply> </math> |
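The recursive shape described above can be sketched in plain C. This is an illustration only, not libSBML's implementation: the names Node, NodeType, node_integer, node_binary, node_eval, and node_free are hypothetical, and the type list is reduced to three entries.

```c
#include <stdlib.h>

/* Hypothetical, simplified node types; libSBML's real list is much longer. */
typedef enum { NODE_INTEGER, NODE_PLUS, NODE_TIMES } NodeType;

/* A node holds a type, a value (used only by number nodes), and children. */
typedef struct Node {
    NodeType type;
    long value;
    size_t num_children;
    struct Node **children;
} Node;

/* Allocate an integer leaf node. Returns NULL on allocation failure. */
Node *node_integer(long value) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->type = NODE_INTEGER;
    n->value = value;
    return n;
}

/* Allocate a binary operator node that takes ownership of both children. */
Node *node_binary(NodeType type, Node *left, Node *right) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->children = malloc(2 * sizeof *n->children);
    if (n->children == NULL) { free(n); return NULL; }
    n->type = type;
    n->num_children = 2;
    n->children[0] = left;
    n->children[1] = right;
    return n;
}

/* Evaluate recursively: leaves yield their value, operators fold children. */
long node_eval(const Node *n) {
    long acc;
    size_t i;
    if (n->type == NODE_INTEGER) return n->value;
    acc = node_eval(n->children[0]);
    for (i = 1; i < n->num_children; i++) {
        long v = node_eval(n->children[i]);
        acc = (n->type == NODE_PLUS) ? acc + v : acc * v;
    }
    return acc;
}

/* Free a node and all of its descendants. */
void node_free(Node *n) {
    size_t i;
    if (n == NULL) return;
    for (i = 0; i < n->num_children; i++) node_free(n->children[i]);
    free(n->children);
    free(n);
}
```

With these helpers, the tree for "1 + 2" would be built as node_binary(NODE_PLUS, node_integer(1), node_integer(2)), giving one plus node with two integer children.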
The following are other noteworthy points about the AST representation in libSBML:
A numerical value represented in MathML as a real number with an exponent is preserved as such in the AST node representation, even if the number could be stored in a double data type. This is done so that when an SBML model is read in and then written out again, the amount of change introduced by libSBML to the SBML during the round-trip activity is minimized.
Rational numbers are represented in an AST node using separate numerator and denominator values. These can be retrieved using ASTNode_getNumerator() and ASTNode_getDenominator() (ASTNode::getNumerator() and ASTNode::getDenominator() in the C++ API).
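To make the two points above concrete, the following sketch shows one way a number could keep its parts separate (mantissa and exponent, or numerator and denominator) and only be collapsed to a double on request. The Number struct, the NUM_* kinds, and number_to_double are hypothetical names chosen for illustration; they are not libSBML types.

```c
/* Hypothetical kinds mirroring the numeric distinctions described above. */
typedef enum { NUM_INTEGER, NUM_REAL, NUM_REAL_E, NUM_RATIONAL } NumKind;

typedef struct {
    NumKind kind;
    long integer;      /* NUM_INTEGER value, or NUM_RATIONAL numerator */
    long denominator;  /* NUM_RATIONAL only */
    double real;       /* NUM_REAL value, or NUM_REAL_E mantissa */
    long exponent;     /* NUM_REAL_E only: base-10 exponent */
} Number;

/* Multiply or divide by 10 repeatedly; avoids needing the math library. */
static double scale_by_power_of_ten(double mantissa, long exponent) {
    for (; exponent > 0; exponent--) mantissa *= 10.0;
    for (; exponent < 0; exponent++) mantissa /= 10.0;
    return mantissa;
}

/* Collapse any representation to a double; the stored parts stay untouched. */
double number_to_double(const Number *n) {
    switch (n->kind) {
    case NUM_INTEGER:  return (double) n->integer;
    case NUM_REAL:     return n->real;
    case NUM_REAL_E:   return scale_by_power_of_ten(n->real, n->exponent);
    case NUM_RATIONAL: return (double) n->integer / (double) n->denominator;
    }
    return 0.0;
}
```

Because the mantissa and exponent (or numerator and denominator) remain stored as read, a writer can emit them in their original form instead of a reformatted double.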
Every ASTNode has an associated type code to indicate, for example, whether it holds a number or stands for an arithmetic operator. The list of possible types is quite long, because it covers all the mathematical functions that are permitted in SBML. The values are shown in the following table:
The types have the following meanings:
If the node is a basic mathematical operator (e.g., "+"), then the node's type will be AST_PLUS, AST_MINUS, AST_TIMES, AST_DIVIDE, or AST_POWER, as appropriate.
If the node is a predefined function or operator from SBML Level 1 (in the string-based formula syntax used in Level 1) or SBML Levels 2 and 3 (in the subset of MathML used in SBML Levels 2 and 3), then the node's type will be either AST_FUNCTION_X, AST_LOGICAL_X, or AST_RELATIONAL_X, as appropriate. (Examples: AST_FUNCTION_LOG, AST_RELATIONAL_LEQ.)
If the node refers to a user-defined function, the node's type will be AST_FUNCTION (because it holds the name of the function).
If the node is a lambda expression, its type will be AST_LAMBDA.
If the node is a predefined constant ("ExponentialE", "Pi", "True" or "False"), then the node's type will be AST_CONSTANT_E, AST_CONSTANT_PI, AST_CONSTANT_TRUE, or AST_CONSTANT_FALSE.
(Levels 2 and 3 only) If the node is the special MathML csymbol time, the node's type will be AST_NAME_TIME. (Note, however, that the MathML csymbol delay is translated into a node of type AST_FUNCTION_DELAY. The difference is due to the fact that time is a single variable, whereas delay is actually a function taking arguments.)
(Level 3 only) If the node is the special MathML csymbol avogadro, the node's type will be AST_NAME_AVOGADRO.
If the node contains a numerical value, its type will be AST_INTEGER, AST_REAL, AST_REAL_E, or AST_RATIONAL, as appropriate.
The text-string form of mathematical formulas produced and read by libSBML's formula conversion functions is in a simple C-inspired infix notation. A formula in this text-string form can be handed to a program that understands SBML mathematical expressions, or used as part of a translation system. The libSBML distribution comes with an example program in the "examples" subdirectory called translateMath that implements an interactive command-line demonstration of translating infix formulas into MathML and vice versa.
The formula strings may contain operators, function calls, symbols, and white space characters. The allowable white space characters are tab and space. The following are illustrative examples of formulas expressed in the syntax:
0.10 * k4^2
(vm * s1)/(km + s1)
The following table shows the precedence rules in this syntax. In the Class column, operand implies the construct is an operand, prefix implies the operation is applied to the following arguments, unary implies there is one argument, and binary implies there are two arguments. The values in the Precedence column show how the order of different types of operation is determined. For example, the expression a * b + c is evaluated as (a * b) + c because the * operator has higher precedence. The Associates column shows how the order of operations with the same precedence is determined; for example, a - b + c is evaluated as (a - b) + c because the + and - operators are left-associative. The precedence and associativity rules are taken from the C programming language, except for the symbol ^, which is used in C for a different purpose (bitwise exclusive OR). (Exponentiation can be invoked using either ^ or the function power.)
Token | Operation | Class | Precedence | Associates |
---|---|---|---|---|
name | symbol reference | operand | 6 | n/a |
(expression) | expression grouping | operand | 6 | n/a |
f(...) | function call | prefix | 6 | left |
- | negation | unary | 5 | right |
^ | power | binary | 4 | left |
* | multiplication | binary | 3 | left |
/ | division | binary | 3 | left |
+ | addition | binary | 2 | left |
- | subtraction | binary | 2 | left |
, | argument delimiter | binary | 1 | left |
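The precedence and associativity columns can be exercised with a small precedence-climbing parser. The sketch below is not libSBML's parser: group_formula and its helpers are hypothetical names, function calls and the comma delimiter are left out, and error handling is minimal. It rewrites a formula with explicit parentheses so that the grouping implied by the table becomes visible.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define BUF 256  /* Fixed buffer size; longer formulas are truncated. */

typedef struct { const char *p; } Parser;

/* The syntax allows only space and tab as white space. */
static void skip_ws(Parser *ps) {
    while (*ps->p == ' ' || *ps->p == '\t') ps->p++;
}

/* Binary operator precedences from the table; 0 means "not an operator". */
static int binary_prec(char c) {
    switch (c) {
    case '^': return 4;
    case '*': case '/': return 3;
    case '+': case '-': return 2;
    default: return 0;
    }
}

static void parse_expr(Parser *ps, int min_prec, char *out);

/* Operand: unary negation, parenthesized expression, or a name/number. */
static void parse_operand(Parser *ps, char *out) {
    skip_ws(ps);
    if (*ps->p == '-') {               /* negation, precedence 5 */
        char inner[BUF];
        ps->p++;
        parse_operand(ps, inner);
        snprintf(out, BUF, "(-%s)", inner);
    } else if (*ps->p == '(') {        /* expression grouping */
        ps->p++;
        parse_expr(ps, 2, out);
        skip_ws(ps);
        if (*ps->p == ')') ps->p++;
    } else {                           /* name or number token */
        size_t n = 0;
        while ((isalnum((unsigned char) *ps->p) || *ps->p == '_' ||
                *ps->p == '.') && n < BUF - 1)
            out[n++] = *ps->p++;
        out[n] = '\0';
    }
}

/* Parse binary operators whose precedence is at least min_prec. */
static void parse_expr(Parser *ps, int min_prec, char *out) {
    char lhs[BUF], rhs[BUF];
    parse_operand(ps, lhs);
    for (;;) {
        char op;
        int prec;
        skip_ws(ps);
        op = *ps->p;
        prec = binary_prec(op);
        if (prec == 0 || prec < min_prec) break;
        ps->p++;
        /* prec + 1 on the right side makes the operator left-associative. */
        parse_expr(ps, prec + 1, rhs);
        snprintf(out, BUF, "(%s %c %s)", lhs, op, rhs);
        strcpy(lhs, out);
    }
    strcpy(out, lhs);
}

/* Write a fully parenthesized copy of formula into out (BUF bytes). */
void group_formula(const char *formula, char *out) {
    Parser ps;
    ps.p = formula;
    parse_expr(&ps, 2, out);
}
```

Given "a * b + c" the sketch produces "((a * b) + c)", and given "a - b + c" it produces "((a - b) + c)", matching the two examples in the text. Because it follows the table literally, negation binds more tightly than ^, so "-a^2" is grouped as "((-a) ^ 2)".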
A program parsing a formula in an SBML model should assume that names appearing in the formula are the identifiers of Species, Parameter, Compartment, FunctionDefinition, Reaction (in SBML Levels 2 and 3), or SpeciesReference (in SBML Level 3 only) objects defined in a model.
When a function call is involved, the syntax consists of a function identifier, followed by optional white space, followed by an opening parenthesis, followed by a sequence of zero or more arguments separated by commas (with each comma optionally preceded and/or followed by zero or more white space characters), followed by a closing parenthesis.
There is an almost one-to-one mapping between the list of predefined functions available and those defined in MathML. All of the MathML functions are recognized; this set is larger than the functions defined in SBML Level 1. In the subset of functions that overlap between MathML and SBML Level 1, there are a few differences. The following table summarizes the differences between the predefined functions in SBML Level 1 and the MathML equivalents in SBML Levels 2 and 3:
Text string formula functions | MathML equivalents in SBML Levels 2 and 3 |
---|---|
acos | arccos |
asin | arcsin |
atan | arctan |
ceil | ceiling |
log | ln |
log10(x) | log(10, x) |
pow(x, y) | power(x, y) |
sqr(x) | power(x, 2) |
sqrt(x) | root(2, x) |
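The rows of this table that are pure renames can be expressed as a small lookup, sketched below. The function mathml_name_for and the NameMapping type are hypothetical names, not part of libSBML. The rows log10, sqr, and sqrt are deliberately left out because they also add or reorder arguments, which a name lookup alone cannot express.

```c
#include <stddef.h>
#include <string.h>

/* One row of the table: text-string function name and MathML equivalent. */
typedef struct {
    const char *text_name;
    const char *mathml_name;
} NameMapping;

/* Only rows where the argument list is unchanged. */
static const NameMapping RENAMED[] = {
    { "acos", "arccos" },
    { "asin", "arcsin" },
    { "atan", "arctan" },
    { "ceil", "ceiling" },
    { "log",  "ln" },
    { "pow",  "power" }
};

/* Return the MathML name for a text-string function name.
   Names without an entry are returned unchanged. */
const char *mathml_name_for(const char *text_name) {
    size_t i;
    for (i = 0; i < sizeof RENAMED / sizeof RENAMED[0]; i++) {
        if (strcmp(RENAMED[i].text_name, text_name) == 0)
            return RENAMED[i].mathml_name;
    }
    return text_name;
}
```

Under this sketch, mathml_name_for("log") yields "ln", while a name such as "sin" that is spelled the same in both forms passes through unchanged.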