40-Design.md 14.6 KB
Newer Older
Nicolas Lenz's avatar
Nicolas Lenz committed
1
# Design {#sec:design}
Nicolas Lenz's avatar
Nicolas Lenz committed
2

Nicolas Lenz's avatar
Various    
Nicolas Lenz committed
3
In this chapter we introduce and explain the design as well as the design decisions of Lightfold. We start by setting the design goals and explaining the basic architecture of Lightfold and its internal representations. We continue by presenting the core language for Lightfold expressions and their surface syntax, and finish by describing how Lightfold programs are structured and introducing their syntax.
Nicolas Lenz's avatar
Nicolas Lenz committed
4

5
## Design Goals
Nicolas Lenz's avatar
Various    
Nicolas Lenz committed
6

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
7
Lightfold is supposed to be a proof-of-concept to demonstrate dependent typing and algebraic compilation techniques. Therefore both the implementation and the usage experience aim to be simple and comprehensible as primary design goal. Another objective to keep Lightfold easily extensible for future development. Furthermore we aim for syntax and semantics that allow concise, readable, repetition-free and safe code.
8

Nicolas Lenz's avatar
Nicolas Lenz committed
9
## Architecture {#sec:design:architecture}
Nicolas Lenz's avatar
Nicolas Lenz committed
10

11
Lightfold programs are processed in multiple steps.
Nicolas Lenz's avatar
Nicolas Lenz committed
12

13
Parsing
14
  : is the step of reading Lightfold source code and structuring it into an abstract syntax tree algebra (LightfoldAST).
Nicolas Lenz's avatar
Nicolas Lenz committed
15

16
17
Compilation to LightfoldCore
  : is performed using an AST-to-LightfoldCore-algebra.
Nicolas Lenz's avatar
Nicolas Lenz committed
18

19
Type checking
Nicolas Lenz's avatar
Nicolas Lenz committed
20
  : is done on LightfoldCore terms.
Nicolas Lenz's avatar
Nicolas Lenz committed
21

22
Evaluation
Nicolas Lenz's avatar
Nicolas Lenz committed
23
  : is done on type checked LightfoldCore terms.
Nicolas Lenz's avatar
Nicolas Lenz committed
24

Nicolas Lenz's avatar
Nicolas Lenz committed
25
The two internal representations, LightfoldAST and LightfoldCore, are defined as abstract data types using signatures with Σ-terms and fold functions as introduced in [section @sec:theory:algebraic]. Parsing is done directly into an LightfoldAST-algebra as explained in [section @sec:methods:algebraic]. Using an algebra targeting LightfoldCore, we get a LightfoldCore program.
Nicolas Lenz's avatar
Nicolas Lenz committed
26

Nicolas Lenz's avatar
Typos    
Nicolas Lenz committed
27
LightfoldCore as the main internal representation of Lightfold programs is closely based on the dependently typed lambda calculus \lp introduced with its rules for evaluation and typing in [section @sec:dependent]. We use a core language close to the base calculus to allow for easier application of the various theoretical methods from [chapter @sec:theory].
Nicolas Lenz's avatar
Various    
Nicolas Lenz committed
28

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
29
Type checking and evaluation is performed on LightfoldCore terms using recursive algorithms based on the typing and evaluation rules. That shows another advantage of using a small core language: It keeps Lightfold extensible. Features can be added without changes to the core part of the language, as long as they can be translated into LightfoldCore.
Nicolas Lenz's avatar
Nicolas Lenz committed
30

31
Note that while both LightfoldAST and LightfoldCore are defined through signatures as abstract data types and used as such, in the following we will only show the concrete Σ-term types for easier understanding and brevity.
Nicolas Lenz's avatar
Various    
Nicolas Lenz committed
32

Nicolas Lenz's avatar
Nicolas Lenz committed
33
## Terms {#sec:design:terms}
Nicolas Lenz's avatar
Nicolas Lenz committed
34

Nicolas Lenz's avatar
Nicolas Lenz committed
35
The most important part of Lightfold are terms: They are needed to represent values and types. LightfoldCore's abstract syntax for terms in [listing @fig:def:core-term] is closely based on the abstract syntax of \lp shown in [section @sec:dependent].
Nicolas Lenz's avatar
Nicolas Lenz committed
36

Nicolas Lenz's avatar
Nicolas Lenz committed
37
38
Since our type system for \lp uses the bidirectional typing from [section @sec:bidirectional], we follow the same approach as Löh, McBride and Swierstra in [@lambdapi] and introduce separate types for representation of checkable (`TermCheck`) and inferable terms (`TermInfer`) for improved type safety: The type inferrer is able to ensure it gets an inferable term as input on the type level and vice versa.

39
40
::: {.listing #fig:def:core-term title="LighfoldCore Terms Abstract Syntax"}
This is the abstract syntax of LightfoldCore terms defined as Haskell data types.
Nicolas Lenz's avatar
Various    
Nicolas Lenz committed
41

42
43
44
45
```haskell
data TermCheck
    = Infer TermInfer
    | Lambda TermCheck
Nicolas Lenz's avatar
Nicolas Lenz committed
46
47
48
```

\pagebreak
Nicolas Lenz's avatar
Various    
Nicolas Lenz committed
49

Nicolas Lenz's avatar
Nicolas Lenz committed
50
```haskell
51
52
53
54
55
56
57
58
data TermInfer
    = Annotation TermCheck TermCheck
    | Star
    | Pi TermCheck TermCheck
    | Bound Int
    | Free Name
    | Application TermInfer TermCheck
    | Elimination Text TermCheck [TermCheck] TermCheck
Nicolas Lenz's avatar
Various    
Nicolas Lenz committed
59

60
61
62
data Name = Global Text | Local Int
```
:::
Nicolas Lenz's avatar
Various    
Nicolas Lenz committed
63

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
64
Applications are represented as pairs of a function and an argument, functions with multiple arguments are nested. LightfoldCore employs the representation of bound variables using de Bruijn indices introduced in [section @sec:de-bruijn]. Bound and free variables have separate constructors: bound variables have an integer for the de Bruijn index in the `Bound` constructor, free variables have either a de Bruijn index with the `Name` constructor `Local` if it is a variable pointing to a binder outside of the current scope or it can reference a value definition by name with `Global`. Data types and their constructors are represented using named free variables as explained in [section @sec:eliminators].
Nicolas Lenz's avatar
Nicolas Lenz committed
65

Nicolas Lenz's avatar
Nicolas Lenz committed
66
Because binders are referenced using de Bruijn indices, the `Lambda` and `Pi` constructors do not need an identifier string.
Nicolas Lenz's avatar
Nicolas Lenz committed
67

68
Eliminations from [section @sec:eliminators] have their own constructor. An elimination has a text with the name of the type that is being eliminated. The first argument term is the motive, the list of terms contains the operations, there should be one for each constructor of the type. The last argument term is the elimination input.
Nicolas Lenz's avatar
Nicolas Lenz committed
69

Nicolas Lenz's avatar
Nicolas Lenz committed
70
### Surface Syntax
Nicolas Lenz's avatar
Nicolas Lenz committed
71

Nicolas Lenz's avatar
Nicolas Lenz committed
72
Lightfold's surface syntax for terms is based on the following principles:
Nicolas Lenz's avatar
Nicolas Lenz committed
73

Nicolas Lenz's avatar
Nicolas Lenz committed
74
75
- Anything between `/*` and `*/` or following a `//` in the same line is regarded as comment and ignored. This comment syntax is the same as in many languages, including C, C++ and Java.

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
76
- As identifiers any strings consisting of numbers and lower-case or upper-case latin letters are allowed as long as the first character is a letter and no number.
Nicolas Lenz's avatar
Nicolas Lenz committed
77

78
- Named variables are used instead of de Bruijn indices. While de Bruijn indices are handy for internal representation, named variables are better suited for the surface syntax as they allow for meaningful names to be used and do not have to be changed when adding or removing binders.
Nicolas Lenz's avatar
Nicolas Lenz committed
79

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
80
- Type annotations are written using a single colon, for example `x : Nat`, which is concise and what is used in mathematics. This syntax is, among others, used by Idris. Haskell, that we previously used as base for pseudocode, uses two colons.
Nicolas Lenz's avatar
Nicolas Lenz committed
81

Nicolas Lenz's avatar
Nicolas Lenz committed
82
- As common in functional programming languages like Haskell, function application does not require parentheses to keep it as terse as possible: a blank is enough, for example: `f x`. In \lp and LightfoldCore application is always between exactly one function and one argument, so a chain of applications like `f x y z` is actually nested: `((f x) y) z`. However, in our syntax, we model application chains as a simple repetition. The conversion into nested one-on-one applications happens during the compilation to LightfoldCore.
Nicolas Lenz's avatar
Nicolas Lenz committed
83

84
- Parentheses, as usual, can be used to denote what belongs to what in nested terms.
Nicolas Lenz's avatar
Nicolas Lenz committed
85

Nicolas Lenz's avatar
Nicolas Lenz committed
86
- Lambda terms are written with a double arrow `=>` between argument and result, for example: `x => f x`. This is the same syntax that C#[^lambda-cs] and JavaScript[^lambda-js] use for lambda terms.
Nicolas Lenz's avatar
Nicolas Lenz committed
87

88
  Some imperative languages like Java[^lambda-java] use a single arrow for lambdas: `x -> f(x)`. Haskell[^lambda-hs] uses a single arrow together with a backslash symbolizing a λ, for example `\x -> f x`, Python[^lambda-py] uses "lambda" and a colon: `lambda x: f(x)`.
Nicolas Lenz's avatar
Nicolas Lenz committed
89

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
90
  However, for Lightfold we decided on the double arrow syntax from C#, because it is both as concise as Java's syntax and easy to visually distinguish from function terms like Python's syntax and unlike Java's and Haskell's syntax.
91

Nicolas Lenz's avatar
Various    
Nicolas Lenz committed
92
- Function types are denoted by an identifier, a colon and the range type in parantheses, followed by a single arrow and the domain type, for instance `(n : Nat) -> Nat`. This fits well with the annotation syntax, as `n` indeed has the type `Nat` when bound. The single arrow for function types is commonly used in functional programming languages like Haskell, and the dependently typed language Idris uses this syntax as well^[<https://docs.idris-lang.org/en/latest/reference/syntax-guide.html#dependent-functions>].
Nicolas Lenz's avatar
Nicolas Lenz committed
93

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
94
  That both lambdas and function types use arrows also symbolizes their similarities: Both bind a value to the identifier written before the arrow in the following term.
95

Nicolas Lenz's avatar
Nicolas Lenz committed
96
- Variables are simply denoted by their identifier. The identifier `_` is special: It can be used as identifier in binders but causes an error when used as variable.
97
98
99

  Using `_` in binders for inputs therefore ensures and shows intuitively that the variable bound will not be used. This behavior is inspired by Haskell.

Nicolas Lenz's avatar
Nicolas Lenz committed
100
101
- The keyword `Type` is used to denote the type of types which is processed to `Star` in LightfoldCore.

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
102
- Lastly, the keyword `elim` is processed as elimination. It expects an identifier with the type to be eliminated and some arguments. The first argument is processed as motive, the last one as elimination input and all in between are processed as the operations.
103
104
105
106

These rules lead to the formal syntax specified in [definition @fig:def:term-syntax].

::: {.definition #fig:def:term-syntax title="Lightfold Term Syntax"}
Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
107
The syntax for terms is defined using Backus-Naur form. $x\ ^{*}$ means that $x$ can occur zero or more times, $x\ ^{+}$ means that $x$ must occur at least once.
108
109
\begin{align*}
term
110
  &\bnfdef \textit{termInner}\ \texttt{:}\ \textit{term} \\
Nicolas Lenz's avatar
Nicolas Lenz committed
111
  &\bnfalt \textit{termInner}\ ^{+}
112
113
114
\end{align*}
\begin{align*}
termInner
115
116
117
  &\bnfdef \texttt{Type} \\
  &\bnfalt \texttt{elim}\ \textit{identifier}\ \textit{termInner}\ ^{*} \\
  &\bnfalt \texttt{(}\ \textit{term}\ \texttt{)} \\
Nicolas Lenz's avatar
Nicolas Lenz committed
118
119
120
  &\bnfalt \textit{identifier}\ \texttt{=>}\ \textit{term} \\
  &\bnfalt \texttt{(}\ \textit{identifier}\ \texttt{:}\ \textit{term}\ \texttt{)}\ \texttt{->}\ \textit{term} \\
  &\bnfalt \textit{identifier}
121
122
\end{align*}
:::
Nicolas Lenz's avatar
Nicolas Lenz committed
123

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
124
We use two non-terminals in the grammar, $term$ and $termInner$. We do that to allow the grammar to be parsed in a left-to-right fashion. For example, without the separation the rule for annotations would look like this: $term \bnfdef term : term$. This is a rule for the non-terminal *term* that has *term* itself as first symbol: it is left-recursive [@leftrecursion]. This is problematic, as a left-to-right parser would descend into an endless loop when trying to parse a term, because when parsing a term, it would try to parse a term next, and so on. By separating terms into normal and inner terms, we can circumvent this problem.
125

Nicolas Lenz's avatar
Nicolas Lenz committed
126
Note that a single *termInner* can be embedded in a *term* through *term*'s second rule: $\textit{termInner}^{+}$ also allows for a single repetition.
127

128
## Programs
129

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
130
The general structure of a Lightfold program is inspired by functional programming languages like Haskell: It consists of an arbitrary amount of *definitions*, either defining a value or a type. Definitions can access other definitions, allowing the user to compose values and types from other values and types. As the base calculus \lp does not include recursion though, a definition can only access previous definitions, except for data type definitions that can access themselves to allow recursive types as explained in [section @sec:eliminators]. The types in LightfoldCore for structuring a program are shown in [listing @fig:list:core].
131

132
::: {.listing #fig:list:core title="LightfoldCore Abstract Syntax"}
133
134
135
136
``` haskell
data Lightfold = Lightfold [DefType] [DefValue]

data DefType
Nicolas Lenz's avatar
Nicolas Lenz committed
137
138
    = DefType
        Text
139
        [DefConstructor]
Nicolas Lenz's avatar
Nicolas Lenz committed
140

141
142
data DefConstructor
    = DefConstructor
Nicolas Lenz's avatar
Nicolas Lenz committed
143
        Text
144
        [Maybe Text]
Nicolas Lenz's avatar
Nicolas Lenz committed
145

146
data DefValue = DefValue Text TermCheck TermCheck
Nicolas Lenz's avatar
Nicolas Lenz committed
147
148
149
```
:::

150
151
As can be seen, a Lightfold program consists of some type definitions and some value definitions. A type definition has a name and some constructors, and a constructor consists of a name and and a list of parameters. A parameter is represented by the type `Maybe Text`: We use `Nothing` to denote a recursive parameter, that is, a parameter that has the type that is currently being defined. `Just` is used together with an type identifier to denote a parameter of another type.

Nicolas Lenz's avatar
Fixes    
Nicolas Lenz committed
152
On the other hand, a value definition consist of a name, a type term and a value term. To allow a Lightfold program to be run, a value named `main` must be defined. When executed, the program will output the result of evaluating `main`.
153
154

### Surface Syntax
Nicolas Lenz's avatar
Nicolas Lenz committed
155

Nicolas Lenz's avatar
Various    
Nicolas Lenz committed
156
Using a value definition a term with a type is bound to an identifier. As syntax we use the identifier and the type separated by a colon in the first line. Indented in the next line follows the actual term, for example:
Nicolas Lenz's avatar
Nicolas Lenz committed
157

158
159
160
161
```
id : (n : Nat) -> Nat
    n => n
```
Nicolas Lenz's avatar
Nicolas Lenz committed
162

163
This syntax deviates from the ones used in Haskell, Idris and others in that the value is indented (inspired by Python) to designate that it belongs to the declaration above instead of requiring the user to write the value name again, following the design goal to avoid unnecessary repetition.
Nicolas Lenz's avatar
Nicolas Lenz committed
164

165
Type definitions look similar to value definitions. They start with the type identifier followed by `: Type`, declaring the new type as a type. The constructors with their parameter types follow, each indented in their own line, for example:
Nicolas Lenz's avatar
Nicolas Lenz committed
166

167
168
169
170
171
```
Nat : Type
    zero
    succ Nat
```
172

Nicolas Lenz's avatar
Typos    
Nicolas Lenz committed
173
Note that while identifiers are case-sensitive, Lightfold does not impose any restrictions on the case of the identifiers used in definitions, binders and variables. However, we generally use upper-case identifiers for types and lower-case identifiers in all other occasions.
174

175
The resulting formal syntax rules are shown in [definition @fig:def:syntax].
Nicolas Lenz's avatar
Nicolas Lenz committed
176

177
::: {.definition #fig:def:syntax title="Lightfold Syntax"}
Nicolas Lenz's avatar
Nicolas Lenz committed
178
This is the syntax of Lightfold specified in Backus-Naur form. $x\ ^{*}$ means that $x$ can occur zero or more times, *indent* denotes an indentation (four or more spaces), *newline* a line break.
179
\begin{align*}
Nicolas Lenz's avatar
Nicolas Lenz committed
180
181
\textit{lightfold}
  &\bnfdef \textit{def}\ ^{*}
182
183
\end{align*}
\begin{align*}
Nicolas Lenz's avatar
Nicolas Lenz committed
184
185
186
\textit{def}
  &\bnfdef \textit{identifier}\ \texttt{:}\ \texttt{Type}\ \textit{newline}\ \textit{defConstructor}\ ^{*} \\
  &\bnfalt \textit{identifier}\ \texttt{:}\ \textit{term}\ \textit{newline}\ \textit{indent}\ \textit{term}\ \textit{newline}
187
188
\end{align*}
\begin{align*}
Nicolas Lenz's avatar
Nicolas Lenz committed
189
190
\textit{defConstructor}
  &\bnfdef \textit{indent}\ \textit{identifier}\ \textit{identifier}\ ^{*}\ \textit{newline}
191
\end{align*}
Nicolas Lenz's avatar
Nicolas Lenz committed
192
193
:::

Nicolas Lenz's avatar
Nicolas Lenz committed
194
[Example @fig:ex:syntax] shows a syntactically valid Lightfold program using all introduced syntax constructs.
Nicolas Lenz's avatar
Nicolas Lenz committed
195

196
::: {.example #fig:ex:syntax title="Lightfold Syntax"}
197
198
199
200
```
Nat : Type
    zero
    succ Nat
Nicolas Lenz's avatar
Nicolas Lenz committed
201

202
203
id : (a : Type) -> (_ : a) -> a
    a => x => x
Nicolas Lenz's avatar
Nicolas Lenz committed
204

Nicolas Lenz's avatar
Nicolas Lenz committed
205
206
207
plus : (_ : Nat) -> (_ : Nat) -> Nat
    x => y => elim Nat (_ => Nat) x (_ => n => succ n) y

208
main : Nat
209
    ((id Nat) : (_ : Nat) -> Nat) (succ zero)
210
211
```
:::
Nicolas Lenz's avatar
Nicolas Lenz committed
212

213

214
[^lambda-cs]: <https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/lambda-operator#lambda-operator>
Nicolas Lenz's avatar
Nicolas Lenz committed
215
[^lambda-js]: <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions>
216
217
218
[^lambda-py]: <https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions>
[^lambda-java]: <https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html>
[^lambda-hs]: <https://wiki.haskell.org/Anonymous_function>