Typed Machine Language and its Semantics PDF

Typed Machine Language and its Semantics

We present TML a new low level typed intermediate language for the proof-carrying code framework. The type system of TML is expressive enough to compile high

A Semantic Model of Types and Machine Instructions for Proof

16 Jul 1999 safety of machine-language programs with a machine- checkable proof. Such proofs have previously defined type-checking rules as part of the ...

Machine Language

Both binary and assembly code are forms of machine language. This article will provide an overview of a typical assembly language as well as a description

A semantic model of types and machine instructions for proof

Proof-carrying code is a framework for proving the safety of machine-language programs with a machine- checkable proof. Previous PCC frameworks have de-.

The universal code of science and machine languages

According to the various types of utilization of linguistic informa- tion in machines various machine languages are being developed.

A Semantic Model of Types and Machine Instructions for Proof

Proof-carrying code is a framework for proving the safety of machine-language programs with a machine- checkable proof. Previous PCC frameworks have de-.

Safety Checking of Machine Code

machine-language programs and applied the safety checker to several examples. of not just the types of the operation's operands

Today (10/6/2008) Assembly vs. machine language R-type format

Machine language the binary representation for instructions. Register-to-register arithmetic instructions use the R-type format.

8086(Machine Language Instruction Formats)

•A machine language instruction format has one or more number of fields associated with it. type of operation to be performed by the CPU.

Machine (Assembly) Language

Typical machine language commands (3 types). ? ALU operations. ? Memory access operations. (addressing mode: how to specify operands).

Typed Machine Language and its Semantics

Kedar N. Swadi

Princeton University

kswadi@cs.princeton.eduAndrew W. Appel

Princeton University

appel@cs.princeton.edu

ABSTRACT

We present TML, a new low level typed intermediate language for the proof-carrying code framework. The type system of TML is expressive enough to compile high level languages like core ML to and can be guaranteed sound. It is also flexible enough to provide a lot of freedom for low-level data representations. We can model real machine instructions inTML, andthus avoidhigh-level opaque operations like memory allocation and perform provably safe opti- misations like array bounds check eliminations. Most important,

TML has a semantic model.

1. INTRODUCTION

Proof-carrying code (PCC) [11] is a framework for the genera- tion of provably safe programs. In this framework, an untrusted program is accompanied by a proof of its compliance with some predefined safety policy (resource access, type safety, or memory safety). The host mechanically checks the proof for correctness to ensure safety of execution of the untrusted code. Though this technique is general enough to encode a very wide range of safety properties, existing PCC systems [10] suffer from a lack of flexi- bility with respect to the high-level source languages they translate from and the target machine architectures they compile to. Building on a semantic model of machine-level types [4], we design TML, a new low-level language that is general enough so that a wide variety of high-level languages may be compiled to it, and is also retargetable to different machines.

TML makes the following contributions :

We have a semantic model for TML types based on a verysmall set of axioms. Our trusted computing base comprisesthe rules of higher-order logic augmented by some elemen-tary facts about number arithmetic. All TML types are mod-elled as (defined) predicates, and type inference rules arelemmas forwhichweprovide machine-checkable proofs. Pre-

vious approaches to low-level semantics [9, 11] did not pro- vide models, but only syntactic consistency results. Low-level type constructors for type intersections and ad- dress arithmetic give front-end compilers more freedom in data layout than high-level (TAL-style) records or objects would.

Preliminary version, July 2001

TML is really a machine language, with instructions speci-fied by integer opcodes; thus we do not need to trust an as-sembler.

Unlike TAL [9] or DTAL [16], we can model each machineinstruction in TML, even those occuring in sequences allo-cating a heap record. While TAL and DTAL have an atomic

"malloc" instruction that expands into several real machine instructions, TML is expressive enough to reason about the intermediate states in the malloc subroutine; similarly for multi-instruction case-discrimination sequences. Each TML instruction corresponds to a single instruction on a real ma- chine like SPARC. This allows a compiler to perform prov- ablysafeoptimisations andthetrustedcomputing baseisalso made smaller as a result.

We can encode complex dataflow facts within the type sys-tem. Tracking dataflow allows us to reason about interme-diate machine states and thus make operations like sum-typediscrimination nonatomic in a safe way.

Like other PCC systems, TML has no decision procedure: weassumethat atype-preserving compiler fromasafesource language produces hints and invariants useful in constructing machine-checkable proofs. However, because each typing rule is proved separately as a derived lemma (not through a global syntactic metatheorem), we find it easy to add ad-hoc rules for the convenience of the prover or compiler (in the case where no change is necessary to the semantic model).

2. TYPED MACHINE LANGUAGE

TML is stratified into the levels ofkinds,typesandvalues.To reduce complexity of the system we have only two kinds in TML. There are no other kind constructors in TML like functions or pair- ing. This simplifies the semantic model. The type constructors are sufficiently low-level to allow translation from a wide range of source languages into TML. We cannot model quantification over higher-order kinds, which are required by the polymorphic typed lambda calculusF w . We can extend TML to higher kinds such as W !W!W, but we can't quantify over these higher kinds; still, this is enough to do type constructors in core ML.

2.1 TML Kinds

Toreasonabout machine-level programs withmachine-level types, in TML we have kinds forinstructionsandtype maps.Type maps are judgements on a vector of values associating them with their types, and have the kindW. In our semantic model, we can use type maps also to represent scalar types (judgements on a single value, not a vector) and also to represent numbers. The main mo- tivation to unify the three kinds into one was to have a smaller set of quantification operators and also fewer rules for specifying static semantics. 1

2.2 Types (t:W)

In TML, the state comprises a register bankRand a memory M. All accessible values live in registers ofR. These values may point to data in memory

M.Ifxis a number thenR(x)is the value

of thexth register; informally, we writer x . Unlike TAL, which distinguishes address values from integer values, our machine-level calculus uses numbers to index memory. As in the Appel-Felty semantic model of types [4], types are encoded as predicates on values. Then, M`r x :tifMandR(x) satisfy the predicatet. TML has the following primitive types: >,?: Every value has type>; no value has type?. id: The identity type-constructor i.e.,id(t)=t. box(t):Theboxtype constructor is for memory references. v: box(t)if the content ofM(v)satisfies the predicatet.box makes immutable references. TML can accommodate muta- ble references using the semantic model of Ahmed, Appel, and Virga [2], but we will not show details here. rec(t): This constructor allows us to define recursive types. oset(i)(t):Thevaluevhas this type ifv+isatisfiest. This constructor is useful wherever address arithmetic is re- quired. For example, if location

R(x)contains a pointer to

a record with two fields [0:t 1 ;1:t 2 ], we could sayr x oset(1)(boxt 2 ). A convenient abbreviation,eldctis the same as oset(c)(boxt). t 1 \t 2 is the type of a value satisfying predicates for botht 1 andt 2 , while a value of typet 1 [t 2 satisfies predicates for at least one of types.

8;9: These are the universal and existential quantifiers. We

allow quantification overWbut not overINSTR.Weusede Bruijn indices [6] (see Section 3.1) rather than variables, so

8or9implicitly binds a de Bruijn variable. Informally, how-

ever, we will often show variables with the quantifiers. rec: General (covariant, contravariant) recursive types are written as rect,whererecimplicitly binds a de Bruijn index that may be free int. For appropriate ("contractive") expres- sionst,wehave rectis a fixed point oft, which is written as rect=t[rectid]in the calculus of explicit substitutions. co deptr: The judgementv:co deptrfholds ifvis a code pointer with formal parametersf. That is, it is safe to jump to locationvif the registers satisfyf. The formal parameters are modelled as a type map (Section 2.4). at(i;s): If at memory locationl, we have a TML instruction iof sizes,thenl: at(i;s).

2.3 Integers (n:W)

In TML, we can specify the types of integers, and constant and bounded integers. int p (i): The type of integersxsuch thatxpi. For example, int<100 is the type of integers less than 100, andint= (c) is the singleton type of integers equal toc. For example, the type of machine integers is int32 =int0\int<2 32
+: This type captures the resultof additions. After executing r i r j +r k wherer j :int=n 1 andr k :int=n 2 ,weinferr i int= (n 1 +n 2 )Other arithmetic operators like subtraction or multiplication on integers can also be easily defined, as can modular arithmetic. These constructors allow us a great deal of flexibility for data representation. Consider, for example, a list-of-integers datatype. List cells could have untagged or tagged representations. In an Kinds k::=W jINSTR

Typ es(W)

t;f;G::=>j? jco deptrf joset(n;t) jidt jboxt jrect jt\t 0j t[t 0 j8tj9t jfn:tg jfnn jint p n jn 1 +n 2 jat(i;n) jn

Instructions(INSTR)

i::=instr(G;f;f 0) j8 i i j9 i i p::=>j jFigure 1: TML Syntax : Kinds and Types untagged scheme (Figure 2a), ifr 1 is a pointer to a list, assuming a

4-byte word, we represent this as

r 1 :int=0[(int6=0\(eld0int32 )\(eld4t)) where the left union type is the nil case and the right is the cons case. Ifr 1 contains a non-zero value it points to a record of two fields, the firstbeing the data and the second being the pointer tothe next cell. This makes pointers non-abstract in TML. (This example is a simplification which avoids dealing with the recursive nature of the list data type; see section 3.1 for a discussion of recursive types.) 10

130...

00 1 10

130200

300
300
Nil

ConsNil

Cons (a) (b) r1 r2r1 r2

Figure 2: Data Representation

To get a tagged representation (Figure 2b) we write r 1 :(eld0int=0)[(eld 0int

1\eld4int32

\eld8t) In this case, a nil cell has a tag of 0, and the cons cell has a nonzero tag field followed by the data and the next-cell pointer. 2

2.4 Type Maps (f:W)

A type map is a collection of typing judgements for registers, as- sociating each registerr x to some typet x . For most purposes, type maps serve the function of environments. If any register is not ex- plicitly referenced in the environment, it is assumed unconstrained.

The empty type map is>.

fi:tg:Thisisasingletontype map, where the only judge- mentisthatregisterr i has typet. f 1 \f 2 : IfregisterbankR:f 1 andR:f 2 ,wehaveR:f 1 \f 2

We also write

fi:t 1 ;j:t 2 gforfi:t 1 g\fj:t 2 g. fnicontains the type judgements for all registers infexcept for ther i Existential and singleton types allow us to capture dataflow infor- mation which can be used to relate the contents of two registers.

For example, if

R(i)=R(j), we can express this fact as

9n:fi:int=n;j:int=ng

A more general type map,relate, captures a wider class of such register relations. relate p (F;G)(i;j)9 n:f r i :F(int p (n))g\fr j :G(int= (n))g We use the following example to illustrate the use ofrelate.Let f 1 =fr i :boxtg. We fetch the contents of the memory (M)at

R(i)into registerr

j ,(r j M[r i ]). This operation results in a new environment,f 2 . Trivially,R(j)=M[R(i)]inf 2 . This fact is ex- pressed using the relateconstructor asrelate= (box;id)(i;j).Inthe definition above, the existential is instantiated with

M[R(i)].This

instantiation can be shown to satisfy the two sub-environments. We now have f 2 =fi:boxtg\relate= (box;id)(i;j). Fromthisenvironment, we have lemmas which can infer thatr j :t2f 2 . We can also use relateto reason about the safety of certain optimisations. For ex- ample, Section7.3 explains how to perform provably safe sum-type discriminations using relate. : tv: tv : tv Before After load loadMemoryMemoryRegister Bank Register Bank r rr r jii j

Figure 3:relateExample

2.5 Code Context (G:W)

Program safety can be proved from type safety. Thus, we en- sure that certain typing judgements hold before and after the ex- ecution of every instruction. Consider an instruction at locationl which requires its arguments to be of certain types. For example, a load instruction might require its source register to be of a eld type. In terms of Hoare logic, if the register bankRsatisfies some precondition, it is safe to jump to locationl. This precondition on Ris represented as the type mapf. Thus we havelbeing of type co deptr(f). This fact itself can be encoded as the type map

fl:co deptr(f)g, thoughlis an address indexing into the code andnot the register bank. This type map describes the local typing in-

variants at locationl. Using intersection ( \), we collect the local invariants at all program locations. This resultant map isG. As a running example, consider the SPARC program in Figure 5 which adds up the elements of a list. 031
2 o1

Register Bank Memory

Figure 4: List Representation in Memory

In this program, registero1is a list value, which is either 0 or a pointer to a two-word record containing an integer and a list value.

A length-3 list containing

[1;2;3]is shown in Figure 4. Register o2is a temporary to hold the data till it is added to the accumulator in registero3. SPARC uses delay slots after branch instructions. and they are executed whether or not the branch is taken. For this program to be safe, it must start with registero1having the type of a pointer to a list, and registero7should have the return address, i.e. it should have the type of a code pointer which expects registero3to have an integer. Assuming definitions for high-level types like listtyfor lists (as shown at the end of Section 3.1), we have the preconditionf 0 for address 0 to be f 0 =fo1:listtyg\fo7:co deptrfo3:int32 gg If this condition holds, then it is safe to jump to location 0. The compiler provides invariantsf l for each locationl, which are com- bined into G =f0:co deptr(f 0 )g\f

4:co deptr(f

4 )g\ f36 :co deptr(fquotesdbs_dbs6.pdfusesText_12

[PDF] types of operators

[PDF] types of packets in usb protocol

[PDF] types of paragraph with examples pdf

[PDF] types of polynomials

[PDF] types of sentences

[PDF] types of service delivery

[PDF] types of sociology

[PDF] types of stakeholder engagement

[PDF] types of standardized test

[PDF] types of tickets

[PDF] types of topic sentences

[PDF] types of trade agreements

[PDF] typescript connect to mongodb

[PDF] typescript express mongoose

[PDF] typescript import express

[PDF] Typed Machine Language and its Semantics

Typed Machine Language and its Semantics

Kedar N. Swadi

Princeton University

Princeton University

ABSTRACT

TML has a semantic model.

1. INTRODUCTION

TML makes the following contributions :

Preliminary version, July 2001

2. TYPED MACHINE LANGUAGE

2.1 TML Kinds

2.2 Types (t:W)

M.Ifxis a number thenR(x)is the value

R(x)contains a pointer to

8;9: These are the universal and existential quantifiers. We

8or9implicitly binds a de Bruijn variable. Informally, how-

2.3 Integers (n:W)

Typ es(W)

Instructions(INSTR)

4-byte word, we represent this as

130...

130200

ConsNil

Figure 2: Data Representation

1\eld4int32

2.4 Type Maps (f:W)

The empty type map is>.

We also write

For example, if

R(i)=R(j), we can express this fact as

9n:fi:int=n;j:int=ng

R(i)into registerr

M[R(i)].This

Figure 3:relateExample

2.5 Code Context (G:W)

Register Bank Memory

Figure 4: List Representation in Memory

A length-3 list containing

4:co deptr(f