
Solidity Standalone Assembly

Solidity Standalone Assembly Main Tips

  • The language used for inline assembly can also be used on its own. Used this way, it is referred to as standalone assembly.
  • Standalone assembly is planned to eventually become the intermediate language of Solidity’s compiler.

Solidity Standalone Assembly

The language previously described as inline assembly can also be used standalone. In fact, it is planned to become the intermediate language of Solidity’s compiler.

When used standalone, assembly aims to achieve three goals:

  1. Programs written in standalone assembly should be readable, even when the code has been generated by a Solidity compiler.
  2. The translation from assembly to bytecode should contain as few surprises as possible.
  3. The control flow should be easy to detect, to help with optimization and formal verification.

To achieve the first and third goals, high-level constructs such as if and switch statements, for loops and function calls are provided by assembly.

It should be possible to write assembly programs that do not use explicit DUP, SWAP, JUMP and JUMPI statements, because DUP and SWAP obfuscate the data flow, while JUMP and JUMPI obfuscate the control flow. Moreover, functional statements of the form mul(add(x, y), 7) are preferable to pure opcode statements such as 7 y x add mul, because in the functional form it is much easier to see which operand is used by which opcode.
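
As a minimal sketch of the functional style, the expression mentioned above can be written inside an inline assembly block. The contract name FunctionalStyleSketch, the function name compute and the version pragma are assumptions made for illustration only. The opcode form appears only as a comment.

```solidity
pragma solidity >=0.4.16 <0.9.0;

// Hypothetical contract, used only to illustrate the functional style.
contract FunctionalStyleSketch {
    function compute(uint x, uint y) public pure returns (uint result) {
        assembly {
            // Functional style: each operand sits next to the opcode that uses it.
            // The pure-opcode equivalent would read: 7 y x add mul
            result := mul(add(x, y), 7)
        }
    }
}
```

With these assumptions, calling compute(2, 3) adds the two arguments first and then multiplies the sum by 7, giving 35.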

The second goal is achieved by introducing a desugaring phase that removes the higher-level constructs in a very regular way and still allows the generated low-level assembly code to be inspected. The only non-local operation the assembler performs is the name lookup of user-defined identifiers (functions, variables, …). This lookup follows very simple and regular rules for scoping and for the cleanup of local variables from the stack.

Scoping: A declared identifier (variable, assembly, function, label) is visible only inside the block in which it was declared, including blocks nested in that block. Accessing local variables across function borders is not legal, even if they would be in scope. Shadowing is not allowed either. Local variables cannot be accessed before they are declared, but functions, labels and assemblies can. In this context, assemblies are special blocks used, for example, to return runtime code or to create contracts. No identifier from an outer assembly is visible inside a sub-assembly.
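
The scoping rules can be sketched with a small inline assembly block. The contract ScopingSketch, the function scoped and the assembly function twice are hypothetical names chosen for this illustration, and the version pragma is an assumption.

```solidity
pragma solidity >=0.4.16 <0.9.0;

// Hypothetical contract, used only to illustrate the scoping rules.
contract ScopingSketch {
    function scoped(uint a) public pure returns (uint result) {
        assembly {
            let outer := add(a, 1)
            {
                // `outer` was declared in the enclosing block, so it is visible here.
                let inner := mul(outer, 2)
                // `twice` is declared further down, but functions may be
                // used before their declaration.
                result := twice(inner)
            }
            // `inner` is no longer visible at this point.

            function twice(v) -> r {
                // Neither `outer` nor `a` can be accessed inside this function,
                // and the names `v` and `r` must not shadow any outer identifier.
                r := mul(v, 2)
            }
        }
    }
}
```

Under these assumptions, scoped(3) computes outer = 4, inner = 8 and returns 16.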

If control flow passes over the end of a block, pop instructions are inserted that match the number of local variables declared in that block. Whenever a local variable is referenced, the code generator needs to know its current position relative to the top of the stack, so it has to keep track of the current so-called stack height. Because every local variable is removed at the end of its block, the stack height before and after the block should be the same. If it is not, a warning is issued.
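
The following sketch marks, in comments, where the rule described above would apply. The contract StackHeightSketch and the function sum are hypothetical, and the bytecode a compiler actually emits depends on its version and optimizer settings, so the comments describe the rule rather than a guaranteed instruction sequence.

```solidity
pragma solidity >=0.4.16 <0.9.0;

// Hypothetical contract, used only to illustrate stack-height bookkeeping.
contract StackHeightSketch {
    function sum() public pure returns (uint result) {
        assembly {
            let total := 0          // local variable of the outer block
            {
                let x := 3          // first local variable of the inner block
                let y := 4          // second local variable of the inner block
                total := add(x, y)
                // End of the inner block: x and y go out of scope. Following the
                // rule above, two pop instructions would be inserted here, so the
                // stack height equals the height before the block.
            }
            result := total         // `total` is still in scope here
        }
    }
}
```

Calling sum() returns 7. The point of the sketch is that the inner block leaves no trace on the stack once it ends.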


Solidity Standalone Assembly High-Level Constructs

Why do we use high-level constructs such as for, switch and functions?

Firstly, with for, switch and functions it should be possible to write complex code without using jump or jumpi manually. This makes the control flow much easier to analyze, which allows for improved optimization and formal verification.

Moreover, if manual jumps are allowed, the stack height becomes rather complicated to compute. The position of every local variable on the stack needs to be known. Otherwise, neither references to local variables nor the automatic removal of local variables from the stack at the end of a block will work as intended. The desugaring mechanism correctly inserts operations at unreachable blocks, which properly adjust the stack height in the case of jumps that have no continuing control flow.
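
As a rough illustration, the sketch below combines switch, for and an assembly function without a single manual jump. The contract HighLevelSketch and the names power and repeatMul are hypothetical, and the version pragma is an assumption. Arithmetic in assembly wraps around on overflow, so the sketch is only meaningful for small inputs.

```solidity
pragma solidity >=0.4.16 <0.9.0;

// Hypothetical contract, used only to illustrate the high-level constructs.
contract HighLevelSketch {
    // Returns base ** exponent for small inputs (no overflow checks).
    function power(uint base, uint exponent) public pure returns (uint result) {
        assembly {
            // switch replaces a chain of conditional jumps
            switch exponent
            case 0 { result := 1 }
            default { result := repeatMul(base, exponent) }

            // an assembly function replaces manual jumps and return labels
            function repeatMul(b, e) -> acc {
                acc := 1
                // for replaces a hand-written loop built from jump and jumpi
                for { let i := 0 } lt(i, e) { i := add(i, 1) } {
                    acc := mul(acc, b)
                }
            }
        }
    }
}
```

With these assumptions, power(2, 10) returns 1024. The control flow can be read directly from the nesting of the blocks.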

Let’s compare two code examples to explain this in more detail.

Here we follow the compilation of a Solidity program to desugared assembly. Consider the runtime bytecode of the following Solidity program:

Example

pragma solidity ^0.4.0;

contract Cont {
  function func(uint a) returns (uint b) {
    b = 1;
    for (uint n = 0; n < a; n++)
      b = 2 * b;
  }
}

 


The assembly that gets generated:

Example

{
  mstore(0x40, 0x60) // this line will store the "free memory pointer"
  // function dispatcher: the first four bytes of calldata select the function
  switch div(calldataload(0), exp(2, 224))
  case 0xb3de648b { // illustrative selector value
    let retData := func(calldataload(4))
    let retVal := $memAllocate(0x20)
    mstore(retVal, retData)
    return(retVal, 0x20)
  }
  default { revert(0, 0) }
  // allocate memory
  function $memAllocate(size) -> pos {
    pos := mload(0x40)
    mstore(0x40, add(pos, size))
  }
  // function of the contract
  function func(a) -> b {
    b := 1
    for { let n := 0 } lt(n, a) { n := add(n, 1) } {
      b := mul(2, b)
    }
  }
}

 


This is what it looks like once past the desugaring phase:

Example

{
  mstore(0x40, 0x60)
  {
    let $0 := div(calldataload(0), exp(2, 224))
    jumpi($case1, eq($0, 0xb3de648b))
    jump($caseDefault)
    $case1:
    {
      // the function call: put the return label and the argument on the stack
      $retVal1 calldataload(4) jump(f)
      // This code is unreachable. Opcodes are added that mirror the function's
      // effect on the stack height: the arguments are removed and the
      // return values are introduced.
      pop pop
      let r := 0
      $retVal1: // the return point itself
      $retVal2 0x20 jump($allocate)
      pop pop let retVal := 0
      $retVal2:
      mstore(retVal, r)
      return(retVal, 0x20)
      // Although it is useless, this jump is inserted automatically, because
      // desugaring is a purely syntactic operation that does not
      // analyze control flow.
      jump($switchEnd)
    }
    $caseDefault:
    {
      revert(0, 0)
      jump($switchEnd)
    }
    $switchEnd:
  }
  jump($functionAfter)
  $allocate:
  {
    // The unreachable code introducing the function arguments is jumped over
    jump($toStart)
    let $retposition := 0 let size := 0
    $toStart:
    // Output variables live in the same scope as the arguments and
    // are actually allocated.
    let pos := 0
    {
      pos := mload(0x40)
      mstore(0x40, add(pos, size))
    }
    // Replace the arguments with the return value and jump back
    swap1 pop swap1 jump
    // Unreachable code that corrects the stack height
    0 0
  }
  f:
  {
    jump($toStart)
    let $retposition := 0 let a := 0
    $toStart:
    let b := 0
    {
      let n := 0
      $for_start:
      jumpi($for_finish, iszero(lt(n, a)))
      {
        b := mul(2, b)
      }
      $for_cont:
      { n := add(n, 1) }
      jump($for_start)
      $for_finish:
    } // Here, a pop instruction is inserted for n
    swap1 pop swap1 jump
    0 0
  }
  $functionAfter:
  stop
}

 



Solidity Standalone Assembly Stages

It is important to note that assembly happens in four stages:

  1. Parsing stage
  2. Desugaring stage (removing for, switch and functions)
  3. Opcode stream generation stage
  4. Bytecode generation stage