Solidity Internals Optimizer Main Tips
- Because of Solidity optimizer operating on assembly, it is usable with other languages than Solidity.
- The Solidity compiler splits the instruction sequence into blocks of JUMP and JUMPDEST .
Solidity Internals Optimizer
Because of Solidity optimizer operating on assembly, it is usable with other languages than Solidity.
The instruction sequence is split into blocks of JUMPs and JUMPDESTs by the compiler. In the blocks, analysis is performed on the instructions, with all stack modifications, to storage or memory being recorded as expressions which consist of an instruction and an argument list that are in essence pointers towards other expressions. The core idea now is finding expressions which are constantly equal (on any input) then combine them a class of expression. First, the optimizer will try to find every new expression inside a list of expressions that are known already. In case this would not work, the expression gets simplified by rules such as constant + constant = sumOfConstants or n * 1 = n .
Because of this being done in a recursive manner, the latter rule is also applied if the second factor is an expression that is more complex in which we know that it is always going to evaluate to one. Memory and storage location modifications must erase memory and storage locations that are not known as different: In case we wrote to location x first, only then to the location y with both being input variables, the second may overwrite the former, therefore we do not actually know what gets stored at x after writing to y. Alternatively, if simplifying the expression x – y will evaluate to a constant which is non-zero, we then would know that we may keep the knowledge we possess about what x has stored in it.
By the ending of this process, we are aware what expressions must be on the stack by the end and have a storage and memory modification list. This information gets stored along with the basic blocks in addition to being used for linking them. Moreover, information about memory, storage and the stack configuration gets sent to the following block(s). When we are aware of the targets of each JUMP and JUMPI instruction, it is possible to build a complete control flow program graph. In case only a single target we are not aware of is present (which may happen, because in principle, jump targets are computable from inputs), we must delete all of the knowledge about a block’s input state since it may be the target of an unknown JUMP . If a JUMPI, whose condition would evaluate to a constant, is found, it is then transformed into an unconditional jump.
Lastly, each block’s code gets fully re-generated. A graph on dependency is generated using the expressions that were on the stack by the block’s end and all operations that are not parts of the graph are basically dropped. From then on the code gets generated which applies these modifications to storage and memory in their order expressed in the original code (while dropping those modifications that have been found to be unnecessary) and at last, generate every value that is required for being on the stack at the correct position.
Those steps get applied to every basic block and the freshly generated code gets used as replacement when it is smaller. If a basic block gets split at a JUMPI and as the analysis is happening, the condition is evaluated to a constant, the JUMPI will be replaced based on the contant’s value, so code like this:
var n = 6; data = 7; if (data[n] != n + 1) return 1; else return 0;
Will be simplified into the kind of code that can be compiled from, even if the instructions contained a jump at the start.