Bachelor’s Thesis
Abstract
A lot of the software we use today is written in C and C++. Because these languages are memory-unsafe, programming errors in them frequently lead to vulnerabilities in applications. Mitigation techniques have therefore been developed that make exploiting such errors harder. My goal is to strengthen these techniques and to introduce new methods that make programs more secure. I analyzed the following techniques with regard to their security effect and their impact on performance:
- Checking the position of the stack pointer in every system call, which showed an overhead of $(2.7 \pm 3.3)\,\%$ in a microbenchmark. Because the standard error is larger than the measured overhead, we cannot be sure that the patch actually makes applications slower.
- Adding random gaps between consecutive `mmap` allocations, which caused a maximum slowdown of $(2.8 \pm 0.5)\,\%$.
- Improving the ssp (stack smashing protector, also called canary) by clearing the canary from the stack after checking it (no measurable performance change) and by generating a random canary for every function call ($(265 \pm 4)\,\%$ slower in a microbenchmark, while more realistic workloads showed a regression of $< 2\,\%$).
For this work I created patches for the stack pointer check and the ssp improvements. Benchmarks for these modifications are also given below.
For a more in-depth analysis, see the thesis.
Problems
Stack Smashing Protector
When the `-fstack-protector-strong` compiler option is used, the ssp is written and checked in every function that stores a buffer on the stack. Because the canary value is set only once, when the process starts, and keeps that value for the lifetime of the process, this scheme has some weaknesses. If the canary leaks, for example through an out-of-bounds read or through uninitialized memory, it becomes useless: an attacker can simply overwrite it with the now-known correct value and remain undetected.

Servers that fork themselves for each connection offer another way to find out its value. The `fork` system call clones the process memory, including the canary. If an attacker can overflow a buffer, they can overwrite only the first byte of the protector (a $\frac{1}{256}$ chance to succeed) and observe whether the process aborts. If it does not, the first byte was guessed correctly. Because the value stays the same across forks, it can be guessed byte by byte, which takes at most $256$ attempts per byte, i.e. at most $8 \cdot 256 = 2048$ attempts for a 64-bit canary instead of up to $2^{64}$. This makes the attack feasible. This specific problem was addressed by generating a new ssp on fork.
Improvements
A simple way to prevent the canary from leaking is to set its stack slot to zero after checking its value. This leaves no stale copies of the canary on the stack after a function returns, which eliminates the risk of leaking the canary through uninitialized variables that later reuse that stack memory.
Zeroing the canary after checking it takes seven added lines of code, including comments. The patch covers the ssp check on x86, including the case where SafeStack is used. LLVM contains one more stack protector implementation, in the SelectionDAG stage, which my patch does not modify.
For the microbenchmark, the test program calls a function $5 \cdot 10^9$ times; the function stores two bytes into a buffer, checks the canary, and returns:
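```c
/* Built with -fstack-protector-strong: the local array makes the compiler
   emit a canary store in the prologue and a check in the epilogue of fun(). */
void fun() {
    volatile char arr[10];   /* volatile keeps the stores from being optimized away */
    arr[0] = 'a';
    arr[1] = '\0';
}

int main(int argc, char *argv[]) {
    for (unsigned long i = 0; i < 5000000000UL; i++)   /* 5 * 10^9 calls */
        fun();
    return 0;
}
```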
One tested approach clears the canary with a simple `mov` instruction; another zeroes it with `xor`. With the `mov` variant, the resulting assembly for `fun` looks like this:
```nasm
fun:
    sub  rsp, 0x18
    mov  rax, QWORD [fs:0x28]       ; read the canary
    mov  QWORD [rsp+0x10], rax      ; store a copy in the frame
    mov  BYTE [rsp+0xf], 0x61       ; 'a'
    mov  BYTE [rsp+0xe], 0x0        ; '\0'
    mov  rax, QWORD [fs:0x28]
    cmp  rax, QWORD [rsp+0x10]      ; check the copy
    mov  QWORD [rsp+0x10], 0        ; inserted for the mov change (mov leaves the flags intact)
    jne  fun_ssp_fail
    add  rsp, 0x18
    ret
fun_ssp_fail:
    call __stack_chk_fail
```
The `xor` variant changes more code: the load, comparison, and branch of the epilogue are replaced by
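```nasm
    mov  rdx, QWORD [fs:0x28]       ; read the canary
    xor  rdx, QWORD [rsp+0x10]      ; rdx == 0 iff the stored copy is intact
    test rdx, rdx
    mov  QWORD [rsp+0x10], rdx      ; write back rdx, which is 0 on success
    jnz  fun_ssp_fail
```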
The `xor` approach is $(18.2 \pm 1.5)\,\%$ slower than not zeroing the canary. The diagram below shows that, over three test runs, the `mov` variant was $(0.8 \pm 1.1)\,\%$ faster than the non-zeroing version; since this difference is smaller than its uncertainty, it is not significant and can be attributed to random measurement jitter.
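Overheads such as these compare two mean run times. Assuming independent measurements and standard Gaussian error propagation (the thesis may have computed its uncertainties differently), the relative overhead $r$ and its uncertainty $\sigma_r$ follow from the mean run times $\bar t$ and their standard errors $\sigma = s/\sqrt{n}$ over $n$ runs:

$$
r = \frac{\bar t_{\mathrm{mod}}}{\bar t_{\mathrm{base}}} - 1,
\qquad
\sigma_r = \frac{\bar t_{\mathrm{mod}}}{\bar t_{\mathrm{base}}}
\sqrt{\left(\frac{\sigma_{\mathrm{mod}}}{\bar t_{\mathrm{mod}}}\right)^2 + \left(\frac{\sigma_{\mathrm{base}}}{\bar t_{\mathrm{base}}}\right)^2}
$$

Under this reading, a result whose magnitude is smaller than its uncertainty, such as the $(0.8 \pm 1.1)\,\%$ of the `mov` variant, is consistent with zero overhead.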
The benchmarks were executed on an Intel i7-5820K processor.
The patch was proposed upstream to LLVM.