Dissertation Defense: Yuxi Chen
Performance-friendly Concurrency Bug Failure Recovery and
Fixing
Concurrency bugs are widespread and severely threaten system
reliability. Their non-deterministic nature makes them difficult
for developers to avoid, diagnose, and fix. In the past, much
research has been done to combat concurrency bugs during the early
phases of the software life cycle. For example, researchers have
designed new languages (e.g., Go) and new static analysis tools to
help avoid some concurrency bugs during code design and
implementation, and many testing techniques and dynamic tools have
been proposed to expose concurrency bugs during in-house testing.
However, none of these
techniques are perfect. Many concurrency bugs still slip into
deployment, cause production failures, and incur huge costs for code
maintenance, such as the 60 million ether theft in the Ethereum system.
Consequently, in addition to tools for the early phases of the
software life cycle, automated tools that help combat concurrency
bugs in the late phases are highly desirable.
To help combat concurrency bugs in the late phases of the software
life cycle, this dissertation addresses two problems: (1) During
production deployment, how can software survive failures caused by
concurrency bugs with low overhead? (2) During software maintenance,
how can concurrency bugs be fixed automatically so that the patches
are correct and simple, without unnecessary performance degradation?
In other words, this dissertation aims to improve the reliability of
multi-threaded programs through efficient concurrency bug failure
recovery and fixing.
Along the direction of failure recovery, we present BugTM, an approach
that applies transactional memory techniques for failure recovery in
production runs. Requiring no knowledge of where concurrency bugs
are, BugTM uses static analysis and code transformation to enable
BugTM-transformed software to recover from a concurrency-bug failure
by rolling back and re-executing the recent history of the failing
thread. BugTM greatly improves on the recovery capability of
state-of-the-art techniques with low run-time overhead and no changes
to the OS or hardware, while guaranteeing not to introduce new bugs.
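The rollback-and-re-execute idea can be illustrated with a small,
single-threaded simulation. This is only a sketch under stated
assumptions, not BugTM itself: the names run_with_rollback,
RecoverableFailure, and between_attempts are hypothetical, and the
deep copy of the state is a software stand-in for the bookkeeping
that transactional memory would do.

```python
import copy


class RecoverableFailure(Exception):
    """Hypothetical marker for a point where the program would otherwise crash."""


def run_with_rollback(region, state, max_retries=3, between_attempts=None):
    """Run region(state); on failure, undo its writes and re-execute it."""
    for _ in range(max_retries + 1):
        snapshot = copy.deepcopy(state)   # stand-in for starting a transaction
        try:
            return region(state)
        except RecoverableFailure:
            state.clear()
            state.update(snapshot)        # roll back the region's writes
            if between_attempts is not None:
                between_attempts(state)   # models other threads making progress
    raise RecoverableFailure("retries exhausted")


def use_buffer(state):
    state["reads"] += 1
    if state["buffer"] is None:           # order violation: used before initialized
        raise RecoverableFailure("buffer not initialized")
    return len(state["buffer"])


def initializer(state):
    state["buffer"] = [1, 2, 3]           # the write the failing thread raced with


shared = {"buffer": None, "reads": 0}
result = run_with_rollback(use_buffer, shared, between_attempts=initializer)
```

In this simulation the first attempt fails, its increment of "reads"
is undone, the initializer runs, and the second attempt succeeds as
if the failing interleaving had never happened.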
Along the direction of efficient concurrency bug fixing, we present
BFix, a tool that automatically generates computation-bypassing
patches for some concurrency bugs. Given a bug report, BFix first uses
static analysis to check whether the bypassing strategy is suitable
for the reported bug and, if so, constructs a patch. It further tries
to combine its patches for better performance and code readability. We
have compared BFix patches with bypassing patches manually developed
by programmers, as well as with patches generated by state-of-the-art
auto-fixing tools. The experimental results show that BFix patches
are similar in quality to the manual patches and outperform
auto-generated patches from previous tools in terms of patch
performance and patch simplicity.
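To make the bypassing strategy concrete, the following hand-written
sketch contrasts a buggy function with a bypassing-style fix. It is
an illustration of the general idea only, not output produced by
BFix; the Connection class and both record functions are hypothetical
names chosen for this example.

```python
class Connection:
    def __init__(self):
        self.log = []              # close() sets this to None, possibly from another thread

    def close(self):
        self.log = None


def record_buggy(conn, msg):
    # Raises AttributeError if close() ran before this call.
    conn.log.append(msg)


def record_bypassing(conn, msg):
    log = conn.log                 # read the shared field exactly once
    if log is None:                # bug-triggering context: already closed
        return False               # skip the computation instead of taking a lock
    log.append(msg)
    return True


conn = Connection()
first = record_bypassing(conn, "a")    # normal context: message is recorded
recorded = list(conn.log)
conn.close()
second = record_bypassing(conn, "b")   # bug-triggering context: call is bypassed
```

The patched function adds no lock; it only skips the append when the
connection has already been closed, which is acceptable only when
dropping that work is semantically safe for the program.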
These two approaches can greatly improve software reliability by
tackling detected or undetected concurrency bugs in the late phases of
the software life cycle. BugTM leverages transactional memory
techniques for concurrency bug failure recovery without intensively
using checkpointing. BFix tries to skip certain parts of the original
computation in bug-triggering contexts without using time-consuming
synchronization primitives (e.g., locks). Both achieve good
efficiency, which makes them good choices for developers combating
concurrency bugs in the late phases of the software life cycle.
Yuxi Chen
Yuxi's advisor is Prof. Shan Lu