Thesis Details

Static Analysis of C Programs

Ph.D. Thesis Student: Malík Viktor Academic Year: 2023/2024 Supervisor: Vojnar Tomáš, prof. Ing., Ph.D.
Czech title
Statická analýza programů v jazyce C
Language
English
Abstract

This thesis proposes several original contributions to the area of static analysis of software with focus on low-level systems code written in C. The contributions are split into two parts, each related to a different area of static analysis, namely formal verification of (low-level) C code and static analysis of semantic equivalence of different versions of the same software. The first part proposes new analyses suitable for verification engines that perform automatic invariant inference using an SMT solver. The proposed solution includes two abstract template domains that use logical formulae over bit-vectors to encode the shape of the program heap and the contents of the program arrays. The shape domain is based on computing a points-to relation between pointers and symbolic addresses of abstract memory objects. The array domain is based on splitting the arrays into several non-overlapping contiguous segments and computing a different invariant for each of them. Both domains can be combined with value domains in a straightforward manner, which particularly allows our approach to reason about shapes and contents of heap and array structures at the same time. The information obtained from the analyses can be used to prove memory safety and reachability properties, expressed by user assertions, of programs manipulating data structures. All of the proposed solutions have been implemented in the  2LS framework and compared against state-of-the-art tools that perform the best in the relevant categories of the well-known Software Verification Competition (SV-COMP). Results show that 2LS outperforms these tools on benchmarks requiring combined reasoning about unbounded data structures and their numerical contents. The second part of the thesis is motivated by existence of software projects that undergo regular refactorings and modifications and yet need to ensure semantic stability of some of their core parts. This part proposes a highly-scalable approach for automatically checking semantic equivalence of different versions of large, real-world C projects, with a particular focus on the Linux kernel. The proposed method uses a novel combination of pattern matching with light-weight static analysis and control-flow transformations. The method checks preservation of the semantics of functions forming the API of the project being analyzed as well as of the semantics of its global variables, which typically hold various control parameters. For the latter, a specialised slicing procedure is proposed to slice out code influenced by these variables and concentrate the further analysis on that code only. Although the method cannot prove equivalence on heavily refactored code, it can compare thousands of functions in the order of minutes while producing a low number of false non-equality verdicts as our experiments show. The method has been implemented over the LLVM infrastructure in a tool called DiffKemp. Our results show that DiffKemp, unlike other existing tools, gives practically useful results even on projects of the size of the Linux kernel.

Keywords

static analysis, formal verification, formal methods, shape analysis, array abstract domain, template-based invariant synthesis, abstract interpretation, abstract domains, SAT/SMT solving, loop invariants semantic equivalence, refactoring, pattern matching, semantics- preserving patterns, code change patterns, refactoring patterns, Linux kernel, program slicing, LLVM IR

Department
Degree Programme
Computer Science and Engineering, Field of Study Computer Science and Engineering
Files
Status
defended
Date
5 March 2024
Citation
MALÍK, Viktor. Static Analysis of C Programs. Brno, 2023. Ph.D. Thesis. Brno University of Technology, Faculty of Information Technology. 2024-03-05. Supervised by Vojnar Tomáš. Available from: https://www.fit.vut.cz/study/phd-thesis/1059/
BibTeX
@phdthesis{FITPT1059,
    author = "Viktor Mal\'{i}k",
    type = "Ph.D. thesis",
    title = "Static Analysis of C Programs",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2024,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/phd-thesis/1059/"
}
Back to top