Detecting Security Vulnerabilities in Object-Oriented PHP Programs

INTRODUCTION

Static analysis is often used to detect security vulnerabilities in programs because it enables a security analyst to reason about a program without executing it [1]. Security review tools usually scan the whole program and report the vulnerabilities found in the code [2]. For PHP, many tools, such as Pixy [3], RIPS [4], PHPSAFE [5], and Weverca [6], have been developed to detect such vulnerabilities. However, certain features of PHP, such as include calls, dynamic arrays, and dynamic object-oriented programming (OOP) features, pose major challenges for these approaches. In particular, PHP's object model makes it difficult for these tools to reason about the semantics of OOP features: object properties do not have to be declared before they are accessed. Therefore, most PHP security tools either do not support OOP features or support them only partially, at the cost of precision and soundness. This limited support cripples the ability of such tools to detect vulnerabilities in a large class of PHP applications, going back to PHP 5, when OOP features were first introduced.

In this paper, we present OOPIXY, a static analysis tool that detects security vulnerabilities in PHP programs that use OOP features. OOPIXY uses inter-procedural data-flow analysis [7] to track the values of variables and to analyze dynamic data structures such as objects and arrays. OOPIXY reports various types of security vulnerabilities, including cross-site scripting (XSS) [8], SQL injection [9], remote code execution, remote command execution, and XPath injection [10]. OOPIXY extends the open-source static code scanner Pixy [3] with OOP support. This support enables OOPIXY to analyze more recent PHP programs (PHP 5 and onward), whereas the original PIXY analyzer supports only PHP 4. Moreover, OOPIXY can detect a wider range of vulnerabilities than PIXY, which detects only XSS and SQL injection vulnerabilities. To evaluate OOPIXY, we conducted two experiments: one using micro-benchmarks, and the other a case study analyzing real-world open-source PHP applications. Our implementation also provides a convenient visualization of the detected vulnerabilities.

BACKGROUND

A. Overview of PIXY

PIXY is an open-source static analysis tool for PHP programs. The tool is implemented in Java and scans PHP 4 programs for XSS and SQL injection vulnerabilities. Figure 2 shows the original architecture of PIXY. First, the tool parses the input program into an intermediate representation called P-Tac, which is similar to classical three-address code (TAC) [11]. PIXY then converts the resulting parse tree into a control-flow graph (CFG) using a module called TAC Converter. Afterwards, PIXY runs an alias analysis to collect alias information for variables, followed by a literal analysis that uses the collected alias information to track the values of variables throughout the program. Finally, PIXY runs a taint analysis to determine tainted variables (i.e., variables that hold private information). Jovanovic et al. [3] presented the first version of PIXY, which could detect only XSS vulnerabilities in simple PHP programs. The same authors later described an extension of PIXY that models PHP aliasing [7]. PIXY has also received several other upgrades over the years, such as support for detecting SQL injection vulnerabilities.

B. Data-Flow Analysis

Data-flow analysis algorithms follow the propagation of data throughout a program by traversing the CFG and marking where data values are generated and where they are used [3]. Security review tools use this information to determine whether private data leaks outside the program without proper sanitization. The analysis uses a lattice to represent the collected information, such that the information associated with any CFG node is an element of the lattice. Each CFG node is also associated with a transfer function that takes a lattice element as input and returns a lattice element as output. Each transfer function models the semantics of its corresponding CFG node with respect to the collected information.
The analysis applies the transfer functions to propagate the information through the program and combines information at merge points (e.g., after if conditions). Data-flow analyses make different trade-offs between precision and scalability. A flow-sensitive analysis considers the ordering of program instructions, whereas a flow-insensitive analysis treats a program as a set of unordered instructions. Inter-procedural analyses handle function calls, while intra-procedural analyses operate within a single function. Context-sensitive analyses distinguish between different call sites of a function, unlike context-insensitive analyses, which merge the information computed for different calls to the same function. Hence, the highest precision is achieved by an analysis that is flow-sensitive, inter-procedural, and context-sensitive [3]. However, inter-procedural analysis sometimes sacrifices precision to achieve scalability (e.g., when handling recursive function calls).

Fig. 3. The architecture of OOPIXY. The components with dashed lines represent modifications or additions to the original architecture of PIXY.
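
To make the lattice, transfer functions, and merge step concrete, the following is a minimal sketch of an intra-procedural taint analysis over a toy CFG. It is written in Python for brevity and is not the implementation used by PIXY or OOPIXY. The node format `(op, target, args)`, the operation names (`source`, `sanitize`, `assign`, `sink`, `nop`), and all function names are assumptions introduced only for illustration. The sketch also contrasts a flow-sensitive worklist analysis with a flow-insensitive one on the same straight-line program.

```python
from collections import deque

# Hypothetical two-point taint lattice per variable: False (untainted) <= True
# (tainted). A state maps variable names to flags; a missing name is untainted.

def join(a, b):
    """Least upper bound: a variable is tainted if it is tainted in either state."""
    return {v: a.get(v, False) or b.get(v, False) for v in a.keys() | b.keys()}

def transfer(node, state):
    """Transfer function: the effect of one CFG node on the taint state."""
    op, target, args = node
    out = dict(state)
    if op == "source":          # e.g., reading user input
        out[target] = True
    elif op == "sanitize":      # e.g., escaping a value before output
        out[target] = False
    elif op == "assign":        # tainted if any operand is; no operands = constant
        out[target] = any(state.get(a, False) for a in args)
    return out                  # "sink" and "nop" leave the state unchanged

def analyze(nodes, edges, entry):
    """Flow-sensitive worklist analysis; returns the state before each node."""
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)
    state_in = {n: {} for n in nodes}
    worklist, visited = deque([entry]), set()
    while worklist:
        n = worklist.popleft()
        visited.add(n)
        out = transfer(nodes[n], state_in[n])
        for s in succs[n]:
            merged = join(state_in[s], out)   # combine at merge points
            if merged != state_in[s] or s not in visited:
                state_in[s] = merged
                worklist.append(s)
    return state_in

def analyze_flow_insensitive(nodes):
    """Ignore statement order: accumulate one global state until it is stable."""
    state, changed = {}, True
    while changed:
        changed = False
        for node in nodes.values():
            new = join(state, transfer(node, state))
            if new != state:
                state, changed = new, True
    return state

def tainted_sinks(nodes, state_in):
    """Sink nodes whose argument may be tainted on entry."""
    return [n for n, (op, target, _) in nodes.items()
            if op == "sink" and state_in[n].get(target, False)]

# Branching program: x is sanitized on only one branch, then printed.
branch_nodes = {
    0: ("source", "x", []),
    1: ("nop", None, []),        # if condition
    2: ("sanitize", "x", ["x"]), # then branch
    3: ("nop", None, []),        # else branch
    4: ("sink", "x", []),        # merge point
}
branch_edges = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]

# Straight-line program: x is overwritten with a constant before the sink.
line_nodes = {0: ("source", "x", []), 1: ("assign", "x", []), 2: ("sink", "x", [])}
line_edges = [(0, 1), (1, 2)]

if __name__ == "__main__":
    print(tainted_sinks(branch_nodes, analyze(branch_nodes, branch_edges, 0)))  # [4]
    print(tainted_sinks(line_nodes, analyze(line_nodes, line_edges, 0)))        # []
    print(analyze_flow_insensitive(line_nodes)["x"])                            # True
```

In the branching program, the join at the merge point keeps `x` tainted because one path skips sanitization, so the sink is reported. In the straight-line program, the flow-sensitive analysis sees that `x` is overwritten before the sink and reports nothing, whereas the flow-insensitive variant still considers `x` tainted, which illustrates the precision lost by ignoring instruction order.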