DRC: A Detection Tool for Dangling References in PHP-Based Web Applications

INTRODUCTION

To support multiple configurations and enhance user interactivity in a Web application, developers often write Web programs in a dynamic scripting language to generate different versions of the client page at run time. PHP is a widely used language for creating such dynamic Web applications. However, because PHP is an interpreted language, PHP programs are prone to certain types of programming errors that are difficult to detect before run time. In this paper, we are interested in the static detection of dangling reference errors in PHP programs. A reference to a program entity (e.g., a variable or a function call) becomes dangling at run time if the entity has not been declared in the current execution [1], resulting in run-time errors.

Let us illustrate this type of error through an example. Figure 1 shows an example of a PHP-based Web application. The file Login.php is used to generate a login page with an HTML form named loginform (line 12), an input field userid (line 16 or 21), and a button (line 17 or 22) that lets a user submit a user ID to the page VerifyUser.php for verification. The JavaScript (JS) function validate (line 6) is used to catch an empty input on the onsubmit event of the form (line 12). The text in the login form can be displayed in different languages depending on the configuration (defined via the variable $lang on line 13). If the language option is ‘de’ (line 14) and a dictionary for German is available (line 15), the dictionary variable $dict_de will be used to display the text in German. Otherwise, if the language option is ‘en’ (line 20), the text will be displayed in English. When the Login.php program is executed on the server side, the corresponding client-side code in HTML/JS is generated. Figure 2 shows the client-side HTML/JS code, which is the PHP program’s output, corresponding to the German language option. When that client-side code is run in the client’s browser, the user can enter his/her user ID.
The user ID is first validated by the JS function validate. If it is not an empty string, it is submitted to the server and processed by the file VerifyUser.php (Figure 3). The PHP variable $_REQUEST (line 1) contains the submitted user ID. Then, a query is sent to the database (by a call to mysql_query at line 3) to look up the user’s information (using the fields firstname and lastname in the SELECT clause of the SQL query). If that information is found, the user’s first name, middle initial, and last name are displayed (line 6).
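
The relationship between the fields that the query selects and the fields that the page displays can be checked mechanically. The following Python sketch is only an illustration and is not part of the tool described in this paper: the query text, the table name users, and the helper names are assumptions, since Figure 3 is not reproduced here. It extracts the column list of a simple SELECT statement with a regular expression and reports the displayed fields that the query does not select.

```python
import re

# Hypothetical query text: the table name and WHERE clause are assumptions
# made for illustration; only the selected fields come from the description.
QUERY = "SELECT firstname, lastname FROM users WHERE userid = 'u1'"

# Fields that the page reads from the result row when displaying the user.
DISPLAYED_FIELDS = ["firstname", "middleinitial", "lastname"]


def selected_fields(query):
    """Return the column names listed between SELECT and FROM."""
    match = re.search(
        r"select\s+(.*?)\s+from\s", query, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",")]


def unselected_fields(query, used):
    """Return the used fields that the query does not select, in order of use."""
    available = set(selected_fields(query))
    return [name for name in used if name not in available]


if __name__ == "__main__":
    print(unselected_fields(QUERY, DISPLAYED_FIELDS))
```

The sketch handles only a plain comma-separated column list; it does not treat SELECT *, aliases, or nested queries, which a real analysis would have to consider.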

Dangling Reference Errors. The above program contains several dangling references that are exposed in only certain program executions. In Figure 1, the PHP variable $input is declared either on line 16 (in the execution where the conditions $lang == ‘de’ (C1) and isset($dict_de) (C2) both evaluate to true) or on line 21 (in the execution where the condition C1 is false and $lang == ‘en’ (C3) is true). It is then used on line 24 in all program executions. Thus, the PHP reference $input on line 24 is undefined in the two cases of execution satisfying: 1) C1 is true and C2 is false, or 2) C1 is false and C3 is also false.

Two other dangling reference errors are 1) the reference to the HTML input userid from JS code (line 7, Figure 1), and 2) the reference to the same input from PHP code (line 1, Figure 3). These references exist in all executions, whereas the corresponding declaration exists for only certain configuration options (lines 16 and 21, Figure 1). We call them embedded dangling references since they are embedded in PHP string values (line 7, Figure 1 and line 1, Figure 3). Another error is that the reference to the database entity middleinitial (line 6, Figure 3) is undefined, since the field middleinitial is not listed in the SELECT clause of the SQL query (line 3). We also call it an embedded dangling reference.

These dangling reference errors can lead to unexpected behavior of the program at run time: the first error results in an empty login form, the second and third errors change the behavior of the JS function validate and the file VerifyUser.php, and the fourth one either displays incomplete user information or produces unwanted error messages on the client page. In general, the issues caused by such dangling references range from warning messages and incorrect behavior to unexpected run-time errors and security vulnerabilities. Since dangling references in PHP programs can cause run-time errors, it is desirable to detect them early.
However, because a program has numerous execution paths, it is nontrivial to determine which execution paths contain a reference that does not have a corresponding declaration. In addition, some dangling references are embedded within PHP string literals or the values of variables. To identify such references, a tool needs to understand the semantics of the embedded code, which often consists of incomplete code fragments. Moreover, the declarations and references can be cross-language (e.g., a JS variable referring to an HTML input, or a PHP reference referring to an SQL entity, as in the example).
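
The case analysis for $input in the example can be reproduced by brute-force enumeration of the three conditions. The Python sketch below is a simplified illustration under stated assumptions, not the detection algorithm of this paper: it treats C1, C2, and C3 as independent Boolean values (in the real program C1 and C3 cannot both hold, since both test $lang), and the function names are hypothetical.

```python
from itertools import product


def input_is_declared(c1, c2, c3):
    """Mirror the branch structure described for Login.php.

    c1: $lang == 'de', c2: isset($dict_de), c3: $lang == 'en'.
    """
    if c1:
        # German branch: $input is declared (line 16) only if the
        # dictionary is available.
        return c2
    # Otherwise $input is declared (line 21) only if the English
    # option is selected.
    return c3


def dangling_paths():
    """List every (c1, c2, c3) assignment where $input is used undeclared."""
    return [
        (c1, c2, c3)
        for c1, c2, c3 in product([True, False], repeat=3)
        if not input_is_declared(c1, c2, c3)
    ]


if __name__ == "__main__":
    for c1, c2, c3 in dangling_paths():
        print(f"C1={c1} C2={c2} C3={c3}: $input is dangling on line 24")
```

Every reported assignment falls into one of the two cases named in the text: C1 true with C2 false, or C1 false with C3 false. With n independent conditions this enumeration visits 2^n assignments, which is one way to see why checking each execution path separately does not scale.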