Are PHP Applications Ready for Hack?

INTRODUCTION

PHP is a dynamically typed scripting language with no static type checking or variable declarations, and it is supported by an ecosystem of hundreds of predefined, ready-to-use libraries, modules and functions. Development in such a rich ecosystem proceeds at a faster pace: there is no need to compile, no need for complicated build files, and PHP code can be interwoven directly with HTML, making it very easy to develop custom websites and web development frameworks. Dropping the constraints of static type safety is intended to increase flexibility and reuse. For example, we can concatenate strings, integers or mixes of them using the same dot operator, or we can assign an array to a variable that previously contained a boolean value. During execution, or on different execution paths, a PHP variable may be assigned values of many different types; for example, a variable can first be assigned an integer, then a string, and perhaps later an array.
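
As a minimal illustration of the behavior described above, the following PHP fragment concatenates mixed operands with the dot operator and reassigns variables with values of different types. It is a constructed sketch, not code taken from the analyzed systems, and all variable names are hypothetical.

```php
<?php
// Illustrative only: constructed example, not from the systems studied.

// The dot operator concatenates strings and integers alike,
// converting integer operands to strings first.
$page = 3;
$label = 'page ' . $page;    // string . int
$a = 4;
$b = 2;
$digits = $a . $b;           // int . int yields the string "42"

// A variable that first holds a boolean can later hold an array.
$result = false;
$typesSeen = [gettype($result)];   // "boolean"
$result = ['id' => 7];
$typesSeen[] = gettype($result);   // "array"

// An integer, then a string, then an array in the same variable.
$value = 10;
$typesSeen[] = gettype($value);    // "integer"
$value = 'ten';
$typesSeen[] = gettype($value);    // "string"
$value = [10];
$typesSeen[] = gettype($value);    // "array"

echo $label, "\n", $digits, "\n", implode(', ', $typesSeen), "\n";
```

Each assignment is legal PHP, and no declaration ties `$result` or `$value` to a single type.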

However, such powerful language features also have a downside, and often come at the cost of late error identification, particularly in large codebases [5]. PHP is no exception in this respect. The benefits of static type checking have been well discussed in the literature [25], [27], [34]. A recent empirical study by Ray et al. evaluated the impact of programming language choice on the quality of software. The results of the study show that the quality of systems developed in statically typed languages is significantly better than that of systems developed in dynamically typed languages. Other studies, however, report a negative impact of statically typed languages on development time and productivity [30], [18]. Despite these benefits and the empirical support for statically typed languages, the popularity of dynamic languages is increasing [26], [6]. Meijer et al. elaborate on the need for a language that provides both safety and flexibility [23].

The desirability of having both a fast development pace and the capability of early error detection recently motivated engineers at Facebook to develop the Hack programming language [1]. Hack can be viewed as PHP with the addition of static typing, thus ensuring type safety. PHP code in the Facebook codebase is converted to Hack using a set of custom code modification tools [2]. However, the automated conversion may not have good coverage if the project uses certain dynamic features, such as global, eval, variable variables, and other aspects of PHP that are not recognized in Hack. While one may expect that, due to its static typing, Hack programs have better quality, fewer bugs and easier maintenance than PHP programs, to the best of our knowledge there is as yet no experimental evidence that this is the case. Thus, many PHP application teams may not yet be convinced to convert their programs to Hack, fearing that the conversion effort may not be worth the overhead.

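
To make the dynamic features named above concrete, the following constructed PHP fragment uses `global`, a variable variable, and `eval`. It is only a sketch with hypothetical names, not code from the analyzed systems; it shows why the variable being assigned, or the code being run, may be known only at run time.

```php
<?php
// Illustrative only: constructed example, not from the systems studied.

$counter = 0;

function increment()
{
    // `global` binds the local name to the variable in the global scope.
    global $counter;
    $counter++;
}

increment();

// Variable variable: the name of the assigned variable is a run-time
// string, so it cannot be read off the source text alone.
$field = 'title';
$$field = 'Hello';           // assigns to $title

// eval: the executed code is itself a run-time string.
$code = 'return 6 * 7;';
$answer = eval($code);

echo $counter, ' ', $title, ' ', $answer, "\n";
```
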
In light of this, we are interested in understanding to what extent dynamic typing is actually used in PHP programs, and for what purposes. Our intuition is that dynamic typing makes a program more difficult to understand, and thus we suspect that in production PHP programs very few variables actually change type during execution. If this is indeed the case, then the effort to convert a PHP application to Hack by eliminating dynamic typing through source code refactoring may not be very high. Our goal is to provide a practical approach for identifying, classifying and reporting violations of type safety in a PHP application and, armed with this knowledge, to leave the decision on what action to take to the PHP developers.

We perform an automated hybrid static and dynamic analysis of dynamic typing, complemented by manual refactoring and validation. We rely on dynamic analysis, using program instrumentation, to identify type changes at run time. We perform a static analysis to identify the scope of the variables. Finally, we manually verify whether type safety can be ensured by renaming variables through refactoring and then repeating the dynamic analysis. We apply the technique to four production open source PHP applications: phpBB2, phpBB3, Drupal, and WordPress. All examples discussed are taken from systems we analyzed. The main contributions of this paper lie in the empirical taxonomy of type changes, the approach to automatically detect and classify type changes, and the empirical validation on four production PHP applications.

Paper structure. Section II begins with a brief description of types in PHP and our proposed classification of type changes. Section III describes our approach to identifying and classifying type changes in PHP assignment statements using a combination of static and dynamic analysis of PHP code. Section IV reports on our empirical study, which aims to demonstrate the approach's feasibility and analyzes the results on four production PHP systems with respect to our three research questions. Following a discussion of related work in Section VI, Section VII concludes the paper and outlines directions for future work.
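
The idea of detecting type changes through instrumented assignments can be sketched as follows. This is a hypothetical illustration, not the tool used in this paper: the class `TypeChangeLog`, its methods, and the `scope::name` key format are assumptions made for the example, and the scope is passed in by hand rather than computed by static analysis.

```php
<?php
// Hypothetical sketch, not the authors' implementation: record the
// run-time type at each instrumented assignment and report variables
// for which more than one type was observed.

final class TypeChangeLog
{
    /** Distinct types seen so far, keyed by "scope::name". */
    private $seen = [];

    /**
     * Records the type of $value for the given variable and returns the
     * value unchanged, so a call can wrap the right-hand side of an
     * assignment.
     */
    public function record(string $scope, string $name, $value)
    {
        $key = $scope . '::' . $name;
        $type = gettype($value);
        if (!isset($this->seen[$key])) {
            $this->seen[$key] = [];
        }
        if (!in_array($type, $this->seen[$key], true)) {
            $this->seen[$key][] = $type;
        }
        return $value;
    }

    /** Variables for which more than one type was observed. */
    public function typeChanges(): array
    {
        return array_filter($this->seen, function (array $types): bool {
            return count($types) > 1;
        });
    }
}

$log = new TypeChangeLog();

// Instrumented form of:  $row = false;  $row = ['id' => 7];
$row = $log->record('main', 'row', false);
$row = $log->record('main', 'row', ['id' => 7]);

// Instrumented form of:  $count = 1;  $count = 2;  (no type change)
$count = $log->record('main', 'count', 1);
$count = $log->record('main', 'count', 2);

print_r($log->typeChanges());
```

In this sketch only `main::row` is reported, with the types `boolean` and `array`. Renaming the first use of `$row` to a separate variable and rerunning the instrumented program would leave each variable with a single observed type, which mirrors the rename-and-repeat validation step described above.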