Output-oriented Refactoring in PHP-based Dynamic Web Applications

INTRODUCTION

Refactoring refers to restructuring source code to improve software quality. It is important in the development of traditional programs as well as advanced Web applications. In a dynamic Web program, multiple versions of client-side code in HTML and JavaScript (JS) are generated at run time, for different usage scenarios, from server code written in a host language such as PHP. Client-side program entities are often embedded in string literals and/or computed via several statements in server-side code. While several existing tools support refactoring of traditional programs, automated refactoring support for dynamic Web code is still limited.

To study refactoring in dynamic Web code, we conducted an empirical study on four PHP-based Web applications, manually checking a total of 2,664 revisions. We found that developers perform a special kind of refactoring that is specific to dynamic Web programs. After such a refactoring, the server-side code either is more compact and modular, with less embedded and inline client-side HTML/JS code, or produces more standard-conforming client-side code. In either case, the client-side code output by the server code before and after the refactoring exhibits the same external behavior.

For example, developers often replace a portion of HTML/JS code embedded or inlined within PHP server code with new PHP code that produces the same client-side code. They also replace a long fragment of server code containing both PHP and inline HTML/JS with more dynamic PHP code that creates the same client-side code. Another popular type of refactoring is applied to client-side code embedded within PHP strings: for example, to make their HTML/JS code conform to Web code standards [25], developers change a tag name or add/delete opening/closing tags in HTML code embedded in a PHP string. We call these operations output-oriented refactorings.
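To make the idea concrete, the following minimal sketch contrasts a "before" and an "after" version of a page fragment and checks that both yield the same client-side output. It is written in Python purely as a stand-in for PHP server code; the menu entries and the names `render_before`, `render_after`, `MENU`, and `same_output` are hypothetical and are not taken from the studied applications.

```python
# Illustrative only: Python stands in for the PHP server code discussed above.


def render_before() -> str:
    """'Before' version: the menu is written out as static inline HTML."""
    return (
        "<ul>\n"
        '<li><a href="home.php">Home</a></li>\n'
        '<li><a href="news.php">News</a></li>\n'
        '<li><a href="about.php">About</a></li>\n'
        "</ul>\n"
    )


# Hypothetical menu data factored out of the repeated markup.
MENU = [("home.php", "Home"), ("news.php", "News"), ("about.php", "About")]


def render_after() -> str:
    """'After' version: the same markup is generated from data in a loop."""
    items = "".join(
        f'<li><a href="{href}">{label}</a></li>\n' for href, label in MENU
    )
    return "<ul>\n" + items + "</ul>\n"


def same_output() -> bool:
    """Check that the refactoring preserved the generated client-side code."""
    return render_before() == render_after()
```

The check here is textual equality, which is stricter than the "same external behavior" criterion used above: a standardization refactoring may change the output text while leaving its behavior unchanged.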
In total, we found 11 output-oriented refactoring operations, which we classify into 5 categories according to their purposes: 1) dynamicalization (e.g. replacing inline HTML/JS code with a PHP fragment or function), 2) restructuring server-side and client-side code (e.g. extracting and moving server code and inline JS code), 3) renaming embedded HTML/JS elements, 4) standardizing embedded HTML code (e.g. adding proper tags), and 5) refactoring for separation of concerns (e.g. separating JS code for control logic from HTML code for presentation). Our findings call for automated tools to support output-oriented refactorings.

We also introduce WebDyn, a refactoring tool that supports dynamicalization refactorings. After a user selects a portion of code in a PHP file (which might contain both PHP and embedded HTML/JS code), WebDyn checks the preconditions of the dynamicalization refactoring, such as whether the selection is syntactically correct and contains repetitive code. If the preconditions hold, WebDyn analyzes the selected code in order to partition and parameterize it, and then produces the resulting dynamic PHP code based on the detected partition and its parameterization. In our prior work [16], [17], we developed a tool for embedded code renaming and standardization refactorings. Refactorings for restructuring server and client code and for separation of concerns are left for future work. Our empirical evaluation on real-world projects showed that WebDyn achieves 100% accuracy in automatic dynamicalization refactorings.

Our key contributions include:

1. An empirical study that motivates tool support for output-oriented refactoring operations in dynamic Web applications,
2. WebDyn, an automated output-oriented refactoring tool,
3. An empirical evaluation showing WebDyn’s accuracy.
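As an illustration of the partition-and-parameterize step described for WebDyn above, the sketch below aligns repeated fragments token by token, turns the positions that differ into parameters, and emits a PHP loop that regenerates the fragments. This is not WebDyn's actual algorithm, which is not detailed in this section: the token-wise alignment, the equal-token-count precondition, and the names `tokenize`, `parameterize`, and `emit_php` are all assumptions made for illustration, and Python is used only to drive the sketch.

```python
import re
from typing import List, Optional, Tuple


def tokenize(fragment: str) -> List[str]:
    # Runs of word characters stay whole; every other character is one token.
    return re.findall(r"\w+|\W", fragment)


def parameterize(fragments: List[str]) -> Optional[Tuple[str, List[List[str]]]]:
    """Return (template, rows), or None when the assumed precondition fails.

    The template uses printf-style %s holes; rows[i] holds the values that
    turn the template back into fragments[i].
    """
    token_lists = [tokenize(f) for f in fragments]
    # Assumed precondition: at least two fragments with equal token counts.
    if len(token_lists) < 2 or len({len(t) for t in token_lists}) != 1:
        return None
    template_parts: List[str] = []
    rows: List[List[str]] = [[] for _ in fragments]
    for column in zip(*token_lists):
        if len(set(column)) == 1:
            # Shared text: keep it, escaping % so it survives formatting.
            template_parts.append(column[0].replace("%", "%%"))
        else:
            # Varying text: becomes a parameter of the template.
            template_parts.append("%s")
            for row, token in zip(rows, column):
                row.append(token)
    return "".join(template_parts), rows


def emit_php(template: str, rows: List[List[str]]) -> str:
    """Emit PHP source (as a string) that prints every fragment in a loop."""

    def php_str(text: str) -> str:
        # Single-quoted PHP literal: only backslash and quote need escaping.
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

    data = ", ".join(
        "[" + ", ".join(php_str(value) for value in row) + "]" for row in rows
    )
    return (
        f"<?php foreach ([{data}] as $row) {{ "
        f"echo vsprintf({php_str(template)}, $row); }} ?>"
    )
```

For three menu items that differ only in file name and label, `parameterize` yields one template with two holes, and `emit_php` wraps it in a `foreach` over the extracted values using PHP's `vsprintf`. A real tool would also need to parse the PHP and HTML/JS properly rather than rely on a flat token comparison.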