tsantalis/jdeodorant-commandline

This is an Eclipse plug-in that runs JDeodorant in batch mode to identify refactoring opportunities and apply them.

Running the headless mode within Eclipse

You can run this application from within Eclipse. Please follow these steps:

  1. Download (or clone) the jdeodorant-commandline and JDeodorant plug-ins and import them as existing projects into your Eclipse workspace.

  2. Right-click on the JDeodorant-Commandline project and select Run As > Run Configurations...

  3. Click on Eclipse Application and then on the New launch configuration button. Give a name to the newly-created launch configuration.

  4. In the Main tab:

    • In the Workspace Data, set the Location to point to the workspace containing the projects that you want to analyze in the headless mode. The projects which are going to be analyzed will be opened in this workspace. There are two options to open Java projects (that you are going to analyze) in the workspace:

      • The workspace directory is created by Eclipse. In this case, it can be created by clicking on File > Switch Workspace and specifying a new workspace directory, and then creating a new project (or importing the existing one) to Eclipse. You can import multiple projects that you want to analyze. After you are done, you should switch back to the original workspace where JDeodorant and jdeodorant-commandline plug-ins are imported.
      • You can ask the tool to try importing an existing Eclipse project automatically. In this case, the workspace is created in the given path by the tool, and the project is imported to it. You'll need to use the -pd switch to specify the path to the .project file of the project (See the table below).

      Note that, in any case, Eclipse project files should exist for the Java project that you want to analyze.

    • In the Program to Run select to Run an application and from the drop-down list select ca.concordia.jdeodorant.eclipse.commandline.application.

  5. In the Arguments tab specify the Program arguments (refer to the following table).

  6. Next, specify the VM arguments as -Xms128m -Xmx4096m -XX:PermSize=128m (you can increase the Xmx value, if more memory is available).

  7. In the Plug-ins tab first select plug-ins selected below only in the Launch with: drop-down list. Then select ca.concordia.jdeodorant.eclipse.commandline (1.0.0.qualifier) and click on Add Required Plug-ins button.

  8. Apply the changes to save the new Launch Configuration. Click Run to test whether the headless plug-in works properly. If you get BundleExceptions, go back to the Plug-ins tab (step 7) and select Launch with: all workspace and enabled target plug-ins. Apply the changes and run the headless plug-in again.

Running as a standalone command-line application

We have provided the necessary means for generating an Eclipse product that can be run from the OS command-line as a standalone executable, without the need for opening Eclipse for running. This is particularly useful if, for instance, one needs to integrate JDeodorant in their current development workflow (e.g., using continuous integration).

The Eclipse product is an executable file along with the necessary plug-in dependencies. The entire package can be generated by Eclipse, one for each platform. We have tested the product on Windows and Mac.

To generate the executable for your target platform, follow these steps:

  1. Download (or clone) the jdeodorant-commandline and JDeodorant plug-ins and import them as existing projects into your Eclipse workspace (you will need Eclipse only to generate the Eclipse product, which runs from the OS command line).

  2. In the commandline project, double-click on ProductConfiguration.product. The Product Configuration Editor should open. (If it does not, your Eclipse installation might be missing the necessary plug-ins. We tested on Eclipse IDE for Java EE Developers.)

  3. If you need to configure the generated product, you can use the Configuration and Launching tabs, which allow changing parameters for the generated product for different platforms. For instance, you might want to change the eclipse.ini file that the target product will use, or provide additional VM arguments.

  4. From the Overview tab, under the Exporting section, choose Eclipse Product Export Wizard.

  5. In the shown wizard, /JDeodorant-Commandline/ProductConfiguration.product should be selected as Configuration. Specify a directory under the Destination section, and click Finish.

  6. A folder containing the final Eclipse Product will be created. Look for the file \eclipse\eclipse.exe or \MacOS\eclipse, which is the executable for the product.

  7. Open a command line, switch to the folder containing the executable file (found in the previous step) and run the product's executable. You should provide necessary arguments, as mentioned in the following table. For instance, you can run (on Windows):

eclipse.exe -pd "TestProject/.project" -x "clones.xls" -m PARSE_AND_ANALYZE ...
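
When the product is called from a build script or a continuous integration job, the command line above can be assembled programmatically. The following Python sketch is only an illustration: the helper names `build_command` and `run_product` are hypothetical and not part of the tool, and the paths are the placeholder values from the example above. It uses only the `-pd`, `-x` and `-m` options documented in this README.

```python
import subprocess


def build_command(executable, project_file, excel_file, mode, extra=()):
    """Assemble the argument list for the standalone product.

    Hypothetical helper: it only joins documented options
    (-pd, -x, -m) with the values supplied by the caller.
    """
    command = [executable, "-pd", project_file, "-x", excel_file, "-m", mode]
    command.extend(extra)
    return command


def run_product(command):
    """Launch the product and raise an error if it exits with a non-zero code."""
    subprocess.run(command, check=True)


# Build (but do not launch) a command for analyzing an existing Excel file.
cmd = build_command("eclipse.exe", "TestProject/.project", "clones.xls", "ANALYZE_EXISTING")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.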

Command-line arguments

These arguments can be passed in step 5 (headless mode within Eclipse) or step 7 (standalone mode).

| Long option | Short option | Arguments | Description |
| --- | --- | --- | --- |
| --help | -? | | Displays arguments and their explanations |
| --mode | -m | analyze_existing<br>parse_and_analyze<br>parse | Mode of operation. See below for more information |
| --project | -p | {project name} | Name of the project which currently exists in the Eclipse workspace |
| --project-description | -pd | {.project file} | Alternative to `-p`; path to the `.project` file of the Eclipse project to be imported to the workspace |
| --excelfile | -x | {path/to/the/xls/file} | Path to the input (output, in the PARSE mode) .xls file |
| --tool | -t | clone_tool_ccfinder<br>clone_tool_clonedr<br>clone_tool_conqat<br>clone_tool_deckard<br>clone_tool_nicad | Specifies the clone detection tool |
| --tooloutputfile | -i | {path/to/the/input/file} | Path to the main output file of the clone detection tool |
| --extra-args | -xargs | {arg1, arg2, ...} | Comma-separated list of extra arguments that are needed for specific clone detection tools. See below for more information. |
| --row-start-from | -r | {row} | Specifies the row number (starting from 2; row 1 is the header) from which the tool must start the analysis. |
| --append-results | -a | | Specifies whether new results must be appended to the existing outputs (Excel file, CSV files) or the outputs must be overwritten. |
| --skip-groups | -s | {group_id1, group_id2, ...} | A comma-separated list of clone group IDs to be skipped from the analysis. |
| --test-packages | -testpkgs | {package1, package2, ...} | A comma-separated list of the fully-qualified names of the packages containing test code. |
| --test-source-folders | -testsrcs | {folder1,folder2,...} | A comma-separated list of the source folder names containing test code. This is similar to the previous argument. |
| --run-tests | -rt | | Run tests after applying each refactoring. |
| --log-to-file | -l | | Create a log file from console output. |
| --group-ids | -g | {id1, id2, id3, ...} | A comma-separated list of clone group IDs to be analyzed. Other clone groups in the file will be skipped |
| --debugging-enabled | -de | | Prevents the command-line tool from canceling jobs queued in the Eclipse JobManager (such as workbench jobs), so that debugging is possible in Eclipse |
| --mail-server-ip | -msrvr | {Mail server address}<br>*127.0.0.1* | Email server for sending emails after the analysis has finished |
| --mail-server-port | -mport | {Mail server port}<br>*25* | Email server port, see previous option |
| --mail-server-security-type | -msectype | NONE<br>SSL<br>STARTLS | Security type for mail server |
| --mail-server-authenticated | -mauth | | Whether the SMTP server requires authentication |
| --mail-server-user-name | -muser | {Mail server user name} | SMTP user name |
| --mail-server-password | -mpass | {Mail server password} | SMTP password |
| --email-addresses | -em | {email1, email2, ...} | A comma-separated list of email addresses to which the analysis notifications should be sent |

Note: The bold-faced options are mandatory. Italic arguments are default values.

Mode of Operation

The headless application works in three different modes, which are explained in the following table. To run the tool in one of these modes, pass the corresponding value to the --mode (or -m) argument.

| Value for --mode argument | Description |
| --- | --- |
| PARSE | In this mode, the output file of a clone detection tool is parsed into an Excel file. You must give the path to the Excel file using the --excelfile (or -x) argument. You must also provide the name of the clone detection tool (using the --tool argument), the path to the input file (the output of the clone detection tool, using the -i argument), and, for some specific clone detection tools, extra arguments (using -xargs). See below for more info. |
| ANALYZE_EXISTING | In this mode, the tool analyzes an existing Excel file. Again, the path to the Excel file must be given using the --excelfile (or -x) argument. The results of the analysis will be written in the same folder as the input Excel file. |
| PARSE_AND_ANALYZE | This mode first parses the output of the clone detection tool, and then analyzes the parsed Excel file. All the arguments of the PARSE mode must also be provided in this mode. |
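
The per-mode requirements above can be checked before launching a long analysis. The following Python sketch encodes only what the table states; the names `REQUIRED_BY_MODE` and `missing_options` are hypothetical, and the tool itself may enforce further requirements (for example, a project given with `-p` or `-pd`, or extra arguments for specific clone detection tools).

```python
# Options that the mode table above lists as required, keyed by mode name.
# This mapping is an illustration, not taken from the tool's source code.
REQUIRED_BY_MODE = {
    "PARSE": {"--excelfile", "--tool", "--tooloutputfile"},
    "ANALYZE_EXISTING": {"--excelfile"},
    "PARSE_AND_ANALYZE": {"--excelfile", "--tool", "--tooloutputfile"},
}


def missing_options(mode, provided):
    """Return the required long options that were not provided, sorted by name."""
    required = REQUIRED_BY_MODE[mode.upper()]
    return sorted(required - set(provided))


print(missing_options("parse", ["--excelfile"]))
```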

The input (and output) Excel files

The input Excel file must be in Excel 97-2003 (.xls) format. Please note that the tool cannot handle .xlsx files. The first row of the Excel file is used as the header row. For the analysis, the input Excel file must already contain the information for some of the columns, while the cells of the other columns will be filled in during the analysis.

In the Excel file, each row is for one clone. Each clone is a code fragment which is detected to be duplicated in another part of the system. Several clones in the consecutive rows belong to one clone group. Hence, each possible pair of clones inside a clone group are code fragments that are duplicated. The row corresponding to the first clone of every clone group contains some information about the clone group, including values for Clone Group Size, Clone Group Info and Connected columns.

| Column | Description |
| --- | --- |
| Clone Group ID | An integer assigned to every clone group. All the clones inside one clone group have the same value in this cell, which is the ID of the clone group to which they belong. |
| Source Folder | The source folder of the class file to which this clone belongs. |
| Package | Fully qualified path to the package of the class file to which this clone belongs. |
| Class | Name of the class file to which this clone belongs. |
| Method | Name of the method in which this clone exists. Please note that there is currently no support for clones outside the boundaries of methods. |
| Method Signature | Signature of the method in which this clone exists, in the Bytecode format. |
| Start Line, End Line, Start Offset, End Offset | Starting and ending lines and offsets of the clone fragment. |
| #PDG Nodes | Number of PDG nodes in the method in which this clone exists. This column will be filled after analysis on this clone is done. |
| #Statements | Number of statements in the code fragment that is reported to be a clone. This column will be filled after analysis on this clone is done. |
| Line coverage | Percentage of the lines of the code fragment that are covered by unit tests. |
| Clone Group Size | Number of the clones in the clone group. This value only appears in the first row of the clone group. |
| Clone Group Info | Type of the clone group. It might be Repeated when the entire clone group is repeated, or Subclone when the clones in this clone group are sub-clones or super-clones of clones in another clone group. In these two cases, our tool will skip the clone group for analysis. |
| Connected | If the value of the previous cell is Subclone, this cell contains the clone group ID of the clone group of which this clone group is a sub-clone (or super-clone). |
| Clone Pair Location | Location of the clones in the clone group. Clones could be in the same method, in the same class, or in different classes. |
| #Refactorable Pairs | Number of refactorable pairs in the clone group, which is calculated after the analysis. |
| Details | Each pair of clones in every clone group is analyzed by the tool. When the analysis has finished, this column and the following columns in the same row contain hyperlinks to the HTML reports of the analysis of the clone pairs formed by the clone in this row and all other clones in the same clone group. The name of the hyperlink is in the format {clone group ID}-{first clone number}-{second clone number}. |

In the Details cells, a green background means that the corresponding clone pair is refactorable, and a red background means that the clone pair is not refactorable. A white background shows that the clone was not analyzed. This happens when:
  • A clone is a class-level clone, meaning that the clone that is reported by the clone detection tool goes beyond the boundaries of a method, or
  • A clone is a repeated clone, or
  • The user has marked the clone group corresponding to this clone to be skipped (using --skip-groups (-s)), or
  • No method was found in the given code region that was reported by the clone detection tool, or
  • No common nesting structure was found for the clone pair.
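
Since every possible pair of clones inside a clone group is analyzed, a group with n clones yields n(n-1)/2 report links. The following Python sketch enumerates the link names in the {clone group ID}-{first clone number}-{second clone number} format. The function name `pair_names` is hypothetical, and numbering the clones from 1 is an assumption that is not confirmed by this README.

```python
from itertools import combinations


def pair_names(group_id, clone_count, first_number=1):
    """List the report link names for all clone pairs in one clone group.

    Assumption: clones are numbered consecutively, starting at first_number.
    """
    numbers = range(first_number, first_number + clone_count)
    return [f"{group_id}-{a}-{b}" for a, b in combinations(numbers, 2)]


# A clone group with ID 7 and three clones has three pairs.
print(pair_names(7, 3))
```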

A sample empty Excel file is provided here.

Using the output of clone detection tools

The output of a clone detection tool must be first converted to the desired Excel file. For convenience, we have provided parsers for the popular clone detection tools, as an internal feature in the command-line tool.

When the tool is executed in the PARSE or PARSE_AND_ANALYZE modes, the user has to provide the tool with the output file of the clone detection tool, using the --tooloutputfile (-i) argument. The name of the clone detector must also be specified, using the --tool (-t) argument. For example, the following arguments can be used to parse and analyze an output from CCFinder for the project Apache Ant:

-p apache-ant-1.7.0
-x "apache-ant-1.7.0-ccfinder.xls"
-m PARSE_AND_ANALYZE
-t CLONE_TOOL_CCFINDER
-i "ccfinder.ccfxd"
-xargs "C:\Results\CCFinder\apache-ant-1.7.0\src\.ccfxprepdir",""
-testsrcs "src/tests/junit"

For the moment the tool supports five different clone detection tools, as shown in the table below. The value for --extra-args (-xargs) argument depends on the tool, and provides necessary information for parsing the input file. For instance, in this example we have provided two additional strings through this argument, separated by comma.

| Clone Detection Tool | --tool (-t) | --extra-args (-xargs) |
| --- | --- | --- |
| CCFinder | CLONE_TOOL_CCFINDER | 1. Path to the special folder that CCFinder generates during analysis (named .ccfxprepdir in the example above). This folder is located in the examined directory.<br>2. [optional] Path to the src folder of the project. |
| Deckard | CLONE_TOOL_DECKARD | Not needed |
| ConQAT | CLONE_TOOL_CONQAT | Not needed |
| CloneDR | CLONE_TOOL_CLONEDR | Path to the folder where the analyzed project was initially located (this is important because these tools save absolute paths to the analyzed Java files) |
| Nicad | CLONE_TOOL_NICAD | |

Output of the commandline tool

The commandline tool generates an Excel file, with the same name (appended by -analyze) and in the same path as the input Excel file which contains the results of the analysis. The HTML reports of the analysis can be found in a folder named html.reports which is located in the same folder as the input and output Excel files.

When the tool is used to parse the output of a clone detection tool, a folder named code-fragments is created in the same path as the input and output Excel files. It contains the real code fragments as reported by the clone detection tool. The names of these files are in the format {ID}-{CLONE_NUMBER}, where {ID} is the ID of the clone group to which this clone belongs, and {CLONE_NUMBER} is the clone's index in the current clone group. This helps in mapping Excel file rows (clones) to these files.

For those who are interested in performing statistical analysis using tools such as R or Matlab, the tool generates CSV files containing information gathered during analysis. The CSV files that are created are explained below. Please note that the separator in these files is the pipe ("|") character. The first row of each file is the header.
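
Because the separator is a pipe rather than a comma, most CSV readers need to be told about it explicitly. The following Python sketch reads such a file with the standard `csv` module. The function name `read_pipe_csv` is hypothetical, and the sample content is made up for illustration; it is not real tool output.

```python
import csv
import io


def read_pipe_csv(text_stream):
    """Read a pipe-separated file whose first row is the header.

    Returns a list of dictionaries, one per data row, keyed by column name.
    """
    return list(csv.DictReader(text_stream, delimiter="|"))


# Made-up sample standing in for one of the generated CSV files.
sample = io.StringIO("GroupID|PairID|Status\n1|1-2|2\n1|1-3|1\n")
rows = read_pipe_csv(sample)
print(rows[0]["PairID"], rows[1]["Status"])
```

For a file on disk, the same function can be used with `open(path, newline="")` in place of the `io.StringIO` object.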

{INPUT_EXCEL_FILE_NAME}.report.csv

Contains general information about the refactorability analysis results. Every row in this file corresponds to a single clone pair. The columns, in the order they appear in the CSV file, are:

| Column Name | Description |
| --- | --- |
| GroupID | ID of the clone group of this clone pair |
| PairID | ID of the clone pair, created by appending clone indices with a hyphen between them |
| ClonePairLocation | Identifies the relative location of clones. One of these values:<br>0: Clones are in the same method<br>1: Clones are declared in the same class<br>2: Clones are in the same Java file<br>3: Clones are in different classes having the same super class<br>4: Clones are in different classes |
| IsTestCode | Identifies whether the clone is test code or not. It may have one of these values:<br>0: Both clones are production code<br>1: First clone is test code and second one is production code<br>2: First clone is production code and second one is test code<br>3: Both clones are test code |
| #StatementsInCloneFragment1 & #StatementsInCloneFragment2 | Number of statements (AST nodes) in the clones that were analyzed. Note that this might be different from what was reported by the clone detection tool, as the tool applies filtering on the AST nodes, as discussed in the paper. |
| #NodeComparisons | Number of node comparisons that were done to assess the refactorability of the clone |
| #PDGNodesInMethod1 & #PDGNodesInMethod2 | Number of PDG nodes in the analyzed method bodies |
| #RefactorableSubtrees | Number of subtrees in the analyzed methods that can be refactored |
| SubtreeMatchingWallNanoTime | Time spent in finding the common nesting structures between the compared methods (in nanoseconds) |
| Status | Identifies the status of the analysis, one of the following values:<br>0: Happens when at least one of the ASTs didn't have any nodes, when the tool couldn't find either the first or the second method in the reported regions, or when the tool could not get the body of either the first or the second method for any reason<br>1: The bottom-up subtree matching didn't find any common nesting structure, so the mapping phase didn't happen<br>2: Analysis was done normally |
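
The numeric codes in this file are easier to work with once they are translated into labels. The following Python sketch maps the ClonePairLocation, IsTestCode and Status codes to short descriptions taken from the list above. The function name `describe_pair` is hypothetical, the labels are abbreviations chosen for this example, and the sketch assumes that the CSV header uses exactly the column names documented here.

```python
# Short labels for the documented codes (the wording is abbreviated for this example).
CLONE_PAIR_LOCATION = {
    0: "same method",
    1: "same class",
    2: "same Java file",
    3: "different classes with the same superclass",
    4: "different classes",
}
IS_TEST_CODE = {
    0: "both production",
    1: "first test, second production",
    2: "first production, second test",
    3: "both test",
}
STATUS = {
    0: "not analyzed (empty AST, or method or body not found)",
    1: "no common nesting structure",
    2: "analyzed normally",
}


def describe_pair(row):
    """Translate the coded columns of one report row into readable labels."""
    return {
        "pair": f'{row["GroupID"]}/{row["PairID"]}',
        "location": CLONE_PAIR_LOCATION[int(row["ClonePairLocation"])],
        "test_code": IS_TEST_CODE[int(row["IsTestCode"])],
        "status": STATUS[int(row["Status"])],
    }


# Made-up row, in the form produced by csv.DictReader.
example = {"GroupID": "3", "PairID": "1-2", "ClonePairLocation": "4",
           "IsTestCode": "0", "Status": "2"}
print(describe_pair(example))
```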

{INPUT_EXCEL_FILE_NAME}.trees.csv

For every clone pair, more than one subtree may be found which could be refactorable or not. This file contains the information about every subtree. The columns in the order they appear in the CSV files are:

| Column Name | Description |
| --- | --- |
| GroupID & PairID | Used to identify to which clone pair this subtree belongs |
| TreeID | Index of the subtree for this clone pair |
| CloneType | Type of the clone, which could be 1, 2, 3 or Unknown (4) |
| PDGMappingWallNanoTime | Time spent to map PDG nodes |
| #PreconditionViolations | Number of precondition violations |
| #MappedStatements | Number of mapped statements. If this value is more than zero and #PreconditionViolations is zero, the subtree is refactorable |
| #UnMappedStatements1 & #UnMappedStatements2 | Number of unmapped statements in the first and second subtree |
| #Differences | Number of differences in the mapped statements |
| RefactoringWasOK | Was refactoring successful? |
| TestsFailedAfterRefactoring | Did any tests fail after refactoring? |
| HadCompileErrorsAfterRefactoring | Did we have compile errors after refactoring? |
| CloneRefactoringType | Type of the refactoring. One of the following values:<br>0: Extract local method<br>1: Pull up to existing superclass<br>2: Pull up to new intermediate superclass extending common internal superclass<br>3: Pull up to new intermediate superclass implementing common internal interface<br>4: Pull up to new superclass extending common external superclass<br>5: Pull up to new superclass implementing common external interface<br>6: Pull up to new superclass extending object<br>7: Extract static method to new utility class<br>8: Infeasible |
| IsTemplateMethodApplicable | Is template method refactoring applicable for this refactoring? |
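
The refactorability rule stated for #MappedStatements (more than zero mapped statements and zero precondition violations) can be applied directly to the rows of this file. The following Python sketch does so; the function names `is_refactorable` and `refactorable_trees` are hypothetical, and the sketch assumes that the CSV header uses exactly the column names documented here.

```python
def is_refactorable(tree_row):
    """Apply the documented rule: mapped statements > 0 and no precondition violations."""
    mapped = int(tree_row["#MappedStatements"])
    violations = int(tree_row["#PreconditionViolations"])
    return mapped > 0 and violations == 0


def refactorable_trees(tree_rows):
    """Return (GroupID, PairID, TreeID) for every subtree that satisfies the rule."""
    return [
        (row["GroupID"], row["PairID"], row["TreeID"])
        for row in tree_rows
        if is_refactorable(row)
    ]


# Made-up rows, in the form produced by csv.DictReader.
trees = [
    {"GroupID": "1", "PairID": "1-2", "TreeID": "0",
     "#MappedStatements": "5", "#PreconditionViolations": "0"},
    {"GroupID": "1", "PairID": "1-2", "TreeID": "1",
     "#MappedStatements": "4", "#PreconditionViolations": "2"},
    {"GroupID": "2", "PairID": "1-2", "TreeID": "0",
     "#MappedStatements": "0", "#PreconditionViolations": "0"},
]
print(refactorable_trees(trees))
```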

{INPUT_EXCEL_FILE_NAME}.precondviolations.csv

This file contains information about precondition violations for each subtree, if the subtree was not found to be refactorable, using the traditional . The columns in the order they appear in the CSV files are:

| Column Name | Description |
| --- | --- |
| GroupID, PairID & TreeID | Identifies to which subtree this precondition violation belongs |
| PreconditionViolationType | Type of the precondition violation, one of the values listed below |

PreconditionViolationType takes one of the following values:
  • 0: Expression difference cannot be parameterized
  • 1: Expression difference is field update
  • 2: Expression difference is void method call
  • 3: Expression difference is method call throwing exception within matched try block
  • 4: Infeasible unification due to variable type mismatch
  • 5: Infeasible unification due to missing members in the common superclass
  • 6: Infeasible unification due to passed argument type mismatch
  • 7: Unmatched statement cannot be moved before or after the extracted code
  • 8: Unmatched statement cannot be moved before the extracted code due to control dependence
  • 9: Unmatched break statement
  • 10: Unmatched continue statement
  • 11: Unmatched return statement
  • 12: Unmatched throw statement
  • 13: Unmatched exception throwing statement nested within matched try block
  • 14: Multiple returned variables
  • 15: Unequal number of returned variables
  • 16: Single returned variable with different types
  • 17: Break statement without loop
  • 18: Continue statement without loop
  • 19: Conditional return statement
  • 20: Switch case statement without switch
  • 21: Super constructor invocation statement
  • 22: Super method invocation statement
  • 23: Multiple unmatched statements update the same variable
  • 24: Infeasible refactoring due to uncommon superclass
  • 25: Infeasible refactoring due to zero matched statements
  • 26: Not all possible execution flows end in return
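
A common first question about this file is which precondition violations occur most often. The following Python sketch counts the rows per violation type and translates the codes using the list above. The names `VIOLATION_TYPES` and `summarize_violations` are hypothetical, and the sketch assumes that the CSV header uses exactly the column names documented here.

```python
from collections import Counter

# Descriptions copied from the list above; the list index is the numeric code.
VIOLATION_TYPES = [
    "Expression difference cannot be parameterized",
    "Expression difference is field update",
    "Expression difference is void method call",
    "Expression difference is method call throwing exception within matched try block",
    "Infeasible unification due to variable type mismatch",
    "Infeasible unification due to missing members in the common superclass",
    "Infeasible unification due to passed argument type mismatch",
    "Unmatched statement cannot be moved before or after the extracted code",
    "Unmatched statement cannot be moved before the extracted code due to control dependence",
    "Unmatched break statement",
    "Unmatched continue statement",
    "Unmatched return statement",
    "Unmatched throw statement",
    "Unmatched exception throwing statement nested within matched try block",
    "Multiple returned variables",
    "Unequal number of returned variables",
    "Single returned variable with different types",
    "Break statement without loop",
    "Continue statement without loop",
    "Conditional return statement",
    "Switch case statement without switch",
    "Super constructor invocation statement",
    "Super method invocation statement",
    "Multiple unmatched statements update the same variable",
    "Infeasible refactoring due to uncommon superclass",
    "Infeasible refactoring due to zero matched statements",
    "Not all possible execution flows end in return",
]


def summarize_violations(rows):
    """Count violations per type and return (description, count), most frequent first."""
    counts = Counter(int(row["PreconditionViolationType"]) for row in rows)
    return [(VIOLATION_TYPES[code], n) for code, n in counts.most_common()]


# Made-up rows, in the form produced by csv.DictReader.
violations = [
    {"PreconditionViolationType": "9"},
    {"PreconditionViolationType": "0"},
    {"PreconditionViolationType": "9"},
]
print(summarize_violations(violations))
```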

{INPUT_EXCEL_FILE_NAME}.compileerrors.csv

This file contains the compile errors that occur after refactoring is done on each subtree. The file has the following columns:

| Column Name | Description |
| --- | --- |
| GroupID, PairID & TreeID | Identifies to which subtree this compile error belongs |
| FileHavingCompileError | Relative path to the file that has compile errors after refactoring |

{INPUT_EXCEL_FILE_NAME}.testdifferences.csv

This file contains the tests that failed after refactoring is done on each subtree. The file has the following columns:

| Column Name | Description |
| --- | --- |
| GroupID, PairID & TreeID | Identifies for which subtree this test difference exists |
| TestDifference | Name of the test case that is failing after refactoring |

{INPUT_EXCEL_FILE_NAME}.exprgapsinfo.csv

This file contains information about the expression differences between the clone pairs for each subtree, i.e., the differences which lead to lambda expressions that have a single expression as their body. The file has the following columns:

| Column Name | Description |
| --- | --- |
| GroupID, PairID & TreeID | Identifies to which subtree this expression gap belongs |
| #Params | Number of parameters for the created lambda expression |
| #ReturnType | Return type of the lambda expression |
| #ThrownExceptions | Number of exceptions thrown by the lambda expression |
| #NonEffectiveFinalVars | Number of non-effectively final variables for which JDeodorant has to make final variables (so that they can be used inside the lambda expression) |

{INPUT_EXCEL_FILE_NAME}.blockgapsinfo.csv

This file contains, for each subtree, information about the block gaps, i.e., the gaps for which JDeodorant has to make lambda expressions with a block of statements as their body. The file has the same columns as {INPUT_EXCEL_FILE_NAME}.exprgapsinfo.csv, plus two additional columns, #Statements1 and #Statements2, which give the number of statements inside the body of the created lambda expressions for the first and second clone of the pair, respectively.