BACKGROUND

Coupled Changes

Besides structural dependencies, there are also hidden dependencies between files or other artifacts that frequently change together. They were first introduced as evolutionary couplings [BKPS] or logical dependencies [GHJ]. Such dependencies can be detected by analyzing the version history of the software. During software development, developers change different source files with different frequencies. The notion of change couplings was introduced in [FGP] and later in [DGL08]. Coupled file changes describe a situation in which a developer changes a particular file and afterwards also changes another file. The extraction of coupled file changes, developer feedback on them, and their influence on maintenance tasks and on help-seeking strategies during maintenance tasks have been investigated in [RW1].

We combine data from three different sources to build coupled file change suggestions: file changes and commit attributes from the Git version control system, issue attributes from the issue tracking system, and attributes from the documentation archives.
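To make the idea of detecting couplings from version history concrete, the following minimal sketch counts how often pairs of files change together. It rests on a simplifying assumption: files touched by the same commit are treated as coupled, and the "changed afterwards" ordering across separate commits is not modeled. The function name `coupled_changes`, the `min_count` threshold, and the sample commit history are hypothetical and used only for illustration.

```python
from collections import Counter
from itertools import combinations


def coupled_changes(commits, min_count=2):
    """Count how often each pair of files is changed in the same commit.

    Simplifying assumption: all files touched by one commit are coupled;
    temporal ordering across commits is ignored. Returns tuples
    (file_a, file_b, count, conf_a_to_b, conf_b_to_a) for pairs changed
    together in at least `min_count` commits.
    """
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = sorted(set(files))  # canonical order so (a, b) == (b, a)
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return [
        # Confidence a -> b: share of commits touching a that also touch b.
        (a, b, n, n / file_counts[a], n / file_counts[b])
        for (a, b), n in sorted(pair_counts.items())
        if n >= min_count
    ]


# Hypothetical history: each commit lists the files it changed.
commits = [
    ["Parser.java", "Lexer.java"],
    ["Parser.java", "Lexer.java", "README.md"],
    ["Parser.java", "AST.java"],
    ["Lexer.java", "Parser.java"],
]
print(coupled_changes(commits))
# [('Lexer.java', 'Parser.java', 3, 1.0, 0.75)]
```

The asymmetric confidence values show why a coupling is usually read in one direction: every change to `Lexer.java` in this toy history also touched `Parser.java`, but not the other way round.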

Mining Frequent Itemsets

The discovery of frequent itemsets is a common data mining technique. In the context of this work, it is often used to extract logical couplings from the version history. It was originally proposed to analyze supermarket transaction data on customer buying behavior [AIS93].

The FP-Growth algorithm discovers frequent itemsets without generating candidate itemsets and is considered faster and more memory-efficient than the Apriori algorithm [HPYM04]. FP-Growth scans the database only twice and follows a partitioning-based, divide-and-conquer strategy [HPYM04]. First, it compresses the database, restricted to its frequent items, into a data structure called a frequent pattern tree (FP-tree), which preserves the associations between the items. Afterwards, it divides the compressed database into a set of conditional databases, each associated with one frequent item, and mines each of them separately to obtain the complete set of frequent itemsets.

Many frameworks and libraries exist for mining data from transactional databases. We use SPMF, an open-source pattern mining framework that implements a large set of data mining algorithms. It specializes in pattern discovery in transactional databases, such as mining frequent itemsets, sequential patterns, and association rules. SPMF is Java-based and cross-platform, and it provides documentation for its algorithms. We use its FP-Growth algorithm with string support to mine the files changed in the Git commits of the projects included in our studies.
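The two database scans and the divide-and-conquer mining of conditional databases described above can be illustrated with a compact, pure-Python sketch. This is a simplified teaching version, not SPMF's Java implementation; the names `FPNode`, `build_tree`, and `fp_growth`, the absolute-count `min_support` threshold, and the sample commits are our own illustrative assumptions. Each commit is treated as a transaction whose items are the changed file paths.

```python
from collections import defaultdict


class FPNode:
    """FP-tree node: an item, its accumulated count, parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs using two scans."""
    # Scan 1: count the support of every item.
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    # Scan 2: insert each transaction's frequent items, ordered by
    # descending support (ties broken by name), so shared prefixes merge.
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> tree nodes holding it (node links)
    for items, count in transactions:
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return frequent, header


def fp_growth(transactions, min_support, suffix=()):
    """Recursively mine all frequent itemsets; returns {frozenset: support}."""
    frequent, header = build_tree(transactions, min_support)
    result = {}
    for item, sup in frequent.items():
        new_suffix = suffix + (item,)
        result[frozenset(new_suffix)] = sup
        # Conditional database: the prefix path above every node of `item`,
        # weighted by that node's count.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:  # stop at the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        result.update(fp_growth(conditional, min_support, new_suffix))
    return result


# Hypothetical commits: each transaction is the list of changed files.
commits = [
    ["Parser.java", "Lexer.java"],
    ["Parser.java", "Lexer.java", "README.md"],
    ["Parser.java", "AST.java"],
    ["Lexer.java", "Parser.java"],
]
result = fp_growth([(c, 1) for c in commits], min_support=2)
for itemset, sup in sorted(result.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
# ['Lexer.java'] 3
# ['Parser.java'] 4
# ['Lexer.java', 'Parser.java'] 3
```

The frequent pair `{Lexer.java, Parser.java}` is exactly the kind of itemset that is read as a logical coupling. Ordering items by descending support before insertion is what lets transactions share prefixes, which keeps the FP-tree compact.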