ArrayMiner's Algorithm, Practical Issues

    ArrayMiner uses the advanced optimization technique of Grouping Genetic Algorithm (GGA) to supply clustering solutions of high quality. The general GGA technique has been described in the book "Genetic Algorithms and Grouping Problems" (Wiley, 1998) by Dr. Falkenauer.

The GGA is a gradual improvement technique. In practice, this means that

  • as soon as the algorithm has been initialized, your data are shown in the graphic windows. However, no clustering has been performed yet.
  • when you click the "Optimize" button (red worker icon), the optimization process starts. During the optimization, ArrayMiner uses a semi-stochastic process to iteratively construct new clustering solutions, using the mechanism of the GGA enhanced by special heuristics targeted specifically to the clustering problem. This process can be shown to generate high-quality solutions with high probability. The clustering criterion selected by the user is used to evaluate each newly constructed solution and identify the best ones. Each time a solution is generated that is better than any of the solutions found up to that point, it becomes ArrayMiner's current solution.

   At any point of the optimization process, ArrayMiner can generate the optimal solution to the clustering problem, given the clustering criterion selected by the user. Since it is the optimal solution, it is the best one possible, i.e., it cannot be improved upon. However, as observed in the previous section, it is impossible to know with certainty whether a given solution is the optimal one, so the optimization continues trying to find a better one still.
Obviously, since it is impossible to identify the best possible solution when it is found, the optimization process could run forever. In practice, it is observed that the probability of improving upon any generated solution decreases with the number of solutions examined by the GGA, i.e. with the successive iterations of the algorithm.

   ArrayMiner takes advantage of this observation to signal when it is appropriate to stop the algorithm. When a solution is not improved upon over a number of iterations of the optimization process, the user is advised that the probability of further improvement is sufficiently low and the best solution so far can be taken for the final one, and the algorithm is stopped.

   Given the computational resources at his or her disposal, the user may decide to stop the algorithm at this point or to let it continue searching. In the latter case, the probability of finding the optimal clustering solution will be increased.