Using clustering and algorithms for PV systems

KiloWattsol develops clustering-based methodologies to evaluate the behavior of PV plants. This allows it to carry out technical studies and advise clients around the world.


Hierarchical clustering is increasingly used in the analysis of photovoltaic power data. The methodology is becoming popular because it makes it possible to identify how PV systems behave without extensive analysis and simulation. Clustering is a numerical method for grouping observation sequences according to their similarity. Each observation sequence can be described as one point in a multidimensional space. Clustering reveals the groups, but the similarity between groups is difficult to grasp directly, so an algorithm is used to calculate the relationship between clusters.

Clustering assigns a distance to each pair of points and groups or separates the points according to their proximity. Observation sequences with similar behavior are therefore close to one another and end up in the same group. To illustrate this, we invite you to watch a video made by kiloWattsol.
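The distance-then-group idea can be sketched in a few lines of Python. This is only an illustration, not kiloWattsol's implementation: the Euclidean distance, the average-linkage rule, the `threshold` value and the sample string readings are all assumptions chosen for the example.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length observation sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(signals, threshold):
    """Average-linkage agglomerative clustering (illustrative choice).

    Start with one cluster per signal, then repeatedly merge the two closest
    clusters until the closest pair is farther apart than `threshold`.
    Returns a sorted list of clusters, each a sorted list of signal names.
    """
    clusters = [[name] for name in signals]

    def linkage(c1, c2):
        # Average distance over all cross-cluster pairs of signals.
        total = sum(euclidean(signals[a], signals[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)

# Hypothetical normalised string currents at four time steps.
strings = {
    "S1": [0.20, 0.55, 0.80, 0.60],
    "S2": [0.21, 0.54, 0.79, 0.61],
    "S3": [0.19, 0.56, 0.81, 0.59],
    "S4": [0.10, 0.30, 0.42, 0.31],  # e.g. a partially shaded string
    "S5": [0.11, 0.29, 0.40, 0.30],
}
groups = agglomerative(strings, threshold=0.1)
print(groups)
```

With these made-up readings the three similar strings end up in one group and the two weaker strings in another. On a real plant with thousands of signals, a library routine would be a more practical choice than this quadratic pure-Python loop.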


KiloWattsol uses clustering in its post-construction studies where a large number of signals need analysing. These signals can come from strings, MPP or other measurements. Signals belonging to a similar design must be coherent with one another.


View of the behavior of the strings of a PV plant before, during and after an environmental event.

Each behaviour group is isolated by the clustering and analysed independently from the rest of the plant. This method can group large volumes of data, thousands of signals at a time step of a few minutes, where a manual spreadsheet analysis would simply be impossible.


To observe behavioural changes over time, a clustering series is carried out over the analysed period. Each observation sequence is represented by a point in a three-dimensional graph: two dimensions for the observations at a given moment and one dimension for time.
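As a rough sketch of such a clustering series, the Python below clusters each day separately and collects one point per signal per day. The article does not say which two dimensions describe the observations, so the daily mean and standard deviation used here are assumptions, as are the single-linkage rule, the `threshold` and the sample data.

```python
import math
from statistics import mean, pstdev

def single_linkage_labels(points, threshold):
    """Label points by single-linkage clusters cut at `threshold`.

    Cutting a single-linkage dendrogram at a given height is equivalent to
    taking the connected components of the graph that links every pair of
    points closer than that height.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(j)] = find(i)

    roots = {}  # root index -> label, numbered in order of first appearance
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]

# Hypothetical data: day -> {string name -> normalised readings for that day}.
days = {
    0: {"S1": [0.5, 0.8, 0.6], "S2": [0.5, 0.8, 0.6], "S3": [0.5, 0.8, 0.6]},
    1: {"S1": [0.5, 0.8, 0.6], "S2": [0.5, 0.8, 0.6], "S3": [0.2, 0.3, 0.2]},
}

series = []  # one (feature_1, feature_2, day, label) tuple per signal per day
for day, signals in sorted(days.items()):
    names = sorted(signals)
    # Assumed features: mean and standard deviation of the day's readings.
    feats = [(mean(signals[n]), pstdev(signals[n])) for n in names]
    labels = single_linkage_labels(feats, threshold=0.1)
    series += [(f[0], f[1], day, lab) for f, lab in zip(feats, labels)]

for point in series:
    print(point)
```

Each tuple can be plotted directly as a 3D point coloured by its label. Labels are numbered independently for each day in this sketch, so matching a group from one day to the next would need an extra step.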

The color of the points depends on the group in which they were clustered. This visualization gives a quick, concise overview of the different behaviors.


After distinguishing the different behaviours in the PV plant, the healthy behaviour is identified. The results are summarized in daily statistics including sums, standard deviations and the number of signals. The difference between the healthy behaviour and the remaining behaviours makes it possible to quantify the associated losses precisely and to follow how they evolve.
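The daily statistics and the loss estimate could look roughly like the Python sketch below. The article does not state how the healthy group is chosen, so taking the group with the highest mean daily energy is an assumption, as are the function names, the kWh figures and the cluster labels.

```python
from statistics import mean, pstdev

def daily_group_stats(energy, labels):
    """Per-group sum, mean, standard deviation and signal count for one day."""
    groups = {}
    for name, label in labels.items():
        groups.setdefault(label, []).append(energy[name])
    return {
        label: {
            "sum": sum(values),
            "mean": mean(values),
            "std": pstdev(values),
            "count": len(values),
        }
        for label, values in groups.items()
    }

def losses_vs_healthy(stats, healthy):
    """Energy each other group would have gained had it matched the healthy mean."""
    reference = stats[healthy]["mean"]
    return {
        label: reference * s["count"] - s["sum"]
        for label, s in stats.items()
        if label != healthy
    }

# Hypothetical daily energy per string (kWh) and cluster labels for that day.
energy = {"S1": 10.0, "S2": 10.2, "S3": 9.8, "S4": 6.0, "S5": 5.0}
labels = {"S1": 0, "S2": 0, "S3": 0, "S4": 1, "S5": 1}

stats = daily_group_stats(energy, labels)
# Assumption: the healthy group is the one with the highest mean energy.
healthy = max(stats, key=lambda g: stats[g]["mean"])
loss = losses_vs_healthy(stats, healthy)
print(healthy, loss)
```

Repeating this for every day of the analysed period gives a time series of losses per behaviour group. The per-group standard deviation and count indicate how tight and how well populated each group is.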

The other statistics inform us about the quality of clustering and the precision of the quantification of losses.