Using clustering and algorithms for PV systems 

KiloWattsol has developed clustering-based methodologies to evaluate the behaviour of PV plants. This allows it to provide technical studies and advise clients all over the world.

Hierarchical clustering is increasingly used in the analysis of photovoltaic power data. The method is becoming popular because it allows the user to identify the behaviour of PV systems without extensive analysis and simulation. Clustering is a numerical method that groups observation sequences according to their similarities. Each observation sequence can be described as one point in a multidimensional space.

The clustering algorithm computes a distance between each pair of points and groups or separates these points according to their proximity. Observation sequences with similar behaviours lie close to one another and are grouped together.
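The grouping step described above can be sketched in a few lines of plain Python. This is only an illustration, not KiloWattsol's implementation: the Euclidean distance, the average-linkage rule, the `max_distance` threshold and the sample string values are all assumptions made for the example. In practice a library such as SciPy (`scipy.cluster.hierarchy`) would normally be used instead.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Distance between two observation sequences seen as points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, points):
    """Mean distance over all pairs drawn from two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerative_clusters(points, max_distance):
    """Merge the two closest clusters until none are closer than max_distance."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        (a, b), dist = min(
            (((a, b), average_linkage(clusters[a], clusters[b], points))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > max_distance:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical normalised string currents at four time steps (assumed data)
signals = {
    "string_01": [0.98, 0.97, 0.99, 0.98],
    "string_02": [0.97, 0.98, 0.98, 0.97],
    "string_03": [0.60, 0.58, 0.61, 0.59],
    "string_04": [0.61, 0.60, 0.59, 0.60],
    "string_05": [0.99, 0.98, 0.97, 0.99],
}
names = list(signals)
groups = agglomerative_clusters([signals[n] for n in names], max_distance=0.2)
named_groups = [[names[i] for i in g] for g in groups]
print(named_groups)
```

With these sample values, strings 01, 02 and 05 end up in one group and strings 03 and 04 in another, because the distance between the two groups is far larger than the chosen threshold.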

KiloWattsol uses clustering in post-construction studies where a large number of signals need analysing. These signals can come from strings, MPP or other measurements. Signals belonging to a similar design must be consistent with one another. 

View of the behaviour of the strings of a PV plant before, during and after an environmental event.

Clustering isolates each behaviour group, which is then analysed independently from the rest of the plant. This method can group large volumes of data, thousands of signals at a time step of a few minutes, where a manual spreadsheet analysis would simply be impossible.

To observe behavioural changes over time, a series of clusterings is carried out over the analysed period. Each observation sequence is represented by a point in a three-dimensional graph: two dimensions for the observations at a given moment and one dimension for time.
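As a rough sketch of this repeated clustering, the snippet below clusters each time window separately and builds the list of points (two observation dimensions plus time) with their group label. The choice of normalised current and voltage as the two observation dimensions, the sample values, the `cut_distance` threshold and the single-linkage rule are assumptions made for illustration only.

```python
import math

def single_linkage_labels(points, cut_distance):
    """Give the same label to points linked by a chain of neighbours closer
    than cut_distance (equivalent to cutting a single-linkage dendrogram)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cut_distance:
                ri, rj = find(i), find(j)
                parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(points))]

# Hypothetical (normalised current, normalised voltage) per string,
# for three time windows: before, during and after an event (assumed data)
windows = {
    0: [(0.98, 0.99), (0.97, 0.98), (0.99, 0.99), (0.98, 0.97)],
    1: [(0.98, 0.99), (0.55, 0.97), (0.99, 0.98), (0.52, 0.96)],
    2: [(0.97, 0.99), (0.96, 0.98), (0.98, 0.99), (0.60, 0.97)],
}

cloud = []  # one (x, y, time, group) tuple per string and per window
for t, observations in sorted(windows.items()):
    labels = single_linkage_labels(observations, cut_distance=0.1)
    for (x, y), group in zip(observations, labels):
        cloud.append((x, y, t, group))

groups_per_window = {
    t: len({g for _, _, tt, g in cloud if tt == t}) for t in windows
}
print(groups_per_window)
```

Each tuple in `cloud` can be plotted directly as a 3D point coloured by its group. Note that in this sketch the labels are computed independently for each window, so matching a group from one window to the next would need an extra step.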

The colour of each point depends on the group in which it has been clustered. This visualization gives a quick, concise overview of the different behaviours.

After distinguishing the different behaviours in the PV plant, the healthy behaviour is identified. The results are summarized in daily statistics including, among other things, sums, standard deviations and the number of signals. The difference between the healthy behaviour and the remaining behaviours makes it possible to quantify the associated losses precisely and to follow their evolution.
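The daily statistics and the loss estimate could look like the following sketch. The text does not say how the healthy group is chosen or how losses are computed, so two assumptions are made here: the healthy group is taken to be the one with the highest mean daily energy, and the loss of another group is the gap between the healthy mean and that group's mean, multiplied by the number of signals in the group. The energy values and group names are hypothetical.

```python
from statistics import mean, pstdev

def daily_group_statistics(energy_by_signal, group_of_signal):
    """Sum, mean, standard deviation and signal count for each group."""
    stats = {}
    for group in sorted(set(group_of_signal.values())):
        values = [energy_by_signal[s]
                  for s, g in group_of_signal.items() if g == group]
        stats[group] = {
            "sum": sum(values),
            "mean": mean(values),
            "std": pstdev(values),
            "count": len(values),
        }
    return stats

def losses_against_healthy(stats, healthy_group):
    """Energy each other group would have gained at the healthy mean."""
    reference = stats[healthy_group]["mean"]
    return {
        group: (reference - s["mean"]) * s["count"]
        for group, s in stats.items() if group != healthy_group
    }

# Hypothetical daily energy per string in kWh, with cluster labels (assumed data)
energy_kwh = {"s1": 10.0, "s2": 10.2, "s3": 9.8, "s4": 6.0, "s5": 6.4}
group_of = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B"}

stats = daily_group_statistics(energy_kwh, group_of)
# Assumption: the healthy group is the one with the highest mean energy
healthy = max(stats, key=lambda g: stats[g]["mean"])
losses = losses_against_healthy(stats, healthy)
print(healthy, losses)
```

Running this once per day gives a time series of losses per group. The standard deviation and signal count of each group then indicate how tight the group is, and therefore how much confidence to place in the estimate.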

The other statistics inform us about the quality of clustering and the precision of the quantification of losses.