
How to Forecast Solar Production with Machine Learning: A Practical Guide

From numerical weather models to P10/P50/P90 probability bands — how modern solar production forecasts are actually built, and why point forecasts cost utility-scale operators serious money.


The first thing a utility-scale solar operator notices when they put yesterday's day-ahead bid next to actual production is the gap. The forecast said 287 MWh between 09:00 and 17:00. Actual was 269 MWh. The plant operator did nothing wrong. The trader did nothing wrong. The single-point forecast was just wrong — and every megawatt-hour of imbalance hit the settlement window at the imbalance price, not the day-ahead price.

Solar production forecasting is the foundation of every revenue chain in a PV plant. It feeds day-ahead nominations, intraday rebalancing, BESS dispatch, O&M scheduling, and lender performance ratio reports. When the forecast is wrong, the plant doesn't lose generation — it loses revenue. This guide walks through what separates a production-grade forecast from a thin API wrapper, the principles that hold across every serious implementation, and the trade-offs that determine whether a forecasting layer actually moves revenue or just looks busy on a dashboard.

KEY TAKEAWAYS
  • A production-grade solar forecast is a layered system, not a single model — weather data, plant-specific calibration, and a probabilistic output stage all matter independently.
  • Blending multiple weather models materially reduces day-ahead irradiance forecast variance compared to relying on any single source.
  • Outputs should be P10/P50/P70/P90 quantile bands, not single values — single-point forecasts give traders no way to size imbalance hedges.
  • Day-ahead nMAE for utility-scale solar typically lands in the 4–6% range on stable continental sites. Vendors quoting sub-2% MAPE are almost always reporting on clear-sky hours only.
  • A forecasting platform's competitive edge is rarely in any single algorithm — it is in the discipline of data hygiene, calibration cadence, and probabilistic honesty.

A production-grade solar forecasting pipeline has three load-bearing layers — and the revenue chain downstream is only as good as the weakest one.

Why Single-Point Solar Forecasts Cost Operators Money

A point forecast says "tomorrow at 12:00 your plant will produce 16.3 MWh." It is one number. The trader bids that one number. When the actual is 14.1 or 18.4, the difference goes into imbalance settlement.

Imbalance pricing in most European markets — ENTSO-E bidding zones cleared through EPEX, Nord Pool, or regional operators like MEMO — is structured so that being long during a system-long hour pays less than the day-ahead clearing price, and being short during a system-short hour costs more than it. The trader has no way to hedge a single-point forecast against that pricing structure. They are betting the entire day's nomination on a single estimate that has zero stated confidence.

A probabilistic forecast — one that says "the P50 is 16.3 MWh, the P10 is 13.7, the P90 is 18.6" — lets the trader bid the P50 and reserve imbalance budget against the P10–P90 spread. Same plant, same weather, different risk posture. That is the gap between forecasting that protects revenue and forecasting that quietly bleeds it through the settlement window.
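
To make the arithmetic concrete, here is a minimal sketch using the quantiles above and illustrative imbalance prices — every number is invented for the example, not market data:

```python
# Sizing imbalance exposure from a quantile forecast for one delivery hour.
# All prices and volumes are illustrative, not market data.
da_price = 85.0        # day-ahead clearing price, EUR/MWh
short_price = 120.0    # price paid to buy back energy when under-delivering
long_price = 60.0      # price received for surplus when over-delivering

p10, p50, p90 = 13.7, 16.3, 18.6   # MWh forecast quantiles for the hour
nomination = p50

# Exposure at each tail of the stated distribution:
short_risk = (nomination - p10) * (short_price - da_price)  # under-delivery cost
long_risk = (p90 - nomination) * (da_price - long_price)    # surplus sold cheap

print(f"P10 tail exposure: {short_risk:.0f} EUR")   # 2.6 MWh * 35 EUR ~ 91 EUR
print(f"P90 tail exposure: {long_risk:.0f} EUR")    # 2.3 MWh * 25 EUR ~ 58 EUR
```

A trader holding only the P50 cannot write those two lines. The band is what makes the risk a number.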

The Three Layers Every Serious Solar Forecast Has to Get Right

A production-grade solar forecast is not "a model." It is a pipeline of three layers, each of which has to be done well or the whole pipeline degrades silently. Any vendor that talks about their forecast as a single deliverable is hiding most of where the work actually lives.

Layer 1 — The weather data backbone

Every solar forecast starts with a weather forecast. Tomorrow's irradiance, cloud cover, ambient temperature, and wind are the upstream variables that drive everything downstream. The accuracy ceiling of the entire pipeline is set here.

Numerical Weather Prediction (NWP) models are run by national meteorological services on supercomputers. A handful of them dominate European solar forecasting:

| Model | Run by | Typical resolution | Where it tends to be strongest |
|---|---|---|---|
| ECMWF (IFS) | European Centre for Medium-Range Weather Forecasts | High-resolution global / regional | Medium-range forecasting across continental Europe |
| GFS | NOAA (US) | Global | Wide coverage, frequent refresh, useful complement |
| ICON | DWD (Germany) | European nest + global | Central / Eastern European terrain, convective initiation |
| HRRR | NOAA (US) | High-resolution CONUS | Short-horizon US deployments only |

Each model carries persistent structural bias. Some under-forecast summer convective cloud development. Some over-smooth cloud fields in mountainous terrain. Some are stronger over the continental interior than the coast. These biases are well-documented in the meteorological literature and not removed by tuning — they show up in the same direction, in the same regions, in the same conditions, run after run.

The practical implication: a forecast that relies on one weather source inherits that source's bias surface in full.
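
One standard mitigation before any blending is to estimate and subtract each source's rolling bias per hour of day. A minimal sketch in pandas, assuming a DataFrame on a timestamp index with hypothetical `forecast_ghi` and `measured_ghi` columns:

```python
import pandas as pd

def debias(df: pd.DataFrame, window_days: int = 30) -> pd.Series:
    """Subtract a per-source, per-hour-of-day rolling mean error.

    `df` needs a DatetimeIndex plus `forecast_ghi` and `measured_ghi`
    columns (illustrative names, one row per forecast hour).
    """
    error = df["forecast_ghi"] - df["measured_ghi"]
    # Within each hour-of-day group, average the last ~30 days of error;
    # shift(1) keeps today's own error out of today's correction.
    hourly_bias = error.groupby(df.index.hour).transform(
        lambda e: e.shift(1).rolling(window_days, min_periods=7).mean()
    )
    return df["forecast_ghi"] - hourly_bias.fillna(0.0)
```

The point is not the particular window; it is that the correction is learned per source and per condition, because that is where the structural bias lives.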

Layer 2 — The plant-specific calibration

Weather is not power. The translation from irradiance, temperature, and wind into AC megawatt-hours depends on dozens of plant-specific factors: panel orientation, tilt, tracker behaviour, soiling history, inverter clipping curves, DC/AC ratio, inter-row shading, transformer losses, and the specific quirks of each inverter brand under partial-load conditions.

This is where site-specific machine learning earns its keep. A model trained on the plant's own SCADA history learns the actual measured transfer function — every clipping event, every recurring shading pattern at 16:30 in November, every inverter that runs slightly under spec.

There is no shortage of credible ML approaches for this layer. The interesting question is rarely "which algorithm" but "how much plant history was used, how often the model is retrained, and how the model handles dirty inputs." A long calibration window of clean SCADA data, retrained on a regular cadence, beats almost any algorithmic novelty on the same site.
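
As a sketch of what that looks like in practice, here is a gradient-boosted calibration model trained on the plant's own history. The feature and flag column names are illustrative placeholders, not a real schema:

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

# Illustrative feature set for the weather-to-power transfer function.
FEATURES = ["poa_irradiance", "ambient_temp", "wind_speed",
            "solar_elevation", "solar_azimuth", "hour_sin", "hour_cos"]

def train_plant_model(history: pd.DataFrame) -> HistGradientBoostingRegressor:
    # Drop hours flagged as outage or curtailment so the model learns the
    # plant's physics, not its downtime (see the pitfalls section below).
    clean = history[~history["flagged_outage"] & ~history["flagged_curtailment"]]
    model = HistGradientBoostingRegressor(max_iter=500, learning_rate=0.05)
    model.fit(clean[FEATURES], clean["ac_power_mw"])
    return model
```

Retrain this on a fixed cadence against recent history and the exact algorithm choice matters far less than the hygiene around it.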

What matters in practice:

  • Per-block modelling on heterogeneous plants. A plant with mixed orientations, tracker types, or inverter brands is not one transfer function — it is several. Treating it as one averages over the differences and loses accuracy.
  • Treating the plant as a federation. The plant-level forecast is a roll-up of block- or string-group forecasts, not a single monolithic regression.
  • Honest handling of outages and curtailment. Telemetry has gaps. Inverters trip. Curtailment commands suppress output. Models trained without those events flagged learn nonsense.
  • A cold-start strategy. New plants have no history. A serious platform has a defensible answer for the first weeks of operation before the plant-specific model has stabilised.

Layer 3 — The probabilistic output stage

A regression model trained to minimise mean squared error outputs the conditional mean — a single number. That is the single-point forecast that loses traders money in imbalance settlement.

To get P10/P50/P70/P90 quantile outputs, the model has to be trained with an asymmetric loss function that penalises overestimates and underestimates differently at each target percentile. The 90th-percentile output is trained to be exceeded only 10% of the time; the 10th-percentile output to be exceeded 90% of the time. Each percentile is its own training objective, not a fudge factor applied to a central estimate.

The result is a forecast that explicitly states uncertainty. Wide P10–P90 bands mean "I'm not confident in this hour"; narrow bands mean "I'm confident." A trader looking at the ribbon can put a euro figure on the risk of every hour of the next delivery day.
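
A minimal sketch of that training setup with scikit-learn's quantile loss, one independently trained model per percentile (the feature matrix and target are assumed to come from the calibration layer):

```python
from sklearn.ensemble import GradientBoostingRegressor

def train_ribbon(X, y, quantiles=(0.10, 0.50, 0.70, 0.90)):
    """One model per target percentile, each with its own pinball objective."""
    models = {}
    for q in quantiles:
        m = GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=300)
        m.fit(X, y)   # each percentile is its own training objective
        models[q] = m
    return models

# Day-ahead ribbon for the next delivery day:
#   ribbon = {q: m.predict(X_next_day) for q, m in models.items()}
```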

Probabilistic forecast output: actual production (dashed) overlaid on the P10–P90 confidence ribbon. Band width = stated uncertainty per hour.

Why One Weather Model Is Dangerous

Single-source dependence is the most common failure mode in commercial solar forecasting. Every NWP model has a documented bias surface. A forecast layer that relies on one source inherits that surface in full — and the operator's revenue gets exposed to it day after day.

The well-established correction is ensembling: blending the outputs of multiple weather models, weighted in a way that reflects each model's track record at the specific plant location and the current weather regime. Published benchmarks across European solar sites consistently show ensemble fusion reducing day-ahead irradiance forecast variance by a meaningful double-digit percentage versus the best single member.

The simplest version — a flat average across two or three models — already captures most of the gain (a weighting sketch follows the list below). More sophisticated implementations weight each contribution based on:

  • Site geography — terrain, climate zone, distance from significant water bodies
  • Forecast horizon — different models hold up differently from D+1 to D+7
  • Weather regime — frontal passages, high-pressure persistence, and convective initiation reward different model strengths
  • Recent track record — rolling skill scores per model per site, refreshed regularly
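
A minimal sketch of a skill-weighted blend, with member names and rolling errors standing in as illustrative values:

```python
import numpy as np

def blend(forecasts: dict, recent_mae: dict) -> np.ndarray:
    """Weight each NWP member by the inverse of its rolling per-site MAE."""
    inv = {m: 1.0 / (err + 1e-6) for m, err in recent_mae.items()}
    total = sum(inv.values())
    return sum((w / total) * forecasts[m] for m, w in inv.items())

# Illustrative 24-hour irradiance members (W/m^2) and rolling MAE scores:
base = np.full(24, 600.0)
blended = blend(
    {"ecmwf": base, "gfs": base * 0.95, "icon": base * 1.05},
    {"ecmwf": 41.0, "gfs": 58.0, "icon": 47.0},
)
```

The regime- and horizon-aware versions replace the static `recent_mae` dictionary with scores conditioned on the factors above.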

The takeaway for an asset owner evaluating a forecasting platform: ask which sources feed the forecast, not which single source. A vendor whose entire forecast traces back to one NWP feed is shipping that feed's bias to your trading desk.

Hour-by-hour irradiance forecasts: three individual NWP models (faded) vs a blended ensemble (solid) vs actual measurement (dashed). The ensemble tracks reality much more tightly than any single member.

From Weather to Power: What the Translation Layer Actually Does

Once a calibrated weather signal feeds the model, the next layer translates atmospheric variables into expected production. This is not a trivial multiplication; it is a sequence of physical conversions and learned corrections.

The input space is rich. A forecast model for a single hour typically consumes irradiance components, cloud decomposition, ambient and modelled cell temperature, wind, time-of-day and solar geometry features, recent plant-state lags, and a soiling proxy. The model learns the joint distribution of these features against measured output.
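
One of those physical conversions is compact enough to show inline: the modelled cell temperature that feeds the feature set. A minimal NOCT-style sketch, using typical datasheet constants rather than any specific panel's:

```python
def cell_temperature(ambient_c: float, poa_w_m2: float,
                     noct_c: float = 45.0) -> float:
    """Standard NOCT approximation: cell temperature rises above ambient
    in proportion to plane-of-array irradiance (NOCT is measured at
    800 W/m^2 and 20 C ambient)."""
    return ambient_c + (noct_c - 20.0) / 800.0 * poa_w_m2

# Example: 28 C ambient at 900 W/m^2 -> roughly 56 C cell temperature.
print(cell_temperature(28.0, 900.0))
```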

Two practical observations about getting this layer right:

Per-block modelling matters. A heterogeneous plant — mixed tilt, mixed tracker, mixed inverter brands — is not a single transfer function. A platform that treats it as one and runs a single plant-wide model is leaving accuracy on the table; one that treats it as a federation of blocks, each with its own model rolling up to the plant total, recovers it.

Cold start has to be addressed honestly. A brand-new plant has no SCADA history to train on. The defensible approach is a baseline trained on similar plants in the same climate zone, used until the local model has accumulated enough plant-specific history to take over. Without this, the first weeks of plant operation are forecast blind.
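
A minimal sketch of how the two observations combine, with a climate-zone baseline standing in for blocks that have not yet accumulated history (all names are illustrative, and the models are assumed to be trained estimators like the calibration sketch above):

```python
def plant_forecast(block_models: dict, baseline_model, block_features: dict):
    """Roll block-level forecasts up to the plant total.

    Blocks missing from `block_models` (cold start) fall back to the
    climate-zone baseline until their own model has stabilised.
    """
    total = 0.0
    for block_id, X in block_features.items():
        model = block_models.get(block_id) or baseline_model
        total = total + model.predict(X).clip(min=0.0)  # no negative power
    return total
```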

P10, P50, P70, P90: What the Percentiles Actually Mean

The percentile notation that dominates solar forecasting follows the statistical quantile convention (note that resource-assessment reports often use the reverse probability-of-exceedance labelling, in which P90 is the conservative value):

  • P10 — the production value that the plant will exceed 90% of the time (the conservative floor)
  • P50 — the median forecast; production will be above this 50% of the time
  • P70 — the value production will exceed only 30% of the time (an intermediate level between the median and the P90 ceiling)
  • P90 — the value production will exceed only 10% of the time (the aggressive ceiling)

The four outputs come from training the model with asymmetric loss at each target percentile, rather than deriving them post-hoc from a single mean prediction. The width of the P10–P90 band for any given hour is the model's stated uncertainty for that hour. Clear days produce narrow bands. Convective summer afternoons produce wide bands.

The point is not to give traders four numbers instead of one. The point is to give them a hedgeable distribution instead of a guess.
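
Whether the band is honest is directly checkable against history: under this convention, actuals should land at or below the P90 output about 90% of the time and at or below the P10 output about 10% of the time. A minimal coverage check:

```python
import numpy as np

def empirical_coverage(actual: np.ndarray, quantile_pred: np.ndarray) -> float:
    """Fraction of hours where the actual landed at or below the quantile.

    For a calibrated P90 series this should be close to 0.90; for P10,
    close to 0.10. Large deviations mean the model is over- or
    under-confident, whatever its headline accuracy says.
    """
    return float(np.mean(actual <= quantile_pred))
```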

Measuring Forecast Accuracy: The Metrics That Matter

A forecasting platform that won't publish its accuracy metrics is hiding something. The four metrics worth tracking:

| Metric | What it measures | Where it matters |
|---|---|---|
| nMAE | Mean absolute error normalised by installed AC capacity | The headline accuracy number; 4–6% is the day-ahead benchmark for stable continental sites |
| MAPE | Mean absolute percentage error against actual generation | Skews high in low-irradiance hours; less useful than nMAE alone |
| Pinball loss (per quantile) | Calibration of the probabilistic output | Tells you whether your P90 band is actually a 90% band, or whether the model is overconfident |
| CRPS | Continuous ranked probability score across the full distribution | The right metric for comparing two probabilistic forecasters head-to-head |

A platform that quotes a 1–2% MAPE without context is either operating on a single perfect plant, cherry-picking sunny days, or computing the metric in a way that excludes the hard hours.
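
Two of those metrics are compact enough to state exactly. A minimal sketch of nMAE and the per-quantile pinball loss:

```python
import numpy as np

def nmae(actual: np.ndarray, p50: np.ndarray, capacity_mw: float) -> float:
    """Mean absolute error normalised by installed AC capacity."""
    return float(np.mean(np.abs(actual - p50)) / capacity_mw)

def pinball(actual: np.ndarray, pred: np.ndarray, q: float) -> float:
    """Quantile (pinball) loss: asymmetric penalty around percentile q."""
    diff = actual - pred
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))
```

Note that nMAE divides by capacity, not by actual generation, which is exactly why it does not blow up in low-irradiance hours the way MAPE does.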

How a Probabilistic Forecast Feeds the Day-Ahead Bid

A solar production forecast is not a deliverable in itself. It is the input to the bid construction process that the balance responsible party (BRP) runs every morning before the day-ahead gate closes.

The typical flow on an SDAC zone (and equivalent on local exchanges):

  1. T-24h — Day-ahead forecast finalised for the next delivery day (D+1). Output is a table of 24 hourly rows, each carrying P10/P50/P70/P90.
  2. T-22h to T-13h — Bid construction. The trader takes P50 as the central nomination, sizes the P10–P90 envelope as the risk budget, and decides whether to:
    • Nominate at P50 and accept the imbalance exposure on either tail
    • Hedge the long side with intraday short positions if P10 is far below P50
    • Buy back the short side with intraday longs if P90 is far above P50
  3. T-12h — Gate closes (12:00 CET for SDAC). Schedule confirmed by 12:55 CET.
  4. D-day — Live dispatch. The intraday continuous market lets the BRP re-position every 15 minutes as actuals diverge from the schedule.
  5. D+1 — Imbalance settlement. The cost of being long or short during each settlement period is netted against the day-ahead clearing price.

The probabilistic forecast determines all of this. A trader bidding from a single-point forecast has no idea how to size hedges. A trader with the full P10/P90 ribbon can put a euro figure on the risk of every hour of the schedule.
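
As an illustration — and emphatically not a trading recommendation — here is a sketch of turning the ribbon into per-hour hedge candidates. The decision rule and threshold are invented for the example:

```python
def hedge_plan(p10, p50, p90, tail_threshold_mwh: float = 1.5):
    """Flag hours whose P10-P50 or P50-P90 tail exceeds a risk budget."""
    plan = []
    for hour in range(24):
        downside = p50[hour] - p10[hour]   # exposure if the hour under-delivers
        upside = p90[hour] - p50[hour]     # exposure if the hour over-delivers
        action = "hold"
        if downside > tail_threshold_mwh:
            action = "buy intraday cover"
        elif upside > tail_threshold_mwh:
            action = "sell expected surplus"
        plan.append((hour, round(downside, 2), round(upside, 2), action))
    return plan
```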

For the full mechanics of day-ahead bid construction with probabilistic inputs, see the ENTSO-E day-ahead bidding playbook.

Common Pitfalls in Production Solar Forecasting

Working with utility-scale operators across the Balkans and Central Europe, we see the same handful of mistakes recurring:

  1. Training on dirty SCADA data. Clipping events, inverter outages, comms gaps, and curtailment all need to be marked and either excluded or modelled (a minimal flagging sketch follows this list). A model trained on "the inverter was offline for three hours" without that flag will learn that the plant produces nothing at noon on cloudy days.
  2. Ignoring plane-of-array calculation. Feeding raw horizontal irradiance into the model instead of POA loses material accuracy on tilted or tracked systems.
  3. Treating a heterogeneous plant as a single transfer function. A mixed-tilt, mixed-tracker, mixed-inverter site needs to be modelled as a federation of blocks, not one big model.
  4. Letting the model drift. A model calibrated months ago is already wrong. Regular retraining against recent data is non-negotiable; the cadence depends on plant scale, but anything monthly or slower drifts noticeably through weather-regime transitions.
  5. Hiding the uncertainty. Outputting only P50 to keep the dashboard "clean" wastes the entire benefit of probabilistic modelling. Show the band — that is the whole point.
  6. No defence for cold start. A new plant without a documented baseline strategy is forecast-blind for its first weeks of operation.
  7. Mixing forecasting horizons in the metric. A vendor that quotes "97% accuracy" without specifying horizon is mixing easy and hard cases to flatter the headline.
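
A minimal sketch of the flagging pass behind pitfall 1, with thresholds and column names as illustrative assumptions:

```python
import pandas as pd

def flag_bad_hours(scada: pd.DataFrame) -> pd.DataFrame:
    """Mark rows that must not enter training unflagged."""
    scada = scada.copy()
    # Producing nothing in bright conditions is an outage, not weather.
    scada["flagged_outage"] = (
        (scada["ac_power_mw"] <= 0.0) & (scada["poa_irradiance"] > 100.0)
    )
    # A setpoint below available power means output was curtailed.
    scada["flagged_curtailment"] = scada["setpoint_mw"] < scada["available_mw"]
    # Comms gaps: no telemetry at all.
    scada["flagged_gap"] = scada["ac_power_mw"].isna()
    return scada
```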

Frequently Asked Questions

What is the most accurate way to forecast solar power production?

There is no single method — the highest-accuracy systems for utility-scale PV are layered pipelines that combine an ensemble of multiple numerical weather prediction sources, site-calibrated machine learning trained on each plant's own history, and quantile-loss training to produce P10/P50/P70/P90 probabilistic outputs rather than single-point estimates. Day-ahead nMAE for well-implemented systems typically lands in the 4–6% range on stable continental sites and 6–9% on sites with high convective variability.

Conclusion

A serious solar production forecast is not a model — it is a discipline built on three layers that each have to be done well: blended weather sources to neutralise structural bias, site-specific calibration that respects the plant's actual transfer function, and a probabilistic output that lets the trading desk make hedgeable decisions instead of single-point guesses. Platforms that nail all three ship measurably better forecasts than those that nail one or two.

DYNVOLT's forecasting module implements this stack end-to-end, with day-ahead accuracy published live to your dashboard against your plant's own actuals. See the forecasting module for the architecture, the energy markets module for how the forecasts feed bidding, or request a 14-day pilot and benchmark your plant against your current solution.

See it on your plant.

30-minute walkthrough on your real assets. Bring an inverter brand and a country — we'll show SCADA, AI forecasting, and ENTSO-E market routing wired together.

Request a demo
See the forecasting module →