Volume 21, Issue 1 p. 3-11
RESEARCH ARTICLE

Objective verification of World Area Forecast Centre clear air turbulence forecasts

Philip G. Gill

Corresponding Author

Met Office, Exeter, UK

P. G. Gill, Met Office, FitzRoy Road, Exeter, Devon EX1 3PB, UK.
First published: 30 January 2012

ABSTRACT

The two World Area Forecast Centres (WAFCs) are responsible for providing aviation hazard forecasts above 800 hPa (6000 ft), including clear air turbulence (CAT), to aviation customers around the world. A new automated gridded forecast for CAT is now being produced by the two WAFCs alongside the traditional forecaster-produced Significant Weather (SIGWX) charts. Until now little objective verification has been available for the WAFC products. However, the increasing availability of high-resolution in situ aircraft observations now makes routine objective verification possible. The Global Aircraft Data Set (GADS), formed from the fleet of British Airways Boeing 747–400 aircraft, is a particularly useful resource. This paper proposes an objective verification scheme using Relative Operating Characteristic analysis to investigate the skill of both the operational SIGWX and the new gridded CAT forecasts from both WAFC London and WAFC Washington. Global verification results using GADS data are presented for 4 months of the northern hemisphere winter, November 2008 to February 2009.

1. Introduction

Encounters with clear air turbulence (CAT) are still a major cause of weather-related incidents involving commercial aircraft, accounting for around 65% of such incidents (Sharman et al., 2005). To improve the forecasting of CAT, it is important to have an objective method of verifying the forecasts.

In previous studies the verification of CAT forecasts has often been carried out against subjective pilot reports (PIREPS). While these provide useful information on occurrences of turbulence they are not always reliable in terms of spatial and temporal accuracy, contain few null reports of turbulence and are aircraft dependent (Kane et al., 1998).

Automated AMDAR (Aircraft Meteorological Data Relay) reports can also give useful information on turbulence for verifying CAT forecasts, but the reports are irregular and can vary between aircraft types (Drouin et al., 2008).

Some studies have been carried out using in situ automated vertical acceleration observations in addition to PIREPS to verify CAT forecasts (Brown et al., 2000). More recent studies have used high-resolution in situ Eddy Dissipation Rate (EDR) as an observation of aircraft turbulence (Takacs et al., 2005). At present, most of these EDR aircraft reports are limited to aircraft over the United States.

As the World Area Forecast Centre (WAFC) forecasts are global, it was decided that it would be appropriate to use a verification data set with as near global coverage as practicable. As a result, the Global Aircraft Data Set (GADS), described in Section 2, was chosen. The derived equivalent vertical gust velocity (DEVG), a measure of the turbulence experienced by an aircraft, can be calculated from GADS aircraft observations, which are available over a wider area including Europe, Africa and Asia as well as the North Atlantic and the United States. Some case studies of CAT have been examined using DEVG from GADS aircraft data (Turp and Gill, 2008) and using DEVG from AMDAR reports (Overeem, 2002).

Until now only subjective verification of WAFC Significant Weather (SIGWX) forecasts has been possible for a few case studies. In this paper we investigate a more rigorous scheme for objective verification of both automated and forecaster-produced WAFC global CAT forecasts against automated aircraft observations.

2. GADS data

The GADS database was set up in the early 1990s and contains high-resolution data over many data-sparse areas. Its resolution has increased over the last few years, and it now includes data from all of the aircraft in the British Airways Boeing 747–400 fleet. GADS data have been used in previous meteorological studies by several forecasting centres (Tenenbaum, 1991; Rickard et al., 2001; Cardinali et al., 2004).

The global coverage of the data is fairly good, although the southern hemisphere is relatively poorly represented. The global distribution of GADS aircraft observations from a 10-day sample can be seen in Figure 1.

Figure 1. Map showing a 10-day sample (10–19 January 2009) of GADS aircraft data. Note that because aircraft follow fixed tracks in many parts of the world, a single line on this map may represent multiple flights. This figure is available in colour online at wileyonlinelibrary.com/journal/met

Observations are reported every 4 s for several parameters, including latitude, longitude, altitude, airspeed, aircraft mass and normal acceleration. The maximum and minimum normal acceleration values are taken over the second preceding the reporting time, from measurements made 32 times per second. Quality control checks are applied to the data to help remove erroneous readings. For the verification in this paper only observations at cruise level (altitude above 28 000 ft) were used, in order to exclude the misleading values that can be obtained while aircraft are manoeuvring. As noted in Sherman (1985), accelerations due to manoeuvres of large passenger aircraft are kept low to minimize passenger discomfort; in such cases acceleration deviations rarely exceed 0.3 g.

The derived equivalent vertical gust velocity (DEVG) is an accepted measure of turbulence that can be calculated from these parameters (Truscott, 2000). This velocity, in metres per second, is defined by the formula:
\mathrm{DEVG} = \frac{A \, m \, |\Delta n|}{V} \qquad (1)
where |Δn| = peak modulus value of the fractional deviation of aircraft normal acceleration from 1 g, in units of g; m = total aircraft mass in metric tonnes; V = calibrated airspeed at the time of the acceleration peak, in knots; and A = an aircraft-specific parameter which varies with flight conditions and may be approximated by the following formulae:
equation image(2)
where
equation image(3)

H = altitude in thousands of feet and = reference mass of aircraft in metric tonnes. The parameters c1, c2, …, c5 depend on the aircraft and values appropriate for the B747–400 were used (Truscott, 2000).
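The per-report calculation of equation (1) can be sketched in Python as below. This is a minimal sketch, not the operational code: the aircraft parameter A is taken as an input rather than evaluated from equations (2) and (3), and the function names and the value A = 15 in the usage line are hypothetical placeholders, not the B747–400 calibration. The peak deviation |Δn| is taken from the maximum and minimum normal accelerations reported for the preceding second, and the cruise-level filter uses the 28 000 ft limit given above.

```python
def peak_delta_n(n_max_g: float, n_min_g: float) -> float:
    """Peak modulus of the deviation of normal acceleration from 1 g (in g),
    from the maximum and minimum values reported for the preceding second."""
    return max(abs(n_max_g - 1.0), abs(n_min_g - 1.0))


def devg(a_param: float, mass_t: float, delta_n_g: float, cas_kt: float) -> float:
    """Equation (1): DEVG (m/s) = A * m * |dn| / V.

    a_param:   aircraft-specific parameter A (supplied by the caller in this sketch)
    mass_t:    total aircraft mass in metric tonnes
    delta_n_g: peak deviation of normal acceleration from 1 g, in g
    cas_kt:    calibrated airspeed in knots
    """
    if cas_kt <= 0:
        raise ValueError("calibrated airspeed must be positive")
    return a_param * mass_t * abs(delta_n_g) / cas_kt


def is_cruise_level(altitude_ft: float) -> bool:
    """Only reports above 28 000 ft are used, to exclude manoeuvring."""
    return altitude_ft > 28_000


# Usage with a purely illustrative A (hypothetical, not a real aircraft calibration):
dn = peak_delta_n(1.30, 0.85)          # 0.30 g
print(devg(15.0, 300.0, dn, 300.0))    # about 4.5 m/s
```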

Table 1 shows how values of DEVG relate to the severity of turbulence. It should be noted that no attempt has been made in the present study to categorize the turbulence observations into wind-shear, convection and mountain-wave induced turbulence. Although it would be desirable to verify wind-shear turbulence forecasts only against wind-shear turbulence observations, categorizing turbulence events can be as difficult as forecasting the turbulence itself. Also, methods of categorizing the cause of the turbulence often rely on a numerical forecast, so the categorized observations cannot be considered independent of the forecast. The aim of the study is to compare the skill of different turbulence forecasts rather than to produce an absolute score.

Table 1. Turbulence severity (Truscott, 2000)
Turbulence severity   DEVG (m s−1)
None                  DEVG < 2
Light                 2 ≤ DEVG < 4.5
Moderate              4.5 ≤ DEVG < 9
Severe                DEVG ≥ 9
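Read as half-open intervals, Table 1 puts each DEVG value into exactly one category, and the "light or greater" and "moderate or greater" events used in the verification become simple threshold tests. A minimal sketch (the function names are illustrative):

```python
# Lower bounds (m/s) of each category in Table 1; intervals are closed below, open above.
THRESHOLDS = [("Severe", 9.0), ("Moderate", 4.5), ("Light", 2.0)]


def severity(devg_ms: float) -> str:
    """Table 1 category for a DEVG value in m/s."""
    for name, lower in THRESHOLDS:
        if devg_ms >= lower:
            return name
    return "None"


def at_least(devg_ms: float, category: str) -> bool:
    """True for 'light or greater' (DEVG >= 2), 'moderate or greater' (DEVG >= 4.5), etc."""
    return devg_ms >= dict(THRESHOLDS)[category]


for value in (1.5, 2.0, 4.4, 4.5, 12.0):
    print(value, severity(value), at_least(value, "Moderate"))
```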

3. WAFC forecasts

The London and Washington WAFCs issue WAFC SIGWX charts four times a day, for up to 24 h ahead, for various aviation hazards including CAT. The forecasts are in the form of a chart (Figure 2); for an explanation of the interpretation of SIGWX charts see Lankford (2000). On the chart, forecasters delineate areas of moderate or greater CAT based on the latest model guidance. The coordinates of each CAT area are stored as BUFR (Binary Universal Form for the Representation of meteorological data) data.

Figure 2. WAFC SIGWX chart. For an explanation of this chart see Lankford (2000).

New automated gridded forecasts (Figure 3) were introduced in 2006 and provide global CAT forecasts for forecast ranges from T + 6 to T + 36 at five pressure levels. The forecast shown in Figure 3 includes a mountain wave component (Turner, 1999); at the time, the scaling of this component was too high, producing ‘hot spots’ of forecast mountain wave turbulence near mountains, which are evident in the figure. The forecasts are updated four times a day, at each model run (0000, 0600, 1200 and 1800 UTC), from global model data.

Figure 3. WAFC London gridded mean CAT potential forecast. Darker areas represent regions where the potential for encountering more severe CAT is greater.

Both WAFCs currently produce gridded forecasts using the Ellrod TI1 index (Ellrod and Knapp, 1992). This index combines the vertical wind shear (VWS) and deformation:
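\mathrm{TI1} = \mathrm{VWS} \times \mathrm{Def} \qquad (4)
where
\mathrm{VWS} = \sqrt{\left(\frac{\partial u}{\partial z}\right)^2 + \left(\frac{\partial v}{\partial z}\right)^2} \qquad (5)
and the deformation (Def) is
\mathrm{Def} = \sqrt{\left(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right)^2 + \left(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\right)^2} \qquad (6)
with u and v the eastward and northward wind components and x, y and z the eastward, northward and vertical coordinates.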
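A minimal NumPy sketch of equations (4)–(6), assuming winds on two adjacent levels of a regular latitude–longitude grid that excludes the poles, a simple two-level difference for the vertical shear and centred differences (numpy.gradient) for the horizontal derivatives. The WAFC implementations, including their vertical differencing, their handling of the longitude wrap-around (this sketch uses one-sided differences at the first and last longitudes) and the linear scaling to CAT potential, are not specified in the text and may differ; the function name is hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6


def ellrod_ti1(u_lo, v_lo, u_hi, v_hi, dz_m, lat_deg, lon_deg):
    """Ellrod TI1 = VWS * Def, equations (4)-(6), on a regular lat-lon grid.

    u_lo, v_lo, u_hi, v_hi: wind components (m/s), arrays of shape (nlat, nlon),
        on a lower and an upper level separated by dz_m metres.
    lat_deg, lon_deg: 1-D, regularly spaced coordinates (poles excluded so the
        zonal grid spacing never vanishes).
    Returns (ti1, vws, deformation), all of shape (nlat, nlon), in SI units.
    """
    # Equation (5): two-level finite-difference vertical wind shear.
    vws = np.hypot((u_hi - u_lo) / dz_m, (v_hi - v_lo) / dz_m)

    # Horizontal derivatives of the layer-mean wind (centred differences).
    u = 0.5 * (u_lo + u_hi)
    v = 0.5 * (v_lo + v_hi)
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float))
    dy = EARTH_RADIUS_M * (lat[1] - lat[0])
    dx = (EARTH_RADIUS_M * (lon[1] - lon[0]) * np.cos(lat))[:, None]  # per latitude row
    du_dy = np.gradient(u, dy, axis=0)
    dv_dy = np.gradient(v, dy, axis=0)
    du_dx = np.gradient(u, axis=1) / dx
    dv_dx = np.gradient(v, axis=1) / dx

    # Equation (6): total deformation from its stretching and shearing parts.
    deformation = np.hypot(du_dx - dv_dy, dv_dx + du_dy)
    return vws * deformation, vws, deformation
```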

The CAT forecasts from both centres have been linearly scaled to form a CAT potential.

The gridded forecasts were issued in thinned Gridded Binary (GRIB) format, on a thinned 1.25° grid (up to 288 × 145 points) at five pressure levels (400, 300, 250, 200 and 150 hPa). In late 2010 this format was superseded by (unthinned) GRIB2, with an additional level at 350 hPa. In this study only the T + 24 GRIB forecasts are assessed, as these correspond to the 24 h lead time of the SIGWX charts.

It is recognized that more sophisticated algorithms are in use to predict CAT operationally (see, for example, Sharman et al., 2005). However, as a baseline it was considered appropriate to verify the algorithms used operationally by the WAFCs.

4. Verification methodology

Because the observations have a much higher resolution than the forecasts, each aircraft flight track is broken down into segments. A segment length of 10 min was chosen, as this corresponds approximately to 100 km of flight, which is comparable to the size of a forecast model grid box.

The aircraft observation data along each flight segment are then analysed, and the flight segment observation is classed as turbulent or non-turbulent depending on whether the DEVG value exceeds the turbulence threshold (Table 1) at any point in the flight segment.
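The segmentation and the observed yes/no decision can be sketched as follows. This is a minimal illustration, assuming cruise-level reports every 4 s (about 150 per 10-min segment) and segment boundaries counted from the first report of each flight, which the text does not specify; the function name is hypothetical.

```python
from collections import defaultdict

SEGMENT_SECONDS = 600  # 10-minute segments


def observed_turbulence_by_segment(times_s, devg_ms, threshold_ms=4.5):
    """Flag each 10-min segment of one flight as turbulent (True) if DEVG reaches
    the threshold at any report within it (4.5 m/s: moderate or greater;
    2.0 m/s: light or greater).

    times_s: report times in seconds, measured from the first cruise-level report.
    devg_ms: DEVG value (m/s) for each report.
    Returns {segment_index: bool}.
    """
    flags = defaultdict(bool)
    for t, d in zip(times_s, devg_ms):
        seg = int(t // SEGMENT_SECONDS)
        flags[seg] = flags[seg] or d >= threshold_ms
    return dict(flags)


# 30 min of 4-s reports with one moderate report at t = 700 s.
times = list(range(0, 1800, 4))
devgs = [5.0 if t == 700 else 1.0 for t in times]
print(observed_turbulence_by_segment(times, devgs))  # {0: False, 1: True, 2: False}
```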

Three equally spaced points on each segment are then examined, and the forecast grid is interpolated to each point using bilinear interpolation in the horizontal and the nearest pressure level in the vertical. If the forecast value at any of the three points exceeds a turbulence indicator threshold, the flight segment forecast is classed as turbulent.
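A minimal sketch of the forecast side of the comparison, under several assumptions not stated in the text: the three points are the start, midpoint and end of the segment; their positions are interpolated linearly in latitude and longitude (a segment crossing the date line would need its longitudes unwrapped first); each forecast level is a list of rows indexed [latitude][longitude]; and the aircraft pressure p_hpa has already been derived from its reported altitude. All names are illustrative. Returning the maximum of the three interpolated values means a segment is forecast turbulent exactly when that maximum exceeds the threshold.

```python
LEVELS_HPA = [400, 300, 250, 200, 150]  # forecast levels used in this study


def nearest_level(p_hpa, levels=LEVELS_HPA):
    """Index of the forecast pressure level closest to the aircraft's pressure."""
    return min(range(len(levels)), key=lambda k: abs(levels[k] - p_hpa))


def bilinear(field, lat0, dlat, lon0, dlon, lat, lon):
    """Bilinear interpolation of field[i][j] (i: latitude row, j: longitude column)
    to (lat, lon). Longitude wraps around the globe; latitude is clamped to the grid."""
    nlat, nlon = len(field), len(field[0])
    y = min(max((lat - lat0) / dlat, 0.0), nlat - 1.000001)
    x = ((lon - lon0) / dlon) % nlon
    i, j = int(y), int(x)
    fy, fx = y - i, x - j
    j1 = (j + 1) % nlon
    return ((1 - fy) * ((1 - fx) * field[i][j] + fx * field[i][j1])
            + fy * ((1 - fx) * field[i + 1][j] + fx * field[i + 1][j1]))


def segment_forecast_value(fields, grid, start, end, p_hpa):
    """Maximum forecast value at three equally spaced points (start, middle, end).
    fields: one 2-D grid per level in LEVELS_HPA; grid: (lat0, dlat, lon0, dlon);
    start, end: (lat, lon) of the segment ends in degrees."""
    field = fields[nearest_level(p_hpa)]
    values = []
    for f in (0.0, 0.5, 1.0):
        lat = start[0] + f * (end[0] - start[0])
        lon = start[1] + f * (end[1] - start[1])
        values.append(bilinear(field, *grid, lat, lon))
    return max(values)


def segment_forecast_turbulent(fields, grid, start, end, p_hpa, threshold):
    """Turbulent if any of the three values exceeds the threshold."""
    return segment_forecast_value(fields, grid, start, end, p_hpa) > threshold


# Toy 3 x 4 one-degree grid with a single peak on the 250 hPa level.
zero = [[0.0] * 4 for _ in range(3)]
peak = [[0.0] * 4 for _ in range(3)]
peak[1][1] = 1.0
fields = [zero, zero, peak, zero, zero]
grid = (0.0, 1.0, 0.0, 1.0)
print(segment_forecast_turbulent(fields, grid, (1.0, 0.0), (1.0, 2.0), 238.0, 0.5))
```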

A time window of ±3 h around the validity time of each forecast has been used. Because the forecasts are valid at 6-hourly intervals, this allows all of the observations to be used in verifying the 6-hourly forecasts and increases the sample size. Only observations occurring within the time window are used in the verification.
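Matching each observation to the forecast whose validity time lies within 3 h of it can be sketched as below, for naive datetimes assumed to be in UTC. The function name, and the rule that an observation exactly 3 h after a validity time goes to the later forecast, are assumptions made so that every observation is used once.

```python
from datetime import datetime, timedelta


def validity_time(obs_time_utc: datetime) -> datetime:
    """Validity time (00, 06, 12 or 18 UTC) whose +/-3 h window contains the observation.

    An observation exactly 3 h after a validity time is assigned to the later one
    (an assumed tie-break), so each observation is used once.
    """
    midnight = obs_time_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    hours = (obs_time_utc - midnight).total_seconds() / 3600.0
    slot = int((hours + 3.0) // 6.0)  # 0..4; slot 4 is 00 UTC on the next day
    return midnight + timedelta(hours=6 * slot)


print(validity_time(datetime(2009, 1, 10, 2, 59)))   # 2009-01-10 00:00:00
print(validity_time(datetime(2009, 1, 10, 22, 15)))  # 2009-01-11 00:00:00
```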

Using this approach a 2 × 2 contingency table can be set up (Table 2) for yes/no forecasts and observations of turbulence. The entries in the table are summed over all segments in a flight and then over all flights in the verification period. The entries in the resulting table can then be used to calculate the hit rate and false alarm rate as follows:

Table 2. 2 × 2 contingency table
                          Turbulence observed   Turbulence not observed
Turbulence forecast       A                     B
Turbulence not forecast   C                     D

\text{Hit rate} = \frac{A}{A + C} \qquad (7)
\text{False alarm rate} = \frac{B}{B + D} \qquad (8)

By repeating this process for various thresholds on the forecast turbulence indicator, a series of contingency tables can be built, each with a corresponding hit rate and false alarm rate.
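The contingency-table step and equations (7) and (8) can be sketched as below. Summing over segments and then over flights is equivalent to pooling all segments into one list. Because a segment is forecast turbulent when any of its three points exceeds the threshold, storing the maximum of the three interpolated values per segment is enough to sweep through all thresholds without re-interpolating. This is an illustrative sketch; the function names are hypothetical, and the rates are undefined if the sample contains no turbulent (or no non-turbulent) segments.

```python
def contingency(segment_values, observed, threshold):
    """Counts (A, B, C, D) of Table 2 for one forecast threshold.

    segment_values: per-segment forecast value (maximum over the three points).
    observed: per-segment bool, True if turbulence was observed.
    """
    a = b = c = d = 0
    for value, obs in zip(segment_values, observed):
        forecast_yes = value > threshold
        if forecast_yes and obs:
            a += 1          # hit
        elif forecast_yes:
            b += 1          # false alarm
        elif obs:
            c += 1          # miss
        else:
            d += 1          # correct rejection
    return a, b, c, d


def hit_and_false_alarm_rates(a, b, c, d):
    """Equations (7) and (8)."""
    return a / (a + c), b / (b + d)


def roc_points(segment_values, observed, thresholds):
    """(false alarm rate, hit rate) for each threshold, ready for plotting."""
    points = []
    for t in thresholds:
        hit, false_alarm = hit_and_false_alarm_rates(*contingency(segment_values, observed, t))
        points.append((false_alarm, hit))
    return points


values = [0.9, 0.8, 0.3, 0.2, 0.6]
observed = [True, False, True, False, False]
print(contingency(values, observed, 0.5))              # (1, 2, 1, 1)
print(roc_points(values, observed, [1.0, 0.5, 0.0]))
```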

Relative Operating Characteristic (ROC) curves are then constructed for each observation threshold by plotting a point for each pair of hit rate and false alarm rate, each point corresponding to one threshold of the forecast turbulence indicator. More information on the use of ROC curves for assessing forecast skill can be found in Mason and Graham (1999). The diagonal represents a line of no skill, with points above the line representing forecast skill. The closer the points are to the top left corner of the graph, the more skilful the forecasts are at distinguishing between events and non-events.

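The area under the ROC curve can be calculated and used as a summary measure of forecast skill: an area of 0.5 corresponds to no skill (the diagonal) and an area of 1 to a perfect forecast.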
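One simple way to compute the area is the trapezoidal rule over the ROC points, closed with the corners (0, 0) and (1, 1). The integration method is not stated in the text, so this is an assumption; for a concave curve sampled at only a few points, the trapezoidal area underestimates the area under the smooth curve. The function name is hypothetical.

```python
def roc_area(points):
    """Trapezoidal area under an ROC curve.

    points: (false alarm rate, hit rate) pairs; the corners (0, 0) and (1, 1)
    are added so the curve spans the whole unit square.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (f0, h0), (f1, h1) in zip(pts, pts[1:]):
        area += 0.5 * (f1 - f0) * (h0 + h1)
    return area


print(roc_area([(0.5, 0.5)]))   # 0.5: a point on the no-skill diagonal
print(roc_area([(0.2, 0.6)]))
```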

The SIGWX forecasts were also assessed in this way. The objects defined in the BUFR representation of each SIGWX forecast were first converted to a gridded forecast at the same resolution as the WAFC gridded forecasts. Because the SIGWX forecasts are issued for a single threshold (moderate turbulence) rather than the continuous range of values available from the gridded forecasts, a single contingency table, and therefore a single point on the ROC curve, might be expected. However, bilinear interpolation of the gridded SIGWX forecast to the observation points produces intermediate values near the edges of the forecast areas, so a series of points is obtained instead; this is perhaps closer to a pilot's interpretation of a chart.
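The effect can be seen in one dimension, where bilinear interpolation reduces to linear interpolation. The sketch assumes, purely for illustration, that a SIGWX area is rasterised as 1 inside and 0 outside; sampling across its edges gives intermediate values, so different thresholds between 0 and 1 give different numbers of "yes" forecasts and hence different ROC points.

```python
def interpolate_1d(cells, x):
    """Linear interpolation of a unit-spaced 1-D grid at position x (0 <= x <= len - 1)."""
    i = min(int(x), len(cells) - 2)
    w = x - i
    return (1 - w) * cells[i] + w * cells[i + 1]


# Assumed rasterisation: 1 inside a moderate-CAT area, 0 outside.
sigwx_row = [0, 0, 1, 1, 1, 0, 0]
positions = [k / 4 for k in range(25)]             # 0.0, 0.25, ..., 6.0
values = [interpolate_1d(sigwx_row, x) for x in positions]

for threshold in (0.25, 0.5, 0.75):
    n_yes = sum(v > threshold for v in values)
    print(f"threshold {threshold}: {n_yes} of {len(values)} points forecast turbulent")
```

Here the three thresholds give 13, 11 and 9 "yes" points out of 25, instead of a single answer.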

Further comparisons were carried out to look at the consistency of areas forecast between the two centres and also to compare the frequency of moderate or greater CAT forecasts across the globe. These comparisons were carried out over the 4 months November 2008 to February 2009. For each forecast the area of the globe forecast by each centre was calculated along with the area of intersection where forecasts were issued by both centres.
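A minimal sketch of the area comparison, assuming each forecast has been rasterised to a boolean mask on a regular latitude–longitude grid and that cell areas are proportional to the cosine of latitude. The text does not state how the intersection column of Table 7 is normalised (its values exceed the single-centre coverages, so it is not a fraction of the whole globe), so the sketch returns the intersection both as a fraction of the globe and as a fraction of the union of the two forecast areas; neither is claimed to be the definition used, and the function name is hypothetical.

```python
import math


def coverage(mask_a, mask_b, lats_deg):
    """Area-weighted coverage of two boolean forecast masks on a regular lat-lon grid.

    mask_a, mask_b: lists of rows of booleans, one row per latitude in lats_deg.
    Cell areas are approximated as proportional to cos(latitude).
    """
    total = area_a = area_b = area_both = 0.0
    for row_a, row_b, lat in zip(mask_a, mask_b, lats_deg):
        w = math.cos(math.radians(lat))
        for in_a, in_b in zip(row_a, row_b):
            total += w
            if in_a:
                area_a += w
            if in_b:
                area_b += w
            if in_a and in_b:
                area_both += w
    union = area_a + area_b - area_both
    return {
        "a_of_globe": area_a / total,
        "b_of_globe": area_b / total,
        "both_of_globe": area_both / total,
        "both_of_union": area_both / union if union > 0 else 0.0,
    }


print(coverage([[True, True, False, False]], [[False, True, True, False]], [0.0]))
```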

5. Results

The performance of the WAFC London and WAFC Washington automated CAT forecasts for moderate or greater turbulence against GADS data can be seen in Figure 4. Moderate or greater turbulence is the severity of most interest to aviation.

Figure 4. ROC curves comparing gridded and SIGWX WAFC CAT forecasts against global GADS data for moderate or greater turbulence (DEVG ≥ 4.5 m s−1) between November 2008 and February 2009. Dark solid line: WAFC Washington gridded Ellrod TI1 forecasts; light solid line: WAFC London gridded Ellrod TI1 forecasts; dotted line: WAFC London gridded Dutton forecasts; dark dashed line: WAFC Washington SIGWX; light dashed line: WAFC London SIGWX.

The curves for both centres lie very close together, and for both centres the set of points from the operational SIGWX charts lies just below the curve for the corresponding automated forecast. Results for the Dutton indicator (Dutton, 1980) are also included, as this is currently used as guidance by forecasters at WAFC London. The area under the curve (AUC) has been calculated for each ROC curve; the values are given in Table 3, and the number of events and the sample size in each latitude band are shown in Table 4.

Table 3. Comparison of AUC for global CAT forecasts of moderate or greater turbulence
                     WAFC London   WAFC Washington
SIGWX                0.601         0.623
Gridded Ellrod TI1   0.678         0.688
Gridded Dutton       0.607         –
Table 4. Number of moderate or greater turbulent events and sample sizes by latitude band
                                       50–90°N   20–50°N   20°S–20°N   50–20°S
Number of turbulent events             43        137       51          6
Sample size                            124 786   130 512   38 461      7979
Observed frequency of turbulence (%)   0.03      0.10      0.13        0.08

For the WAFC SIGWX forecasts, 95% confidence intervals for the AUC were produced using a re-sampling method as described in Gilleland (2008). Using 5000 random samples, the 95% confidence intervals were calculated to be (0.592, 0.655) for WAFC London and (0.568, 0.634) for WAFC Washington. The AUCs of the WAFC London (0.678) and WAFC Washington (0.688) automated forecasts lie above the upper limits of the corresponding SIGWX confidence intervals, implying that the automated forecasts are significantly more skilful than the SIGWX forecasts.
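The interval calculation can be sketched as a percentile bootstrap. The details here are assumptions: the text cites Gilleland (2008) but does not say what was resampled, so the sketch resamples flight segments with replacement, and for speed it uses the rank-based (Mann–Whitney) AUC, which equals the trapezoidal ROC area when every distinct forecast value is used as a threshold. Resampling individual segments ignores correlation between neighbouring segments of the same flight; resampling whole flights would typically give wider intervals if such correlation is present. Function names are hypothetical.

```python
import random


def auc_mann_whitney(scores, labels):
    """AUC as the Mann-Whitney statistic: the probability that a turbulent segment
    has a higher forecast value than a non-turbulent one (ties count one half)."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average 1-based rank of a group of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, lab in zip(ranks, labels) if lab)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(scores, labels, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the AUC, resampling segments."""
    if all(labels) or not any(labels):
        raise ValueError("need both turbulent and non-turbulent segments")
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[k] for k in idx]
        if all(lab) or not any(lab):
            continue  # AUC is undefined without both outcomes; draw again
        stats.append(auc_mann_whitney([scores[k] for k in idx], lab))
    stats.sort()
    alpha = 1.0 - level
    lower = stats[int(alpha / 2 * n_resamples)]
    upper = stats[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return lower, upper
```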

Figures 5–8 show the variation with latitude of the skill of the WAFC London and WAFC Washington CAT forecasts. Because filtering by latitude reduces the sample size, the results in these cases are shown for light or greater turbulence to maintain a reasonable sample size. The areas under the corresponding ROC curves are given in Table 5, and the number of events and the sample size in each latitude band are shown in Table 6.

Figure 5. ROC curves comparing gridded and SIGWX WAFC CAT forecasts between 50 and 90°N against GADS data for light or greater turbulence (DEVG ≥ 2 m s−1) between November 2008 and February 2009. Dark solid line: WAFC Washington gridded Ellrod TI1 forecasts; light solid line: WAFC London gridded Ellrod TI1 forecasts; dotted line: WAFC London gridded Dutton forecasts; dark dashed line: WAFC Washington SIGWX; light dashed line: WAFC London SIGWX.

Figure 6. ROC curves comparing gridded and SIGWX WAFC CAT forecasts between 20 and 50°N against GADS data for light or greater turbulence (DEVG ≥ 2 m s−1) between November 2008 and February 2009. Dark solid line: WAFC Washington gridded Ellrod TI1 forecasts; light solid line: WAFC London gridded Ellrod TI1 forecasts; dotted line: WAFC London gridded Dutton forecasts; dark dashed line: WAFC Washington SIGWX; light dashed line: WAFC London SIGWX.

Figure 7. ROC curves comparing gridded and SIGWX WAFC CAT forecasts between 20°S and 20°N against GADS data for light or greater turbulence (DEVG ≥ 2 m s−1) between November 2008 and February 2009. Dark solid line: WAFC Washington gridded Ellrod TI1 forecasts; light solid line: WAFC London gridded Ellrod TI1 forecasts; dotted line: WAFC London gridded Dutton forecasts; dark dashed line: WAFC Washington SIGWX; light dashed line: WAFC London SIGWX.

Figure 8. ROC curves comparing gridded and SIGWX WAFC CAT forecasts between 50 and 20°S against GADS data for light or greater turbulence (DEVG ≥ 2 m s−1) between November 2008 and February 2009. Dark solid line: WAFC Washington gridded Ellrod TI1 forecasts; light solid line: WAFC London gridded Ellrod TI1 forecasts; dotted line: WAFC London gridded Dutton forecasts; dark dashed line: WAFC Washington SIGWX; light dashed line: WAFC London SIGWX.

Table 5. Comparison of AUC for CAT forecasts of light or greater turbulence at varying latitude bands
                                     50–90°N   20–50°N   20°S to 20°N   50–20°S
WAFC London SIGWX                    0.696     0.614     0.504          0.519
WAFC London gridded Ellrod TI1       0.788     0.708     0.534          0.703
WAFC London gridded Dutton           0.679     0.660     0.537          0.636
WAFC Washington SIGWX                0.679     0.587     0.499          0.516
WAFC Washington gridded Ellrod TI1   0.804     0.720     0.565          0.732
Table 6. Number of light or greater turbulent events and sample sizes by latitude band
                                       50–90°N   20–50°N   20°S to 20°N   50–20°S
Number of turbulent events             1398      3543      1619           159
Sample size                            124 786   130 512   38 461         7979
Observed frequency of turbulence (%)   1.1       2.7       4.2            2.0
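The observed frequencies in Tables 4 and 6 are the number of events divided by the sample size in each band; recomputing them from the tabulated counts reproduces the rounded percentages. The variable and function names are illustrative.

```python
# Events and sample sizes from Tables 4 and 6 (bands 50-90N, 20-50N, 20S-20N, 50-20S).
bands = ["50-90N", "20-50N", "20S-20N", "50-20S"]
samples = [124_786, 130_512, 38_461, 7_979]
mog_events = [43, 137, 51, 6]           # moderate or greater (Table 4)
log_events = [1398, 3543, 1619, 159]    # light or greater (Table 6)


def frequency_pct(events, n):
    """Observed frequency of turbulence as a percentage of the sample."""
    return 100.0 * events / n


for band, n, m, l in zip(bands, samples, mog_events, log_events):
    print(f"{band:8s} MOG {frequency_pct(m, n):.2f}%  LOG {frequency_pct(l, n):.1f}%")
```

Over the four bands, moderate or greater turbulence occurs in 237 of 301 738 samples, below 0.1%.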

For the 50–90°N and 20–50°N latitude bands the curves of the automated forecasts are very close, and again the corresponding points from the SIGWX forecasts lie below the curves of the automated forecasts. The Ellrod predictor was designed for use in mid-latitudes, where it can predict CAT in areas of strong vertical wind shear associated with the polar and subtropical jet streams. In the northern hemisphere winter period of this study the polar jet is likely to be at its strongest, resulting in more CAT events in the mid-latitudes.

As noted in Section 2, some of the observed turbulence may be convectively induced, which may help to explain the poor skill in the 20°S to 20°N latitude band. Wind speeds in this band are generally much lower, and the subtropical jet and its associated vertical wind shear are likely to lie outside it, so the Ellrod predictor will not perform as well as in the mid-latitudes. However, severe CAT can be caused by strong horizontal wind shear around folds in the subtropical jet, sometimes extending to within 10° of the equator (Roach and Bysouth, 2002). These areas are less likely to be forecast with the Ellrod predictor, although alternatives such as the Brown predictor (Brown, 1973) might have more skill.

The results for the 50–20°S latitude band show some differences between the curves of the automated forecasts. However, the points from the SIGWX forecasts are still below the corresponding curves of the automated forecasts. The study period corresponds to summer in the southern hemisphere, when the jet stream is likely to be weaker and therefore fewer shear-related CAT events will be recorded. There are, however, likely to be more convectively induced turbulent events, which would influence the results.

The highest observed frequency of turbulence was found in the tropics (Tables 4 and 6), which is also the region of poorest forecast skill (Table 5). Convection, which is not currently included in the forecast, is likely to be a significant factor in this region. It is also possible that the actual frequency of CAT at higher latitudes is greater than that observed: aircraft avoid areas where CAT is forecast, so fewer events are recorded, and this effect will increase as forecasts of CAT improve.

A comparison of the average area of the globe covered by the SIGWX and gridded moderate or greater turbulence forecasts for November 2008 to January 2009 can be seen in Table 7. Figures 9–12 show the frequency of moderate or greater CAT forecasts from each centre for both the SIGWX and the automated forecasts. In all four figures the areas where CAT has been forecast lie in two latitudinal bands, one in each hemisphere. The band in the northern hemisphere is broader, as expected in winter, and corresponds to the regions of the polar and subtropical jets. In the southern hemisphere the band is narrower, as expected in a southern hemisphere summer, and again corresponds to the regions of the subtropical and polar jets.

Figure 9. WAFC London SIGWX moderate or greater CAT forecast frequency between November 2008 and January 2009. Darker areas correspond to areas where CAT is forecast at a higher frequency.

Figure 10. WAFC Washington SIGWX moderate or greater CAT forecast frequency between November 2008 and January 2009. Darker areas correspond to areas where CAT is forecast at a higher frequency.

Figure 11. WAFC London gridded moderate or greater CAT forecast frequency at 250 hPa between November 2008 and January 2009. Darker areas correspond to areas where CAT is forecast at a higher frequency.

Figure 12. WAFC Washington gridded moderate or greater CAT forecast frequency at 250 hPa between November 2008 and January 2009. Darker areas correspond to areas where CAT is forecast at a higher frequency.

Table 7. Percentage of globe covered by moderate or greater CAT forecasts
          WAFC London (%)   WAFC Washington (%)   Intersection (%)
SIGWX     5.9               3.3                   16.7
Gridded   4.9               3.1                   18.1

Comparing the frequency of moderate or greater CAT forecast on the SIGWX charts (Figures 9 and 10), there is a slightly higher frequency of CAT in the WAFC London figure, perhaps indicating that the forecasters' interpretation of moderate CAT differs between the two centres.

Looking at the frequency of moderate or greater CAT from the automated gridded forecasts (Figures 11 and 12) the frequency of CAT from WAFC London is again greater than that from WAFC Washington. This is likely to be due to differences in calibration of the algorithms at the two centres. The other notable feature is the additional high frequency of CAT forecast over mountainous areas in the WAFC London charts corresponding to the mountain wave component present in the WAFC London forecast but not the WAFC Washington forecast. This is particularly noticeable over the Andes in South America.

6. Concluding remarks

The results show that the skill of the new automated gridded forecasts is better than the skill of the SIGWX forecasts for both light or greater and moderate or greater turbulence.

The use of ROC curves is intended to give users of the forecasts the information they need to choose the most suitable threshold. The ROC curves clearly present the trade-off between hit rate and false alarm rate for varying forecast thresholds. It would be possible to assign cost and loss values to the entries of the contingency table in Table 2 and then calculate the economic value of the forecast as described in Richardson (2001). Calculating the economic value of the forecast for varying cost/loss ratios would enable users to pick the forecast threshold that maximizes the value of the forecast for their operations.
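The cost/loss calculation mentioned above can be sketched with the standard static cost/loss model, in which protective action costs C, an unprotected event costs L, and value is measured relative to climatology (0) and a perfect forecast (1), following the relative-value formulation of Richardson (2001). This is a sketch under those assumptions; the ROC points and cost/loss ratios in the usage lines are invented for illustration, not WAFC results, and the function names are hypothetical.

```python
def relative_value(hit_rate, false_alarm_rate, base_rate, cost_loss):
    """Relative economic value V of a yes/no forecast (static cost/loss model).

    base_rate: observed frequency of the event, 0 < base_rate < 1.
    cost_loss: user's ratio C/L of protection cost to unprotected loss, 0 < C/L < 1.
    V = 1 for a perfect forecast; V = 0 for the better of always/never protecting.
    """
    o, a = base_rate, cost_loss
    expense_climate = min(a, o)          # expected expense per unit loss
    expense_perfect = o * a              # protect only when the event occurs
    expense_forecast = (false_alarm_rate * (1 - o) * a   # false alarms: pay C
                        + hit_rate * o * a               # hits: pay C
                        + (1 - hit_rate) * o)            # misses: lose L
    return (expense_climate - expense_forecast) / (expense_climate - expense_perfect)


def best_threshold(roc, base_rate, cost_loss):
    """Threshold with the highest value; roc maps threshold -> (F, H)."""
    return max(roc, key=lambda t: relative_value(roc[t][1], roc[t][0], base_rate, cost_loss))


roc = {0.2: (0.50, 0.90), 0.5: (0.10, 0.60), 0.8: (0.01, 0.20)}  # invented example
print(best_threshold(roc, base_rate=0.05, cost_loss=0.02))   # 0.2
print(best_threshold(roc, base_rate=0.05, cost_loss=0.20))   # 0.8
```

In this invented example, a user for whom protection is cheap relative to the loss (small C/L) gains most from a low threshold with a high hit rate, while a user with a larger C/L gains most from a higher threshold with fewer false alarms.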

The percentage coverage results also show that, although the automated forecasts from WAFC Washington and WAFC London generally have similar skill, the areas they forecast are not always the same, in either the SIGWX or the automated forecasts. The new automated forecasts do show a reduction in the area forecast whilst maintaining similar skill, and the consistency between the two centres has improved slightly compared with the SIGWX forecasts.

In an attempt to improve consistency between the two centres, a trial is currently taking place to produce a single blended forecast by combining the WAFC London and WAFC Washington forecasts. Initial results have shown that a blended forecast also has higher overall skill than either of the single-centre forecasts. It is likely that in future both WAFCs will produce a single blended product.

In addition, in the future it is hoped to:
  1. improve consistency of forecasts by analysing verification data and using probabilistic forecasting systems, and,

  2. use verification to test future model upgrades and to assist in algorithm development.

Acknowledgements

This work was funded by the Civil Aviation Authority (CAA). The author would also like to thank Jennifer Mahoney and her team at NOAA for supplying the WAFC Washington GRIB and BUFR data, and Bob Lunnon, Paul Agnew, Dave Forrester, Dave Jones and the two anonymous reviewers for many helpful comments when reviewing this paper.