Can “K-Means” Analysis Help Anticipate Future Hurricane Season Activity?

A preview of a cluster chart showing TNA values (see the article below for details)

The Atlantic hurricane basin has the highest year-to-year variability, in terms of tropical cyclone frequency and strength, of any basin in the world. Dry air from Africa, a large amount of land (relative to other basins), indirect influences from the Pacific Ocean and many other factors make seasonal activity difficult to predict.

Much work has been done to describe the year-to-year variance in tropical cyclone frequency. Dr. Gray and Phil Klotzbach of Colorado State University (most notably), along with many other researchers, have done a tremendous amount for hurricane science using regression analysis. Their goal: to find statistical, predictive relationships between climate variables and subsequent seasonal hurricane activity.

My background is in marketing and predictive modeling, and I have always been fascinated by using these approaches to solve business problems. Now that I have my own business, I have decided to invest some time in looking at hurricane data the same way I’ve looked at business data in the past, to see whether these techniques can add something to the discourse and help us understand the key factors that influence hurricane activity in the Atlantic.

To that end, I have been passing some preliminary data sets (acquired from this source) through an analysis technique called K-means clustering. My idea is to take measurements of key predictors (many discovered by Gray and others) and see whether I can “describe” the differences in the data by grouping similar seasons into clusters.
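
For readers new to the technique, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python, where each "point" would be one season's vector of predictor measurements. It is an illustration, not the software used for this analysis: the function names and the deterministic farthest-first seeding are assumptions chosen to keep the example short and repeatable.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(points: Sequence[Vector], k: int) -> List[Vector]:
    """Deterministic seeding: start at the first point, then repeatedly add
    the point that is farthest from every seed chosen so far."""
    seeds = [list(points[0])]
    while len(seeds) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, s) for s in seeds))
        seeds.append(list(farthest))
    return seeds


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Group `points` (one feature vector per season) into `k` clusters.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_first_seeds(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: every season joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # no season changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member seasons.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Lloyd's algorithm only finds a local optimum, which is why libraries such as scikit-learn's `KMeans` default to k-means++ seeding and run several restarts (`n_init`); a real analysis would likely lean on one of those rather than this sketch.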

I still have a long way to go before I have anything final, but I am encouraged by some initial work looking strictly at Sea Surface Temperatures (SSTs) in two key regions: the Tropical North Atlantic (TNA) and the Tropical South Atlantic (TSA).

I used the 12 months before the start of each hurricane season, giving 24 measurements (12 for each predictor) that describe the initial conditions used for clustering. For example, the 2012 data line starts with the trailing 7 months of 2011 (June to December), followed by the first 5 months of 2012 before the Atlantic season started (January to May). This way, the line for the season I am attempting to describe (2012) contains no data from that hurricane season, only data available before it started, so I can see whether these descriptive techniques can “tell” us anything about the season to come.
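
The 12-month window described above can be sketched as follows. The dictionary layout (monthly values keyed by `(year, month)`) and the function names are hypothetical; the point the sketch makes is that every value used for a season comes from before June of that season.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (year, month) -> monthly SST index value for one region.
MonthlySeries = Dict[Tuple[int, int], float]


def preseason_window(season: int) -> List[Tuple[int, int]]:
    """The 12 (year, month) keys preceding `season`: Jun-Dec of the prior year, then Jan-May."""
    prior_year = [(season - 1, m) for m in range(6, 13)]  # June..December: 7 months
    same_year = [(season, m) for m in range(1, 6)]        # January..May: 5 months
    return prior_year + same_year


def preseason_features(season: int, tna: MonthlySeries, tsa: MonthlySeries) -> List[float]:
    """24-value line for one season: 12 TNA months followed by 12 TSA months.

    Only months before June of `season` are read, so no in-season data leaks in.
    """
    window = preseason_window(season)
    return [tna[key] for key in window] + [tsa[key] for key in window]
```

Stacking `preseason_features(year, tna, tsa)` for each year from 1952 to 2012 would give one row per season, ready for clustering.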

I haven’t filtered for warm or cold phases of the Atlantic Multi-decadal Oscillation (I might in future passes, but I think the SST data will capture that without my creating other predictors), nor have I used any variables other than SSTs in these two regions.

Here are the results of my first pass, showing the North Atlantic SST groupings:

Preliminary clusters across the 1952-2012 seasonal data sets. Note that Clusters 5 and 6 represent inactive and hyperactive seasons.

How to read: the months before the start of the hurricane season are listed on the left, with the monthly values for each predictor. Reading down the clusters, warm cluster averages (means) are shaded red and cold ones blue. At the bottom is the number of seasons (members) in each cluster, followed, for reference, by the highest and lowest ACE (Accumulated Cyclone Energy) among the member seasons and the standard deviation of ACE within each group.
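
The summary rows at the bottom of the chart (members, ACE high/low, ACE standard deviation) can be computed from the cluster labels with a few lines of Python. This is a sketch that assumes the labels and ACE values are aligned season by season; the article does not say whether the chart uses the population or sample standard deviation, so the population version (`statistics.pstdev`) here is an assumption.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def ace_summary(labels: Sequence[int], ace: Sequence[float]) -> Dict[int, Dict[str, float]]:
    """Summarize ACE per cluster: member count, high, low and standard deviation."""
    if len(labels) != len(ace):
        raise ValueError("need one ACE value per labelled season")
    groups: Dict[int, List[float]] = defaultdict(list)
    for label, value in zip(labels, ace):
        groups[label].append(value)
    return {
        label: {
            "members": len(values),
            "ace_high": max(values),
            "ace_low": min(values),
            # Population standard deviation; a single-member cluster gets 0.0.
            "ace_std": statistics.pstdev(values),
        }
        for label, values in sorted(groups.items())
    }
```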

Clusters 1-4 are not very descriptive. Notice the high variance (and large standard deviations) in ACE for the “predicted” seasons. This suggests that while the seasons did fall into clusters, the predictors I picked didn’t do a good job of describing the differences in ACE between the clusters. However, Cluster 5 found a group of very inactive seasons, and Cluster 6 identified a group of extremely active seasons (with little variance in ACE between them).
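
One way to put a single number on how "descriptive" a set of clusters is for ACE is the correlation ratio (eta squared): the share of the total ACE variance that lies between cluster means rather than within clusters. This metric is not used in the article; it is offered as a hedged suggestion, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def ace_variance_explained(labels: Sequence[int], ace: Sequence[float]) -> float:
    """Eta squared: between-cluster sum of squares divided by total sum of squares.

    0.0 means cluster membership says nothing about ACE; 1.0 means every
    cluster contains seasons with identical ACE.
    """
    if len(labels) != len(ace) or not ace:
        raise ValueError("need one ACE value per labelled season")
    grand_mean = sum(ace) / len(ace)
    total_ss = sum((a - grand_mean) ** 2 for a in ace)
    if total_ss == 0:
        return 0.0  # every season has the same ACE: there is no variance to explain
    by_cluster: Dict[int, List[float]] = {}
    for label, a in zip(labels, ace):
        by_cluster.setdefault(label, []).append(a)
    between_ss = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in by_cluster.values()
    )
    return between_ss / total_ss
```

A low value across Clusters 1-4 together with tight Clusters 5 and 6 would be consistent with the pattern described above.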

I clustered on two variables, though, so the North Atlantic data tells only part of the story. Here’s the second half:

Clusters of data using the TSA (Tropical South Atlantic) water temperature indices. Note the smaller temperature differences between Clusters 5 and 6.

So (and again, this is very preliminary), there seems to be a key profile in the data set that leads to very active Atlantic seasons: very warm North Atlantic SSTs (not a new finding) coupled with very warm South Atlantic SSTs (also not entirely new). That’s Cluster 6: very high North Atlantic values with a pronounced warming in the run-up to the season, and very warm, warming temperatures in the Tropical South Atlantic.
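
The profiles above are read from each cluster's averages (its centroid). As a sketch, assuming a centroid stores 24 anomaly values ordered as 12 TNA months (June through May) followed by 12 TSA months, it can be split by region and summarized by its average level and its change from the first to the last pre-season month. The layout and the function name are hypothetical, and "May minus the previous June" is only one simple way to express "warming in the run-up".

```python
from typing import Dict, Sequence


def describe_centroid(centroid: Sequence[float]) -> Dict[str, Dict[str, float]]:
    """Split a 24-value centroid into TNA and TSA halves and summarize each.

    Assumes indices 0-11 are TNA June..May and 12-23 are TSA June..May.
    """
    if len(centroid) != 24:
        raise ValueError("expected 12 TNA values followed by 12 TSA values")
    summary: Dict[str, Dict[str, float]] = {}
    for region, months in (("TNA", centroid[:12]), ("TSA", centroid[12:])):
        summary[region] = {
            # Positive = warmer than the baseline on average, negative = colder.
            "mean_anomaly": sum(months) / len(months),
            # May value minus the previous June: > 0 suggests warming into the season.
            "preseason_change": months[-1] - months[0],
        }
    return summary
```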

Cluster 5 is even more interesting to me, because the South Atlantic profile for those inactive seasons isn’t much different (those cluster averages are above normal, just not quite as warm). However, when that is coupled with cold North Atlantic pre-seasons, very quiet Atlantic hurricane years have resulted (low ACE and low variance within the group).

There is still a lot of work left to do. I plan to load up many other variables, including upper-level wind anomalies, the SOI (or Niño 3.4 anomalies), the QBO and other descriptive, available data sets, to see whether more data will yield better descriptions of what characterizes inactive and active Atlantic hurricane seasons.
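
K-means groups seasons by Euclidean distance, so once variables with very different units are mixed in (SST anomalies, the SOI, wind anomalies, the QBO), whichever variables have the largest numeric spread will dominate the clusters. A common precaution is to rescale each feature column to mean 0 and standard deviation 1 before clustering. The sketch below does that in plain Python; the function name is hypothetical, and whether to standardize each month separately or each variable as a whole is a design choice the analysis would still need to make.

```python
import statistics
from typing import List, Sequence


def standardize_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each feature column (across all seasons) to mean 0 and std 1.

    A column with zero spread carries no information for clustering and is set to 0.0.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
```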

I am very open to feedback. I want to get to something that could help us better understand what drives seasonal hurricane variation in the Atlantic basin, so if anyone reading this has a suggestion on process or approach, or on different data to consider, please send me a note on the Facebook page, tweet at me, or leave a comment, and I will respond as soon as possible.

About Michael Watkins

Mike Watkins is the founder of Hurricane Analytics, a private organization specializing in data visualization and predictive analytics, with a special focus on tropical meteorology. The company analyzes complex meteorological data and communicates that information in easy-to-understand terms, to help clients prepare for and anticipate the disruptive impact of Atlantic hurricanes.
