Understanding how species are distributed in the environment is increasingly important for natural resource management, particularly for keystone and habitat-forming species, and those of conservation concern. Habitat suitability models are fundamental to developing this understanding; however, their use in management continues to be limited due to often-vague model objectives and inadequate evaluation methods. Along the Northeast Pacific coast, canopy kelps (Macrocystis pyrifera and Nereocystis luetkeana) provide biogenic habitat and considerable primary production to nearshore ecosystems. We investigated the distribution of these species by examining a series of increasingly complex habitat suitability models ranging from process-based models based on species' ecology to complex Generalised Additive Models applied to purpose-collected survey data. Seeking limits on model complexity, we explored the relationship between model complexity and forecast skill, measured using both cross-validation and independent data evaluation. Our analysis confirmed the importance of predictors used in models of coastal kelp distributions developed elsewhere (i.e., depth, bottom type, bottom slope, and exposure); it also identified additional important factors including salinity, and interactions between exposure and salinity, and slope and tidal energy. Comparative results showed that cross-validation can lead to over-fitting, while independent data evaluation clearly identified the appropriate model complexity for generating habitat forecasts. Our results also illustrate that, depending on the evaluation data, predictions from simpler models can outperform those from more complex models.
Collectively, the insights from evaluating multiple...

We began with an HSI model using Depth, bottom type, Salt, and exposure, representing habitat suitability for canopy kelp as: decreasing non-linearly with Depth; increasing linearly with the probability of rocky reefs and with salinity (within a defined range); and optimal across a range of exposures (Fig. 2). We used the three variants of bottom type (RF, RMSM, and BoP) to assess the sensitivity of the HSI models to this predictor. We assumed that all factors contributed equally and that all were essential (i.e., a low value on any factor led to low habitat suitability).

We then examined the strength of association between each potential predictor variable and our survey data using univariate GLMs. We examined correlations between linear, quadratic, and cubic forms of all predictors and the presence-absence of each species (i.e., giant and bull kelp), as well as the combined canopy, to inform how best to generalise the observations (see Appendix 2 in Supplemental Material for details). We used the significant predictors from this univariate analysis in a structured variable selection approach to create four multivariate GLMs of increasing complexity. We began by defining a GLM to represent a parametric form of the HSI model, and a second using only the linear forms of the predictors identified as significant for the combined canopy kelp response variable. In the third model we increased model complexity by considering higher-order polynomial forms of the predictors. Each polynomial always entered the model with all of its terms. We used higher-order polynomials to allow the GLMs to approach the complexity of GAMs and to facilitate model comparison. Finally, for the fourth GLM, we considered interaction terms. At each step, the best model was selected using Akaike's Information Criterion (AIC), after considering both single-term addition and single-term removal. AIC also served as a stopping
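
The HSI formulation described at the start of this passage (equal contribution from every factor, with each factor essential) can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the source: the components are combined with a geometric mean, which is one common way to make every factor essential; the depth response is an exponential decay; and every numeric threshold, along with the function and parameter names, is hypothetical and chosen only for illustration.

```python
import math


def depth_suitability(depth_m, scale_m=10.0):
    """Non-linear decrease with depth (assumed exponential decay; scale is hypothetical)."""
    return math.exp(-max(depth_m, 0.0) / scale_m)


def linear_in_range(value, low, high):
    """Rise linearly from 0 at `low` to 1 at `high`, clamped outside that range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def exposure_suitability(exposure, opt_low=0.3, opt_high=0.7, width=0.3):
    """Optimal (1.0) inside a band of exposures, falling linearly to 0 outside it."""
    if opt_low <= exposure <= opt_high:
        return 1.0
    distance = opt_low - exposure if exposure < opt_low else exposure - opt_high
    return max(0.0, 1.0 - distance / width)


def hsi(depth_m, p_rock, salinity, exposure):
    """Combine four component suitabilities into a single index in [0, 1]."""
    components = [
        depth_suitability(depth_m),
        linear_in_range(p_rock, 0.0, 1.0),      # probability of rocky reef
        linear_in_range(salinity, 20.0, 30.0),  # hypothetical salinity range
        exposure_suitability(exposure),
    ]
    # Geometric mean: equal weights, and a zero on any factor gives zero overall.
    return math.prod(components) ** (1.0 / len(components))


if __name__ == "__main__":
    print(hsi(depth_m=5.0, p_rock=0.8, salinity=28.0, exposure=0.5))
```

An arithmetic mean would also weight the factors equally, but it would let a high score on one factor offset a zero on another, which conflicts with the stated assumption that all factors are essential.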