Soil Survey: Prediction of Sum of Bases Using K-Nearest Neighbor Approach
In soil survey, no pedotransfer function is available to estimate sum of bases (SBs) for the range of soils that occur in the United States. The objectives of this study were to develop an SBs model using the k-nearest neighbor (k-NN) approach and to validate the model against an independent dataset. The nearest-neighbor approach passively stores the development (or reference) dataset until the time of application; the dataset is then searched for the k = 10 soils most similar to the target soil, based on selected attributes (i.e., organic carbon [OC], cation exchange, pH, and extractable acidity). The reference dataset was developed from the National Cooperative Soil Survey characterization database in Lincoln, Nebraska, and taxonomic order is used as strata within it. The overall root mean square error of prediction (RMSEp) was 2.104 cmol(+) kg⁻¹, with a mean error (ME) of -0.15 cmol(+) kg⁻¹. Among the soil order groups, the RMSEp ranged from 1.169 to 5.943 cmol(+) kg⁻¹, with Histosols having the largest RMSEp. Prediction errors tend to be higher for organic layers because they are underrepresented in the reference database compared to mineral layers. The overall low prediction errors suggest that the four selected properties were effective in finding the soils in the reference dataset nearest to the target soil. In soil survey, the k-NN SBs model provides an efficient and reasonably accurate tool for estimating sum of bases (up to 100% base saturation) when measured data are not available for soils of the US.
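The search-and-predict step and the two error statistics can be illustrated with a minimal Python sketch. It is not the study's implementation. The abstract does not say how attributes are scaled, which distance is used, or how the neighbors' values are combined, so the sketch assumes z-score standardization, Euclidean distance, and an unweighted mean of the k nearest reference values. The function names (`knn_predict`, `rmse`, `mean_error`) are hypothetical, and ME is assumed to be predicted minus measured. One possible way to reflect the taxonomic-order strata would be to pass only the reference rows from the target soil's order.

```python
import math


def knn_predict(ref_X, ref_y, target_x, k=10):
    """Estimate a target value as the mean of its k nearest reference soils.

    ref_X    : list of attribute rows, e.g. [OC, cation exchange, pH, acidity]
    ref_y    : measured sum of bases for each reference row
    target_x : attribute row of the soil to be estimated
    """
    if not ref_X or len(ref_X) != len(ref_y):
        raise ValueError("reference attributes and values must match and be non-empty")

    # Assumption: z-score each attribute so no single unit dominates the distance.
    n = len(ref_X)
    cols = list(zip(*ref_X))
    means = [sum(c) / n for c in cols]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0  # guard constant columns
        for c, m in zip(cols, means)
    ]

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, sds)]

    # Assumption: Euclidean distance in the standardized attribute space.
    z_target = scale(target_x)
    dists = [math.dist(scale(row), z_target) for row in ref_X]

    # Indices of the k closest reference soils (fewer if the stratum is small).
    nearest = sorted(range(n), key=dists.__getitem__)[:k]

    # Assumption: unweighted mean of the neighbors' measured values.
    return sum(ref_y[i] for i in nearest) / len(nearest)


def rmse(predicted, measured):
    """Root mean square error of prediction."""
    sq = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(sq) / len(sq))


def mean_error(predicted, measured):
    """Mean error (bias), assumed here to be predicted minus measured."""
    diffs = [p - m for p, m in zip(predicted, measured)]
    return sum(diffs) / len(diffs)
```

In a validation run, `knn_predict` would be called once per soil in the independent dataset, and the resulting predictions passed with the measured values to `rmse` and `mean_error`.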