Soil Mapping Using SoLIM

SoLIM Solutions supports two types of soil mapping: rule-based mapping and sample-based mapping. Both are based on the idea that soil types and/or soil properties can be inferred from soil-related environmental conditions (environmental covariates). Thus, both require a set of environmental layers (covariates), stored in a GIS Database, which depict the environmental conditions indicative of soil conditions. The difference between the two is that in a rule-based project, users need to explicitly define a set of rules (Knowledge Base) describing the soil-environment relationship, while in a sample-based project, users need only to provide field sample points (Field Samples). The sample points can be collected either through a well-designed sampling strategy or through ad hoc sampling activities (meaning that the samples are not collected based on any specific sampling design at all).

There are three key ideas underlying rule-based soil mapping. The first is that SoLIM maps soil types (taxonomic classes or user-defined soil concepts), not soil mapping units, under the assumption that soil properties are fairly homogeneous over a small spatial extent (such as a 10 meter by 10 meter area). Thus, it takes a raster-based approach: it divides the area to be mapped into small pixels and determines the soil type for each pixel. The second idea is that the soil at each pixel is expressed in terms of its similarity to a set of prescribed soil types (user-defined categories, also referred to as prototypes in the literature). This idea is often referred to as "fuzzy soil mapping". The objective of such mapping is not to assign a single soil type to a given location, but instead to assign similarity values (fuzzy membership values) expressing the similarity of the local soil to each of the prescribed soil types (categories or prototypes). For this reason the user must know the types of soil existing in the area to be mapped. The third idea is that SoLIM predicts the similarity value of a local soil to each prescribed soil type (category) by assessing the environmental conditions at the location according to knowledge of how these conditions are related to the development of each soil type. In other words, SoLIM takes a knowledge-based approach to predicting the similarity values. The two key inputs to SoLIM are data on the selected environmental variables (covariates) related to soil conditions in the area (stored in the GIS database), and knowledge that describes the relationships between soils and the environmental variables (referred to as the soil-landscape model and stored in the knowledge base).

SoLIM uses an inference engine to link the GIS database with the knowledge base to calculate the similarity values (see Appendix D). For example, a piece of soil-environment relationship knowledge could be "If the elevation is 1000 feet and slope is 12%, then the soil there is a typical soil type A". In this case, the inference engine will use the GIS database to identify all the locations where elevation values are 1000 feet and slope values are 12%, then assign full membership (similarity) to those locations, as the soils at these locations are typical cases of soil type A. While this example uses two environmental variables for calculating the membership values, SoLIM can calculate membership values based on as many variables as the user wants to use.

Not all locations in the area meet the conditions perfectly. Consider, for example, the statement "soil type A occurs in areas with elevation from 500 ft to 1500 ft and slope from 6% to 20%". This does not mean that all places within this range of values will have the same soil (A in this case). Instead, SoLIM acknowledges that places within that range may also be more or less similar to another soil type (B, for example), depending on the particular values of the environmental variables. In addition, soils in areas just outside the range may still bear some similarity to soil type A. They will not be perfect examples of typical soil type A, but they will not be totally dissimilar to that type either. For these locations (which actually constitute the majority of the landscape), SoLIM will assign partial membership values based on how similar the environmental conditions at these locations are to the conditions stated above. SoLIM does this by adopting a rule, expressed as a function, that defines how changes in an environmental variable affect the optimality of that environmental variable for a specific soil type. If a range of values of an environmental variable is less optimal for a soil type, i.e., the environmental condition is "sub-optimal", then the optimality value will be low for that range of values. If the optimality value is high, on the other hand, we would expect the local soil under that environmental condition to have higher similarity to the typical soil of that type. In other words, an optimality function describes how the similarity to the typical case of a given soil type changes as the local environmental conditions deviate from the ideal conditions.
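To make the idea of an optimality function concrete, the following is a minimal sketch in Python. It assumes a bell-shaped curve that peaks at a typical value and drops to 0.5 at a chosen distance from it. The function name bell_optimality, its parameters, and the curve shape are illustrative assumptions, not the function forms actually offered by SoLIM Solutions.

```python
import math


def bell_optimality(value, typical, half_width):
    """Return an optimality in (0, 1] for one environmental variable.

    The curve equals 1.0 at `typical` and falls to 0.5 when `value`
    is `half_width` away from it (an assumed, illustrative shape).
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    # exp(ln(0.5) * d^2) is 1.0 at d == 0 and 0.5 at d == 1.
    d = (value - typical) / half_width
    return math.exp(math.log(0.5) * d * d)


# Hypothetical rule for soil type A: typical elevation 1000 ft,
# optimality 0.5 at the edges of the 500-1500 ft range.
for elevation in (1000, 750, 500, 1600):
    print(elevation, round(bell_optimality(elevation, 1000, 500), 3))
```

With this shape, locations inside the stated range receive graded optimality values rather than a uniform one, and locations slightly outside the range still receive a small, non-zero value.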

The inference engine then looks at the optimality values of all environment variables related to the soil and determines an overall measure indicating the similarity of the local soil to a named soil type. This procedure is repeated for all defined soil types, yielding a vector of similarity values for each pixel.
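The sketch below illustrates this step for a single pixel. The text does not state how the optimality values are combined, so the sketch assumes a minimum (limiting-factor) operator. The rule set, the triangular helper, and all numeric values are hypothetical and serve only as an illustration.

```python
def triangular(typical, spread):
    """Build a hypothetical optimality function peaking at `typical`."""
    def optimality(value):
        # Linear decay to 0.0 at a distance of `spread` from the peak.
        return max(0.0, 1.0 - abs(value - typical) / spread)
    return optimality


# Hypothetical knowledge: soil type -> {variable name: optimality function}.
RULES = {
    "A": {"elevation": triangular(1000.0, 1000.0),
          "slope": triangular(12.0, 12.0)},
    "B": {"elevation": triangular(400.0, 800.0),
          "slope": triangular(3.0, 6.0)},
}


def similarity_vector(pixel, rules):
    """Return {soil type: similarity} for one pixel.

    The overall similarity is taken as the minimum optimality across
    the variables of a soil type (an assumed limiting-factor rule).
    """
    return {
        soil: min(func(pixel[var]) for var, func in var_rules.items())
        for soil, var_rules in rules.items()
    }


pixel = {"elevation": 800.0, "slope": 9.0}
print(similarity_vector(pixel, RULES))
```

Repeating the call for every pixel of the raster yields one similarity value per soil type per pixel, which is the vector described above.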

The soil-landscape model (knowledge of soil-environment relationships) can be obtained using different methods. Soil scientists may provide their knowledge directly in the form of optimality functions, or they may describe in words the relationship between a soil type and environmental conditions, or they may use a topographic map, DEM, or orthophoto to identify locations where the soil is typical of the given soil type. Alternatively, the soil-landscape model can be obtained through spatial data mining or purposive sampling techniques (Appendix D).

The basic output of SoLIM is a set of fuzzy membership maps, one for each soil type. If one wants to tag a pixel with a single soil type for producing a soil type map, it would be natural to do so by selecting the class with the highest membership. This process is called hardening. It produces a single "best guess" soil type for each pixel. Hardening can be done through the Product Derivation menu of SoLIM Solutions. If one wants to estimate the value of a given property for each pixel for producing a soil property map, one can use the fuzzy membership values as weights in a weighted average. This can be achieved through the Property Map function of the Product Derivation menu.
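Both derivations can be sketched for a single pixel as follows. This is a simplified illustration, not the code behind the menu functions. The function names, the typical property values per soil type, and the handling of an all-zero membership vector are assumptions.

```python
def harden(memberships):
    """Return the soil type with the highest membership for one pixel."""
    return max(memberships, key=memberships.get)


def property_estimate(memberships, typical_values):
    """Membership-weighted average of the typical property values.

    Returns None when all memberships are zero, because no soil type
    then supports an estimate (an assumed convention).
    """
    total = sum(memberships.values())
    if total == 0:
        return None
    weighted = sum(m * typical_values[soil] for soil, m in memberships.items())
    return weighted / total


# Memberships of one pixel; they need not sum to unity.
memberships = {"A": 0.8, "B": 0.4, "C": 0.0}
# Hypothetical typical property value for each soil type.
typical_values = {"A": 30.0, "B": 20.0, "C": 10.0}

print(harden(memberships))
print(property_estimate(memberships, typical_values))
```

Dividing by the sum of the memberships keeps the estimate within the range of the typical values even though the memberships do not sum to unity.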

SoLIM uses a fuzzy representation for soils. That is, SoLIM allows a pixel to have membership in multiple soil types. Each membership takes on a value between zero (no membership) and unity (full membership). We call these the fuzzy membership values or similarity values of a pixel. The membership values need NOT sum to unity.

Some environmental settings are more conducive to a particular soil type than others. We define the optimality as an indicator of the degree to which a soil is favored by a particular value of an environmental variable. For example, if a soil is less likely to be developed on a slope of 30%, the optimality would be low for that slope value. Like fuzzy membership values, the optimality is a number between zero and unity.

Knowledge of soil-environment relationships is the key to successful soil mapping. SoLIM can accept different kinds of knowledge and express them as rules. Each rule corresponds to one environmental variable and characterizes the relationship between the optimality of one soil type and that environmental variable. Knowledge is divided into two types: global knowledge and local knowledge.

Global knowledge refers to knowledge that is effective over the whole mapping area. In SoLIM, global knowledge is expressed as instances. Each instance is a representation of the soil scientist's knowledge of the relationship between a soil type and its environmental conditions. An instance contains one or more rules. Each rule defines the relationship between one environmental variable and the soil type.

Local knowledge refers to knowledge that takes effect only within a limited area. In SoLIM, local knowledge is expressed as occurrences and exclusions. An occurrence is a positive exception, which means that a particular soil type will occur in places that the global knowledge does not cover. An exclusion is a negative exception, which means that a particular soil type will be very unlikely to occur in some places that the global knowledge does cover. An occurrence or exclusion contains one or more rules plus a spatial setting which defines its area of influence.
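The organization of rules, instances, occurrences, and exclusions can be pictured with the following data-structure sketch. The class and field names are hypothetical, and the spatial setting is assumed here to be a circle given by a center and a radius. The actual knowledge base format of SoLIM Solutions is described in the Reference Manual and may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Rule:
    """One environmental variable and its optimality function."""
    variable: str
    optimality: Callable[[float], float]


@dataclass
class Instance:
    """Global knowledge: rules for one soil type, valid everywhere."""
    soil_type: str
    rules: List[Rule] = field(default_factory=list)


@dataclass
class LocalException:
    """Local knowledge: an occurrence or exclusion with a spatial setting."""
    soil_type: str
    kind: str                      # "occurrence" or "exclusion"
    rules: List[Rule]
    center: Tuple[float, float]    # assumed circular area of influence
    radius: float

    def covers(self, x, y):
        """Return True if (x, y) lies inside the area of influence."""
        return math.hypot(x - self.center[0], y - self.center[1]) <= self.radius


steep = Rule("slope", lambda s: 1.0 if s > 25 else 0.0)
mid_elevation = Rule("elevation", lambda e: 1.0 if 500 <= e <= 1500 else 0.0)

kb_global = [Instance("A", [mid_elevation])]
kb_local = [LocalException("B", "occurrence", [steep],
                           center=(350.0, 420.0), radius=100.0)]

print(kb_local[0].covers(400.0, 420.0))
```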

Unlike rule-based soil mapping, sample-based soil mapping, also referred to as point-based soil mapping, relies on field samples instead of explicit knowledge (rules). The key idea underlying sample-based mapping is that each field sample reflects an underlying relation between the soil and its related environmental conditions, and that this relation recurs over space. It is assumed that locations with similar environmental conditions will have similar soil types or properties. Therefore, each sample can be considered representative of locations with similar environmental conditions. That is, each sample has an individual representativeness. Moreover, the representativeness of an individual sample for an unsampled location can be approximated by the environmental similarity between the sample location and the unsampled location. Based on this concept, the soil property value or soil type at unsampled locations can be predicted by referring to environmentally similar samples. In addition, the uncertainty introduced by the samples’ representativeness can be quantified by analyzing the nature of the environmental similarity values.

SoLIM uses an inference engine to link the GIS database with the field samples to estimate soil properties or soil types. Soil information at unsampled locations can be predicted by referring to environmentally similar samples. By comparing the environmental conditions of existing samples with those of unsampled locations, the environmental similarities between them can be estimated. Soil information at unsampled locations can then be predicted by integrating the environmental similarities and the attributes of the corresponding samples. The uncertainty of the prediction at each location due to the limited representativeness of the samples can be quantified by analyzing the environmental similarities.

When inferring soil property values, the soil property of an unsampled location is determined as an average of the soil property values of all the environmentally similar samples, weighted by the similarity values to these samples. For soil type inference, the field samples of each soil type are selected in turn, the environmental similarities between an unsampled location and the locations of the selected samples are measured, and the maximum of the computed similarities is assigned to the unsampled location for that soil type. The final results are therefore a set of similarity files, one for each soil type, just as in rule-based soil mapping.
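Both inference modes can be sketched for one unsampled location as follows, assuming its environmental similarity to each field sample has already been computed. The function names and sample values are hypothetical, and the sketch uses every sample as given rather than applying any similarity threshold.

```python
def predict_property(similarities, values):
    """Similarity-weighted average of the samples' property values.

    Returns None when the location is not similar to any sample
    (an assumed convention for the all-zero case).
    """
    total = sum(similarities)
    if total == 0:
        return None
    return sum(s * v for s, v in zip(similarities, values)) / total


def type_similarities(similarities, soil_types):
    """Return {soil type: maximum similarity among samples of that type}."""
    result = {}
    for s, soil in zip(similarities, soil_types):
        result[soil] = max(s, result.get(soil, 0.0))
    return result


# Similarity of one unsampled location to three hypothetical samples.
similarities = [0.9, 0.5, 0.1]
property_values = [4.0, 2.0, 1.0]   # property observed at each sample
soil_types = ["A", "B", "A"]        # soil type observed at each sample

print(predict_property(similarities, property_values))
print(type_similarities(similarities, soil_types))
```

Applying type_similarities to every pixel produces one similarity layer per soil type, the same kind of output as in rule-based mapping.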

Each field sample has its own individual representativeness. The field samples do not need to be based on any sampling design. You can use any number of samples with any spatial distribution, as long as the coordinates of the sample locations are determined accurately and the observations at those locations are made correctly. The coordinates of the sample locations should be converted into the same coordinate system as the GIS data layers, which should themselves all be in one coordinate system.

The final output (soil property map or soil type similarity files) of the mapping area is obtained by linking the field samples with the GIS database. The basic idea is to infer soil property/type based on similarity between an unknown position and existing field samples in terms of their environmental conditions. Therefore, how environmental conditions are characterized and how similarity is calculated are most critical in the inference.

There are two basic characterization methods in this implementation: single value and probability density. Single value means that the mean value of a given environmental variable over the inference pixel is used in the calculation of similarity when the inference resolution (the pixel size) is larger than that of the environmental data layer. If the resolutions of the environmental data layers are the same and the inference resolution is also that of the environmental data layers, the single value is simply the original value of the pixel. Probability density means that a probability density function is derived from the values of the pixels within the area of the inference pixel when the inference pixel size is greater than the pixel size of the environmental data layer. The probability density method is much more computationally expensive than the single value approach.
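The difference between the two characterizations can be illustrated with the sketch below, which summarizes the fine-resolution values falling inside one coarser inference pixel. The text does not specify how the probability density function is derived, so a normalized histogram is used here purely as a stand-in. The function names, bin edges, and values are assumptions.

```python
def single_value(values):
    """Mean of the fine-resolution values inside one inference pixel."""
    return sum(values) / len(values)


def histogram_density(values, bin_edges):
    """Return per-bin densities that integrate to 1 over the bins.

    A normalized histogram is only a simple stand-in for a probability
    density estimate; it is not the estimator used by SoLIM.
    """
    pairs = list(zip(bin_edges, bin_edges[1:]))
    counts = [0] * len(pairs)
    for v in values:
        for i, (lo, hi) in enumerate(pairs):
            # Bins are half-open [lo, hi); the last bin includes its upper edge.
            if lo <= v < hi or (i == len(pairs) - 1 and v == hi):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bins")
    return [c / (total * (hi - lo)) for c, (lo, hi) in zip(counts, pairs)]


# Hypothetical slope values (%) of the 3 x 3 fine pixels in one inference pixel.
fine_slopes = [2, 3, 3, 4, 5, 5, 5, 8, 10]

print(single_value(fine_slopes))
print(histogram_density(fine_slopes, [0, 4, 8, 12]))
```

The single value keeps one number per inference pixel, while the density keeps the spread of the underlying values, which is one reason the second method costs more to compute and compare.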

Similarity calculation is conducted at two levels: the variable level and the sample level. At the variable level, the similarities between an unknown position and the sample points are calculated for each environmental layer in the GIS database. At the sample level, the similarities derived at the variable level are integrated to yield the final similarity between each unknown position and each sample point.
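The two levels can be sketched as follows. The text does not specify the similarity function or the integration method, so the sketch assumes a Gaussian-shaped similarity at the variable level and a minimum operator at the sample level. The function names and the per-layer spread values are hypothetical.

```python
import math


def variable_similarity(a, b, spread):
    """Similarity in (0, 1] between two values of one environmental layer.

    Uses a Gaussian-shaped curve scaled by `spread` (an assumed form).
    """
    return math.exp(-0.5 * ((a - b) / spread) ** 2)


def sample_similarity(location, sample, spreads):
    """Integrate the variable-level similarities into one value.

    The minimum over all layers is used here (an assumed rule): the
    least similar layer limits the overall similarity.
    """
    return min(
        variable_similarity(location[name], sample[name], spread)
        for name, spread in spreads.items()
    )


# Hypothetical layer values at an unknown position and at one sample.
location = {"elevation": 900.0, "slope": 10.0}
sample = {"elevation": 1000.0, "slope": 10.0}
spreads = {"elevation": 100.0, "slope": 5.0}

print(sample_similarity(location, sample, spreads))
```

Calling sample_similarity once per field sample gives the list of similarities from which the soil property or soil type at the unknown position is then inferred.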

For sample-based soil property inference, the soil property of an unsampled location is inferred by integrating the soil property values of the environmentally similar samples.

For sample-based soil type inference, SoLIM uses a fuzzy representation for soil types. That is, SoLIM allows a pixel to have membership in multiple soil types. Each membership takes on a value between zero (no membership) and unity (full membership). We call these the fuzzy membership values or similarity values of a pixel. The membership values need NOT sum to unity.

For sample-based soil type inference, the basic output of SoLIM is a set of fuzzy membership maps, one for each soil type. If one wants to tag a pixel with a single soil type, it would be natural to do so by selecting the class with the highest membership. This process is called hardening. It produces a single "best guess" soil type for each pixel. One can use the functions in the Product Derivation menu to produce a soil type map and/or a property map based on the derived fuzzy membership maps.

SoLIM Solutions provides a graphical user interface to construct the GIS database, to define the knowledge of soil-environment relationships and import field samples, and to run the SoLIM inference engines described above. In addition, it includes modules for users to prepare the environmental data layers (GIS Layers). The procedures are described in detail in the Reference Manual.