Raster Local Moran P Value


The tool calculates the Moran's I Index value and both a z-score and a p-value to evaluate the significance of that index. P-values are numerical approximations of the area under the curve for a known distribution, limited by the test statistic.

I am using the raster package to calculate the local Moran's I. The example gives a range of Moran's I from -1 to 2.47. On my own data I have seen values ranging from -3.070423 to 7.228558. How can a Moran's I value be larger than 1? Most of the literature points out that the value of global Moran's I lies between -1 and 1. That range describes the global statistic; local Moran's I values are not confined to it, so individual cells can fall well outside -1 to 1.

A Moran's I of 0.94 with a p-value of 0.0000 indicates strong, statistically significant positive spatial autocorrelation in the data.

A sample of known points is used to estimate an unknown value, capturing local or short-range variation. A neighborhood is defined around each raster cell center, and the number of points that fall within the neighborhood is totaled. Each raster item can have its own function chain, which may cause the statistics to be altered significantly (thereby affecting the rendering); for example, the NDVI function, Arithmetic function, or Stretch function can alter the pixel values and change the statistics.

10.1 Local operations and functions

Local operations and functions are applied to each individual cell and only involve those cells sharing the same location. For example, if we start off with an original raster, multiply it by 2, then add 1, we get a new raster whose cell values reflect the series of operations performed on the original raster cells.
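
The multiply-by-2-then-add-1 example can be sketched in a few lines. This is a minimal plain-Python illustration, not code from the raster package; the toy grid and the helper name `local_op` are assumptions made for the example.

```python
# A raster represented as a list of rows (hypothetical toy values).
raster = [
    [1, 2, 3],
    [4, 5, 6],
]


def local_op(grid, fn):
    """Apply fn to every cell independently (a local operation)."""
    return [[fn(cell) for cell in row] for row in grid]


# Each output cell depends only on the input cell at the same location.
result = local_op(raster, lambda v: v * 2 + 1)
print(result)
```

Because every output cell is computed from the single input cell at the same position, the output grid always has the same shape as the input.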

Package versions used: R 4.0.1, tmap 3.2, spdep 1.1.5, sf 0.9.6, spData 0.3.8, sp 1.4.4.

For a basic theoretical treatise on spatial autocorrelation the reader is encouraged to review the lecture notes. This section is intended to supplement the lecture notes by implementing spatial autocorrelation techniques in the R programming environment.

Sample files for this exercise

Data used in the following exercises can be loaded into your current R session by running the following chunk of code.

The data object consists of a SpatialPolygonsDataFrame vector layer, s1, representing income and education data aggregated at the county level for the state of Maine.

The spdep package (Roger S. Bivand 2013) used in this exercise makes use of sp objects including SpatialPoints* and SpatialPolygons* classes. For more information on converting to/from this format, refer back to the Reading and writing spatial data in R Appendix section.

Introduction

The spatial object s1 has five attributes. The one of interest for this exercise is Income (per capita, in units of dollars).

Let’s map the income distribution using a quantile classification scheme. We’ll make use of the tmap package.

Define neighboring polygons

The first step requires that we define “neighboring” polygons. This could refer to contiguous polygons, polygons within a certain distance band, or it could be non-spatial in nature and defined by social, political or cultural “neighbors”.

Here, we’ll adopt a contiguous neighbor definition where we’ll accept any contiguous polygon that shares at least one vertex (this is the “queen” case and is defined by setting the parameter queen=TRUE). If we required that at least one edge be shared between polygons then we would set queen=FALSE.
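
To make the queen/rook distinction concrete, here is a minimal sketch on a regular grid of cells rather than on county polygons. It is plain Python, not spdep code; the function name `grid_neighbors` and the row-major cell numbering are assumptions made for the illustration.

```python
def grid_neighbors(n_rows, n_cols, queen=True):
    """Return {cell_id: sorted neighbor ids} for a regular grid.

    Cells are numbered row-major starting at 0. With queen=True, cells that
    touch only at a corner count as neighbors; with queen=False (the rook
    case) only cells sharing an edge do.
    """
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    nb = {}
    for r in range(n_rows):
        for c in range(n_cols):
            ids = []
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    ids.append(rr * n_cols + cc)
            nb[r * n_cols + c] = sorted(ids)
    return nb


queen_nb = grid_neighbors(3, 3, queen=True)
rook_nb = grid_neighbors(3, 3, queen=False)
print(queen_nb[4], rook_nb[4])
```

On a 3 x 3 grid the center cell has eight queen neighbors but only four rook neighbors, which is the same trade-off the queen parameter controls for polygons.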

For each polygon in our polygon object, nb lists all neighboring polygons. For example, to see the neighbors for the first polygon in the object, type:

Polygon 1 has 4 neighbors. The numbers represent the polygon IDs as stored in the spatial object s1. Polygon 1 is associated with the County attribute name Aroostook:

Its four neighboring polygons are associated with the counties:


Next, we need to assign weights to each neighboring polygon. In our case, each neighboring polygon will be assigned equal weight (style='W'). This is accomplished by assigning the fraction (1 / (# of neighbors)) to each neighboring county, then summing the weighted income values. While this is the most intuitive way to summarize the neighbors’ values, it has one drawback: polygons along the edges of the study area will base their lagged values on fewer polygons, thus potentially over- or under-estimating the true nature of the spatial autocorrelation in the data. For this example, we’ll stick with the style='W' option for simplicity’s sake, but note that other more robust options are available, notably style='B'.

The zero.policy=TRUE option allows for lists of non-neighbors. This should be used with caution since the user may not be aware of missing neighbors in their dataset; however, a zero.policy of FALSE would return an error for any polygon that has no neighbors.

To see the weight of the first polygon’s four neighbors type:


Each neighbor is assigned a quarter of the total weight. This means that when R computes the average neighboring income values, each neighbor’s income will be multiplied by 0.25 before being tallied.
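
The equal-weight scheme can be sketched as follows. This is a plain-Python illustration of the idea behind style='W', not the spdep implementation; the polygon IDs in `neighbors` and the function name `row_standardize` are hypothetical.

```python
def row_standardize(neighbors):
    """Give each neighbor of polygon i the weight 1 / (number of neighbors of i).

    A polygon with no neighbors gets an empty weight row.
    """
    weights = {}
    for i, nbrs in neighbors.items():
        weights[i] = {j: 1.0 / len(nbrs) for j in nbrs} if nbrs else {}
    return weights


# Hypothetical neighbor lists: polygon 1 has four neighbors.
neighbors = {1: [2, 3, 4, 5], 2: [1, 3], 3: [1, 2], 4: [1], 5: [1]}
w = row_standardize(neighbors)
print(w[1])
```

Every non-empty row sums to 1, which is why a polygon with fewer neighbors gives each of them a larger share of the weight.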

Finally, we’ll compute the average neighbor income value for each polygon. These values are often referred to as spatially lagged values.

The following table shows the average neighboring income values (stored in the Inc.lag object) for each county.
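
A spatially lagged value is just a weighted sum over each polygon's neighbors. The sketch below assumes row-standardized weights stored as nested dictionaries; the income figures and polygon IDs are made up for illustration and are not the Maine data.

```python
def spatial_lag(values, weights):
    """Return {i: sum over neighbors j of w_ij * values[j]}."""
    return {
        i: sum(w_ij * values[j] for j, w_ij in row.items())
        for i, row in weights.items()
    }


# Hypothetical incomes (dollars) and row-standardized weights.
income = {1: 20000.0, 2: 24000.0, 3: 28000.0, 4: 22000.0, 5: 26000.0}
weights = {
    1: {2: 0.25, 3: 0.25, 4: 0.25, 5: 0.25},
    2: {1: 0.5, 3: 0.5},
    3: {1: 0.5, 2: 0.5},
    4: {1: 1.0},
    5: {1: 1.0},
}
inc_lag = spatial_lag(income, weights)
print(inc_lag)
```

With row-standardized weights, each lagged value is the plain average of the neighbors' incomes.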

Computing the Moran’s I statistic: the hard way

We can plot lagged income vs. income and fit a linear regression model to the data.

The slope of the regression line is the Moran’s I coefficient.
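
The equivalence between the regression slope and Moran's I can be checked numerically. The sketch below assumes row-standardized weights (each row sums to 1) on a hypothetical chain of four cells; the function names and data are illustrative only.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def morans_i(x, w):
    """Global Moran's I; w is a list of {j: w_ij} rows."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row.values()) for row in w)  # sum of all weights
    num = sum(w_ij * z[i] * z[j] for i, row in enumerate(w) for j, w_ij in row.items())
    return (n / s0) * num / sum(v * v for v in z)


# Hypothetical chain of four cells with row-standardized weights.
x = [1.0, 2.0, 4.0, 7.0]
w = [{1: 1.0}, {0: 0.5, 2: 0.5}, {1: 0.5, 3: 0.5}, {2: 1.0}]
lag = [sum(w_ij * x[j] for j, w_ij in row.items()) for row in w]

slope = ols_slope(x, lag)
print(slope, morans_i(x, w))
```

The two numbers agree only because the weights are row-standardized; with other weighting schemes the slope and Moran's I generally differ.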

To assess if the slope is significantly different from zero, we can randomly permute the income values across all counties (i.e. we are not imposing any spatial autocorrelation structure), then fit a regression model to each permuted set of values. The slope values from the regression give us the distribution of Moran’s I values we could expect to get under the null hypothesis that the income values are randomly distributed across the counties. We then compare the observed Moran’s I value to this distribution.

The simulation suggests that our observed Moran’s I value is not consistent with a Moran’s I value one would expect to get if the income values were not spatially autocorrelated. In the next step, we’ll compute a pseudo p-value from this simulation.
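
The permutation procedure can be sketched in plain Python. This version recomputes Moran's I directly for each shuffle rather than refitting a regression, which gives the same number when the weights are row-standardized. The data, the 199 permutations and the function names are assumptions made for the example.

```python
import random


def morans_i(x, w):
    """Global Moran's I; w is a list of {j: w_ij} rows."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row.values()) for row in w)
    num = sum(w_ij * z[i] * z[j] for i, row in enumerate(w) for j, w_ij in row.items())
    return (n / s0) * num / sum(v * v for v in z)


def permutation_distribution(x, w, n_sim=199, seed=0):
    """Moran's I for n_sim random reassignments of the values to locations."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        shuffled = x[:]
        rng.shuffle(shuffled)  # breaks any spatial structure in the values
        sims.append(morans_i(shuffled, w))
    return sims


# Hypothetical chain of eight cells with low values at one end, high at the other.
x = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
w = [{1: 1.0}] + [{i - 1: 0.5, i + 1: 0.5} for i in range(1, 7)] + [{6: 1.0}]

observed = morans_i(x, w)
sims = permutation_distribution(x, w)
print(observed, min(sims), max(sims))
```

Comparing `observed` with the spread of `sims` is the numerical counterpart of overlaying the observed value on the simulated distribution.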

Computing a pseudo p-value from an MC simulation

First, we need to find the number of simulated Moran’s I values greater than our observed Moran’s I value.

To compute the p-value, find the end of the distribution closest to the observed Moran’s I value, then divide that count by the total count. Note that this is a so-called one-sided P-value. See lecture notes for more information.
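
The counting rule can be written out as a short function. The sketch below adds 1 to both the count and the total so that the observed value is treated as one more draw; that convention is an assumption of this example, as are the function name and the simulated values.

```python
def pseudo_p_value(observed, simulated):
    """One-sided pseudo p-value from a list of simulated statistics."""
    n = len(simulated)
    n_greater = sum(1 for s in simulated if s > observed)
    # Use whichever tail is closer to the observed value; the +1 counts
    # the observed statistic itself as part of the reference set.
    return min(n_greater + 1, n + 1 - n_greater) / (n + 1)


# Hypothetical simulated Moran's I values.
simulated = [-0.3, -0.1, 0.0, 0.1, 0.2, 0.05, -0.2, 0.15, -0.05]
print(pseudo_p_value(0.4, simulated))
```

With this convention the smallest attainable p-value is 1 / (n + 1), so the number of simulations sets the resolution of the test.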


In our working example, the p-value suggests that there is a small chance (0.018%) of being wrong in rejecting the null hypothesis that the income values are not clustered at the county level.

Computing the Moran’s I statistic: the easy way

To get the Moran’s I value, simply use the moran.test function.

Note that the p-value computed from the moran.test function is not computed from an MC simulation but analytically instead. This may not always prove to be the most accurate measure of significance. To test for significance using the MC simulation method instead, use the moran.mc function.

Moran’s I as a function of a distance band

In this section, we will explore spatial autocorrelation as a function of distance bands.

Instead of defining neighbors as contiguous polygons, we will define neighbors based on distances to polygon centers. We therefore need to extract the center of each polygon.

The object coo stores all sixteen pairs of coordinate values.

Next, we will define the search radius to include all neighboring polygon centers within 50 km (or 50,000 meters).

The dnearneigh function takes three parameters: the coordinate values coo, the inner radius of the annulus band, and the outer radius of the annulus band. In our example, the inner annulus radius is 0, which implies that all polygon centers up to 50 km away are considered neighbors.

Note that if we chose to restrict the neighbors to all polygon centers between 50 km and 100 km, for example, then we would define a search annulus (instead of a circle) as dnearneigh(coo, 50000, 100000).
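
The circle-versus-annulus idea can be illustrated without spdep. The sketch below is plain Python, not the dnearneigh implementation; it treats both distance bounds as inclusive, which is an assumption of this example, and the coordinates are hypothetical values in meters.

```python
import math


def distance_band_neighbors(coords, d_min, d_max):
    """Return {i: [j, ...]} for all points j with d_min <= distance(i, j) <= d_max."""
    nb = {}
    for i, (xi, yi) in enumerate(coords):
        nb[i] = [
            j
            for j, (xj, yj) in enumerate(coords)
            if j != i and d_min <= math.hypot(xj - xi, yj - yi) <= d_max
        ]
    return nb


# Hypothetical polygon centers (meters).
centers = [(0, 0), (30000, 0), (0, 35000), (90000, 0)]

circle = distance_band_neighbors(centers, 0, 50000)        # 0 to 50 km
annulus = distance_band_neighbors(centers, 50000, 100000)  # 50 to 100 km
print(circle)
print(annulus)
```

The fourth center has no neighbors inside the 50 km circle, which is the kind of empty neighbor set that the zero.policy option is meant to handle.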

Now that we have defined our search circle, we need to identify all neighboring polygons for each polygon in the dataset.

Run the MC simulation.

Plot the results.

Display p-value and other summary statistics.

Introduction

Global spatial analysis, or global spatial autocorrelation analysis, yields only one statistic to summarize the study area. In other words, global analysis assumes homogeneity. If that assumption does not hold, then having only one statistic does not make sense, as the statistic should differ over space.
But even if there is no global autocorrelation or no clustering, we can still find clusters at a local level using local spatial autocorrelation. The fact that Moran's I is a summation of individual cross-products is exploited by the 'local indicators of spatial association' (LISA) to evaluate the clustering in those individual units by calculating Local Moran's I for each spatial unit and evaluating the statistical significance of each I_i. The Local Moran's I statistic of spatial association is given as:
\[ I_i = \frac{x_i - \overline{x}}{S_i^2} \sum_{j=1,\, j \ne i}^{n} w_{i,j} \left( x_j - \overline{x} \right) \]
where x_i is the value at location i, \overline{x} is the mean of x, w_{i,j} is the spatial weight between i and j, and
\[ S_i^2 = \frac{\sum_{j=1,\, j \ne i}^{n} \left( x_j - \overline{x} \right)^2}{n - 1} - \overline{x}^2 \]
\[ z_{I_i} = \frac{I_i - \mathrm{E}\left[ I_i \right]}{\sqrt{\mathrm{V}\left[ I_i \right]}} \]
\[ \mathrm{E}\left[ I_i \right] = - \frac{\sum_{j=1,\, j \ne i}^{n} w_{i,j}}{n - 1} \]
\[ \mathrm{V}\left[ I_i \right] = \mathrm{E}\left[ I_i^2 \right] - \mathrm{E}\left[ I_i \right]^2 \]
\[ A = \frac{\left( n - b_{2_i} \right) \sum_{j=1,\, j \ne i}^{n} w_{i,j}^2}{n - 1} \]
\[ B = \frac{\left( 2 b_{2_i} - n \right) \sum_{k=1,\, k \ne i}^{n} \sum_{h=1,\, h \ne i}^{n} w_{i,k} w_{i,h}}{\left( n - 1 \right) \left( n - 2 \right)} \]
\[ b_{2_i} = \frac{\sum_{i=1,\, i \ne j}^{n} \left( x_i - \overline{x} \right)^4}{\left( \sum_{i=1,\, i \ne j}^{n} \left( x_i - \overline{x} \right)^2 \right)^2} \]
\[ p = \mathrm{erfc}\left( \frac{\left| I_i - \mathrm{E}\left[ I_i \right] \right|}{\sqrt{2 \mathrm{V}\left[ I_i \right]}} \right) \]
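
A few of these quantities can be sketched in plain Python. This is an illustration under stated assumptions, not the ArcGIS implementation. It computes S_i^2 as the plain variance of the other cells around the overall mean, leaving out the trailing mean-squared term written above. It also takes the variance V[I_i] as an input rather than deriving it from A and B. All function names and data are hypothetical.

```python
import math


def local_morans_i(x, w):
    """Local Moran's I for each location; w is a list of {j: w_ij} rows without self-weights."""
    n = len(x)
    mean = sum(x) / n
    result = []
    for i, row in enumerate(w):
        # Assumption of this sketch: S_i^2 is the variance of all other cells
        # around the overall mean, with no further term subtracted.
        s2 = sum((x[j] - mean) ** 2 for j in range(n) if j != i) / (n - 1)
        lag = sum(w_ij * (x[j] - mean) for j, w_ij in row.items())
        result.append((x[i] - mean) / s2 * lag)
    return result


def expected_local_i(w_row, n):
    """E[I_i] = -(sum of weights in row i) / (n - 1)."""
    return -sum(w_row.values()) / (n - 1)


def two_sided_p(i_obs, i_exp, i_var):
    """p = erfc(|I_i - E[I_i]| / sqrt(2 V[I_i])), a two-sided normal p-value."""
    return math.erfc(abs(i_obs - i_exp) / math.sqrt(2.0 * i_var))


# Hypothetical chain of six cells with row-standardized weights.
x = [1.0, 1.0, 1.0, 1.0, 10.0, 10.0]
w = [{1: 1.0}] + [{i - 1: 0.5, i + 1: 0.5} for i in range(1, 5)] + [{4: 1.0}]

values = local_morans_i(x, w)
print(values)
print(expected_local_i(w[5], len(x)))
print(two_sided_p(1.96, 0.0, 1.0))
```

In this toy example the last cell, a high value next to another high value, gets a local statistic well above 1, which shows that local Moran's I is not confined to the range -1 to 1.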
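Unfortunately, the results computed from the formulas above simply would not agree with the results computed by ArcGIS, so I gave up operating on the raster file directly in Matlab and called ArcPy instead.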

Example

Fig.1
This tool creates a new Output Feature Class with the following attributes for each feature in the Input Feature Class: Local Moran's I index (LMiIndex), z-score (LMiZScore), p-value (LMiPValue), and cluster/outlier type (COType). The field names of these attributes are also derived tool output values for potential use in custom models and scripts. (GRID_CODE: Original Input Value)
Fig. 2
The z-scores and p-values are measures of statistical significance which tell you whether or not to reject the null hypothesis, feature by feature. In effect, they indicate whether the apparent similarity (a spatial clustering of either high or low values) or dissimilarity (a spatial outlier) is more pronounced than one would expect in a random distribution.
A high positive z-score for a feature indicates that the surrounding features have similar values (either high values or low values). The COType field in the Output Feature Class will be HH for a statistically significant (0.05 level) cluster of high values and LL for a statistically significant (0.05 level) cluster of low values.
A low negative z-score (for example, < -1.96) for a feature indicates a statistically significant (0.05 level) spatial outlier. The COType field in the Output Feature Class will indicate if the feature has a high value and is surrounded by features with low values (HL) or if the feature has a low value and is surrounded by features with high values (LH).
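
The labeling rules described above can be sketched as a small function. This is an illustration only, not the ArcGIS implementation. It assumes that "high" and "low" mean above or below the overall mean, and that significance is decided by comparing the p-value with 0.05. The function name and arguments are hypothetical.

```python
def cluster_outlier_type(value, mean, z_score, p_value, alpha=0.05):
    """Return 'HH', 'LL', 'HL', 'LH', or '' when the result is not significant."""
    if p_value >= alpha:
        return ""  # not statistically significant: leave the field empty
    if z_score > 0:
        # Positive z-score: the feature resembles its neighbors (a cluster).
        return "HH" if value > mean else "LL"
    # Negative z-score: the feature differs from its neighbors (an outlier).
    return "HL" if value > mean else "LH"


print(cluster_outlier_type(value=12.0, mean=5.0, z_score=2.8, p_value=0.005))
print(cluster_outlier_type(value=1.0, mean=5.0, z_score=-2.3, p_value=0.02))
```

An empty string corresponds to an empty COType field, meaning the feature did not pass the significance test.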
The result of running the code is shown in Fig. 3. Where the COType in the red boxes is non-empty, the point passed the significance test (α = 0.05).


Fig. 3





