Classification

Data classification is a method of generalization that we use in our GIS work when preparing thematic maps. When we make choropleth maps, we take what is in reality a huge volume of data and aggregate it to some kind of areal unit, for example taking all the responses to a survey of houses and summing them by zip code so we can display them clearly. This is a very powerful visualization tool, but we should have some idea of how the software is classifying our data when we produce these types of maps. This example uses ArcGIS as the software.

 

  1. The data we are using represents the thousands of foreclosed properties in our city. We have aggregated these points to census tract areas, which gives us over 100 polygons in our data and thus over 100 values on our map. If we just give every value a different color, we end up with a mess that looks like this and means nothing at all:

    Random

  2. This map is worthless: it communicates nothing and is just a big mess. The colors don't correspond to anything decipherable. To make this map useful we must classify our data and generate a thematic (choropleth) map. We right-click our layer of interest, select Properties, and go to the Symbology tab. Now we select the variable (column name) that contains our data of interest, such as the count of foreclosures. The system will automatically use a classification method called Natural Breaks, which works fine for a lot of data:

    Classify
  3. If we want to use a different method of classification, we hit the button called Classification and select a new method from the dropdown as shown below. A common method is Equal Interval. This technique finds the lowest and highest values, then divides that range into the number of classes you select, each of equal width. It is generally most effective to use between 3 and 6 classes: fewer shows very little, and more makes it difficult for people to see any difference between the colors. Below we can see how the blue lines that correspond to the class ranges are equally spaced. The grey lines represent the individual data values, and we can see they are clustered mostly near the Y axis, that is, at the low end of the range.

    Equal Interval

    This method produces the following map. It doesn't show much variation at all, and only one tract stands out in the highest range. When only one area (or very few) falls in a specific range, you are not making good use of that range. This method's strength is in identifying outliers in your data.

    Equal Map
  4. Another classification method is Quantile. The ranges look like this for our data:

    Quantiles

    The map looks as follows. It gives us a better idea of what is happening where, but it is still not easy to infer patterns from. This method groups the values so that each range contains the same number of values. This often gives us an even spread of values across the classes but can obscure important details in our data. For example, the top range runs from 64 to 143, a wide range in which quite different values are treated alike, while the first range covers only 0 to 7, a distinction that is not very helpful.

    Quantile Map

  5. Another method that is sometimes used is Standard Deviation (more info on what standard deviations are all about here). This technique gives us this range distribution:

    Std Dev

    The resulting map is as follows. It shows how much each value varies from the mean (average) value, and the method has some valid uses: it is great for demonstrating variation from an average, say for house sale prices or rates of disease prevalence.
    Std Dev Map
  6. Back to the default option now: Natural Breaks.

    Natural Breaks

    Natural Breaks Map
  7. This map and classification work well for this dataset. The method splits the large number of values at the low end into a couple of groups so they don't all get lumped together, which preserves some clarity amongst those values. It also gives us a good range size at the top end, which highlights the tracts with very high values. Visually we can clearly see the areas with very high values and also those that fall into the next range down. We get a good, clear picture of what is happening and where, which is the point of this whole exercise!
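
To get a feel for what the software is doing when it classifies, the Equal Interval and Quantile methods described above can be sketched in a few lines of Python. This is only a rough illustration, not how ArcGIS itself is implemented: the function names and the sample counts are made up for this example, and the quantile function assumes there are at least as many values as classes.

```python
from bisect import bisect_left


def equal_interval_breaks(values, num_classes):
    """Upper bound of each class when the data range is cut into equal widths."""
    low, high = min(values), max(values)
    width = (high - low) / num_classes
    # The last break is set to the maximum itself to avoid float rounding error.
    return [low + width * i for i in range(1, num_classes)] + [high]


def quantile_breaks(values, num_classes):
    """Upper bound of each class when each class holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # The i-th break is the last value of the i-th equal-sized slice of the sorted data.
    return [ordered[(n * i) // num_classes - 1] for i in range(1, num_classes + 1)]


def class_counts(values, breaks):
    """Count the values in each class; a value belongs to the first class
    whose upper bound is greater than or equal to it."""
    counts = [0] * len(breaks)
    for value in values:
        counts[bisect_left(breaks, value)] += 1
    return counts


# Hypothetical foreclosure counts per tract (not the tutorial's data),
# skewed toward low values with one large outlier.
sample = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 20, 25, 30, 40, 55, 64, 90, 143]

for name, make_breaks in [("Equal Interval", equal_interval_breaks),
                          ("Quantile", quantile_breaks)]:
    breaks = make_breaks(sample, 5)
    print(name, breaks, class_counts(sample, breaks))
```

With this made-up sample, Equal Interval puts 14 of the 20 values in the lowest class and a single value in each of the top three classes, while Quantile puts 4 values in every class. That is the same trade-off we saw on the maps: equal-width classes isolate outliers, and equal-count classes spread the colors evenly but hide how wide the top class is.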

 

Penn State has a great in-depth tutorial here if you want more detail on this topic.

 

