SOS9029 - Spatial Data Analysis
The use of spatial data in the social sciences is well established, dating back to the first studies of urban poverty in the 1800s and to studies of elections in the 1900s. New sources of spatial data obtained through self-reporting, for example geotagged tweets and volunteered information, blur the distinction both between quantitative and qualitative data and between researcher and informant. Using the kinds of data that are becoming available does, however, depend on researchers acquiring adequate knowledge of the challenges involved in representing and analysing such data, including spatio-temporal data.
The course is intended to provide a survey of topics in the representation and analysis of spatial data in the social sciences. Research practices vary across disciplines, with opportunities for learning from one another when using data derived from similar sources. It has been claimed that geographical information is becoming pervasive, and that digital representations of our surroundings are increasingly entering into daily life. But are these representations unproblematic? Do the available representations impact our choices with regard to understanding and/or inference? Should we expect to extend aspatial methods of analysis to spatial data without modifying, or at least challenging, their assumptions? These are the key topics to be addressed in the course, which will necessarily be open-ended, because the various disciplines using spatial data may reach different conclusions.
Roger Bivand is a British geographer educated at Cambridge and the London School of Economics, and is Professor of Geography in the Department of Economics at the Norwegian School of Economics. He is active in the development of contributed software for analysing spatial data using the R statistical language, and is an Ordinary Member of the R Foundation.
At the end of the course, participants will:
- have a grasp of sources and types of spatial and spatio-temporal data;
- understand how the choice of observational or aggregation entities may affect analysis;
- be able to choose between different techniques for visualizing spatial data;
- understand the concept of spatial autocorrelation, and how it should (and should not) be measured;
- have an overview of typical modelling approaches used with spatial data.
PhD students at the Department of Sociology and Human Geography register for the course in Studentweb.
Participants from outside the Department of Sociology and Human Geography must fill out this application form.
The application deadline is 15th November 2017.
Formal prerequisite knowledge
Students should have some basic familiarity with elementary statistical and mapping concepts.
Some knowledge of R and RStudio is required. An introduction to R, RStudio and spatial data will be offered on Tuesday 12th December (room 301 Harriet Holters Building) for those who would benefit. If you want to participate in this one-day pre-course, you need to fill out this application form.
The course will be taught as lectures with practical examples, most of which may be reproduced using R, RStudio and contributed R packages. If you wish to follow the examples as well as the lecture presentation, and/or would like to use the practical examples to assist in absorbing the material and in planning your written assignment, please bring a laptop with R and RStudio installed. A script for installing the required contributed packages will also be made available.
The course is held in room 301 Harriet Holters Building on all days.
Wednesday 13 December:
1. Representing spatial data:
Spatial data is keyed to reference systems just as temporal data is keyed to time zones. Knowledge of how they are constructed is important in integrating data by spatial (temporal) position. Spatial objects have position in space (and time), and use of spatial data implies understanding of these objects. The objects used often inherit characteristics from their sources, be they point locations from GPS or geocoded addresses, administrative boundaries used to aggregate observations, transects or trajectories, or pixels captured by earth observation satellites.
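To illustrate why the reference system matters, the following minimal sketch compares two ways of measuring the distance between two points given as longitude/latitude in degrees. It is written in plain Python purely as a language-neutral illustration and is not part of the course material; the coordinates are rough, illustrative values for Oslo and Bergen, the spherical Earth radius is an approximation, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

# Approximate (longitude, latitude) in degrees; illustrative values only.
OSLO = (10.75, 59.91)
BERGEN = (5.32, 60.39)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance on a sphere between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_planar_km(lon1, lat1, lon2, lat2):
    """Deliberately wrong: treats degrees as if they were planar units,
    so a degree of longitude is assumed as long as a degree of latitude."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    return math.hypot(lon2 - lon1, lat2 - lat1) * km_per_degree


great_circle_km = haversine_km(*OSLO, *BERGEN)
naive_km = naive_planar_km(*OSLO, *BERGEN)

print(f"great-circle distance: {great_circle_km:.0f} km")
print(f"naive planar distance: {naive_km:.0f} km")
```

At about 60 degrees north a degree of longitude is roughly half as long as a degree of latitude, so the naive figure is close to double the great-circle one. Ignoring how coordinates are constructed distorts even a simple distance.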
2. The support of spatial data:
Support is the term used to describe the link between the observation and the spatial entity used for observation. Often the entities are not chosen to suit the data generation processes, but are those “to hand”. Using data such as tweet locations opens up the risk of the ecological fallacy, frequently also seen as the modifiable areal unit problem (MAUP). Would the observed values change if the shape and placement of the entity were manipulated? Support is closely tied to the design of observations, sampling schemes, and of course electoral re-districting.
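The question of whether observed values change when entities are redrawn can be made concrete with a toy example. The sketch below, in plain Python and using entirely made-up data, aggregates the same nine individual votes into three districts in two different ways (by rows and by columns of a 3 x 3 grid). The grid, the function name, and the majority rule are assumptions chosen only for illustration.

```python
# 1 = vote for party A, 0 = vote for party B; hypothetical data on a 3 x 3 grid.
votes = [
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]


def districts_won(districts):
    """Count districts in which party A holds a strict majority."""
    return sum(1 for district in districts if 2 * sum(district) > len(district))


# Zoning 1: each row of the grid is one district.
row_districts = [list(row) for row in votes]
# Zoning 2: each column of the grid is one district.
column_districts = [list(col) for col in zip(*votes)]

total_a = sum(sum(row) for row in votes)
won_by_rows = districts_won(row_districts)
won_by_columns = districts_won(column_districts)

print(f"party A votes overall: {total_a} of 9")
print(f"districts won by A with row zoning: {won_by_rows} of 3")
print(f"districts won by A with column zoning: {won_by_columns} of 3")
```

The individual votes are identical in both cases; only the boundaries differ. Party A holds five of nine votes but wins one district under row zoning and two under column zoning, which is the kind of sensitivity to areal units that the MAUP describes.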
Thursday 14 December:
3. Visualizing spatial data:
Mapping may be used to associate names or statements with spatial position, often also implying contextualization. Choices in visualizing spatial data affect the ways in which users perceive content, so creators of visualizations should be aware of alternatives. The ease with which content can be exposed and manipulated on base maps from Google™ Maps and Earth, or OpenStreetMap, is enticing, but calls for caution. Thematic mapping complements topographic mapping by adding visual representations of observations of attributes or variables, which may be measured on various scales, such as presence/absence, intensity, or rate. Colour keys may also influence the perception of users, for example of crime hotspots, which can be made to look more or less alarming, depending on the intentions of the content creator.
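How a colour key can change the apparent number of hotspots is easy to show with class intervals. The following sketch, in plain Python with made-up rates for eight areas, assigns each area to one of four colour classes using two common schemes: equal-width intervals and rank-based quantiles. The data and function names are hypothetical, and the quantile rule used here is one simple variant among several.

```python
# Hypothetical incident rates for eight areas; one area is an extreme outlier.
rates = [1, 2, 3, 4, 5, 6, 8, 40]


def equal_interval_classes(values, k):
    """Split the range min..max into k equally wide classes (assumes max > min)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value falls exactly on the upper edge, so clamp it to class k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def quantile_classes(values, k):
    """Assign classes by rank so that each class holds roughly the same number of areas."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes


equal_classes = equal_interval_classes(rates, 4)
quant_classes = quantile_classes(rates, 4)

print("equal interval:", equal_classes)
print("quantile:      ", quant_classes)
print("areas in top class:", equal_classes.count(3), "vs", quant_classes.count(3))
```

With equal intervals only the outlier reaches the darkest class and every other area looks uniformly low; with quantiles two areas are shown in the darkest class and the rest are spread across all colours. The data are the same, but the map tells a different story.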
4. Spatial processes and autocorrelation:
In situations in which values of a variable of interest can be predicted from its near neighbours, the assumption of independence of observations is not sustained. The presence of spatial processes expressed as spatial autocorrelation may be used to enhance models. However, their presence also affects inference in models which do not take them into account. There are a number of ways to represent relationships between observations, expressing approximations to unobserved spatial processes. These may be used for testing for spatial autocorrelation, but such tests assume that our understanding of the data generation process is adequate, without omitted covariates or inappropriate functional forms.
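One widely used measure of spatial autocorrelation is Moran's I, which compares the products of deviations from the mean for neighbouring observations with the overall variance. The sketch below computes it in plain Python for six observations arranged along a line, with binary weights linking each observation to its immediate neighbours. The data, the chain-shaped neighbour structure, and the function names are assumptions for illustration only; in practice the choice of weights is itself a modelling decision.

```python
def chain_weights(n):
    """Binary neighbour weights for n observations along a line:
    w[i][j] = 1 if i and j are adjacent, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i^2,
    where z are deviations from the mean and S0 is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]  # neighbours are always dissimilar

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)

print(f"clustered:   I = {i_clustered:.2f}")
print(f"alternating: I = {i_alternating:.2f}")
```

Both series contain the same six values, so any aspatial summary is identical, yet the clustered arrangement gives a clearly positive I (about 0.6) and the alternating one a clearly negative I (about -1). A formal test would additionally need a reference distribution for I, and, as noted above, its conclusion depends on the model of the data generation process being adequate.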
Friday 15 December:
5. Modelling spatial data:
Finally, the course will survey the fitting of spatial regression models for continuous and discrete response variables, as applied in spatial econometrics, political science, and other disciplines. Extensions to hierarchical spatial regression will also be mentioned. It will be pointed out that, in applied work, the best model may be one in which no residual spatial process is found; a correctly specified, parsimonious model may have no “spatial story” of spillovers or other unobserved causal factors. However, on occasion, spatial processes are helpful in modelling, sometimes because it is not possible to observe the covariates that are proxied by relationships between neighbouring observations.
Students will be assessed on (1) attendance and active participation, and (2) a final paper that uses a data set of interest to the student to address a spatial research question agreed upon with the instructor in advance of submission.
The paper is to be submitted by 26 January 2018 to firstname.lastname@example.org.
Grades are awarded on a pass/fail scale. Read more about the grading system.