The Mass Observation Project (MOP) archive has a wealth of data on the experiences, thoughts and opinions of people in the UK. So far most of these data have been analysed only qualitatively. Now, as part of the ‘Defining Mass Observation’ project, we are exploiting the richness of the MOP archive further, by analysing these data quantitatively.
Developing a database
We are quantifying the MOP data for two purposes. The first is to develop an interactive, easy-to-use MOP database. This free online database will allow any researcher to search for anonymised responses to the MOP directives, either by the demographic characteristics of the writers (such as age cohort, gender, relationship status, occupation or county) or by whether they have responded to a particular directive. Once a sample of responses is identified, the researcher will be able to download a list of archive references for the material and order that material when they visit the archive.
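As a rough illustration of the kind of search described here, the sketch below filters a handful of records by demographic characteristics and by directive, and returns the matching archive references. It is only a sketch under stated assumptions: the field names, the sample records and the reference format are all hypothetical and do not reflect the actual MOP database.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ResponseRecord:
    """One anonymised response. All field names are hypothetical."""
    writer_id: str
    gender: str
    birth_decade: int
    county: str
    directive: str
    archive_ref: str


def search(
    records: Iterable[ResponseRecord],
    *,
    gender: Optional[str] = None,
    county: Optional[str] = None,
    directive: Optional[str] = None,
) -> List[str]:
    """Return archive references for records matching every filter given.

    A filter left as None is ignored, so a researcher can search by
    demographics, by directive, or by both.
    """
    refs = []
    for rec in records:
        if gender is not None and rec.gender != gender:
            continue
        if county is not None and rec.county != county:
            continue
        if directive is not None and rec.directive != directive:
            continue
        refs.append(rec.archive_ref)
    return refs


# Made-up sample data, for illustration only.
SAMPLE = [
    ResponseRecord("W001", "female", 1950, "Sussex", "Directive A", "EXAMPLE-REF-001"),
    ResponseRecord("W002", "male", 1920, "Kent", "Directive A", "EXAMPLE-REF-002"),
    ResponseRecord("W001", "female", 1950, "Sussex", "Directive B", "EXAMPLE-REF-003"),
]

female_refs = search(SAMPLE, gender="female")
directive_a_refs = search(SAMPLE, directive="Directive A")
```

In this toy example the resulting list of references stands in for the list a researcher would download before visiting the archive.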
To develop this database we have begun to clean and merge two data files currently held at the MOP: data on writers’ demographic characteristics, and data on their responses to the directives. Progress has been slower than expected. These data are well suited to searching for directives and for responses to selected directives, the purpose for which they have been used so far. However, they are currently recorded too inconsistently for straightforward database construction or quantitative analysis. For example, the word ‘Co-habiting’ alone was spelled in seven different ways, and we have over 200 different county names, most of which are just variations in formatting and spelling. Although challenging and time-consuming, this data cleaning work has been very useful. It has helped us to think of ways in which data collection, recording and keeping at the MOP could be improved in future. It has also raised questions about the practice of archiving metadata (data that supports the discovery, understanding and management of other data) within a research environment.
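To give a flavour of this kind of cleaning, here is a minimal sketch of one way to collapse spelling and formatting variants of a category onto a single label. It is not the procedure used in the project. The canonical labels, the example variants and the similarity cutoff are all assumptions made for illustration, and anything the function cannot match is returned as `None` so that it can be checked by hand.

```python
import difflib
import re
from typing import Optional

# Hypothetical canonical labels, keyed by a letters-only lowercase form.
# The real MOP coding scheme may use different categories.
CANONICAL = {
    "cohabiting": "Co-habiting",
    "married": "Married",
    "single": "Single",
}


def normalise_key(raw: str) -> str:
    """Lowercase the entry and drop everything that is not a letter."""
    return re.sub(r"[^a-z]", "", raw.lower())


def clean_status(raw: str, cutoff: float = 0.85) -> Optional[str]:
    """Map a raw entry to a canonical label, or None if no match is found.

    Step 1 removes differences in case, spacing and punctuation.
    Step 2 falls back to fuzzy matching to catch small typing errors;
    the cutoff of 0.85 is an arbitrary illustrative choice.
    """
    key = normalise_key(raw)
    if key in CANONICAL:
        return CANONICAL[key]
    close = difflib.get_close_matches(key, list(CANONICAL), n=1, cutoff=cutoff)
    return CANONICAL[close[0]] if close else None


# Made-up variants, for illustration only.
cleaned = [clean_status(v) for v in ["Co habiting", " CO-HABITING ", "cohabitting", "unknown"]]
```

The same two-step idea (strip formatting differences first, then fuzzy-match what remains) could in principle be applied to county names, although a real list of over 200 variants would still need manual review.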
The second purpose of quantifying the MOP data on writers’ demographics and their responses to directives is to examine the demographic characteristics of the writers and to analyse their writing behaviour (e.g. response rate, longitudinal response patterns, and responses or non-responses to different themes). This will help to answer a question that has been asked repeatedly for a very long time: ‘How representative are MOP writers of the broader UK population?’ In this project we also ask: ‘How representative are MOP writers of the active UK volunteering population?’
We have begun to work on this task by cleaning and recoding the different demographic variables and carrying out a basic descriptive analysis. At this point we know that among the 3,734 individuals who have written to the MOP at least once and who have provided information on their gender, there are nearly twice as many women (n=2,354) as men (n=1,379). This is something that researchers who use the MOP should take into account when selecting their sample and drawing conclusions. We have also found that a typical respondent was born in 1950, but birth years range from 1895 to 1997, with the majority of respondents having been born between 1928 and 1972. As can be seen in Table 1, writers are most likely to have been born in the 1920s or between the 1950s and 1970s; other birth cohorts are well represented too.
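The arithmetic behind these descriptive figures can be sketched as follows. The two gender counts are those reported above, and the total in the code is simply their sum. The decade-based cohort grouping and the sample birth years are assumptions made for illustration, since the cohort bands used in Table 1 may be defined differently.

```python
from collections import Counter
from typing import Iterable

# Counts reported in the text above.
WOMEN = 2354
MEN = 1379

# Ratio of women to men, and women's share of these two counts combined.
women_per_man = WOMEN / MEN
share_women = WOMEN / (WOMEN + MEN)


def birth_cohort(year: int) -> str:
    """Label a birth year by its decade, e.g. 1950 -> '1950s'.

    Grouping by decade is an assumption for this sketch.
    """
    return f"{year // 10 * 10}s"


def cohort_counts(birth_years: Iterable[int]) -> Counter:
    """Count writers per birth cohort."""
    return Counter(birth_cohort(y) for y in birth_years)


# Made-up birth years, for illustration only.
example_counts = cohort_counts([1895, 1928, 1950, 1958, 1972, 1997])
```

On the reported counts the ratio comes out at roughly 1.7 women for every man.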
Table 1. Writers by birth cohort
We also know that approximately one third of writers were employed and 17% were retired when they first responded to a MOP directive. Interestingly, some of the MOP writers have indicated that their employment status is ‘volunteer’.
Table 2. Writers by employment status
Work on other demographic characteristics of writers still needs to be done. For example, the data on occupation contain 1,564 unique occupation entries. Some of this uniqueness is due to occupational diversity, and some is due to typing errors and variations in the format of entry. Preparing occupation variables will therefore be a laborious process. To speed it up, and to ensure the quality of recoding, we are going to use CASCOT software.
Initially, occupational data will be recoded into the Standard Occupational Classification (SOC) (as used by the Office for National Statistics) corresponding to the year when the writers began to write for the MOP. This will provide us with the data we need to compare the characteristics of MOP writers to the characteristics of the general and volunteer populations in the UK.
We have also identified various national datasets of demographic data, such as Censuses and Citizenship Surveys, that can be used to analyse the representativeness of MOP writers.
In a few months’ time, when this work is finished, researchers will be able not only to search for writers’ responses online, but also to see who has been writing and when, and how their characteristics compare to those of the general population and the population of volunteers in the UK. In essence, people visiting the MOP website and the new online interactive database will be able to answer the question ‘Who are the MOP writers?’