Ethical issues in WiSP 1: Building the corpus

Maria Leedham

The WiSP project has sought and obtained ethical approval from The Open University, the ESRC and the relevant Local Authorities to collect texts written by social workers. But this is only a starting point in considering the ethical issues involved (see, e.g., the discussion in Rier, 2004).

This blogpost considers some of the ethical issues in building the corpus, and how we dealt with these dilemmas in our quest for a constantly reflective approach (Kubanyiova, 2008).

Issue 1: Consent

Who should we ask for consent?

Standard ethical procedure is to ask research participants for their consent. But who should we ask: the social worker who wrote the text? The Local Authority responsible for assigning the case? Or the service user who is the focus of the text?

What we did

  • We gained consent from each participating Local Authority to access texts written by social workers.
  • We gained permission from the social worker-writers of the texts.
  • We did not, however, ask service users for their consent to use texts written about them, as this would be impractical. In this we are acting in a similar way to medical researchers (Mann et al., 2016). However, all texts are carefully anonymised, which leads us to….

Issue 2: Anonymisation

Why is this important?

The 4,600 casenotes, emails, assessment reports and other documents in the WiSP corpus were written by social workers as part of their work in safeguarding often vulnerable individuals. These texts thus contain personal data, some of it highly sensitive.

What is personal or sensitive data?

  • Personal data is data that can identify a service user or other individual, such as their home address or date of birth.
  • Sensitive personal data is information that carries a greater risk of damage to the individual if it is misused or mishandled, such as criminal convictions and physical or mental conditions (Elliot et al., 2016).
  • Each Local Authority carried out the initial redaction of personal identifying features (name, GP name, etc.) on site before the research team were allowed access to the texts. During this process it became apparent that different redactors had different views on what and how much data was or could be considered personal or sensitive, and so should be coded out.
  • Equally, because anonymisation took place on authority sites, redactors created codes as they found content that needed to be anonymised. Some anonymisation codes were therefore shared across redactors, sites and domains (e.g. [OP] for ‘other professional’), whereas others were quite specific to a given context (e.g. [SG] for ‘special guardian’ in children’s services, and [CPN] for ‘community psychiatric nurse’ in adult services).

What did we do?

  • asked redactors to replace personal details with codes, in order to preserve information about what had been redacted
  • added further codes where needed (e.g. for rare medical conditions)
  • removed all dates
  • See Figure 1 for an example text.
 Title:

Hospital Discharge Team Involvement: Discussion with [SUD] – re: concerns.

Contact date: [DATE]

I explained that [OT] on Ward has highlighted further concerns about [SU]’s safety at home.

[…]

[SUD] also explained that she feels that [SU] is now seeing things and having hallucinations.

Figure 1: An example of an anonymised text

[SUD] = service user’s daughter
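In effect, the coding scheme is a lookup from known names and roles to bracketed codes. A minimal sketch of that idea in Python (the names and the `CODE_MAP`/`redact` identifiers here are invented for illustration; in the project itself, redaction was carried out manually by Local Authority staff on site):

```python
import re

# Invented name-to-code mapping for illustration only; in the WiSP project
# the redaction was carried out manually by Local Authority staff on site.
CODE_MAP = {
    "Margaret Jones": "[SU]",   # service user
    "Sarah Jones": "[SUD]",     # service user's daughter
    "Dr Patel": "[GP]",         # the service user's GP
}

def redact(text, code_map):
    """Replace each known name with its anonymisation code.

    Longer names are matched first so that a shorter, overlapping
    name cannot pre-empt a longer one.
    """
    for name in sorted(code_map, key=len, reverse=True):
        text = re.sub(re.escape(name), code_map[name], text)
    return text

note = "Spoke with Sarah Jones about Margaret Jones's discharge."
print(redact(note, CODE_MAP))
# → Spoke with [SUD] about [SU]'s discharge.
```

Even this toy version shows why the manual approach was necessary: a lookup can only replace names someone has already identified, which is exactly the judgement the redactors were exercising.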

Issue 3: Allowing access to the data

As part of the conditions of our ESRC grant, the research team have to archive the corpus data in the UK Data Archive (UKDA). We grew increasingly concerned, however, about how detailed the metadata should be, who might access the data and how it might be used.

What we did

  • decided to archive the anonymised corpus at the most secure level (‘permission’ level), meaning researchers must first register with the UKDA and then email the project lead, Theresa Lillis
  • created two versions of the corpus: one for the research team and one for archiving. The latter has undergone further anonymisation and delinking, has reduced metadata, and has had some highly sensitive texts removed.

What texts are not archived?

  • Texts which social workers did not want to be shared beyond the research team.
  • A small number of texts which contained multiple reference points to an individual’s life (e.g. a 10,000-word chronology of a child’s life).

Issue 4: Jigsaw identification

What does this mean?

‘Jigsaw identification’ refers to the process of piecing together information from different sources to re-identify an individual. If you have several data items on one individual, you might eventually be able to find out who they are.

We became concerned about the chance of an individual social worker or service user being re-identified simply through the sheer number of data items (e.g. corpus texts, interview transcripts, fieldnotes).

What did we do?

  • replaced our original labelling system, in which social workers were coded ‘SW001’, ‘SW002’ and so on, with the single code ‘SW’. The original numbers not only identified individual social workers but also revealed which authority each belonged to; under the new system, data users can no longer link a social worker interview with a casenote or a researcher fieldnote, or compare texts and other data from social workers in different participating authorities. Similarly, all service users became ‘SU’.
  • gave each text a randomly generated 4-digit number, so that each text is ‘standalone’ and cannot be linked to other texts.
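The second step can be sketched in a few lines of Python, assuming IDs are drawn without replacement from the 9,000 available four-digit numbers (the function name is ours, not the project's):

```python
import random

def assign_text_ids(n_texts, seed=None):
    """Draw n_texts distinct 4-digit identifiers at random, so that an
    ID reveals nothing about a text's writer, authority or collection order."""
    rng = random.Random(seed)
    # sample() draws without replacement, guaranteeing unique IDs
    return [str(i) for i in rng.sample(range(1000, 10000), k=n_texts)]

ids = assign_text_ids(4600)
assert len(set(ids)) == 4600  # every one of the 4,600 texts gets a distinct ID
```

Drawing without replacement matters: sequential numbering would leak collection order, while independent random draws could collide and accidentally link two unrelated texts under one ID.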

We will monitor the views and downloads of the dataset from the UKDA and will ask all users to inform us of how they use the data.

References

Elliot, M., Mackey, E., O’Hara, K. & Tudor, C. (2016). The Anonymisation Decision-Making Framework. Manchester: UKAN. http://ukanon.net/wp-content/uploads/2015/05/The-Anonymisation-Decision-making-Framework.pdf. Accessed 19 June 2018.

Kubanyiova, M. (2008). Rethinking research ethics in contemporary applied linguistics: the tension between macroethical and microethical perspectives in situated research. Modern Language Journal, 92(4): 503-518.

Mann, S.P., Savulescu, J. & Sahakian, B. (2016). Facilitating the ethical use of health data for the benefit of society: Electronic health records, consent and the duty of easy rescue. Philosophical Transactions of the Royal Society A: Mathematical, Physical & Engineering Sciences, 374 (2083), 1-17.

Rier, D.A. (2004). Publication visibility of sensitive public health data: When scientists bury their results. Science & Engineering Ethics, 10 (4), 597-613.
