Limitations of these techniques
Predictive policing and risk assessment tools rely, to varying extents, on flawed data, criminological assumptions and human choices (Bennett Moses and Chan, 2018). One example of such a choice is deciding whether to accept more false positives (where a person or location is flagged as high risk despite actually representing a low risk) or more false negatives (the inverse), since it is impossible to eliminate error completely. Further, these tools produce probabilistic inferences about what is likely to occur, not perfect predictions of the future. In other words, these tools have limitations. Some of these are specific to particular techniques: for example, linear regression performs poorly where the relationship between variables is non-linear, while more flexible, non-linear techniques tend to over-fit the training data. Comparing different techniques, and the circumstances in which each performs well or poorly, requires a more detailed grounding in machine learning (see, for example, Mitchell, 1997) and thus cannot be covered in this chapter. Instead, we describe here some general issues that need to be addressed in relation to the use of data analytics for crime prediction.
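The trade-off between false positives and false negatives can be made concrete with a minimal sketch. It assumes a tool that outputs a risk score and flags everything at or above a chosen threshold. The scores, outcomes and the function name `error_counts` are hypothetical and are not drawn from any real tool. Moving the threshold does not remove error. It only shifts error from one kind to the other.

```python
from typing import List, Tuple

def error_counts(scores: List[float], labels: List[int], threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when every score >= threshold is flagged high risk.

    labels: 1 if the person or location actually turned out to be high risk, else 0.
    """
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return false_pos, false_neg

# Hypothetical risk scores and outcomes, invented for illustration only.
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

for t in (0.3, 0.5, 0.7, 0.9):
    fp, fn = error_counts(scores, labels, t)
    print(f"threshold={t:.1f}  false positives={fp}  false negatives={fn}")
```

With these invented figures, a threshold of 0.3 gives three false positives and no false negatives. Raising it to 0.9 gives no false positives and three false negatives. Where to set the threshold is a human choice, not a technical output.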
Limitations in data
Some of the limitations of techniques such as predictive policing and offender risk assessment are inherent in the data on which they draw. Data may be inaccurate for various reasons. Data will often be an unrepresentative sample of a larger whole, as where crime data is used as a proxy for criminal activity or social media data as a proxy for community opinion. There may be systemic bias in the collection of data, as where police are more likely to record crime events in locations where they are deployed or in places where they are trusted by the local community (Harcourt, 2007). Crime data for some types of offence are more likely to be complete than for others, whether due to underreporting or to the allocation of police resources. There may be differences in how crimes are recorded, both between individual police officers and between different precincts. There may also be quirks in particular data sets, as where devices with unidentified internet protocol (IP) addresses are recorded as located in a ‘default’ location. Where there are small, random inaccuracies in Big Data, and analysis focuses on drawing out trends rather than identifying outliers, this may not be important. However, where the inaccuracies are systemic or where the goal is to identify individual outliers as potential suspects, errors can have important consequences.
Where data is drawn from more than a single source, there are further issues. Records from different sources will be linked on the basis of the probability that they refer to the same entity, such as the same person, so some links may be mistaken. Further, terms may have different meanings or interpretations when data is collected by different individuals or agencies; this interpretive diversity can be lost when data sets are merged. Data collected or held by different organizations may also vary in reliability and completeness.
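Probabilistic linkage of records from different sources can be illustrated with a simplified Fellegi-Sunter style score. This is one common record-linkage approach. The chapter does not say which method any particular system uses. For each field, the score adds a log-likelihood ratio: positive when the field agrees and negative when it does not. The `m` and `u` probabilities, the threshold and all records below are hypothetical assumptions, not estimates from real data.

```python
import math
from typing import Dict, Optional, Tuple

# Assumed per-field parameters (illustrative, not estimated from any real data set):
#   m = P(field agrees | both records describe the same person)
#   u = P(field agrees | the records describe different people)
FIELD_PARAMS: Dict[str, Tuple[float, float]] = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
    "postcode": (0.80, 0.001),
}

def match_weight(rec_a: Dict[str, Optional[str]], rec_b: Dict[str, Optional[str]]) -> float:
    """Fellegi-Sunter style score: sum of log2 likelihood ratios over compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement is evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement is evidence against
    return total

LINK_THRESHOLD = 10.0  # hypothetical cut-off chosen by the analyst

police = {"surname": "Smith", "first_name": "John", "birth_year": "1985", "postcode": "2000"}
court = {"surname": "Smyth", "first_name": "John", "birth_year": "1985", "postcode": "2000"}
other = {"surname": "Nguyen", "first_name": "Anh", "birth_year": "1990", "postcode": "3000"}

for name, rec in (("court", court), ("other", other)):
    w = match_weight(police, rec)
    print(f"police vs {name}: weight={w:.1f}, linked={w >= LINK_THRESHOLD}")
```

Here the "Smith" and "Smyth" records are linked because the other fields agree. Whether they really describe one person remains a probabilistic judgement, and a different threshold would produce different links. The score also cannot detect a field that two agencies record with different meanings.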
These issues can be minimized when the analytics preserve information about provenance, that is, the source and reliability of the underlying data. Where provenance information is retained, an analyst can see not only the inferences drawn but also relevant information about the underlying data on which those inferences were based.
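One way to retain provenance is to carry the supporting records along with every derived figure. The sketch below assumes each record has a source label and a reliability rating. The class names, fields and ratings are hypothetical and do not describe any particular policing system. An analyst can then see both the count and what it rests on.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Record:
    location: str
    offence: str
    source: str         # hypothetical: system or agency that supplied the record
    reliability: float  # hypothetical 0-1 rating assigned to that source

@dataclass
class Inference:
    location: str
    count: int
    supporting: List[Record] = field(default_factory=list)  # provenance trail

    def weakest_source(self) -> float:
        """Lowest reliability among supporting records (1.0 if there are none)."""
        return min((r.reliability for r in self.supporting), default=1.0)

def count_incidents(records: List[Record], location: str) -> Inference:
    """Count incidents at a location while keeping the records the count rests on."""
    supporting = [r for r in records if r.location == location]
    return Inference(location, len(supporting), supporting)

records = [
    Record("precinct_1", "burglary", "police_db", 0.9),
    Record("precinct_1", "assault", "police_db", 0.9),
    Record("precinct_1", "assault", "social_media", 0.4),
    Record("precinct_2", "theft", "police_db", 0.9),
]

result = count_incidents(records, "precinct_1")
print(result.count)  # 3
print(sorted({r.source for r in result.supporting}))  # ['police_db', 'social_media']
print(result.weakest_source())  # 0.4
```

A bare count of 3 would hide the fact that one of the three incidents comes from a low-reliability source. Keeping the supporting records lets that be checked.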
Limitations of approaches based on correlation
An important limitation of many data-analytic techniques is that the information or inferences drawn from them are based on correlation rather than causation. For example, in the context of a risk assessment tool, the data may indicate that people with particular characteristics (such as poor educational outcomes) or relationships (such as family members who have been convicted of offences in the past) are more likely to re-offend. These inferences are derived from patterns in the available data, based on the conduct of different individuals at different times. As explained above, in the case of predictive policing and offender risk assessment tools, many techniques are not concerned with understanding these patterns, finding causal explanations or understanding mechanisms.
Where correlations are relied on without building a causal model, there is a risk that an intervention (such as changing police deployments) will not have the predicted effect (Pearl, 2009). This is because the correlation may be spurious or the two variables may be related in a different way (such as through a common cause or reverse causation). For example, the assumption behind predictive policing is that having more police in a particular area will deter crime. However, a fuller causal understanding of the situation may reveal other possible effects. If the cause of high crime in an area is linked to stigmatization and hostility towards police, then increasing police deployments may increase crime in that area. If the cause of high crime rates is a lack of suitable activities for young people, then a youth centre might be a more effective use of resources. A causal model is thus helpful when planning an intervention, such as changing the deployment of police officers, yet most predictive policing tools and techniques do not produce one.
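How a common cause can make a correlation mislead an intervention can be shown with a toy simulation. It assumes a hypothetical structural model in which an unobserved factor Z drives both a predictor X and an outcome Y, with no causal link from X to Y. The variables are abstract and do not model any real criminological relationship. A regression fitted to observational data predicts that changing X will shift Y. Setting X directly shows that it does not.

```python
import random
import statistics
from typing import List, Optional, Tuple

def simulate(n: int, seed: int, set_x: Optional[float] = None) -> Tuple[List[float], List[float]]:
    """Toy structural model: Z -> X and Z -> Y, with no arrow from X to Y.

    If set_x is given, X is fixed by intervention (do(X = set_x)), which cuts the Z -> X link.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # unobserved common cause
        x = set_x if set_x is not None else z + rng.gauss(0, 1)
        y = z + rng.gauss(0, 1)  # Y depends on Z only, never on X
        xs.append(x)
        ys.append(y)
    return xs, ys

def slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

xs, ys = simulate(50_000, seed=1)
print(f"observational slope of Y on X: {slope(xs, ys):.2f}")  # theory: 0.5

# Intervene: set X to 0 in one simulated population and to 2 in another.
_, y_low = simulate(50_000, seed=2, set_x=0.0)
_, y_high = simulate(50_000, seed=3, set_x=2.0)
print(f"change in mean Y predicted by the regression: {2 * slope(xs, ys):.2f}")
print(f"change in mean Y under intervention: {statistics.mean(y_high) - statistics.mean(y_low):.2f}")
```

The regression predicts that raising X by 2 raises mean Y by about 1. Under intervention, mean Y barely moves, because the correlation came entirely from Z. Only a causal model can tell these situations apart.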
Reliance on correlation rather than causation is not only problematic when contemplating an intervention. In research, an important goal is understanding, not merely observing or predicting (Chan and Bennett Moses, 2016). In criminology, Big Data techniques can yield important insights but will only answer particular kinds of research questions. In particular, most such techniques tell us very little about why (causation) or how (mechanisms).
Where inferences are drawn from correlations and used as the basis for an intervention (such as changes in police deployment), there is also a possibility of feedback loops. For example, if police spend more time in particular locations because crime is predicted to be more likely there than elsewhere, then they will also observe more crime in those locations than elsewhere (Harcourt, 2007). Their presence may also encourage greater reporting of crime. Because predictive policing software generally draws data from police databases, future predictions may be based on the assumption that more crime is taking place at the same locations as before, even where this is no longer the case. Another example arises where the algorithm-driven pattern of police deployment itself becomes predictable, so that offenders change their behaviour to avoid being caught. Similar problems arise for risk assessment tools, particularly since many tools rely on responses given by offenders in interviews.
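The first feedback loop described above can be illustrated with a toy simulation. It assumes two areas with identical true offence rates, and offences are recorded far more often where police are sent. Each day, police go to whichever area has the most recorded crime so far. All parameters are hypothetical, and this is not the logic of any specific commercial product. A one-record head start in historical data ends up as a large gap in recorded crime.

```python
import random
from typing import Dict

def run_feedback_loop(days: int, seed: int) -> Dict[str, int]:
    """Toy model of a deployment-driven feedback loop (all parameters are assumptions).

    Both areas have the SAME true daily probability of an offence, but an offence is
    much more likely to be recorded in the area police are sent to that day.
    """
    rng = random.Random(seed)
    true_offence_prob = {"A": 0.5, "B": 0.5}  # identical underlying risk
    record_if_patrolled = 0.9                 # hypothetical recording rate with police present
    record_otherwise = 0.1                    # hypothetical recording rate without police
    recorded = {"A": 1, "B": 0}               # small initial difference in historical data

    for _ in range(days):
        # "Prediction": patrol the area with the most recorded crime so far
        # (ties go to the first area listed).
        patrolled = max(recorded, key=lambda area: recorded[area])
        for area, p in true_offence_prob.items():
            if rng.random() < p:  # an offence actually occurs
                rate = record_if_patrolled if area == patrolled else record_otherwise
                if rng.random() < rate:  # ...and is recorded in police data
                    recorded[area] += 1
    return recorded

counts = run_feedback_loop(days=365, seed=0)
print(counts)
```

After a simulated year, the patrolled area typically shows many times more recorded crime than the other, even though true offending is identical. A model trained on these records would keep confirming its own deployment decisions.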
Ultimately, the effectiveness of interventions based on these kinds of tools can only be demonstrated through evaluation (Bennett Moses and Chan, 2018).