Assumptions embedded in techniques

Any technique used to analyse data involves assumptions. In machine learning, all approaches involve inductive bias in the sense that they will favour some types of solutions over others (Mitchell, 1997: 39–45). For example, they may prefer simpler to more complex solutions, include or exclude particular variables, give more or less weight to outliers, or they may assume a particular model or theory of crime. An example of the latter is PredPol. PredPol assumes that the likelihood that a crime will take place at a particular location depends on a combination of a static background crime rate at that location and an increased risk connected to recent local events. Similarly, in deploying risk assessment tools, a decision must be made as to the extent of any preference for false positives or false negatives. The point here is that the assumptions and choices embedded within a particular tool will affect how well it performs according to any chosen metric, so an understanding of them matters. It is also important to bear in mind that different approaches, relying on different data sources and variables, will often produce different results. In this sense, there is no purely ‘objective’ prediction or inference, but rather one based on choices made in the process of collecting and accessing data, selecting an analytic approach or technique, and determining the metrics on which it will be judged (and on the basis of which its performance will be optimised).
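To make the kind of two-component model attributed to PredPol above concrete, the following sketch scores a single location as a static background rate plus a contribution from each recent local event that fades over time. The exponential decay, the parameter names and all numbers are illustrative assumptions, not PredPol's actual implementation.

    import math
    from datetime import datetime

    def risk_score(background_rate, recent_event_times, now,
                   boost=0.5, decay_days=7.0):
        """Toy risk score for one location: a static background crime rate
        plus a contribution from each recent local event that decays
        exponentially with the time elapsed since that event."""
        elevated = sum(
            boost * math.exp(-(now - t).days / decay_days)
            for t in recent_event_times
        )
        return background_rate + elevated

    # Example: a location with a modest background rate and two recent events.
    now = datetime(2024, 1, 15)
    events = [datetime(2024, 1, 12), datetime(2024, 1, 14)]
    print(risk_score(0.2, events, now))

Even in this toy form, the embedded choices are visible: the size of the boost, how quickly it decays and how far back "recent" reaches are all assumptions that shape the predictions.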

 

Differential impact and discrimination

When used in predictive policing and risk profiling, Big Data analytics can have a differential impact on particular subpopulations. There are three kinds of potential discrimination that may occur (Schauer, 2003).


First, the discrimination may be due to bias in the data (as where police record a higher proportion of crimes committed by one group relative to another) or to the way in which the data is analysed. In this case, the discrimination results from finding a correlation that exists in the data but not in the world.

Second, the discrimination may be due to overly coarse generalization. Data is analysed to find correlations among selected variables. If relevant variables are omitted, then the process may identify proxy variables that are less well correlated. Insurance companies do this when they require higher insurance premiums for young people, rather than for those who drive dangerously. The former variable is a rough approximation of the latter, but data on the latter variable is significantly harder to procure. As a result, insurance companies discriminate against young drivers rather than the more accurate category of dangerous drivers. In this case, the correlation exists, but the proxy is not the best variable on which to base a prediction. And it is potentially unfair to young drivers who are not dangerous drivers.
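A toy simulation can make the proxy problem concrete. In the sketch below, claims are driven by an unobserved "dangerous driving" trait; the only feature available to the model is age group, which correlates with that trait only roughly. All variable names and probabilities are invented for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 50_000

    # Latent trait the insurer cannot easily observe: dangerous driving.
    dangerous = rng.binomial(1, 0.15, n)
    # Age group is only a rough proxy: dangerous drivers are more often
    # young, but most young drivers are not dangerous.
    young = rng.binomial(1, np.where(dangerous == 1, 0.6, 0.3))
    # Claims are caused by dangerous driving, not by youth as such.
    claim = rng.binomial(1, np.where(dangerous == 1, 0.40, 0.05))

    # Only the proxy is available as a feature, so the model prices on it.
    model = LogisticRegression().fit(young.reshape(-1, 1), claim)
    p_young, p_old = model.predict_proba([[1], [0]])[:, 1]
    print(f"predicted claim risk, young driver: {p_young:.3f}")
    print(f"predicted claim risk, older driver: {p_old:.3f}")
    # Careful young drivers are charged the elevated rate even though their
    # own claim probability matches that of careful older drivers.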

The third type of discrimination occurs where a correlation not only exists (in the world as well as in the data) but is also a strong predictor. Even in this case, there remain controversial questions as to whether it is unjust to target or profile an individual or neighbourhood based on the factor concerned, particularly where it involves a sensitive attribute such as race or religion such that discrimination may be stigmatising or produce excessive separation (Schauer, 2003: 147–51, 189–90). In other words, just as it may be unacceptable to discriminate in employment, even where this could be statistically justified, so too it can be argued that it is unjust to discriminate in the context of law enforcement and criminal justice on the basis of predictive tools, at least where this would negatively impact traditionally stigmatized groups. Predictive power may not be the only relevant factor in determining the appropriateness of using a particular tool. One may also be concerned, for example, about negative social impacts and, in particular, differential impacts on particular subpopulations.

If one is concerned about these impacts, it is insufficient simply to remove particular variables. The alleged differential impact of the COMPAS tool on African Americans (Angwin et al., 2016) is said to occur despite the fact that the company, Northpointe Inc. (now equivant), claimed not to use race as a variable. This is because if A correlates with both B and C, then the omission of A from the data does not prevent the identification of a correlation between B and C. So, if race is omitted but correlates with both education levels and entries in a crime database (say), then the algorithm may identify that those with low education are more likely to commit crimes. This may have a differential impact on the relevant racial group. Indeed, in some circumstances, omitting race as a variable makes the problem worse. For example, if a minority racial group has poor educational outcomes but (for that group) this does not correlate with re-offending, while in the majority racial group there is such a correlation, then the minority group may be assessed as higher risk if race is omitted as a variable.
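The omitted-variable effect described above can be reproduced in a few lines. In the hypothetical simulation below, low education is more common in the minority group but predicts recorded re-offending only in the majority group; a model that sees education alone nevertheless assigns the minority group a higher average risk score than its actual rate warrants. All group labels, variable names and probabilities are invented for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 100_000

    # 1 = minority group, 0 = majority group; race is NOT given to the model.
    minority = rng.binomial(1, 0.2, n)

    # Low education is more common in the minority group ...
    low_edu = rng.binomial(1, np.where(minority == 1, 0.6, 0.3))

    # ... but correlates with recorded re-offending only in the majority group.
    p_reoffend = np.where(minority == 1, 0.15,
                          np.where(low_edu == 1, 0.30, 0.10))
    reoffend = rng.binomial(1, p_reoffend)

    # The model sees education alone.
    model = LogisticRegression().fit(low_edu.reshape(-1, 1), reoffend)
    risk = model.predict_proba(low_edu.reshape(-1, 1))[:, 1]

    for g, name in [(0, "majority"), (1, "minority")]:
        mask = minority == g
        print(f"{name}: actual rate {reoffend[mask].mean():.3f}, "
              f"mean predicted risk {risk[mask].mean():.3f}")
    # The minority group's mean predicted risk exceeds its actual rate,
    # because low education acts as a proxy despite carrying no signal there.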


There are no simple answers to how to manage the issue of discrimination in the use of these kinds of tools. There is, however, work being done to develop discrimination-sensitive approaches to prediction (for example, Kamiran et al., 2013). Yet, as explained in Verwer and Calders (2013: 255), without a causal model explaining why a particular correlation occurs, such techniques will result in a reduction in accuracy, in positive discrimination (which may itself be controversial), or in both.
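One well-known family of pre-processing ideas in this literature reweighs training examples so that the sensitive attribute and the outcome label become statistically independent in the weighted training data. The sketch below is a minimal illustration in that spirit; it is not necessarily the specific method evaluated in the works cited, and the function name and interface are assumptions.

    import numpy as np

    def reweigh(group, label):
        """Per-example weights that make the sensitive attribute and the
        outcome label statistically independent in the weighted data."""
        group, label = np.asarray(group), np.asarray(label)
        weights = np.empty(len(label), dtype=float)
        for g in np.unique(group):
            for y in np.unique(label):
                mask = (group == g) & (label == y)
                if mask.any():
                    expected = (group == g).mean() * (label == y).mean()
                    weights[mask] = expected / mask.mean()
        return weights

    # The weights can be passed to any learner that accepts per-example
    # weights, e.g. LogisticRegression().fit(X, y, sample_weight=reweigh(a, y)).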

 
