Dylan Wiliam

    For anyone who understands the logic of null-hypothesis significance testing, the so-called “replication crisis” in the behavioural sciences (Bryan et al., 2021) would not have come as much of a surprise. Since the pioneering work of Carlo Bonferroni (1935) – and subsequent work in the 1950s and early 1960s by Henry Scheffé (1953), John Tukey (1953/1994), and Olive Jean Dunn (1961) – statisticians have repeatedly pointed out the logically obvious fact that the probability of making at least one Type I error (mistakenly rejecting the null hypothesis) increases when multiple comparisons are made. And yet, studies in leading psychology and education journals commonly present dozens, if not hundreds, of comparisons of means, correlations, or other statistics, and then go on to claim that any statistic with a p-value of less than 0.05 is “significant”. However, as Gelman and Loken (2013) point out, even when researchers do not engage in such “fishing expeditions”, if decisions about the analysis are made after the data are collected – “hypothesizing after results are known” or “HARKing” (Kerr, 1998) – then the probability of Type I errors is increased. At each stage in the analysis, the researcher is presented with many choices – what Gelman and Loken call “the garden of forking paths”, after a short story by the Argentinian author Jorge Luis Borges (1941/1964) – that can profoundly influence the results obtained. Some of these, such as cleaning data or eliminating outliers, seem innocent, but because these decisions are taken after the results are seen, they are nevertheless inconsistent with the assumptions of null-hypothesis significance testing. Other, more egregious, examples include outcome switching, collecting additional data, and changing the analytical approach when the desired level of statistical significance is not reached.
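The multiple-comparisons argument above is simple arithmetic, and a short sketch makes the size of the effect concrete. The Python below is illustrative only: the function names are hypothetical, the family-wise error rate formula 1 − (1 − α)^m assumes the m tests are independent, and the Bonferroni rule (test each comparison at α/m) is shown as the simplest correction in the tradition cited above, not as the author's recommended procedure.

```python
# Illustrative sketch: how the chance of at least one false positive grows with
# the number of comparisons, and the Bonferroni correction that controls it.


def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one Type I error) across m INDEPENDENT tests at level alpha.

    Assumption: independence. Correlated tests give a different (lower) rate.
    """
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-comparison level that keeps the family-wise error rate <= alpha.

    Unlike the formula above, this bound holds whether or not tests are independent.
    """
    return alpha / m


def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which comparisons remain 'significant' after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    for m in (1, 10, 20, 100):
        rate = familywise_error_rate(0.05, m)
        print(f"m = {m:>3}: P(at least one false positive) = {rate:.3f}")

    # Hypothetical p-values from 20 comparisons: three look 'significant' at 0.05,
    # but only the first survives the Bonferroni threshold of 0.05 / 20 = 0.0025.
    p_values = [0.001, 0.03, 0.04] + [0.5] * 17
    print(bonferroni_reject(p_values))
```

With α = 0.05, twenty independent comparisons give roughly a 64% chance of at least one spurious “significant” result. The Bonferroni bound needs no independence assumption, at the cost of being conservative when the number of comparisons is large.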
A good example of how these issues play out in practice is provided by Bokhove (2022) in his replication of a study on gender differences in computer literacy, where he found that different, reasonable, analytical choices lead to very different conclusions.
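One of the more egregious practices mentioned above, collecting additional data when significance has not been reached, can be illustrated with a small Monte Carlo simulation. This is a minimal sketch under stated assumptions, not a reproduction of Bokhove's or Gelman and Loken's analyses. Every simulated study is drawn from a population in which the null hypothesis is true (normal data, mean 0, known standard deviation 1, so a one-sample z-test is exact), and the “peeking” schedule (start at n = 20, add 10 observations at a time, stop at n = 100) is a hypothetical choice. All function names are illustrative.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample: list[float], sigma: float = 1.0) -> float:
    """Two-sided p-value for H0: mean == 0, assuming a KNOWN standard deviation."""
    n = len(sample)
    z = fmean(sample) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def optional_stopping_trial(rng: random.Random, start: int = 20, step: int = 10,
                            max_n: int = 100, alpha: float = 0.05) -> bool:
    """Simulate one study under H0 in which the researcher keeps adding data.

    The test is rerun after every batch; the study stops as soon as p < alpha
    or the sample reaches max_n. Returns True if it ends 'significant'
    (which, since H0 is true, is always a false positive).
    """
    sample = [rng.gauss(0.0, 1.0) for _ in range(start)]
    while True:
        if z_test_p_value(sample) < alpha:
            return True
        if len(sample) >= max_n:
            return False
        sample.extend(rng.gauss(0.0, 1.0) for _ in range(step))


def false_positive_rate(trials: int = 10_000, seed: int = 1, **kwargs) -> float:
    """Proportion of simulated null studies that end with p < alpha."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = sum(optional_stopping_trial(rng, **kwargs) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # One pre-planned test at n = 100 (a single look).
    fixed = false_positive_rate(start=100, max_n=100)
    # Looks at n = 20, 30, ..., 100 (up to nine tests on accumulating data).
    peeking = false_positive_rate(start=20, step=10, max_n=100)
    print(f"single look: {fixed:.3f}   optional stopping: {peeking:.3f}")
```

A single pre-planned test at n = 100 rejects the true null about 5% of the time, as it should. Allowing the researcher to look repeatedly at the same accumulating data pushes the rate well above the nominal 5%, even though each individual test is run at α = 0.05.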
    We may already be in the era of ‘peak humanity’, a time when the greatest levels of education, reasoning, rationality, and creativity are spread across the greatest number of us. This is a brilliant result of the massification of universal basic education and the power of the university. But with the rapid advancement of Artificial Intelligence (AI), which can already replicate and even exceed many of our reasoning capabilities, there may soon be less incentive for us to learn and grow. The grave risk is that we then become de-educated and decoupled from our place in the driving seat of the future. Amid all the hype about AI, we need to assess these risks properly so that we can collectively decide whether the upsides of AI are worth it and whether we should ‘stick or twist’. This paper aims to catalyse the debate and reduce the probability that we sleepwalk to a destination that we do not want and cannot reverse out of. We also make 13 clear recommendations about how AI developments could be regulated – to slow things down a little and give time for informed choices about the best future for humanity. Those potential long-term futures include: (1) AI Curtailment; (2) Fake Work; (3) Transhumanism; and (4) Universal Basic Income – each with very different implications for the future of education.
    But that is only part of the problem. To be truly effective, assessment should also be “formative”. In other words, it should identify students’ learning needs and meet them. In classrooms where formative assessment is used, teachers frequently carry out interactive assessments of students’ learning. This allows them to adapt their teaching to each student’s needs and to enable all students to reach high levels. Some teachers also actively involve students in this process, which helps the students develop skills that support their own learning.
    Preface: The ideas for Improving Practice contained in this book are underpinned by high-quality research from the Teaching and Learning Research Programme (TLRP), the UK's largest-ever coordinated investment in education enquiry. Each suggestion has been tried and tested ...
    ... entrance – because of trying to have the advantages of constructed-response items, with a dash of school-based assessment thrown in ... replaced by a combination of external tasks taken when the teacher judges that children are ready and classroom assessment carried out by ...
    ... His main research areas include policy and practice in assessment and information technology in education. Since 1990, he has been principal investigator in over 20 large- and small-scale projects involving over £1.6 million including ...
    Atlas d'anatomie humaine (Atlas of Human Anatomy), Volume 1 + Volume 2 (5th ed.), with inseparable booklet. For more than a century, and through its many successive editions, the SOBOTTA has continued to improve and remains today the world reference in anatomy. The exceptional quality of its ...
    Open University Press, McGraw-Hill Education, McGraw-Hill House, Shoppenhangers Road, Maidenhead, Berkshire, England SL6 2QL; email: enquiries@openup.co.uk; world wide web: www.openup.co.uk; and Two Penn Plaza, New York, NY 10121-2289, USA. First ...
    This paper suggests ways in which the tension between the summative and formative functions of assessment might be ameliorated. Following Messick, it is suggested that the consideration of social consequences is essential in the validation of assessments, and it is argued that most assessments are interpreted not with respect to norms and criteria, but by reference to constructs shared amongst communities of assessors. Formative assessment is defined as all those activities undertaken by teachers and learners which provide information to be used as feedback to modify the teaching and learning activities in which they are engaged, and is characterised by four elements: questioning, feedback, sharing quality criteria and student self-assessment. Assessment is then considered as a cycle of three phases (eliciting evidence, interpreting evidence, taking action), and ways in which the tensions between summative and formative functions of assessment can be ameliorated are considered for each of these phases.
