Saturday, May 04, 2013

Assessment in MOOCs

I was asked: "I was wondering how they might work with the Humanities, as I teach Seventeenth-Century Literature, Shakespeare and other related subjects, which require research papers and final examinations. I can see using MOOCs for people who simply have a (non-credit) interest in these subjects, but I can't see myself marking 5,000 term papers, and a similar number of exams. Multiple-choice evaluation, as in science, is easily taken care of electronically, but not in humanities. I am sure this looks like a naive question, but I think MOOCs are a wonderful idea for people who simply wish to enrich their knowledge, and would like to know a little more about them."

First of all, the MOOCs I have worked on have not focused on assessment - they have been courses, yes, with a small number (20 or so) taking them for credit, but the vast majority of participants auditing. So the question of marking term papers never came up. And like you, I would not contemplate multiple-choice exams in humanities and literature courses.

If you really need assessment, a few solutions have been proposed and, to a limited extent, tried out:

- automated essay assessment - this is not as far-fetched as it may seem, though it's not necessarily a cure-all. Automated essay assessment needs to be seeded with a large number of already-marked essays; on being given this seed, it extracts the properties of high-quality essays, and then matches new essays to those properties. There's a really good essay describing the process here: http://mfeldstein.com/si-ways-the-edx-announcement-gets-automated-essay-grading-wrong/  Here's another article: http://tlt.its.psu.edu/2013/04/12/mooc-moments-essay-grading-software/

- another form of automated assessment is based on task-completion or success-based metrics. The best example of this is Codecademy. http://www.codecademy.com/#!/exercises/0 It's a bit like programmed learning http://www.gsis.kumamoto-u.ac.jp/en/opencourses/pf/3Block/07/07-2_text.html where people are stepped through the material in a way that solicits active learner responses; "The extent of a learner's understanding is ascertained from what is demonstrated in the responses." In many cases, this can be supported through a form of self-assessment, using simple techniques such as flash cards and more complex techniques such as sample responses to questions; the participant can determine for themselves whether they passed and can move on.

- peer assessment - essays are graded not by professors but by other course participants. This would require that each essay be graded by a largeish number of other participants; otherwise the grading would be no better than random. How large is enough? It might be too large, especially when you account for people who grade without reading, people who grade based on poor criteria, etc. Peer grading can work really well for blog posts and discussion lists, where it can be managed with a simple thumbs-up thumbs-down metric. Here's an example: http://www.nytimes.com/2012/11/20/education/colleges-turn-to-crowd-sourcing-courses.html?_r=0 And the fact of being graded by peers often spurs people to greater accomplishment. There's some discussion of peer grading here: http://degreeoffreedom.org/moocs-and-peer-grading-1/ And here's a critique of the technique: http://www.insidehighered.com/views/2013/03/05/essays-flaws-peer-grading-moocs

- network-based grading - in this model, individuals are not graded by means of grading individual pieces of work, but rather are graded according to network metrics; the idea is that quality work will produce quality network metrics. The model is not unlike that pioneered by Klout http://klout.com/home which counts the number of Twitter followers, Facebook likes, and similar indicators, to produce a single Klout score.

The problem with Klout is that it is simplistic and easily gamed. Nonetheless there is potential for a more fine-grained assessment to look at how ideas created by one person propagate through a network, to look at whether a person's reading recommendations have become influential, and similar less obvious measures. These can be pretty fine-grained, based on semantic analysis. Here's a simple example, of a Twitter scanner that looks for instances of bullying (obviously, something that would lower the person's score): http://phys.org/news/2012-08-machines-scour-twitter-bullying.html And here are some links to research by my colleagues at NRC on the analysis of sentiments and emotions in online postings: http://www.umiacs.umd.edu/~saif/
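To make the idea of tracking how one person's ideas propagate a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the log format (pairs of source and resharer), the names, and the function `propagation_reach` are all hypothetical, and a real system would need the semantic analysis mentioned above to decide what counts as passing on an idea.

```python
from collections import defaultdict, deque


def propagation_reach(reshares):
    """Map each person to the number of distinct people their ideas reached.

    `reshares` is a list of (source, resharer) pairs, a hypothetical log
    format meaning "resharer passed on something they got from source".
    """
    # Build a directed graph: source -> people who passed the idea on.
    graph = defaultdict(set)
    people = set()
    for source, resharer in reshares:
        graph[source].add(resharer)
        people.update((source, resharer))

    reach = {}
    for person in people:
        # Breadth-first search visits everyone downstream of this person,
        # directly or through intermediaries.
        seen = {person}
        queue = deque([person])
        while queue:
            current = queue.popleft()
            for nxt in graph.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        reach[person] = len(seen) - 1  # exclude the person themselves
    return reach


# Invented example data: eve's idea travels through ana and ben to cy and dee.
log = [("ana", "ben"), ("ben", "cy"), ("ben", "dee"), ("eve", "ana")]
scores = propagation_reach(log)
```

Unlike a raw follower count, a measure of this kind credits the person whose idea travelled furthest, even if only one person picked it up directly. It is still easy to game unless combined with other signals.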

Each of the last two methods runs some risks:

- the "blind-leading-the-blind" phenomenon, whereby the collection of participants in a course can elevate myths about the subject matter to the status of fact (I remember instances from my childhood, where the position of "backcatcher" became a baseball position, and "touching iron" became a foul in basketball).

- the "charlatan" phenomenon - students not already expert in a subject matter may mistakenly believe that one of their number is an expert.

For these reasons, I have always recommended that a MOOC seek to attract not only students and novices in a discipline, but also practitioners and experts in the discipline. Such people will quite rightly gain the greatest 'authority' according to peer or network assessment measures, and as a result, their actions (such as relaying an idea, passing on a link, etc.) will gain more weight, shifting the outcome of peer or network based assessment to one based more on credibility.
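As a rough sketch of what giving more weight to credible participants could look like, the Python below computes a weighted average of peer ratings. Everything in it is an assumption made for illustration: the `authority` weights, the default weight of 1.0 for unknown raters, the rater names, and the function `weighted_grade` are hypothetical and not drawn from any MOOC platform.

```python
def weighted_grade(ratings, authority, default_weight=1.0):
    """Combine peer ratings into one score, weighting each rater by authority.

    `ratings` maps rater -> score; `authority` maps rater -> weight.
    Both structures and the default weight are illustrative assumptions.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for rater, score in ratings.items():
        # Raters with no recorded authority fall back to the default weight.
        weight = authority.get(rater, default_weight)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no ratings with positive weight")
    return weighted_sum / total_weight


# Invented example: two novices rate an essay highly, one expert does not.
ratings = {"novice_1": 9.0, "novice_2": 9.0, "expert": 5.0}
authority = {"expert": 4.0}

plain = sum(ratings.values()) / len(ratings)   # unweighted mean
weighted = weighted_grade(ratings, authority)  # expert counts four times
```

With these made-up numbers the weighted score falls below the plain average, because the expert's lower rating counts for more. The hard part, not addressed here, is deciding where the authority weights come from without reintroducing the "charlatan" problem.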

The difficulty lies in attracting these people, who are often very busy, to a MOOC. This is one of the benefits of scale; a very large MOOC is more likely to attract experts (and the presence of experts is more likely to increase the size of the MOOC). But experts aren't likely to attend a carefully choreographed "Intro to Victorian Literature" course; it's all very old and familiar to them. The model of learning needs to change to involve the experts.

This is what we attempted, with some success, in our connectivist MOOCs - rather than set it up as a series of lessons, we set it up as a series of discussions. The experts would participate at a high level, often interacting mostly with each other, while participants at other levels observed and were able to emulate this practice. Yes, we did provide scaffolding, to help the novices get into the flow of the discussion, but the scaffolding did not become the course.

Another model of MOOC addresses the issue by sharing the assessment.

The "distributed MOOC" is essentially a MOOC that is shared by a number of institutions (again, this was something we attempted in the earliest connectivist MOOCs). Today this is sometimes being called a 'wrapped MOOC'. The idea is that some or all of the MOOC contents are shared by members of classes from any number of institutions. Participants interact with each other, and follow online events and resources together. Each, though, is subject to individual assessment by their home institution, which may attach whatever rubric it wishes to the material.

A final option is to bypass grading entirely, and let a person's outcomes stand on their own as evidence of accomplishment in the course. This is the objective of portfolio-based courses, more common in the arts and writing, but also increasingly popular in the sciences, and especially design, engineering and computing. The idea is essentially that a person presents an artifact that can be studied directly by potential employers. This artifact may or may not be subject to peer grading, which may produce a course score. But the course score is secondary to the artifact itself. Here's a quick guide to portfolio-based assessment: http://www.unm.edu/~devalenz/handouts/portfolio.html

One of the advantages of portfolio-based assessment is that it removes the ambiguity inherent in grades that result from tests and assignments. Here, for example, is an article describing how portfolio-based assessment can help parents see directly how well their children are performing. http://www.earlychildhoodnews.com/earlychildhood/article_view.aspx?ArticleID=495 Portfolio-based assessment is often based on matching production to rubrics; this example http://www.ncbi.nlm.nih.gov/pubmed/17457074 demonstrates portfolio-based assessment in medicine.