A Python script for grading partial credit multiple choice: Part 4—The script

Here I thought I’d go through the script and explain its logic. The idea is that you could modify it to fit whatever format your MC output comes in, or to change the information it returns, hopefully without needing a lot of programming experience.

Let me say at the outset that this was one of my first little Python projects. I make no claims that it’s particularly elegant or pythonic; in fact, I have plans for a newer version that will be a bit better organized. Criticisms always appreciated. That said, it does get the job done.

  • Edit 1/17/2015: The script is now posted in a GitHub repo. Future versions will be found there.
#! /usr/bin/env python3
"""A simple python script that grades MC output from Scantron forms.

Requires two files. (1) The grading key, a matrix with point values for each
answer deliminated by spaces, one row per question. Zeroes must be included. (2)
The raw output from IT. Correctly handles blank answers  ("."). The Scantron key
must be the first line (for verification), usually with uniqueID 00000000.
Optionally, a scramble file can be specified to allow multiple forms to be
handled. The scramble file is a simple matrix with columns of matching question
numbers. It does have to include a column for Form 1 if it is being used.

"""
import numpy as np

# Top and bottom cutoffs for analysis of answers, as fraction. IT uses 27% for
# some reason.
ANALYSIS_THRESHOLD = 0.27

“ANALYSIS_THRESHOLD” is the only constant in this script. It’s the fraction of students at the top and the bottom of the class that’s used when analyzing the results; for a class of 150 students, for example, the top 40 and bottom 40 scores (round(0.27 × 150) = 40) would be compared. 27% was the default used in the output here at Miami.

def generate_responses_array(answers):
    """Takes the raw answers from the output and returns the student solutions
    as an array with "1" indicating the selection.

    Args:
        answers (string): A string in the format "010123120...".

    Returns: 
        An array of solutions, with "1" indicating the selected answer. For
        example, a row of [0,1,0,0,0] indicates an answer of B. This array
        yields the total score when multiplied by the key and summed.

    """
    responses = []
    for qnum in range(len(answers)):
        response = [0] * 5
        # A blank response is indicated by "."; otherwise, mark the
        # selected answer with a 1.
        if answers[qnum] != ".":
            response[int(answers[qnum])] = 1
        responses.append(response)
    return np.array(responses)

This function converts the raw MC responses (just a string of the format “01410120…”, with 0 representing “A” and 4 representing “E”) into an array that can actually be scored by the key. I chose NumPy arrays for the responses data because they’re just easier to manipulate (see below). The array has one row per question, with five entries per row; the chosen entry is indicated by a “1”, and the rest are “0”s.
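For example, here’s what a quick check in the interpreter looks like, using a made-up three-question response string with the third question left blank:

>>> generate_responses_array("14.")
array([[0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0]])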

In principle, this format could be adapted to allow multiple selections per question, which I think is an interesting possibility. Of course, the whole input format would have to be changed.
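Just as a rough sketch of what that might look like: assuming a hypothetical input format where questions are separated by semicolons and each question lists every selected digit (so "0;14;." would mean A on question 1, B and E on question 2, and question 3 blank), the conversion function might become something like this (reusing the script’s numpy import):

def generate_multiselect_array(answers):
    """Hypothetical variant of generate_responses_array that allows
    multiple selections per question."""
    responses = []
    for question in answers.split(";"):
        response = [0] * 5
        if question != ".":
            # Mark every selected answer, not just one.
            for choice in question:
                response[int(choice)] = 1
        responses.append(response)
    return np.array(responses)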

def descramble(responses, formnum, scramble):
    """Unscrambles responses depending on their form.

    Args:
        responses (numpy.array): Array of responses generated by the
            generate_responses_array function.

        formnum (int): The form number for the student.

        scramble (numpy.array): The scramble used for the test. Of form
            scramble[0] = [1, 2, 3, ...] for form 1 (note index difference),
            scramble[n] = [2, 1, 3, ...] for form n+1.

    Returns:
        An array of responses that has been sorted according to the scramble
        array, so that everyone can be graded with the same key.

    """
    descrambled_responses = []
    for n in scramble[formnum]:
        descrambled_responses.append(responses[n-1])

    return np.array(descrambled_responses)

This function descrambles a set of responses according to the form number. Fairly self-explanatory and short.
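Here’s a made-up example with two forms of a three-question test, where form 1’s questions 1, 2, and 3 appear on form 2 as numbers 2, 3, and 1 (remember that form 2 corresponds to formnum 1):

>>> scramble = np.array([[1, 2, 3], [2, 3, 1]])  # already transposed
>>> responses = generate_responses_array("140")  # a form-2 answer string
>>> descramble(responses, 1, scramble)
array([[0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0]])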

The following function carries out most of the operations. Stylistically, I think this could stand to be divided up a bit more in the next version.

def main(key_file_name, answers_file_name, title="Graded Exam", 
         scramble_file_name=None):
    """Processes raw Scantron output and returns the grades and statistics.

    Args:
        key_file_name (string): Name of the file with the key. The key is a
            matrix in the format [ 0 1 0 0 0; 0 0 1 0 0; ... ] where the first
            row would award 1 point for B. Partial credit and double answers are
            fine.

        answers_file_name (string): Name of the file containing the raw Scantron
            output.

        title="Graded Exam" (string): Title to be printed at the top of the 
            output.

        scramble_file_name=None (string): Optional filename of file containing
            the scramble. If not included, everyone is graded as the same form
            as laid out in the key. Format is a matrix in the format [ 1 2; 2 3;
            3 1; ... ] where form 2 would have questions 1, 2, 3 from form 1 as
            numbers 2, 3, 1.

    Returns:
        A string containing formatted output with exam statistics and student
        scores.

    """

    # Load the key. Result is a numpy array.
    with open(key_file_name) as key_file:
        ans_key = np.loadtxt(key_file)

    num_questions = len(ans_key)

    # Load and process the scramble file, if available. Transposes so that
    # scramble[0] returns the array of question numbers for form 1, scramble[1]
    # returns the array for form 2, etc. Loaded as integers so the entries can
    # be used directly as indices.
    if scramble_file_name:
        with open(scramble_file_name) as scramble_file:
            scramble = np.loadtxt(scramble_file, dtype=int).transpose()

    # Load the student info. Characters 0-7 in the input file are the student's
    # uniqueID. Character 9 is the form number. Characters 10-? are the recorded
    # answers. Descrambles the responses. For student n, students[n]['name'] is
    # the student's name, students[n]['responses'] is the set of responses, as
    # an array. Only collects the responses portion of the string (up to
    # 10+num_questions), because the Scantron system can append extra
    # characters.
    students = []
    with open(answers_file_name) as answers_file:
        for line in answers_file:
            uniqueid = line[0:8]
            if line[9] == " " or line[9] == ".":
                form_num = 0
            else:
                form_num = int(line[9]) - 1
            responses = generate_responses_array(line[10:10 + 
                        num_questions].replace("\n", ""))
            if scramble_file_name:
                responses = descramble(responses, form_num, scramble)
            students.append({'name': uniqueid, 'responses': responses})

The section immediately above is the part that will need to be modified if you’re using an input source that follows a different format. Here, each student is represented by a single line in the raw scanned output. The first 8 characters (0–7) are the student’s uniqueID (basically their e-mail address). There’s a space, then the 10th character (9) is the form number (1–4). The remaining characters are the actual selected responses. If you’re using an input file with a different but similar format, you may only need to tweak which character is what.
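For illustration, a line for a form-2 student on a five-question test would look like this, with a made-up uniqueID (note that the form digit at character 9 runs directly into the answers, which start at character 10):

jdoe0042 201423

Here “jdoe0042” is the uniqueID, character 8 is the space, “2” is the form number, and “01423” are the responses.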

The responses are descrambled by form as soon as they are read, and finally the student info is stored in “students”, a list of dictionaries. In future versions I plan to change this to a list of student objects. For now, “students[n][‘name’]” will give the name of student n, and “students[n][‘responses’]” will return their array of responses.

    # Number of actual students; the key (students[0]) doesn't count.
    num_students = len(students) - 1
    num_students_analysis = round(ANALYSIS_THRESHOLD * num_students)

    # Actually determines the score for each student: multiplies the set of
    # responses by the key elementwise, then sums over the whole array. Score
    # is stored as students[n]['score'].
    for student in students:
        student['score'] = (student['responses'] * ans_key).sum()

The actual grading of each student is fairly simple. Multiplying two NumPy arrays with the “*” operator returns the Hadamard (elementwise) product: each element of one array multiplied by the corresponding element of the other. With the responses and key arrays formatted as they are, a student’s score is obtained by simply summing the result of this operation.
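A tiny worked example, with made-up numbers, for a single question where B earns full credit and C earns partial credit:

>>> key = np.array([[0, 1, 0.5, 0, 0]])     # B = 1 point, C = 0.5 points
>>> response = np.array([[0, 0, 1, 0, 0]])  # the student chose C
>>> print((response * key).sum())
0.5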

I think an interesting variation on this would be to introduce dependencies between answers. For example, make the correct response to question n dependent on the response to question n−1. You could set up questions in which students need to rationalize their results, for example.
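I haven’t implemented this, but here’s a rough sketch of one possible approach (all of the names here are hypothetical): grade question n against a different key row depending on the answer given to question n−1.

def score_dependent_question(responses, key_if_a, key_other, n):
    """Hypothetical: score question n with key_if_a if the student
    chose A on question n-1, and with key_other otherwise."""
    chose_a = responses[n - 1][0] == 1
    key_row = key_if_a if chose_a else key_other
    return (responses[n] * key_row).sum()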

    # Generates a new array, students_sorted_grade, that is just sorted by
    # grades. Does not include key (students[0]).
    students_sorted_grade = sorted(students[1:], key=lambda s: s['score'], 
                                   reverse=True)

    # Determines number of each response, by question, for all students, and for
    # the top and bottom students in the class. Values are given as fractions.
    all_answers_frac = (sum(n['responses'] for n in students_sorted_grade)
                        / num_students)
    top_answers_frac = (sum(n['responses'] for n in 
                        students_sorted_grade[:num_students_analysis]) 
                        / num_students_analysis)
    bot_answers_frac = (sum(n['responses'] for n in 
                        students_sorted_grade[-num_students_analysis:]) 
                        / num_students_analysis)

This section analyzes the student success rates for the class as a whole, for the top-scoring students, and for the low-scoring students. The logic is based on the materials we get from Miami’s Test Scoring service, but modified to include each of the five options. It’s very handy for judging whether a question was fair or not, since you’d expect the top students to do substantially better than the bottom students.
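If you wanted a single number per question, a rough discrimination index could be computed from these arrays. This isn’t in the script; it’s just a sketch using the variables defined above:

# Average points earned on each question by the top group minus the bottom
# group; a larger gap suggests the question discriminates better.
discrimination = [(top_answers_frac[n] * ans_key[n]).sum()
                  - (bot_answers_frac[n] * ans_key[n]).sum()
                  for n in range(num_questions)]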

    # List of all grades. Students only (key not included).
    all_grades = [n['score'] for n in students[1:]]

    # The score for the Scantron key, as a check to make sure it was assessed
    # properly.
    max_score = students[0]['score']
    print("\nCheck: the Scantron key (uniqueID = {}) scores {:.2f}.\n".format(
          students[0]['name'], max_score))

I added this section because I invariably screw up when creating the key file. If the key form doesn’t get a perfect score, there’s obviously something wrong. This check prints to stdout, not to the actual saved file.

    # Variable output_text is the actual textual output of the function.
    output_text = ""

    output_text += "{}\n".format(title)
    output_text += "{}\n\n".format("=" * len(title))

    # The overall averages, max, and min.
    output_text += "   Overall average: {:.2f} out of {:.2f} points \
        ({:.2%})\n".format(np.mean(all_grades), 
                    max_score, np.mean(all_grades) / max_score)
    output_text += "Standard deviation: {:.2f}\n".format(np.std(all_grades))
    output_text += "              High: {}\n".format(max(all_grades))
    output_text += "               Low: {}\n".format(min(all_grades))
    output_text += "\n"

    # Breakdown by question. Includes the average score for each question, the
    # overall response frequencies, and the response frequencies of the
    # strongest and weakest students.
    output_text += "Average Scores by Question\n"
    output_text += "{}\n\n".format("-" * 26)

    for n in range(num_questions):
        output_text += "{:3}: {:.2f}         Key: ".format(n + 1, 
            sum(all_answers_frac[n] * ans_key[n]))
        for m in range(len(ans_key[n])):
            if ans_key[n][m] != 0:
                output_text += "{:6.1f}  ".format(ans_key[n][m])
            else:
                output_text += "    -   "
        output_text += "\n            Frequency:  "
        for m in range(len(all_answers_frac[n])):
            output_text += "{:5.1f}   ".format(all_answers_frac[n][m] * 100)
        output_text += "(%)\n              Top {:2.0f}%:  ".format(
            ANALYSIS_THRESHOLD * 100)
        for m in range(len(top_answers_frac[n])):
            output_text += "{:5.1f}   ".format(top_answers_frac[n][m] * 100)
        output_text += "\n              Bot {:2.0f}%:  ".format(
            ANALYSIS_THRESHOLD * 100)
        for m in range(len(bot_answers_frac[n])):
            output_text += "{:5.1f}   ".format(bot_answers_frac[n][m] * 100)
        output_text += "\n\n"

    # Actual student scores.
    students_sorted_name = sorted(students[1:], key=lambda s: s['name'])
    output_text += "\nStudent Scores\n"
    output_text += "{}\n".format("-" * 14)
    for student in students_sorted_name:
        output_text += "{:8}\t{:.1f}\n".format(
            student['name'], student['score'])

    return output_text

This section creates the actual output that is saved to the file, stored in the variable “output_text”. Should be pretty self-explanatory, but obviously this would be the place to add any additional output that you’d want.

The biggest challenge is getting everything lined up properly. Incidentally, the output file is best viewed in a fixed-width font (like Courier or Menlo), if you’re not using one already.
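For reference, each per-question block of the output looks something like this (the numbers are made up):

  1: 0.85         Key:     -      1.0      -       -       -
            Frequency:   85.0     5.0     5.0     3.0     2.0   (%)
              Top 27%:   97.5     0.0     2.5     0.0     0.0
              Bot 27%:   60.0    15.0    10.0    10.0     5.0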

if __name__ == '__main__':
    key_file = input("Filename for key: ")
    raw_file = input("Filename for student responses: ")
    is_scrambled = input("Are multiple forms used (y/n)? ")
    if is_scrambled in ["y", "Y", "yes", "YES", "Yes"]:
        scramble_file = input("Filename for scramble: ")
    else:
        scramble_file = None
    title = input("Title for results: ")
    output_filename = input("Output filename (blank for output to terminal): ")
    output = main(key_file, raw_file, title, scramble_file)
    if output_filename:
        with open(output_filename, 'w') as output_file:
            output_file.write(output)
    else:
        print(output)

The rest is just the main part of the program that collects the filenames and title from the user and creates the output file.

So there you have it. Questions? Feel free to leave a comment or drop me a line.
