Python autograder tips and tricks

Neal Terrell edited this page Aug 29, 2021 · 5 revisions

PL's Python autograder works really well for two types of problems:

  1. The student writes a function whose return value can be tested.
  2. The student writes a program that sets a global variable to a value, and that value can be tested.

Where the system struggles:

  • Programs using input() will hang because there is no user to provide input.
  • Programs that use print() will print to stdout, which cannot be tested using PL's methods.
  • Programs that set a global variable, but whose logic needs to be tested with multiple iterations/inputs to the program. Since PL loads the student's Python file only once, and module-level code executes immediately when the file is loaded, we can't run the code multiple times with different inputs to test the implementation.

Luckily all three problems can be addressed.

Patching input()

Two ways!

Ugly, but flexible

This method works best for a program without functions, where only a single test case is run. (The alternative method below, using @patch, will not work because the student's code will be loaded and executed before we actually enter any of the test case functions where a patch might occur.)

import builtins

def my_input(message="", **kwargs):
    return "0"  # return whatever you want here; note that the real input() always returns a str

builtins.input = my_input

Any call to input() in student code will now call my_input, which can be made as sophisticated as you like. One helpful trick is for the generate() function in server.py to set data["params"]["test_inputs"] to a list of inputs for testing the student's code; the list can be built from the random parameters of the question variant. We can then return those inputs one at a time from my_input:

# THIS IS IN "setup_code.py"
import builtins

tests = data["params"]["test_inputs"][:] # clone the test cases list
def my_input(message="", **kwargs):
    val = tests.pop(0)  # take the next unused test input
    return str(val)     # the real input() always returns a str

builtins.input = my_input
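
The queue-of-inputs idea can be tried outside the autograder. The following is only a minimal sketch: make_input and the sample values are hypothetical names introduced for illustration, not part of PL. It also restores the real input() afterwards, which matters when experimenting in a long-lived interpreter.

```python
import builtins


def make_input(values):
    """Build a stand-in for input() that hands out `values` one at a time."""
    queue = [str(v) for v in values]  # the real input() always returns str

    def fake_input(prompt=""):
        if not queue:
            # Mirror what the real input() does when stdin is exhausted.
            raise EOFError("no more test inputs")
        return queue.pop(0)

    return fake_input


original_input = builtins.input
builtins.input = make_input([3, 4])  # hypothetical test inputs
try:
    # Stand-in for function-less student code that reads two numbers.
    a = int(input("first: "))
    b = int(input("second: "))
    total = a + b
finally:
    builtins.input = original_input  # always undo the override
```

Raising EOFError when the list runs dry makes a student program that asks for too many inputs fail with a clear error instead of an IndexError from pop().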

Much prettier, but only when the student writes a function

This method only works if the student has written a function that you will test (or you faked a function using the other techniques in this document).

Python's unittest.mock library lets us provide an alternate source of input for a program, by specifying a string to use as a keyboard buffer. We can incorporate this into a unit test quite easily:

from io import StringIO
from unittest.mock import patch

@points(1)
def test_1(self):
    with patch('sys.stdin', StringIO("123\n456")):
        # '123' will be the first string returned from input(); then '456'.
        # Call the student function, either directly as self.st.function_name(...) or with Feedback.call_user.
        actual = self.st.function_name()

        # NOTE: the student and the reference programs share the patched input. If your reference answer also calls input(),
        # you will have to double up the values in the patched StringIO.
        expected = self.ref.function_name()  # or otherwise compute the expected answer

        if actual == expected:
            Feedback.set_score(1)
        else:
            Feedback.set_score(0)
            # I like to give this kind of feedback, assuming you set 'value' to the input that was provided.
            Feedback.add_feedback(f"When the input is {value}, we expect x to be set to {expected}, but you set it to {actual}.")
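
The note about the shared input buffer can be checked with plain Python. In this sketch, student_double and reference_double are hypothetical stand-ins for the functions that would live on self.st and self.ref; everything else is standard library.

```python
from io import StringIO
from unittest.mock import patch


def student_double():
    # Hypothetical student function: reads one number and doubles it.
    return int(input()) * 2


def reference_double():
    # Hypothetical reference function with the same behavior.
    return int(input()) * 2


# Doubled-up buffer: each function consumes one line.
with patch('sys.stdin', StringIO("21\n21\n")):
    actual = student_double()
    expected = reference_double()

# Only one value supplied: the second reader finds the buffer empty.
with patch('sys.stdin', StringIO("21\n")):
    student_double()
    try:
        reference_double()
        ran_out = False
    except EOFError:
        ran_out = True
```

If the reference call raises EOFError in a real test, the usual cause is a StringIO that was not doubled up.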

Because this function is part of a test.py PLTestCase class, it has access to the self.ref object which can capture parameters passed from server.py in data['params'][___]. That means the test case inputs can depend on the random parameters of the question variant. Example:

from io import StringIO
from unittest.mock import patch

@points(1)
def test_1(self):
    value = self.ref.lower_bound - 1 # data['params']['lower_bound'] is randomly generated in server.py, and captured in setup_code.py.
    with patch('sys.stdin', StringIO(f"{value}\n{value}")): # use the value twice as input, so the reference answer gets it too.
        # the rest as before...

Patching print()

Also two ways!

Ugly, but flexible

Use builtins to override the print() function, so it appends to a global string instead of stdout. Just like when patching input, this works best for a question that doesn't define functions and only needs to be run once.

# in setup_code.py

import builtins
output = ""
def my_print(*args, **kwargs):
    global output
    sep = kwargs.get("sep", " ")
    end = kwargs.get("end", "\n")
    # join every argument, converting to str the way the real print() does
    output += sep.join(str(a) for a in args) + end

builtins.print = my_print

If you add output to names_for_user in server.py, you can then directly access self.st.output and self.ref.output to compare them in a unit test.

Prettier, with @patch

Similar to input, we can patch stdout to capture printed output:

from io import StringIO
from unittest.mock import patch

def test_1(self):
    with patch('sys.stdout', new_callable=StringIO) as buffer:
        # call student code, which uses print()
        # grab the output
        st_output = buffer.getvalue()

    # do the same, but with the reference.
    with patch('sys.stdout', new_callable=StringIO) as buffer:
        # ...
        ref_output = buffer.getvalue()

    # use one or both strings to determine success.
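
As a runnable illustration of the pattern above, here is a small sketch. The functions student_greet and reference_greet and the helper capture_stdout are hypothetical names for this example only; in a real test the calls would go through self.st and self.ref.

```python
from io import StringIO
from unittest.mock import patch


def student_greet(name):
    # Hypothetical student function that prints instead of returning.
    print(f"Hello, {name}!")


def reference_greet(name):
    # Hypothetical reference implementation.
    print(f"Hello, {name}!")


def capture_stdout(func, *args):
    """Run func(*args) and return everything it printed as one string."""
    with patch('sys.stdout', new_callable=StringIO) as buffer:
        func(*args)
        return buffer.getvalue()


st_output = capture_stdout(student_greet, "Ada")
ref_output = capture_stdout(reference_greet, "Ada")
outputs_match = st_output == ref_output
```

Using a fresh buffer for each call keeps the student and reference output separate, so a mismatch can be reported with both strings.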

Tip: are you struggling to get students to stop printing in their functions? Patch their output and fail a test case if the buffer's len is not 0!
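
A sketch of that check, assuming two hypothetical functions (noisy_square and quiet_square) in place of real student code:

```python
from io import StringIO
from unittest.mock import patch


def noisy_square(n):
    # Hypothetical submission that prints when it should only return.
    print("computing...")
    return n * n


def quiet_square(n):
    # Hypothetical submission that behaves as requested.
    return n * n


def printed_anything(func, *args):
    """Return True if func wrote anything to stdout while running."""
    with patch('sys.stdout', new_callable=StringIO) as buffer:
        func(*args)
        return len(buffer.getvalue()) != 0


noisy_printed = printed_anything(noisy_square, 3)
quiet_printed = printed_anything(quiet_square, 3)
```

In a test case, a True result would be the signal to set the score to 0 and add feedback asking the student to remove the print() calls.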

Running function-less code more than once

To address the problem that module-level statements are executed immediately when the file is loaded into a Python test, we can silently place the student's code into a secret function. Suppose a question requires a student to assign a value to the global variable x, and functions are not being used. We can modify the submitted file so the contents are indented one level:

# in server.py
import base64

def base64_encode_string(s):
    # do some wonky encode/decode because base64 expects a bytes object
    return base64.b64encode(s.encode('utf-8')).decode('utf-8')

def base64_decode_bytes(b):
    return base64.b64decode(b).decode('utf-8')

def parse(data):
    # The pl-file-editor and pl-order-blocks elements will have already constructed a b64-encoded byte array of the user's program.
    # We have to decode that, split up the lines, then insert an indent before each line.
    files = data['submitted_answers'].get('_files', None)
    if files:
        answer_str = base64_decode_bytes(files[0]['contents'])
        answer = ['']
        answer.extend(answer_str.split('\n'))
        answer.append('')
        answer = '\n    '.join(answer) + '\n' # add 4 spaces in front and a newline at the end of each line
        # ship the answer off to the autograder
        data['submitted_answers']['_files'] = [
            {
                'name': 'user_code.py',
                'contents': base64_encode_string(answer)
            }
        ]
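
To see what the indentation step produces, the same transformation can be run on a sample submission outside of PL. This is a sketch only: indent_submission and the sample source are hypothetical, and the logic simply mirrors the join used in parse() above.

```python
import base64


def indent_submission(encoded):
    """Decode a base64 submission, indent every line by 4 spaces, re-encode."""
    source = base64.b64decode(encoded).decode('utf-8')
    lines = [''] + source.split('\n') + ['']
    indented = '\n    '.join(lines) + '\n'
    return base64.b64encode(indented.encode('utf-8')).decode('utf-8')


# Hypothetical two-line student submission, encoded the way the file arrives.
student_source = "x = 1\nx = x + 1"
encoded = base64.b64encode(student_source.encode('utf-8')).decode('utf-8')

# Decode the result again to inspect the indented text.
indented = base64.b64decode(indent_submission(encoded)).decode('utf-8')
```

The result starts with an empty line and ends with a whitespace-only line; both are harmless inside a function body.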

Then we can use leading_code.py to silently insert a function definition:

# in leading_code.py
def answer():  # name this whatever you want

and trailing_code.py to return the variable the student's code has set:

# in trailing_code.py
    return x  # the indentation must match how the student code was indented.

The function answer can now be called in a unit test to validate the student's code.
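
The whole pipeline can be simulated locally to confirm that the wrapped code really can be run more than once. This sketch assumes that the leading code, the indented submission, and the trailing code end up concatenated in that order; the exact way PL joins the files is not shown here, and the sample student code is hypothetical.

```python
from io import StringIO
from unittest.mock import patch

leading = "def answer():"                    # contents of leading_code.py
student = "n = int(input())\nx = n * 2"      # hypothetical function-less submission
trailing = "    return x"                    # contents of trailing_code.py

# Indent the submission the same way parse() does.
indented = '\n    '.join([''] + student.split('\n') + ['']) + '\n'
program = leading + indented + trailing + '\n'

# Load the combined program; only the function definition runs at this point.
namespace = {}
exec(program, namespace)
answer = namespace['answer']

# Run the student's logic several times with different patched inputs.
results = []
for value in (1, 5, 10):
    with patch('sys.stdin', StringIO(f"{value}\n")):
        results.append(answer())
```

Each call to answer() re-executes the student's statements from the top, which is exactly what loading the file once could not do.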