Evaluation
We will have two competition tracks, each evaluated with a different goodness measure.
- Track 1: Mean Average Precision (MAP)
Mean Average Precision is a popular evaluation criterion for many ranking problems. Your hypothesis $g$ should provide a probability estimate that each enrollment will be a dropout. Then, we will take the average precision of your top-$M$ estimates (with ties arbitrarily broken). That is,

$$\mathrm{AP@}M(g) = \frac{1}{M} \sum_{m=1}^{M} \frac{1}{m} \sum_{i=1}^{m} [\![\, y_{(i)} = 1 \,]\!],$$

where $y_{(i)}$ is the true dropout indicator (1 for dropout) of the enrollment ranked $i$-th by your estimates. A reference computation is sketched after this list.
We specify M = 9000.
- Track 2: Weighted Accuracy
Several students take multiple courses on the platform. Thus, we have more information about their behavior, which makes it easier to predict whether they will drop a course. We decide to make it harder for you by re-weighting those enrollments to be of less importance. Your hypothesis $g$ should provide a binary (0 or 1) prediction of whether each enrollment will be a dropout. Then, we will take the weighted accuracy

$$\mathrm{WA}(g) = \frac{\sum_{i} w_i \,[\![\, g(\mathbf{x}_i) = y_i \,]\!]}{\sum_{i} w_i},$$

where $w_i$ is the weight of enrollment $i$, set smaller for enrollments of students who take multiple courses. A second sketch after this list illustrates the computation.
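For concreteness, here is a minimal sketch of the Track 1 score under the reading of AP@M above. The function name and the NumPy-based ranking are our own illustration, not the official scorer; ties are broken arbitrarily by the sort.

```python
import numpy as np

def average_precision_at_m(scores, labels, m=9000):
    """Average precision of the top-M dropout-probability estimates.

    scores: probability estimates g(x) that each enrollment is a dropout.
    labels: true dropout indicators (1 = dropout, 0 = not).
    """
    order = np.argsort(-np.asarray(scores))[:m]   # rank enrollments by estimate, descending
    top = np.asarray(labels)[order]               # true labels of the top-M estimates
    ranks = np.arange(1, len(top) + 1)
    precision_at_k = np.cumsum(top) / ranks       # precision of the top-k, for k = 1..M
    return precision_at_k.mean()                  # average over the M positions
```

For example, estimates (0.9, 0.2, 0.7) against labels (1, 0, 0) with m = 3 rank the positive enrollment first, giving precisions 1, 1/2, and 1/3, and hence an AP of 11/18.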
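Similarly, here is a sketch of the Track 2 measure, assuming the generic weighted-accuracy form above. The exact weights are set by the organizers, so the `weights` argument here is only a placeholder.

```python
import numpy as np

def weighted_accuracy(predictions, labels, weights):
    """Total weight of correctly predicted enrollments over the total weight.

    predictions: binary dropout predictions g(x) (0 or 1).
    labels:      true dropout indicators.
    weights:     per-enrollment weights, smaller for students taking multiple courses.
    """
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    correct = predictions == labels               # indicator of correct predictions
    return float(np.sum(weights * correct) / np.sum(weights))
```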
Notice that we only use 50% of the test data for evaluation on the leaderboard. The final results will be based on the other 50%, so the final standings may differ from the leaderboard.