Data

Download data: ML_final_release.tar.bz2

Data Description

The following files contain all the information we have about these logs:

enrollment_train/test.csv: maps each enrollment ID to a student and a course.

enrollment_id: enrollment ID
username: student ID
course_id: course ID
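A minimal sketch of loading enrollment_train.csv or enrollment_test.csv into a lookup table from enrollment ID to (student, course). It assumes the CSV has a header row naming the three columns listed above; the function name load_enrollments and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def load_enrollments(f: TextIO) -> Dict[str, Tuple[str, str]]:
    """Map each enrollment_id to its (username, course_id) pair.

    Assumes a header row containing enrollment_id, username and course_id.
    IDs are kept as strings to avoid assumptions about their format.
    """
    table = {}
    for row in csv.DictReader(f):
        table[row["enrollment_id"]] = (row["username"], row["course_id"])
    return table


# Hypothetical sample rows, not taken from the real data.
sample = io.StringIO(
    "enrollment_id,username,course_id\n"
    "1,u_alice,c_math\n"
    "2,u_bob,c_math\n"
)
enrollments = load_enrollments(sample)
print(enrollments["2"])  # ('u_bob', 'c_math')
```

For the real data, pass an open file handle instead of the in-memory sample.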

log_train/test.csv: Logs for each enrollment.

enrollment_id: enrollment ID
time: the time of the event
event_source: event source (server or browser)
object: the object related to the event (see object.csv)
event_type: the type of the event, one of:

problem: operations on course problems
video: operations on course videos
access: accessing other courseware objects
wiki: accessing the course wiki
discussion: accessing the course forum
navigate: navigating to other parts of the course
page_close: closing the web page
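A minimal sketch of turning the log file into per-enrollment event counts keyed by (event_source, event_type), the same pairing used by the basic features. It assumes log_train/test.csv has a header row with the field names listed above; the function name count_events and the sample rows (including the timestamp format) are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import DefaultDict, TextIO

# The seven event types listed above.
EVENT_TYPES = {"problem", "video", "access", "wiki", "discussion", "navigate", "page_close"}


def count_events(f: TextIO) -> DefaultDict[str, Counter]:
    """Count events per enrollment, keyed by (event_source, event_type).

    Assumes a header row with the field names listed above.
    Rows with an unknown event_type are skipped rather than counted.
    """
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for row in csv.DictReader(f):
        if row["event_type"] not in EVENT_TYPES:
            continue
        counts[row["enrollment_id"]][(row["event_source"], row["event_type"])] += 1
    return counts


# Hypothetical log rows for illustration only.
sample = io.StringIO(
    "enrollment_id,time,event_source,object,event_type\n"
    "1,2014-06-14T09:38:29,server,obj_a,navigate\n"
    "1,2014-06-14T09:38:39,server,obj_b,access\n"
    "1,2014-06-14T09:40:02,browser,obj_b,page_close\n"
    "1,2014-06-14T09:41:10,server,obj_c,access\n"
)
counts = count_events(sample)
print(counts["1"][("server", "access")])  # 2
```

A Counter returns 0 for pairs that never occur, so missing (source, type) combinations need no special handling when building a fixed-width feature vector.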

object.csv: Contains information about the structure of the courses. Each course is represented as a tree of modules: for instance, a course contains multiple chapter modules, a chapter contains sequentials, and a sequential contains verticals and videos.

course_id: the course to which the module belongs
module_id: the ID of a courseware module
category: the category of the courseware module
children: the IDs of the module's child modules
start: the time that the module was released to students
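A minimal sketch of rebuilding the module tree from object.csv and listing everything below a given module. It assumes a header row and that the children field holds space-separated module IDs (empty for leaf modules); the function names load_modules and descendants and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_modules(f: TextIO) -> Dict[str, dict]:
    """Index object.csv rows by module_id.

    Assumes `children` holds space-separated module IDs (empty for leaves).
    """
    modules = {}
    for row in csv.DictReader(f):
        modules[row["module_id"]] = {
            "course_id": row["course_id"],
            "category": row["category"],
            "children": row["children"].split(),
        }
    return modules


def descendants(modules: Dict[str, dict], root: str) -> List[str]:
    """Return every module below `root` in depth-first (pre-order) order.

    Child IDs without a row of their own are listed but not expanded;
    the `seen` set guards against repeated or cyclic references.
    """
    out, seen = [], {root}
    stack = list(reversed(modules[root]["children"]))
    while stack:
        mid = stack.pop()
        if mid in seen:
            continue
        seen.add(mid)
        out.append(mid)
        if mid in modules:
            stack.extend(reversed(modules[mid]["children"]))
    return out


# Hypothetical rows: one chapter with two sequentials.
sample = io.StringIO(
    "course_id,module_id,category,children,start\n"
    "c1,ch1,chapter,seq1 seq2,2014-05-20T00:00:00\n"
    "c1,seq1,sequential,vert1 vid1,2014-05-20T00:00:00\n"
    "c1,seq2,sequential,,2014-05-27T00:00:00\n"
    "c1,vert1,vertical,,2014-05-20T00:00:00\n"
    "c1,vid1,video,,2014-05-20T00:00:00\n"
)
modules = load_modules(sample)
print(descendants(modules, "ch1"))  # ['seq1', 'vert1', 'vid1', 'seq2']
```

Looking up the category of each logged object in this table is one way to group log rows by module type.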

sampleSubmission.csv: The submission file must be a 24109×2 matrix with no header or other information, in the same format as this file. The first column is the enrollment ID and the second column is your prediction (a float or 0/1), separated by a comma. An error will be reported if a submission file is in the wrong format.
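A minimal sketch of writing predictions in the required format: no header, two comma-separated columns, and a row-count check before anything is written. The function name write_submission and the sample predictions are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

EXPECTED_ROWS = 24109  # row count required by the submission format


def write_submission(rows: Iterable[Tuple[int, float]], f: TextIO,
                     expected_rows: int = EXPECTED_ROWS) -> int:
    """Write (enrollment_id, prediction) pairs with no header, comma-separated.

    Raises ValueError if the number of rows differs from expected_rows,
    so a malformed file is caught before it is submitted.
    """
    rows = list(rows)
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    writer = csv.writer(f, lineterminator="\n")
    for enrollment_id, prediction in rows:
        writer.writerow([enrollment_id, prediction])
    return len(rows)


# Small demonstration with a relaxed row count.
buf = io.StringIO()
write_submission([(1, 0.83), (2, 0)], buf, expected_rows=2)
print(buf.getvalue(), end="")  # prints "1,0.83" then "2,0"
```

For the real file, open it with open(path, "w", newline="") as the csv module documentation recommends, and keep the default expected_rows.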

For your convenience, the TAs have kindly provided some basic features that can be used directly to train your models. The features are extracted from log_train.csv. However, you should not be satisfied with these basic features alone: feature engineering is an important part of solving real-world problems, so you are highly encouraged to extract your own features to get better performance.

sample_train/test_x.csv

ID: enrollment ID
user_log_num: total number of logs of the user (student) in all the courses
course_log_num: total number of logs belonging to the course
take_course_num: number of courses the user takes
take_user_num: number of users who take the course
log_num: number of logs belonging to the enrollment
(event_source)-(event_type): 9 dimensions, number of logs with different event_sources and event_types (refer to log_train/test.csv)
(chapter/sequential/video)_count: 3 dimensions, number of logs whose object is a chapter, a sequential, or a video, respectively
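A minimal sketch of how the five count features above could be computed from the enrollment table and the enrollment ID of every log row. This is one plausible reading of the definitions, not the TAs' extraction code; the function name basic_features and the sample data are hypothetical, and the counts cover only whatever logs are passed in.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def basic_features(enrollments: Dict[str, Tuple[str, str]],
                   log_enrollment_ids: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Compute five count features per enrollment.

    enrollments maps enrollment_id -> (username, course_id);
    log_enrollment_ids yields the enrollment_id of every log row.
    """
    log_num = Counter(log_enrollment_ids)          # logs per enrollment
    user_logs, course_logs = Counter(), Counter()  # logs per user / per course
    user_courses, course_users = defaultdict(set), defaultdict(set)
    for eid, (user, course) in enrollments.items():
        user_logs[user] += log_num[eid]
        course_logs[course] += log_num[eid]
        user_courses[user].add(course)
        course_users[course].add(user)
    features = {}
    for eid, (user, course) in enrollments.items():
        features[eid] = {
            "user_log_num": user_logs[user],
            "course_log_num": course_logs[course],
            "take_course_num": len(user_courses[user]),
            "take_user_num": len(course_users[course]),
            "log_num": log_num[eid],
        }
    return features


# Hypothetical data: alice takes two courses, bob takes one.
enrollments = {"1": ("alice", "math"), "2": ("alice", "physics"), "3": ("bob", "math")}
logs = ["1", "1", "2", "3", "3", "3"]
features = basic_features(enrollments, logs)
print(features["1"])
# {'user_log_num': 3, 'course_log_num': 5, 'take_course_num': 2, 'take_user_num': 2, 'log_num': 2}
```

Aggregating per user and per course first, then reading the totals back for each enrollment, keeps the work linear in the number of enrollments and log rows.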