CMU Final Exam Scheduling
Operation Research II Project
By Zeheng Xu, Yige Zhang, Zhehe Qiao, Samuel Jia

Table of Contents

  • Abstract
  • Introduction
  • Assumptions
  • Formulation
  • Algorithm
  • Implementation
  • Conclusion
  • Reference
  • Appendix

Introduction

Exam scheduling is a challenging task that universities and colleges face several times every year, and there is a moderate chance that a student will face a final exam conflict at some point in college. In this project, we explore final exam scheduling at CMU for Fall 2017 by generating an exam schedule that minimizes conflicts. Because a large amount of information (students, classroom availability, etc.) has to fit into a short exam period, exam scheduling can be an NP-hard problem. We formulate it as a linear integer program built on several assumptions about the inputs, with constraints chosen so that the problem can be approached with the mathematical tools we learned in class. To be as realistic as possible, the model is built on information from Carnegie Mellon University. We generate data on the exams each student takes, the sizes of classrooms, and the classrooms assigned to exams. Using Python, we implement several methods of scheduling final exams. After trying various algorithms and implementations, we reduce the percentage of conflicts from 11.16% to 6.67%.

We also identified some areas that could be improved given more time. First, we should spread out each student's exams to give students more review time for each exam. Second, if we take into account that students within each department tend to take similar courses, conflicts can be reduced further. Finally, the capacity of each classroom could be modeled more precisely than with the three broad categories we used for integer programming.

2. Students

From basic statistics of CMU, we set the number of undergraduate students to 6100. We assume that no one overloads, so each student takes at most 5 exams. Furthermore, we do not divide students into departments. Since we are unable to retrieve real data on which classes each student is taking, we randomly assign at most 5 classes to each student.

3. Exams

From the CMU Fall 2017 final exam information we found online, we set the number of exams for the semester to 465. Since exams are assigned to students at random, we expect conflicts to arise from this assumption. In addition, we use the real maximum enrollment of each of those 465 exams as the cap on the number of students that can be enrolled in it.

4. Classrooms

From publicly available data and resources on past final exam classrooms, we assume that 93 classrooms are available during the final exam period. Since each exam has a different enrollment, we also divide the classrooms into three categories based on capacity:

○ < 50 people: 70
○ 50 - 100 people: 10
○ > 100 people: 13

Exams taken by a large number of students (e.g. more than 100) are held only in large classrooms, so no exam is split across several smaller classrooms.

Objective Function

1. Variables

k: classroom, 0 < k ≤ 93
i: student, 0 < i ≤ 6100
j: course number, 0 < j ≤ 465
t: time slot, 0 < t ≤ 18
Ej: capacity of course j
Ck: max capacity of classroom k
Ai,j = 1 if student i takes exam j, 0 otherwise
Pj,k,t = 1 if exam j takes place in classroom k at time t, 0 otherwise
Xi,j,k,t = 1 if student i takes exam j in classroom k at time t, 0 otherwise

We first define the most important indices: classrooms, students, course numbers and time slots. We also list the capacity of course j and the max capacity of classroom k for ease of later expression. In addition, we define 3 indicator variables. Ai,j indicates whether student i takes exam j. Pj,k,t indicates whether exam j takes place in classroom k at time t. Xi,j,k,t indicates whether student i takes exam j in classroom k at time t. With all the variables defined, we move on to constructing the objective function.

2. Objective function -- Conflict

As mentioned previously, the first conflict is that one student has multiple exams in the same time slot, which is the most common conflict. Xi,j,k,t is indexed by students, exams, classrooms and times, with 0 < i ≤ 6100, 0 < j ≤ 465, 0 < k ≤ 93, 0 < t ≤ 18 as the individual bounds of the indices. We sum Xi,j,k,t over j (course) and k (classroom) to get the number of exams each student has in each time slot, denoted Yi,t. We then introduce another variable, Zi,t, which takes the larger of 0 and (Yi,t - 1). Given the conflicts of each student in each time slot, we then sum Zi,t over i (students) and t (time slots) to get the total number of conflicts from students having multiple exams in the same time slot.

Conflict #1:
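Written out from the definitions in the preceding paragraph, conflict #1 can be expressed as follows. This is a reconstruction from the prose, with the summation bounds taken from the index ranges given under Variables.

```latex
\begin{align*}
Y_{i,t} &= \sum_{j=1}^{465} \sum_{k=1}^{93} X_{i,j,k,t} \\
Z_{i,t} &= \max\bigl(0,\; Y_{i,t} - 1\bigr) \\
\text{Conflict \#1} &= \sum_{i=1}^{6100} \sum_{t=1}^{18} Z_{i,t}
\end{align*}
```

For example, a student with three exams in one time slot has Yi,t = 3 and contributes Zi,t = 2 to the total.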

Ri, in contrast, takes the minimum of 1 and Vi. If Ri equals 1, student i has conflict #2. Finally, we sum Ri over i to get the total number of students who have multiple exams within 24 hours.

Conflict #2:
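Based on the description of Ri, conflict #2 can be written as below. This is a reconstruction from the prose; Vi is taken as already defined for student i, and its own formula is not restated here.

```latex
\begin{align*}
R_i &= \min\bigl(1,\; V_i\bigr) \\
\text{Conflict \#2} &= \sum_{i=1}^{6100} R_i
\end{align*}
```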

In all, we have the objective function:

Constraints

Constraint

The first constraint is that at any time t, the total number of exams must not exceed the total number of classrooms. Pj,k,t, as defined before, takes the value 1 if exam j takes place in classroom k at time t. We sum Pj,k,t over j and k to get the total number of exams held at time t. The sum must never exceed the number of classrooms, 93.

Constraint

The second constraint is that at time t, in classroom k, there is at most one exam. For each k and t, summing Pj,k,t over j (from 1 to 465) gives the number of exams held at time t in classroom k. The sum must never exceed 1, which means a classroom holds either one exam or no exam.
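In symbols, the two constraints described above can be written as follows. This is a reconstruction from the prose, using the index ranges given under Variables.

```latex
\begin{align*}
\sum_{j=1}^{465} \sum_{k=1}^{93} P_{j,k,t} &\le 93 && \text{for all } t \\
\sum_{j=1}^{465} P_{j,k,t} &\le 1 && \text{for all } k,\ t
\end{align*}
```

Summing the second inequality over the 93 classrooms gives the first, so the second constraint is the tighter of the two.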

Algorithms

Initial Schedule 1:

In this schedule, we assume that the largest courses are the most likely to cause conflicts. The idea is to assign time slots to the largest classes first, then fit smaller classes into the remaining time slots and classrooms. Specifically, we first fit all courses with more than 100 students into classrooms that can accommodate more than 100 students, then courses with 50-100 students, then courses with fewer than 50 students. This schedule uses only 17 time slots, and the total number of conflicts is 3349.

Improvement 1: Since only the first 17 time slots are used, we can move the classes with the most conflicts to the last slot without causing further conflicts. On average, each time slot should hold about 26 classes (465 / 18 ≈ 26), so we choose the 26 classes that create the most conflicts and move them to time slot 18. The result is 2933 conflicts in total.

Improvement 2: The problem with this algorithm is that if two of the chosen classes come from the same time slot, the conflicts between them remain after both are moved to time slot 18. On the second try, we pick from each time slot the class with the most conflicts and move it to time slot 18. This leads to a better result: 2729 conflicts in total.

Initial Schedule 2:

From the trials on Initial Schedule 1, we found that schedules with classes arranged more evenly have fewer conflicts. Our second initial schedule therefore assigns classes to time slots uniformly at random. To our surprise, this schedule has only 2504 conflicts.

Improvement 1 (Swap): Since classes should stay evenly spread over the 18 time slots, we swap classes instead of moving them. The algorithm swaps the class that causes the most conflicts with the class that causes the second most. After getting an improved schedule, we apply the algorithm again until it cannot proceed. The final result is 2349 conflicts, which is much better than the previous trials.

Improvement 2: We examined the deadlock that stops Improvement 1. In the last schedule, after swapping classes, the resulting schedule has the same two classes causing the most and the second most conflicts, so the algorithm is trapped in an infinite loop. To get out of this loop, we choose a different class among those causing the most conflicts to swap with the max-conflict class. With this algorithm, we reach 2051 conflicts and then fall into another infinite loop.

Improvement 3: The problem is almost the same. Although we are guaranteed not to swap the same two courses over and over again, the algorithm now cycles through the same three schedules. Our solution is to keep a checker outside the loop, and every time we find an infinite loop over three schedules, we swap the second and the third classes that are causing the
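The swap idea from Improvement 1 (Swap), together with a checker for repeated schedules, can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the project's code: the names `class_conflicts` and `swap_until_cycle` are hypothetical, students are given as plain lists of class indexes, a conflict is charged to both classes of a clashing pair, and the cycle check stops at any previously seen schedule rather than applying the specific second/third-class swap described above.

```python
from collections import Counter


def class_conflicts(students, slot_of):
    """Return (total conflicts, Counter of conflicts charged to each class).

    students: list of lists of class indexes (assumed input format).
    slot_of:  list where slot_of[c] is the time slot of class c.
    """
    total = 0
    per_class = Counter()
    for classes in students:
        ordered = sorted(classes, key=lambda c: slot_of[c])
        # After sorting by slot, clashes can only occur between neighbours.
        for a, b in zip(ordered, ordered[1:]):
            if slot_of[a] == slot_of[b]:
                total += 1
                per_class[a] += 1  # assumption: charge both classes
                per_class[b] += 1
    return total, per_class


def swap_until_cycle(students, slot_of, max_iters=1000):
    """Repeatedly swap the slots of the two most conflicting classes.

    Stops when fewer than two classes cause conflicts, when a schedule
    repeats (a cycle), or after max_iters swaps. Returns the best schedule
    seen and its conflict count; the input list is not modified.
    """
    current = list(slot_of)
    best = list(current)
    best_total, per_class = class_conflicts(students, current)
    seen = {tuple(current)}  # checker for repeated schedules
    for _ in range(max_iters):
        top_two = [c for c, _ in per_class.most_common(2)]
        if len(top_two) < 2:
            break
        a, b = top_two
        current[a], current[b] = current[b], current[a]
        if tuple(current) in seen:
            break  # this swap would re-enter an earlier schedule
        seen.add(tuple(current))
        total, per_class = class_conflicts(students, current)
        if total < best_total:
            best_total, best = total, list(current)
    return best, best_total


# Toy usage: 6 classes in 2 slots, 4 students with 2 classes each.
toy_students = [[0, 1], [0, 4], [2, 3], [2, 5]]
toy_slots = [0, 0, 1, 1, 0, 1]
print(swap_until_cycle(toy_students, toy_slots))  # -> ([1, 0, 0, 1, 0, 1], 0)
```

Because a swap exchanges the slots of two classes, the number of classes per slot never changes, which matches the goal of keeping classes evenly spread.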

Implementation

Given the large number of variables and inputs involved in our integer programming problem, we could not solve it exactly as a mathematical program. Instead, we turn to Python to produce sample schedules. For clarity, we use the same conventions for the number of students, number of classrooms, etc., as in the Objective Function section.

  1. Data Structure We represent each student as a list that consists mostly of Boolean values, and we store all of these lists in a higher-level list S. For each student, S[i] = [id, C, b1, b2, b3, …, b465]. Here id is the student’s index in the table and C is the total number of classes that the student is taking. For b1, b2, …, b465, each bi is True if the student is taking class i and False otherwise. The outer list has length 6100, the number of students. For schedules, we have another 2D list G. Class i’s information is stored in G[i] = [class ID, number of students, time slot, i]. The time slot is a number from 1 to 18. The class ID is the course number, such as 21393 for OR2.
  2. Data Generation Since students’ information is not accessible, we use functions to randomly assign classes to each student according to the available information about courses. The algorithm goes through each class in list G. In iteration i, it randomly chooses b students who each have fewer than 5 classes, where b is the total number of students taking class G[i]. For each chosen student d, it marks the entry for class i in S[d] as True, where S[d] is student d’s class information. As a result, we have a fairly randomly distributed data set of students’ class information. For later use in computation, the students’ information is also stored in another list S’, where S’[d] represents the dth student. The first 2 entries are kept the same. For the classes taken, we replace the Booleans with the indexes of the classes in G. For example, if student 4 is taking the 5th, 10th, 13th and 56th classes, his list in S’ will be [4, 4, 5, 10, 13, 56].
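The generation step can be sketched as follows. This is a minimal illustration, not the project's code: the function name `generate_students`, its parameters and the fixed seed are assumptions, and it builds the compact form S’ directly instead of going through the Boolean lists in S.

```python
import random


def generate_students(enrollments, n_students, max_classes=5, seed=0):
    """Randomly assign classes to students and return the compact list S'.

    enrollments: enrollments[c] is the number of students to place in class c.
    Each row of the result is [id, count, class indexes...].
    """
    rng = random.Random(seed)  # assumption: fixed seed for repeatability
    taken = [[] for _ in range(n_students)]
    for class_idx, size in enumerate(enrollments):
        # Only students with fewer than max_classes classes are eligible.
        eligible = [s for s in range(n_students) if len(taken[s]) < max_classes]
        # If too few students are eligible, the class is left under-filled.
        for s in rng.sample(eligible, min(size, len(eligible))):
            taken[s].append(class_idx)
    return [[s, len(classes), *classes] for s, classes in enumerate(taken)]
```

Sampling without replacement within each class guarantees that no student is enrolled in the same class twice.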

  3. Find Conflicts and Change Schedule This data structure makes it relatively simple to find the total number of conflicts for a given schedule. To compute total conflicts, we go through the list S’ student by student. On the ith iteration, we access the class information stored in S’[i]. Referring back to the example of student 4, the time slots of his classes are H = [G[S’[4][2]][2], G[S’[4][3]][2], G[S’[4][4]][2], G[S’[4][5]][2]]. Note that S’[4][2] is 5, which is a class index in G, and G[5][2] is that class’s time slot. We then sort the list H by time slot. We increment the total number of conflicts by 1 whenever class j and class j+1 in the student’s sorted classes have equal time slots, since after sorting, conflicts can only occur between adjacent classes. For the algorithms, we usually also need to keep track of how many conflicts each class is causing, so we keep a separate list of length 465 and add each conflict at the index of the corresponding class. To change the schedule, we simply change the time slot number in the schedule list G. If the room size needs to change, we assign a new classroom as well.
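The conflict count on S’ and G can be sketched as below. The function names `count_conflicts` and `move_class` are hypothetical, and charging a conflict to both classes of a clashing pair is an assumption, since the text does not say how a conflict is attributed to classes.

```python
def count_conflicts(S_prime, G):
    """Return (total conflicts, per-class conflict list) for schedule G.

    S_prime rows are [id, count, class indexes...];
    G rows are [class ID, number of students, time slot, index].
    """
    total = 0
    per_class = [0] * len(G)
    for row in S_prime:
        # Sort the student's classes by time slot (G[c][2]).
        ordered = sorted(row[2:], key=lambda c: G[c][2])
        # Equal slots can only appear next to each other after sorting.
        for a, b in zip(ordered, ordered[1:]):
            if G[a][2] == G[b][2]:
                total += 1
                per_class[a] += 1  # assumption: charge both classes
                per_class[b] += 1
    return total, per_class


def move_class(G, class_idx, new_slot):
    """Change the schedule by rewriting the time slot stored in G."""
    G[class_idx][2] = new_slot
```

A student with three exams in one slot contributes two conflicts here, which agrees with Zi,t = Yi,t - 1 in the formulation.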