Although I work for a mostly remote work company, a couple of times a year we meet together for a week in some nice location of the world. When that happens, multiple meetings across people in different teams are organized. Sometimes the number of requested sessions gets out of hands and the scheduling team is overwhelmed. Quoting them in one of our latest gatherings: “We currently have over 160 proposed breakout sessions which will be impossible to schedule unless we work until midnight every day! We only have 3-4 hours available for breakout sessions each day.”
The problem with scheduling these meetings is that you get a mix of people from different teams so you have incompatibilities between meetings, and some persons are invited to many meetings and it becomes hard to solve all of these conflicts so all the invited people can join their meetings. So, it seemed an interesting problem and I wondered if I could find a way to solve this in an automated way.
The way I thought about the problem was that I needed to assign meetings to time slots, minimizing the number of these slots while at the same time making sure that meetings at the same time slot were compatible.
The first approach one can think about is simple brute force: go across all possible options and select one that produces the minimal number of time slots (note that it is possible that there is more than one optimal solution). To find all combinatorial options, first we can create all possible sets containing a given number of meetings. If we have \(N\) meetings, we can create all the \(K\) possible sets of meetings by first making sets that contain only one meeting, which is \({N\choose 1}\), then creating all possible sets that contain two meetings, which is \({N\choose 2}\), and so on, having finally \(K\) sets that is
\[K=\sum_{i=1}^N {N\choose i}\]
These sets could represent meetings that would happen at the same time slot. Of them, only the ones containing compatible meetings will be valid. Compatibility is easy to find out: if no meeting invitees for two meetings are shared, those meetings are compatible. We will call the number of valid meeting sets \(K_c\). Once we have them we can create other sets that contain, similarly, all possible combinations of these meeting sets. We can call them time slot sets. All possible combinations would give us the number \(L\):
\[L=\sum_{i=1}^{K_c} {K_c\choose i}\]
Each of these time slot sets represent possible solutions to the problem, being the cardinal of the set the number of time slots that would be needed. However, not all combinations of time slot sets will be valid solutions: only the ones where no meeting is repeated across its meeting sets while at the same time all meetings are present will be valid. Among those, the combinations that have a minimal cardinal will be optimal solutions we want to find.
This is a correct way of finding a solution, however the number of combinations that we need to explore when we have more than just a few meetings can be daunting. Assuming a very optimistically low \(10\%\) of valid sets of meetings, so \(K_c=0.1 K\), we would have for, say, number of meetings \(N=5, 8\), and \(10\) a number of sets of time slots \(L=7, 3.36 \cdot 10^7\), and \( 5.07 \cdot 10^{30}\). We can see how the numbers grow extremely fast, and that searching through all of them would take a lot of time and computer memory and become unfeasible very quickly.
So, we really need something better than pure combinatorics. Something that can help with this sort of problem is to try to visualize it to obtain some insight on how it could be solved, and I followed that route. In this case we can use non-directed graphs in a straightforward way: we can make each meeting a node, and connect with edges compatible meetings. I will explain this with an example. Let’s say that we have 8 people to which we will assign letters A to H and they attend 8 different meeting with different attendees. Attendees per meeting are as shown in the following table:
Meeting 1 | A, E |
Meeting 2 | B, F |
Meeting 3 | C, G |
Meeting 4 | D, H |
Meeting 5 | B, C, D |
Meeting 6 | A, C, D |
Meeting 7 | A, B, D |
Meeting 8 | A, B, C |
From the table we can calculate the adjacency matrix of the graph, where each row will show the compatibility between a meeting and the others, having a \(1\) if compatible or \(0\) if incompatible. We will obtain a symmetric matrix. In our example:
\[A=\begin{bmatrix}0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix}\]
For instance, from the first row we see that meeting \(1\) is compatible with meetings \(2, 3, 4\) and \(5\). The graph that would result from here is:
The problem is transformed now into trying to find out how to group nodes that are one edge away of each other in sets, in a way that minimizes the number of sets while all nodes are included in some set and there is no overlap across sets. These groups of nodes would be in a single time slot of the schedule, and we will call such sets simply “slots”. Possible slots for this example would be \(\{1,5\}\), \(\{1,2\}\), or \(\{1,2,4\}\), but not, for instance, \(\{1,2,5\}\) as \(2\) and \(5\) are incompatible.
Different strategies con be tried to create such sets. The one that gave me best results was what I called a “removal strategy”. We start with each node being assigned to a different slot, and then we try to remove the maximum number of slots. We do that by iterating on the slots and checking if it is possible to move all nodes inside a slot to other already existing slots. If that is the case, we assign the nodes to those slots and remove the slot that contained them. The iterations will repeat until there is a point in which it is not possible to remove more slots. In the previous example, a possible evolution would be (each color represents a different time slot):
1. Each node gets assigned its own slot, that we will designate by colors in the image. |
2. We look at the “blue” slot. We find that its only node (1) can be assigned to other slots (“red”, “green”, “beige”, or “violet”). Arbitrarily, we assign node 1 to the “red” time slot and remove “blue”. |
3. We look at the “beige” slot. We find that its only node (4) can be assigned to other slots (“red”, “green”, or “brown”). Arbitrarily, we assign node 4 to the “green” slot and remove “beige”. We then look at “violet”, but it is not removable because of the only connected slot, “red”, one of its nodes (2) is not compatible with 5. We go through “orange”, “yellow”, and “brown”, but we are in similar situations and they cannot be removed. |
4. We look now at “red”. We see that its two nodes can be assigned to other slots (1 can be assigned to “violet” or “green”, 2 can be assigned to “orange” or “green”). Arbitrarily, we assign 1 to “violet” and 2 to “orange”, and we remove “red”. |
5. We look now at “green”. We see that its two nodes can be assigned to other time slots. Arbitrarily, we assign 3 to “yellow” and 4 to “brown”, and we remove “green”. We start another iteration and look at the four remaining time slots, but we find that none of them can be removed as we cannot assign all its nodes to alternative slots. Therefore, the algorithm finishes at this point. We went down from 8 to 4 slots, each of one with two meetings. |
We have find a solution in just two iterations, instead of having to go through millions of combinations!
I went forward with this idea and implemented it in go. The implementation is not complicated and can be found in this github repository, which also contains some instructions on how to run it. The program accepts input in CSV format and outputs CSV as well, so the data can be easily exported/imported from any spreadsheet. More details on the format can be found in the README.md file.
The complexity of the algorithm is \(O(N^2)\) because of the calculation of the adjacency matrix, but the maximum number of iterations of the removal strategy is just \(N\). Convergence is assured as you obviously cannot remove more than \(N-1\) slots. However, the algorithm does not always find the optimal solution, as we can find counter examples for this. For instance, if by using the removal strategy we end up with the following graph (again colors represent slots):
We can see that the algorithm will not process further the graph (no slot can disappear by assigning nodes to others). But, this one graph:
does have less slots (5 vs 6). There are some things that I think could improve the algorithm though. The first idea I have is, instead of assigning arbitrarily nodes to other slots when we remove, we could choose the one with less nodes amongst the possibilities, so distribution is more “even”. The second would be to proceed to further optimizations after applying the algorithm once by selecting neighboring slots, and then assign where possible nodes from them to the rest of the slots. With the remaining nodes, proceed with the removal strategy, and check if some slot consolidation happen. If that’s the case we would end up with less slots that those selected first. Doing this with the graph from Fig. 5 would actually lead to Fig. 6 after applying these steps to the green and orange slots. However, I doubt that arriving to an optimal solution is guaranteed even with these changes.
To finalize, I applied the algorithm to the original problem that inspired this blog post, and I was able to squeeze 158 meetings in 22 time slots in a fraction of a second, in just 3 iterations (it was nice to see that the very fast convergence happens also in more complex cases than the example). So it turns out that assuming we could have 4 hours for these meetings per day, we would not have been able to organize the schedule without some conflicts… although for a very small margin!