Evaluating Trainees in Large Cyber Exercises / Artioli, A.; Andreolini, M.; Ferretti, L.; Marchetti, M. - Vol. 3731 (2024). (Paper presented at the 8th Italian Conference on Cyber Security, ITASEC 2024, held in Italy in 2024).
Evaluating Trainees in Large Cyber Exercises
Artioli A.; Andreolini M.; Ferretti L.; Marchetti M.
2024
Abstract
Cyber ranges are becoming a widely used alternative for teaching security to trainees through practice on realistic systems. They support evaluation and awareness through dashboards that display how many training objectives have been achieved in a cyber exercise. In our previous paper [1] we outlined the limitations of standard dashboards and proposed a new framework for modeling and assessing trainee activity through trainee graphs, reference graphs and scoring functions. In particular, we introduced a scoring function based on the symmetric difference between a trainee graph and a reference graph, which pinpoints a trainee's inefficiencies in an exercise quite effectively. In this paper, we show that this model, while working well for small and coherent exercises, poses several problems when applied to larger labs made of several heterogeneous challenges. The accuracy of the symmetric difference drops rapidly as the number of nodes and edges in the reference graph increases, making it unusable in large environments. Furthermore, there may be edge cases where trainees who do not complete an exercise obtain a higher score than those who do. This happens because the symmetric difference turns out to be larger when the trainee advances while exploring fewer nodes of the reference graph (which is the case of a skilled attacker). To address these problems, we aim to reduce the complexity of the graphs fed to the scoring functions. We improve the older model by representing an exercise as a set of smaller local graphs (one for each coherent, intermediate challenge, which can be assessed with a specific local score) and a global graph (representing the interconnections between intermediate challenges, which can be assessed with a specific global score). The benefits of introducing global and local graphs with global and local progress are twofold: (a) with smaller graphs, precision-related scores (symmetric differences) perform better; (b) different scores can be assigned to different parts of the exercise, which is crucial in heterogeneous engagements. We have implemented a Python-based simulator that generates random exercises and compares the performance of the previous and proposed trainee models under specific scores. Our results empirically show that, on average, the original model fails to scale to sizes on the order of tens of vertices and/or edges, while the new one preserves precise local scores and better tracks overall progress.
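To make the kind of scoring described above concrete, here is a minimal Python sketch of a symmetric-difference-based score over edge sets, combined into local (per-challenge) and global scores. It is an illustration only: the edge-set representation, the Jaccard-style normalization, the weighted average of local scores, and the helper names `symdiff_score` and `exercise_score` are assumptions for this example, not the exact definitions given in the paper or in [1].

```python
# Illustrative sketch (not the authors' implementation).
# Graphs are modeled as sets of directed edges (tuples of node labels).

def symdiff_score(trainee_edges: set, reference_edges: set) -> float:
    """Score in [0, 1]: penalize edges present in only one of the two graphs,
    normalized so that identical graphs score 1.0 (assumed normalization)."""
    if not trainee_edges and not reference_edges:
        return 1.0
    sym_diff = trainee_edges ^ reference_edges        # edges in exactly one graph
    return 1.0 - len(sym_diff) / len(trainee_edges | reference_edges)

def exercise_score(local_pairs, global_pair, weights=None):
    """Combine per-challenge (local) scores with a challenge-interconnection (global) score.
    local_pairs: list of (trainee_edges, reference_edges), one pair per challenge.
    global_pair: (trainee_edges, reference_edges) for the challenge-level graph.
    weights: optional per-challenge weights (hypothetical knob for heterogeneous labs)."""
    per_challenge = [symdiff_score(t, r) for t, r in local_pairs]
    weights = weights or [1.0] * len(per_challenge)
    local = sum(w * s for w, s in zip(weights, per_challenge)) / sum(weights)
    return {"per_challenge": per_challenge,
            "local": local,
            "global": symdiff_score(*global_pair)}

# Example: a trainee who solved challenge 1 exactly but wandered in challenge 2.
ref1 = {("scan", "exploit"), ("exploit", "root")}
tr1  = {("scan", "exploit"), ("exploit", "root")}
ref2 = {("login", "pivot"), ("pivot", "exfil")}
tr2  = {("login", "bruteforce"), ("login", "pivot")}
print(exercise_score([(tr1, ref1), (tr2, ref2)],
                     ({("c1", "c2")}, {("c1", "c2")})))
```

In this toy run the first local score is 1.0, the second drops to about 0.33 because of the extra and missing edges, and the global score stays at 1.0 since the trainee followed the intended challenge ordering; keeping the graphs small per challenge is what keeps each symmetric difference informative, which is the intuition behind the local/global split.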
| File | Size | Format |
|---|---|---|
| paper42.pdf (Restricted access; Type: VOR - version published by the publisher) | 1.26 MB | Adobe PDF |