Solving Cyber-Alert Allocation Markov Games with Deep Reinforcement Learning
Most large-scale networks employ intrusion detection software on their computer systems (e.g., servers, routers) that monitors and flags suspicious or abnormal activity. When potentially malicious activity is detected, one or more cyber-alerts may be generated, each with a significance level (e.g., high, medium, or low). A subset of these alerts is then assigned to on-staff cyber-security analysts for further investigation. Because of the wide range of potential attacks and their high degree of sophistication, identifying what constitutes a true attack is a challenging problem, especially for organizations performing critical operations, such as military bases or financial institutions, that face high volumes of cyber-attacks every day. In this thesis, we develop a framework for studying game-theoretic behavior from both the attacker's and the defender's perspectives. Our approach models a series of sub-games between the attacker and defender, with a state maintained across sub-games. We first derive optimal allocation strategies using dynamic programming and Q-maximin value iteration algorithms. We then turn to approximation techniques, using deep neural networks and Q-learning to derive near-optimal strategies that scale to much larger models. We assess the effectiveness of our allocation strategies by comparing them to sensible heuristics (e.g., random and myopic allocation); our results show that our strategies consistently outperform these alternatives at minimizing risk.
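To make the Q-maximin value iteration idea concrete, the sketch below shows the generic version of that algorithm for a small zero-sum Markov game. This is an illustrative toy, not the thesis's model: the state space, payoff matrices `R`, transition tensor `P`, and discount factor `gamma` are all placeholder assumptions, and each stage game's maximin value over mixed strategies is computed with a standard linear program.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Maximin value of a zero-sum matrix game (row player maximizes).

    Standard LP: maximize v subject to x^T M[:, j] >= v for every column j,
    with x a probability distribution over the row player's actions.
    """
    m, n = M.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # minimize -v  <=>  maximize v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])     # v - x^T M[:, j] <= 0 for each j
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                             # x sums to 1
    b_eq = np.ones(1)
    bounds = [(0, None)] * m + [(None, None)]     # x >= 0, v unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def maximin_value_iteration(R, P, gamma=0.9, iters=300):
    """Value iteration for a zero-sum Markov game (toy placeholders).

    R[s]       : stage payoff matrix at state s (defender rows, attacker cols).
    P[s][a][d] : next-state distribution given joint action (a, d) in state s.
    Each sweep builds the Q-matrix Q[a, d] = R + gamma * E[V(s')] and takes
    the stage game's maximin value as the new state value.
    """
    S = len(R)
    V = np.zeros(S)
    for _ in range(iters):
        V_new = np.empty(S)
        for s in range(S):
            m, n = R[s].shape
            Q = np.array([[R[s][a, d] + gamma * P[s][a][d] @ V
                           for d in range(n)] for a in range(m)])
            V_new[s] = matrix_game_value(Q)
        V = V_new
    return V
```

In the full problem the Q-matrix is too large to enumerate, which motivates the move to deep Q-learning approximations described above; there, a neural network replaces the tabular `Q` while the per-state maximin LP plays the same role.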