Some Pragmatic Relationships between Machine Learning & Cybersecurity
Seminar TU Delft -- Cybersecurity Webinar series
Virtual
Oneliner: Anticipation of [DLS22] and [EuroSP22] @ TU Delft!
Abstract: In this talk, I will discuss two recent research efforts that will be presented in the coming weeks at IEEE SPW DLS and IEEE EuroSP.
The first involves adversarial attacks that target both humans and Machine Learning (ML). The intuition is to generate adversarial samples by “shifting” an original sample towards a target sample, so that humans can perceive some difference between the original and the adversarial sample. This assumption is in stark contrast to common attacks, which perturb single pixels in ways that humans cannot recognize. The approach is relevant in, e.g., multi-stage processing of inputs where both humans and machines are involved in decision-making, because invisible perturbations will not fool a human.
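To make the “shifting” intuition concrete, here is a minimal sketch that assumes the shift is a plain linear interpolation between two samples represented as lists of pixel intensities. The function names, the `alpha` parameter, and the toy values are all hypothetical; the method presented in the talk may differ substantially.

```python
from typing import List


def shift_towards(original: List[float], target: List[float], alpha: float) -> List[float]:
    """Blend `original` towards `target` (hypothetical helper).

    alpha = 0.0 returns the original sample, alpha = 1.0 returns the target.
    Linear interpolation is an assumption made for illustration only.
    """
    if len(original) != len(target):
        raise ValueError("samples must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * o + alpha * t for o, t in zip(original, target)]


def max_abs_change(a: List[float], b: List[float]) -> float:
    """Largest per-element difference, a crude proxy for how visible the shift is."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy "images" with four pixel intensities in [0, 1] (made-up values).
original = [0.0, 0.2, 0.4, 0.6]
target = [1.0, 0.8, 0.4, 0.0]

adversarial = shift_towards(original, target, alpha=0.5)
print(adversarial, max_abs_change(original, adversarial))
```

Unlike a single-pixel perturbation, a larger `alpha` changes many elements at once, which is why a human would be expected to notice some difference.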
The second analyzes the impact of unlabelled data in cyberthreat detection (CTD). Although many efforts propose using ML to solve CTD problems, the realistic integration of ML methods is hindered by the difficulty of obtaining the large sets of labelled data needed to train ML detectors. A potential solution to this problem is semisupervised learning (SsL), which combines small labelled datasets with large amounts of unlabelled data. In this talk, I investigate the utility of unlabelled data by (i) proposing a formal cost model for SsL in CTD; and (ii) formalizing a set of requirements for the evaluation of SsL methods, which elucidates the contribution of unlabelled data. I will then show that the state of the art does not allow assessing the impact of unlabelled data in CTD. Finally, through some experiments, I will demonstrate “how” to empirically assess the role played by unlabelled data in SsL methods for CTD.
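One plausible way to isolate the contribution of unlabelled data is to compare a detector trained only on the small labelled set against an SsL detector that also uses unlabelled samples, with both evaluated on the same test set. The sketch below assumes a one-dimensional feature, a nearest-centroid classifier, and a single round of self-training; all names and data are toy values invented for illustration, and this is not necessarily the evaluation protocol proposed in the talk.

```python
from statistics import mean
from typing import Dict, List, Tuple

Sample = Tuple[float, int]  # (feature value, label); 0 = benign, 1 = malicious


def fit_centroids(labelled: List[Sample]) -> Dict[int, float]:
    """Train a nearest-centroid detector: one mean feature value per class."""
    classes = {y for _, y in labelled}
    return {c: mean(x for x, y in labelled if y == c) for c in classes}


def predict(centroids: Dict[int, float], x: float) -> int:
    """Assign the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def accuracy(centroids: Dict[int, float], test: List[Sample]) -> float:
    """Fraction of test samples classified correctly."""
    return sum(predict(centroids, x) == y for x, y in test) / len(test)


def self_train(labelled: List[Sample], unlabelled: List[float]) -> Dict[int, float]:
    """One round of self-training: pseudo-label the unlabelled data, then retrain."""
    baseline = fit_centroids(labelled)
    pseudo = [(x, predict(baseline, x)) for x in unlabelled]
    return fit_centroids(labelled + pseudo)


# Toy data (made up): a tiny labelled set, a larger unlabelled pool, a test set.
labelled = [(0.0, 0), (0.4, 1)]
unlabelled = [0.05, 0.1, 0.15, 0.8, 0.9, 1.0]
test = [(0.1, 0), (0.3, 0), (0.35, 0), (0.7, 1), (0.9, 1)]

baseline_acc = accuracy(fit_centroids(labelled), test)
ssl_acc = accuracy(self_train(labelled, unlabelled), test)
print(f"labelled only: {baseline_acc:.2f}, with unlabelled data: {ssl_acc:.2f}")
```

Because both detectors share the same labelled set and test set, any gap between the two accuracies can be attributed to the unlabelled data. The numbers from this toy setup say nothing about real CTD performance.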