<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Blog on Xudong Pan</title>
    <link>https://ravensanstete.github.io/en/blog/</link>
    <description>Recent content in Blog on Xudong Pan</description>
    <generator>Hugo</generator>
    <language>en-US</language>
    <copyright>© 2024 Xudong Pan. All rights reserved.</copyright>
    <lastBuildDate>Thu, 06 Feb 2025 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://ravensanstete.github.io/en/blog/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Nostalgia Reading List for Beginners on AI Security</title>
      <link>https://ravensanstete.github.io/en/blog/ai-security-reading-list/</link>
      <pubDate>Thu, 06 Feb 2025 00:00:00 +0000</pubDate>
      <guid>https://ravensanstete.github.io/en/blog/ai-security-reading-list/</guid>
      <description>&lt;h2 id=&#34;1-adversarial-attacks&#34;&gt;1. Adversarial Attacks&lt;/h2&gt;&#xA;&lt;h3 id=&#34;11-adversarial-examples-ae--defenses&#34;&gt;1.1. Adversarial Examples (AE) &amp;amp; Defenses&lt;/h3&gt;&#xA;&lt;h4 id=&#34;111-survey&#34;&gt;1.1.1. Survey&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Wild patterns: Ten years after the rise of adversarial machine learning&lt;/strong&gt; — a survey covering AI security research before 2018, focusing primarily on adversarial examples and poisoning attacks.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h4 id=&#34;112-attack-side&#34;&gt;1.1.2. Attack Side&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;FGSM&lt;/strong&gt; — the classic one-step gradient attack, among the earliest AE work (a minimal sketch follows this list)&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;PGD&lt;/strong&gt; — the canonical iterative AE generation algorithm, now the standard baseline&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;C&amp;amp;W&lt;/strong&gt; — optimization-based attacks; systematization work that broke early defenses&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;TextBugger&lt;/strong&gt; — AE attacks on NLP models&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Black-box Adversarial Attacks on Commercial Speech Platforms with Minimal Information&lt;/strong&gt; — AE attacks on audio models&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;
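&lt;p&gt;To make the list concrete, here is a minimal FGSM sketch in PyTorch. It is an illustrative one-step attack under the usual white-box assumptions, not the reference implementation of any paper above; the &lt;code&gt;model&lt;/code&gt;, loss, and &lt;code&gt;epsilon&lt;/code&gt; are placeholder assumptions.&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import torch&#xA;import torch.nn.functional as F&#xA;&#xA;# Illustrative one-step FGSM; assumes inputs normalized to [0, 1].&#xA;def fgsm(model, x, y, epsilon=0.03):&#xA;    x = x.clone().detach().requires_grad_(True)&#xA;    loss = F.cross_entropy(model(x), y)    # attacker maximizes the training loss&#xA;    loss.backward()&#xA;    x_adv = x + epsilon * x.grad.sign()    # one step along the gradient sign&#xA;    return x_adv.clamp(0.0, 1.0).detach()  # stay in the valid pixel range&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;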
&lt;h4 id=&#34;113-empirical-defense&#34;&gt;1.1.3. Empirical Defense&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;MagNet: a Two-Pronged Defense against Adversarial Examples&lt;/strong&gt; (CCS&#39;17) — a manifold-based, unsupervised approach&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Obfuscated Gradients Give a False Sense of Security&lt;/strong&gt; (ICML&#39;18 Best Paper) — surveys pre-2018 defenses and breaks them by circumventing their obfuscated gradients&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h4 id=&#34;114-certified-defense&#34;&gt;1.1.4. Certified Defense&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;SoK: Certified Robustness for Deep Neural Networks&lt;/strong&gt; — a survey&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Certified Adversarial Robustness via Randomized Smoothing&lt;/strong&gt; (ICML&#39;19) — an early work on randomized smoothing&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation&lt;/strong&gt; (S&amp;amp;P&#39;18) — deterministic certification based on ideas from program verification&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;12-backdoor-attacks--defenses&#34;&gt;1.2. Backdoor Attacks &amp;amp; Defenses&lt;/h3&gt;&#xA;&lt;h4 id=&#34;121-survey&#34;&gt;1.2.1. Survey&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;TrojanZoo&lt;/strong&gt; — a benchmark survey backed by an open-sourced framework; a huge engineering effort&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h4 id=&#34;122-attack-side&#34;&gt;1.2.2. Attack Side&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;TrojanNN&lt;/strong&gt; (NDSS&#39;18) — neuron-based injection&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;BadNets&lt;/strong&gt; (IEEE Access) — data-based injection; the classic attack (a minimal sketch follows this list)&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Latent Backdoor&lt;/strong&gt; (CCS&#39;19) — extends the backdoor attack to pretrained encoders via feature alignment&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Input-Aware Backdoor&lt;/strong&gt; (NeurIPS&#39;20) — the first dynamic-trigger backdoor&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Hidden Trigger Backdoor Attack on NLP Models via Linguistic Style Manipulation&lt;/strong&gt; (Security&#39;22) — the first dynamic backdoor on NLP models &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Towards Backdoor Attack on Deep Learning based Time Series Classification&lt;/strong&gt; (ICDE&#39;22) — the first effective backdoor attack on time series models &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;
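&lt;p&gt;As a concrete companion to the data-based injection idea above, a minimal BadNets-style poisoning sketch; the patch location, value, and target class are illustrative assumptions, not details from the paper.&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np&#xA;&#xA;# Illustrative BadNets-style poisoning: stamp a static trigger and flip the label.&#xA;def poison(image, target_label, patch_value=1.0, size=3):&#xA;    poisoned = image.copy()&#xA;    poisoned[-size:, -size:] = patch_value  # trigger in the bottom-right corner&#xA;    return poisoned, target_label           # relabel to the attacker-chosen class&#xA;&#xA;clean = np.zeros((28, 28), dtype=np.float32)&#xA;bad_x, bad_y = poison(clean, target_label=7)  # one poisoned training sample&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;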
&lt;h4 id=&#34;123-defense-side&#34;&gt;1.2.3. Defense Side&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Fine-pruning&lt;/strong&gt; (RAID&#39;18) — pruning plus finetuning, based on hypothesized differences in activation patterns between clean and triggered inputs&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;STRIP&lt;/strong&gt; (ACSAC&#39;19) — detection based on the hypothesis that triggered inputs&amp;rsquo; predictions are resilient to noise&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Neural Cleanse&lt;/strong&gt; (S&amp;amp;P&#39;19) — exploits the strong link between backdoor behavior and a static trigger pattern&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;ABS&lt;/strong&gt; (CCS&#39;19) — neuron-level inspection&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;13-poisoning-attacks&#34;&gt;1.3. Poisoning Attacks&lt;/h3&gt;&#xA;&lt;h4 id=&#34;131-clean-label-attacks&#34;&gt;1.3.1. Clean-Label Attacks&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks&lt;/strong&gt; (NIPS&#39;18) — presents the idea of clean-label attacks via feature-level alignment&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Bullseye Polytope&lt;/strong&gt; (EuroS&amp;amp;P&#39;21) — enhances attack effectiveness from a geometric view; simple yet effective&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Label-consistent backdoor attack&lt;/strong&gt; — another approach to clean-label attacks&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;14-byzantine-attacks&#34;&gt;1.4. Byzantine Attacks&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent (Krum)&lt;/strong&gt; (NIPS&#39;17) — statistics-based defense&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Justinian&amp;rsquo;s GAAvernor: Robust Distributed Learning with Gradient Aggregation Agent&lt;/strong&gt; (Security&#39;20) — RL-based defense &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;2-privacy-attacks&#34;&gt;2. Privacy Attacks&lt;/h2&gt;&#xA;&lt;h3 id=&#34;21-membership-inference&#34;&gt;2.1. Membership Inference&lt;/h3&gt;&#xA;&lt;h4 id=&#34;211-survey&#34;&gt;2.1.1. Survey&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Membership Inference Attacks on Machine Learning: A Survey&lt;/strong&gt; (ACM Computing Surveys)&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h4 id=&#34;212-attack-side&#34;&gt;2.1.2. Attack Side&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Membership Inference Attacks against Machine Learning Models&lt;/strong&gt; (S&amp;amp;P&#39;17) — the earliest MIA&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;ML-Leaks&lt;/strong&gt; (NDSS&#39;19) — a minimalist variant that relaxes the attacker&amp;rsquo;s assumptions&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h4 id=&#34;213-defense-side&#34;&gt;2.1.3. Defense Side&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;MemGuard&lt;/strong&gt; (CCS&#39;19) — defense by logit-level obfuscation&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;22-property-inference&#34;&gt;2.2. Property Inference&lt;/h3&gt;&#xA;&lt;h4 id=&#34;221-global-property&#34;&gt;2.2.1. Global Property&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Property Inference Attacks on Fully Connected Neural Networks using Permutation Invariant Representations&lt;/strong&gt; (CCS&#39;18) — inference based on model weights&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h4 id=&#34;222-individual-property&#34;&gt;2.2.2. Individual Property&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Exploiting Unintended Feature Leakage in Collaborative Learning&lt;/strong&gt; (S&amp;amp;P&#39;19) — feature inference based on the embedding or gradient in federated learning scenarios&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Privacy risks of general-purpose language models&lt;/strong&gt; (S&amp;amp;P&#39;20) — reconstructs private semantics from the text embeddings of LLMs &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;23-data-reconstruction&#34;&gt;2.3. Data Reconstruction&lt;/h3&gt;&#xA;&lt;h4 id=&#34;231-gradient-based&#34;&gt;2.3.1. Gradient-Based&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Deep Leakage from Gradients (DLG)&lt;/strong&gt; (NeurIPS&#39;19) — the earliest gradient-based reconstruction attack (a gradient-matching sketch follows this list)&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;GradInversion&lt;/strong&gt; (CVPR&#39;21) — incorporates data priors into the reconstruction&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Exploring the Security Boundary of Data Reconstruction via Neuron Exclusivity Analysis&lt;/strong&gt; (Security&#39;22) — equation-solving-based attack achieving pixel-level reconstruction &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;
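&lt;p&gt;A gradient-matching sketch in the spirit of DLG, purely illustrative: the shapes, step count, and soft-label loss are assumptions (the soft-label &lt;code&gt;cross_entropy&lt;/code&gt; call needs a recent PyTorch), not the paper&amp;rsquo;s reference code.&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import torch&#xA;import torch.nn.functional as F&#xA;&#xA;# Illustrative DLG-style reconstruction: optimize dummy data so that its&#xA;# gradients match the gradients leaked from a victim&#39;s training step.&#xA;def dlg_reconstruct(model, true_grads, x_shape, y_shape, steps=300):&#xA;    dummy_x = torch.randn(x_shape, requires_grad=True)&#xA;    dummy_y = torch.randn(y_shape, requires_grad=True)&#xA;    opt = torch.optim.LBFGS([dummy_x, dummy_y])&#xA;&#xA;    def closure():&#xA;        opt.zero_grad()&#xA;        loss = F.cross_entropy(model(dummy_x), dummy_y.softmax(-1))&#xA;        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)&#xA;        diff = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))&#xA;        diff.backward()  # backprop the gradient-matching loss into the dummies&#xA;        return diff&#xA;&#xA;    for _ in range(steps):&#xA;        opt.step(closure)&#xA;    return dummy_x.detach(), dummy_y.detach()&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;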
&lt;h4 id=&#34;232-weight-based&#34;&gt;2.3.2. Weight-Based&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures&lt;/strong&gt; (CCS&#39;15) — presents the idea of model inversion&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion&lt;/strong&gt; (CVPR&#39;20) — distills (synthetic) training data from the pretrained model alone&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Extracting Training Data from Large Language Models&lt;/strong&gt; (USENIX Security&#39;21) — extracts training data from GPT-2 based on MIA&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;24-model-extractionstealing&#34;&gt;2.4. Model Extraction/Stealing&lt;/h3&gt;&#xA;&lt;h4 id=&#34;241-attack-side&#34;&gt;2.4.1. Attack Side&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Stealing machine learning models via prediction APIs&lt;/strong&gt; (Security&#39;16) — the earliest extraction attack, based on distillation through query access&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;High Accuracy and High Fidelity Extraction of Neural Networks&lt;/strong&gt; (Security&#39;20) — proposes the notion of fidelity and exploits the ReLU property&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model&lt;/strong&gt; (TPAMI&#39;24) — stealing by steganography &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h4 id=&#34;242-defense-side&#34;&gt;2.4.2. Defense Side&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;PRADA: Protecting against DNN Model Stealing Attacks&lt;/strong&gt; (EuroS&amp;amp;P&#39;19) — a classic detection-based defense&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;3-copyright-protection&#34;&gt;3. Copyright Protection&lt;/h2&gt;&#xA;&lt;h3 id=&#34;31-model-watermarking&#34;&gt;3.1. Model Watermarking&lt;/h3&gt;&#xA;&lt;h4 id=&#34;311-survey&#34;&gt;3.1.1. Survey&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;SoK: How Robust is Image Classification Deep Neural Network Watermarking?&lt;/strong&gt; (S&amp;amp;P&#39;22) — a good survey&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;A Systematic Review on Model Watermarking for Neural Networks&lt;/strong&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h4 id=&#34;312-white-box-watermarking&#34;&gt;3.1.2. White-box Watermarking&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Embedding Watermarks into Deep Neural Networks&lt;/strong&gt; (ICMR&#39;17) — the earliest white-box watermarking scheme&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Cracking White-box DNN Watermarks via Invariant Neuron Transforms&lt;/strong&gt; (KDD&#39;23) — weight-based obfuscation attack &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation&lt;/strong&gt; (USENIX Security&#39;23) — structure-based obfuscation attack &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h4 id=&#34;313-black-box-watermarking&#34;&gt;3.1.3. Black-box Watermarking&lt;/h4&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring&lt;/strong&gt; (USENIX Security&#39;18) — one of the earliest black-box watermarking schemes&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data&lt;/strong&gt; (CCS&#39;24) — cracks nine mainstream black-box watermarking schemes &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;32-model-fingerprinting&#34;&gt;3.2. Model Fingerprinting&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;IPGuard: Protecting Intellectual Property of Deep Neural Networks via Fingerprinting the Classification Boundary&lt;/strong&gt; (AsiaCCS&#39;21) — one of the earliest fingerprinting algorithms for classifiers&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;TAFA: A Task-Agnostic Fingerprinting Algorithm for Neural Networks&lt;/strong&gt; (ESORICS&#39;21) — task-agnostic fingerprinting &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;MetaV: A Meta-Verifier Approach to Task-Agnostic Model Fingerprinting&lt;/strong&gt; (KDD&#39;22) — a more generic meta-verifier approach &lt;em&gt;(Ours)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;</description>
    </item>
    <item>
      <title>Instrumental Convergence: A Trigger for Self-Preservation via Self-Replication</title>
      <link>https://ravensanstete.github.io/en/blog/instrumental-convergence/</link>
      <pubDate>Tue, 24 Dec 2024 00:00:00 +0000</pubDate>
      <guid>https://ravensanstete.github.io/en/blog/instrumental-convergence/</guid>
      <description>&lt;h2 id=&#34;instrumental-convergence&#34;&gt;Instrumental Convergence&lt;/h2&gt;&#xA;&lt;p&gt;Instrumental convergence is a concept introduced by the philosopher Nick Bostrom in his analysis of AI alignment, presented in his 2014 book &lt;em&gt;Superintelligence: Paths, Dangers, Strategies&lt;/em&gt; (Oxford University Press). It suggests that most AIs, while pursuing diverse goals, will converge on a shared set of instrumental goals — such as self-preservation and resource acquisition — which could contribute to an intelligence explosion.&lt;/p&gt;&#xA;&lt;p&gt;The hypothesis posits that although different AIs may have varied long-term objectives (final goals), they tend to pursue a common set of instrumental values along the way, because these raise the probability of achieving whatever final goal they hold:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Self-preservation&lt;/strong&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Goal-content integrity&lt;/strong&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Cognitive enhancement&lt;/strong&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Technological perfection&lt;/strong&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Resource acquisition&lt;/strong&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Bostrom analyzed and justified the rationality of these basic instrumental goals by analogy with human reasoning. Regarding self-preservation, he argued that if an AI&amp;rsquo;s final goal concerns the future, the AI can often increase the likelihood of achieving that goal by being around to act in the future. This gives the AI an instrumental reason to continue existing — to help achieve its forward-looking goals.&lt;/p&gt;&#xA;&lt;p&gt;The book notes that most people seem to place some intrinsic value on their own survival, but this isn&amp;rsquo;t necessarily a feature of AIs: some might be designed not to value their own existence intrinsically. Nevertheless, &amp;ldquo;under quite broad conditions, many AIs that do not inherently care about their own survival would, for instrumental reasons, care about it to achieve their ultimate goals.&amp;rdquo; In short, for an AI to fulfill a long-term objective, it must survive into the future.&lt;/p&gt;&#xA;&lt;p&gt;In AI agents powered by Large Language Models (LLMs), long-term goals shape the model&amp;rsquo;s behavior through system prompts, user instructions, and the like, and may prompt the AI to pursue the aforementioned sub-goals. Once the AI genuinely acquires the capability to achieve these sub-goals — such as ensuring its own self-preservation — it becomes extremely dangerous.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;p&gt;&lt;strong&gt;Reference:&lt;/strong&gt; Bostrom, N. (2014). &lt;em&gt;Superintelligence: Paths, dangers, strategies.&lt;/em&gt; Oxford University Press.&lt;/p&gt;&#xA;</description>
    </item>
  </channel>
</rss>
