Site updated: 2024-08-19 17:12:13

jiliguluss · Aug 19, 2024 · a5ba5fa
1 parent 7ae8a6d
commit a5ba5fa

Showing 3 changed files with 69 additions and 43 deletions.
diff --git a/2024/AFLplusplus源码分析——覆盖率/index.html b/2024/AFLplusplus源码分析——覆盖率/index.html
@@ -14,7 +14,7 @@
 
 
 
-    <script type="application/ld+json">{"@context":"http://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"一瓢清浅","sameAs":["#about","https://github.com/jiliguluss"],"image":"photo.jpg"},"articleBody":"\n\nThe previous article, AFL++ Synchronization Mechanism, noted that the sync function sync_fuzzers calls save_if_interesting. As the name suggests, save_if_interesting saves corpora that are interesting.","dateCreated":"2024-08-15T17:28:52+08:00","dateModified":"2024-08-16T17:34:39+08:00","datePublished":"2024-08-15T17:28:52+08:00","description":"An analysis of the AFL++ code that measures coverage","headline":"AFL++ Source Code Analysis: Coverage","image":[],"mainEntityOfPage":{"@type":"WebPage","@id":"https://www.stepbystep.asia/2024/AFLplusplus%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90%E2%80%94%E2%80%94%E8%A6%86%E7%9B%96%E7%8E%87/"},"publisher":{"@type":"Organization","name":"一瓢清浅","sameAs":["#about","https://github.com/jiliguluss"],"image":"photo.jpg","logo":{"@type":"ImageObject","url":"photo.jpg"}},"url":"https://www.stepbystep.asia/2024/AFLplusplus%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90%E2%80%94%E2%80%94%E8%A6%86%E7%9B%96%E7%8E%87/","keywords":"AFL++, Fuzz, Security, Tools"}</script>
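+    <script type="application/ld+json">{"@context":"http://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"一瓢清浅","sameAs":["#about","https://github.com/jiliguluss"],"image":"photo.jpg"},"articleBody":"\n\nThe previous article, AFL++ Synchronization Mechanism, noted that the sync function sync_fuzzers calls save_if_interesting. As the name suggests, save_if_interesting saves corpora that are interesting.\nAFL++ decides whether a corpus (a single test case) is interesting based on the code coverage it achieves in the target binary. Before analyzing the source of save_if_interesting, it is worth understanding how AFL++ measures coverage.\n1. Principles\nThe original AFL technical whitepaper, which AFL++ builds on, briefly explains how coverage is computed.\nFirst, instrumentation tracks the path a corpus takes through the binary, and that path is converted into a set of (branch_src, branch_dst) tuples. For example:\ncorpus 1: A -&gt; B -&gt; C -&gt; D -&gt; E  =&gt;  (AB, BC, CD, DE)\ncorpus 2: A -&gt; B -&gt; D -&gt; C -&gt; E  =&gt;  (AB, BD, DC, CE)\n\nSecond, a shared array shared_mem records how many times each (branch_src, branch_dst) tuple (which can be viewed as an edge in the CFG) is hit. The pseudocode is:\ncur_location = &lt;COMPILE_TIME_RANDOM&gt;;\nshared_mem[cur_location ^ prev_location]++;\nprev_location = cur_location &gt;&gt; 1;\nWhen a corpus goes from branch_src to branch_dst, the ID of branch_dst (cur_location) is XORed with the value remembered from branch_src (prev_location, i.e. the ID of branch_src shifted right by one bit). The result is used as an index into shared_mem, and the element at that index is incremented to record one more hit of (branch_src, branch_dst).\nNote the right shift in the last line of the pseudocode. When moving on from branch_dst to the next edge, cur_location is not assigned to prev_location directly; it is first shifted right by one bit and then assigned to prev_location. This has two benefits:\n\nIt distinguishes AB from BA. Without the shift, the index computed by A^B would equal the one computed by B^A, so AB and BA would be treated as the same edge. In reality, CFG edges are directed, and direction is important information.\nIt distinguishes AA from BB. In a tight loop where a basic block jumps back to itself, prev_location would equal cur_location, so cur_location^prev_location would always be 0. Different basic blocks looping on themselves would then be indistinguishable in shared_mem.\n\nThis counting scheme also has limitations, for example:\ncorpus 1: A -&gt; B -&gt; C -&gt; D -&gt; E  =&gt;  (AB, BC, CD, DE)\ncorpus 2: A -&gt; B -&gt; C -&gt; A -&gt; E  =&gt;  (AB, BC, CA, AE)\ncorpus 3: A -&gt; B -&gt; C -&gt; A -&gt; B -&gt; C -&gt; A -&gt; B -&gt; C -&gt; D -&gt; E  =&gt;  (AB, BC, CA, CD, DE)\nCompared with corpus 1, corpus 2 adds the new edge tuples CA and AE, so AFL++ considers that corpus 2 has found a new path. Once corpus 1 and corpus 2 have been processed, corpus 3 adds no new edge tuples: every edge it covers (AB, BC, CA, CD, DE) has already been seen. So even though the actual path of corpus 3 differs markedly from those of corpus 1 and corpus 2, judged by edge tuples alone AFL++ does not consider corpus 3 to have found a new path.\nWhen deciding whether a corpus is interesting, AFL++ considers not only whether it finds a new path (hits a new edge) but also how many times each edge is hit. To simplify comparing hit counts, AFL++ groups them into the following eight buckets (split roughly at powers of two):\n1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+\nAn open question: 3 as a bucket of its own seems out of place. Why isn't 2-3 a single bucket, and why is there no bucket for 0?\nWhen a corpus moves an edge's hit count into a bucket that has not been observed for that edge before, the corpus is also considered interesting.\nTo summarize, AFL++ considers a corpus interesting if and only if it satisfies at least one of the following conditions:\n\nThe corpus hits a new edge.\nThe corpus moves the hit count of some edge into a bucket not previously observed for that edge.\n\n2. Source Code Analysis","dateCreated":"2024-08-15T17:28:52+08:00","dateModified":"2024-08-19T17:10:46+08:00","datePublished":"2024-08-15T17:28:52+08:00","description":"An analysis of the AFL++ code that measures coverage","headline":"AFL++ Source Code Analysis: Coverage","image":[],"mainEntityOfPage":{"@type":"WebPage","@id":"https://www.stepbystep.asia/2024/AFLplusplus%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90%E2%80%94%E2%80%94%E8%A6%86%E7%9B%96%E7%8E%87/"},"publisher":{"@type":"Organization","name":"一瓢清浅","sameAs":["#about","https://github.com/jiliguluss"],"image":"photo.jpg","logo":{"@type":"ImageObject","url":"photo.jpg"}},"url":"https://www.stepbystep.asia/2024/AFLplusplus%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90%E2%80%94%E2%80%94%E8%A6%86%E7%9B%96%E7%8E%87/","keywords":"AFL++, Fuzz, Security, Tools"}</script>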
     <meta name="description" content="An analysis of the AFL++ code that measures coverage">
 <meta property="og:type" content="blog">
 <meta property="og:title" content="AFL++ Source Code Analysis: Coverage">
@@ -23,7 +23,7 @@
 <meta property="og:description" content="An analysis of the AFL++ code that measures coverage">
 <meta property="og:locale" content="zh_CN">
 <meta property="article:published_time" content="2024-08-15T09:28:52.000Z">
-<meta property="article:modified_time" content="2024-08-16T09:34:39.811Z">
+<meta property="article:modified_time" content="2024-08-19T09:10:46.084Z">
 <meta property="article:author" content="一瓢清浅">
 <meta property="article:tag" content="AFL++">
 <meta property="article:tag" content="Fuzz">
@@ -252,6 +252,32 @@ <h1 class="post-title">
             <!--excerpt-->
 
 <p>The previous article, <a href="../AFLplusplus%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90%E2%80%94%E2%80%94%E5%90%8C%E6%AD%A5%E6%9C%BA%E5%88%B6/index.html">AFL++ Synchronization Mechanism</a>, noted that the sync function <code>sync_fuzzers</code> calls <code>save_if_interesting</code>. As the name suggests, <code>save_if_interesting</code> saves corpora that are interesting.</p>
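+<p>AFL++ decides whether a corpus (a single test case) is interesting based on the code coverage it achieves in the target binary. Before analyzing the source of <code>save_if_interesting</code>, it is worth understanding how AFL++ measures coverage.</p>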
+<h2 id="一、原理简介"><a href="# 一、原理简介" class="headerlink" title="一、原理简介"></a>1. Principles</h2><p>The <a target="_blank" rel="external nofollow noopener noreferrer" href="https://lcamtuf.coredump.cx/afl/technical_details.txt">original AFL technical whitepaper</a>, which AFL++ builds on, briefly explains how coverage is computed.</p>
+<p>First, instrumentation tracks the path a corpus takes through the binary, and that path is converted into a set of <code>(branch_src, branch_dst)</code> tuples. For example:</p>
+<figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">corpus 1: A -&gt; B -&gt; C -&gt; D -&gt; E  =&gt;  (AB, BC, CD, DE)</span><br><span class="line">corpus 2: A -&gt; B -&gt; D -&gt; C -&gt; E  =&gt;  (AB, BD, DC, CE)</span><br></pre></td></tr></table></figure>
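The conversion from a path to edge tuples can be sketched in C. This is a toy model, not AFL++ code: basic blocks are assumed to be single letters, a path is a plain string, and the helper name path_to_edges is hypothetical. Repeated tuples are kept as-is.

```c
#include <stddef.h>
#include <string.h>

/* Toy model: each basic block is one letter and a path is a string such as
   "ABCDE". Every pair of consecutive blocks becomes one (branch_src,
   branch_dst) tuple, stored as a two-letter string in edges[].
   Returns the number of tuples written (at most max). */
static size_t path_to_edges(const char *path, char edges[][3], size_t max) {
  size_t n = 0;
  size_t len = strlen(path);
  for (size_t i = 0; i + 1 < len && n < max; i++) {
    edges[n][0] = path[i];      /* branch_src */
    edges[n][1] = path[i + 1];  /* branch_dst */
    edges[n][2] = '\0';
    n++;
  }
  return n;
}
```

For example, calling path_to_edges("ABDCE", e, 8) returns 4 and fills e with AB, BD, DC and CE, matching corpus 2 above.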
+
+<p>Second, a shared array <code>shared_mem</code> records how many times each <code>(branch_src, branch_dst)</code> tuple (which can be viewed as an edge in the CFG) is hit. The pseudocode is:</p>
+<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">cur_location = &lt;COMPILE_TIME_RANDOM&gt;;</span><br><span class="line">shared_mem[cur_location ^ prev_location]++; </span><br><span class="line">prev_location = cur_location &gt;&gt; <span class="number">1</span>;</span><br></pre></td></tr></table></figure>
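+<p>When a corpus goes from <code>branch_src</code> to <code>branch_dst</code>, the ID of <code>branch_dst</code> (<code>cur_location</code>) is XORed with the value remembered from <code>branch_src</code> (<code>prev_location</code>, i.e. the ID of <code>branch_src</code> shifted right by one bit). The result is used as an index into <code>shared_mem</code>, and the element at that index is incremented to record one more hit of <code>(branch_src, branch_dst)</code>.</p>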
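The pseudocode can be turned into a small, compilable C model of what the injected code does at each basic block. It is a sketch under stated assumptions, not AFL++'s actual instrumentation: the 64 KB map size follows the original AFL default, the names on_basic_block and reset_trace are hypothetical, and the caller passes fixed IDs in place of the compile-time random values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 65536u  /* 64 KB map, the original AFL default */

static uint8_t shared_mem[MAP_SIZE];  /* one 8-bit hit counter per slot */
static uint32_t prev_location;         /* shifted ID of the previous block */

/* Hypothetical hook run on entry to every basic block. cur_location stands
   in for the random ID chosen for that block at compile time. */
static void on_basic_block(uint32_t cur_location) {
  assert(cur_location < MAP_SIZE);             /* IDs lie in [0, MAP_SIZE) */
  shared_mem[cur_location ^ prev_location]++;  /* one more hit of this edge */
  prev_location = cur_location >> 1;           /* remember the shifted ID */
}

/* Clears the counters and the previous-block state before a new run. */
static void reset_trace(void) {
  memset(shared_mem, 0, sizeof shared_mem);
  prev_location = 0;
}
```

Because prev_location starts at 0, the first block of a run is recorded at index cur_location itself. In this sketch the 8-bit counters simply wrap around after 255.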
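+<p>Note the right shift in the last line of the pseudocode. When moving on from <code>branch_dst</code> to the next edge, <code>cur_location</code> is not assigned to <code>prev_location</code> directly; it is first shifted right by one bit and then assigned to <code>prev_location</code>. This has two benefits:</p>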
+<ol>
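+<li>It distinguishes AB from BA. Without the shift, the index computed by <code>A^B</code> would equal the one computed by <code>B^A</code>, so AB and BA would be treated as the same edge. In reality, CFG edges are directed, and direction is important information.</li>
+<li>It distinguishes AA from BB. In a tight loop where a basic block jumps back to itself, <code>prev_location</code> would equal <code>cur_location</code>, so <code>cur_location^prev_location</code> would always be 0. Different basic blocks looping on themselves would then be indistinguishable in <code>shared_mem</code>.</li>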
+</ol>
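Both benefits can be checked numerically. The helper below, edge_index, is a hypothetical name; it computes the map index for an edge src -> dst either with the right shift (as in the pseudocode) or without it, assuming 16-bit block IDs.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_SIZE 65536u  /* assumed 64 KB map, so IDs fit in 16 bits */

/* Map index for the edge src -> dst. With shift != 0 this follows the
   pseudocode (prev_location = src >> 1); with shift == 0 it uses src as-is. */
static uint32_t edge_index(uint32_t src, uint32_t dst, int shift) {
  assert(src < MAP_SIZE && dst < MAP_SIZE);
  uint32_t prev = shift ? (src >> 1) : src;
  return dst ^ prev;
}
```

Without the shift, A^B equals B^A and X^X is 0 for every block X. With the shift, the two directions could only collide if A^B equalled (A^B) >> 1, which forces A == B; and a self-loop on X lands at X ^ (X >> 1), the binary-reflected Gray code of X, which differs for every distinct X.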
+<p>This counting scheme also has limitations, for example:</p>
+<figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">corpus 1: A -&gt; B -&gt; C -&gt; D -&gt; E  =&gt;  (AB, BC, CD, DE)</span><br><span class="line">corpus 2: A -&gt; B -&gt; C -&gt; A -&gt; E  =&gt;  (AB, BC, CA, AE)</span><br><span class="line">corpus 3: A -&gt; B -&gt; C -&gt; A -&gt; B -&gt; C -&gt; A -&gt; B -&gt; C -&gt; D -&gt; E  =&gt;  (AB, BC, CA, CD, DE)</span><br></pre></td></tr></table></figure>
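+<p>Compared with corpus 1, corpus 2 adds the new edge tuples CA and AE, so AFL++ considers that corpus 2 has found a new path.<br>Once corpus 1 and corpus 2 have been processed, corpus 3 adds no new edge tuples: every edge it covers (AB, BC, CA, CD, DE) has already been seen. So even though the actual path of corpus 3 differs markedly from those of corpus 1 and corpus 2, judged by edge tuples alone AFL++ does not consider corpus 3 to have found a new path.</p>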
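The limitation can be reproduced with a toy edge set. The sketch below (hypothetical names seen and count_new_edges, single-letter blocks as in the example) records edges across all processed corpora and reports how many new edges each corpus contributes; hit counts are deliberately ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* seen[src][dst] becomes true once the edge src -> dst has been observed
   in any corpus processed so far. Blocks are the letters 'A'..'Z'. */
static bool seen[26][26];

/* Records every edge of the path and returns how many were new. */
static int count_new_edges(const char *path) {
  int fresh = 0;
  size_t len = strlen(path);
  for (size_t i = 0; i + 1 < len; i++) {
    int src = path[i] - 'A';
    int dst = path[i + 1] - 'A';
    assert(src >= 0 && src < 26 && dst >= 0 && dst < 26);
    if (!seen[src][dst]) {
      seen[src][dst] = true;
      fresh++;
    }
  }
  return fresh;
}
```

Processing corpus 1, 2 and 3 in that order yields 4, 2 and 0 new edges respectively: corpus 3 contributes nothing because CA was already supplied by corpus 2.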
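+<p>When deciding whether a corpus is interesting, AFL++ considers not only whether it finds a new path (hits a new edge) but also how many times each edge is hit. To simplify comparing hit counts, AFL++ groups them into the following eight buckets (split roughly at powers of two):</p>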
+<figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+</span><br></pre></td></tr></table></figure>
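The bucketing can be written as a small C function. The one-bit-per-bucket return values (0 for an edge that was not hit, then 1, 2, 4, ..., 128) mirror the count_class_lookup8 table in the original AFL source, which uses a lookup table for speed; the function name hit_bucket is hypothetical.

```c
#include <stdint.h>

/* Maps an 8-bit hit count to a one-bit-per-bucket value:
   0 -> 0 (edge not hit), 1 -> 1, 2 -> 2, 3 -> 4, 4-7 -> 8,
   8-15 -> 16, 16-31 -> 32, 32-127 -> 64, 128-255 -> 128. */
static uint8_t hit_bucket(uint8_t count) {
  if (count <= 2) return count;  /* 0, 1 and 2 map to themselves */
  if (count == 3) return 4;
  if (count <= 7) return 8;
  if (count <= 15) return 16;
  if (count <= 31) return 32;
  if (count <= 127) return 64;
  return 128;
}
```

Encoding each bucket as a distinct bit lets a single byte per edge remember every bucket seen so far with a bitwise OR.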
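+<p>An open question: 3 as a bucket of its own seems out of place. Why isn't 2-3 a single bucket, and why is there no bucket for 0?</p>
+<p>When a corpus moves an edge's hit count into a bucket that has not been observed for that edge before, the corpus is also considered interesting.</p>
+<p>To summarize, <strong>AFL++ considers a corpus interesting if and only if it satisfies at least one of the following conditions</strong>:</p>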
+<ol>
+<li><strong>The corpus hits a new edge</strong>.</li>
+<li><strong>The corpus moves the hit count of some edge into a bucket not previously observed for that edge</strong>.</li>
+</ol>
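The two conditions can be combined into a single pass over the coverage map. This is a simplified sketch, loosely modelled on has_new_bits() in the original AFL (which returns 2 when new tuples are seen and 1 when only hit counts change). Assumptions: the names seen_buckets and is_interesting are hypothetical, the map is 64 KB, and the trace has already been classified into one-bit bucket values (AFL does this in classify_counts() before the comparison). AFL's own virgin map starts with all bits set and clears them; here bits are set instead, which is logically equivalent.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_SIZE 65536u  /* assumed 64 KB coverage map */

/* seen_buckets[i] has one bit set per bucket already observed on edge i;
   0 means no earlier corpus has hit edge i at all. */
static uint8_t seen_buckets[MAP_SIZE];

/* trace[i] holds the bucket (0, 1, 2, 4, ..., 128) of edge i for the corpus
   just executed. Returns 2 if some edge is new, 1 if some known edge reached
   a new bucket, 0 otherwise, and records everything it saw. */
static int is_interesting(const uint8_t *trace) {
  int ret = 0;
  for (uint32_t i = 0; i < MAP_SIZE; i++) {
    uint8_t b = trace[i];
    assert((b & (b - 1)) == 0);            /* at most one bucket bit set */
    if (b == 0 || (seen_buckets[i] & b)) {
      continue;                            /* not hit, or bucket known */
    }
    if (seen_buckets[i] == 0) {
      ret = 2;                             /* condition 1: new edge */
    } else if (ret == 0) {
      ret = 1;                             /* condition 2: new bucket */
    }
    seen_buckets[i] |= b;
  }
  return ret;
}
```

Feeding the same trace twice returns 2 and then 0; changing a known edge from bucket 1 to bucket 4 (a hit count of 3) then returns 1.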
+<h2 id="二、源码分析"><a href="# 二、源码分析" class="headerlink" title="二、源码分析"></a>2. Source Code Analysis</h2>
 
 
 

diff --git a/baidusitemap.xml b/baidusitemap.xml
@@ -2,7 +2,7 @@
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>https://www.stepbystep.asia/2024/AFLplusplus%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90%E2%80%94%E2%80%94%E8%A6%86%E7%9B%96%E7%8E%87/</loc>
-    <lastmod>2024-08-16</lastmod>
+    <lastmod>2024-08-19</lastmod>
   </url>
   <url>
     <loc>https://www.stepbystep.asia/2024/%E7%94%B1subprocess-PIPE%E5%BC%95%E5%8F%91%E7%9A%84%E8%A1%80%E6%A1%88/</loc>