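Claude Code Source Code Analysis (8): SSRF Protection Design for Web Fetching

Overview: This is part 8 of a series analyzing the source code behind 20 Claude Code features. It takes a close look at the security design of the Web tools (WebFetch/WebSearch).

📋 Contents

Problem: Security Risks of Web Fetching
Technical Principles: Core Architecture of SSRF Protection
Design Rationale: Why It Is Built This Way
Solution: The Complete Implementation in Detail
OpenClaw Best Practices
Summary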

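Problem: Security Risks of Web Fetching

Pain Points

Scenario 1: SSRF attack

Malicious user: "Fetch the content of this page for me"
URL: http://169.254.169.254/latest/meta-data/

The AI sends the request without validating it
→ Cloud provider metadata is leaked
→ IAM credentials are obtained
→ The entire cloud environment is compromised

Scenario 2: Internal network probing

Malicious user: "Take a look at this link"
URL: http://192.168.1.1/admin

The AI requests an internal address
→ It reaches an internal admin interface
→ The internal network topology is leaked
→ The assistant becomes a pivot point for further attacks

Scenario 3: Malicious content

User: "Read this web page"
URL: http://evil.com/malware.html

The AI fetches the content and displays it
→ The page contains an XSS payload
→ The script runs wherever the unfiltered content is rendered
→ Credentials are stolen

Core Questions

Designing Web tools for an AI assistant raises the following challenges:

SSRF protection
- How do we prevent access to internal addresses?
- How do we prevent access to cloud metadata services?
Domain validation
- How do we check that a URL is legitimate?
- How do we build a domain whitelist?
Content safety
- How do we handle malicious content?
- How do we prevent XSS attacks?
Privacy protection
- How do we avoid leaking user credentials?
- How do we avoid sending sensitive information?

Claude Code addresses these problems with a multi-layer defense.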

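Technical Principles: Core Architecture of SSRF Protection

What Is SSRF?

SSRF (Server-Side Request Forgery) is an attack in which the attacker tricks a server into sending requests on the attacker's behalf, typically to destinations that only the server can reach.

Attack flow:

1. The attacker supplies a malicious URL
   → http://169.254.169.254/latest/meta-data/

2. The server requests it without validation
   → The request goes to the cloud metadata service

3. Sensitive information is returned
   → IAM credentials, internal IPs, configuration data

4. The attacker uses that information to go further
   → Access to S3, EC2, and other resources

Common targets:

Service	Metadata endpoint	Risk
AWS	169.254.169.254	IAM credential leak
GCP	metadata.google.internal	Service account credentials
Azure	169.254.169.254 (instance metadata), 168.63.129.16 (platform services)	Subscription and instance information
Internal network	192.168.x.x, 10.x.x.x, 172.16.x.x to 172.31.x.x	Exposure of internal services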
Overall Protection Architecture

┌────────────────────────────────────────────────────────────┐
│  User-supplied URL                                         │
│  "Fetch https://example.com"                               │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  Layer 1: URL parsing and validation                       │
│  - Protocol check (http/https only)                        │
│  - Format validation                                       │
│  - Length limit                                            │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  Layer 2: Domain/IP check                                  │
│  - DNS resolution                                          │
│  - IP address check (reject private IPs)                   │
│  - Domain whitelist check                                  │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  Layer 3: Redirect protection                              │
│  - Track the redirect chain                                │
│  - Validate every hop                                      │
│  - Limit the number of redirects                           │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  Layer 4: Sending the request                              │
│  - Custom DNS resolution (against DNS rebinding)           │
│  - Connection timeout                                      │
│  - No sensitive headers sent                               │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  Layer 5: Response handling                                │
│  - Content-type check                                      │
│  - Size limit                                              │
│  - XSS filtering                                           │
│  - Redaction of sensitive data                             │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  Safe content returned                                     │
└────────────────────────────────────────────────────────────┘

URL Validator

import { promises as dns } from 'node:dns';
import { BlockList, isIP } from 'node:net';

interface URLValidationResult {
  valid: boolean;
  url: URL | null;
  ip?: string;
  errors: string[];
  warnings: string[];
}

interface IPCheckResult {
  allowed: boolean;
  reason?: string;
  ip?: string;
}

class URLValidator {
  private allowedProtocols = ['http:', 'https:'];
  private maxUrlLength = 2048;

  // Private and reserved ranges (SSRF protection).
  // net.BlockList matches both IPv4 and IPv6, which a dotted-quad
  // number comparison cannot do.
  private privateRanges = new BlockList();

  // Cloud provider metadata services
  private blockedHosts = [
    '169.254.169.254',      // AWS/Azure/GCP metadata
    'metadata.google.internal',
    '168.63.129.16',        // Azure
    'instance-data',
    'metadata',
  ];

  constructor() {
    // IPv4 private and reserved ranges
    this.privateRanges.addSubnet('10.0.0.0', 8, 'ipv4');
    this.privateRanges.addSubnet('172.16.0.0', 12, 'ipv4');
    this.privateRanges.addSubnet('192.168.0.0', 16, 'ipv4');
    this.privateRanges.addSubnet('127.0.0.0', 8, 'ipv4');    // loopback
    this.privateRanges.addSubnet('169.254.0.0', 16, 'ipv4'); // link-local
    this.privateRanges.addSubnet('0.0.0.0', 8, 'ipv4');

    // IPv6 private ranges
    this.privateRanges.addAddress('::1', 'ipv6');            // loopback
    this.privateRanges.addSubnet('fc00::', 7, 'ipv6');       // unique local
    this.privateRanges.addSubnet('fe80::', 10, 'ipv6');      // link-local
  }

  validate(inputUrl: string): URLValidationResult {
    const result: URLValidationResult = {
      valid: true,
      url: null,
      errors: [],
      warnings: [],
    };

    // 1. Length check
    if (inputUrl.length > this.maxUrlLength) {
      result.errors.push(`URL too long (max ${this.maxUrlLength})`);
      result.valid = false;
    }

    // 2. Parse the URL
    let url: URL;
    try {
      url = new URL(inputUrl);
    } catch {
      result.errors.push('Invalid URL format');
      result.valid = false;
      return result;
    }

    result.url = url;

    // 3. Protocol check
    if (!this.allowedProtocols.includes(url.protocol)) {
      result.errors.push(`Protocol ${url.protocol} not allowed`);
      result.valid = false;
    }

    // 4. Hostname check: exact match or a subdomain of a blocked host.
    // A substring match would also reject unrelated names such as
    // "metadata-docs.example.com".
    const hostname = url.hostname.toLowerCase();
    if (this.blockedHosts.some(host => hostname === host || hostname.endsWith(`.${host}`))) {
      result.errors.push('Blocked host');
      result.valid = false;
    }

    return result;
  }

  // Resolve the host and check every address it maps to
  async checkIpAddress(url: URL): Promise<IPCheckResult> {
    // URL.hostname wraps IPv6 literals in brackets; strip them before resolving
    const hostname = url.hostname.replace(/^\[|\]$/g, '');
    const addresses = await this.resolveDNS(hostname);

    // Every record must pass: a name can publish public and private addresses together
    for (const ip of addresses) {
      if (this.isPrivateIP(ip)) {
        return { allowed: false, reason: 'Private IP address not allowed', ip };
      }
      if (this.isBlockedIP(ip)) {
        return { allowed: false, reason: 'IP address is blocked', ip };
      }
    }

    return { allowed: true, ip: addresses[0] };
  }

  // Public so that SecureHttpClient can reuse it at connection time
  isPrivateIP(ip: string): boolean {
    const family = isIP(ip);  // 4, 6, or 0 when the text is not an IP address
    if (family === 0) {
      return true;  // fail closed on anything unparseable
    }
    return this.privateRanges.check(ip, family === 6 ? 'ipv6' : 'ipv4');
  }

  private isBlockedIP(ip: string): boolean {
    return this.blockedHosts.includes(ip);
  }

  private async resolveDNS(hostname: string): Promise<string[]> {
    // This uses the system resolver. On its own it does not stop DNS rebinding,
    // because the HTTP client resolves the name again when it connects; the
    // connection-time lookup hook in SecureHttpClient covers that gap.
    const records = await dns.lookup(hostname, { all: true });
    return records.map(record => record.address);
  }
}
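The arithmetic behind a private-range check is easy to get subtly wrong in JavaScript, because the `<<` operator works on signed 32-bit integers and turns an address such as 192.168.0.1 into a negative number. The following is a minimal, dependency-free sketch for IPv4 only. The names `ipv4ToInt`, `inCidr`, and `isPrivateIPv4` are illustrative and are not taken from the Claude Code source.

```typescript
// Illustrative helpers (hypothetical names); IPv4 only.

// Convert dotted-quad text to an unsigned 32-bit number, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    // Multiply instead of shifting so the result never goes negative.
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside `cidr`, for example inCidr('10.1.2.3', '10.0.0.0/8').
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split('/');
  const ipNum = ipv4ToInt(ip);
  const baseNum = ipv4ToInt(base ?? '');
  const bits = Number(bitsText);
  if (ipNum === null || baseNum === null || !Number.isInteger(bits) || bits < 0 || bits > 32) {
    return false;
  }
  // A /bits prefix covers blocks of 2^(32 - bits) consecutive addresses.
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ipNum / blockSize) === Math.floor(baseNum / blockSize);
}

const PRIVATE_V4 = [
  '10.0.0.0/8',
  '172.16.0.0/12',
  '192.168.0.0/16',
  '127.0.0.0/8',
  '169.254.0.0/16',
  '0.0.0.0/8',
];

// Malformed input counts as private so that callers fail closed.
function isPrivateIPv4(ip: string): boolean {
  if (ipv4ToInt(ip) === null) return true;
  return PRIVATE_V4.some(cidr => inCidr(ip, cidr));
}
```

Expressing the ranges as CIDR prefixes keeps the list aligned with how the ranges are documented, and treating unparseable input as private means a parsing gap blocks a request instead of letting it through.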

Secure HTTP Client

import dns from 'node:dns';
// The `agent` option used below is a node-fetch feature; Node's built-in fetch ignores it.
import fetch, { Headers, Response } from 'node-fetch';

// Minimal supporting types, assumed here so that the listing is self-contained
class SecurityError extends Error {}

interface FetchOptions {
  headers?: Record<string, string>;
  timeout?: number;
}

class SecureHttpClient {
  private validator: URLValidator;
  private maxRedirects = 5;
  private timeout = 30000;  // 30 seconds
  private maxResponseSize = 10 * 1024 * 1024;  // 10 MB

  constructor() {
    this.validator = new URLValidator();
  }

  async fetch(url: string, options?: FetchOptions): Promise<SecureResponse> {
    // 1. URL validation
    const validation = this.validator.validate(url);
    if (!validation.valid || !validation.url) {
      throw new SecurityError(`URL validation failed: ${validation.errors.join(', ')}`);
    }

    // 2. IP address check
    const ipCheck = await this.validator.checkIpAddress(validation.url);
    if (!ipCheck.allowed) {
      throw new SecurityError(`IP check failed: ${ipCheck.reason}`);
    }

    // 3. Send the request (custom agent re-checks the address when connecting)
    const response = await this.safeRequest(validation.url, options);

    // 4. Validate the response
    await this.validateResponse(response);

    // 5. Process the content
    const content = await this.processContent(response);

    return {
      status: response.status,
      headers: this.sanitizeHeaders(response.headers),
      content,
      url: response.url,
    };
  }

  private async safeRequest(
    url: URL,
    options?: FetchOptions,
    redirectCount = 0
  ): Promise<Response> {
    // Custom agent: refuse connections to private IPs at connection time
    const agent = new SecureAgent({
      lookup: (hostname: string, lookupOptions: any, callback: any) => {
        dns.lookup(hostname, lookupOptions, (err: any, address: any, family: any) => {
          if (err) {
            callback(err);
            return;
          }

          // `address` is an array when the caller asks for all records
          const candidates: string[] = Array.isArray(address)
            ? address.map((entry: { address: string }) => entry.address)
            : [address];

          // Check the resolved addresses
          const blocked = candidates.find(ip => this.validator.isPrivateIP(ip));
          if (blocked) {
            callback(new Error(`Cannot connect to private IP: ${blocked}`));
            return;
          }

          callback(null, address, family);
        });
      },

      // Timeout
      timeout: this.timeout,
    });

    // Drop credential-bearing headers. Assigning `undefined` is not enough:
    // some clients serialize it as the literal string "undefined".
    const headers: Record<string, string> = {};
    for (const [name, value] of Object.entries(options?.headers ?? {})) {
      if (!['cookie', 'authorization'].includes(name.toLowerCase())) {
        headers[name] = value;
      }
    }

    // Send the request
    const response = await fetch(url.toString(), {
      ...options,
      agent,
      redirect: 'manual',  // redirects are handled manually
      headers,
    });

    // Handle redirects, carrying the hop count forward
    if ([301, 302, 303, 307, 308].includes(response.status)) {
      return this.handleRedirect(url, response, redirectCount, options);
    }

    return response;
  }

  private async handleRedirect(
    originalUrl: URL,
    response: Response,
    redirectCount: number,
    options?: FetchOptions
  ): Promise<Response> {
    // Limit the number of redirects
    if (redirectCount >= this.maxRedirects) {
      throw new SecurityError('Too many redirects');
    }

    // Read the redirect target
    const location = response.headers.get('location');
    if (!location) {
      throw new Error('Redirect without location header');
    }

    const redirectUrl = new URL(location, originalUrl);

    // Validate the redirect URL (prevents redirects into the internal network)
    const validation = this.validator.validate(redirectUrl.toString());
    if (!validation.valid) {
      throw new SecurityError(`Redirect blocked: ${validation.errors.join(', ')}`);
    }

    // IP check
    const ipCheck = await this.validator.checkIpAddress(redirectUrl);
    if (!ipCheck.allowed) {
      throw new SecurityError(`Redirect blocked: ${ipCheck.reason}`);
    }

    // Follow the redirect. Passing redirectCount + 1 is what makes the limit
    // effective; restarting from 0 on every hop would never reach it.
    return this.safeRequest(redirectUrl, options, redirectCount + 1);
  }

  private async validateResponse(response: Response): Promise<void> {
    // Check the status code
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${response.statusText}`);
    }

    // Check the content type
    const contentType = response.headers.get('content-type');
    if (contentType && !this.isAllowedContentType(contentType)) {
      throw new SecurityError(`Content type not allowed: ${contentType}`);
    }

    // Check the declared content length
    const contentLength = response.headers.get('content-length');
    if (contentLength && parseInt(contentLength, 10) > this.maxResponseSize) {
      throw new SecurityError('Response too large');
    }
  }

  private isAllowedContentType(contentType: string): boolean {
    const allowedTypes = [
      'text/html',
      'text/plain',
      'application/json',
      'application/xml',
      'text/xml',
      'text/markdown',
    ];

    // Compare the media type only, ignoring parameters such as "; charset=utf-8"
    const mediaType = contentType.split(';')[0].trim().toLowerCase();
    return allowedTypes.includes(mediaType);
  }

  private async processContent(response: Response): Promise<string> {
    // Content-Length is optional and can be wrong, so the real size is checked too.
    // Note that this still buffers the whole body first; a streaming reader that
    // aborts at the cap would bound memory as well.
    const buffer = await response.arrayBuffer();

    if (buffer.byteLength > this.maxResponseSize) {
      throw new SecurityError('Response content too large');
    }

    let content = new TextDecoder().decode(buffer);

    // Truncate overly long content
    if (content.length > 100000) {
      content = content.substring(0, 100000) + '\n... [truncated]';
    }

    // XSS filtering
    content = this.sanitizeContent(content);

    return content;
  }

  // Regex filtering is best-effort only; it does not replace a real HTML sanitizer.
  private sanitizeContent(content: string): string {
    // Remove script tags
    content = content.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '');

    // Remove inline event handlers
    content = content.replace(/\son\w+\s*=\s*["'][^"']*["']/gi, ' [removed]');

    // Remove the javascript: scheme
    content = content.replace(/javascript:/gi, '[removed]:');

    return content;
  }

  private sanitizeHeaders(headers: Headers): Record<string, string> {
    const result: Record<string, string> = {};

    // Keep only headers that are safe to expose
    const allowedHeaders = [
      'content-type',
      'content-length',
      'last-modified',
      'etag',
      'cache-control',
    ];

    for (const [key, value] of headers.entries()) {
      if (allowedHeaders.includes(key.toLowerCase())) {
        result[key] = value;
      }
    }

    return result;
  }
}

interface SecureResponse {
  status: number;
  headers: Record<string, string>;
  content: string;
  url: string;
}
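Removing credentials from outbound requests is worth isolating in a small function, because setting a header to `undefined` does not reliably remove it: depending on the HTTP client, the value may be sent as the literal string "undefined". The sketch below filters by header name, case-insensitively. `stripSensitiveHeaders` and the contents of `SENSITIVE_HEADERS` are assumptions made for illustration, not part of the original code.

```typescript
// Hypothetical helper: drop credential-bearing headers before an outbound request.
const SENSITIVE_HEADERS = new Set(['cookie', 'authorization', 'proxy-authorization']);

function stripSensitiveHeaders(
  headers: Record<string, string | undefined>,
): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    // Skip unset values instead of forwarding them.
    if (value === undefined) continue;
    // Header names are case-insensitive, so compare in lower case.
    if (SENSITIVE_HEADERS.has(name.toLowerCase())) continue;
    safe[name] = value;
  }
  return safe;
}

const outbound = stripSensitiveHeaders({
  Accept: 'text/html',
  COOKIE: 'session=abc',
  authorization: 'Bearer token',
});
console.log(outbound);
```

Building a new object rather than mutating the caller's headers keeps the original options intact for logging or retries.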

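Domain Whitelist

interface WhitelistConfig {
  whitelist?: string | string[];
  blacklist?: string | string[];
}

class DomainWhitelist {
  private whitelist: Set<string> = new Set();
  private blacklist: Set<string> = new Set();
  private patterns: RegExp[] = [];
  private blockedPatterns: RegExp[] = [];

  constructor(config: WhitelistConfig) {
    // Load the default whitelist
    this.loadDefaults();

    // Load user configuration
    if (config.whitelist) {
      this.add(config.whitelist);
    }

    if (config.blacklist) {
      this.block(config.blacklist);
    }
  }

  private loadDefaults(): void {
    // Public sites allowed by default
    const allowedDomains = [
      '*.github.com',
      '*.gitlab.com',
      '*.stackoverflow.com',
      '*.npmjs.com',
      '*.pypi.org',
      '*.wikipedia.org',
      '*.medium.com',
      '*.dev.to',
    ];

    this.add(allowedDomains);
  }

  // "*.example.com" matches example.com and its subdomains. The dot before the
  // base domain is required, so "evilexample.com" does not match.
  private toPattern(domain: string): RegExp {
    const escaped = domain.substring(2).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return new RegExp(`^(?:.+\\.)?${escaped}$`, 'i');
  }

  add(domains: string | string[]): void {
    const domainList = Array.isArray(domains) ? domains : [domains];

    for (const domain of domainList) {
      if (domain.startsWith('*.')) {
        // Wildcard pattern
        this.patterns.push(this.toPattern(domain));
      } else {
        this.whitelist.add(domain.toLowerCase());
      }
    }
  }

  block(domains: string | string[]): void {
    const domainList = Array.isArray(domains) ? domains : [domains];

    for (const domain of domainList) {
      if (domain.startsWith('*.')) {
        // Wildcards in the blacklist need pattern matching too
        this.blockedPatterns.push(this.toPattern(domain));
      } else {
        this.blacklist.add(domain.toLowerCase());
      }
    }
  }

  isAllowed(hostname: string): boolean {
    const host = hostname.toLowerCase();

    // The blacklist takes priority
    if (this.blacklist.has(host) || this.blockedPatterns.some(p => p.test(host))) {
      return false;
    }

    // Exact match
    if (this.whitelist.has(host)) {
      return true;
    }

    // Pattern match
    for (const pattern of this.patterns) {
      if (pattern.test(host)) {
        return true;
      }
    }

    // Deny by default (strict mode)
    return false;
  }
}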
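Wildcard matching is a common place for whitelist bypasses: a pattern that only checks the end of the hostname lets `evilgithub.com` pass as `*.github.com`. A regex-free way to express the rule is to compare on label boundaries, as in the sketch below. `matchesDomain` is an illustrative name, and treating `*.github.com` as also covering the bare `github.com` is an assumption chosen to match how the default list is used here.

```typescript
// Hypothetical helper: match a hostname against an exact or "*." wildcard rule.
function matchesDomain(hostname: string, pattern: string): boolean {
  // Normalize case and drop a trailing dot ("example.com." is the same host).
  const host = hostname.toLowerCase().replace(/\.$/, '');
  const rule = pattern.toLowerCase();

  if (rule.startsWith('*.')) {
    const base = rule.slice(2);
    // Either the base domain itself, or something ending in ".<base>".
    return host === base || host.endsWith('.' + base);
  }
  return host === rule;
}

console.log(matchesDomain('api.github.com', '*.github.com'));  // subdomain
console.log(matchesDomain('evilgithub.com', '*.github.com'));  // look-alike
```

Requiring the leading dot in the suffix comparison is what separates a genuine subdomain from a look-alike registration.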

Content Extractor

// DOMParser is a browser API; under Node.js a DOM implementation has to supply it.
import TurndownService from 'turndown';

class ContentExtractor {
  // Extract the main content from HTML
  extractMainContent(html: string): string {
    // A simple selector-based heuristic in the spirit of readability algorithms
    const doc = new DOMParser().parseFromString(html, 'text/html');

    // Remove elements that are not part of the article
    const selectors = [
      'script',
      'style',
      'nav',
      'footer',
      'header',
      '.advertisement',
      '.sidebar',
    ];

    selectors.forEach(selector => {
      doc.querySelectorAll(selector).forEach(el => el.remove());
    });

    // Pick the body text container
    const article = doc.querySelector('article') ||
                    doc.querySelector('main') ||
                    doc.querySelector('.content') ||
                    doc.body;

    // Convert to Markdown
    const turndown = new TurndownService();
    return turndown.turndown(article.innerHTML);
  }

  // Extract metadata
  extractMetadata(html: string): PageMetadata {
    const doc = new DOMParser().parseFromString(html, 'text/html');

    return {
      title: doc.querySelector('title')?.textContent || '',
      description: doc.querySelector('meta[name="description"]')?.getAttribute('content') || '',
      author: doc.querySelector('meta[name="author"]')?.getAttribute('content') || '',
      publishedDate: doc.querySelector('meta[property="article:published_time"]')?.getAttribute('content') || '',
      image: doc.querySelector('meta[property="og:image"]')?.getAttribute('content') || '',
    };
  }

  // Extract links (href values are returned as written, without resolving relative URLs)
  extractLinks(html: string): Link[] {
    const doc = new DOMParser().parseFromString(html, 'text/html');
    const links: Link[] = [];

    doc.querySelectorAll('a[href]').forEach(a => {
      const href = a.getAttribute('href');
      if (href) {
        links.push({
          text: a.textContent?.trim() || '',
          url: href,
          rel: a.getAttribute('rel') || '',
        });
      }
    });

    return links;
  }
}

interface PageMetadata {
  title: string;
  description: string;
  author: string;
  publishedDate: string;
  image: string;
}

interface Link {
  text: string;
  url: string;
  rel: string;
}
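Links pulled out of a page are attacker-controlled input too: they can be relative, can use schemes such as `javascript:`, and can repeat. One possible post-processing step, sketched below, resolves each `href` against the page URL and keeps only http/https targets. `normalizeLinks` is a hypothetical helper and is not part of the extractor shown above.

```typescript
// Hypothetical helper: resolve hrefs against the page URL and keep only web links.
function normalizeLinks(hrefs: string[], baseUrl: string): string[] {
  const result: string[] = [];
  for (const href of hrefs) {
    let resolved: URL;
    try {
      // The second argument makes relative hrefs such as "/docs" absolute.
      resolved = new URL(href, baseUrl);
    } catch {
      continue; // skip anything that does not parse as a URL
    }
    // Drop javascript:, mailto:, data:, and other non-web schemes.
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') continue;
    // Ignore fragments so "#top" variants of the same page collapse together.
    resolved.hash = '';
    const text = resolved.toString();
    if (!result.includes(text)) result.push(text);
  }
  return result;
}

console.log(normalizeLinks(['/docs', 'javascript:alert(1)', '/docs#intro'], 'https://example.com/a/b'));
```

Any link that is later fetched would still have to go through the same URL and IP validation as a user-supplied URL.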

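Design Rationale: Why It Is Built This Way

Principle 1: Defense in depth

Problem: A single layer of protection is easy to bypass.

Solution: Multiple layers.

Layer 1: URL validation → rejects obviously malicious URLs
   ↓
Layer 2: IP check → rejects private IPs
   ↓
Layer 3: Redirect protection → stops redirect-based attacks
   ↓
Layer 4: Secure connection → custom DNS + Agent
   ↓
Layer 5: Response handling → filters malicious content

If any layer fails → the request is blocked

Design insight:

With defense in depth, an attacker has to get past every layer, not just one.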
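The "any layer fails, the request is blocked" rule can be sketched as a list of checks that run in order and stop at the first failure. This is only an illustration of the control flow: `Check`, `runChecks`, and the two sample checks are hypothetical and much simpler than the layers described in this article.

```typescript
// Hypothetical types: a check returns null when it passes, or a reason when it fails.
type Check = { name: string; run: (url: URL) => string | null };

interface Decision {
  allowed: boolean;
  blockedBy?: string;
  reason?: string;
}

function runChecks(rawUrl: string, checks: Check[]): Decision {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return { allowed: false, blockedBy: 'parse', reason: 'Invalid URL format' };
  }
  // Layers run in order; the first failure ends the request.
  for (const check of checks) {
    const reason = check.run(url);
    if (reason !== null) {
      return { allowed: false, blockedBy: check.name, reason };
    }
  }
  return { allowed: true };
}

const exampleChecks: Check[] = [
  {
    name: 'protocol',
    run: u => (u.protocol === 'http:' || u.protocol === 'https:' ? null : `Protocol ${u.protocol} not allowed`),
  },
  {
    name: 'metadata-host',
    run: u => (u.hostname === '169.254.169.254' ? 'Blocked host' : null),
  },
];

console.log(runChecks('http://169.254.169.254/latest/meta-data/', exampleChecks));
```

Recording which layer blocked a request, as `blockedBy` does here, also gives the audit log something concrete to report.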

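Principle 2: Deny by default

Problem: A whitelist is hard to maintain, but a blacklist is easy to bypass.

Solution: Strict mode denies by default.

// Permissive mode (not recommended)
isAllowed(hostname: string): boolean {
  return !this.blacklist.has(hostname);  // allow by default
}

// Strict mode (recommended)
isAllowed(hostname: string): boolean {
  return this.whitelist.has(hostname);  // deny by default
}

Trade-offs:

Mode	Advantage	Drawback
Permissive	Better user experience	High security risk
Strict	High security	May block legitimate sites

Recommendation: strict mode plus a user-configurable whitelist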

Principle 3: Redirect protection

Problem: Attackers use redirects to get around URL checks.

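User requests: https://example.com
↓ redirects to
http://169.254.169.254/

Solution: Validate on every redirect.

async handleRedirect(url: URL, response: Response, redirectCount: number): Promise<Response> {
  // Limit the number of redirects
  if (redirectCount >= this.maxRedirects) {
    throw new SecurityError('Too many redirects');
  }

  const location = response.headers.get('location');
  if (!location) {
    throw new Error('Redirect without location header');
  }
  const redirectUrl = new URL(location, url);

  // Re-validate the redirect URL
  const validation = this.validator.validate(redirectUrl.toString());
  if (!validation.valid) {
    throw new SecurityError('Redirect blocked');
  }

  // Re-check the IP
  const ipCheck = await this.validator.checkIpAddress(redirectUrl);
  if (!ipCheck.allowed) {
    throw new SecurityError('Redirect to private IP blocked');
  }

  // Follow the redirect with the hop count increased, so the limit is actually reached
  return this.safeRequest(redirectUrl, { redirect: 'manual' }, redirectCount + 1);
}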
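The per-hop decision can be separated from the network call, which makes it easy to test in isolation. The sketch below decides whether a single redirect may be followed and returns the next URL; the caller remains responsible for incrementing the hop counter on each iteration. `nextRedirectTarget` and its parameters are illustrative names, not part of the original implementation.

```typescript
// Hypothetical helper: validate one redirect hop and return the URL to request next.
function nextRedirectTarget(
  current: URL,
  location: string | null,
  hopsSoFar: number,
  maxRedirects: number,
  isAllowed: (candidate: URL) => boolean,
): URL {
  if (hopsSoFar >= maxRedirects) {
    throw new Error('Too many redirects');
  }
  if (!location) {
    throw new Error('Redirect without location header');
  }
  // Location may be relative ("/login"), so resolve it against the current URL.
  const target = new URL(location, current);
  if (target.protocol !== 'http:' && target.protocol !== 'https:') {
    throw new Error(`Redirect to ${target.protocol} blocked`);
  }
  // The same policy that applied to the first URL applies to every hop.
  if (!isAllowed(target)) {
    throw new Error(`Redirect to ${target.hostname} blocked`);
  }
  return target;
}

// Example policy for the demo: refuse the cloud metadata address.
const allowPublic = (u: URL) => u.hostname !== '169.254.169.254';
const next = nextRedirectTarget(new URL('https://example.com/a'), '/b', 0, 5, allowPublic);
console.log(next.toString());
```

In a real client `isAllowed` would wrap the full URL and IP validation, so that a redirect gets no weaker treatment than the original request.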

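Principle 4: DNS rebinding protection

Problem: DNS rebinding attacks. The name resolves to a public address while it is being validated, then to an internal address when the connection is made.

First DNS query: evil.com → 8.8.8.8 (public IP)
   ↓ URL validation passes
Second DNS query: evil.com → 192.168.1.1 (internal IP)
   ↓ connects to the internal network

Solution: Custom DNS resolution plus IP validation at connection time.

import dns from 'node:dns';
import { Agent } from 'node:http';
import { isIP } from 'node:net';

class SecureAgent extends Agent {
  private validator = new URLValidator();

  createConnection(options: any, callback?: any) {
    // IP literals skip DNS entirely, so check them directly
    if (isIP(options.host) && this.validator.isPrivateIP(options.host)) {
      throw new Error(`Cannot connect to private IP: ${options.host}`);
    }
    // For hostnames, options.host is not an IP yet. Wrap the lookup so the check
    // runs on the address the socket is about to connect to.
    const lookup = options.lookup ?? dns.lookup;
    const guarded = (host: string, opts: any, cb: any) =>
      lookup(host, opts, (err: any, address: any, family: any) => {
        const ips: string[] = Array.isArray(address) ? address.map((a: any) => a.address) : [address];
        const bad = err ? undefined : ips.find(ip => this.validator.isPrivateIP(ip));
        if (bad) return cb(new Error(`Cannot connect to private IP: ${bad}`));
        cb(err, address, family);
      });
    return super.createConnection({ ...options, lookup: guarded }, callback);
  }
}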
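Rebinding works because validation and connection each perform their own DNS lookup. The gap closes when the connection uses exactly the address that was validated. The sketch below shows only the selection step: given all resolved records, it rejects the host if any record is private and otherwise returns the address to pin. `ResolvedAddress` and `selectPinnedAddress` are hypothetical names; how the pinned address is then handed to the socket depends on the HTTP client.

```typescript
// Hypothetical shape of one DNS answer.
interface ResolvedAddress {
  address: string;
  family: 4 | 6;
}

// Validate every record, then return the single address the connection must use.
function selectPinnedAddress(
  records: ResolvedAddress[],
  isPrivate: (ip: string) => boolean,
): ResolvedAddress {
  const first = records[0];
  if (first === undefined) {
    throw new Error('Hostname did not resolve');
  }
  // One private record is enough to refuse: the client could pick any of them.
  const offending = records.find(record => isPrivate(record.address));
  if (offending) {
    throw new Error(`Cannot connect to private IP: ${offending.address}`);
  }
  return first;
}

// Deliberately tiny predicate for the demo; a real one covers all private ranges.
const demoIsPrivate = (ip: string) => ip.startsWith('192.168.') || ip === '127.0.0.1';
const pinned = selectPinnedAddress([{ address: '93.184.216.34', family: 4 }], demoIsPrivate);
console.log(pinned.address);
```

Once an address is pinned, the request connects to that IP while still sending the original hostname, so no second lookup can change the destination.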

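Principle 5: Content safety

Problem: Web pages may contain malicious content.

Solution: Content filtering plus format conversion.

processContent(html: string): string {
  // 1. Remove scripts (non-greedy, and [\s\S] so that multi-line scripts match)
  html = html.replace(/<script\b[\s\S]*?<\/script\s*>/gi, '');

  // 2. Remove inline event handlers
  html = html.replace(/\son\w+\s*=/gi, ' [removed]=');

  // 3. Convert to Markdown (strips the remaining HTML)
  const markdown = new TurndownService().turndown(html);

  // 4. Limit the length
  return markdown.substring(0, 100000);
}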
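Pattern-based stripping is hard to make complete, because HTML offers many ways to smuggle script. When fetched text is going to be displayed in an HTML context, a complementary measure is to escape it so that the browser treats it as text. The following `escapeHtml` function is an illustrative sketch of that idea and is not taken from the original implementation.

```typescript
// Illustrative helper: make arbitrary text safe to embed in HTML element content
// and quoted attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(escapeHtml('<img src=x onerror="alert(1)">'));
```

Escaping covers HTML rendering only. It does nothing about instructions hidden in the page text itself, so fetched content should still be treated as untrusted data.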

Solution: The Complete Implementation in Detail

WebFetchTool Implementation

export class WebFetchTool extends Tool {
  name = 'web_fetch';
  description = 'Fetch web page content (with SSRF protection)';

  inputSchema = {
    type: 'object',
    properties: {
      url: {
        type: 'string',
        description: 'URL to fetch',
      },
      extract: {
        type: 'string',
        enum: ['content', 'metadata', 'links', 'all'],
        description: 'What to extract',
        default: 'content',
      },
      timeout: {
        type: 'number',
        description: 'Timeout (seconds)',
        default: 30,
      },
    },
    required: ['url'],
  };

  private httpClient: SecureHttpClient;
  private extractor: ContentExtractor;

  constructor(config: WebFetchConfig) {
    super();
    // SecureHttpClient as listed earlier takes no constructor arguments
    this.httpClient = new SecureHttpClient();
    this.extractor = new ContentExtractor();
  }

  async execute(input: WebFetchInput, context: ToolContext): Promise<ToolResult> {
    // Schema defaults are not applied automatically, so fall back explicitly
    const extract = input.extract ?? 'content';
    const timeoutSeconds = input.timeout ?? 30;

    try {
      // 1. Fetch the page
      const response = await this.httpClient.fetch(input.url, {
        timeout: timeoutSeconds * 1000,
      });

      // 2. Extract content
      let output = '';

      if (extract === 'all' || extract === 'metadata') {
        const metadata = this.extractor.extractMetadata(response.content);
        output += `## Metadata\n\n`;
        output += `- Title: ${metadata.title}\n`;
        output += `- Description: ${metadata.description}\n`;
        output += `- Author: ${metadata.author}\n\n`;
      }

      if (extract === 'all' || extract === 'content') {
        const content = this.extractor.extractMainContent(response.content);
        output += `## Content\n\n${content}\n\n`;
      }

      if (extract === 'all' || extract === 'links') {
        const links = this.extractor.extractLinks(response.content);
        output += `## Links (${links.length} total)\n\n`;
        // Only the first 20 links are listed
        for (const link of links.slice(0, 20)) {
          output += `- [${link.text}](${link.url})\n`;
        }
      }

      return {
        success: true,
        output,
        url: response.url,
        status: response.status,
      };

    } catch (error) {
      // A caught value is not guaranteed to be an Error instance
      const message = error instanceof Error ? error.message : String(error);
      return {
        success: false,
        error: message,
        errorCode: error instanceof SecurityError ? 'security_error' : 'fetch_error',
      };
    }
  }
}

WebSearchTool Implementation

export class WebSearchTool extends Tool {
  name = 'web_search';
  description = 'Web search (Brave Search API)';

  inputSchema = {
    type: 'object',
    properties: {
      query: {
        type: 'string',
        description: 'Search keywords',
      },
      count: {
        type: 'number',
        description: 'Number of results to return',
        default: 10,
      },
      freshness: {
        type: 'string',
        enum: ['day', 'week', 'month', 'year'],
        description: 'Time range',
      },
    },
    required: ['query'],
  };

  private apiKey: string;

  constructor(config: WebSearchConfig) {
    super();
    this.apiKey = config.braveApiKey;
  }

  async execute(input: WebSearchInput, context: ToolContext): Promise<ToolResult> {
    try {
      // Schema defaults are not applied automatically, so fall back explicitly
      const count = input.count ?? 10;

      // Call the Brave Search API.
      // Note: input.freshness is accepted by the schema but not forwarded here.
      const response = await fetch(
        `https://api.search.brave.com/res/v1/web/search?q=${encodeURIComponent(input.query)}&count=${count}`,
        {
          headers: {
            'Accept': 'application/json',
            'X-Subscription-Token': this.apiKey,
          },
        }
      );

      // Surface HTTP errors instead of trying to parse an error body as results
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }

      const data = await response.json();

      // Format the results
      let output = `Search results: ${input.query}\n\n`;

      for (const result of data.web.results) {
        output += `### ${result.title}\n\n`;
        output += `${result.description}\n\n`;
        output += `Link: ${result.url}\n\n`;
      }

      return {
        success: true,
        output,
        totalResults: data.web.total,
      };

    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return {
        success: false,
        error: message,
      };
    }
  }
}
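Building the query string by hand makes it easy to forward an unset value (for example `count=undefined`) or an out-of-range number. A small builder based on the standard `URL` and `URLSearchParams` classes avoids both, as sketched below. `buildSearchUrl` is a hypothetical helper, and the `maxCount` default of 20 is an assumed cap used for illustration that should be checked against the search provider's documentation.

```typescript
// Hypothetical helper: build a search URL with an encoded query and a clamped count.
function buildSearchUrl(endpoint: string, query: string, count = 10, maxCount = 20): string {
  const url = new URL(endpoint);
  // Fall back to 1 for NaN or 0, then clamp into [1, maxCount].
  const safeCount = Math.min(Math.max(Math.trunc(count) || 1, 1), maxCount);
  // searchParams handles percent-encoding, so no manual encodeURIComponent is needed.
  url.searchParams.set('q', query);
  url.searchParams.set('count', String(safeCount));
  return url.toString();
}

console.log(buildSearchUrl('https://api.search.brave.com/res/v1/web/search', 'AI Agent', 5));
```

Keeping the endpoint as a parameter also makes the function usable against a test server.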

Security Configuration

# ~/.openclaw/config/web-tools.yaml

# SSRF protection settings
ssrf_protection:
  enabled: true
  
  # Private IP check
  block_private_ips: true
  
  # Cloud metadata services
  blocked_hosts:
    - 169.254.169.254
    - metadata.google.internal
    - 168.63.129.16
    - instance-data
  
  # Redirect protection
  max_redirects: 5
  
  # Timeout
  timeout: 30000  # 30 seconds, in milliseconds

# Domain whitelist
whitelist:
  enabled: false  # set to true to enable strict mode
  
  allowed_domains:
    - '*.github.com'
    - '*.gitlab.com'
    - '*.stackoverflow.com'
    - '*.npmjs.com'
    - '*.pypi.org'
  
  blocked_domains:
    - '*.evil.com'
    - '*.malware.com'

# Content handling
content:
  # Maximum response size
  max_response_size: 10485760  # 10 MB
  
  # Maximum content length (characters)
  max_content_length: 100000
  
  # XSS filtering
  xss_filter: true
  
  # Convert to Markdown
  convert_to_markdown: true

# API settings
apis:
  brave_search:
    api_key: ${BRAVE_SEARCH_API_KEY}
    rate_limit: 1  # requests per second
OpenClaw Best Practices

Practice 1: Fetching web pages safely

# Fetch page content
openclaw run web_fetch --url "https://example.com/article"

# Extract metadata
openclaw run web_fetch --url "https://example.com" --extract metadata

# Extract links
openclaw run web_fetch --url "https://example.com" --extract links

# Extract everything
openclaw run web_fetch --url "https://example.com" --extract all

Practice 2: Web search

# Search
openclaw run web_search --query "OpenClaw AI Agent"

# Restrict the time range
openclaw run web_search \
  --query "AI Agent" \
  --freshness week \
  --count 5

Practice 3: Configuring the whitelist

# ~/.openclaw/config/web-whitelist.yaml

# Strict mode
whitelist:
  enabled: true
  
  # Domains allowed for the team
  allowed_domains:
    - '*.company.com'
    - '*.team.com'
    - 'github.com'
    - 'stackoverflow.com'
  
  # Domains explicitly denied
  blocked_domains:
    - '*.competitor.com'

Practice 4: Error handling

# Try to fetch an internal address (will be blocked)
openclaw run web_fetch --url "http://192.168.1.1/admin"

# Output:
❌ Security error: IP check failed: Private IP address not allowed

# Try to fetch cloud metadata (will be blocked)
openclaw run web_fetch --url "http://169.254.169.254/latest/meta-data/"

# Output:
❌ Security error: Blocked host

Practice 5: Audit logs

# View the Web request history
openclaw logs web_fetch --tail 50

# Output:
[2026-04-03 20:30:00] web_fetch https://github.com → 200 OK
[2026-04-03 20:30:15] web_fetch http://192.168.1.1 → BLOCKED (Private IP)
[2026-04-03 20:30:30] web_search "AI Agent" → 200 OK (10 results)

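Summary

Key Points

Defense in depth - five layers of protection
Deny by default - strict-mode whitelist
Redirect protection - every redirect is validated
DNS rebinding protection - custom DNS resolution
Content safety - XSS filtering plus Markdown conversion

Design Insight

The core of secure Web tool design is "trust no input".

The design of Claude Code's Web tools shows that:

SSRF protection is the lifeline of a Web tool
Multiple layers of defense are more reliable than a single one
Deny-by-default is safer than a blacklist
Content filtering protects users from malicious content

SSRF Protection Checklist

URL protocol validation (http/https only)
Private IP address check
Cloud metadata service blacklist
Redirect protection
Custom DNS resolution
Response content filtering
Timeout and size limits
Audit logging

Next Steps

Enable SSRF protection
Configure the domain whitelist
Add audit logging
Update the blacklist regularly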

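Articles in this series:

[1] The Art of Safe Bash Command Execution (published)
[2] The Design of Diff-Based Editing (published)
[3] How File Search Works Under the Hood (published)
[4] Architecture of Multi-Agent Collaboration (published)
[5] The Design Philosophy of the Skill System (published)
[6] A Complete Guide to MCP Protocol Integration (published)
[7] A Complete Approach to Background Task Management (published)
[8] SSRF Protection Design for Web Fetching (this article)
[9] Designing a Multi-Layer Permission Decision Engine (upcoming)
…

Previous: Claude Code Source Code Analysis (7): A Complete Approach to Background Task Management

About the author: John is an OpenClaw platform developer who focuses on the architecture and implementation of AI assistants.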