Common security flaws introduced by AI code assistants and how to detect them automatically – ITFROMZERO


Why do AI code assistants quietly introduce vulnerabilities into your codebase?

My server was once hit by an SSH brute-force attack and I had to deal with it in the middle of the night. Since then I always set up security from day one and review code more carefully. But since I started using AI code assistants, I have noticed that I review less code: not out of laziness, but because the AI generates code so fast and everything looks “about right”.

The problem is not that AI is bad. Models learn from millions of GitHub repos, and much of that is demo code, tutorials, and rushed prototypes that never went through a security review. The AI picks the pattern that is “most common” for the context, not the one that is “safest”, and the two often differ.

Three factors compound into real risk:

Training data contains insecure code: public GitHub repos are full of code written in a hurry for demos, tutorials, and prototypes, with no security review
The AI does not know your threat model: it does not know which endpoints are exposed to the internet, which users are trusted, or which data is sensitive
High generation speed means less review: when the AI fills in an entire 50-line function, developers tend to skim it instead of reading every line as they would with code they wrote themselves

The 5 most common security flaws in AI-generated code

1. Hardcoded credentials and API keys

This is flaw number one, and also the most frequent. A typical scenario: the AI generates sample code with a placeholder AWS key, you test it, it works, and you commit it, forgetting to swap the key out. Worse, the AI may reproduce a pattern from real code in its training data and hardcode a credential directly. GitHub Secret Scanning can detect the key and alert you, but by then it is already in git history: deleting the file is not enough, and the key has to be revoked and rotated.

# AI-generated code: NOT safe
import boto3

s3 = boto3.client(
    's3',
    aws_access_key_id='AKIAIOSFODNN7EXAMPLE',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    region_name='ap-northeast-1'
)

# The right way: read credentials from environment variables
import os

s3 = boto3.client(
    's3',
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
    region_name=os.environ.get('AWS_REGION', 'ap-northeast-1')
)
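One small refinement, sketched under the assumption that you want a clearer failure than the bare KeyError that os.environ[...] raises when a variable is missing: a hypothetical helper require_env that fails fast and names only the variable, never its value.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value or fail with a readable error.

    Hypothetical helper for illustration. An empty string is treated as
    missing, and the error message names the variable but never a value.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Missing environment variable {name}: set it in your shell, "
            "in a .env file listed in .gitignore, or in your secret manager"
        )
    return value
```

You would then write aws_access_key_id=require_env('AWS_ACCESS_KEY_ID'). Also worth knowing: boto3 already looks for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the environment as part of its default credential chain, so boto3.client('s3') with no key arguments works too.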

2. SQL Injection

Ask an AI to “write a function that looks up a user by username” and, more often than not, the first response splices the input straight into the SQL with an f-string. Watch filter, search, and report functions in particular: anywhere that accepts external input is at risk.

# DANGEROUS: SQL injection
def get_user(username):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    cursor.execute(query)
    return cursor.fetchone()

# Safe: parameterized query (%s is the placeholder for drivers such as psycopg2 and PyMySQL; sqlite3 uses ?)
def get_user(username):
    query = "SELECT * FROM users WHERE username = %s"
    cursor.execute(query, (username,))
    return cursor.fetchone()
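To see the difference for yourself, here is a self-contained sketch using the standard-library sqlite3 module and an in-memory database. Note that sqlite3 uses ? as its placeholder rather than %s; the table and users are made up for the demo.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory demo database with two made-up users
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


def find_user_unsafe(conn, username):
    # String interpolation: the input becomes part of the SQL text itself
    query = f"SELECT username, role FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Parameterized: the driver passes the input as data, never as SQL
    query = "SELECT username, role FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # returns every row
    print(find_user_safe(conn, payload))    # returns []
```

With the payload, the unsafe query becomes WHERE username = 'x' OR '1'='1', which is true for every row; the safe version simply looks for a user literally named x' OR '1'='1.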

3. Command Injection

When you ask an AI to write code that “runs a shell command” or “pings an IP address”, it very often reaches for os.system() with string interpolation, which is one of the most dangerous vulnerabilities there is. An attacker only has to pass in 8.8.8.8; rm -rf / and the shell runs the second command with your application's privileges.

# DANGEROUS: command injection
import os

def ping_host(ip):
    os.system(f"ping -c 4 {ip}")

# Safe: pass the arguments as a list, and never use shell=True
import subprocess

def ping_host(ip):
    result = subprocess.run(
        ["ping", "-c", "4", ip],
        capture_output=True,
        text=True,
        timeout=10
    )
    return result.stdout
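Passing a list removes shell interpretation, but a value that starts with - could still be read by ping as an option. A minimal sketch that additionally validates the input with the standard-library ipaddress module before building the command; the helper name build_ping_cmd is hypothetical.

```python
import ipaddress
import subprocess


def build_ping_cmd(ip: str) -> list[str]:
    # ipaddress.ip_address raises ValueError for anything that is not a
    # plain IPv4/IPv6 address, e.g. "8.8.8.8; rm -rf /" or "-f"
    addr = ipaddress.ip_address(ip)
    # Use the normalized address, so raw user text never reaches argv
    return ["ping", "-c", "4", str(addr)]


def ping_host(ip: str) -> str:
    result = subprocess.run(
        build_ping_cmd(ip),
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout
```

This also rejects hostnames. If you need to accept those, an allowlist of known hosts is usually simpler to reason about than trying to sanitize arbitrary strings.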

4. Path traversal when handling file uploads

File upload is a common use case, and AI generates the basic flow fairly well. What often gets missed is sanitizing the path. Passing in ../../etc/passwd is enough to read a file outside the allowed directory, with no further exploit needed.

# DANGEROUS: path traversal
import os

UPLOAD_DIR = "/var/www/uploads"

def get_file(filename):
    path = os.path.join(UPLOAD_DIR, filename)
    with open(path, 'rb') as f:
        return f.read()

# Safe: resolve the real path, then check the prefix
import os

UPLOAD_DIR = "/var/www/uploads"

def get_file(filename):
    real_path = os.path.realpath(os.path.join(UPLOAD_DIR, filename))
    if not real_path.startswith(os.path.realpath(UPLOAD_DIR) + os.sep):
        raise ValueError("Path traversal detected")
    with open(real_path, 'rb') as f:
        return f.read()
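The same check can be written with pathlib; Path.is_relative_to needs Python 3.9 or newer. Because resolve() follows symlinks, a symlink inside the upload directory that points outside it is rejected as well. A sketch under those assumptions, with safe_path as a hypothetical helper name:

```python
import tempfile
from pathlib import Path


def safe_path(base_dir: str, filename: str) -> Path:
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks;
    # an absolute filename such as "/etc/passwd" replaces base entirely
    target = (base / filename).resolve()
    # Reject anything outside the base directory, or the directory itself
    if target == base or not target.is_relative_to(base):
        raise ValueError(f"Path traversal detected: {filename!r}")
    return target


def get_file(base_dir: str, filename: str) -> bytes:
    return safe_path(base_dir, filename).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as upload_dir:
        (Path(upload_dir) / "report.txt").write_bytes(b"hello")
        print(get_file(upload_dir, "report.txt"))
        try:
            get_file(upload_dir, "../../etc/passwd")
        except ValueError as exc:
            print(exc)
```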

5. Weak crypto algorithms or dangerous functions

A consumer GPU can compute billions of MD5 hashes per second, so weak passwords hashed with MD5 fall within seconds. Yet AI still suggests MD5 for passwords because the pattern is everywhere in older code. SHA1 is not much better. pickle and eval() are even more dangerous: unpickling user input means arbitrary code execution, with no further conditions required.

# DANGEROUS: hashing passwords with MD5
import hashlib
hashed = hashlib.md5(password.encode()).hexdigest()

# Safe: bcrypt with a random salt
import bcrypt
hashed = bcrypt.hashpw(password.encode(), bcrypt.gensalt())

# DANGEROUS: pickle with untrusted data (arbitrary code execution!)
import pickle
data = pickle.loads(user_input)

# Safe: JSON (it only parses data and never executes code)
import json
data = json.loads(user_input)
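If you cannot add a dependency like bcrypt, the standard library's hashlib.pbkdf2_hmac is a reasonable stand-in; note that PBKDF2 is a different algorithm from bcrypt, swapped in here. A minimal sketch: the 600,000 iterations for PBKDF2-HMAC-SHA256 reflect OWASP guidance as I understand it, so check the current recommendation, and the salt$hash hex storage format is an arbitrary choice for this example.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: check current OWASP guidance before relying on it


def hash_password(password: str) -> str:
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

If you do use bcrypt, the matching check is bcrypt.checkpw(password.encode(), hashed).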

Setting up automated security scanners

Reviewing by eye is not enough, especially when AI generates code faster than you can read it carefully. You need automated scanners built into your development workflow.

Installing Bandit: a Python security scanner

pip install bandit

# Scan the whole project
bandit -r ./src

# Report only MEDIUM-or-higher severity (-ll) with MEDIUM-or-higher confidence (-ii); use -lll -iii for HIGH only
bandit -r ./src -ll -ii

Installing Semgrep: multi-language scanning

pip install semgrep

# Run the security-audit ruleset
semgrep --config=p/security-audit ./src

# Run the OWASP Top 10 ruleset
semgrep --config=p/owasp-top-ten ./src

Installing TruffleHog: detecting leaked secrets

# Scan the entire Git history for committed secrets
docker run --rm -v "$PWD:/pwd" trufflesecurity/trufflehog:latest \
  git file:///pwd --only-verified

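# Or install the binary directly (TruffleHog v3 is a Go program; the trufflehog package on PyPI is the old v2 and does not support this syntax)
brew install trufflehog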
trufflehog git file://. --only-verified
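To get a feel for what these scanners do at their core, here is a toy sketch that flags strings shaped like long-term AWS access key IDs (AKIA followed by 16 uppercase letters or digits). It is an illustration only: TruffleHog and detect-secrets ship many detectors, entropy checks, and (in TruffleHog's case) live verification, none of which this does. The function names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Long-term AWS access key IDs look like "AKIA" + 16 chars of [A-Z0-9]
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for every candidate key ID."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def main(paths: list[str]) -> int:
    found = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        for lineno, key in find_aws_key_ids(text):
            # Print only a prefix so the report itself does not leak the key
            print(f"{path}:{lineno}: possible AWS key ID {key[:8]}...")
            found = True
    return 1 if found else 0


# Only scan when file paths are given on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Run against the unsafe boto3 example from section 1, it would flag the AKIAIOSFODNN7EXAMPLE line; it would not notice the secret access key at all, which is exactly why the real tools are worth installing.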

Configuring pre-commit hooks: block problems before they are committed

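Instead of waiting for CI/CD to catch problems, set up pre-commit hooks that scan automatically on every commit. If a problem is found, the commit is blocked immediately, so flawed code never reaches the remote (unless someone bypasses the hooks with git commit --no-verify, which is why the CI scan below still matters).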

pip install pre-commit

Create a .pre-commit-config.yaml file at the project root:

repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.5
    hooks:
      - id: bandit
        args: ["-ll", "-ii"]
        files: \.py$

  - repo: https://github.com/returntocorp/semgrep
    rev: v1.45.0
    hooks:
      - id: semgrep
        args: ['--config=p/security-audit', '--error']

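# Create the baseline file that the detect-secrets hook expects
pip install detect-secrets
detect-secrets scan > .secrets.baseline

# Enable the hooks
pre-commit install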

# Do a trial run against every existing file
pre-commit run --all-files

Adding custom Semgrep rules for your project

If your project has its own recurring patterns, write rules that catch exactly those cases:

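# .semgrep/custom-rules.yaml
# Run with: semgrep --config=.semgrep/custom-rules.yaml ./src
rules:
  - id: no-string-format-sql
    # pattern-either matches if ANY pattern matches
    # (under "patterns", every pattern would have to match at once)
    pattern-either:
      - pattern: |
          $CURSOR.execute(f"...{$VAR}...")
      - pattern: |
          $CURSOR.execute("..." + $VAR + "...")
      - pattern: |
          $CURSOR.execute("...".format(...))
    message: "SQL query built with string interpolation: risk of SQL injection"
    languages: [python]
    severity: ERROR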

Checking results and integrating with CI/CD

Running a full security scan manually

# Run all three tools and save the reports
bandit -r ./src -f json -o bandit-report.json
semgrep --config=p/security-audit ./src --json > semgrep-report.json
trufflehog git file://. --only-verified --json > secrets-report.json

# Quick count of Bandit findings
python3 -c "
import json
with open('bandit-report.json') as f:
    d = json.load(f)
totals = d['metrics']['_totals']
print(f'HIGH: {totals[\"SEVERITY.HIGH\"]}, MEDIUM: {totals[\"SEVERITY.MEDIUM\"]}')
"

Integrating with GitHub Actions

# .github/workflows/security-scan.yml
name: Security Scan

on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # TruffleHog needs the full history

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Run Bandit
        run: pip install bandit && bandit -r ./src -ll -ii

      - name: Run Semgrep
        run: pip install semgrep && semgrep --config=p/security-audit ./src --error

      - name: Run TruffleHog
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: main
          head: HEAD
          extra_args: --only-verified
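In this workflow the Bandit step fails whenever Bandit reports anything at the chosen level, because Bandit exits non-zero when it finds issues. If you would rather gate on an explicit severity threshold, for example using the bandit-report.json produced by the manual scan commands, here is a minimal sketch. It assumes each entry in the report's results list carries an issue_severity field, which matches Bandit's JSON output as I understand it, so verify it against your Bandit version. The script name bandit_gate.py is hypothetical.

```python
import json
import sys
from collections import Counter


def severity_counts(report: dict) -> Counter:
    # Each finding in Bandit's JSON report carries an "issue_severity" field
    return Counter(item["issue_severity"] for item in report.get("results", []))


def gate(report: dict, fail_on: str = "HIGH") -> int:
    counts = severity_counts(report)
    print(", ".join(f"{sev}: {counts.get(sev, 0)}" for sev in ("HIGH", "MEDIUM", "LOW")))
    # Non-zero exit code fails the CI step only for the chosen severity
    return 1 if counts.get(fail_on, 0) > 0 else 0


# Usage: python3 bandit_gate.py bandit-report.json
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.exit(gate(json.load(f)))
```

In a workflow you would run bandit with -f json -o bandit-report.json (allowing that step to exit non-zero) and then call this script as the step that decides pass or fail.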

Habits for reviewing AI code properly

Automated tools catch known issues, but not everything. A few extra habits worth adopting:

Ask the AI about security after receiving the code: add a prompt such as “What are the security risks of this code?” The AI can often point out weaknesses when asked directly, but it rarely volunteers them on its own
Pay special attention to places that accept user input: form fields, URL params, file uploads, API payloads. These entry points deserve the closest reading
Check the dependency versions the AI suggests: AI often recommends old versions with known CVEs, so always check them with pip-audit or npm audit
Never commit a secret, not even temporarily: Git history is forever. Use .env and .gitignore from the start, no exceptions

AI code assistants really do speed up development, but without guardrails they also speed up the production of vulnerabilities. Set up the tools above once and let the pipeline run them on every commit. Keeping every commit clean from the start beats handling an incident at 2 a.m.