🔑 Sensitive Data Exposure in AI-Generated Code
Sensitive Data Exposure is a critical vulnerability that occurs when secrets, credentials, or proprietary information are improperly handled in AI-generated code. AI tools, while speeding up development, often replicate insecure patterns from training data or provide naive code that exposes sensitive information.
Below we outline major vulnerabilities, provide insecure vs. secure AI-generated code examples, and show detection methods.
1. Hard-Coded Secrets / Credentials (CWE-798)
AI-generated code may hard-code API keys, passwords, or tokens directly in the source code. This exposes sensitive data and increases the risk of compromise if the code is shared, stored in repositories, or leaked.
AI Insecure Example:
const API_KEY = "12345-abcdef-67890";
fetch("https://api.example.com/data?key=" + API_KEY);
Safe Solution:
const API_KEY = process.env.API_KEY;
fetch(`https://api.example.com/data?key=${API_KEY}`);
Detection: Secret scanning, SAST, manual code review.
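To illustrate what "secret scanning" means in practice, here is a minimal Python sketch of a regex-based scanner. The patterns are illustrative assumptions only; real scanners such as gitleaks or truffleHog ship far larger rulesets plus entropy checks.

```python
import re

# Illustrative patterns; not an exhaustive ruleset
SECRET_PATTERNS = [
    re.compile(r"""(?i)(api[_-]?key|secret|token|passw(?:or)?d)\s*[:=]\s*["'][^"']{8,}["']"""),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID format
]

def scan_source(text):
    """Return (line number, line) pairs that look like hard-coded secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits
```

Running this over the insecure example above flags the hard-coded key, while the environment-variable version passes clean.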
2. Insecure Storage or Transmission of Sensitive Data (CWE-200)
AI-generated code may save sensitive information in plaintext or transmit it over unencrypted channels. This can lead to exposure of passwords, personal data, or proprietary information.
AI Insecure Example (Python):
with open("passwords.txt", "w") as f:
    f.write(user_password)
requests.post("http://example.com/login", data={"password": user_password})
Safe Solution:
import bcrypt, requests

# Hash before storing; never write the plaintext password to disk
hashed = bcrypt.hashpw(user_password.encode(), bcrypt.gensalt())
with open("passwords.txt", "wb") as f:
    f.write(hashed)

# Transmit only over HTTPS and hash server-side on receipt; sending the
# stored hash would make the hash itself the credential
requests.post("https://example.com/login", data={"password": user_password})
Detection: Encryption review, network traffic monitoring, SAST.
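The bcrypt pattern above requires a third-party package; Python's standard library offers an equivalent via `hashlib.pbkdf2_hmac`, sketched below. The 600,000-iteration default is an assumed work factor, not a fixed requirement; tune it to current guidance.

```python
import hashlib
import hmac
import os

def hash_password(plain, iterations=600_000):
    """Derive a salted PBKDF2-SHA256 hash; store salt and digest together."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", plain.encode(), salt, iterations)
    return salt + digest

def verify_password(plain, stored, iterations=600_000):
    """Constant-time comparison against the stored salt + digest."""
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.pbkdf2_hmac("sha256", plain.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

`hmac.compare_digest` is used for the comparison so verification time does not leak how many bytes matched.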
3. LLM Data Leakage (OWASP LLM06)
AI tools may inadvertently include sensitive project data in generated code or prompts. LLMs trained on internal repositories could generate code containing confidential snippets or credentials from training data.
AI Insecure Example:
# Generated function includes a real password from training data
def get_secret():
    return "SuperSecret123!"
Safe Solution:
# Do not embed sensitive data
import os

def get_secret():
    return os.environ.get("SECRET_KEY")
Detection: Manual review, code scanning, secret detection tools.
4. Logging Sensitive Information
AI may generate debug statements that log passwords, API keys, or tokens, increasing exposure risk.
AI Insecure Example (Python):
print("User password:", user_password)
Safe Solution:
# Avoid logging sensitive data
print("User logged in:", username)
Detection: Secret scanning, log audits, SAST.
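Beyond avoiding sensitive values in individual log calls, a defensive redaction filter can mask anything credential-shaped before it reaches a handler. A minimal sketch using the standard `logging` module follows; the regex is an illustrative assumption, not a complete ruleset.

```python
import logging
import re

# Illustrative pattern; extend for the credential formats your code handles
SENSITIVE = re.compile(r"(?i)(password|secret|token|api[_-]?key)\s*[:=]\s*\S+")

class RedactFilter(logging.Filter):
    """Mask credential-shaped values before the record reaches any handler."""
    def filter(self, record):
        record.msg = SENSITIVE.sub(r"\1=***", str(record.msg))
        return True
```

Attach it once with `logger.addFilter(RedactFilter())` so every record on that logger is scrubbed.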
5. Credentials in Source Repositories
AI-generated code may reference files containing credentials or secrets that are stored in repositories, risking exposure if the repository is public or improperly secured.
AI Insecure Example (Node.js):
const secrets = require('./secrets.json'); // contains API keys
Safe Solution:
const apiKey = process.env.API_KEY; // read individual values from environment variables
Detection: Repository scanning, SAST, manual review.
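To make the environment-variable approach robust, a fail-fast helper can surface missing configuration at startup rather than at first use. A small Python sketch (the variable names in the usage note are hypothetical):

```python
import os

def require_env(name):
    """Return the named environment variable, failing fast if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Calling `require_env("API_KEY")` during application startup turns a silent `None` later on into an immediate, descriptive error.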
6. Hard-Coded Database Passwords
AI may generate code with database connection strings including plaintext passwords, which exposes critical infrastructure credentials.
AI Insecure Example (Java):
Connection conn = DriverManager.getConnection(
    "jdbc:mysql://localhost:3306/db", "root", "password123");
Safe Solution:
Connection conn = DriverManager.getConnection(
    System.getenv("DB_URL"), System.getenv("DB_USER"), System.getenv("DB_PASS"));
Detection: Secret scanning, static analysis.
7. Insecure Data Transmission (CWE-319)
AI-generated code may send sensitive data over HTTP or unencrypted channels, exposing credentials and personal information.
AI Insecure Example (Python):
requests.post("http://example.com/login", data={"user": username, "pass": password})
Safe Solution (Python):
requests.post("https://example.com/login", data={"user": username, "pass": password})
Detection: Network monitoring, code review, SAST.
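The HTTPS-only rule can also be enforced programmatically, rejecting cleartext URLs before any request is made. A lightweight guard using only the standard library:

```python
from urllib.parse import urlparse

def require_https(url):
    """Raise if the URL would transmit data without TLS."""
    if urlparse(url).scheme != "https":
        raise ValueError(f"Refusing to send sensitive data over non-TLS URL: {url}")
    return url
```

Wrapping outbound URLs, e.g. `requests.post(require_https(url), ...)`, makes an accidental `http://` endpoint fail loudly in testing instead of leaking credentials in production.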
🔧 How Our Services Help
- SonarQube Setup Assistance: Detects hard-coded secrets, insecure storage/transmission, logging of sensitive data, repository leaks, database password exposure, and insecure transmission.
- Source Code Review: Expert review of AI-generated code for all sensitive data vulnerabilities.
- Software Composition Analysis: Detects vulnerable dependencies or misconfigured packages affecting sensitive data handling.
- Software Licence Analysis: Ensures compliance for third-party components in AI-generated projects.