Common Python Bugs CodeHawk Catches — Async, Django, and Data Science Edition
Python's flexibility is both a blessing and a curse. Yes, you can do almost anything. No, Python won't stop you from doing it wrong.
CodeHawk was built to catch the bugs that Python developers ship accidentally — especially in async code, Django applications, and data processing pipelines where a single mistake can silently corrupt data or crash production.
Here are the five most common Python bugs CodeHawk catches in typical Flask, Django, FastAPI, and async Python codebases.
1. Missing Awaits in Async Functions
The #1 Python async mistake: you await some calls but forget to await others. The forgotten call only builds a coroutine object that never runs, so the work silently never happens.
# Vulnerable: missing await
async def create_user(email):
    # This creates a coroutine object but never runs it
    db.save_user(email)  # Missing await
    return {"user": email}

# CodeHawk flags it: Coroutine object created but not awaited

# Safe: await the operation
async def create_user(email):
    await db.save_user(email)
    return {"user": email}
Because the coroutine never runs, the user is never saved, yet your API returns success. The only trace is a RuntimeWarning ("coroutine ... was never awaited") emitted when the discarded object is garbage-collected, and it never reaches your request handler.
This pattern is so common because Python's syntax doesn't require an await: you can call a coroutine function without awaiting it, and Python happily creates a coroutine object you never use.
CodeHawk catches this: "Coroutine created but not awaited — did you mean to await this?"
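To see the difference at runtime, here is a minimal sketch. The `save_user` coroutine and the `saved` list are hypothetical stand-ins for a real async database client, not part of any library.

```python
import asyncio
import warnings

saved = []  # hypothetical stand-in for a database table


async def save_user(email):
    """Hypothetical async save; records the email only when it actually runs."""
    await asyncio.sleep(0)
    saved.append(email)


async def create_user_buggy(email):
    save_user(email)  # coroutine object is created, then discarded without running
    return {"user": email}


async def create_user_fixed(email):
    await save_user(email)  # runs the save to completion before returning
    return {"user": email}


with warnings.catch_warnings():
    # Silence the "coroutine ... was never awaited" RuntimeWarning for the demo.
    warnings.simplefilter("ignore", RuntimeWarning)
    buggy_result = asyncio.run(create_user_buggy("a@example.com"))
saved_after_buggy = list(saved)

fixed_result = asyncio.run(create_user_fixed("b@example.com"))
saved_after_fixed = list(saved)

print(saved_after_buggy)  # []
print(saved_after_fixed)  # ['b@example.com']
```

Both handlers return the same success payload, but only the awaited version ever writes anything.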
2. SQL Injection via String Formatting
Django ORM prevents SQL injection if you use the ORM correctly. But developers still write raw SQL with string formatting.
# Vulnerable — string formatting in raw SQL
user_id = request.GET.get('user_id')
query = f"SELECT * FROM users WHERE id = {user_id}"
users = db.execute(query)
# CodeHawk flags it: String interpolation in SQL query
# Safe: Use parameterized queries
query = "SELECT * FROM users WHERE id = %s"
users = db.execute(query, [user_id])
The Django documentation warns against this. Everyone knows it is bad. And yet it still shows up in PRs, usually in legacy code where Model.objects.raw() was introduced and later modified to include user input.
CodeHawk checks for this specific pattern: variables being interpolated into query strings.
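A self-contained way to watch the two queries behave differently, sketched with the standard-library sqlite3 module instead of Django. The table and the malicious input are made up for illustration, and sqlite3 uses `?` placeholders where Django's database backends use `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)

malicious_user_id = "1 OR 1=1"  # attacker-controlled query-string value

# Interpolated: the input becomes part of the SQL text and matches every row.
leaked = conn.execute(
    f"SELECT * FROM users WHERE id = {malicious_user_id}"
).fetchall()

# Parameterized: the input is bound as one value and matches nothing.
safe = conn.execute(
    "SELECT * FROM users WHERE id = ?", (malicious_user_id,)
).fetchall()

print(len(leaked))  # 2
print(len(safe))    # 0
```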
3. Unhandled Exceptions in Decorators and Middleware
Python decorators are powerful. They're also a place where exceptions hide silently.
import functools
import logging
import time

# Fine: exceptions raised by func() propagate to the caller
def timing_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)  # No try/except, so errors propagate
        elapsed = time.time() - start
        logging.info(f"Function took {elapsed}s")
        return result
    return wrapper

# Vulnerable: this version swallows exceptions
def timing_decorator_bad(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            start = time.time()
            result = func(*args, **kwargs)
            elapsed = time.time() - start
            logging.info(f"Function took {elapsed}s")
            return result
        except:  # Bare except: swallows everything
            logging.error("Function failed")
            return None  # Returns None instead of raising
    return wrapper

# CodeHawk flags it: Bare except clause that swallows exceptions
Bare except clauses are the worst. They catch everything, including KeyboardInterrupt and SystemExit, and they hide errors. Callers of the decorated function receive None and assume it failed gracefully, when really something catastrophic happened.
CodeHawk flags: "Bare except clause catches all exceptions including system-level ones."
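One possible safe variant, as a sketch rather than the only fix: catch `Exception` (which excludes KeyboardInterrupt and SystemExit), log it, and re-raise. The names `timing_decorator_safe` and `parse_age` are illustrative only.

```python
import functools
import logging
import time


def timing_decorator_safe(func):
    """Log elapsed time and failures without hiding the exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            # Exception does not cover KeyboardInterrupt or SystemExit.
            logging.exception("%s failed", func.__name__)
            raise  # re-raise so the caller still sees the failure
        finally:
            # Runs on both success and failure.
            elapsed = time.perf_counter() - start
            logging.info("%s took %.3fs", func.__name__, elapsed)
    return wrapper


@timing_decorator_safe
def parse_age(value):
    """Illustrative function that fails on bad input."""
    return int(value)
```

With this version, `parse_age("x")` still raises ValueError for the caller to handle, and the failure is logged with its traceback.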
4. Off-by-One Errors in Data Processing
Data pipelines are full of range operations, pagination, and batch processing. Off-by-one errors here don't cause crashes — they cause data loss or duplication.
# Vulnerable: off-by-one in batch processing
def process_batches(items, batch_size=100):
    for i in range(len(items)):
        if i % batch_size == 0 and i > 0:  # Wrong condition
            yield items[i - batch_size:i]  # Only yields once a full batch is behind i

# For 250 items:
# Batch 1: items[0:100]   ← yielded at i == 100
# Batch 2: items[100:200] ← yielded at i == 200
# items[200:250] is never yielded. The final batch is always lost,
# even when len(items) is an exact multiple of batch_size.

# Safe: Use explicit batching
def process_batches(items, batch_size=100):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
Off-by-one errors in data processing are insidious because they don't crash — they silently process the wrong subset of data. A data analysis pipeline processes 99% of rows. A migration script skips the last batch. A report shows 99.9% of your user base.
CodeHawk flags suspicious range operations and batch logic.
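A cheap guard against this class of bug is to compare the number of rows that come out of the batcher with the number that went in. The sketch below runs both versions from above on 250 items; the `_buggy` suffix is added here only to keep the two side by side.

```python
def process_batches_buggy(items, batch_size=100):
    """The flawed version: only yields a batch after passing its end."""
    for i in range(len(items)):
        if i % batch_size == 0 and i > 0:
            yield items[i - batch_size:i]


def process_batches(items, batch_size=100):
    """The fixed version: step through the list one batch at a time."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


items = list(range(250))
buggy_total = sum(len(batch) for batch in process_batches_buggy(items))
fixed_total = sum(len(batch) for batch in process_batches(items))

print(buggy_total, fixed_total)  # 200 250
```

On Python 3.12 and later, `itertools.batched` offers the same chunking from the standard library (it yields tuples rather than list slices).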
5. Mutable Default Arguments
Every Python developer learns "never use mutable default arguments." Then, months later, someone writes it anyway and creates a production bug.
# Vulnerable: mutable default argument
def add_user_to_cache(user, cache=[]):
    cache.append(user)
    return cache

# First call: add_user_to_cache({"id": 1}) → returns [{"id": 1}]
# Second call: add_user_to_cache({"id": 2}) → returns [{"id": 1}, {"id": 2}]
# Why? The default list `[]` is created ONCE, when the function is defined.
# Every call shares the same list.

# CodeHawk flags it: Mutable object as default argument

# Safe: Use None and create a new list inside the function
def add_user_to_cache(user, cache=None):
    if cache is None:
        cache = []
    cache.append(user)
    return cache
This is a gotcha that catches experienced Python developers. The list [] is not created fresh on each call — it's created once when the function is defined, and all calls share the same list. So your cache grows forever and data bleeds across requests.
CodeHawk catches this pattern immediately.
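If you want to confirm the sharing yourself, the default value lives on the function object and can be inspected through `__defaults__`. A small sketch using the vulnerable version:

```python
def add_user_to_cache(user, cache=[]):
    cache.append(user)
    return cache


first = add_user_to_cache({"id": 1})
second = add_user_to_cache({"id": 2})

print(first is second)                 # True: both names point at the one default list
print(add_user_to_cache.__defaults__)  # ([{'id': 1}, {'id': 2}],)
```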
Implementing Automated Review for Python Teams
Step 1: Get Access
Install CodeHawk at github.com/apps/codehawk-crossgen. It's a GitHub App: select the repos in your org and it starts reviewing PRs automatically.
Step 2: Combine with Existing Tools
Python teams usually have:
- Black for formatting
- Pylint or Flake8 for linting
- mypy for type checking
- Bandit for security scanning
CodeHawk complements these by catching semantic issues they miss:
- Async/await bugs (mypy reports an unused coroutine only where it can infer the call's type, so missed awaits in untyped code slip through)
- Injection patterns requiring context (Bandit is good, CodeHawk adds Claude-level analysis)
- Business logic errors (none of the above catch these)
Step 3: Monitor and Tune
After the first week of CodeHawk reviews on your PRs:
- Are the flags accurate?
- Which flag types appear most often?
- Are there bug categories you'd want it to focus on more?
Use what you learn to calibrate how your team handles CodeHawk comments.
Real-World Example: Django View with Multiple Issues
# A typical Django view with several bugs CodeHawk would catch
import json

from asgiref.sync import async_to_sync
from django.db import connection
from django.http import JsonResponse

@async_to_sync
async def fetch_user_data(request):
    # Issue 1: SQL injection via string formatting
    user_id = request.GET.get('user_id')
    query = f"SELECT * FROM users WHERE id = {user_id}"
    cursor = connection.cursor()
    cursor.execute(query)  # CodeHawk flags: SQL injection risk

    # Issue 2: No error handling
    user = cursor.fetchone()
    data = json.loads(user[4])  # Could fail if user is None or invalid JSON

    # Issue 3: Unawaited async operation (`cache` is an app-specific client)
    cache.set_user(user)  # Missing await (if async)

    return JsonResponse({"data": data})
CodeHawk would flag all three issues in the PR review:
- SQL injection in the f-string query passed to cursor.execute(): use a parameterized query
- Potential None dereference at user[4]: check that the user exists first
- Unawaited coroutine at cache.set_user(user): add await if this is async
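For comparison, here is a framework-free sketch of the same logic with all three issues addressed. It is not a drop-in Django view: sqlite3 stands in for Django's connection (and is a blocking call, which is acceptable only for a demo), the `users` table has a made-up `profile` column, `set_user` is a hypothetical async cache write, and the function returns a `(payload, status)` tuple instead of a JsonResponse.

```python
import asyncio
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, profile TEXT)")
conn.execute("INSERT INTO users VALUES (1, ?)", (json.dumps({"plan": "pro"}),))

cached = {}  # hypothetical cache backend


async def set_user(user_id, data):
    """Hypothetical async cache write."""
    await asyncio.sleep(0)
    cached[user_id] = data


async def fetch_user_data(user_id):
    # Fix 1: bind user input as a parameter instead of formatting it in.
    row = conn.execute(
        "SELECT profile FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    # Fix 2: handle a missing row and malformed JSON explicitly.
    if row is None:
        return {"error": "user not found"}, 404
    try:
        data = json.loads(row[0])
    except json.JSONDecodeError:
        return {"error": "corrupt profile"}, 500
    # Fix 3: await the cache write so a failure surfaces in this handler.
    await set_user(user_id, data)
    return {"data": data}, 200


print(asyncio.run(fetch_user_data(1)))           # ({'data': {'plan': 'pro'}}, 200)
print(asyncio.run(fetch_user_data("1 OR 1=1")))  # ({'error': 'user not found'}, 404)
```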
Next Steps
- Install CodeHawk at github.com/apps/codehawk-crossgen
- Open a PR with Python code changes once access is granted
- Review the comments CodeHawk posts
- Evaluate usefulness — if it catches bugs your team cares about, keep it
The best code review tool for your team is the one that catches the specific bugs you actually ship. CodeHawk is tuned to catch the pattern-based bugs that slip past linters and type checkers.
For Python teams, that means async bugs, injection patterns, unhandled exceptions, and data processing mistakes: the stuff that causes 3am production pages without ever making a linter yell.