Common Python Bugs CodeHawk Catches — Async, Django, and Data Science Edition

Python's flexibility is both a blessing and a curse. Yes, you can do almost anything. No, Python won't stop you from doing it wrong.

CodeHawk was built to catch the bugs that Python developers ship accidentally — especially in async code, Django applications, and data processing pipelines where a single mistake can silently corrupt data or crash production.

Here are the five most common Python bugs CodeHawk catches in typical Flask, Django, FastAPI, and async Python codebases.

1. Missing Awaits in Async Functions

The #1 Python async mistake: you await some calls but forget to await others. The un-awaited call builds a coroutine object that never runs, so the work silently never happens.

# Vulnerable — missing await
async def create_user(email):
    # This creates a coroutine object but never runs it, so nothing is saved
    db.save_user(email)  # Missing await
    return {"user": email}

# CodeHawk flags it: Coroutine object created but not awaited
# Safe: await the operation
async def create_user(email):
    await db.save_user(email)
    return {"user": email}

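Because the coroutine is never awaited, db.save_user() never runs at all. Your API returns success while the user is never created. No exception reaches your request handler; the only trace is a RuntimeWarning ("coroutine ... was never awaited") that Python emits when the unused coroutine object is garbage-collected.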

This pattern is so common because Python's syntax doesn't require you to await: calling a coroutine function without await is perfectly legal, and Python happily creates a coroutine object you never use.

CodeHawk catches this: "Coroutine created but not awaited — did you mean to await this?"
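To see the effect concretely, here is a minimal, self-contained sketch. The names save_user, create_user_buggy, create_user_fixed, and the in-memory saved list are illustrative stand-ins for a real database layer, not part of any library.

```python
import asyncio

saved = []  # stands in for a database table (illustrative only)


async def save_user(email):
    await asyncio.sleep(0)  # pretend to do I/O
    saved.append(email)


async def create_user_buggy(email):
    save_user(email)  # coroutine object is created and discarded; the body never runs
    return {"user": email}


async def create_user_fixed(email):
    await save_user(email)  # the body runs to completion before we return
    return {"user": email}


def demo():
    """Run both versions and report what was saved after each one."""
    saved.clear()
    asyncio.run(create_user_buggy("a@example.com"))
    after_buggy = list(saved)
    asyncio.run(create_user_fixed("b@example.com"))
    after_fixed = list(saved)
    return after_buggy, after_fixed
```

Calling demo() shows that nothing is saved by the buggy version, while the fixed version saves its user. Python also prints a RuntimeWarning for the discarded coroutine, which is easy to miss in busy logs.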

2. SQL Injection via String Formatting

The Django ORM prevents SQL injection when you go through its query API. But developers still drop down to raw SQL and build it with string formatting.

# Vulnerable — string formatting in raw SQL
user_id = request.GET.get('user_id')
query = f"SELECT * FROM users WHERE id = {user_id}"
users = db.execute(query)

# CodeHawk flags it: String interpolation in SQL query
# Safe: Use parameterized queries
query = "SELECT * FROM users WHERE id = %s"
users = db.execute(query, [user_id])

The Django ORM documentation warns against this. Everyone knows this is bad. And yet, it still shows up in PRs, usually in legacy code where Manager.raw() (for example, User.objects.raw()) was used and then later modified to include user input.

CodeHawk checks for this specific pattern: variables being interpolated into query strings.
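The difference is easy to reproduce with the standard library's sqlite3 module. This is a sketch under assumptions: the users table, the sample rows, and the helper names are made up for illustration, and sqlite3 uses ? as its placeholder where the Django example above uses %s.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (id, email) VALUES (?, ?)",
        [(1, "a@example.com"), (2, "b@example.com")],
    )
    return conn


def find_user_unsafe(conn, user_id):
    # The input becomes part of the SQL text, so it can change the query's logic
    query = f"SELECT id, email FROM users WHERE id = {user_id}"
    return conn.execute(query).fetchall()


def find_user_safe(conn, user_id):
    # The input is bound as a value and is never parsed as SQL
    query = "SELECT id, email FROM users WHERE id = ?"
    return conn.execute(query, (user_id,)).fetchall()
```

With the input "1 OR 1=1", find_user_unsafe returns every row in the table, while find_user_safe returns nothing because no user has that literal string as an id.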

3. Swallowed Exceptions in Decorators and Middleware

Python decorators are powerful. They're also a place where exceptions hide silently.

# Fine: no try/except here, so exceptions from func() propagate to the caller
import functools
import logging
import time

def timing_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)  # No try/except
        elapsed = time.time() - start
        logging.info(f"Function took {elapsed}s")
        return result
    return wrapper

# If func() raises an exception, it propagates (that's actually fine)
# But if you do this:
def timing_decorator_bad(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            start = time.time()
            result = func(*args, **kwargs)
            elapsed = time.time() - start
            logging.info(f"Function took {elapsed}s")
            return result
        except:  # Bare except — swallows everything
            logging.error("Function failed")
            return None  # Returns None instead of raising
    return wrapper

# CodeHawk flags it: Bare except clause that swallows exceptions

Bare except clauses are the worst. They catch everything, including KeyboardInterrupt and SystemExit, and they hide errors. The caller just sees None come back, as if the function had failed gracefully, when really something catastrophic may have happened.

CodeHawk flags: "Bare except clause catches all exceptions including system-level ones."
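One possible repair, sketched here rather than prescribed, is to catch only Exception, log it with its traceback, and re-raise so the caller still sees the failure. The use of time.perf_counter() and the log message formats are choices made for this example.

```python
import functools
import logging
import time


def timing_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            # Log with traceback, then re-raise so the caller sees the real error.
            # KeyboardInterrupt and SystemExit are not subclasses of Exception,
            # so they pass through untouched.
            logging.exception("%s failed", func.__name__)
            raise
        finally:
            # Runs on both success and failure, so timing is always recorded
            elapsed = time.perf_counter() - start
            logging.info("%s took %.3fs", func.__name__, elapsed)
    return wrapper
```

The finally block keeps the timing log in one place, and the bare raise preserves the original exception type and traceback.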

4. Off-by-One Errors in Data Processing

Data pipelines are full of range operations, pagination, and batch processing. Off-by-one errors here don't cause crashes — they cause data loss or duplication.

# Vulnerable — off-by-one in batch processing
def process_batches(items, batch_size=100):
    for i in range(len(items)):
        if i % batch_size == 0 and i > 0:  # Only fires once i has moved past a full batch
            yield items[i - batch_size:i]  # Never reached for the final batch

# For 300 items (indices 0..299):
# Batch 1: items[0:100]  ← correct
# Batch 2: items[100:200]  ← correct
# Batch 3: items[200:300] is never yielded, because i never reaches 300
# With len(items) = 250, the last 50 items are lost the same way

# Safe: Use explicit batching
def process_batches(items, batch_size=100):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

Off-by-one errors in data processing are insidious because they don't crash — they silently process the wrong subset of data. A data analysis pipeline processes 99% of rows. A migration script skips the last batch. A report shows 99.9% of your user base.

CodeHawk flags suspicious range operations and batch logic.
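A cheap guard against this class of bug is to compare the number of rows that came out of the batcher with the number that went in. The sketch below reuses the two batching functions from this section under new names; batches_buggy, batches_fixed, and rows_covered are illustrative names only.

```python
def batches_buggy(items, batch_size=100):
    """The flawed version: a batch is only emitted once i has moved past it."""
    for i in range(len(items)):
        if i % batch_size == 0 and i > 0:
            yield items[i - batch_size:i]


def batches_fixed(items, batch_size=100):
    """Step through the list one batch at a time; the last batch may be short."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def rows_covered(batches):
    """Total number of rows across all batches."""
    return sum(len(batch) for batch in batches)


items = list(range(250))
print(rows_covered(batches_buggy(items)))  # 200: the trailing 50 rows are dropped
print(rows_covered(batches_fixed(items)))  # 250: every row is processed
```

An assertion such as rows_covered(...) == len(items) in a unit test would catch the dropped batch before it reaches production data.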

5. Mutable Default Arguments

Every Python developer learns "never use mutable default arguments." Then, months later, someone writes it anyway and creates a production bug.

# Vulnerable — mutable default argument
def add_user_to_cache(user, cache=[]):
    cache.append(user)
    return cache

# First call: add_user_to_cache({"id": 1})  → returns [{"id": 1}]
# Second call: add_user_to_cache({"id": 2})  → returns [{"id": 1}, {"id": 2}]
# Why? The default list `[]` is created ONCE when the function is defined
# Every call that omits `cache` shares the same list

# CodeHawk flags it: Mutable object as default argument
# Safe: Use None and create new list in function
def add_user_to_cache(user, cache=None):
    if cache is None:
        cache = []
    cache.append(user)
    return cache

This is a gotcha that catches experienced Python developers. The list [] is not created fresh on each call — it's created once when the function is defined, and all calls share the same list. So your cache grows forever and data bleeds across requests.

CodeHawk catches this pattern immediately.
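If you want to confirm the sharing for yourself, the default value is stored on the function object and can be inspected through its __defaults__ attribute. In this sketch the buggy function is defined inside a helper (shared_default_demo, a name invented here) so that each run starts from a fresh default list.

```python
def shared_default_demo():
    # Deliberately buggy: the default list is created once, when this def runs
    def add_user_to_cache(user, cache=[]):
        cache.append(user)
        return cache

    first = add_user_to_cache({"id": 1})
    second = add_user_to_cache({"id": 2})
    default_list = add_user_to_cache.__defaults__[0]

    # Both calls returned the very same list object, which is the stored default
    return first is second, default_list is first, len(default_list)
```

All three checks point at a single list object that has accumulated both users, which is exactly how data leaks from one request into the next.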

Implementing Automated Review for Python Teams

Step 1: Get Access

Install CodeHawk at github.com/apps/codehawk-crossgen. It's a GitHub App: install it on your org, select your repos, and it starts reviewing PRs automatically.

Step 2: Combine with Existing Tools

Python teams usually already run linters and type checkers.

CodeHawk complements these by catching the semantic issues they miss, such as the five patterns above.

Step 3: Monitor and Tune

After the first week of CodeHawk reviews on your PRs, use what you learn to calibrate how your team handles CodeHawk comments.

Real-World Example: Django View with Multiple Issues

# A typical Django view with several bugs CodeHawk would catch
import json

from asgiref.sync import async_to_sync
from django.http import JsonResponse
from django.db import connection

# `cache` is assumed to be an async cache client defined elsewhere in the project

@async_to_sync
async def fetch_user_data(request):
    # Issue 1: SQL injection via string formatting
    user_id = request.GET.get('user_id')
    query = f"SELECT * FROM users WHERE id = {user_id}"
    cursor = connection.cursor()
    cursor.execute(query)  # CodeHawk flags: SQL injection risk

    # Issue 2: No error handling
    user = cursor.fetchone()
    data = json.loads(user[4])  # Fails if user is None or the column is not valid JSON

    # Issue 3: Fire-and-forget async operation
    cache.set_user(user)  # Missing await (if async)

    return JsonResponse({"data": data})

CodeHawk would flag all three issues in the PR review:

  1. SQL injection in the f-string query (Issue 1): use a parameterized query
  2. Potential None dereference at user[4] (Issue 2): check that a row was returned first
  3. Unawaited coroutine at cache.set_user(user) (Issue 3): add await if this is async
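For reference, here is one way the three fixes could fit together. To keep the sketch runnable without a Django project, it is framework-free: it takes a sqlite3 connection, returns a (status_code, payload) tuple instead of a JsonResponse, and uses a hypothetical InMemoryCache class in place of the project's cache client. The profile column name is also an assumption. In a real Django view the same query would use %s with cursor.execute(query, [user_id]), and the database call would not block the event loop as it does here.

```python
import asyncio
import json
import sqlite3


class InMemoryCache:
    """Hypothetical async cache client, standing in for `cache` in the view."""

    def __init__(self):
        self.users = {}

    async def set_user(self, user_id, data):
        await asyncio.sleep(0)  # pretend to do network I/O
        self.users[user_id] = data


async def fetch_user_data(conn, raw_user_id, cache):
    """Return (status_code, payload) rather than a Django response object."""
    # Fix 1: validate the input and bind it as a query parameter
    try:
        user_id = int(raw_user_id)
    except (TypeError, ValueError):
        return 400, {"error": "user_id must be an integer"}
    row = conn.execute(
        "SELECT profile FROM users WHERE id = ?", (user_id,)
    ).fetchone()

    # Fix 2: handle a missing row and malformed JSON explicitly
    if row is None:
        return 404, {"error": "user not found"}
    try:
        data = json.loads(row[0])
    except (TypeError, ValueError):
        return 500, {"error": "stored profile is not valid JSON"}

    # Fix 3: await the cache write so any failure surfaces in this handler
    await cache.set_user(user_id, data)
    return 200, {"data": data}
```

Each failure mode now produces a distinct, explicit result instead of an unhandled exception or a silently skipped cache write.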

Next Steps

The best code review tool for your team is the one that catches the specific bugs you actually ship. CodeHawk is tuned to catch the pattern-based bugs that slip past linters and type checkers.

For Python teams, that means async bugs, injection patterns, swallowed exceptions, and data processing mistakes: the stuff that causes 3am production pages without ever making a linter complain.