Formatting Guide

Code formatting standards using the Ruff formatter for consistent, readable code.

Note: This project uses Ruff's formatter, a drop-in replacement for Black that produces nearly identical output and runs significantly faster.

Quick Start

# Format all code
uv run ruff format src/ scripts/ tests/

# Check formatting without changes
uv run ruff format --check src/ scripts/ tests/

# Format specific files
uv run ruff format src/speech_nlp/config/settings.py

Ruff Formatter

The formatter runs through the ruff format subcommand, which accepts any mix of files and directories.

Basic Usage

# Format source code
uv run ruff format src/

# Format scripts
uv run ruff format scripts/

# Format tests
uv run ruff format tests/

# Format everything
uv run ruff format src/ scripts/ tests/

# Check without modifying
uv run ruff format --check src/ scripts/ tests/

# Show what would change
uv run ruff format --diff src/

Configuration

Settings in pyproject.toml (the [tool.ruff] values apply to both the formatter and the linter):

[tool.ruff]
line-length = 100
indent-width = 4
target-version = "py311"

Why Ruff Formatter?

Ruff's formatter is designed to be a drop-in replacement for Black:

Black-compatible output (Ruff reports over 99.9% identical lines on Black-formatted projects)
Substantially faster than Black (more than 30x in Ruff's published benchmarks)
Single tool for both linting and formatting
Actively maintained as part of the Ruff ecosystem

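Formatting Rules

Line Length

Maximum 100 characters (line-length in pyproject.toml). The formatter splits longer lines wherever the syntax allows:

# Good
result = some_function(
    first_argument,
    second_argument,
    third_argument,
    fourth_argument,
    fifth_argument,
)

# Bad (104 characters, over the limit)
result = some_function(first_argument, second_argument, third_argument, fourth_argument, fifth_argument)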
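As a rough illustration of the limit, the sketch below lists lines longer than 100 characters. It is hypothetical helper code, not part of Ruff, and it counts characters with len(), whereas Ruff uses display width for lines containing East Asian characters or emojis, so results can differ for those lines.

```python
MAX_LINE_LENGTH = 100  # mirrors line-length in pyproject.toml


def find_long_lines(source: str, limit: int = MAX_LINE_LENGTH) -> list[tuple[int, int]]:
    """Return (line_number, length) pairs for lines longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]
```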

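String Quotes

Use double quotes (the formatter's default quote style, applied automatically):

# Good
name = "John Doe"
message = "Hello, world!"

# Bad (the formatter converts these to double quotes)
name = 'John Doe'
message = 'Hello, world!'

Exception: use single quotes when that avoids escaping. The formatter makes this choice itself, picking the quote style that needs fewer escapes:

# Good
message = 'He said "Hello"'

# Unnecessary escapes (the formatter rewrites this to the single-quoted form)
message = "He said \"Hello\""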
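The quote rule can be modelled in a few lines. This is a simplified sketch of the behaviour described above, not Ruff's implementation: it prefers double quotes and switches to single quotes only when that needs fewer escapes, and it ignores string prefixes, triple-quoted strings, and control characters such as newlines.

```python
def preferred_literal(value: str) -> str:
    """Render value as a string literal, preferring double quotes.

    Simplified model: switch to single quotes only when that needs fewer
    escapes. Prefixes, triple quotes, and control characters are not handled.
    """
    body = value.replace("\\", "\\\\")  # backslashes need escaping in either style
    if value.count("'") < value.count('"'):
        return "'" + body.replace("'", "\\'") + "'"
    return '"' + body.replace('"', '\\"') + '"'
```

For example, preferred_literal('He said "Hello"') yields the single-quoted form, while preferred_literal("John Doe") stays double-quoted.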

Indentation

4 spaces (no tabs):

# Good
def example():
    if True:
        print("Indented with 4 spaces")

# Bad (2-space indentation)
def example():
  if True:
    print("Indented with 2 spaces")

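Trailing Commas

Use trailing commas in multi-line structures. The formatter treats an existing trailing comma as a "magic trailing comma" and keeps the structure expanded one item per line; it also adds the comma itself whenever it has to split a structure:

# Good
items = [
    "apple",
    "banana",
    "cherry",
]

# Also acceptable
items = ["apple", "banana", "cherry"]

# Bad (multi-line without trailing comma; the formatter joins this back onto one line)
items = [
    "apple",
    "banana",
    "cherry"
]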
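Trailing commas inside brackets never change a value, so the choice is purely about layout. The one trap is a trailing comma outside brackets, which builds a tuple. A small illustration (the variable names are made up):

```python
# Inside brackets, a trailing comma is purely cosmetic: both lists are equal.
fruits_expanded = [
    "apple",
    "banana",
    "cherry",
]
fruits_inline = ["apple", "banana", "cherry"]
assert fruits_expanded == fruits_inline

# Outside brackets, a trailing comma after a single value builds a
# one-element tuple, which is easy to miss in review.
timeout = 30,
assert timeout == (30,)
assert isinstance(timeout, tuple)
```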

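Blank Lines

2 blank lines between top-level definitions (enforced by the formatter)
1 blank line between methods in a class (enforced by the formatter)
1 blank line between logical sections in functions (a convention; the formatter keeps single blank lines you add)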

# Good
import os
import sys


class MyClass:
    """Example class."""

    def __init__(self):
        """Initialize."""
        self.value = 0

    def method_one(self):
        """First method."""
        pass

    def method_two(self):
        """Second method."""
        pass


def standalone_function():
    """Standalone function."""
    pass
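The top-level spacing rule can be checked with the standard-library ast module. This is an illustrative sketch, not how Ruff works internally: it counts the blank lines directly above each top-level function or class (treating decorators as part of the definition), and a comment line right above a definition stops the count.

```python
import ast


def blank_lines_before_definitions(source: str) -> dict[str, int]:
    """Map each top-level function or class name to the blank lines directly above it."""
    lines = source.splitlines()
    counts: dict[str, int] = {}
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # Decorators sit above the def/class line, so start from the earliest one
        first_line = min([node.lineno, *(d.lineno for d in node.decorator_list)])
        index = first_line - 2  # 0-based index of the line just above the definition
        blank = 0
        while index >= 0 and not lines[index].strip():
            blank += 1
            index -= 1
        counts[node.name] = blank
    return counts
```

In a Ruff-formatted file, expect 2 for definitions that are not at the top of the file and have no comment directly above them.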

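Line Breaks

Break lines at logical boundaries. The formatter has the final say: a statement that fits within 100 characters is usually kept on (or joined onto) one line, except for calls and collections that end in a trailing comma. These layouts therefore apply to longer statements:

# Good - one argument per line
result = my_function(
    first_group_arg1,
    first_group_arg2,
    second_group_arg1,
    second_group_arg2,
)

# Good - chain methods on separate lines (pandas)
result = (
    df[condition]
    .groupby("category")
    .agg({"value": "sum"})
    .reset_index()
)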

Formatting Specific Constructs

Function Definitions

from typing import Any


# Short - single line
def simple_function(x: int, y: int) -> int:
    return x + y


# Long - break at parameters
def complex_function(
    first_parameter: str,
    second_parameter: int,
    third_parameter: bool | None = None,
) -> dict[str, Any]:
    """Complex function with many parameters."""
    pass

Function Calls

# Short - single line
result = calculate(10, 20, 30)

# Long - one argument per line
result = calculate_complex_value(
    base_value=100,
    multiplier=1.5,
    adjustment_factor=0.95,
    include_tax=True,
)

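List/Dict Comprehensions

# Short - single line (the formatter removes the spaces around ** for simple operands)
squares = [x**2 for x in range(10)]

# Long - split into element, for clause, and if clause when over 100 characters
normalized_scores = [
    normalize_score(record.raw_score, baseline_value)
    for record in panel_records
    if record.is_valid()
]

# Dictionary comprehension (same rule)
mapping = {
    panelist_id: transform_viewing_record(record)
    for panelist_id, record in viewing_records.items()
    if is_valid_panelist(panelist_id)
}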

Imports

# Good - grouped and sorted
import os
import sys
from pathlib import Path

import numpy as np
import pandas as pd
from pydantic import BaseModel

from speech_nlp.config.schema import Settings
from speech_nlp.utils.calendar import get_days_in_month

# Bad - mixed order
from speech_nlp.config.schema import Settings
import pandas as pd
import os
from pydantic import BaseModel

The formatter does not sort imports; use Ruff's isort-compatible I rules to group and sort them:

uv run ruff check --select I --fix src/

String Formatting

Prefer f-strings for readability:

# Good - f-strings
message = f"Processing {year}-{month:02d}"
path = f"data/output/{year}-{month}/results.csv"

# Acceptable - format()
message = "Processing {}-{:02d}".format(year, month)

# Avoid - % formatting
message = "Processing %d-%02d" % (year, month)
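All three styles produce identical strings; only readability differs. A quick check with example values (year and month here are made up). If the pyupgrade rules are enabled, Ruff can flag the older styles (UP031 for % formatting, UP032 for str.format calls that could be f-strings).

```python
year, month = 2024, 3  # example values

f_string = f"Processing {year}-{month:02d}"
format_method = "Processing {}-{:02d}".format(year, month)
percent_style = "Processing %d-%02d" % (year, month)

# All three render the same text; the f-string is simply the easiest to read
assert f_string == format_method == percent_style == "Processing 2024-03"
```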

IDE Integration

VS Code

Install the Ruff extension (charliermarsh.ruff) and configure:

.vscode/settings.json:

{
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.fixAll.ruff": "explicit",
      "source.organizeImports.ruff": "explicit"
    }
  },
  // Ruff reads line-length (100) from pyproject.toml; keep the ruler in sync
  "editor.rulers": [100]
}

PyCharm

Install Ruff plugin from marketplace
Enable in Settings → Tools → Ruff
Configure format on save:
Settings → Tools → Actions on Save
Enable "Reformat code"

Neovim/Vim

Use conform.nvim (null-ls is archived; its community fork is none-ls.nvim):

require("conform").setup({
  formatters_by_ft = {
    python = { "ruff_format" },
  },
  format_on_save = {
    timeout_ms = 500,
    lsp_format = "fallback",
  },
})

Pre-commit Hooks

Automatically format code before commits.

Setup

Add pre-commit as a development dependency (so uv sync does not remove it), then install the Git hook:

uv add --dev pre-commit
uv run pre-commit install

Configuration in .pre-commit-config.yaml:

repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
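    rev: v0.1.8  # keep in sync with the Ruff version the project uses
    hooks:
      # Lint first: --fix can change code that the formatter then tidies
      - id: ruff
        args: [--fix]

      # Then format
      - id: ruff-format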

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files

Usage

# Run manually on all files
uv run pre-commit run --all-files

# Run on staged files (happens automatically on commit)
uv run pre-commit run

# Update hook versions
uv run pre-commit autoupdate

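CI/CD Integration

Format checking in GitHub Actions (assumes an earlier step has installed uv, for example with the astral-sh/setup-uv action):

- name: Run ruff format check
  run: |
    uv run ruff format --check src/ scripts/ tests/

The step exits with a non-zero status if any file would be reformatted, so the workflow fails and unformatted code is caught before merging.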

Common Formatting Patterns

Long Function Signatures

from pathlib import Path

import pandas as pd


# Break at each parameter
def process_television_data(
    year: int,
    month: int,
    input_directory: Path,
    output_directory: Path,
    weight_file: Path | None = None,
    verbose: bool = False,
) -> pd.DataFrame:
    """Process television viewing data."""
    pass

Method Chaining

# Each method on new line (Polars renamed groupby to group_by in version 0.19)
import polars as pl

result = (
    df.filter(pl.col("age") > 18)
    .group_by("region")
    .agg(
        pl.col("duration").sum().alias("total_duration"),
        pl.col("pnr").n_unique().alias("unique_panelists"),
    )
    .sort("total_duration", descending=True)
)

Complex Conditionals

# Break at logical operators (the formatter uses this layout once the condition exceeds 100 characters)
if (
    year >= 2020
    and month in range(1, 13)
    and has_required_files
    and not is_legacy_mode
    and output_directory.exists()
):
    process_data()

# Or use variables for clarity
is_valid_period = year >= 2020 and month in range(1, 13)
has_requirements = has_required_files and not is_legacy_mode and output_directory.exists()

if is_valid_period and has_requirements:
    process_data()
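When the same conditions are checked in several places, the named-variable approach extends naturally to a small predicate function. A sketch with hypothetical names modelled on the example above:

```python
def should_process(
    year: int,
    month: int,
    has_required_files: bool,
    is_legacy_mode: bool,
) -> bool:
    """Return True when the period is valid and the requirements are met."""
    # Named intermediate results keep each line short and self-documenting
    is_valid_period = year >= 2020 and month in range(1, 13)
    has_requirements = has_required_files and not is_legacy_mode
    return is_valid_period and has_requirements
```

The trailing comma after the last parameter keeps the signature expanded even though it would fit on one line.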

Type Hints

# Long type hints: built-in generics and X | Y unions (supported by the py311 target)
from typing import Any, TypeAlias


def complex_function(
    data: dict[str, list[int]],
    options: dict[str, str | int] | None = None,
) -> list[dict[str, Any]]:
    """Complex function with detailed types."""
    pass


# Use TypeAlias for complex types
PersonData: TypeAlias = dict[str, str | int | list[str]]


def process_person(data: PersonData) -> PersonData:
    """Process person data."""
    pass

Troubleshooting

Formatter Not Running

# Verify Ruff installed
uv run ruff --version

# Re-sync dependencies
uv sync

# Check for syntax errors first
uv run ruff check src/

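Conflicts with Linter

The formatter and linter are designed to work together, but a few lint rules (such as COM812, missing trailing comma) conflict with the formatter, and Ruff prints a warning when they are enabled together. Disable any rules it warns about, then run the tools in this order: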

# Run in correct order: lint then format
uv run ruff check --fix src/
uv run ruff format src/

IDE Not Formatting

VS Code:

Verify Ruff extension installed
Check default formatter: Ctrl+Shift+P → "Format Document With..."
Ensure .vscode/settings.json configured

PyCharm:

Reinstall Ruff plugin
Clear caches: File → Invalidate Caches
Verify enabled in Settings

Pre-commit Hook Fails

# Update hooks
uv run pre-commit autoupdate

# Clear cache
uv run pre-commit clean

# Re-install
uv run pre-commit uninstall
uv run pre-commit install

Checking Formatting

Before committing:

# Check if formatting needed
uv run ruff format --check src/ scripts/ tests/

# See what would change
uv run ruff format --diff src/ scripts/ tests/

# Apply formatting
uv run ruff format src/ scripts/ tests/

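Exit codes for ruff format --check:

0 - All files already formatted
1 - At least one file would be reformatted
2 - Error (invalid configuration, invalid CLI options, or an internal error)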
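Scripts that wrap the check can branch on these codes. A minimal sketch, assuming ruff is on PATH (as it is inside uv run); the helper names are illustrative:

```python
import shutil
import subprocess

# Meanings of the ruff format --check exit codes listed above
EXIT_CODE_MEANINGS = {
    0: "all files already formatted",
    1: "at least one file would be reformatted",
    2: "Ruff error (invalid configuration, CLI options, or internal error)",
}


def describe_exit_code(code: int) -> str:
    """Translate a ruff format --check exit code into a short message."""
    return EXIT_CODE_MEANINGS.get(code, f"unexpected exit code {code}")


def check_formatting(paths: list[str]) -> int:
    """Run ruff format --check on the given paths and return its exit code."""
    if shutil.which("ruff") is None:
        raise FileNotFoundError("ruff not found on PATH; try running via 'uv run'")
    completed = subprocess.run(["ruff", "format", "--check", *paths], check=False)
    return completed.returncode
```

For example, code = check_formatting(["src/", "scripts/", "tests/"]) followed by print(describe_exit_code(code)).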

Best Practices

Format early and often - Don't accumulate formatting issues
Use auto-format on save - IDE integration makes it effortless
Run pre-commit hooks - Catch issues before committing
Don't fight the formatter - Accept opinionated defaults
Lint before formatting - Run ruff check --fix first, since fixes can leave code for the formatter to tidy
Keep configuration minimal - Use defaults when possible

Related guides:

Linting Guide - Code quality checks
Testing Guide - Test practices and coverage
Code Style - General style guidelines
CI/CD Local Testing - Running CI checks locally