CLI Guide
Complete guide to using the AYNE command-line interface.
Overview
The AYNE CLI provides a modern, user-friendly interface for all data collection, validation, and database management tasks. Built with Typer, it offers:
- ✅ Type-safe commands with automatic validation
- ✅ Beautiful output with Rich formatting
- ✅ Dry-run mode for safe previews
- ✅ Comprehensive help with --help on any command
- ✅ Progress indicators for long-running tasks
Installation
After installing the package, the ayne command becomes available:
Command Structure
All commands follow this pattern:
ayne <group> <command> [options]
Command groups:
- db - Database management
- tmdb - TMDB data collection
- omdb - OMDB data enrichment
- collect - Combined workflows
- validate - Data validation
Quick Start
# Initialize database
ayne db init
# Test database connectivity
ayne db test
# Discover movies from TMDB
ayne tmdb update --max-movies 1000
# Enrich with OMDB data
ayne omdb enrich --max-movies 500
# Run daily refresh
ayne collect daily
# Validate data quality
ayne validate all
Database Commands
Initialize Database
Create database schema and tables:
# Standard initialization
ayne db init
# Force re-initialization (drops existing tables)
ayne db init --force
# Preview what would be created
ayne db init --dry-run
Test Database
Run connectivity and CRUD tests:
# Basic test
ayne db test
# Verbose output
ayne db test --verbose
Show Statistics
Display database statistics:
ayne db stats
TMDB Commands
The TMDB commands are organized into three distinct operations:
- update - Discover new movies (basic info only, fast)
- enrich - Fetch detailed data for discovered movies (complete info, slower)
- refresh - Update existing movies that need data refresh (age-based)
⚠️ Important: TMDB API Limits
- 500-page maximum: TMDB's discover endpoint has a hard limit of 500 pages (~10,000 movies max per query)
- Workaround: Use --min-year and --max-year to query different time ranges
- Two-stage process:
  - Discovery (tmdb update): Fast bulk collection (20 movies/request, basic info)
  - Enrichment (tmdb enrich): Individual detail fetches (1 movie/request, complete data)
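The figures above translate into simple request arithmetic. The following is a minimal sketch of that arithmetic, not part of the AYNE code base; the function names are hypothetical, and it assumes exactly one request per discovery page and one request per enriched movie, as the list above states.

```python
import math

MOVIES_PER_PAGE = 20       # from this guide: 20 movies per discovery request
MAX_PAGES_PER_QUERY = 500  # from this guide: TMDB discover page cap


def max_movies_per_query() -> int:
    """Upper bound on movies reachable by a single discover query."""
    return MOVIES_PER_PAGE * MAX_PAGES_PER_QUERY


def discovery_requests(movie_count: int) -> int:
    """Requests needed to discover movie_count movies (one page per request)."""
    return math.ceil(movie_count / MOVIES_PER_PAGE)


def enrichment_requests(movie_count: int) -> int:
    """Requests needed to enrich movie_count movies (one movie per request)."""
    return movie_count


if __name__ == "__main__":
    print(max_movies_per_query())     # 10000
    print(discovery_requests(5000))   # 250
    print(enrichment_requests(5000))  # 5000
```

Under these assumptions, discovering 5,000 movies costs 250 requests while enriching the same set costs 5,000, which is why enrichment is the slower stage.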
Discover Movies
Discover and store new movies from TMDB (basic info only):
# Full discovery (limited to ~10,000 movies by TMDB API)
ayne tmdb update --full
# Strategy: Use year ranges to work around 500-page limit
ayne tmdb update --min-year 2020 --max-year 2024 --max-movies 5000
ayne tmdb update --min-year 2015 --max-year 2019 --max-movies 5000
ayne tmdb update --min-year 2010 --max-year 2014 --max-movies 5000
# Limited discovery with filters
ayne tmdb update --max-movies 1000 --min-popularity 20 --min-votes 100
# Preview without changes
ayne tmdb update --max-movies 1000 --dry-run
Options:
- --full - Unlimited discovery (automatic year-range splitting handles 500-page limit)
- --max-movies N - Limit to N movies
- --min-popularity N - Minimum popularity score (default: 10.0)
- --min-votes N - Minimum vote count (default: 50)
- --min-year YYYY - Minimum release year (default: 1950)
- --max-year YYYY - Maximum release year (default: current year)
- --max-pages N - Maximum API pages per year range
- --dry-run - Preview without making changes
Note: The system now automatically handles TMDB's 500-page limit by recursively splitting year ranges. You can safely specify wide ranges without manual adjustment.
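Recursive year-range splitting can be pictured roughly as follows. This is an illustrative sketch only and may differ from how AYNE actually implements it: split_year_range and the count_pages callable are hypothetical names, and the sketch assumes the page count for a range can be queried before fetching it.

```python
from typing import Callable, List, Tuple

MAX_PAGES = 500  # TMDB discover page cap described in this guide


def split_year_range(
    min_year: int,
    max_year: int,
    count_pages: Callable[[int, int], int],
) -> List[Tuple[int, int]]:
    """Return year ranges that each fit under MAX_PAGES where possible.

    count_pages is a hypothetical callable reporting how many result pages
    a (min_year, max_year) query would produce.
    """
    pages = count_pages(min_year, max_year)
    # A single year cannot be split further by year, so it is accepted as-is.
    if pages <= MAX_PAGES or min_year == max_year:
        return [(min_year, max_year)]
    mid = (min_year + max_year) // 2
    return (
        split_year_range(min_year, mid, count_pages)
        + split_year_range(mid + 1, max_year, count_pages)
    )


if __name__ == "__main__":
    # Made-up page counts for illustration: 200 pages per year.
    def fake_pages(lo: int, hi: int) -> int:
        return 200 * (hi - lo + 1)

    print(split_year_range(2010, 2019, fake_pages))
```

A single year that still exceeds the cap is returned unchanged in this sketch; handling that case would need a different filter, such as popularity or vote count.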
Enrich Movies with Details
Fetch detailed TMDB data for movies that only have basic info:
# Enrich all discovered movies (up to limit)
ayne tmdb enrich --limit 500
# Enrich recent movies only
ayne tmdb enrich --min-year 2023 --limit 100
# Enrich specific year range
ayne tmdb enrich --min-year 2020 --max-year 2024 --limit 1000
# Preview what would be enriched
ayne tmdb enrich --limit 500 --dry-run
Options:
- --limit N - Maximum number of movies to enrich (default: 100)
- --min-year YYYY - Only enrich movies from this year onwards
- --max-year YYYY - Only enrich movies up to this year
- --dry-run - Preview without making changes
What it does:
Fetches complete movie information including:
- Budget and revenue
- Genres and production companies
- Cast and crew (via TMDB)
- IMDb ID (required for OMDB enrichment)
- Ratings and vote counts
- Production countries and languages
Refresh TMDB Data
Update existing movies whose data is due for a refresh:
What it does: Refreshes only movies that already have TMDB details and whose age-based refresh interval has elapsed. Movies that are not yet due are left untouched.
# Refresh recent releases (recommended for daily updates)
ayne tmdb refresh --min-year 2024 --limit 50
# Refresh any movies needing update
ayne tmdb refresh --limit 100
# Refresh specific year range
ayne tmdb refresh --min-year 2020 --max-year 2024 --limit 500
# Preview refresh candidates
ayne tmdb refresh --limit 100 --dry-run
Options:
- --limit N - Maximum number of movies to refresh (default: 100)
- --min-year YYYY - Only refresh movies from this year onwards
- --max-year YYYY - Only refresh movies up to this year
- --dry-run - Preview without making changes
When to use:
- Daily updates: --min-year 2024 --limit 50 to keep recent releases current
- Weekly updates: --min-year 2023 --limit 200 for last 1-2 years
- Monthly updates: --limit 1000 for broader updates
Refresh intervals (automatic, based on movie age):
- Recent (0-60 days): Every 5 days
- Established (60-180 days): Every 15 days
- Mature (180-365 days): Every 30 days
- Archived (>365 days): Every 90-180 days
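The interval table above can be expressed as a small lookup. This is a sketch under stated assumptions rather than AYNE's actual logic: the function names are hypothetical, age is assumed to be measured from the release date, boundary ages (60, 180, 365) are assigned to the younger band, and the archived band uses the lower bound of its 90-180 day range.

```python
from datetime import date


def refresh_interval_days(age_days: int) -> int:
    """Map a movie's age in days to a refresh interval in days."""
    if age_days <= 60:
        return 5    # Recent
    if age_days <= 180:
        return 15   # Established
    if age_days <= 365:
        return 30   # Mature
    return 90       # Archived (assumption: lower bound of the 90-180 range)


def is_due(release_date: date, last_refreshed: date, today: date) -> bool:
    """True when the time since the last refresh has reached the interval."""
    age_days = (today - release_date).days
    days_since_refresh = (today - last_refreshed).days
    return days_since_refresh >= refresh_interval_days(age_days)
```

For example, a movie released 15 days ago and last refreshed 6 days ago falls in the Recent band (5-day interval) and would be due.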
OMDB Commands
The OMDB commands mirror the TMDB structure:
- enrich - Add OMDB data to movies that don't have it yet
- refresh - Update existing OMDB data
Enrich Movies with OMDB Data
Add OMDB data (IMDb ratings, box office, awards, etc.) to movies:
# Standard enrichment
ayne omdb enrich --max-movies 1000
# Enrich recent releases only
ayne omdb enrich --min-year 2023 --max-movies 500
# Enrich specific year range
ayne omdb enrich --min-year 2020 --max-year 2024 --max-movies 1000
# Preview what would be enriched
ayne omdb enrich --max-movies 500 --dry-run
Options:
- --max-movies N - Maximum number of movies to enrich (uses config default)
- --min-year YYYY - Only enrich movies from this year onwards
- --max-year YYYY - Only enrich movies up to this year
- --dry-run - Preview without making changes
Requirements:
- Movies must have IMDb IDs (obtained from TMDB enrichment)
- Run ayne tmdb enrich first if movies don't have IMDb IDs
Note: OMDB has daily API limits. Use --max-movies to control quota usage.
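To plan around a daily quota, it can help to work out how many days a backlog will take. The sketch below is a planning aid only: daily_batches is a hypothetical helper, the quota value is whatever your OMDB plan allows (the 1000 used in the usage note is a placeholder, not a documented limit), and it assumes one OMDB request per movie.

```python
from typing import List


def daily_batches(pending_movies: int, daily_quota: int) -> List[int]:
    """Split a backlog into per-day batch sizes that respect daily_quota."""
    if daily_quota <= 0:
        raise ValueError("daily_quota must be positive")
    batches: List[int] = []
    remaining = pending_movies
    while remaining > 0:
        batch = min(remaining, daily_quota)
        batches.append(batch)
        remaining -= batch
    return batches
```

With a backlog of 2,500 movies and a placeholder quota of 1,000, daily_batches(2500, 1000) yields three runs; each value could be passed to --max-movies on successive days.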
Refresh OMDB Data
Update existing OMDB data for movies:
# Refresh recent movies
ayne omdb refresh --min-year 2024 --limit 50
# Refresh any movies with OMDB data
ayne omdb refresh --limit 100
# Refresh specific year range
ayne omdb refresh --min-year 2020 --max-year 2024 --limit 200
# Preview refresh candidates
ayne omdb refresh --limit 100 --dry-run
Options:
- --limit N - Maximum number of movies to refresh (default: 100)
- --min-year YYYY - Only refresh movies from this year onwards
- --max-year YYYY - Only refresh movies up to this year
- --dry-run - Preview without making changes
Recommended Workflows
Initial Data Population
Build your movie database from scratch:
# Step 1: Discover movies in chunks (works around 500-page limit)
ayne tmdb update --min-year 2020 --max-year 2024 --max-movies 5000
ayne tmdb update --min-year 2015 --max-year 2019 --max-movies 5000
ayne tmdb update --min-year 2010 --max-year 2014 --max-movies 5000
# Step 2: Enrich all discovered movies with details
ayne tmdb enrich --limit 5000
# Step 3: Add OMDB data (mind the API limits)
ayne omdb enrich --max-movies 1000
# Check progress
ayne db stats
Daily Updates (Recommended)
Keep recent movie data current with daily automation:
# Option 1: Manual daily run
ayne tmdb refresh --min-year 2024 --limit 50
ayne omdb refresh --min-year 2024 --limit 30
# Option 2: Using collect command
ayne collect daily --tmdb-refresh 50 --omdb-limit 30
# Schedule via cron (Linux/Mac) - runs at 2 AM daily
0 2 * * * cd /path/to/project && uv run ayne tmdb refresh --min-year 2024 --limit 50
# Schedule via Task Scheduler (Windows) - runs at 2 AM daily
# Program: pwsh.exe
# Arguments: -Command "cd D:\path\to\project; uv run ayne tmdb refresh --min-year 2024 --limit 50"
Weekly Updates
Broader update scope for weekly maintenance:
# Update last 1-2 years of movies
ayne tmdb refresh --min-year 2023 --limit 200
ayne omdb refresh --min-year 2023 --limit 150
# Or specific ranges
ayne tmdb refresh --min-year 2020 --max-year 2024 --limit 500
Monthly Updates
Comprehensive refresh for monthly maintenance:
# Update all movies needing refresh (no year filter)
ayne tmdb refresh --limit 1000
ayne omdb refresh --limit 500
# Or with collect command
ayne collect full --refresh-limit 1000 --max-omdb 500
Working Around TMDB 500-Page Limit
Strategy to collect large datasets:
# Collect by decade
ayne tmdb update --min-year 2020 --max-year 2029 --full
ayne tmdb update --min-year 2010 --max-year 2019 --full
ayne tmdb update --min-year 2000 --max-year 2009 --full
# Or by 5-year periods
ayne tmdb update --min-year 2020 --max-year 2024 --max-movies 5000
ayne tmdb update --min-year 2015 --max-year 2019 --max-movies 5000
# Then enrich all discoveries
ayne tmdb enrich --limit 10000
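For long time spans, the per-period commands can be generated rather than typed by hand. This is a minimal sketch with hypothetical helper names (year_chunks, update_commands); it only builds command strings from flags shown in this guide and does not run them.

```python
from typing import List, Tuple


def year_chunks(min_year: int, max_year: int, span: int) -> List[Tuple[int, int]]:
    """Split [min_year, max_year] into ranges of at most span years, newest first."""
    if span < 1:
        raise ValueError("span must be at least 1")
    chunks: List[Tuple[int, int]] = []
    end = max_year
    while end >= min_year:
        start = max(min_year, end - span + 1)
        chunks.append((start, end))
        end = start - 1
    return chunks


def update_commands(min_year: int, max_year: int, span: int, max_movies: int) -> List[str]:
    """Build one 'ayne tmdb update' command line per year chunk."""
    return [
        f"ayne tmdb update --min-year {lo} --max-year {hi} --max-movies {max_movies}"
        for lo, hi in year_chunks(min_year, max_year, span)
    ]


if __name__ == "__main__":
    for line in update_commands(2010, 2024, span=5, max_movies=5000):
        print(line)
```

Run as a script, this prints the three 5-year commands for 2010 through 2024, which can be pasted into a shell or a scheduled job.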
Targeted Updates
Update specific subsets:
# Recent popular movies only
ayne tmdb update --min-year 2023 --min-popularity 50 --max-movies 1000
ayne tmdb enrich --min-year 2023 --limit 1000
# High-quality movies (high vote counts)
ayne tmdb update --min-year 2020 --min-votes 1000 --max-movies 2000
# Specific year analysis
ayne tmdb update --min-year 2023 --max-year 2023 --max-movies 1000
ayne tmdb enrich --min-year 2023 --max-year 2023 --limit 1000
ayne omdb enrich --min-year 2023 --max-year 2023 --max-movies 1000
Collection Workflows
Daily Refresh
Recommended for scheduled daily runs:
# Standard daily refresh
ayne collect daily
# With discovery enabled
ayne collect daily --discover --discover-limit 500
# Custom limits
ayne collect daily --tmdb-refresh 200 --omdb-limit 500
# Include frozen movies (force refresh)
ayne collect daily --include-frozen
# Preview workflow
ayne collect daily --dry-run
Options:
- --tmdb-refresh N - Number of TMDB records to refresh (default: 100)
- --omdb-limit N - Maximum OMDB enrichments (uses config default)
- --discover - Also discover new movies
- --discover-limit N - Limit for discovery
- --include-frozen - Include frozen movies in refresh (force refresh)
- --dry-run - Preview without changes
Use Case: Schedule this command to run daily via cron or Task Scheduler:
# Linux/Mac cron (runs daily at 2 AM)
0 2 * * * cd /path/to/project && uv run ayne collect daily
# Windows Task Scheduler
# Schedule: Daily at 2:00 AM
# Action: Start a program
# Program: pwsh.exe
# Arguments: -Command "cd D:\path\to\project; uv run ayne collect daily"
Full Collection
For initial data population or comprehensive updates:
# Full collection with limits
ayne collect full --max-tmdb 5000 --max-omdb 1000
# With filters
ayne collect full --min-year 2020 --min-popularity 20
# Unlimited TMDB discovery
ayne collect full --max-omdb 1000
# Include frozen movies in refresh
ayne collect full --include-frozen --refresh-limit 500
# Preview workflow
ayne collect full --dry-run
Options:
- --max-tmdb N - Maximum TMDB discoveries (None = unlimited)
- --max-omdb N - Maximum OMDB enrichments
- --min-popularity N - Minimum popularity filter
- --min-year YYYY - Minimum release year
- --refresh-limit N - Maximum movies to refresh (default: 100)
- --include-frozen - Include frozen movies in refresh (force refresh)
- --dry-run - Preview without changes
Validation Commands
Validate TMDB Data
Check TMDB data quality:
ayne validate tmdb
Checks:
- Missing titles
- Missing release dates
- Invalid popularity scores
- Invalid vote counts
- Stale data (>90 days old)
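The checks listed above could be expressed per record along these lines. This is an illustrative sketch, not the validator AYNE ships: the function name and the dictionary keys (title, release_date, popularity, vote_count, last_updated) are hypothetical, and "invalid" is assumed to mean missing or negative.

```python
from datetime import date
from typing import Any, Dict, List

STALE_AFTER_DAYS = 90  # from this guide: TMDB data older than 90 days is stale


def tmdb_issues(movie: Dict[str, Any], today: date) -> List[str]:
    """Return the list of validation issues found in one movie record."""
    issues: List[str] = []
    if not movie.get("title"):
        issues.append("missing title")
    if not movie.get("release_date"):
        issues.append("missing release date")
    popularity = movie.get("popularity")
    if popularity is None or popularity < 0:
        issues.append("invalid popularity score")
    vote_count = movie.get("vote_count")
    if vote_count is None or vote_count < 0:
        issues.append("invalid vote count")
    last_updated = movie.get("last_updated")
    # A record with no update timestamp is treated as stale.
    if last_updated is None or (today - last_updated).days > STALE_AFTER_DAYS:
        issues.append("stale data")
    return issues
```

An empty list means the record passed every check; anything else names the checks it failed, in the order listed above.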
Validate OMDB Data
Check OMDB/IMDb data quality:
Checks:
- Missing IMDb IDs
- Invalid IMDb ratings
- Invalid Metascores
- Unprocessed movies (have ID, no data)
- Stale data (>180 days old)
Validate All
Run comprehensive validation:
ayne validate all
Configuration Commands
Show Version
Display version and environment info:
Show Configuration
Display all current settings:
Shows:
- Environment (development/staging/production)
- Log level and format
- Database path
- API endpoints and rate limits
- Filter settings
- Collection limits
Common Workflows
Initial Setup
# 1. Initialize database
ayne db init
# 2. Test connectivity
ayne db test
# 3. Run full collection
ayne collect full --max-tmdb 5000 --max-omdb 1000
# 4. Validate data
ayne validate all
Daily Maintenance
# Run daily workflow
ayne collect daily --discover --discover-limit 100
# Validate data quality
ayne validate all
Targeted Updates
# Update only recent movies
ayne tmdb update --min-year 2024 --max-movies 500
# Enrich high-popularity movies without OMDB data
ayne omdb enrich --max-movies 1000
# Validate specific dataset
ayne validate tmdb
Development/Testing
# Preview operations
ayne tmdb update --max-movies 100 --dry-run
ayne omdb enrich --max-movies 50 --dry-run
ayne collect daily --dry-run
# Small test collection
ayne collect full --max-tmdb 100 --max-omdb 50
# Verify database
ayne db test --verbose
ayne db stats
Configuration Override
All commands respect the following configuration hierarchy:
- Environment variables (highest priority)
- .env file
- YAML config (environment-specific)
- CLI arguments
- Code defaults (lowest priority)
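Layered lookup of this kind is commonly modelled as a chain of mappings searched from highest to lowest priority. The sketch below is an assumption-laden illustration, not AYNE's configuration loader: resolve_settings and the setting keys are hypothetical, and the layer order is taken literally from the list above without being verified against the implementation.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping


def _without_unset(layer: Mapping[str, Any]) -> Dict[str, Any]:
    """Drop None values so an unset option never masks a lower layer."""
    return {key: value for key, value in layer.items() if value is not None}


def resolve_settings(
    env_vars: Mapping[str, Any],
    dotenv: Mapping[str, Any],
    yaml_config: Mapping[str, Any],
    cli_args: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> ChainMap:
    """Combine layers so that earlier arguments win over later ones."""
    ordered = (env_vars, dotenv, yaml_config, cli_args, defaults)
    return ChainMap(*(_without_unset(layer) for layer in ordered))


if __name__ == "__main__":
    settings = resolve_settings(
        env_vars={"min_year": 2024},
        dotenv={"min_year": 2020, "min_popularity": 20},
        yaml_config={},
        cli_args={"min_votes": None},  # option not passed on the command line
        defaults={"min_year": 1950, "min_popularity": 10.0, "min_votes": 50},
    )
    print(settings["min_year"], settings["min_popularity"], settings["min_votes"])
```

If the real tool gives a different layer the final say, only the order of the tuple inside resolve_settings needs to change.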
Examples
# Override via environment variable
export TMDB_MIN_RELEASE_YEAR=2024
ayne tmdb update
# Override via .env file
# Edit .env and set: TMDB_MIN_POPULARITY=20
ayne tmdb update
# Override via CLI argument (highest priority for that run)
ayne tmdb update --min-year 2020 --min-popularity 15
Dry Run Best Practices
Always use --dry-run when:
- Testing new filter combinations
- Verifying quota usage before large operations
- Debugging unexpected behavior
- Learning command behavior
# Safe exploration
ayne tmdb update --full --dry-run
ayne collect daily --discover --dry-run
ayne db init --force --dry-run
Error Handling
The CLI provides clear error messages and appropriate exit codes:
- Exit 0: Success
- Exit 1: Error occurred
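Because the CLI reports failure through a non-zero exit code, a wrapper script can stop a multi-step run at the first failing command. This is a minimal sketch using Python's standard subprocess module; run_until_failure is a hypothetical helper, and the ayne commands in the usage note are the ones shown elsewhere in this guide.

```python
import subprocess
from typing import Sequence


def run_until_failure(commands: Sequence[Sequence[str]]) -> int:
    """Run commands in order and stop at the first non-zero exit code.

    Returns 0 if every command succeeded, otherwise the failing exit code.
    """
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

A typical call would be run_until_failure([["ayne", "db", "test"], ["ayne", "collect", "daily"]]), which skips the daily collection when the database test fails. The return value can be passed to sys.exit so a scheduler sees the same exit code.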
Common Issues
"Import 'typer' could not be resolved"
"Database not found"
"API rate limit exceeded"
"No movies discovered"
Advanced Usage
Combining Commands
# Chain operations (PowerShell)
ayne db init; ayne tmdb update --max-movies 1000; ayne omdb enrich --max-movies 500; ayne validate all
# Conditional execution (PowerShell 7+): the second command runs only if the first succeeds
ayne db test && ayne collect daily
Logging
Control log level via environment:
# Verbose logging
$env:LOG_LEVEL="DEBUG"
ayne collect daily
# Minimal logging
$env:LOG_LEVEL="WARNING"
ayne collect full
JSON Output
For programmatic processing:
Help System
Every command has built-in help:
# Top-level help
ayne --help
# Group help
ayne db --help
ayne tmdb --help
# Command help
ayne tmdb update --help
ayne collect daily --help
ayne validate all --help
Migration from Scripts
If migrating from old scripts:
| Old Script | New CLI Command |
|---|---|
| python scripts/init_database.py | ayne db init |
| python scripts/test_database.py | ayne db test |
| python scripts/collect_optimized.py --discover | ayne tmdb update |
| python scripts/collect_optimized.py | ayne collect daily |
Benefits of CLI:
- ✅ No need for python scripts/ prefix
- ✅ Cleaner argument names
- ✅ Better error messages
- ✅ Colored output
- ✅ Progress indicators
- ✅ Dry-run support
- ✅ Comprehensive help
Next Steps
- Model Training: ayne ml train (coming soon)
- API Server: ayne serve (coming soon)
- Data Export: ayne export (coming soon)