Phase 1: Core Validation

Duration: 2-3 days
Goal: Prove the concept works and establish solid foundations
Status: ✅ COMPLETED - Concept validated, architecture established

The Challenge

Build a minimal viable system that can:

- Parse Python project dependencies from pyproject.toml files
- Integrate with the MCP (Model Context Protocol) ecosystem
- Provide a single, reliable tool for dependency scanning

Critical Questions to Answer:

1. Can we reliably parse diverse pyproject.toml structures?
2. Does MCP integration work smoothly for real AI assistants?
3. What architecture patterns will scale as we add complexity?

Technical Implementation

Foundation Architecture

From the very beginning, we established a layered architecture that would support future growth:

# Core Services Layer
src/autodoc_mcp/core/
├── dependency_parser.py    # PyProject.toml parsing logic
├── cache_manager.py       # Simple JSON file caching
└── error_formatter.py     # Structured error handling

# Infrastructure Layer
src/autodoc_mcp/
├── main.py               # FastMCP server entry point
├── config.py             # Configuration management
├── models.py             # Pydantic data models
└── exceptions.py         # Custom exception hierarchy

Why This Architecture Worked:

- Clear boundaries: Each component had a single responsibility
- Easy testing: Mock boundaries aligned with architectural boundaries
- Evolutionary: New features could be added without refactoring existing code
- Maintainable: Changes in one layer didn't ripple through others

The First MCP Tool: scan_dependencies

The initial tool was deceptively simple but included sophisticated error handling:

from typing import Optional

async def scan_dependencies(project_path: Optional[str] = None) -> dict:
    """
    Parse pyproject.toml and extract all dependencies with graceful error handling.

    Args:
        project_path: Path to project directory (defaults to current directory)

    Returns:
        ScanResult with dependencies, warnings, and parsing statistics
    """
    ...  # body omitted; the response shape is shown below

Key Innovation: Graceful degradation from day one. Instead of failing on malformed files, the parser collected warnings and returned partial results.

# Example response showing graceful degradation
{
    "success": true,
    "dependencies": {
        "fastmcp": ">=0.1.0",
        "pydantic": "^2.0.0",
        "httpx": "*"
    },
    "warnings": [
        "Invalid version constraint 'invalid-version' for package 'some-pkg', skipped"
    ],
    "statistics": {
        "total_found": 15,
        "valid_parsed": 12,
        "invalid_skipped": 3
    }
}
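
The shape of that loop is worth showing. Here is a condensed sketch of the warning-collection pattern (a simplified illustration, not the actual parser; the regex and function name are assumptions):

import re

# Illustrative acceptance check; the real parser is stricter.
_CONSTRAINT_RE = re.compile(r"^[><=~!^]*(\*|\d[\w.\-]*)(,\s*[><=~!^]*\d[\w.\-]*)*$")

def collect_dependencies(raw: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Keep valid entries; record a warning for each invalid one instead of raising."""
    dependencies: dict[str, str] = {}
    warnings: list[str] = []
    for name, constraint in raw.items():
        if _CONSTRAINT_RE.match(constraint):
            dependencies[name] = constraint
        else:
            warnings.append(
                f"Invalid version constraint {constraint!r} for package {name!r}, skipped"
            )
    return dependencies, warnings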

Technical Decisions That Scaled

Decision 1: FastMCP Framework

Choice: Use FastMCP instead of building raw MCP integration
Rationale: Focus on business logic, not protocol implementation
Long-term Impact: Enabled rapid development of 7 additional tools without protocol complexity

# Clean, declarative tool definition
from typing import Optional
from fastmcp import FastMCP

mcp = FastMCP("autodoc-mcp")  # illustrative server name

@mcp.tool()
async def scan_dependencies(project_path: Optional[str] = None) -> dict:
    """Parse project dependencies from pyproject.toml file."""
    ...  # implementation focuses on business logic only

Decision 2: Pydantic for Data Validation

Choice: Use Pydantic v2 for all data models and validation
Rationale: Type safety, automatic validation, and excellent error messages
Long-term Impact: Prevented entire classes of runtime errors and improved debugging

from typing import Dict, List, Optional

from pydantic import BaseModel, Field

class ScanResult(BaseModel):
    """Results from dependency scanning operation."""
    success: bool
    dependencies: Dict[str, str] = Field(default_factory=dict)
    warnings: List[str] = Field(default_factory=list)
    errors: List[str] = Field(default_factory=list)
    statistics: Optional[ScanStatistics] = None  # ScanStatistics is defined alongside
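
The payoff shows up at the construction boundary. A quick illustration with a deliberately bad value (hypothetical, not from the test suite):

from pydantic import ValidationError

try:
    ScanResult(success=True, dependencies="not-a-dict")
except ValidationError as exc:
    # The error names the offending field and the expected type, so bad
    # data is rejected at construction rather than deep inside a tool call.
    print(exc)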

Decision 3: Comprehensive Error Context

Choice: Include recovery suggestions in all error responses
Rationale: Users need actionable information, not just error messages
Long-term Impact: Created consistent, helpful error experience across all 8 tools

# Error messages include context for recovery
{
    "error": "Failed to parse pyproject.toml",
    "details": "Invalid TOML syntax at line 23: Missing closing quote",
    "suggestions": [
        "Check line 23 in pyproject.toml for syntax errors",
        "Validate TOML syntax using an online validator",
        "Ensure all strings are properly quoted"
    ]
}
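
A minimal sketch of how such payloads can be assembled in one place (an assumption about what core/error_formatter.py contains, not its actual code):

def format_error_response(message: str, exc: Exception, suggestions: list[str]) -> dict:
    """Build the structured error payload: what failed, why, and how to recover."""
    return {
        "error": message,
        "details": str(exc),
        "suggestions": suggestions,
    }

Centralizing the payload shape is what keeps the error experience consistent across tools.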

Quality Foundation

Testing Strategy from Day One

We established comprehensive testing patterns that supported rapid development:

# Pattern: Integration tests with real files
import pytest

@pytest.mark.asyncio
async def test_scan_real_project():
    """Test with an actual pyproject.toml file."""
    result = await scan_dependencies("./")
    assert result["success"] is True
    assert "fastmcp" in result["dependencies"]

# Pattern: Error condition testing
@pytest.mark.asyncio
async def test_scan_malformed_toml():
    """Test graceful handling of invalid TOML."""
    result = await scan_dependencies("./test/fixtures/invalid.toml")
    assert result["success"] is False
    assert "TOML syntax error" in result["errors"][0]
    assert len(result["suggestions"]) > 0

Coverage from Day One: 85% test coverage established in Phase 1, creating a quality foundation for future development.

CI/CD Pipeline

Complete automation established early:

# Key quality gates from Phase 1
- name: Run tests
  run: pytest --cov=src --cov-report=term-missing

- name: Type checking
  run: mypy src/

- name: Linting
  run: ruff check src/ tests/

- name: Security scanning
  run: bandit -r src/

Validation Results

Parsing Reliability Validated

Tested against 20+ real Python projects with diverse dependency specifications:

- pydantic: Complex version constraints with extras
- django: Multiple dependency groups (main, dev, test)
- fastapi: Modern pyproject.toml structure
- requests: Simple, traditional structure

Result: 95%+ successful parsing rate with graceful degradation for edge cases.

MCP Integration Validated

Integrated with multiple AI assistants:

- Claude Code: stdio transport working perfectly
- Cursor: MCP server configuration successful
- Local testing: Direct FastMCP integration validated

Result: Smooth integration experience with clear setup instructions.

Architecture Scalability Validated

Added a second tool (get_basic_docs) to test the architectural patterns:

- New tool added in <1 hour
- No changes required to existing code
- Testing patterns reused successfully

Result: Architecture ready for expansion to 8 tools.
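
The registration cost of that second tool was essentially one decorator. A hypothetical sketch of its shape (the real signature may differ):

@mcp.tool()
async def get_basic_docs(package_name: str, version: str | None = None) -> dict:
    """Fetch summary documentation for a single package."""
    # Business logic lives in its own core module; nothing existing changes.
    ...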

Lessons Learned

What Worked Exceptionally Well

  1. Graceful Degradation Philosophy: Collecting warnings instead of failing fast made the tool resilient to real-world messiness.

  2. Architecture-First Approach: Spending time on the layered architecture paid off immediately when adding the second tool.

  3. Error Context Innovation: Including recovery suggestions in errors differentiated our UX from standard developer tools.

  4. Quality Gates Early: Establishing 85% test coverage and CI/CD in Phase 1 prevented technical debt accumulation.

Challenges and Solutions

Challenge 1: TOML Parsing Edge Cases

Problem: Python's toml library doesn't handle all real-world edge cases gracefully
Solution: Wrapped parsing in a comprehensive try/except with specific error messages

import toml

try:
    parsed_toml = toml.load(toml_path)
except toml.TomlDecodeError as e:
    return {
        "success": False,
        "errors": [f"TOML syntax error: {e}"],
        "suggestions": [
            "Validate TOML syntax using an online validator",
            "Check for missing quotes or bracket mismatches"
        ]
    }
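
On Python 3.11+, the stdlib tomllib is an alternative parser; the same wrapping pattern applies (a sketch, noting that tomllib requires binary mode):

import tomllib

try:
    with open(toml_path, "rb") as f:  # tomllib only accepts binary file objects
        parsed_toml = tomllib.load(f)
except tomllib.TOMLDecodeError as e:
    ...  # build the same structured error response as above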

Challenge 2: Version Constraint Diversity

Problem: Python projects use inconsistent version constraint formats
Solution: Built a flexible parser that handles multiple formats gracefully

# Flexible version constraint parsing
VALID_PATTERNS = [
    r"^[><=~!^]*[\d\.]+([\w\d\.-]*)?$",  # Standard semantic versions
    r"^\*$",                              # Wildcard
    r"^[><=~!^]*\d+$",                   # Major version only
]
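
A small helper turns the pattern list into a usable check (a sketch; the actual matching logic may differ):

import re

def is_valid_constraint(spec: str) -> bool:
    """Accept a constraint if it matches any of the known patterns."""
    return any(re.match(pattern, spec) for pattern in VALID_PATTERNS)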

Challenge 3: Configuration Management

Problem: Different environments need different settings
Solution: Environment-aware configuration with validation

from pathlib import Path

from pydantic import BaseModel, Field, field_validator

class AutoDocsConfig(BaseModel):
    cache_dir: Path = Field(default_factory=lambda: Path.home() / ".cache" / "autodoc-mcp")
    timeout_seconds: int = Field(default=30, ge=5, le=300)
    max_file_size_mb: int = Field(default=10, ge=1, le=100)

    @field_validator("cache_dir")
    @classmethod
    def validate_cache_dir(cls, v: Path) -> Path:
        v.mkdir(parents=True, exist_ok=True)  # ensure the cache directory exists
        return v
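
The model validates values; environment awareness can be layered on top. One possible loading pattern (an assumption, including the AUTODOCS_* variable names):

import os

def load_config() -> AutoDocsConfig:
    """Build config from environment overrides, falling back to validated defaults."""
    overrides: dict = {}
    if "AUTODOCS_TIMEOUT_SECONDS" in os.environ:
        overrides["timeout_seconds"] = int(os.environ["AUTODOCS_TIMEOUT_SECONDS"])
    if "AUTODOCS_CACHE_DIR" in os.environ:
        overrides["cache_dir"] = os.environ["AUTODOCS_CACHE_DIR"]
    return AutoDocsConfig(**overrides)  # Pydantic enforces ranges and creates the dir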

Impact on Subsequent Phases

Foundation for Phase 2

The dependency parsing capability became the input for documentation fetching. The structured error handling patterns were reused for network operations.

Foundation for Phase 3

The graceful degradation philosophy established in Phase 1 became the template for handling network failures and partial results in Phase 3.

Foundation for Phase 4

The configuration management and data model patterns scaled perfectly to handle the complexity of multi-dependency context fetching.

Key Metrics

Development Velocity

  • Day 1: Project setup, basic FastMCP integration
  • Day 2: Dependency parsing with error handling
  • Day 3: Comprehensive testing and CI/CD setup

Code Quality

  • Test Coverage: 85%
  • Type Coverage: 100% (MyPy strict mode)
  • Documentation: Complete API documentation for all public methods

Functionality

  • pyproject.toml Parsing: 95%+ success rate across diverse projects
  • MCP Integration: 100% compatibility with tested AI assistants
  • Error Handling: Comprehensive recovery suggestions for all failure modes

Looking Forward

Phase 1 established the quality and architectural foundations that enabled rapid, confident development in subsequent phases. The patterns established here - graceful degradation, comprehensive testing, and user-focused error messages - became the hallmarks of the entire system.

Next: Phase 2: Documentation Fetching - Building the core documentation engine.


This phase documentation is part of the AutoDocs MCP Server Development Journey.