Documentation Aggregation Technical Reference
Source code documentation for the documentation aggregation system.
Source Repository: gardenlinux/docs-ng > Source File: src/README.md
Source Code Structure
src/
├── aggregate.py # CLI entry point
├── migration_tracker.py # Standalone utility
└── aggregation/ # Core package
    ├── __init__.py
    ├── models.py # Data classes
    ├── config.py # Config I/O
    ├── fetcher.py # Git + local fetch
    ├── transformer.py # Content transforms
    └── structure.py # Directory transforms
Module Reference
aggregation/models.py
Data classes for type safety:
- RepoConfig: Repository configuration data class
- AggregateResult: Fetch result with commit hash
aggregation/config.py
Configuration file handling:
- load_config(): Parse repos-config.json
- save_config(): Write updated config (commit locks)
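As a rough illustration of how loading and saving might fit together, the sketch below reads a JSON list of repository entries into a data class and writes it back. The schema (a top-level `repos` list with `name`, `url`, `ref`, and `commit` keys) and the names `RepoEntry`, `load_entries`, and `save_entries` are assumptions made for this example. They are not the actual repos-config.json format or the real RepoConfig definition.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class RepoEntry:
    """Hypothetical stand-in for RepoConfig; the real fields may differ."""
    name: str
    url: str
    ref: str
    commit: Optional[str] = None  # locked commit hash, if one was recorded


def load_entries(path):
    """Parse a JSON config file (assumed schema) into RepoEntry objects."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return [RepoEntry(**item) for item in data["repos"]]


def save_entries(path, entries):
    """Write the entries back, keeping any commit locks they carry."""
    payload = {"repos": [asdict(entry) for entry in entries]}
    Path(path).write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
```

Using a data class rather than raw dictionaries means a misspelled or missing key fails at load time instead of deep inside the fetch stage.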
aggregation/fetcher.py
Repository fetching:
DocsFetcher — Main fetcher class
Methods:
- __init__(project_root, update_locks=False): Initialize with optional commit lock updating
- fetch(): Fetch repository and return result with commit hash
- _fetch_remote(): Git sparse checkout from remote repository
- _fetch_local(): Filesystem copy from local repository
- _copy_docs(): Static method to copy docs directory
- _copy_root_files(): Static method to copy root-level files (e.g., CONTRIBUTING.md)
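To make the idea of a sparse checkout concrete, the sketch below builds the Git commands for a shallow clone restricted to one directory, followed by a lookup of the checked-out commit hash. The Git options used are standard, but this document does not state which ones DocsFetcher actually passes. The function name `sparse_checkout_commands` and the `docs` default are assumptions for illustration.

```python
def sparse_checkout_commands(url, ref, dest, subdir="docs"):
    """Return git commands for a shallow, sparse checkout of one directory.

    Hypothetical helper: the real _fetch_remote() may use different options.
    """
    return [
        # Shallow partial clone; --sparse starts with only top-level files.
        ["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse",
         "--branch", ref, url, dest],
        # Limit the working tree to the documentation directory.
        ["git", "-C", dest, "sparse-checkout", "set", subdir],
        # Print the checked-out commit hash so it can be recorded.
        ["git", "-C", dest, "rev-parse", "HEAD"],
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True, capture_output=True, text=True)`. Keeping command construction separate from execution makes the logic testable without network access.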
aggregation/transformer.py
Content transformation:
- rewrite_links(): Fix markdown links for cross-repository references
- quote_yaml_value(): YAML safety for frontmatter values
- ensure_frontmatter(): Add or fix frontmatter in markdown files
- parse_frontmatter(): Extract metadata from markdown frontmatter
- fix_broken_project_links(): Validate and fix links to project mirrors
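The need for YAML quoting is easiest to see with a value such as `Note: read this`, which a YAML parser would not read as a plain string. The sketch below shows one conservative way to quote frontmatter values. It is not the actual quote_yaml_value() implementation: the name `quote_value`, the safe-character rule, and the reserved-word list are assumptions. It relies on JSON string syntax being accepted as a YAML double-quoted scalar.

```python
import json
import re

# Assumed rule: values starting with a letter and using only these
# characters are left unquoted; everything else is double-quoted.
_SAFE = re.compile(r"^[A-Za-z](?:[A-Za-z0-9_ ./-]*[A-Za-z0-9_./-])?$")
# Words that YAML parsers may read as booleans or null instead of strings.
_RESERVED = {"true", "false", "yes", "no", "on", "off", "null"}


def quote_value(value):
    """Return value as a YAML-safe scalar, double-quoting when in doubt."""
    text = str(value)
    if _SAFE.match(text) and text.lower() not in _RESERVED:
        return text
    # A JSON string literal is also a valid YAML double-quoted scalar.
    return json.dumps(text, ensure_ascii=False)
```

Quoting too often is harmless, while quoting too rarely can break the frontmatter block, so the sketch errs toward quoting.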
aggregation/structure.py
Directory operations:
- transform_directory_structure(): Restructure docs based on config mapping
- copy_targeted_docs(source_dir, docs_dir, repo_name, media_dirs=None, root_files=None): Copy files with github_target_path frontmatter to specified locations
  - Handles nested media dirs (e.g., tutorials/assets/) by copying to same relative path
  - Handles root-level media dirs (e.g., _static/) by copying to common ancestor of targeted files
  - Supports scanning root_files for targeted placement
- process_markdown_file(): Transform single markdown file (links, frontmatter)
- process_all_markdown(): Batch process all markdown files in directory
aggregate.py
CLI orchestration — Combines all modules into the complete aggregation workflow.
Usage Example
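Basic programmatic usage (project_root, repo, output_dir, target_dir, and repo_name are placeholders supplied by the caller):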
from aggregation import load_config, DocsFetcher, process_all_markdown
# Load configuration
repos = load_config("repos-config.json")
# Initialize fetcher
fetcher = DocsFetcher(project_root)
# Fetch documentation
result = fetcher.fetch(repo, output_dir)
# Transform markdown files
process_all_markdown(target_dir, repo_name)
Key Concepts
Targeted Documentation
Files with github_target_path in their frontmatter are automatically placed at that exact path:
---
github_target_path: "docs/tutorials/example.md"
---
The copy_targeted_docs() function scans all markdown files and copies those with this frontmatter to their specified locations.
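A minimal sketch of the targeting lookup follows. It extracts simple `key: value` frontmatter and returns the `github_target_path` if present. The helper names `read_frontmatter` and `target_path` are hypothetical, and the hand-rolled parser only handles flat keys. The real parse_frontmatter() and copy_targeted_docs() may work differently, for instance by using a YAML library.

```python
from pathlib import PurePosixPath


def read_frontmatter(text):
    """Return (metadata, body) for markdown text with optional frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, text  # no closing delimiter: treat as plain content
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            # Strip surrounding whitespace and simple quotes from the value.
            meta[key.strip()] = value.strip().strip("\"'")
    return meta, "\n".join(lines[end + 1:])


def target_path(text):
    """Return the github_target_path of a file as a path, or None."""
    meta, _ = read_frontmatter(text)
    target = meta.get("github_target_path")
    return PurePosixPath(target) if target else None
```

How the returned path is resolved against the output directory is not specified in this document, so the sketch leaves it as a relative path.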
Link Rewriting
The rewrite_links() function transforms markdown links to work in the aggregated site:
- Relative links within the same repo are maintained
- Cross-repository links are rewritten to point to the correct locations
- Links to project mirrors are validated
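The first two rules above can be sketched with a regular expression over inline markdown links. This is an illustration, not the actual rewrite_links() logic: the function name `rewrite`, the `prefix_map` parameter, and the example URL prefix are all assumptions.

```python
import re

# Matches inline markdown links of the form [label](target).
_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def rewrite(text, prefix_map):
    """Rewrite link targets that start with a known prefix; keep the rest.

    prefix_map is a hypothetical mapping from source URL prefixes to
    paths in the aggregated site.
    """
    def replace(match):
        label, target = match.group(1), match.group(2)
        for old, new in prefix_map.items():
            if target.startswith(old):
                return f"[{label}]({new}{target[len(old):]})"
        return match.group(0)  # relative or unknown links stay unchanged

    return _LINK.sub(replace, text)
```

For example, with a mapping from `https://example.invalid/org/tool/blob/main/docs/` to `/projects/tool/`, a link to `.../docs/setup.md` becomes `/projects/tool/setup.md`, while `./intro.md` is left alone.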
Media Handling
Media directories specified in media_directories configuration are:
- Discovered recursively in the source repository
- Copied alongside their associated documentation
- Placed according to whether they're nested (same relative path) or root-level (common ancestor)
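The placement rule can be sketched as a small pure function. It assumes that "nested" means the media directory path has more than one component and that the common ancestor is computed from the parent directories of the targeted files. The name `media_destination` is hypothetical, and the real copy_targeted_docs() may resolve these cases differently.

```python
import posixpath
from pathlib import PurePosixPath


def media_destination(media_dir, targeted_files):
    """Return where a media directory would be placed under the docs root."""
    media = PurePosixPath(media_dir)
    if len(media.parts) > 1:
        # Nested directory (e.g. tutorials/assets/): keep the relative path.
        return media
    # Root-level directory (e.g. _static/): place it at the common
    # ancestor of the directories that contain the targeted files.
    parents = [str(PurePosixPath(f).parent) for f in targeted_files]
    ancestor = posixpath.commonpath(parents) if parents else ""
    return PurePosixPath(ancestor) / media.name
```

With targeted files `docs/tutorials/a.md` and `docs/how-to/b.md`, a root-level `_static/` directory would land at `docs/_static`, where both files can reach it with short relative links.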
Commit Locking
When update_locks=True is passed to DocsFetcher.__init__(), the system:
- Fetches from the ref (branch/tag)
- Records the resolved commit hash
- Updates repos-config.json with the lock
This ensures reproducible builds.
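One way to picture why a lock gives reproducible builds: once a commit hash is recorded, later fetches can check out that exact commit instead of a ref that keeps moving. The sketch below assumes a config entry is a dictionary with `ref` and an optional `commit` key, and that a locked commit takes precedence. Both the key names and the precedence rule are assumptions, not confirmed behavior of DocsFetcher.

```python
def apply_lock(entry, resolved_commit):
    """Return a copy of a config entry with the commit lock recorded."""
    locked = dict(entry)  # copy so the caller's entry is not mutated
    locked["commit"] = resolved_commit
    return locked


def checkout_target(entry):
    """Prefer the locked commit (assumed rule); fall back to the moving ref."""
    return entry.get("commit") or entry["ref"]
```

Before locking, `checkout_target` returns the branch or tag name. After `apply_lock`, it returns the recorded hash, so two builds from the same config see the same content.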
Development
Running Tests
See Testing Reference for details on the test suite.
Adding New Transformation
To add a new transformation:
- Add function to transformer.py
- Call it from process_markdown_file() or process_all_markdown()
- Add tests in tests/unit/test_transformer.py
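The steps above might look like the following for a small, made-up transformation. The function `strip_html_comments` is purely hypothetical and is not part of transformer.py. It only shows the shape of a text-in, text-out transformation paired with a pytest-style unit test.

```python
import re


def strip_html_comments(content):
    """Hypothetical transformation: remove HTML comments from markdown."""
    return re.sub(r"<!--.*?-->", "", content, flags=re.DOTALL)


def test_strip_html_comments():
    """Pytest-style check that could live in tests/unit/test_transformer.py."""
    source = "Intro <!-- internal note -->text\n"
    assert strip_html_comments(source) == "Intro text\n"
```

Keeping each transformation a pure function of the file content makes it easy to test in isolation before wiring it into the markdown processing step.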
Adding New Structure Type
To add a new structure mapping type:
- Update transform_directory_structure() in structure.py
- Add corresponding structure key handling
- Update configuration documentation
Architecture Decisions
Key architectural decisions are documented in the source repository:
- Sparse git checkout for efficiency
- Frontmatter-based targeting for flexibility
- Separate fetch/transform/structure stages for modularity
See Also
- Testing Reference — Test suite documentation
- Configuration Reference — Complete configuration field reference
- Architecture Explanation — How the system works