How NoEntropy Works
This guide explains the internal architecture and processes that power NoEntropy's intelligent file organization.
Overview
NoEntropy uses a multi-stage pipeline that combines AI-powered categorization with intelligent caching and concurrent processing to efficiently organize your files.
Organization Process
NoEntropy follows a five-step process to organize your files:
┌───────────────────────────┐
│ 1. Scan Files             │ → Read all files in DOWNLOAD_FOLDER
└─────────────┬─────────────┘   (and subdirs if --recursive flag is used)
              ▼
┌───────────────────────────┐
│ 2. Initial Categorization │ → Ask Gemini to categorize by filename
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 3. Deep Inspection        │ → Read text files for sub-categories
│    (Concurrent)           │   • Reads file content
│                           │   • Asks AI for sub-folder
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 4. Preview & Confirm      │ → Show organization plan
│                           │   • Ask user approval
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 5. Execute Moves          │ → Move files to organized folders
└───────────────────────────┘
Step 1: File Scanning
What happens:
- Scans the configured download folder
- Optionally scans subdirectories with the `--recursive` flag
- Collects file paths and metadata (size, modification time)
- Filters out directories and focuses on files only
Output: List of file paths ready for categorization
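The scanning step can be pictured with a minimal Python sketch. The `FileInfo` record and the `scan_files` function are hypothetical names used for illustration only; they are not taken from NoEntropy's source.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileInfo:
    """Hypothetical record for one scanned file."""
    path: Path
    size: int      # bytes
    mtime: float   # modification time, seconds since the epoch


def scan_files(folder: Path, recursive: bool = False) -> list[FileInfo]:
    """Collect regular files under `folder`, skipping directories."""
    candidates = folder.rglob("*") if recursive else folder.iterdir()
    found = []
    for entry in candidates:
        if entry.is_file():  # directories are filtered out
            stat = entry.stat()
            found.append(FileInfo(entry, stat.st_size, stat.st_mtime))
    return sorted(found, key=lambda info: info.path)
```

Calling `scan_files(Path("~/Downloads").expanduser(), recursive=True)` would return one record per file, including files in subdirectories.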
Step 2: Initial Categorization
What happens:
- Sends list of filenames to Gemini API
- AI analyzes filenames and determines appropriate categories
- Returns a categorization plan for all files
- Uses custom categories if configured, otherwise uses defaults
AI Prompt includes:
- List of all filenames
- Available categories (default or custom)
- Instructions to categorize based on file type and content
- Request for main category assignment
Output: Initial organization plan with main categories
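As a rough illustration of this step, the sketch below assembles a prompt from filenames and categories and parses a JSON reply. The prompt wording, the function names, and the fallback to Misc for unknown categories are assumptions made for the example; the actual prompt NoEntropy sends is not shown in this guide.

```python
import json

# Default category names as listed later in this guide.
DEFAULT_CATEGORIES = ["Images", "Documents", "Installers", "Music",
                      "Videos", "Archives", "Code", "Misc"]


def build_categorization_prompt(filenames, custom_categories=None):
    """Assemble a prompt listing the filenames and the allowed categories."""
    categories = custom_categories or DEFAULT_CATEGORIES
    return "\n".join([
        "Assign each file below to exactly one of these categories: "
        + ", ".join(categories) + ".",
        "Base the choice on the file type and what the name suggests.",
        'Reply with JSON only, shaped like {"<filename>": "<category>"}.',
        "Files:",
        *[f"- {name}" for name in filenames],
    ])


def parse_plan(response_text, categories=DEFAULT_CATEGORIES):
    """Parse the JSON reply; unknown categories fall back to Misc (assumed)."""
    plan = json.loads(response_text)
    return {name: (cat if cat in categories else "Misc")
            for name, cat in plan.items()}
```

Keeping prompt construction and response parsing as pure functions makes both easy to test without calling the API.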
Step 3: Deep Inspection
What happens:
- Identifies text-based files that can be read
- Concurrently reads file contents (up to `--max-concurrent` files at once)
- Sends content to Gemini AI for sub-folder suggestions
- AI analyzes content and suggests relevant sub-categories
- Retries failed API requests with exponential backoff
Supported text file formats:
Source Code: rs, py, js, ts, jsx, tsx, java, go, c, cpp, h, hpp, rb, php, swift, kt, scala, lua, r, m
Web/Config: html, css, json, xml, yaml, yml, toml, ini, cfg, conf
Documentation: txt, md, sql, sh, bat, ps1, log
Why concurrent?
- Processes multiple files simultaneously
- Significantly reduces total processing time
- A configurable concurrency limit helps keep requests within API rate limits
Output: Enhanced organization plan with sub-folders
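One way to bound concurrency as described is a semaphore around each file's read-and-ask step. The sketch below uses Python's `asyncio`; `suggest_subfolder` is a stand-in for the real AI call, and the extension subset and the 4000-character content cap are assumptions for illustration.

```python
import asyncio
from pathlib import Path

# Small subset of the supported extensions listed above.
TEXT_EXTENSIONS = {"rs", "py", "js", "md", "txt", "json", "toml"}


async def suggest_subfolder(content: str) -> str:
    """Placeholder for the AI call; a real client would send `content` to Gemini."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return "notes" if "TODO" in content else "general"


async def inspect_file(path: Path, limit: asyncio.Semaphore) -> tuple[str, str]:
    async with limit:  # at most `max_concurrent` files are in flight
        content = await asyncio.to_thread(path.read_text, errors="replace")
        return path.name, await suggest_subfolder(content[:4000])


async def deep_inspect(paths: list[Path], max_concurrent: int = 5) -> dict[str, str]:
    """Return {filename: suggested sub-folder} for readable text files."""
    limit = asyncio.Semaphore(max_concurrent)
    readable = [p for p in paths
                if p.suffix.lstrip(".").lower() in TEXT_EXTENSIONS]
    results = await asyncio.gather(*(inspect_file(p, limit) for p in readable))
    return dict(results)
```

Raising `max_concurrent` shortens wall-clock time until the API's own rate limit becomes the bottleneck.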
Step 4: Preview & Confirmation
What happens:
- Displays complete organization plan to user
- Shows source file and destination path for each file
- Waits for user confirmation (y/n)
- Allows user to review before any changes are made
User options:
- Accept: Proceed with organization
- Decline: Cancel and exit without changes
Output: User decision (proceed or abort)
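A confirmation gate of this kind can be very small. The following sketch uses hypothetical function names and prompt text; the `ask` parameter exists only so the prompt can be replaced in tests.

```python
def format_plan(plan: dict[str, str]) -> str:
    """Render one 'source -> destination' line per file."""
    return "\n".join(f"{src} -> {dst}" for src, dst in sorted(plan.items()))


def confirm(plan: dict[str, str], ask=input) -> bool:
    """Show the plan and return True only on an explicit yes."""
    print(format_plan(plan))
    answer = ask("Proceed with organization? (y/n) ")
    return answer.strip().lower() in ("y", "yes")
```

Treating anything other than an explicit yes as a decline keeps an accidental Enter key from moving files.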
Step 5: Execute Moves
What happens:
- Creates destination directories as needed
- Moves files to their designated locations
- Records each move in the undo log
- Reports success/failure for each operation
- Displays final summary statistics
Safety features:
- Only moves files after user confirmation
- Tracks all operations for undo capability
- Handles errors gracefully without stopping entire process
- Creates parent directories automatically
Output: Organized files and execution summary
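The move step and its safety properties might look like the sketch below. The record fields mirror the undo log description later in this guide, but the function name and the refusal to overwrite an existing destination are assumptions for illustration.

```python
import shutil
import time
from pathlib import Path


def execute_moves(plan: dict[Path, Path], undo_log: list[dict]) -> tuple[int, int]:
    """Move each source to its destination; return (moved, failed) counts."""
    moved = failed = 0
    for source, destination in plan.items():
        record = {"source": str(source), "destination": str(destination),
                  "timestamp": time.time()}
        try:
            # Parent directories are created on demand.
            destination.parent.mkdir(parents=True, exist_ok=True)
            if destination.exists():
                raise FileExistsError(destination)  # assumed: never overwrite
            shutil.move(str(source), str(destination))
            record["status"] = "completed"
            moved += 1
        except OSError as error:
            # One failure is recorded and reported without stopping the run.
            record["status"] = "failed"
            print(f"failed: {source}: {error}")
            failed += 1
        undo_log.append(record)
    return moved, failed
```

The returned counts are enough to print a final summary, and every attempt leaves a record behind for the undo log.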
Caching System
NoEntropy includes an intelligent caching system to minimize API calls and improve performance.
Cache Design
- Location: `.noentropy_cache.json` in project root
- Format: JSON with file path as key
- Expiry: 7 days (automatically cleaned up)
- Max Entries: 1000 entries (LRU eviction)
- Change Detection: File size + modification time (not content hash)
How Caching Works
- First Run:
  - Files are analyzed via Gemini API
  - Categorization results are cached with metadata
- Cache Check (subsequent runs):
  File found in cache?
  ├─ No → Analyze via API, cache result
  └─ Yes → File changed (size/time)?
     ├─ Yes → Re-analyze via API, update cache
     └─ No → Use cached categorization
- Cache Maintenance:
  - Removes entries older than 7 days on every run
  - Evicts oldest entries when limit (1000) is reached
  - Validates file still exists before using cache
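The cache check and maintenance rules above can be sketched in a few functions. The entry field names (`category`, `size`, `mtime`, `cached_at`) and the function names are hypothetical; only the 7-day expiry, the 1000-entry limit, and the size-plus-modification-time comparison come from this guide.

```python
import os
import time

MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # 7-day expiry
MAX_ENTRIES = 1000


def file_signature(path):
    """Cheap change detector: size and modification time, no content hash."""
    stat = os.stat(path)
    return stat.st_size, stat.st_mtime


def lookup(cache, path, now=None):
    """Return the cached category, or None when the file must be re-analyzed."""
    now = time.time() if now is None else now
    entry = cache.get(path)
    if entry is None or not os.path.exists(path):
        return None
    if now - entry["cached_at"] > MAX_AGE_SECONDS:
        return None
    if (entry["size"], entry["mtime"]) != file_signature(path):
        return None
    return entry["category"]


def store(cache, path, category, now=None):
    size, mtime = file_signature(path)
    cache[path] = {"category": category, "size": size, "mtime": mtime,
                   "cached_at": time.time() if now is None else now}


def prune(cache, now=None):
    """Drop expired entries, then evict the oldest until within MAX_ENTRIES."""
    now = time.time() if now is None else now
    expired = [k for k, e in cache.items()
               if now - e["cached_at"] > MAX_AGE_SECONDS]
    for key in expired:
        del cache[key]
    oldest_first = sorted(cache, key=lambda k: cache[k]["cached_at"])
    for key in oldest_first[:max(0, len(cache) - MAX_ENTRIES)]:
        del cache[key]
```

Comparing size and modification time avoids reading file contents, at the cost of missing an edit that preserves both values.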
Cache Benefits
- Reduced API Costs: Avoids re-analyzing unchanged files
- Faster Processing: No API call needed for cached files
- Efficient: Metadata-based change detection (no content hashing)
- Automatic Cleanup: Self-maintaining with age and size limits
When Cache is Invalidated
Cache entries are invalidated when:
- File size changes
- File modification time changes
- Cache entry is older than 7 days
- File no longer exists
- Cache is manually deleted
Undo Log System
NoEntropy tracks all file moves to enable undo functionality.
Undo Log Design
- Location: `~/.config/noentropy/data/undo_log.json`
- Format: JSON array of move records
- Retention: 30 days (automatically cleaned up)
- Max Entries: 1000 entries (oldest evicted)
- Status Tracking: Completed, Undone, Failed states
Move Record Structure
Each file move is recorded with:
- Source path (original location)
- Destination path (new location)
- Timestamp of move
- Status (completed/undone/failed)
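A record with these four fields could be modeled as below. The class names, JSON key names, and helper functions are illustrative assumptions; the on-disk schema of the actual undo log is not specified in this guide.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class MoveStatus(str, Enum):
    COMPLETED = "completed"
    UNDONE = "undone"
    FAILED = "failed"


@dataclass
class MoveRecord:
    source: str        # original location
    destination: str   # new location
    timestamp: float   # seconds since the epoch
    status: MoveStatus = MoveStatus.COMPLETED


def dump_log(records):
    """Serialize records to a JSON array (statuses become plain strings)."""
    return json.dumps([asdict(r) for r in records], indent=2)


def load_log(text):
    return [MoveRecord(r["source"], r["destination"], r["timestamp"],
                       MoveStatus(r["status"]))
            for r in json.loads(text)]
```

Deriving the status enum from `str` lets the standard `json` module write it without a custom encoder.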
How Undo Works
- During Organization:
  For each file moved:
  ├─ Record source path
  ├─ Record destination path
  ├─ Record timestamp
  └─ Mark as "completed"
- Undo Execution:
  Load undo log
  ├─ Filter "completed" moves (not already undone)
  ├─ Show preview to user
  ├─ Request confirmation
  └─ If confirmed:
     ├─ Check destination exists
     ├─ Check source doesn't exist (avoid conflicts)
     ├─ Move file back to source
     ├─ Mark as "undone"
     └─ Clean up empty directories
- Conflict Handling:
  - Source exists: Skip restore (prevent overwrite)
  - Destination missing: Skip restore (file was deleted)
  - Permission error: Skip restore, report error
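The undo flow and its conflict rules can be sketched as follows. The function name, the message strings, and the choice to process the newest moves first are assumptions for illustration; the preview and confirmation prompt are left out to keep the example short.

```python
import shutil
from pathlib import Path


def undo_moves(log: list[dict], dry_run: bool = False) -> list[str]:
    """Restore completed moves, newest first; return one message per record."""
    messages = []
    for record in reversed(log):
        if record["status"] != "completed":
            continue  # already undone, or the move never succeeded
        source = Path(record["source"])
        destination = Path(record["destination"])
        if not destination.exists():
            messages.append(f"skip (destination missing): {destination}")
        elif source.exists():
            messages.append(f"skip (source exists): {source}")
        elif dry_run:
            messages.append(f"would restore: {destination} -> {source}")
        else:
            try:
                source.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(destination), str(source))
                record["status"] = "undone"
                if not any(destination.parent.iterdir()):
                    destination.parent.rmdir()  # clean up the emptied folder
                messages.append(f"restored: {source}")
            except OSError as error:
                messages.append(f"skip (error): {source}: {error}")
    return messages
```

Each skip leaves the record marked as completed, so a later undo can retry it once the conflict is resolved.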
Undo Safety Features
- Preview Before Action: Always shows what will be undone
- Conflict Detection: Prevents data loss from overwrites
- Missing File Handling: Gracefully skips deleted files
- Partial Undo Support: Continues processing despite individual failures
- Empty Directory Cleanup: Removes empty folders after undo
- Dry-Run Mode: Preview undo without executing
Undo Limitations
- Only tracks moves made by NoEntropy
- Cannot track manual file operations
- Limited to 30-day history
- Cannot restore deleted files (undo only reverses moves)
Supported File Categories
NoEntropy can organize files into these default categories:
| Category | File Types |
|---|---|
| Images | PNG, JPG, JPEG, GIF, SVG, BMP, WEBP, ICO, TIFF |
| Documents | PDF, DOC, DOCX, TXT, MD, RTF, ODT, PAGES |
| Installers | EXE, DMG, APP, PKG, DEB, RPM, MSI, APK |
| Music | MP3, WAV, FLAC, M4A, AAC, OGG, WMA |
| Videos | MP4, AVI, MKV, MOV, WMV, FLV, WEBM |
| Archives | ZIP, TAR, GZ, RAR, 7Z, BZ2, XZ |
| Code | Source code and configuration files |
| Misc | Everything else |
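The table can also be read as an extension lookup. NoEntropy itself asks the AI to choose the category, so the sketch below is only an illustration of the mapping (for example, as an offline fallback); the names are hypothetical, and the Code category is omitted because the table lists no specific extensions for it.

```python
from pathlib import Path

# Extensions per category, copied from the table above.
CATEGORY_EXTENSIONS = {
    "Images": "png jpg jpeg gif svg bmp webp ico tiff",
    "Documents": "pdf doc docx txt md rtf odt pages",
    "Installers": "exe dmg app pkg deb rpm msi apk",
    "Music": "mp3 wav flac m4a aac ogg wma",
    "Videos": "mp4 avi mkv mov wmv flv webm",
    "Archives": "zip tar gz rar 7z bz2 xz",
}
EXTENSION_TO_CATEGORY = {
    ext: category
    for category, extensions in CATEGORY_EXTENSIONS.items()
    for ext in extensions.split()
}


def fallback_category(filename: str) -> str:
    """Map a filename to a default category by extension, else Misc."""
    extension = Path(filename).suffix.lstrip(".").lower()
    return EXTENSION_TO_CATEGORY.get(extension, "Misc")
```

Only the final suffix is considered, so `backup.tar.gz` is matched on `gz`.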
AI Integration
NoEntropy uses Google's Gemini API for intelligent categorization.
API Usage
- Model: Gemini 1.5 Flash (configurable)
- Concurrent Requests: 5 by default (configurable via `--max-concurrent`)
- Retry Logic: Exponential backoff for failed requests
- Rate Limiting: Respects API rate limits with configurable concurrency
Prompt Engineering
NoEntropy uses carefully crafted prompts to get accurate categorization:
- Initial Categorization Prompt:
  - Lists all filenames
  - Specifies available categories
  - Requests JSON response with categorization plan
- Deep Inspection Prompt:
  - Provides file content
  - Requests sub-folder suggestion based on content
  - Asks for semantic analysis, not just extension
Error Handling
- Network Errors: Retry with exponential backoff
- Rate Limiting: Respects limits, retries after delay
- Invalid Responses: Logs error, continues with other files
- Timeout: Configurable timeout with fallback behavior
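Retrying with exponential backoff means doubling the wait after each failed attempt. A minimal sketch, assuming a hypothetical `request` callable, a 1-second base delay, four attempts, and a small random jitter (none of these values are taken from NoEntropy's configuration):

```python
import random
import time


def call_with_backoff(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `request()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return request()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller log and move on
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 10))  # small jitter
```

The jitter spreads out retries from concurrent workers so they do not all hit the API again at the same instant.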
Performance Characteristics
Factors Affecting Performance
- Number of Files:
  - 10-50 files: ~10-30 seconds
  - 100-500 files: 1-3 minutes
  - 1000+ files: 5-10 minutes
- Concurrency Level:
  - Higher = faster but more API load
  - Lower = slower but safer for rate limits
  - Default (5) balances speed and safety
- Cache Hit Rate:
  - High hit rate (>80%): Significantly faster
  - Low hit rate (<20%): More API calls needed
  - Regular usage improves hit rate over time
- Text File Count:
  - More text files = more deep inspection
  - Deep inspection adds processing time
  - Concurrent processing mitigates this
Optimization Strategies
- Use caching: Regular runs benefit from cached results
- Adjust concurrency: Increase for faster processing
- Dry-run first: Test configuration without full processing
- Organize regularly: Smaller batches process faster
Architecture Diagram
┌─────────────────────────────────────────────────────────┐
│                      NoEntropy CLI                      │
│                     (Orchestrator)                      │
└────────────┬──────────────────────────────┬─────────────┘
             │                              │
    ┌────────▼─────────┐            ┌───────▼────────┐
    │   File Scanner   │            │ Config Manager │
    │    & Detector    │            │                │
    └────────┬─────────┘            └────────────────┘
             │
    ┌────────▼──────────────────────────────────────┐
    │               Gemini AI Client                │
    │  (with retry logic & concurrent processing)   │
    └────────┬──────────────────────────────────────┘
             │
    ┌────────▼─────────┐            ┌────────────────┐
    │   Cache System   │            │    Undo Log    │
    └────────┬─────────┘            └────────────────┘
             │
    ┌────────▼─────────┐
    │    File Mover    │
    └──────────────────┘
Next Steps
- Usage Guide - Learn how to use NoEntropy
- Configuration Guide - Configure NoEntropy
- Development Guide - Contribute to NoEntropy