docs: restructure documentation into organized files
Split the 630-line README.md into focused, well-organized documentation:

- README.md: Concise overview with quick start and links
- docs/INSTALLATION.md: Installation instructions and setup
- docs/CONFIGURATION.md: Configuration options and custom categories
- docs/USAGE.md: Command-line options and usage examples
- docs/HOW_IT_WORKS.md: Architecture and internal processes
- docs/TROUBLESHOOTING.md: Common issues and solutions
- docs/DEVELOPMENT.md: Project structure and development guide
- docs/CONTRIBUTING.md: Contribution guidelines and standards

Benefits:

- Main README is now clean and welcoming (~150 lines vs 630)
- Each doc has a clear, focused purpose
- Better navigation with cross-linking between docs
- Follows GitHub best practices with docs/ directory
- Easier to maintain and update specific sections
345
docs/HOW_IT_WORKS.md
Normal file
@@ -0,0 +1,345 @@
# How NoEntropy Works

This guide explains the internal architecture and processes that power NoEntropy's intelligent file organization.

## Overview

NoEntropy uses a multi-stage pipeline that combines AI-powered categorization with intelligent caching and concurrent processing to efficiently organize your files.

## Organization Process

NoEntropy follows a five-step process to organize your files:

```
┌───────────────────────────┐
│ 1. Scan Files             │ → Read all files in DOWNLOAD_FOLDER
└─────────────┬─────────────┘   (and subdirs if --recursive flag is used)
              ▼
┌───────────────────────────┐
│ 2. Initial Categorization │ → Ask Gemini to categorize by filename
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 3. Deep Inspection        │ → Read text files for sub-categories
│    (Concurrent)           │   • Reads file content
│                           │   • Asks AI for sub-folder
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 4. Preview & Confirm      │ → Show organization plan
│                           │   • Ask user approval
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 5. Execute Moves          │ → Move files to organized folders
└───────────────────────────┘
```

### Step 1: File Scanning

**What happens:**
- Scans the configured download folder
- Optionally scans subdirectories when the `--recursive` flag is set
- Collects file paths and metadata (size, modification time)
- Skips directories, so only files are collected

**Output:** List of file paths ready for categorization
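
The scan step can be pictured with a short Python sketch. It is only an illustration of the behaviour listed above, not NoEntropy's own code; `ScannedFile` and `scan_files` are hypothetical names.

```python
from pathlib import Path
from typing import NamedTuple


class ScannedFile(NamedTuple):
    path: Path
    size: int      # bytes
    mtime: float   # modification time, seconds since the epoch


def scan_files(folder: Path, recursive: bool = False) -> list[ScannedFile]:
    """Collect regular files (never directories) with their size and mtime."""
    entries = folder.rglob("*") if recursive else folder.iterdir()
    found = []
    for entry in entries:
        if entry.is_file():  # directories are skipped
            stat = entry.stat()
            found.append(ScannedFile(entry, stat.st_size, stat.st_mtime))
    return sorted(found)
```

With `recursive=False` only the top level of the folder is read, which mirrors running without `--recursive`.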

### Step 2: Initial Categorization

**What happens:**
- Sends the list of filenames to the Gemini API
- AI analyzes filenames and determines appropriate categories
- Returns a categorization plan for all files
- Uses custom categories if configured, otherwise uses defaults

**AI Prompt includes:**
- List of all filenames
- Available categories (default or custom)
- Instructions to categorize based on file name and type (file contents are not read until Step 3)
- Request for main category assignment

**Output:** Initial organization plan with main categories
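
As a rough sketch of this step, the Python below assembles a prompt from filenames and categories and validates a JSON reply. The prompt wording, the function names, and the fallback to `Misc` are assumptions made for illustration; the actual prompt and response schema may differ, and no API call is made here.

```python
import json

# Default categories as listed in this guide.
DEFAULT_CATEGORIES = ["Images", "Documents", "Installers", "Music",
                      "Videos", "Archives", "Code", "Misc"]


def build_prompt(filenames, categories=None):
    """Assemble a categorization prompt (wording is illustrative only)."""
    categories = categories or DEFAULT_CATEGORIES
    return (
        "Assign each file to exactly one category.\n"
        f"Categories: {', '.join(categories)}\n"
        "Reply with a JSON object mapping filename to category.\n"
        "Files:\n" + "\n".join(f"- {name}" for name in filenames)
    )


def parse_plan(response_text, filenames, categories=None):
    """Turn a JSON reply into {filename: category}.

    Filenames the reply omits, or maps to an unknown category, fall back
    to "Misc" (a choice made for this sketch).
    """
    categories = categories or DEFAULT_CATEGORIES
    raw = json.loads(response_text)
    return {
        name: raw[name] if raw.get(name) in categories else "Misc"
        for name in filenames
    }
```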

### Step 3: Deep Inspection

**What happens:**
- Identifies text-based files that can be read
- Concurrently reads file contents (up to `--max-concurrent` files at once)
- Sends content to Gemini AI for sub-folder suggestions
- AI analyzes content and suggests relevant sub-categories
- Applies intelligent retry logic with exponential backoff

**Supported text file formats:**
```
Source Code:   rs, py, js, ts, jsx, tsx, java, go, c, cpp, h, hpp, rb, php, swift, kt, scala, lua, r, m
Web/Config:    html, css, json, xml, yaml, yml, toml, ini, cfg, conf
Documentation: txt, md, sql, sh, bat, ps1, log
```
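
One plausible way to decide whether a file qualifies for deep inspection is an extension lookup against the list above. This is a sketch, not the actual detector; `is_text_file` is a hypothetical helper, and matching case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
TEXT_EXTENSIONS = frozenset(
    "rs py js ts jsx tsx java go c cpp h hpp rb php swift kt scala lua r m "
    "html css json xml yaml yml toml ini cfg conf "
    "txt md sql sh bat ps1 log".split()
)


def is_text_file(path: Path) -> bool:
    """True if the file extension is in the supported list (case-insensitive)."""
    return path.suffix.lower().lstrip(".") in TEXT_EXTENSIONS
```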

**Why concurrent?**
- Processes multiple files simultaneously
- Significantly reduces total processing time
- A configurable concurrency limit helps avoid hitting API rate limits

**Output:** Enhanced organization plan with sub-folders
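
The combination of a concurrency limit and exponential backoff can be sketched with `asyncio`. Everything here is illustrative: `suggest_subfolder` stands in for the real AI call, and the retry count and delays are assumed values, not NoEntropy's defaults (only the limit of 5 comes from this guide).

```python
import asyncio


async def with_backoff(call, retries=3, base_delay=0.5):
    """Retry an async call, doubling the delay after each failure."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except Exception:
            if attempt == retries:
                raise  # out of retries: give up on this file
            await asyncio.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


async def inspect_all(paths, suggest_subfolder, max_concurrent=5, **retry_opts):
    """Run suggest_subfolder(path) for every path, at most max_concurrent at once."""
    limit = asyncio.Semaphore(max_concurrent)

    async def inspect(path):
        async with limit:  # blocks while max_concurrent calls are in flight
            return path, await with_backoff(lambda: suggest_subfolder(path), **retry_opts)

    results = await asyncio.gather(*(inspect(p) for p in paths), return_exceptions=True)
    # Files whose inspection still failed are simply left without a sub-folder.
    return dict(r for r in results if not isinstance(r, BaseException))
```

The semaphore is what bounds the number of simultaneous requests; lowering `max_concurrent` trades speed for a smaller chance of being rate limited.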

### Step 4: Preview & Confirmation

**What happens:**
- Displays complete organization plan to user
- Shows source file and destination path for each file
- Waits for user confirmation (y/n)
- Allows user to review before any changes are made

**User options:**
- Accept: Proceed with organization
- Decline: Cancel and exit without changes

**Output:** User decision (proceed or abort)
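
A minimal sketch of the preview-and-confirm gate, assuming the plan is a list of `(source, destination)` pairs. The output format and the accepted answers are hypothetical; the point is that nothing proceeds without an explicit yes.

```python
def format_preview(plan):
    """Render one 'source -> destination' line per planned move."""
    return "\n".join(f"{src} -> {dst}" for src, dst in plan)


def confirm(prompt="Proceed? (y/n) ", ask=input):
    """Return True only on an explicit yes; any other answer aborts."""
    return ask(prompt).strip().lower() in {"y", "yes"}
```

Passing `ask` as a parameter keeps the function testable without a terminal.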

### Step 5: Execute Moves

**What happens:**
- Creates destination directories as needed
- Moves files to their designated locations
- Records each move in the undo log
- Reports success/failure for each operation
- Displays final summary statistics

**Safety features:**
- Only moves files after user confirmation
- Tracks all operations for undo capability
- Handles errors gracefully without stopping entire process
- Creates parent directories automatically

**Output:** Organized files and execution summary
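
The move step might look roughly like the following. This is a sketch under assumptions: the record fields follow the move record described in this guide, `execute_moves` is a hypothetical name, and refusing to overwrite an existing destination is a choice made for this example rather than documented behaviour.

```python
import shutil
import time
from pathlib import Path


def execute_moves(plan, undo_log):
    """Move each (source, destination) pair and record every attempt in undo_log.

    Returns (moved, failed) counts. One failure does not stop the rest.
    """
    moved = failed = 0
    for src, dst in plan:
        src, dst = Path(src), Path(dst)
        record = {"source": str(src), "destination": str(dst),
                  "timestamp": time.time(), "status": "completed"}
        try:
            if dst.exists():
                raise FileExistsError(dst)  # never overwrite (sketch's choice)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
            moved += 1
        except OSError as err:
            record["status"] = "failed"
            print(f"failed: {src}: {err}")
            failed += 1
        undo_log.append(record)
    return moved, failed
```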

## Caching System

NoEntropy includes an intelligent caching system to minimize API calls and improve performance.

### Cache Design

- **Location**: `.noentropy_cache.json` in project root
- **Format**: JSON with file path as key
- **Expiry**: 7 days (automatically cleaned up)
- **Max Entries**: 1000 entries (LRU eviction)
- **Change Detection**: File size + modification time (not content hash)

### How Caching Works

1. **First Run**:
   - Files are analyzed via Gemini API
   - Categorization results are cached with metadata

2. **Cache Check** (subsequent runs):
   ```
   File found in cache?
   ├─ No  → Analyze via API, cache result
   └─ Yes → File changed (size/time)?
             ├─ Yes → Re-analyze via API, update cache
             └─ No  → Use cached categorization
   ```

3. **Cache Maintenance**:
   - Removes entries older than 7 days on every run
   - Evicts oldest entries when limit (1000) is reached
   - Validates file still exists before using cache
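
The maintenance pass can be sketched as a pure function over the cache dictionary. The `cached_at` field name is an assumption, and this version evicts by cache time; a strict LRU policy would also need to track when each entry was last read.

```python
import time

MAX_AGE_SECONDS = 7 * 24 * 60 * 60   # 7-day expiry
MAX_ENTRIES = 1000


def prune_cache(cache, now=None, max_age=MAX_AGE_SECONDS, max_entries=MAX_ENTRIES):
    """Drop expired entries, then keep only the newest max_entries.

    `cache` maps file path -> entry dict with a "cached_at" timestamp.
    """
    now = time.time() if now is None else now
    fresh = {p: e for p, e in cache.items() if now - e["cached_at"] <= max_age}
    newest_first = sorted(fresh.items(), key=lambda kv: kv[1]["cached_at"], reverse=True)
    return dict(newest_first[:max_entries])
```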

### Cache Benefits

- **Reduced API Costs**: Avoids re-analyzing unchanged files
- **Faster Processing**: No API call needed for cached files
- **Efficient**: Metadata-based change detection (no content hashing)
- **Automatic Cleanup**: Self-maintaining with age and size limits

### When Cache is Invalidated

Cache entries are invalidated when:
- File size changes
- File modification time changes
- Cache entry is older than 7 days
- File no longer exists
- Cache is manually deleted
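
Put together, the invalidation rules amount to a lookup like the one below. It is a sketch only: the entry fields (`size`, `mtime`, `cached_at`, `category`) are assumed names, not the actual cache schema.

```python
import os
import time

MAX_AGE_SECONDS = 7 * 24 * 60 * 60   # 7-day expiry


def cached_category(cache, path, now=None):
    """Return the cached category for path, or None if it must be re-analyzed."""
    entry = cache.get(path)
    if entry is None:
        return None                      # never analyzed, or cache was deleted
    try:
        stat = os.stat(path)
    except FileNotFoundError:
        return None                      # file no longer exists
    now = time.time() if now is None else now
    if now - entry["cached_at"] > MAX_AGE_SECONDS:
        return None                      # entry is older than 7 days
    if (stat.st_size, stat.st_mtime) != (entry["size"], entry["mtime"]):
        return None                      # size or modification time changed
    return entry["category"]
```

Because only metadata is compared, a file rewritten with identical size and modification time would still count as unchanged; that is the trade-off of skipping content hashing.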

## Undo Log System

NoEntropy tracks all file moves to enable undo functionality.

### Undo Log Design

- **Location**: `~/.config/noentropy/data/undo_log.json`
- **Format**: JSON array of move records
- **Retention**: 30 days (automatically cleaned up)
- **Max Entries**: 1000 entries (oldest evicted)
- **Status Tracking**: Completed, Undone, Failed states

### Move Record Structure

Each file move is recorded with:
- Source path (original location)
- Destination path (new location)
- Timestamp of move
- Status (completed/undone/failed)
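
For illustration, a move record with these four fields could be modelled and serialized as a JSON array like this. The field names and the timestamp representation are assumptions; the on-disk format of `undo_log.json` may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MoveRecord:
    source: str                  # original location
    destination: str             # new location
    timestamp: float             # seconds since the epoch
    status: str = "completed"    # "completed", "undone" or "failed"


def dump_log(records):
    """Serialize records as a JSON array of objects."""
    return json.dumps([asdict(r) for r in records], indent=2)


def load_log(text):
    """Parse a JSON array back into MoveRecord objects."""
    return [MoveRecord(**item) for item in json.loads(text)]
```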

### How Undo Works

1. **During Organization**:
   ```
   For each file moved:
   ├─ Record source path
   ├─ Record destination path
   ├─ Record timestamp
   └─ Mark as "completed"
   ```

2. **Undo Execution**:
   ```
   Load undo log
   ├─ Filter "completed" moves (not already undone)
   ├─ Show preview to user
   ├─ Request confirmation
   └─ If confirmed:
      ├─ Check destination exists
      ├─ Check source doesn't exist (avoid conflicts)
      ├─ Move file back to source
      ├─ Mark as "undone"
      └─ Clean up empty directories
   ```

3. **Conflict Handling**:
   - **Source exists**: Skip restore (prevent overwrite)
   - **Destination missing**: Skip restore (file was deleted)
   - **Permission error**: Skip restore, report error
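
The restore loop and its conflict rules can be sketched as follows. The preview and confirmation steps are left out, records are plain dicts with assumed field names, and undoing the newest moves first is a choice made for this example.

```python
import os
import shutil


def undo_moves(log):
    """Restore every "completed" move in log (a list of dicts); return the count."""
    restored = 0
    for record in reversed(log):           # newest moves first (sketch's choice)
        if record["status"] != "completed":
            continue                       # already undone, or the move failed
        src, dst = record["source"], record["destination"]
        if not os.path.exists(dst):
            continue                       # destination missing: file was deleted
        if os.path.exists(src):
            continue                       # source exists: never overwrite
        try:
            os.makedirs(os.path.dirname(src) or ".", exist_ok=True)
            shutil.move(dst, src)
        except OSError as err:
            print(f"could not restore {src}: {err}")
            continue                       # e.g. permission error: report and skip
        record["status"] = "undone"
        restored += 1
        try:
            os.rmdir(os.path.dirname(dst))  # succeeds only if the folder is now empty
        except OSError:
            pass
    return restored
```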

### Undo Safety Features

- **Preview Before Action**: Always shows what will be undone
- **Conflict Detection**: Prevents data loss from overwrites
- **Missing File Handling**: Gracefully skips deleted files
- **Partial Undo Support**: Continues processing despite individual failures
- **Empty Directory Cleanup**: Removes empty folders after undo
- **Dry-Run Mode**: Preview undo without executing

### Undo Limitations

- Only tracks moves made by NoEntropy
- Cannot track manual file operations
- Limited to 30-day history
- Cannot restore deleted files (only moves can be reversed)

## Supported File Categories

NoEntropy can organize files into these default categories:

| Category | File Types |
|----------|------------|
| **Images** | PNG, JPG, JPEG, GIF, SVG, BMP, WEBP, ICO, TIFF |
| **Documents** | PDF, DOC, DOCX, TXT, MD, RTF, ODT, PAGES |
| **Installers** | EXE, DMG, APP, PKG, DEB, RPM, MSI, APK |
| **Music** | MP3, WAV, FLAC, M4A, AAC, OGG, WMA |
| **Videos** | MP4, AVI, MKV, MOV, WMV, FLV, WEBM |
| **Archives** | ZIP, TAR, GZ, RAR, 7Z, BZ2, XZ |
| **Code** | Source code and configuration files |
| **Misc** | Everything else |

## AI Integration

NoEntropy uses Google's Gemini API for intelligent categorization.

### API Usage

- **Model**: Gemini 1.5 Flash (configurable)
- **Concurrent Requests**: 5 by default (configurable via `--max-concurrent`)
- **Retry Logic**: Exponential backoff for failed requests
- **Rate Limiting**: Respects API rate limits with configurable concurrency

### Prompt Engineering

NoEntropy uses carefully crafted prompts to get accurate categorization:

1. **Initial Categorization Prompt**:
   - Lists all filenames
   - Specifies available categories
   - Requests JSON response with categorization plan

2. **Deep Inspection Prompt**:
   - Provides file content
   - Requests sub-folder suggestion based on content
   - Asks for semantic analysis, not just extension

### Error Handling

- **Network Errors**: Retry with exponential backoff
- **Rate Limiting**: Respects limits, retries after delay
- **Invalid Responses**: Logs error, continues with other files
- **Timeout**: Configurable timeout with fallback behavior

## Performance Characteristics

### Factors Affecting Performance

1. **Number of Files** (approximate run times):
   - 10-50 files: ~10-30 seconds
   - 100-500 files: ~1-3 minutes
   - 1000+ files: ~5-10 minutes

2. **Concurrency Level**:
   - Higher = faster but more API load
   - Lower = slower but safer for rate limits
   - Default (5) balances speed and safety

3. **Cache Hit Rate**:
   - High hit rate (>80%): Significantly faster
   - Low hit rate (<20%): More API calls needed
   - Regular usage improves hit rate over time

4. **Text File Count**:
   - More text files = more deep inspection
   - Deep inspection adds processing time
   - Concurrent processing mitigates this

### Optimization Strategies

1. **Use caching**: Regular runs benefit from cached results
2. **Adjust concurrency**: Increase `--max-concurrent` for faster processing; lower it if you hit rate limits
3. **Dry-run first**: Test configuration without full processing
4. **Organize regularly**: Smaller batches process faster

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────┐
│                      NoEntropy CLI                      │
│                     (Orchestrator)                      │
└────────────┬──────────────────────────────┬─────────────┘
             │                              │
    ┌────────▼─────────┐            ┌───────▼────────┐
    │  File Scanner    │            │ Config Manager │
    │  & Detector      │            │                │
    └────────┬─────────┘            └────────────────┘
             │
    ┌────────▼──────────────────────────────────────┐
    │               Gemini AI Client                │
    │  (with retry logic & concurrent processing)   │
    └────────┬──────────────────────────────────────┘
             │
    ┌────────▼─────────┐            ┌────────────────┐
    │  Cache System    │            │   Undo Log     │
    └────────┬─────────┘            └────────────────┘
             │
    ┌────────▼─────────┐
    │   File Mover     │
    └──────────────────┘
```

## Next Steps

- [Usage Guide](USAGE.md) - Learn how to use NoEntropy
- [Configuration Guide](CONFIGURATION.md) - Configure NoEntropy
- [Development Guide](DEVELOPMENT.md) - Contribute to NoEntropy

---

[Back to Main README](../README.md)