# How NoEntropy Works

This guide explains the internal architecture and processes that power NoEntropy's intelligent file organization.

## Overview

NoEntropy uses a multi-stage pipeline that combines AI-powered categorization with intelligent caching and concurrent processing to efficiently organize your files.

## Organization Process

NoEntropy follows a five-step process to organize your files:

```
┌───────────────────────────┐
│ 1. Scan Files             │ → Read all files in DOWNLOAD_FOLDER
└─────────────┬─────────────┘   (and subdirs if --recursive flag is used)
              ▼
┌───────────────────────────┐
│ 2. Initial Categorization │ → Ask Gemini to categorize by filename
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 3. Deep Inspection        │ → Read text files for sub-categories
│    (Concurrent)           │   • Reads file content
│                           │   • Asks AI for sub-folder
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 4. Preview & Confirm      │ → Show organization plan
│                           │   • Ask user approval
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 5. Execute Moves          │ → Move files to organized folders
└───────────────────────────┘
```

### Step 1: File Scanning

**What happens:**
- Scans the configured download folder
- Optionally scans subdirectories when the `--recursive` flag is set
- Collects file paths and metadata (size, modification time)
- Skips directories so that only files are collected

**Output:** List of file paths ready for categorization
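
The scanning step can be sketched as follows. This is an illustrative Python sketch, not NoEntropy's actual code; the names `ScannedFile` and `scan_files` are hypothetical, and the `recursive` parameter stands in for the `--recursive` flag.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ScannedFile:
    path: Path
    size: int        # bytes
    modified: float  # modification time, seconds since the epoch


def scan_files(folder: Path, recursive: bool = False) -> list[ScannedFile]:
    """Collect files (not directories) together with size and mtime."""
    entries = folder.rglob("*") if recursive else folder.iterdir()
    found = []
    for entry in entries:
        if entry.is_file():  # skip directories
            stat = entry.stat()
            found.append(ScannedFile(entry, stat.st_size, stat.st_mtime))
    return sorted(found, key=lambda f: f.path)
```

The size and modification time gathered here are the same two values the caching system later uses for change detection.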

### Step 2: Initial Categorization

**What happens:**
- Sends list of filenames to Gemini API
- AI analyzes filenames and determines appropriate categories
- Returns a categorization plan for all files
- Uses custom categories if configured, otherwise uses defaults

**AI Prompt includes:**
- List of all filenames
- Available categories (default or custom)
- Instructions to categorize based on file type and content
- Request for main category assignment

**Output:** Initial organization plan with main categories
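
As a rough illustration of this step, the sketch below builds a prompt from filenames and categories and validates a JSON reply. The prompt wording, the function names `build_prompt` and `parse_plan`, and the fallback to `Misc` for unrecognized answers are all assumptions for illustration; NoEntropy's real prompt and parsing may differ, and no API call is made here.

```python
import json

DEFAULT_CATEGORIES = ["Images", "Documents", "Installers", "Music",
                      "Videos", "Archives", "Code", "Misc"]


def build_prompt(filenames, categories=DEFAULT_CATEGORIES):
    """Assemble a prompt listing filenames and the allowed categories."""
    return (
        "Assign each file to exactly one of these categories: "
        + ", ".join(categories) + ".\n"
        "Respond with a JSON object mapping filename to category.\n"
        "Files:\n" + "\n".join(f"- {name}" for name in filenames)
    )


def parse_plan(response_text, filenames, categories=DEFAULT_CATEGORIES):
    """Turn a JSON reply into a plan; unknown or missing answers go to Misc."""
    raw = json.loads(response_text)
    plan = {}
    for name in filenames:
        category = raw.get(name)
        plan[name] = category if category in categories else "Misc"
    return plan
```

Validating the reply against the known category list keeps a malformed or creative model answer from producing unexpected folders.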

### Step 3: Deep Inspection

**What happens:**
- Identifies text-based files that can be read
- Concurrently reads file contents (up to `--max-concurrent` files at once)
- Sends content to Gemini AI for sub-folder suggestions
- AI analyzes content and suggests relevant sub-categories
- Retries failed requests with exponential backoff

**Supported text file formats:**
```
Source Code:   rs, py, js, ts, jsx, tsx, java, go, c, cpp, h, hpp, rb, php, swift, kt, scala, lua, r, m
Web/Config:    html, css, json, xml, yaml, yml, toml, ini, cfg, conf
Documentation: txt, md, sql, sh, bat, ps1, log
```

**Why concurrent?**
- Processes multiple files simultaneously
- Significantly reduces total processing time
- A configurable concurrency limit helps keep requests within API rate limits

**Output:** Enhanced organization plan with sub-folders
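
A minimal sketch of bounded concurrent inspection, assuming a thread pool and a caller-supplied `suggest_subfolder` function that stands in for the Gemini request. Both are illustrative choices rather than NoEntropy's implementation; `max_concurrent` mirrors the `--max-concurrent` option.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

TEXT_EXTENSIONS = {
    "rs", "py", "js", "ts", "jsx", "tsx", "java", "go", "c", "cpp", "h", "hpp",
    "rb", "php", "swift", "kt", "scala", "lua", "r", "m",
    "html", "css", "json", "xml", "yaml", "yml", "toml", "ini", "cfg", "conf",
    "txt", "md", "sql", "sh", "bat", "ps1", "log",
}


def is_text_file(path: Path) -> bool:
    """Decide by extension only, case-insensitively."""
    return path.suffix.lstrip(".").lower() in TEXT_EXTENSIONS


def inspect_all(paths, suggest_subfolder, max_concurrent=5):
    """Read each text file and ask `suggest_subfolder` about it, in parallel."""
    def inspect(path):
        content = path.read_text(errors="replace")
        return path, suggest_subfolder(path.name, content)

    targets = [p for p in paths if is_text_file(p)]
    # The pool size caps how many requests are in flight at once.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return dict(pool.map(inspect, targets))
```

Because the pool never runs more than `max_concurrent` workers, raising the limit trades more API load for shorter wall-clock time.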

### Step 4: Preview & Confirmation

**What happens:**
- Displays complete organization plan to user
- Shows source file and destination path for each file
- Waits for user confirmation (y/n)
- Allows user to review before any changes are made

**User options:**
- Accept: Proceed with organization
- Decline: Cancel and exit without changes

**Output:** User decision (proceed or abort)
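
The confirmation gate might look like the sketch below. The prompt text and the helper names `format_plan` and `confirm` are hypothetical; the `ask` and `show` parameters exist only so the logic can be exercised without a terminal.

```python
def format_plan(plan):
    """Render one 'source -> destination' line per file."""
    return "\n".join(f"{src} -> {dst}" for src, dst in plan.items())


def confirm(plan, ask=input, show=print):
    """Show the plan and return True only on an explicit yes."""
    show(format_plan(plan))
    answer = ask("Proceed with organization? (y/n): ")
    return answer.strip().lower() in ("y", "yes")
```

Treating anything other than an explicit yes as a decline means an empty or mistyped answer leaves the files untouched.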

### Step 5: Execute Moves

**What happens:**
- Creates destination directories as needed
- Moves files to their designated locations
- Records each move in the undo log
- Reports success/failure for each operation
- Displays final summary statistics

**Safety features:**
- Only moves files after user confirmation
- Tracks all operations for undo capability
- Handles errors gracefully without stopping entire process
- Creates parent directories automatically

**Output:** Organized files and execution summary
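
A sketch of the move loop under stated assumptions: the undo log is modeled as a plain list of dictionaries, and the record keys (`source`, `destination`, `timestamp`, `status`) are illustrative names, not the project's actual schema.

```python
import shutil
import time
from pathlib import Path


def execute_moves(plan, undo_log):
    """Move each file and log every attempt; one failure does not stop the rest."""
    moved = failed = 0
    for source, destination in plan.items():
        source, destination = Path(source), Path(destination)
        record = {"source": str(source), "destination": str(destination),
                  "timestamp": time.time(), "status": "completed"}
        try:
            # Create the category folder (and any parents) on demand.
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(source), str(destination))
            moved += 1
        except OSError as error:
            record["status"] = "failed"
            print(f"Failed to move {source}: {error}")
            failed += 1
        undo_log.append(record)
    return moved, failed
```

The returned counts are enough to print a final summary, and the log entries carry what an undo needs later.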

## Caching System

NoEntropy includes an intelligent caching system to minimize API calls and improve performance.

### Cache Design

- **Location**: `.noentropy_cache.json` in project root
- **Format**: JSON with file path as key
- **Expiry**: 7 days (automatically cleaned up)
- **Max Entries**: 1000 entries (LRU eviction)
- **Change Detection**: File size + modification time (not content hash)

### How Caching Works

1. **First Run**:
   - Files are analyzed via Gemini API
   - Categorization results are cached with metadata

2. **Cache Check** (subsequent runs):
   ```
   File found in cache?
   ├─ No  → Analyze via API, cache result
   └─ Yes → File changed (size/time)?
            ├─ Yes → Re-analyze via API, update cache
            └─ No  → Use cached categorization
   ```

3. **Cache Maintenance**:
   - Removes entries older than 7 days on every run
   - Evicts oldest entries when limit (1000) is reached
   - Validates file still exists before using cache
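
The cache check above can be sketched with an in-memory dictionary keyed by file path. The entry fields (`category`, `size`, `mtime`, `cached_at`) are assumed names, and eviction here simply drops the entry with the oldest cache time, which is a simplification of whatever eviction policy NoEntropy applies.

```python
import os
import time

MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # 7-day expiry
MAX_ENTRIES = 1000


def lookup(cache, path, now=None):
    """Return the cached category, or None if the entry is missing or stale."""
    now = time.time() if now is None else now
    entry = cache.get(path)
    if entry is None or not os.path.exists(path):
        return None
    stat = os.stat(path)
    unchanged = (entry["size"], entry["mtime"]) == (stat.st_size, stat.st_mtime)
    fresh = now - entry["cached_at"] <= MAX_AGE_SECONDS
    return entry["category"] if unchanged and fresh else None


def store(cache, path, category, now=None):
    """Cache a result with the file's metadata, evicting the oldest if full."""
    now = time.time() if now is None else now
    stat = os.stat(path)
    cache[path] = {"category": category, "size": stat.st_size,
                   "mtime": stat.st_mtime, "cached_at": now}
    while len(cache) > MAX_ENTRIES:
        oldest = min(cache, key=lambda key: cache[key]["cached_at"])
        del cache[oldest]
```

A `None` result from `lookup` means the file has to be sent to the API again, after which `store` refreshes the entry.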

### Cache Benefits

- **Reduced API Costs**: Avoids re-analyzing unchanged files
- **Faster Processing**: No API call needed for cached files
- **Efficient**: Metadata-based change detection (no content hashing)
- **Automatic Cleanup**: Self-maintaining with age and size limits

### When Cache is Invalidated

Cache entries are invalidated when:
- File size changes
- File modification time changes
- Cache entry is older than 7 days
- File no longer exists
- Cache is manually deleted

## Undo Log System

NoEntropy tracks all file moves to enable undo functionality.

### Undo Log Design

- **Location**: `~/.config/noentropy/data/undo_log.json`
- **Format**: JSON array of move records
- **Retention**: 30 days (automatically cleaned up)
- **Max Entries**: 1000 entries (oldest evicted)
- **Status Tracking**: Completed, Undone, Failed states

### Move Record Structure

Each file move is recorded with:
- Source path (original location)
- Destination path (new location)
- Timestamp of move
- Status (completed/undone/failed)
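
One possible shape for such a record, with the retention rules applied, is sketched below. The field names and the helpers `prune` and `to_json` are assumptions for illustration; the actual layout of `undo_log.json` may differ.

```python
import json
from dataclasses import asdict, dataclass

RETENTION_SECONDS = 30 * 24 * 60 * 60  # 30-day retention
MAX_ENTRIES = 1000


@dataclass
class MoveRecord:
    source: str
    destination: str
    timestamp: float           # seconds since the epoch
    status: str = "completed"  # "completed", "undone", or "failed"


def prune(records, now):
    """Drop records past retention, then keep only the newest MAX_ENTRIES."""
    recent = [r for r in records if now - r.timestamp <= RETENTION_SECONDS]
    recent.sort(key=lambda r: r.timestamp)
    return recent[-MAX_ENTRIES:]


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps([asdict(r) for r in records], indent=2)
```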

### How Undo Works

1. **During Organization**:
   ```
   For each file moved:
   ├─ Record source path
   ├─ Record destination path
   ├─ Record timestamp
   └─ Mark as "completed"
   ```

2. **Undo Execution**:
   ```
   Load undo log
   ├─ Filter "completed" moves (not already undone)
   ├─ Show preview to user
   ├─ Request confirmation
   └─ If confirmed:
      ├─ Check destination exists
      ├─ Check source doesn't exist (avoid conflicts)
      ├─ Move file back to source
      ├─ Mark as "undone"
      └─ Clean up empty directories
   ```

3. **Conflict Handling**:
   - **Source exists**: Skip restore (prevent overwrite)
   - **Destination missing**: Skip restore (file was deleted)
   - **Permission error**: Skip restore, report error
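
The restore loop and its conflict checks can be sketched as follows, assuming records are dictionaries with `source`, `destination`, and `status` keys (hypothetical names). Preview and confirmation are left out to keep the sketch short, and recreating the original parent folder is an added assumption.

```python
import shutil
from pathlib import Path


def undo_moves(records):
    """Restore completed moves; skip conflicts instead of overwriting."""
    restored = skipped = 0
    for record in records:
        if record["status"] != "completed":
            continue  # already undone, or the move never succeeded
        source, destination = Path(record["source"]), Path(record["destination"])
        if not destination.exists() or source.exists():
            skipped += 1  # file was deleted, or original location is occupied
            continue
        try:
            source.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(destination), str(source))
        except OSError as error:
            print(f"Could not restore {destination}: {error}")
            skipped += 1
            continue
        record["status"] = "undone"
        restored += 1
        try:
            destination.parent.rmdir()  # succeeds only if the folder is empty
        except OSError:
            pass
    return restored, skipped
```

Checking both paths before moving is what keeps an undo from overwriting a file that appeared at the original location in the meantime.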

### Undo Safety Features

- **Preview Before Action**: Always shows what will be undone
- **Conflict Detection**: Prevents data loss from overwrites
- **Missing File Handling**: Gracefully skips deleted files
- **Partial Undo Support**: Continues processing despite individual failures
- **Empty Directory Cleanup**: Removes empty folders after undo
- **Dry-Run Mode**: Preview undo without executing

### Undo Limitations

- Only tracks moves made by NoEntropy
- Cannot track manual file operations
- Limited to 30-day history
- Cannot restore deleted files (only moves)

## Supported File Categories

NoEntropy can organize files into these default categories:

| Category | File Types |
|----------|------------|
| **Images** | PNG, JPG, JPEG, GIF, SVG, BMP, WEBP, ICO, TIFF |
| **Documents** | PDF, DOC, DOCX, TXT, MD, RTF, ODT, PAGES |
| **Installers** | EXE, DMG, APP, PKG, DEB, RPM, MSI, APK |
| **Music** | MP3, WAV, FLAC, M4A, AAC, OGG, WMA |
| **Videos** | MP4, AVI, MKV, MOV, WMV, FLV, WEBM |
| **Archives** | ZIP, TAR, GZ, RAR, 7Z, BZ2, XZ |
| **Code** | Source code and configuration files |
| **Misc** | Everything else |

## AI Integration

NoEntropy uses Google's Gemini API for intelligent categorization.

### API Usage

- **Model**: Gemini 1.5 Flash (configurable)
- **Concurrent Requests**: 5 by default (configurable via `--max-concurrent`)
- **Retry Logic**: Exponential backoff for failed requests
- **Rate Limiting**: Respects API rate limits with configurable concurrency

### Prompt Engineering

NoEntropy uses carefully crafted prompts to get accurate categorization:

1. **Initial Categorization Prompt**:
   - Lists all filenames
   - Specifies available categories
   - Requests JSON response with categorization plan

2. **Deep Inspection Prompt**:
   - Provides file content
   - Requests sub-folder suggestion based on content
   - Asks for semantic analysis, not just extension

### Error Handling

- **Network Errors**: Retry with exponential backoff
- **Rate Limiting**: Respects limits, retries after delay
- **Invalid Responses**: Logs error, continues with other files
- **Timeout**: Configurable timeout with fallback behavior
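
Retrying with exponential backoff can be sketched generically as below. The attempt count, base delay, and small random jitter are illustrative assumptions, not NoEntropy's configured values; `call` is any zero-argument function that performs the request.

```python
import random
import time


def with_retry(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `call`, retrying on failure with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            # A little jitter keeps concurrent workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

Doubling the delay after each failure gives a rate-limited or briefly unavailable API progressively more time to recover before the next attempt.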

## Performance Characteristics

### Factors Affecting Performance

1. **Number of Files**:
   - 10-50 files: ~10-30 seconds
   - 100-500 files: 1-3 minutes
   - 1000+ files: 5-10 minutes

2. **Concurrency Level**:
   - Higher = faster but more API load
   - Lower = slower but safer for rate limits
   - Default (5) balances speed and safety

3. **Cache Hit Rate**:
   - High hit rate (>80%): Significantly faster
   - Low hit rate (<20%): More API calls needed
   - Regular usage improves hit rate over time

4. **Text File Count**:
   - More text files = more deep inspection
   - Deep inspection adds processing time
   - Concurrent processing mitigates this

### Optimization Strategies

1. **Use caching**: Regular runs benefit from cached results
2. **Adjust concurrency**: Increase for faster processing, or lower it if you hit rate limits
3. **Dry-run first**: Test configuration without full processing
4. **Organize regularly**: Smaller batches process faster

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────┐
│                      NoEntropy CLI                      │
│                     (Orchestrator)                      │
└────────────┬──────────────────────────────┬─────────────┘
             │                              │
    ┌────────▼─────────┐            ┌───────▼────────┐
    │ File Scanner     │            │ Config Manager │
    │ & Detector       │            │                │
    └────────┬─────────┘            └────────────────┘
             │
    ┌────────▼─────────────────────────────────────┐
    │              Gemini AI Client                │
    │  (with retry logic & concurrent processing)  │
    └────────┬─────────────────────────────────────┘
             │
    ┌────────▼─────────┐            ┌────────────────┐
    │ Cache System     │            │ Undo Log       │
    └──────────────────┘            └────────────────┘
             │
    ┌────────▼─────────┐
    │ File Mover       │
    └──────────────────┘
```

## Next Steps

- [Usage Guide](USAGE.md) - Learn how to use NoEntropy
- [Configuration Guide](CONFIGURATION.md) - Configure NoEntropy
- [Development Guide](DEVELOPMENT.md) - Contribute to NoEntropy

---

[Back to Main README](../README.md)