# How NoEntropy Works
This guide explains the internal architecture and processes that power NoEntropy's intelligent file organization.
## Overview
NoEntropy uses a multi-stage pipeline that combines AI-powered categorization with intelligent caching and concurrent processing to efficiently organize your files.
## Organization Process
NoEntropy follows a five-step process to organize your files:
```
┌───────────────────────────┐
│ 1. Scan Files             │ → Read all files in DOWNLOAD_FOLDER
└─────────────┬─────────────┘   (and subdirs if --recursive flag is used)
              │
              ▼
┌───────────────────────────┐
│ 2. Initial Categorization │ → Ask Gemini to categorize by filename
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│ 3. Deep Inspection        │ → Read text files for sub-categories
│    (Concurrent)           │   • Reads file content
│                           │   • Asks AI for sub-folder
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│ 4. Preview & Confirm      │ → Show organization plan
│                           │   • Ask user approval
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│ 5. Execute Moves          │ → Move files to organized folders
└───────────────────────────┘
```
### Step 1: File Scanning
**What happens:**
- Scans the configured download folder
- Optionally scans subdirectories when the `--recursive` flag is set
- Collects file paths and metadata (size, modification time)
- Skips directories so that only regular files are collected

**Output:** List of file paths ready for categorization
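
The scanning step can be sketched in a few lines. This is only an illustration in Python, not NoEntropy's actual code: the `ScannedFile` record and the `scan_files` function are hypothetical names, and the `recursive` argument stands in for the `--recursive` flag.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ScannedFile:
    """Hypothetical record for one scanned file (not NoEntropy's real type)."""
    path: Path
    size: int      # size in bytes
    mtime: float   # modification time, seconds since the epoch


def scan_files(folder: Path, recursive: bool = False) -> list[ScannedFile]:
    """Collect regular files in `folder`, descending into subdirectories only if asked."""
    candidates = folder.rglob("*") if recursive else folder.iterdir()
    results = []
    for entry in candidates:
        if not entry.is_file():  # skip directories and other non-files
            continue
        info = entry.stat()
        results.append(ScannedFile(entry, info.st_size, info.st_mtime))
    # Sort for a stable, reproducible order.
    return sorted(results, key=lambda f: str(f.path))
```

The size and modification time gathered here are the same metadata the caching system later uses for change detection.
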
### Step 2: Initial Categorization
**What happens:**
- Sends the list of filenames to the Gemini API
- The AI analyzes the filenames and assigns each file a category
- Returns a categorization plan covering all files
- Uses custom categories if configured, otherwise the defaults

**AI prompt includes:**
- List of all filenames
- Available categories (default or custom)
- Instructions to categorize based on file type and content
- Request for a main category assignment

**Output:** Initial organization plan with main categories
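
To make the prompt contents concrete, here is a minimal Python sketch of how such a prompt could be assembled and how a JSON reply could be turned into a plan. The wording of the prompt, the `{"files": [...]}` response shape, and the function names are all assumptions made for illustration; the real prompt and schema may differ. No API call is made here.

```python
import json


def build_categorization_prompt(filenames: list[str], categories: list[str]) -> str:
    """Assemble a prompt from the filenames, the allowed categories, and instructions."""
    lines = [
        "Categorize each file into exactly one of these categories:",
        ", ".join(categories),
        "",
        "Files:",
        *[f"- {name}" for name in filenames],
        "",
        # Assumed response shape, chosen for this example only.
        'Respond with JSON only, shaped like '
        '{"files": [{"filename": "...", "category": "..."}]}.',
    ]
    return "\n".join(lines)


def parse_plan(response_text: str, categories: list[str], fallback: str = "Misc") -> dict[str, str]:
    """Turn the model's JSON reply into {filename: category}.

    Categories the model invents are replaced by `fallback`.
    """
    data = json.loads(response_text)
    plan = {}
    for item in data["files"]:
        category = item["category"]
        plan[item["filename"]] = category if category in categories else fallback
    return plan
```

Validating the returned category against the configured list keeps a stray model answer from creating an unexpected folder.
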
### Step 3: Deep Inspection
**What happens:**
- Identifies text-based files that can be read
- Reads file contents concurrently (up to `--max-concurrent` files at once)
- Sends the content to the Gemini API for sub-folder suggestions
- The AI analyzes the content and suggests relevant sub-categories
- Retries failed requests with exponential backoff

**Supported text file formats:**
```
Source Code: rs, py, js, ts, jsx, tsx, java, go, c, cpp, h, hpp, rb, php, swift, kt, scala, lua, r, m
Web/Config: html, css, json, xml, yaml, yml, toml, ini, cfg, conf
Documentation: txt, md, sql, sh, bat, ps1, log
```

**Why concurrent?**
- Processes multiple files simultaneously
- Reduces total processing time compared with inspecting files one at a time
- A configurable concurrency limit helps avoid hitting API rate limits

**Output:** Enhanced organization plan with sub-folders
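
Bounded concurrency of this kind is commonly built with a semaphore. The Python sketch below shows the idea under stated assumptions: `suggest_subfolder` is a hypothetical stand-in for the Gemini request, `TEXT_EXTENSIONS` is only a subset of the formats listed above, and `max_concurrent` mirrors the `--max-concurrent` option.

```python
import asyncio
from pathlib import Path

# Subset of the supported extensions, shortened for the example.
TEXT_EXTENSIONS = {"rs", "py", "md", "txt", "json", "toml"}


def is_text_file(path: Path) -> bool:
    """True if the file extension is on the text-file list."""
    return path.suffix.lower().lstrip(".") in TEXT_EXTENSIONS


async def inspect_all(paths, suggest_subfolder, max_concurrent: int = 5) -> dict:
    """Call `suggest_subfolder(path, content)` for each text file.

    At most `max_concurrent` inspections are in flight at any time.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def inspect(path: Path):
        async with semaphore:  # wait here while the limit is reached
            # Read in a worker thread so file I/O does not block the event loop.
            content = await asyncio.to_thread(
                path.read_text, encoding="utf-8", errors="replace"
            )
            return path, await suggest_subfolder(path, content)

    tasks = [inspect(p) for p in paths if is_text_file(p)]
    return dict(await asyncio.gather(*tasks))
```

Raising the limit shortens wall-clock time until the API starts rejecting requests; lowering it trades speed for fewer rate-limit errors.
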
### Step 4: Preview & Confirmation
**What happens:**
- Displays the complete organization plan to the user
- Shows the source file and destination path for each file
- Waits for user confirmation (y/n)
- Makes no changes until the user has reviewed and approved the plan

**User options:**
- Accept: Proceed with organization
- Decline: Cancel and exit without changes

**Output:** User decision (proceed or abort)
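
A confirmation gate like this is small enough to sketch directly. In the Python example below the function names and the exact prompt text are hypothetical; the one design point it illustrates is that anything other than an explicit yes is treated as a decline.

```python
from pathlib import Path


def format_plan(plan: dict[Path, Path]) -> str:
    """Render one 'source -> destination' line per planned move."""
    return "\n".join(f"{src} -> {dst}" for src, dst in plan.items())


def confirm(plan: dict[Path, Path], ask=input) -> bool:
    """Show the plan and return True only on an explicit yes.

    `ask` defaults to the built-in input() and can be replaced in tests.
    """
    print(format_plan(plan))
    answer = ask("Proceed with these moves? (y/n) ")
    return answer.strip().lower() in {"y", "yes"}
```
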
### Step 5: Execute Moves
**What happens:**
- Creates destination directories as needed
- Moves files to their designated locations
- Records each move in the undo log
- Reports success/failure for each operation
- Displays final summary statistics

**Safety features:**
- Only moves files after user confirmation
- Tracks all operations for undo capability
- Handles errors per file, so one failure does not stop the whole run
- Creates parent directories automatically

**Output:** Organized files and execution summary
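
The move loop can be sketched as follows. This Python version is an illustration only: `execute_moves` is a hypothetical name, and refusing to overwrite an existing destination is an assumption of this example, since the behavior on name collisions is not described above.

```python
import shutil
from pathlib import Path


def execute_moves(plan: dict[Path, Path]):
    """Move each file and return (moved, failed).

    A failure is recorded and the loop continues with the next file.
    """
    moved, failed = [], []
    for src, dst in plan.items():
        try:
            if dst.exists():
                # Assumption for this sketch: never overwrite an existing file.
                raise FileExistsError(f"{dst} already exists")
            dst.parent.mkdir(parents=True, exist_ok=True)  # create parents as needed
            shutil.move(str(src), str(dst))
            moved.append((src, dst))  # this pair is what an undo log would record
        except OSError as err:
            failed.append((src, str(err)))
    return moved, failed
```

The two returned lists map directly onto the per-file success/failure report and the final summary.
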
## Caching System
NoEntropy includes an intelligent caching system to minimize API calls and improve performance.
### Cache Design
- **Location**: `.noentropy_cache.json` in project root
- **Format**: JSON with file path as key
- **Expiry**: 7 days (automatically cleaned up)
- **Max Entries**: 1000 entries (LRU eviction)
- **Change Detection**: File size + modification time (not content hash)
### How Caching Works
1. **First Run**:
   - Files are analyzed via Gemini API
   - Categorization results are cached with metadata
2. **Cache Check** (subsequent runs):
   ```
   File found in cache?
   ├─ No  → Analyze via API, cache result
   └─ Yes → File changed (size/time)?
            ├─ Yes → Re-analyze via API, update cache
            └─ No  → Use cached categorization
   ```
3. **Cache Maintenance**:
   - Removes entries older than 7 days on every run
   - Evicts oldest entries when limit (1000) is reached
   - Validates file still exists before using cache
### Cache Benefits
- **Reduced API Costs**: Avoids re-analyzing unchanged files
- **Faster Processing**: No API call needed for cached files
- **Efficient**: Metadata-based change detection (no content hashing)
- **Automatic Cleanup**: Self-maintaining with age and size limits
### When Cache is Invalidated
Cache entries are invalidated when:
- File size changes
- File modification time changes
- Cache entry is older than 7 days
- File no longer exists
- Cache is manually deleted
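
The lookup and maintenance rules above fit in a short sketch. The Python below is a hedged illustration, not the project's implementation: `CacheEntry`, `lookup`, and `store` are hypothetical names, the cache is a plain in-memory dict rather than the JSON file, and eviction removes the entry with the oldest `cached_at` timestamp.

```python
import time
from dataclasses import dataclass
from typing import Optional

MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # 7-day expiry
MAX_ENTRIES = 1000


@dataclass
class CacheEntry:
    category: str
    size: int         # file size when cached
    mtime: float      # file modification time when cached
    cached_at: float  # when this entry was written


def lookup(cache, path: str, size: int, mtime: float, now: Optional[float] = None):
    """Return the cached category, or None if missing, expired, or the file changed."""
    now = time.time() if now is None else now
    entry = cache.get(path)
    if entry is None:
        return None
    if now - entry.cached_at > MAX_AGE_SECONDS:
        return None
    if (entry.size, entry.mtime) != (size, mtime):  # metadata-based change detection
        return None
    return entry.category


def store(cache, path, category, size, mtime, now=None, max_entries=MAX_ENTRIES):
    """Insert or refresh an entry, then apply the age and size limits."""
    now = time.time() if now is None else now
    cache[path] = CacheEntry(category, size, mtime, now)
    for key in [k for k, e in cache.items() if now - e.cached_at > MAX_AGE_SECONDS]:
        del cache[key]
    while len(cache) > max_entries:
        oldest = min(cache, key=lambda k: cache[k].cached_at)
        del cache[oldest]
```

Comparing size and modification time is cheap because both come from a single `stat` call, whereas hashing would require reading every file on every run.
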
## Undo Log System
NoEntropy tracks all file moves to enable undo functionality.
### Undo Log Design
- **Location**: `~/.config/noentropy/data/undo_log.json`
- **Format**: JSON array of move records
- **Retention**: 30 days (automatically cleaned up)
- **Max Entries**: 1000 entries (oldest evicted)
- **Status Tracking**: Completed, Undone, Failed states
### Move Record Structure
Each file move is recorded with:
- Source path (original location)
- Destination path (new location)
- Timestamp of move
- Status (completed/undone/failed)
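
As an illustration of what such a record might look like on disk, here is a small Python sketch that models it as a dataclass and round-trips it through JSON. The field names, the `MoveStatus` enum, and the helper functions are assumptions for this example; the actual keys in `undo_log.json` may differ.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class MoveStatus(str, Enum):
    """The three states listed above; the str mixin makes values JSON-friendly."""
    COMPLETED = "completed"
    UNDONE = "undone"
    FAILED = "failed"


@dataclass
class MoveRecord:
    source: str        # original location
    destination: str   # new location
    timestamp: float   # time of the move, seconds since the epoch
    status: MoveStatus = MoveStatus.COMPLETED


def dump_log(records: list[MoveRecord]) -> str:
    """Serialize records as a JSON array."""
    return json.dumps([asdict(r) for r in records], indent=2)


def load_log(text: str) -> list[MoveRecord]:
    """Parse a JSON array back into records."""
    return [
        MoveRecord(r["source"], r["destination"], r["timestamp"], MoveStatus(r["status"]))
        for r in json.loads(text)
    ]
```
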
### How Undo Works
1. **During Organization**:
   ```
   For each file moved:
   ├─ Record source path
   ├─ Record destination path
   ├─ Record timestamp
   └─ Mark as "completed"
   ```
2. **Undo Execution**:
   ```
   Load undo log
   ├─ Filter "completed" moves (not already undone)
   ├─ Show preview to user
   ├─ Request confirmation
   └─ If confirmed:
      ├─ Check destination exists
      ├─ Check source doesn't exist (avoid conflicts)
      ├─ Move file back to source
      ├─ Mark as "undone"
      └─ Clean up empty directories
   ```
3. **Conflict Handling**:
   - **Source exists**: Skip restore (prevent overwrite)
   - **Destination missing**: Skip restore (file was deleted)
   - **Permission error**: Skip restore, report error
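
The undo flow and its conflict rules can be sketched in Python as below. This is a simplified illustration under stated assumptions: records are plain dicts with hypothetical `source`, `destination`, and `status` keys, moves are reverted newest first, and the preview and confirmation prompt are left out (a `dry_run` flag shows what would happen instead).

```python
import shutil
from pathlib import Path


def undo_moves(records: list[dict], dry_run: bool = False) -> list[str]:
    """Restore completed moves and return one report line per record handled."""
    report = []
    for record in reversed(records):  # assumption: revert the newest move first
        if record["status"] != "completed":
            continue
        src, dst = Path(record["source"]), Path(record["destination"])
        if not dst.exists():
            report.append(f"skipped (file missing): {dst}")
        elif src.exists():
            report.append(f"skipped (would overwrite): {src}")
        elif dry_run:
            report.append(f"would restore: {dst} -> {src}")
        else:
            try:
                src.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(dst), str(src))
            except OSError as err:  # e.g. permission error: report and keep going
                report.append(f"failed: {dst}: {err}")
                continue
            record["status"] = "undone"
            try:
                dst.parent.rmdir()  # succeeds only if the folder is now empty
            except OSError:
                pass
            report.append(f"restored: {src}")
    return report
```

Because each record is handled independently, a single conflict or failure leaves the remaining moves free to be restored.
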
### Undo Safety Features
- **Preview Before Action**: Always shows what will be undone
- **Conflict Detection**: Prevents data loss from overwrites
- **Missing File Handling**: Gracefully skips deleted files
- **Partial Undo Support**: Continues processing despite individual failures
- **Empty Directory Cleanup**: Removes empty folders after undo
- **Dry-Run Mode**: Preview undo without executing
### Undo Limitations
- Only tracks moves made by NoEntropy
- Cannot track manual file operations
- Limited to 30-day history
- Cannot restore deleted files (only moves can be reversed)
## Supported File Categories
NoEntropy can organize files into these default categories:
| Category | File Types |
|----------|------------|
| **Images** | PNG, JPG, JPEG, GIF, SVG, BMP, WEBP, ICO, TIFF |
| **Documents** | PDF, DOC, DOCX, TXT, MD, RTF, ODT, PAGES |
| **Installers** | EXE, DMG, APP, PKG, DEB, RPM, MSI, APK |
| **Music** | MP3, WAV, FLAC, M4A, AAC, OGG, WMA |
| **Videos** | MP4, AVI, MKV, MOV, WMV, FLV, WEBM |
| **Archives** | ZIP, TAR, GZ, RAR, 7Z, BZ2, XZ |
| **Code** | Source code and configuration files |
| **Misc** | Everything else |
## AI Integration
NoEntropy uses Google's Gemini API for intelligent categorization.
### API Usage
- **Model**: Gemini 1.5 Flash (configurable)
- **Concurrent Requests**: 5 by default (configurable via `--max-concurrent`)
- **Retry Logic**: Exponential backoff for failed requests
- **Rate Limiting**: Respects API rate limits with configurable concurrency
### Prompt Engineering
NoEntropy uses carefully crafted prompts to get accurate categorization:
1. **Initial Categorization Prompt**:
   - Lists all filenames
   - Specifies available categories
   - Requests JSON response with categorization plan
2. **Deep Inspection Prompt**:
   - Provides file content
   - Requests sub-folder suggestion based on content
   - Asks for semantic analysis, not just extension
### Error Handling
- **Network Errors**: Retry with exponential backoff
- **Rate Limiting**: Respects limits, retries after delay
- **Invalid Responses**: Logs error, continues with other files
- **Timeout**: Configurable timeout with fallback behavior
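
Retry with exponential backoff means doubling the wait after each failed attempt. The Python sketch below shows the pattern in isolation; the function name, the default of four attempts, the one-second base delay, and the small random jitter are all assumptions for illustration rather than NoEntropy's actual settings.

```python
import random
import time


def call_with_retry(request, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Call `request()` until it succeeds or attempts run out.

    Waits base_delay * 2**attempt seconds (plus up to 10% jitter) between tries.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle the error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
```

With the defaults the waits are roughly 1, 2, and 4 seconds. The jitter keeps concurrent workers from retrying at exactly the same moment.
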
## Performance Characteristics
### Factors Affecting Performance
1. **Number of Files**:
   - 10-50 files: ~10-30 seconds
   - 100-500 files: ~1-3 minutes
   - 1000+ files: ~5-10 minutes
2. **Concurrency Level**:
   - Higher = faster but more API load
   - Lower = slower but safer for rate limits
   - Default (5) balances speed and safety
3. **Cache Hit Rate**:
   - High hit rate (>80%): Significantly faster
   - Low hit rate (<20%): More API calls needed
   - Regular usage improves hit rate over time
4. **Text File Count**:
   - More text files = more deep inspection
   - Deep inspection adds processing time
   - Concurrent processing mitigates this
### Optimization Strategies
1. **Use caching**: Regular runs benefit from cached results
2. **Adjust concurrency**: Raise `--max-concurrent` for faster processing, or lower it if you hit API rate limits
3. **Dry-run first**: Test configuration without full processing
4. **Organize regularly**: Smaller batches process faster
## Architecture Diagram
```
┌─────────────────────────────────────────────────────────┐
│                      NoEntropy CLI                      │
│                     (Orchestrator)                      │
└────────────┬──────────────────────────────┬─────────────┘
             │                              │
    ┌────────▼─────────┐            ┌───────▼────────┐
    │  File Scanner    │            │ Config Manager │
    │  & Detector      │            │                │
    └────────┬─────────┘            └────────────────┘
    ┌────────▼──────────────────────────────────────┐
    │               Gemini AI Client                │
    │  (with retry logic & concurrent processing)   │
    └────────┬──────────────────────────────────────┘
    ┌────────▼─────────┐            ┌────────────────┐
    │  Cache System    │            │    Undo Log    │
    └────────┬─────────┘            └────────────────┘
    ┌────────▼─────────┐
    │   File Mover     │
    └──────────────────┘
```
## Next Steps
- [Usage Guide](USAGE.md) - Learn how to use NoEntropy
- [Configuration Guide](CONFIGURATION.md) - Configure NoEntropy
- [Development Guide](DEVELOPMENT.md) - Contribute to NoEntropy
---
[Back to Main README](../README.md)