# How NoEntropy Works

This guide explains the internal architecture and processes that power NoEntropy's intelligent file organization.

## Overview

NoEntropy uses a multi-stage pipeline that combines AI-powered categorization with intelligent caching and concurrent processing to efficiently organize your files.

## Organization Process

NoEntropy follows a five-step process to organize your files:

```
┌───────────────────────────┐
│ 1. Scan Files             │ → Read all files in DOWNLOAD_FOLDER
└─────────────┬─────────────┘   (and subdirs if --recursive flag is used)
              ▼
┌───────────────────────────┐
│ 2. Initial Categorization │ → Ask Gemini to categorize by filename
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 3. Deep Inspection        │ → Read text files for sub-categories
│    (Concurrent)           │   • Reads file content
│                           │   • Asks AI for sub-folder
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 4. Preview & Confirm      │ → Show organization plan
│                           │   • Ask user approval
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 5. Execute Moves          │ → Move files to organized folders
└───────────────────────────┘
```

### Step 1: File Scanning

**What happens:**
- Scans the configured download folder
- Optionally scans subdirectories with `--recursive` flag
- Collects file paths and metadata (size, modification time)
- Skips directories so that only regular files are collected

**Output:** List of file paths ready for categorization
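
The scanning step can be pictured with a short Python sketch. This is only an illustration of the behavior described above, not NoEntropy's actual code; the names `ScannedFile` and `scan_folder` are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ScannedFile:
    path: Path
    size: int      # bytes
    mtime: float   # modification time, seconds since the epoch


def scan_folder(folder: Path, recursive: bool = False) -> list[ScannedFile]:
    """Collect regular files (not directories) with size and modification time."""
    # rglob walks subdirectories; iterdir only looks at the top level.
    entries = folder.rglob("*") if recursive else folder.iterdir()
    found = []
    for entry in entries:
        if entry.is_file():  # directories are skipped
            stat = entry.stat()
            found.append(ScannedFile(entry, stat.st_size, stat.st_mtime))
    return sorted(found, key=lambda f: f.path)
```

Passing `recursive=True` mirrors the `--recursive` flag; the size and modification time collected here are the same metadata the cache later uses for change detection.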

### Step 2: Initial Categorization

**What happens:**
- Sends list of filenames to Gemini API
- AI analyzes filenames and determines appropriate categories
- Returns a categorization plan for all files
- Uses custom categories if configured, otherwise uses defaults

**AI Prompt includes:**
- List of all filenames
- Available categories (default or custom)
- Instructions to categorize based on filename and file type (file content is only examined in Step 3)
- Request for main category assignment

**Output:** Initial organization plan with main categories
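
As a rough idea of what such a prompt might look like, here is a minimal Python sketch. The wording of the prompt, the JSON shape it asks for, and the function name `build_categorization_prompt` are all assumptions made for illustration; only the list of default categories comes from this guide.

```python
DEFAULT_CATEGORIES = ["Images", "Documents", "Installers", "Music",
                      "Videos", "Archives", "Code", "Misc"]


def build_categorization_prompt(filenames, categories=None):
    """Assemble one prompt covering every filename (hypothetical wording)."""
    cats = categories or DEFAULT_CATEGORIES  # custom categories win if given
    lines = [
        "Assign each file below to exactly one category.",
        "Allowed categories: " + ", ".join(cats),
        "Decide from the filename and extension only.",
        # Assumed response shape, not a documented format:
        'Reply with JSON: {"files": [{"filename": "...", "category": "..."}]}',
        "Files:",
    ]
    lines += [f"- {name}" for name in filenames]
    return "\n".join(lines)
```

Sending every filename in a single prompt keeps this stage to one API request, which fits the description of a plan being returned for all files at once.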

### Step 3: Deep Inspection

**What happens:**
- Identifies text-based files that can be read
- Concurrently reads file contents (up to `--max-concurrent` files at once)
- Sends content to Gemini AI for sub-folder suggestions
- AI analyzes content and suggests relevant sub-categories
- Applies intelligent retry logic with exponential backoff

**Supported text file formats:**
```
Source Code: rs, py, js, ts, jsx, tsx, java, go, c, cpp, h, hpp, rb, php, swift, kt, scala, lua, r, m
Web/Config: html, css, json, xml, yaml, yml, toml, ini, cfg, conf
Documentation: txt, md, sql, sh, bat, ps1, log
```
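
A simple way to decide whether a file qualifies for deep inspection is an extension lookup against the list above. The sketch below is an assumption about how such a check could be written; `TEXT_EXTENSIONS` and `is_text_file` are hypothetical names.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
TEXT_EXTENSIONS = frozenset(
    "rs py js ts jsx tsx java go c cpp h hpp rb php swift kt scala lua r m "
    "html css json xml yaml yml toml ini cfg conf "
    "txt md sql sh bat ps1 log".split()
)


def is_text_file(path) -> bool:
    """True if the file's extension is in the supported text formats."""
    suffix = Path(path).suffix.lower().lstrip(".")  # ".RS" -> "rs"
    return suffix in TEXT_EXTENSIONS
```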

**Why concurrent?**
- Processes multiple files simultaneously
- Significantly reduces total processing time
- A configurable concurrency limit helps keep requests within API rate limits

**Output:** Enhanced organization plan with sub-folders
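
Bounded concurrency of this kind can be sketched with a thread pool whose size plays the role of `--max-concurrent`. This is an illustrative sketch, not the project's implementation: `suggest_subfolder` stands in for the Gemini call, and the `MAX_CHARS` cap on content is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_CHARS = 2000  # assumed cap on how much content is sent per file


def read_snippet(path: Path) -> str:
    """Read the start of a file, tolerating undecodable bytes."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return fh.read(MAX_CHARS)


def inspect_files(paths, suggest_subfolder, max_concurrent=5):
    """Run read + suggestion for each path, at most max_concurrent at a time.

    suggest_subfolder is a stand-in for the AI call: (path, text) -> str.
    """
    paths = list(paths)

    def work(path):
        return suggest_subfolder(path, read_snippet(path))

    # The pool never runs more than max_concurrent workers at once.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(paths, pool.map(work, paths)))
```

Because each worker spends most of its time waiting on I/O (disk reads and network calls), threads are enough to get the speed-up; the pool size is what keeps the request rate in check.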

### Step 4: Preview & Confirmation

**What happens:**
- Displays complete organization plan to user
- Shows source file and destination path for each file
- Waits for user confirmation (y/n)
- Allows user to review before any changes are made

**User options:**
- Accept: Proceed with organization
- Decline: Cancel and exit without changes

**Output:** User decision (proceed or abort)
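
The preview-and-confirm step boils down to printing the plan and treating anything other than an explicit "yes" as a refusal. A minimal sketch, with hypothetical function names and an assumed prompt text:

```python
def format_plan(plan):
    """plan: list of (source, destination) pairs -> printable lines."""
    return [f"{src} -> {dst}" for src, dst in plan]


def confirm(plan, ask=input, show=print):
    """Show the plan and return True only on an explicit yes."""
    for line in format_plan(plan):
        show(line)
    answer = ask("Proceed with these moves? (y/n) ")
    # Empty input or anything unexpected counts as "no".
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" on unclear input is the conservative choice for a tool that moves files; `ask` and `show` are parameters only so the function can be exercised without a terminal.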

### Step 5: Execute Moves

**What happens:**
- Creates destination directories as needed
- Moves files to their designated locations
- Records each move in the undo log
- Reports success/failure for each operation
- Displays final summary statistics

**Safety features:**
- Only moves files after user confirmation
- Tracks all operations for undo capability
- Handles errors gracefully without stopping entire process
- Creates parent directories automatically

**Output:** Organized files and execution summary
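
The move step, including per-file error handling and undo logging, might look roughly like the sketch below. It is not taken from the NoEntropy source: the function name, the plain-list undo log, and the refusal to overwrite an existing destination are assumptions for illustration.

```python
import shutil
from pathlib import Path


def execute_moves(plan, undo_log):
    """Move each (source, destination) pair; keep going if one fails.

    Appends one record per attempted move to undo_log (a plain list here).
    Returns (moved_count, failed_count).
    """
    moved = failed = 0
    for src, dst in plan:
        src, dst = Path(src), Path(dst)
        try:
            if dst.exists():
                # Assumption: never overwrite a file that is already there.
                raise FileExistsError(f"{dst} already exists")
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
            status = "completed"
            moved += 1
        except OSError as err:
            print(f"failed: {src} ({err})")
            status = "failed"
            failed += 1
        undo_log.append(
            {"source": str(src), "destination": str(dst), "status": status}
        )
    return moved, failed
```

Catching `OSError` per file is what lets one bad path (missing file, permission problem) be reported without aborting the rest of the batch.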

## Caching System

NoEntropy includes an intelligent caching system to minimize API calls and improve performance.

### Cache Design

- **Location**: `.noentropy_cache.json` in project root
- **Format**: JSON with file path as key
- **Expiry**: 7 days (automatically cleaned up)
- **Max Entries**: 1000 entries (oldest entries evicted first)
- **Change Detection**: File size + modification time (not content hash)

### How Caching Works

1. **First Run**:
   - Files are analyzed via Gemini API
   - Categorization results are cached with metadata

2. **Cache Check** (subsequent runs):
   ```
   File found in cache?
   ├─ No → Analyze via API, cache result
   └─ Yes → File changed (size/time)?
       ├─ Yes → Re-analyze via API, update cache
       └─ No → Use cached categorization
   ```

3. **Cache Maintenance**:
   - Removes entries older than 7 days on every run
   - Evicts oldest entries when limit (1000) is reached
   - Validates file still exists before using cache
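
The cache check and maintenance rules above can be sketched with an in-memory dictionary. The field names (`size`, `mtime`, `category`, `cached_at`) and the functions `lookup` and `store` are hypothetical; the real on-disk JSON layout is not documented here.

```python
import time

MAX_AGE_SECONDS = 7 * 24 * 60 * 60   # 7-day expiry
MAX_ENTRIES = 1000


def lookup(cache, path, size, mtime, now=None):
    """Return the cached category, or None if the file must be re-analyzed."""
    now = time.time() if now is None else now
    entry = cache.get(path)
    if entry is None:
        return None                      # not in cache
    if now - entry["cached_at"] > MAX_AGE_SECONDS:
        return None                      # entry expired
    if entry["size"] != size or entry["mtime"] != mtime:
        return None                      # file changed since it was cached
    return entry["category"]


def store(cache, path, size, mtime, category, now=None):
    """Insert or update an entry, then prune by age and by count."""
    now = time.time() if now is None else now
    cache[path] = {"size": size, "mtime": mtime,
                   "category": category, "cached_at": now}
    expired = [k for k, e in cache.items()
               if now - e["cached_at"] > MAX_AGE_SECONDS]
    for key in expired:
        del cache[key]
    while len(cache) > MAX_ENTRIES:
        oldest = min(cache, key=lambda k: cache[k]["cached_at"])
        del cache[oldest]                # evict the oldest entry first
```

The caller passes in the current size and modification time, which it can only obtain if the file still exists, so the "file still exists" check happens implicitly before `lookup` is reached.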

### Cache Benefits

- **Reduced API Costs**: Avoids re-analyzing unchanged files
- **Faster Processing**: No API call needed for cached files
- **Efficient**: Metadata-based change detection (no content hashing)
- **Automatic Cleanup**: Self-maintaining with age and size limits

### When Cache is Invalidated

Cache entries are invalidated when:
- File size changes
- File modification time changes
- Cache entry is older than 7 days
- File no longer exists
- Cache is manually deleted

## Undo Log System

NoEntropy tracks all file moves to enable undo functionality.

### Undo Log Design

- **Location**: `~/.config/noentropy/data/undo_log.json`
- **Format**: JSON array of move records
- **Retention**: 30 days (automatically cleaned up)
- **Max Entries**: 1000 entries (oldest evicted)
- **Status Tracking**: Completed, Undone, Failed states

### Move Record Structure

Each file move is recorded with:
- Source path (original location)
- Destination path (new location)
- Timestamp of move
- Status (completed/undone/failed)
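
One way to model such a record and store it as a JSON array is shown below. The field names, the ISO 8601 timestamp format, and the helper functions are assumptions; the actual `undo_log.json` schema may differ.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class MoveRecord:
    source: str                # original location
    destination: str           # new location
    timestamp: str             # ISO 8601 string (assumed format)
    status: str = "completed"  # completed | undone | failed


def new_record(source, destination):
    """Create a record stamped with the current UTC time."""
    stamp = datetime.now(timezone.utc).isoformat()
    return MoveRecord(source, destination, stamp)


def dump_log(records):
    """Serialize records as a JSON array."""
    return json.dumps([asdict(r) for r in records], indent=2)


def load_log(text):
    """Parse a JSON array back into MoveRecord objects."""
    return [MoveRecord(**item) for item in json.loads(text)]
```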

### How Undo Works

1. **During Organization**:
   ```
   For each file moved:
   ├─ Record source path
   ├─ Record destination path
   ├─ Record timestamp
   └─ Mark as "completed"
   ```

2. **Undo Execution**:
   ```
   Load undo log
   ├─ Filter "completed" moves (not already undone)
   ├─ Show preview to user
   ├─ Request confirmation
   └─ If confirmed:
       ├─ Check destination exists
       ├─ Check source doesn't exist (avoid conflicts)
       ├─ Move file back to source
       ├─ Mark as "undone"
       └─ Clean up empty directories
   ```

3. **Conflict Handling**:
   - **Source exists**: Skip restore (prevent overwrite)
   - **Destination missing**: Skip restore (file was deleted)
   - **Permission error**: Skip restore, report error
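
The undo flow and its conflict rules can be sketched as follows. This is an illustration under assumptions rather than the project's code: records are plain dictionaries, moves are reversed newest-first, and only the immediate parent folder is cleaned up.

```python
import shutil
from pathlib import Path


def undo_moves(records):
    """Reverse 'completed' moves in place; return (restored, skipped)."""
    restored = skipped = 0
    for rec in reversed(records):            # assumed order: newest first
        if rec["status"] != "completed":
            continue
        src, dst = Path(rec["source"]), Path(rec["destination"])
        if not dst.exists() or src.exists():
            skipped += 1    # file was deleted, or restoring would overwrite
            continue
        try:
            src.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(dst), str(src))
        except OSError as err:               # e.g. a permission error
            print(f"could not restore {src}: {err}")
            skipped += 1
            continue
        rec["status"] = "undone"
        restored += 1
        try:
            dst.parent.rmdir()               # succeeds only if now empty
        except OSError:
            pass
    return restored, skipped
```

Using `rmdir` for cleanup is deliberately cautious: it refuses to delete a directory that still contains anything, so folders holding other files are left alone.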

### Undo Safety Features

- **Preview Before Action**: Always shows what will be undone
- **Conflict Detection**: Prevents data loss from overwrites
- **Missing File Handling**: Gracefully skips deleted files
- **Partial Undo Support**: Continues processing despite individual failures
- **Empty Directory Cleanup**: Removes empty folders after undo
- **Dry-Run Mode**: Preview undo without executing

### Undo Limitations

- Only tracks moves made by NoEntropy
- Cannot track manual file operations
- Limited to 30-day history
- Cannot restore deleted files (only moves can be reversed)

## Supported File Categories

NoEntropy can organize files into these default categories:

| Category | File Types |
|----------|------------|
| **Images** | PNG, JPG, JPEG, GIF, SVG, BMP, WEBP, ICO, TIFF |
| **Documents** | PDF, DOC, DOCX, TXT, MD, RTF, ODT, PAGES |
| **Installers** | EXE, DMG, APP, PKG, DEB, RPM, MSI, APK |
| **Music** | MP3, WAV, FLAC, M4A, AAC, OGG, WMA |
| **Videos** | MP4, AVI, MKV, MOV, WMV, FLV, WEBM |
| **Archives** | ZIP, TAR, GZ, RAR, 7Z, BZ2, XZ |
| **Code** | Source code and configuration files |
| **Misc** | Everything else |
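
For reference, the table can also be read as a plain extension lookup. NoEntropy itself delegates the decision to Gemini, so the sketch below is purely illustrative; `guess_category` is a hypothetical helper, and the Code category is left out because the table lists no specific extensions for it.

```python
from pathlib import Path

# Extension lists copied from the table above.
CATEGORY_EXTENSIONS = {
    "Images": "png jpg jpeg gif svg bmp webp ico tiff",
    "Documents": "pdf doc docx txt md rtf odt pages",
    "Installers": "exe dmg app pkg deb rpm msi apk",
    "Music": "mp3 wav flac m4a aac ogg wma",
    "Videos": "mp4 avi mkv mov wmv flv webm",
    "Archives": "zip tar gz rar 7z bz2 xz",
}

# Invert to extension -> category for constant-time lookup.
EXTENSION_TO_CATEGORY = {
    ext: category
    for category, exts in CATEGORY_EXTENSIONS.items()
    for ext in exts.split()
}


def guess_category(filename):
    """Map a filename to a default category by extension, else 'Misc'."""
    ext = Path(filename).suffix.lower().lstrip(".")
    return EXTENSION_TO_CATEGORY.get(ext, "Misc")
```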

## AI Integration

NoEntropy uses Google's Gemini API for intelligent categorization.

### API Usage

- **Model**: Gemini 1.5 Flash (configurable)
- **Concurrent Requests**: 5 by default (configurable via `--max-concurrent`)
- **Retry Logic**: Exponential backoff for failed requests
- **Rate Limiting**: Respects API rate limits with configurable concurrency

### Prompt Engineering

NoEntropy uses carefully crafted prompts to get accurate categorization:

1. **Initial Categorization Prompt**:
   - Lists all filenames
   - Specifies available categories
   - Requests JSON response with categorization plan

2. **Deep Inspection Prompt**:
   - Provides file content
   - Requests sub-folder suggestion based on content
   - Asks for semantic analysis, not just extension
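
On the receiving side, a JSON reply has to be parsed and validated before it can be trusted. The sketch below assumes a response shape of `{"files": [{"filename": ..., "category": ...}]}`, which is an illustration rather than the documented format, and `parse_plan` is a hypothetical name.

```python
import json

FENCE = "`" * 3  # models sometimes wrap JSON in a Markdown code fence


def parse_plan(response_text, allowed_categories, fallback="Misc"):
    """Turn a model reply into {filename: category}.

    Raises ValueError (json.JSONDecodeError) if the reply is not valid JSON.
    """
    text = response_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    plan = {}
    for item in data.get("files", []):
        category = item.get("category", fallback)
        if category not in allowed_categories:
            category = fallback          # never trust an unknown category
        plan[item["filename"]] = category
    return plan
```

Falling back to a known category when the model invents one keeps files from being moved into unexpected folders.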

### Error Handling

- **Network Errors**: Retry with exponential backoff
- **Rate Limiting**: Respects limits, retries after delay
- **Invalid Responses**: Logs error, continues with other files
- **Timeout**: Configurable timeout with fallback behavior
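
Retry with exponential backoff, mentioned here and in Step 3, generally follows the pattern below. The retry count, base delay, cap, jitter, and the set of retryable exceptions are all assumed values for illustration, not NoEntropy's actual settings.

```python
import random
import time


def backoff_delays(max_retries=3, base=1.0, cap=30.0):
    """Delays before each retry: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retry(request, max_retries=3, base=1.0, cap=30.0,
                    sleep=time.sleep,
                    retry_on=(ConnectionError, TimeoutError)):
    """Call request(); on a retryable error, wait and try again."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return request()
        except retry_on:
            if attempt == max_retries:
                raise                    # out of retries, surface the error
            # Jitter spreads out retries coming from concurrent workers.
            sleep(delays[attempt] * random.uniform(0.5, 1.0))
```

Doubling the wait after each failure gives a rate-limited or briefly unavailable API time to recover, while the cap keeps a single file from stalling the whole run.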

## Performance Characteristics

### Factors Affecting Performance

1. **Number of Files**:
   - 10-50 files: ~10-30 seconds
   - 100-500 files: ~1-3 minutes
   - 1000+ files: ~5-10 minutes

2. **Concurrency Level**:
   - Higher = faster but more API load
   - Lower = slower but safer for rate limits
   - Default (5) balances speed and safety

3. **Cache Hit Rate**:
   - High hit rate (>80%): Significantly faster
   - Low hit rate (<20%): More API calls needed
   - Regular usage improves hit rate over time

4. **Text File Count**:
   - More text files = more deep inspection
   - Deep inspection adds processing time
   - Concurrent processing mitigates this
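
How these factors interact can be illustrated with a back-of-the-envelope model of the deep-inspection stage. This is a deliberately crude estimate, not a measured formula: the per-call latency of 2 seconds is an assumed figure, and retries, disk reads, and the initial categorization request are ignored.

```python
import math


def estimate_deep_inspection_seconds(text_files, cache_hit_rate,
                                     max_concurrent=5, seconds_per_call=2.0):
    """Very rough wall-clock estimate for the deep-inspection stage."""
    cached = round(text_files * cache_hit_rate)
    uncached = text_files - cached                 # files needing an API call
    waves = math.ceil(uncached / max_concurrent)   # rounds of parallel calls
    return waves * seconds_per_call
```

With 100 text files, an 80% hit rate, the default concurrency of 5, and the assumed 2 seconds per call, 20 files need an API call, which takes 4 rounds, or about 8 seconds. With an empty cache the same run needs 20 rounds, about 40 seconds.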

### Optimization Strategies

1. **Use caching**: Regular runs benefit from cached results
2. **Adjust concurrency**: Raise `--max-concurrent` for faster processing, or lower it if you run into API rate limits
3. **Dry-run first**: Test configuration without full processing
4. **Organize regularly**: Smaller batches process faster

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────┐
│                     NoEntropy CLI                       │
│                   (Orchestrator)                        │
└────────────┬──────────────────────────────┬─────────────┘
             │                              │
    ┌────────▼─────────┐           ┌────────▼────────┐
    │  File Scanner    │           │  Config Manager │
    │  & Detector      │           │                 │
    └────────┬─────────┘           └─────────────────┘
             │
    ┌────────▼──────────────────────────────────────┐
    │           Gemini AI Client                    │
    │  (with retry logic & concurrent processing)   │
    └────────┬──────────────────────────────────────┘
             │
    ┌────────▼─────────┐           ┌────────────────┐
    │  Cache System    │           │   Undo Log     │
    └──────────────────┘           └────────────────┘
             │
    ┌────────▼─────────┐
    │   File Mover     │
    └──────────────────┘
```

## Next Steps

- [Usage Guide](USAGE.md) - Learn how to use NoEntropy
- [Configuration Guide](CONFIGURATION.md) - Configure NoEntropy
- [Development Guide](DEVELOPMENT.md) - Contribute to NoEntropy

---

[Back to Main README](../README.md)