Files
noentropy/docs/HOW_IT_WORKS.md
glitchySid d4e8dbc6b3 docs: restructure documentation into organized files
Split the 630-line README.md into focused, well-organized documentation:

- README.md: Concise overview with quick start and links
- docs/INSTALLATION.md: Installation instructions and setup
- docs/CONFIGURATION.md: Configuration options and custom categories
- docs/USAGE.md: Command-line options and usage examples
- docs/HOW_IT_WORKS.md: Architecture and internal processes
- docs/TROUBLESHOOTING.md: Common issues and solutions
- docs/DEVELOPMENT.md: Project structure and development guide
- docs/CONTRIBUTING.md: Contribution guidelines and standards

Benefits:
- Main README is now clean and welcoming (~150 lines vs 630)
- Each doc has a clear, focused purpose
- Better navigation with cross-linking between docs
- Follows GitHub best practices with docs/ directory
- Easier to maintain and update specific sections
2026-01-02 00:55:29 +05:30

12 KiB

How NoEntropy Works

This guide explains the internal architecture and processes that power NoEntropy's intelligent file organization.

Overview

NoEntropy uses a multi-stage pipeline that combines AI-powered categorization with intelligent caching and concurrent processing to efficiently organize your files.

Organization Process

NoEntropy follows a five-step process to organize your files:

┌─────────────────┐
│  1. Scan Files  │ → Read all files in DOWNLOAD_FOLDER 
└────────┬────────┘   (and subdirs if --recursive flag is used)
         ▼
┌─────────────────────────┐
│ 2. Initial Categorization │ → Ask Gemini to categorize by filename
└────────┬────────────────┘
         ▼
┌──────────────────────┐
│  3. Deep Inspection   │ → Read text files for sub-categories
│     (Concurrent)      │   • Reads file content
│                       │   • Asks AI for sub-folder
└────────┬──────────────┘
         ▼
┌──────────────────────┐
│  4. Preview & Confirm│ → Show organization plan
│                       │   • Ask user approval
└────────┬──────────────┘
         ▼
┌──────────────────────┐
│   5. Execute Moves    │ → Move files to organized folders
└──────────────────────┘

Step 1: File Scanning

What happens:

  • Scans the configured download folder
  • Optionally scans subdirectories with --recursive flag
  • Collects file paths and metadata (size, modification time)
  • Filters out directories and focuses on files only

Output: List of file paths ready for categorization

Step 2: Initial Categorization

What happens:

  • Sends list of filenames to Gemini API
  • AI analyzes filenames and determines appropriate categories
  • Returns a categorization plan for all files
  • Uses custom categories if configured, otherwise uses defaults

AI Prompt includes:

  • List of all filenames
  • Available categories (default or custom)
  • Instructions to categorize based on file type and content
  • Request for main category assignment

Output: Initial organization plan with main categories

Step 3: Deep Inspection

What happens:

  • Identifies text-based files that can be read
  • Concurrently reads file contents (up to --max-concurrent files at once)
  • Sends content to Gemini AI for sub-folder suggestions
  • AI analyzes content and suggests relevant sub-categories
  • Applies intelligent retry logic with exponential backoff

Supported text file formats:

Source Code: rs, py, js, ts, jsx, tsx, java, go, c, cpp, h, hpp, rb, php, swift, kt, scala, lua, r, m
Web/Config: html, css, json, xml, yaml, yml, toml, ini, cfg, conf
Documentation: txt, md, sql, sh, bat, ps1, log

Why concurrent?

  • Processes multiple files simultaneously
  • Significantly reduces total processing time
  • Configurable concurrency limit prevents API rate limiting

Output: Enhanced organization plan with sub-folders

Step 4: Preview & Confirmation

What happens:

  • Displays complete organization plan to user
  • Shows source file and destination path for each file
  • Waits for user confirmation (y/n)
  • Allows user to review before any changes are made

User options:

  • Accept: Proceed with organization
  • Decline: Cancel and exit without changes

Output: User decision (proceed or abort)

Step 5: Execute Moves

What happens:

  • Creates destination directories as needed
  • Moves files to their designated locations
  • Records each move in the undo log
  • Reports success/failure for each operation
  • Displays final summary statistics

Safety features:

  • Only moves files after user confirmation
  • Tracks all operations for undo capability
  • Handles errors gracefully without stopping entire process
  • Creates parent directories automatically

Output: Organized files and execution summary

Caching System

NoEntropy includes an intelligent caching system to minimize API calls and improve performance.

Cache Design

  • Location: .noentropy_cache.json in project root
  • Format: JSON with file path as key
  • Expiry: 7 days (automatically cleaned up)
  • Max Entries: 1000 entries (LRU eviction)
  • Change Detection: File size + modification time (not content hash)

How Caching Works

  1. First Run:

    • Files are analyzed via Gemini API
    • Categorization results are cached with metadata
  2. Cache Check (subsequent runs):

    File found in cache?
    ├─ No → Analyze via API, cache result
    └─ Yes → File changed (size/time)?
        ├─ Yes → Re-analyze via API, update cache
        └─ No → Use cached categorization
    
  3. Cache Maintenance:

    • Removes entries older than 7 days on every run
    • Evicts oldest entries when limit (1000) is reached
    • Validates file still exists before using cache

Cache Benefits

  • Reduced API Costs: Avoids re-analyzing unchanged files
  • Faster Processing: No API call needed for cached files
  • Efficient: Metadata-based change detection (no content hashing)
  • Automatic Cleanup: Self-maintaining with age and size limits

When Cache is Invalidated

Cache entries are invalidated when:

  • File size changes
  • File modification time changes
  • Cache entry is older than 7 days
  • File no longer exists
  • Cache is manually deleted

Undo Log System

NoEntropy tracks all file moves to enable undo functionality.

Undo Log Design

  • Location: ~/.config/noentropy/data/undo_log.json
  • Format: JSON array of move records
  • Retention: 30 days (automatically cleaned up)
  • Max Entries: 1000 entries (oldest evicted)
  • Status Tracking: Completed, Undone, Failed states

Move Record Structure

Each file move is recorded with:

  • Source path (original location)
  • Destination path (new location)
  • Timestamp of move
  • Status (completed/undone/failed)

How Undo Works

  1. During Organization:

    For each file moved:
    ├─ Record source path
    ├─ Record destination path
    ├─ Record timestamp
    └─ Mark as "completed"
    
  2. Undo Execution:

    Load undo log
    ├─ Filter "completed" moves (not already undone)
    ├─ Show preview to user
    ├─ Request confirmation
    └─ If confirmed:
        ├─ Check destination exists
        ├─ Check source doesn't exist (avoid conflicts)
        ├─ Move file back to source
        ├─ Mark as "undone"
        └─ Clean up empty directories
    
  3. Conflict Handling:

    • Source exists: Skip restore (prevent overwrite)
    • Destination missing: Skip restore (file was deleted)
    • Permission error: Skip restore, report error

Undo Safety Features

  • Preview Before Action: Always shows what will be undone
  • Conflict Detection: Prevents data loss from overwrites
  • Missing File Handling: Gracefully skips deleted files
  • Partial Undo Support: Continues processing despite individual failures
  • Empty Directory Cleanup: Removes empty folders after undo
  • Dry-Run Mode: Preview undo without executing

Undo Limitations

  • Only tracks moves made by NoEntropy
  • Cannot track manual file operations
  • Limited to 30-day history
  • Cannot restore deleted files (only moves)

Supported File Categories

NoEntropy can organize files into these default categories:

Category File Types
Images PNG, JPG, JPEG, GIF, SVG, BMP, WEBP, ICO, TIFF
Documents PDF, DOC, DOCX, TXT, MD, RTF, ODT, PAGES
Installers EXE, DMG, APP, PKG, DEB, RPM, MSI, APK
Music MP3, WAV, FLAC, M4A, AAC, OGG, WMA
Videos MP4, AVI, MKV, MOV, WMV, FLV, WEBM
Archives ZIP, TAR, GZ, RAR, 7Z, BZ2, XZ
Code Source code and configuration files
Misc Everything else

AI Integration

NoEntropy uses Google's Gemini API for intelligent categorization.

API Usage

  • Model: Gemini 1.5 Flash (configurable)
  • Concurrent Requests: 5 by default (configurable via --max-concurrent)
  • Retry Logic: Exponential backoff for failed requests
  • Rate Limiting: Respects API rate limits with configurable concurrency

Prompt Engineering

NoEntropy uses carefully crafted prompts to get accurate categorization:

  1. Initial Categorization Prompt:

    • Lists all filenames
    • Specifies available categories
    • Requests JSON response with categorization plan
  2. Deep Inspection Prompt:

    • Provides file content
    • Requests sub-folder suggestion based on content
    • Asks for semantic analysis, not just extension

Error Handling

  • Network Errors: Retry with exponential backoff
  • Rate Limiting: Respects limits, retries after delay
  • Invalid Responses: Logs error, continues with other files
  • Timeout: Configurable timeout with fallback behavior

Performance Characteristics

Factors Affecting Performance

  1. Number of Files:

    • 10-50 files: ~10-30 seconds
    • 100-500 files: 1-3 minutes
    • 1000+ files: 5-10 minutes
  2. Concurrency Level:

    • Higher = faster but more API load
    • Lower = slower but safer for rate limits
    • Default (5) balances speed and safety
  3. Cache Hit Rate:

    • High hit rate (>80%): Significantly faster
    • Low hit rate (<20%): More API calls needed
    • Regular usage improves hit rate over time
  4. Text File Count:

    • More text files = more deep inspection
    • Deep inspection adds processing time
    • Concurrent processing mitigates this

Optimization Strategies

  1. Use caching: Regular runs benefit from cached results
  2. Adjust concurrency: Increase for faster processing
  3. Dry-run first: Test configuration without full processing
  4. Organize regularly: Smaller batches process faster

Architecture Diagram

┌─────────────────────────────────────────────────────────┐
│                     NoEntropy CLI                       │
│                   (Orchestrator)                        │
└────────────┬──────────────────────────────┬─────────────┘
             │                              │
    ┌────────▼─────────┐           ┌───────▼────────┐
    │  File Scanner    │           │  Config Manager │
    │  & Detector      │           │                 │
    └────────┬─────────┘           └────────────────┘
             │
    ┌────────▼──────────────────────────────────────┐
    │           Gemini AI Client                    │
    │  (with retry logic & concurrent processing)   │
    └────────┬──────────────────────────────────────┘
             │
    ┌────────▼─────────┐           ┌────────────────┐
    │  Cache System    │           │   Undo Log     │
    └──────────────────┘           └────────────────┘
             │
    ┌────────▼─────────┐
    │   File Mover     │
    └──────────────────┘

Next Steps


Back to Main README