glitchySid d4e8dbc6b3 docs: restructure documentation into organized files

Split the 630-line README.md into focused, well-organized documentation:

- README.md: Concise overview with quick start and links
- docs/INSTALLATION.md: Installation instructions and setup
- docs/CONFIGURATION.md: Configuration options and custom categories
- docs/USAGE.md: Command-line options and usage examples
- docs/HOW_IT_WORKS.md: Architecture and internal processes
- docs/TROUBLESHOOTING.md: Common issues and solutions
- docs/DEVELOPMENT.md: Project structure and development guide
- docs/CONTRIBUTING.md: Contribution guidelines and standards

Benefits:
- Main README is now clean and welcoming (~150 lines vs 630)
- Each doc has a clear, focused purpose
- Better navigation with cross-linking between docs
- Follows GitHub best practices with docs/ directory
- Easier to maintain and update specific sections

2026-01-02 00:55:29 +05:30

How NoEntropy Works

This guide explains the internal architecture and processes that power NoEntropy's intelligent file organization.

Overview

NoEntropy uses a multi-stage pipeline that combines AI-powered categorization with intelligent caching and concurrent processing to efficiently organize your files.

Organization Process

NoEntropy follows a five-step process to organize your files:

┌───────────────────────────┐
│ 1. Scan Files             │ → Read all files in DOWNLOAD_FOLDER
└─────────────┬─────────────┘   (and subdirectories if --recursive is used)
              ▼
┌───────────────────────────┐
│ 2. Initial Categorization │ → Ask Gemini to categorize by filename
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 3. Deep Inspection        │ → Read text files for sub-categories
│    (Concurrent)           │   • Reads file content
│                           │   • Asks AI for sub-folder
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 4. Preview & Confirm      │ → Show organization plan
│                           │   • Ask user approval
└─────────────┬─────────────┘
              ▼
┌───────────────────────────┐
│ 5. Execute Moves          │ → Move files to organized folders
└───────────────────────────┘

Step 1: File Scanning

What happens:

- Scans the configured download folder
- Optionally scans subdirectories with the --recursive flag
- Collects file paths and metadata (size, modification time)
- Filters out directories and focuses on files only

Output: List of file paths ready for categorization
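
The scanning step can be sketched as follows. This is a minimal illustration, not NoEntropy's actual code: the names `ScannedFile` and `scan_files` are hypothetical, and the `recursive` parameter simply mirrors the --recursive flag.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ScannedFile:
    path: Path
    size: int      # bytes
    mtime: float   # modification time, seconds since the epoch


def scan_files(folder: Path, recursive: bool = False) -> list[ScannedFile]:
    """Collect regular files under `folder`, skipping directories."""
    entries = folder.rglob("*") if recursive else folder.iterdir()
    found = []
    for entry in entries:
        if entry.is_file():
            stat = entry.stat()
            found.append(ScannedFile(entry, stat.st_size, stat.st_mtime))
    # Sort for a stable, predictable order.
    return sorted(found, key=lambda f: f.path)
```

The size and modification time collected here are the same two values the caching system later uses for change detection.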

Step 2: Initial Categorization

What happens:

- Sends the list of filenames to the Gemini API
- The AI analyzes the filenames and determines appropriate categories
- Returns a categorization plan for all files
- Uses custom categories if configured, otherwise the defaults

AI Prompt includes:

- List of all filenames
- Available categories (default or custom)
- Instructions to categorize based on the file type and what the filename suggests about its content
- Request for main category assignment

Output: Initial organization plan with main categories
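
A rough sketch of how such a prompt might be assembled and the reply validated is shown below. The prompt wording, the function names `build_prompt` and `parse_plan`, and the fallback to Misc for unrecognized answers are all assumptions made for illustration. The actual API call is left out.

```python
import json

DEFAULT_CATEGORIES = ["Images", "Documents", "Installers", "Music",
                      "Videos", "Archives", "Code", "Misc"]


def build_prompt(filenames, categories=DEFAULT_CATEGORIES):
    """Assemble a categorization prompt (the wording is illustrative)."""
    return (
        "Assign each file to exactly one of these categories: "
        + ", ".join(categories)
        + ".\nReply with a JSON object mapping filename to category.\n"
        + "Files:\n"
        + "\n".join(f"- {name}" for name in filenames)
    )


def parse_plan(response_text, filenames, categories=DEFAULT_CATEGORIES):
    """Validate the model's JSON reply; unknown or missing entries go to Misc."""
    try:
        raw = json.loads(response_text)
    except json.JSONDecodeError:
        raw = {}
    if not isinstance(raw, dict):
        raw = {}
    plan = {}
    for name in filenames:
        category = raw.get(name)
        plan[name] = category if category in categories else "Misc"
    return plan
```

Validating the reply against the known category list keeps a malformed or unexpected answer from creating stray folders.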

Step 3: Deep Inspection

What happens:

- Identifies text-based files that can be read
- Concurrently reads file contents (up to --max-concurrent files at once)
- Sends content to Gemini AI for sub-folder suggestions
- AI analyzes content and suggests relevant sub-categories
- Applies intelligent retry logic with exponential backoff

Supported text file formats:

- Source Code: rs, py, js, ts, jsx, tsx, java, go, c, cpp, h, hpp, rb, php, swift, kt, scala, lua, r, m
- Web/Config: html, css, json, xml, yaml, yml, toml, ini, cfg, conf
- Documentation: txt, md, sql, sh, bat, ps1, log

Why concurrent?

- Processes multiple files simultaneously
- Significantly reduces total processing time
- A configurable concurrency limit helps keep requests within API rate limits

Output: Enhanced organization plan with sub-folders
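
Bounded concurrency of this kind can be sketched with a thread pool. In the example below, `suggest_subfolder` is a hypothetical stand-in for the Gemini call, `max_chars` is an assumed cap on how much content is sent, and `max_concurrent` mirrors the --max-concurrent option. Retry handling is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Extensions taken from the supported text file formats listed above.
TEXT_EXTENSIONS = set(
    "rs py js ts jsx tsx java go c cpp h hpp rb php swift kt scala lua r m "
    "html css json xml yaml yml toml ini cfg conf "
    "txt md sql sh bat ps1 log".split()
)


def is_text_file(path: Path) -> bool:
    return path.suffix.lstrip(".").lower() in TEXT_EXTENSIONS


def inspect_all(paths, suggest_subfolder, max_concurrent=5, max_chars=4000):
    """Read each text file and ask `suggest_subfolder` for a sub-folder name."""
    def inspect(path):
        content = path.read_text(errors="replace")[:max_chars]
        return suggest_subfolder(path.name, content)

    candidates = [p for p in paths if is_text_file(p)]
    # The pool size caps how many requests are in flight at once.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(inspect, candidates))
    return dict(zip(candidates, results))
```

Non-text files are filtered out before any work is scheduled, so they keep the main category assigned in Step 2.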

Step 4: Preview & Confirmation

What happens:

- Displays the complete organization plan to the user
- Shows source file and destination path for each file
- Waits for user confirmation (y/n)
- Allows the user to review before any changes are made

User options:

- Accept: Proceed with organization
- Decline: Cancel and exit without changes

Output: User decision (proceed or abort)
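
A confirmation gate like this one is small enough to sketch in full. The function names and the prompt text below are illustrative only. The `ask` parameter exists so the prompt can be replaced in tests.

```python
def format_plan(plan):
    """Render one 'source -> destination' line per file."""
    return "\n".join(f"{src} -> {dst}" for src, dst in plan.items())


def confirm(plan, ask=input):
    """Show the plan and return True only on an explicit yes."""
    print(format_plan(plan))
    answer = ask("Proceed with organization? (y/n) ")
    return answer.strip().lower() in {"y", "yes"}
```

Treating anything other than an explicit yes as a decline means an accidental Enter key press never moves files.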

Step 5: Execute Moves

What happens:

- Creates destination directories as needed
- Moves files to their designated locations
- Records each move in the undo log
- Reports success/failure for each operation
- Displays final summary statistics

Safety features:

- Only moves files after user confirmation
- Tracks all operations for undo capability
- Handles errors gracefully without stopping the entire process
- Creates parent directories automatically

Output: Organized files and execution summary
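
The move loop might look roughly like the sketch below. The record fields follow the move record structure described later in this guide. The function name, the `error` field, and the refusal to overwrite an existing destination are assumptions for illustration.

```python
import shutil
import time
from pathlib import Path


def execute_moves(plan, undo_log):
    """Move each source to its destination, logging every outcome.

    `plan` maps source Path -> destination Path; `undo_log` is a list that
    receives one record per attempted move. Returns (moved, failed) counts.
    """
    moved = failed = 0
    for src, dst in plan.items():
        record = {"source": str(src), "destination": str(dst),
                  "timestamp": time.time(), "status": "completed"}
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            if dst.exists():
                # Assumption: never overwrite a file already at the destination.
                raise FileExistsError(f"{dst} already exists")
            shutil.move(str(src), str(dst))
            moved += 1
        except OSError as err:
            # One failure is recorded and the loop continues with the next file.
            record["status"] = "failed"
            record["error"] = str(err)
            failed += 1
        undo_log.append(record)
    return moved, failed
```

The returned counts are enough to print a final summary, and the per-file records carry the detail needed for undo.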

Caching System

NoEntropy includes an intelligent caching system to minimize API calls and improve performance.

Cache Design

- Location: .noentropy_cache.json in project root
- Format: JSON with file path as key
- Expiry: 7 days (automatically cleaned up)
- Max Entries: 1000 (LRU eviction)
- Change Detection: File size + modification time (not content hash)

How Caching Works

First Run:
- Files are analyzed via Gemini API
- Categorization results are cached with metadata

Cache Check (subsequent runs):

File found in cache?
├─ No → Analyze via API, cache result
└─ Yes → File changed (size/time)?
    ├─ Yes → Re-analyze via API, update cache
    └─ No → Use cached categorization

Cache Maintenance:
- Removes entries older than 7 days on every run
- Evicts oldest entries when limit (1000) is reached
- Validates file still exists before using cache
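
The lookup and maintenance rules above can be expressed as a short sketch. The entry layout (`category`, `size`, `mtime`, `cached_at`) and the function names are assumptions; only the 7-day expiry, the 1000-entry limit, and the size-plus-mtime check come from this guide. The sketch evicts by oldest cache time, which is one reading of the eviction rule.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 7 * 24 * 60 * 60
MAX_ENTRIES = 1000


def lookup(cache, path: Path, now=None):
    """Return the cached category, or None if the entry is missing or stale."""
    now = time.time() if now is None else now
    entry = cache.get(str(path))
    if entry is None or not path.exists():
        return None
    stat = path.stat()
    unchanged = entry["size"] == stat.st_size and entry["mtime"] == stat.st_mtime
    fresh = now - entry["cached_at"] <= MAX_AGE_SECONDS
    return entry["category"] if unchanged and fresh else None


def store(cache, path: Path, category, now=None):
    """Cache a result, then evict the oldest entries beyond MAX_ENTRIES."""
    now = time.time() if now is None else now
    stat = path.stat()
    cache[str(path)] = {"category": category, "size": stat.st_size,
                        "mtime": stat.st_mtime, "cached_at": now}
    overflow = len(cache) - MAX_ENTRIES
    if overflow > 0:
        oldest = sorted(cache, key=lambda k: cache[k]["cached_at"])[:overflow]
        for key in oldest:
            del cache[key]
```

A `None` result from `lookup` means the file has to go back to the API, after which `store` refreshes the entry.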

Cache Benefits

- Reduced API Costs: Avoids re-analyzing unchanged files
- Faster Processing: No API call needed for cached files
- Efficient: Metadata-based change detection (no content hashing)
- Automatic Cleanup: Self-maintaining with age and size limits

When Cache is Invalidated

Cache entries are invalidated when:

- File size changes
- File modification time changes
- Cache entry is older than 7 days
- File no longer exists
- Cache is manually deleted

Undo Log System

NoEntropy tracks all file moves to enable undo functionality.

Undo Log Design

- Location: ~/.config/noentropy/data/undo_log.json
- Format: JSON array of move records
- Retention: 30 days (automatically cleaned up)
- Max Entries: 1000 (oldest evicted)
- Status Tracking: Completed, Undone, Failed states

Move Record Structure

Each file move is recorded with:

- Source path (original location)
- Destination path (new location)
- Timestamp of move
- Status (completed/undone/failed)
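
As a rough illustration, a record with these four fields and the retention rules from the design section could be modeled as below. The field names, the `MoveRecord` class, and the `prune` and `to_json` helpers are hypothetical. The real log format may differ.

```python
import json
import time
from dataclasses import dataclass, asdict

RETENTION_SECONDS = 30 * 24 * 60 * 60
MAX_RECORDS = 1000


@dataclass
class MoveRecord:
    source: str
    destination: str
    timestamp: float            # seconds since the epoch
    status: str = "completed"   # "completed", "undone", or "failed"


def prune(records, now=None):
    """Drop records older than 30 days, then keep the newest MAX_RECORDS."""
    now = time.time() if now is None else now
    recent = [r for r in records if now - r.timestamp <= RETENTION_SECONDS]
    recent.sort(key=lambda r: r.timestamp)
    return recent[-MAX_RECORDS:]


def to_json(records):
    """Serialize records as a JSON array, matching the log's stated format."""
    return json.dumps([asdict(r) for r in records], indent=2)
```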

How Undo Works

During Organization:

For each file moved:
├─ Record source path
├─ Record destination path
├─ Record timestamp
└─ Mark as "completed"

Undo Execution:

Load undo log
├─ Filter "completed" moves (not already undone)
├─ Show preview to user
├─ Request confirmation
└─ If confirmed:
    ├─ Check destination exists
    ├─ Check source doesn't exist (avoid conflicts)
    ├─ Move file back to source
    ├─ Mark as "undone"
    └─ Clean up empty directories

Conflict Handling:
- Source exists: Skip restore (prevent overwrite)
- Destination missing: Skip restore (file was deleted)
- Permission error: Skip restore, report error
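
The undo flow and its conflict rules can be sketched as follows. Records are plain dictionaries as they might be loaded from the JSON log. The function name, the newest-first ordering, and the way dry-run counts would-be restores are assumptions for illustration. The preview and confirmation prompt are left out.

```python
import shutil
from pathlib import Path


def undo_moves(records, dry_run=False):
    """Reverse 'completed' moves, newest first; returns (restored, skipped)."""
    restored = skipped = 0
    for record in reversed(records):
        if record["status"] != "completed":
            continue
        src, dst = Path(record["source"]), Path(record["destination"])
        # Skip if the moved file is gone or the original location is occupied.
        if not dst.exists() or src.exists():
            skipped += 1
            continue
        if dry_run:
            restored += 1   # counts what would be restored
            continue
        try:
            src.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(dst), str(src))
        except OSError:
            skipped += 1    # e.g. a permission error
            continue
        record["status"] = "undone"
        restored += 1
        # Remove the destination folder if the undo left it empty.
        try:
            dst.parent.rmdir()
        except OSError:
            pass
    return restored, skipped
```

Because each record is checked and handled on its own, one failed restore does not block the rest.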

Undo Safety Features

- Preview Before Action: Always shows what will be undone
- Conflict Detection: Prevents data loss from overwrites
- Missing File Handling: Gracefully skips deleted files
- Partial Undo Support: Continues processing despite individual failures
- Empty Directory Cleanup: Removes empty folders after undo
- Dry-Run Mode: Preview undo without executing

Undo Limitations

- Only tracks moves made by NoEntropy
- Cannot track manual file operations
- Limited to a 30-day history
- Cannot recover files deleted after a move; undo only reverses moves

Supported File Categories

NoEntropy can organize files into these default categories:

| Category   | File Types                                    |
|------------|-----------------------------------------------|
| Images     | PNG, JPG, JPEG, GIF, SVG, BMP, WEBP, ICO, TIFF |
| Documents  | PDF, DOC, DOCX, TXT, MD, RTF, ODT, PAGES      |
| Installers | EXE, DMG, APP, PKG, DEB, RPM, MSI, APK        |
| Music      | MP3, WAV, FLAC, M4A, AAC, OGG, WMA            |
| Videos     | MP4, AVI, MKV, MOV, WMV, FLV, WEBM            |
| Archives   | ZIP, TAR, GZ, RAR, 7Z, BZ2, XZ                |
| Code       | Source code and configuration files           |
| Misc       | Everything else                               |

AI Integration

NoEntropy uses Google's Gemini API for intelligent categorization.

API Usage

- Model: Gemini 1.5 Flash (configurable)
- Concurrent Requests: 5 by default (configurable via --max-concurrent)
- Retry Logic: Exponential backoff for failed requests
- Rate Limiting: Respects API rate limits with configurable concurrency

Prompt Engineering

NoEntropy uses carefully crafted prompts to get accurate categorization:

Initial Categorization Prompt:
- Lists all filenames
- Specifies available categories
- Requests JSON response with categorization plan
Deep Inspection Prompt:
- Provides file content
- Requests sub-folder suggestion based on content
- Asks for semantic analysis, not just extension

Error Handling

- Network Errors: Retry with exponential backoff
- Rate Limiting: Respects limits, retries after delay
- Invalid Responses: Logs error, continues with other files
- Timeout: Configurable timeout with fallback behavior
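
Retry with exponential backoff follows a well-known pattern, sketched below. The attempt count, base delay, and jitter amount are illustrative values, not NoEntropy's actual settings, and `with_retries` is a hypothetical name. The `sleep` parameter is injectable so the delays can be checked without waiting.

```python
import random
import time


def with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `call()`, retrying on failure with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise   # out of attempts: surface the last error
            # Delays of 1s, 2s, 4s, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Doubling the delay gives a rate-limited or briefly unavailable API time to recover, and the jitter keeps concurrent workers from retrying in lockstep.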

Performance Characteristics

Factors Affecting Performance

Number of Files:
- 10-50 files: ~10-30 seconds
- 100-500 files: 1-3 minutes
- 1000+ files: 5-10 minutes
Concurrency Level:
- Higher = faster but more API load
- Lower = slower but safer for rate limits
- Default (5) balances speed and safety
Cache Hit Rate:
- High hit rate (>80%): Significantly faster
- Low hit rate (<20%): More API calls needed
- Regular usage improves hit rate over time
Text File Count:
- More text files = more deep inspection
- Deep inspection adds processing time
- Concurrent processing mitigates this

Optimization Strategies

- Use caching: Regular runs benefit from cached results
- Adjust concurrency: Increase --max-concurrent for faster processing, as long as you stay within API rate limits
- Dry-run first: Test configuration without full processing
- Organize regularly: Smaller batches process faster

Architecture Diagram

┌─────────────────────────────────────────────────────────┐
│                     NoEntropy CLI                       │
│                   (Orchestrator)                        │
└────────────┬──────────────────────────────┬─────────────┘
             │                              │
    ┌────────▼─────────┐           ┌───────▼────────┐
    │  File Scanner    │           │  Config Manager │
    │  & Detector      │           │                 │
    └────────┬─────────┘           └────────────────┘
             │
    ┌────────▼──────────────────────────────────────┐
    │           Gemini AI Client                    │
    │  (with retry logic & concurrent processing)   │
    └────────┬──────────────────────────────────────┘
             │
    ┌────────▼─────────┐           ┌────────────────┐
    │  Cache System    │           │   Undo Log     │
    └──────────────────┘           └────────────────┘
             │
    ┌────────▼─────────┐
    │   File Mover     │
    └──────────────────┘

Next Steps

- Usage Guide - Learn how to use NoEntropy
- Configuration Guide - Configure NoEntropy
- Development Guide - Contribute to NoEntropy

Back to Main README
