feat: Add comprehensive improvements - CLI, error handling, and docs

- Add CLI argument parsing with clap (dry-run, max-concurrent options)
- Replace .env configuration with interactive prompts and TOML config
- Add BaseDirs-based configuration storage in ~/.config/noentropy/
- Improve Gemini API client with configurable model and timeout
- Add concurrent processing with semaphore for rate limiting
- Improve error handling with retry logic and exponential backoff
- Add comprehensive README with installation and usage instructions
- Add config.example.toml template for users
- Update main.rs with better UX and colored output
- Add lib.rs exports for config module
- Refactor error response parsing for cleaner code
- Update API endpoint to use configurable model parameter
- Add proper error type handling in gemini_errors.rs
2025-12-29 00:11:27 +05:30
parent bbf88fc4fc
commit 3cdcd33439
6 changed files with 530 additions and 92 deletions

README.md Normal file

@@ -0,0 +1,368 @@
# NoEntropy 🗂️
> AI-powered file organizer that intelligently sorts your messy Downloads folder using Google Gemini API
![Rust](https://img.shields.io/badge/rust-2024-orange)
![License](https://img.shields.io/badge/license-MIT-blue)
![Platform](https://img.shields.io/badge/platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey)
## About
NoEntropy is a smart command-line tool that organizes your cluttered Downloads folder automatically. It uses Google's Gemini AI to analyze files, understand their content, and categorize them into organized folder structures. Say goodbye to manually sorting through hundreds of downloads!
### Use Cases
- 📂 Organize a messy Downloads folder
- 🤖 Auto-categorize downloaded files by type and content
- 🔍 Smart sub-folder creation based on file content
- 🚀 Batch file organization without manual effort
- 💾 Reduce clutter and improve file system organization
## Features
- **🧠 AI-Powered Categorization** - Uses Google Gemini API for intelligent file sorting
- **📁 Automatic Sub-Folders** - Creates relevant sub-folders based on file content analysis
- **💨 Smart Caching** - Minimizes API calls with metadata-based caching (7-day expiry)
- **⚡ Concurrent Processing** - Parallel file inspection with configurable limits
- **👀 Dry-Run Mode** - Preview changes without moving any files
- **🔄 Retry Logic** - Exponential backoff for resilient API handling
- **📝 Text File Support** - Inspects 30+ text formats for better categorization
- **✅ Interactive Confirmation** - Review organization plan before execution
- **🎯 Configurable** - Adjust concurrency limits and model settings
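The retry logic added to `src/gemini.rs` in this commit starts from a 2-second delay, doubles it after every retry, lets a server-suggested delay (`retry_delay()`) take precedence for API errors, and gives up after 3 attempts. The sketch below reproduces only that schedule as a pure function so it can be checked without network access. `backoff_schedule` and its `server_hints` parameter are hypothetical names, and the sketch assumes every attempt fails.

```rust
use std::time::Duration;

/// Hypothetical helper: the sleep before each retry, assuming every attempt fails.
/// `server_hints[i]` is the delay the API suggested on failure `i + 1`
/// (use `None` for network errors, which always use the base delay).
fn backoff_schedule(max_attempts: u32, server_hints: &[Option<Duration>]) -> Vec<Duration> {
    let mut base_delay = Duration::from_secs(2); // same starting delay as the commit
    let mut delays = Vec::new();
    // A retry only happens while `attempts < max_attempts`,
    // so there are at most `max_attempts - 1` sleeps.
    for attempt in 1..max_attempts {
        let hint = server_hints.get((attempt - 1) as usize).copied().flatten();
        delays.push(hint.unwrap_or(base_delay)); // a server hint wins over the base delay
        base_delay *= 2; // the base delay doubles after every retry either way
    }
    delays
}
```

With the commit's `max_attempts = 3`, a request that keeps failing with network errors sleeps 2 s and then 4 s before the error is returned.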
## Prerequisites
- **Rust 1.88 or later** (the crate uses the 2024 edition and `let` chains)
- **Google Gemini API Key** - Get one at [https://ai.google.dev/](https://ai.google.dev/)
- A folder full of unorganized files to clean up!
## Installation
1. **Clone repository**
```bash
git clone https://github.com/yourusername/noentropy.git
cd noentropy
```
2. **Build the application**
```bash
cargo build --release
```
3. **Run the application**
On first run, NoEntropy will guide you through interactive setup:
```bash
./target/release/noentropy
```
Or manually create the config file at `~/.config/noentropy/config.toml`:
```bash
mkdir -p ~/.config/noentropy
cp config.example.toml ~/.config/noentropy/config.toml
nano ~/.config/noentropy/config.toml
```
## Configuration
NoEntropy stores its configuration in `~/.config/noentropy/config.toml`, following the XDG Base Directory specification.
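The commit message says the location comes from `BaseDirs`. On Linux, to our understanding, that resolves according to the XDG rule sketched below: use `$XDG_CONFIG_HOME` when it is set to a non-empty absolute path, otherwise fall back to `$HOME/.config`. This is a std-only illustration, not NoEntropy's code; `config_file_path` is a hypothetical helper that takes the variables as arguments so it is easy to test, and other platforms may use a different base directory.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: where config.toml would live under the XDG rules.
fn config_file_path(xdg_config_home: Option<&str>, home: &str) -> PathBuf {
    let base = match xdg_config_home {
        // The XDG spec says an unset, empty, or relative value must be ignored.
        Some(dir) if !dir.is_empty() && Path::new(dir).is_absolute() => PathBuf::from(dir),
        _ => PathBuf::from(home).join(".config"),
    };
    base.join("noentropy").join("config.toml")
}
```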
### Configuration File Format
```toml
api_key = "your_api_key_here"
download_folder = "/home/user/Downloads"
```
| Setting | Description | Example |
|---------|-------------|---------|
| `api_key` | Your Google Gemini API key | `AIzaSy...` |
| `download_folder` | Path to folder to organize | `/home/user/Downloads` |
### Getting a Gemini API Key
1. Visit [Google AI Studio](https://ai.google.dev/)
2. Sign in with your Google account
3. Create a new API key
4. Copy the key to your configuration file
### Interactive Setup
NoEntropy provides an interactive setup on first run:
- **Missing API key?** → You'll be prompted to enter it
- **Missing download folder?** → You'll be prompted to specify it (with default suggestion)
- **Both missing?** → You'll be guided through complete setup
Configuration is automatically saved to `~/.config/noentropy/config.toml` after interactive setup.
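A prompt "with default suggestion" usually means that pressing Enter accepts the suggested value. Below is a minimal std-only sketch of that behaviour. `prompt_with_default` is a hypothetical helper (the real `config::get_or_prompt_download_folder` is not part of this diff), and it is generic over the reader and writer so it can be exercised without a terminal.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical prompt helper: prints `label [default]: `, reads one line,
/// and returns `default` when the user just presses Enter.
fn prompt_with_default<R: BufRead, W: Write>(
    input: &mut R,
    output: &mut W,
    label: &str,
    default: &str,
) -> io::Result<String> {
    write!(output, "{} [{}]: ", label, default)?;
    output.flush()?; // make sure the prompt is visible before blocking on input
    let mut line = String::new();
    input.read_line(&mut line)?;
    let answer = line.trim();
    Ok(if answer.is_empty() {
        default.to_string()
    } else {
        answer.to_string()
    })
}
```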
## Usage
### Basic Usage
Organize your Downloads folder with default settings:
```bash
cargo run --release
```
### Dry-Run Mode
Preview what would happen without moving any files:
```bash
cargo run --release -- --dry-run
```
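Working out where each file would land needs no filesystem writes, which is what lets a dry run stay side-effect free. The sketch below shows one plausible mapping from a plan entry to a destination path, matching the `Category/Sub/` shape in the example output further down. `destination` and `plan_line` are hypothetical helpers, and treating an empty sub-category as "no sub-folder" is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the path a planned file would be moved to.
/// An empty sub-category means the file goes directly into the category folder.
fn destination(download_dir: &Path, category: &str, sub_category: &str, filename: &str) -> PathBuf {
    let mut dest = download_dir.join(category);
    if !sub_category.is_empty() {
        dest.push(sub_category);
    }
    dest.join(filename)
}

/// Formats a plan line in the same style as the example terminal output.
fn plan_line(filename: &str, category: &str, sub_category: &str) -> String {
    if sub_category.is_empty() {
        format!("Plan: {} -> {}/", filename, category)
    } else {
        format!("Plan: {} -> {}/{}/", filename, category, sub_category)
    }
}
```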
### Custom Concurrency
Adjust the number of concurrent API calls (default: 5):
```bash
cargo run --release -- --max-concurrent 10
```
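In this commit, `main.rs` creates a `tokio::sync::Semaphore` with `args.max_concurrent` permits, and each text file acquires a permit before its content is read and sent to Gemini. The std-only sketch below shows the same limiting idea with threads and a small `Mutex`/`Condvar` semaphore, so it runs without tokio. `Semaphore` and `run_limited` here are illustrative, not the project's types.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore (a std-only stand-in for tokio::sync::Semaphore).
struct Semaphore {
    permits: Mutex<usize>,
    cvar: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), cvar: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cvar.wait(p).unwrap();
        }
        *p -= 1;
    }

    /// Returns a permit and wakes one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cvar.notify_one();
    }
}

/// Runs `jobs` simulated API calls with at most `max_concurrent` in flight
/// and returns the highest concurrency actually observed.
fn run_limited(jobs: usize, max_concurrent: usize) -> usize {
    let sem = Arc::new(Semaphore::new(max_concurrent));
    let stats = Arc::new(Mutex::new((0usize, 0usize))); // (in flight now, peak)
    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let sem = Arc::clone(&sem);
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                sem.acquire();
                {
                    let mut s = stats.lock().unwrap();
                    s.0 += 1;
                    s.1 = s.1.max(s.0);
                }
                thread::sleep(Duration::from_millis(10)); // stand-in for the request
                stats.lock().unwrap().0 -= 1;
                sem.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let peak = stats.lock().unwrap().1;
    peak
}
```

Whatever the number of files, the observed peak never exceeds the permit count, which is the guarantee `--max-concurrent` relies on.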
### Combined Options
Use multiple options together:
```bash
cargo run --release -- --dry-run --max-concurrent 3
```
### Command-Line Options
| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--dry-run` | `-d` | `false` | Preview changes without moving files |
| `--max-concurrent` | `-m` | `5` | Maximum concurrent API requests |
| `--help` | `-h` | - | Show help message |
| `--version` | `-V` | - | Show version information |
## How It Works
NoEntropy follows a five-step process to organize your files:
```
┌────────────────────────────┐
│ 1. Scan Files              │ → Read all files in download_folder
└─────────────┬──────────────┘
              ▼
┌────────────────────────────┐
│ 2. Initial Categorization  │ → Ask Gemini to categorize by filename
└─────────────┬──────────────┘
              ▼
┌────────────────────────────┐
│ 3. Deep Inspection         │ → Read text files for sub-categories
│    (Concurrent)            │   • Reads file content
│                            │   • Asks AI for sub-folder
└─────────────┬──────────────┘
              ▼
┌────────────────────────────┐
│ 4. Preview & Confirm       │ → Show organization plan
│                            │   • Ask user approval
└─────────────┬──────────────┘
              ▼
┌────────────────────────────┐
│ 5. Execute Moves           │ → Move files to organized folders
└────────────────────────────┘
```
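Step 3 never fails the whole run: in this commit, `get_ai_sub_category` falls back to `"General"` when the request fails, the response cannot be parsed, the API returns an error status, or the reply is empty. The core of that rule (take the first part of the first candidate, trim it, fall back to `"General"`) is sketched below with trimmed-down, hypothetical versions of the response structs.

```rust
/// Hypothetical, simplified stand-ins for the Gemini response structs.
struct Part {
    text: String,
}
struct Content {
    parts: Vec<Part>,
}
struct Candidate {
    content: Content,
}

/// First part of the first candidate, trimmed; "General" if anything is missing or blank.
fn sub_category(candidates: &[Candidate]) -> String {
    let s = candidates
        .first()
        .and_then(|c| c.content.parts.first())
        .map(|p| p.text.trim())
        .unwrap_or("General");
    if s.is_empty() {
        "General".to_string()
    } else {
        s.to_string()
    }
}
```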
### Example Terminal Output
```bash
$ cargo run --release
Found 47 files. Asking Gemini to organize...
Gemini Plan received! Performing deep inspection...
Reading content of notes.txt...
Reading content of config.yaml...
Reading content of script.py...
Deep inspection complete! Moving Files.....
--- EXECUTION PLAN ---
Plan: image1.png -> Images/
Plan: document.pdf -> Documents/
Plan: setup.exe -> Installers/
Plan: notes.txt -> Documents/Notes/
Plan: config.yaml -> Code/Config/
Plan: script.py -> Code/Scripts/
Do you want to apply these changes? [y/N]: y
--- MOVING FILES ---
Moved: image1.png -> Images/
Moved: document.pdf -> Documents/
Moved: setup.exe -> Installers/
Moved: notes.txt -> Documents/Notes/
Moved: config.yaml -> Code/Config/
Moved: script.py -> Code/Scripts/
Organization Complete!
Files moved: 47, Errors: 0
Done!
```
## Supported Categories
NoEntropy organizes files into these categories:
| Category | Description |
|----------|-------------|
| **Images** | PNG, JPG, GIF, SVG, etc. |
| **Documents** | PDF, DOC, DOCX, TXT, MD, etc. |
| **Installers** | EXE, DMG, APP, PKG, etc. |
| **Music** | MP3, WAV, FLAC, M4A, etc. |
| **Archives** | ZIP, TAR, RAR, 7Z, etc. |
| **Code** | Source code and configuration files |
| **Misc** | Everything else |
## Supported Text Formats
NoEntropy can read and analyze the content of 30+ text file formats:
```
Source Code: rs, py, js, ts, jsx, tsx, java, go, c, cpp, h, hpp, rb, php, swift, kt, scala, lua, r, m
Web/Config: html, css, json, xml, yaml, yml, toml, ini, cfg, conf
Text & Scripts: txt, md, sql, sh, bat, ps1, log
```
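The diff only shows calls to `noentropy::files::is_text_file(&path)` and `read_file_sample(&path, 5000)` (the sample limit was raised from 2000 in this commit), not their bodies. Below is a minimal sketch, assuming detection is a case-insensitive extension check against the list above and that the sample limit counts bytes. `looks_like_text`, `read_sample` and `TEXT_EXTENSIONS` are hypothetical names.

```rust
use std::io::Read;
use std::path::Path;

/// The extensions listed above (assumed to be matched case-insensitively).
const TEXT_EXTENSIONS: &[&str] = &[
    "rs", "py", "js", "ts", "jsx", "tsx", "java", "go", "c", "cpp", "h", "hpp", "rb", "php",
    "swift", "kt", "scala", "lua", "r", "m", "html", "css", "json", "xml", "yaml", "yml",
    "toml", "ini", "cfg", "conf", "txt", "md", "sql", "sh", "bat", "ps1", "log",
];

/// True if the file's extension is in the supported list.
fn looks_like_text(path: &Path) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map(|e| TEXT_EXTENSIONS.contains(&e.to_ascii_lowercase().as_str()))
        .unwrap_or(false)
}

/// Reads at most `max_bytes` and decodes lossily, so cutting a multi-byte
/// character in half yields U+FFFD instead of an error.
fn read_sample<R: Read>(reader: R, max_bytes: u64) -> std::io::Result<String> {
    let mut buf = Vec::new();
    reader.take(max_bytes).read_to_end(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf).into_owned())
}
```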
## Caching
NoEntropy includes an intelligent caching system to minimize API calls:
- **Location**: `.noentropy_cache.json` in the current working directory (the directory you run NoEntropy from)
- **Expiry**: 7 days (old entries auto-removed)
- **Change Detection**: Uses file metadata (size + modification time) instead of full content hashing
- **Max Entries**: 1000 entries (oldest evicted when limit reached)
### How Caching Works
1. **First Run**: Files are analyzed and categorized via Gemini API
2. **Response Cached**: Organization plan saved with file metadata
3. **Subsequent Runs**:
- Checks if files changed (size/modification time)
- If unchanged, uses cached categorization
- If changed, re-analyzes via API
4. **Auto-Cleanup**: Removes cache entries older than 7 days
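Below is a sketch of the freshness rule described above, assuming each entry stores the file's size and modification time plus the time it was cached. `FileFingerprint`, `CacheEntry`, `is_reusable` and `evict_oldest` are hypothetical; the real `Cache` type's fields are not shown in this commit. The 7-day limit matches `cache.cleanup_old_entries(7 * 24 * 60 * 60)` in `main.rs` (604,800 seconds), and the eviction helper models the 1000-entry cap.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical per-file fingerprint used for change detection.
#[derive(Clone, Debug, PartialEq)]
struct FileFingerprint {
    size: u64,
    modified: SystemTime,
}

/// Hypothetical cache entry: the fingerprint plus when it was cached.
struct CacheEntry {
    fingerprint: FileFingerprint,
    cached_at: SystemTime,
}

const MAX_AGE: Duration = Duration::from_secs(7 * 24 * 60 * 60); // 7 days

/// Reusable only if the file looks unchanged (same size and mtime)
/// and the entry is younger than 7 days.
fn is_reusable(entry: &CacheEntry, current: &FileFingerprint, now: SystemTime) -> bool {
    let age = now.duration_since(entry.cached_at).unwrap_or(Duration::ZERO);
    entry.fingerprint == *current && age < MAX_AGE
}

/// Keeps at most `max_entries`, dropping the oldest entries first.
fn evict_oldest(entries: &mut Vec<CacheEntry>, max_entries: usize) {
    if entries.len() > max_entries {
        entries.sort_by_key(|e| e.cached_at);
        let excess = entries.len() - max_entries;
        entries.drain(..excess);
    }
}
```

Comparing size and modification time is cheap but can miss an edit that keeps both unchanged; that is the trade-off of skipping full content hashing.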
## Troubleshooting
### "API key not configured"
**Solution**: NoEntropy will prompt you for your API key on first run. Alternatively, manually create `~/.config/noentropy/config.toml`:
```toml
api_key = "your_actual_api_key"
download_folder = "/home/user/Downloads"
```
### "Download folder not configured"
**Solution**: NoEntropy will prompt you for the folder path on first run. Alternatively, manually add it to your config:
```toml
download_folder = "/path/to/your/Downloads"
```
### "API rate limit exceeded"
**Solution**:
- Wait a few minutes before trying again
- Lower `--max-concurrent` so fewer requests run in parallel
- Re-run on an unchanged folder: the cached categorization is reused instead of requesting a new plan
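Which failures get retried is decided in `gemini_errors.rs`: `is_retryable` returns true for rate-limit, service-unavailable, timeout, network and internal errors, and `from_status_code` maps HTTP 500 to `InternalError` and 502 to 504 to `ServiceUnavailable`. The simplified sketch below mirrors that classification for HTTP status codes only (timeouts and network errors have no status code). The `ApiErrorKind` enum is hypothetical, and mapping HTTP 429 to a rate-limit error is an assumption, since that branch is not visible in the diff.

```rust
/// Hypothetical, simplified version of the error classes in gemini_errors.rs.
#[derive(Debug, PartialEq)]
enum ApiErrorKind {
    RateLimited,        // assumed for HTTP 429
    Internal,           // HTTP 500
    ServiceUnavailable, // HTTP 502..=504
    Other(u16),
}

fn classify(status: u16) -> ApiErrorKind {
    match status {
        429 => ApiErrorKind::RateLimited,
        500 => ApiErrorKind::Internal,
        502..=504 => ApiErrorKind::ServiceUnavailable,
        other => ApiErrorKind::Other(other),
    }
}

/// Same idea as `GeminiError::is_retryable`, written with `matches!`.
fn is_retryable(kind: &ApiErrorKind) -> bool {
    matches!(
        kind,
        ApiErrorKind::RateLimited | ApiErrorKind::Internal | ApiErrorKind::ServiceUnavailable
    )
}
```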
### "Network error"
**Solution**:
- Check your internet connection
- Verify Gemini API service is operational
- Ensure firewall allows outbound HTTPS requests
### "Failed to move file"
**Solution**:
- Check file permissions
- Ensure destination folder is writable
- Verify source files still exist
### "Cache corrupted"
**Solution**: Delete `.noentropy_cache.json` from the directory you ran NoEntropy in, then run again. A new cache will be created.
## Development
### Build in Debug Mode
```bash
cargo build
```
### Build in Release Mode
```bash
cargo build --release
```
### Run Tests
```bash
cargo test
```
### Run Clippy (Linting)
```bash
cargo clippy
```
### Check Code
```bash
cargo check
```
## Project Structure
```
noentropy/
├── src/
│ ├── main.rs # Entry point and CLI handling
│ ├── lib.rs # Library exports
│ ├── config.rs # Configuration management
│ ├── gemini.rs # Gemini API client
│ ├── gemini_errors.rs # Error handling
│ ├── cache.rs # Caching system
│ └── files.rs # File operations
├── Cargo.toml # Dependencies
├── config.example.toml # Configuration template
└── README.md # This file
```
## Future Enhancements
Based on community feedback, we're planning:
- [ ] **Custom Categories** - Define custom categories in `config.toml`
- [ ] **Recursive Mode** - Organize files in subdirectories with `--recursive` flag
- [ ] **Undo Functionality** - Revert file organization changes
- [ ] **Other AI Providers** - Support for AI providers besides Google Gemini
- [ ] **GUI Version** - Desktop application for non-CLI users
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Built with [Rust](https://www.rust-lang.org/)
- Powered by [Google Gemini API](https://ai.google.dev/)
- Inspired by the endless struggle to keep Downloads folders organized
## Show Your Support
⭐ Star this repository if you find it useful!
---
Made with ❤️ by the NoEntropy team

config.example.toml Normal file

@@ -0,0 +1,9 @@
# NoEntropy Configuration File
# Location: ~/.config/noentropy/config.toml
# Your Google Gemini API Key
# Get one at: https://ai.google.dev/
api_key = "your_api_key_here"
# Absolute path to the folder to organize (e.g., /home/user/Downloads)
download_folder = "/path/to/your/downloads"

src/gemini.rs

@@ -42,14 +42,28 @@ pub struct GeminiClient {
     api_key: String,
     client: Client,
     base_url: String,
+    model: String,
+    timeout: Duration,
 }
 impl GeminiClient {
     pub fn new(api_key: String) -> Self {
+        Self::with_model(api_key, "gemini-3-flash-preview".to_string())
+    }
+    pub fn with_model(api_key: String, model: String) -> Self {
         Self {
             api_key,
-            client: Client::new(),
-            base_url: "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent".to_string(),
+            client: Client::builder()
+                .timeout(Duration::from_secs(30))
+                .build()
+                .unwrap_or_default(),
+            base_url: format!(
+                "https://generativelanguage.googleapis.com/v1beta/models/{}:generateContent",
+                model
+            ),
+            model,
+            timeout: Duration::from_secs(30),
         }
     }
@@ -71,10 +85,10 @@ impl GeminiClient {
         let url = format!("{}?key={}", self.base_url, self.api_key);
         // Check cache first if available
-        if let (Some(cache_ref), Some(base_path)) = (cache.as_ref(), base_path) {
-            if let Some(cached_response) = cache_ref.get_cached_response(&filenames, base_path) {
-                return Ok(cached_response);
-            }
+        if let (Some(cache_ref), Some(base_path)) = (cache.as_ref(), base_path)
+            && let Some(cached_response) = cache_ref.get_cached_response(&filenames, base_path)
+        {
+            return Ok(cached_response);
         }
         // 1. Construct the Prompt
@@ -101,14 +115,19 @@ impl GeminiClient {
         // 4. Parse
         if res.status().is_success() {
-            let gemini_response: GeminiResponse = res.json().await.map_err(GeminiError::NetworkError)?;
+            let gemini_response: GeminiResponse =
+                res.json().await.map_err(GeminiError::NetworkError)?;
             // Extract raw JSON string from Gemini using proper structs
-            let raw_text = &gemini_response.candidates
-                .get(0)
-                .ok_or_else(|| GeminiError::InvalidResponse("No candidates in response".to_string()))?
-                .content.parts
-                .get(0)
+            let raw_text = &gemini_response
+                .candidates
+                .first()
+                .ok_or_else(|| {
+                    GeminiError::InvalidResponse("No candidates in response".to_string())
+                })?
+                .content
+                .parts
+                .first()
                 .ok_or_else(|| GeminiError::InvalidResponse("No parts in content".to_string()))?
                 .text;
@@ -147,6 +166,7 @@ impl GeminiClient {
     ) -> Result<reqwest::Response, GeminiError> {
         let mut attempts = 0;
         let max_attempts = 3;
+        let mut base_delay = Duration::from_secs(2);
         loop {
             attempts += 1;
@@ -160,21 +180,32 @@ impl GeminiClient {
                     let error = GeminiError::from_response(response).await;
                     if error.is_retryable() && attempts < max_attempts {
-                        if let Some(delay) = error.retry_delay() {
-                            println!("API Error: {}. Retrying in {} seconds (attempt {}/{})",
-                                error, delay.as_secs(), attempts, max_attempts);
-                            tokio::time::sleep(delay).await;
-                            continue;
-                        }
+                        let delay = error.retry_delay().unwrap_or(base_delay);
+                        println!(
+                            "API Error: {}. Retrying in {} seconds (attempt {}/{})",
+                            error,
+                            delay.as_secs(),
+                            attempts,
+                            max_attempts
+                        );
+                        tokio::time::sleep(delay).await;
+                        base_delay *= 2;
+                        continue;
                     }
                     return Err(error);
                 }
                 Err(e) => {
                     if attempts < max_attempts {
-                        println!("Network error: {}. Retrying in {} seconds (attempt {}/{})",
-                            e, 5, attempts, max_attempts);
-                        tokio::time::sleep(Duration::from_secs(5)).await;
+                        println!(
+                            "Network error: {}. Retrying in {} seconds (attempt {}/{})",
+                            e,
+                            base_delay.as_secs(),
+                            attempts,
+                            max_attempts
+                        );
+                        tokio::time::sleep(base_delay).await;
+                        base_delay *= 2;
                         continue;
                     }
                     return Err(GeminiError::NetworkError(e));
@@ -202,27 +233,45 @@ impl GeminiClient {
             }]
         });
-        let res = self.client.post(&url).json(&request_body).send().await;
-        if let Ok(res) = res {
-            if res.status().is_success() {
-                let gemini_response: GeminiResponse = res.json().await.unwrap_or_default();
-                let sub_category = gemini_response.candidates
-                    .get(0)
-                    .and_then(|c| c.content.parts.get(0))
-                    .map(|p| p.text.trim())
-                    .unwrap_or("General")
-                    .to_string();
-                if sub_category.is_empty() {
-                    "General".to_string()
-                } else {
-                    sub_category
-                }
-            } else {
-                "General".to_string()
-            }
-        } else {
-            "General".to_string()
-        }
+        let res = match self.client.post(&url).json(&request_body).send().await {
+            Ok(res) => res,
+            Err(e) => {
+                eprintln!(
+                    "Warning: Failed to get sub-category for {}: {}",
+                    filename, e
+                );
+                return "General".to_string();
+            }
+        };
+        if res.status().is_success() {
+            let gemini_response: GeminiResponse = match res.json().await {
+                Ok(r) => r,
+                Err(e) => {
+                    eprintln!("Warning: Failed to parse response for {}: {}", filename, e);
+                    return "General".to_string();
+                }
+            };
+            let sub_category = gemini_response
+                .candidates
+                .first()
+                .and_then(|c| c.content.parts.first())
+                .map(|p| p.text.trim())
+                .unwrap_or("General")
+                .to_string();
+            if sub_category.is_empty() {
+                "General".to_string()
+            } else {
+                sub_category
+            }
+        } else {
+            eprintln!(
+                "Warning: API returned error for {}: {}",
+                filename,
+                res.status()
+            );
+            "General".to_string()
+        }
     }

src/gemini_errors.rs

@@ -74,7 +74,6 @@ impl GeminiError {
     pub async fn from_response(response: Response) -> Self {
         let status = response.status();
-        // Try to parse error response body
         let error_text = match response.text().await {
             Ok(text) => text,
             Err(e) => {
@@ -82,12 +81,10 @@ impl GeminiError {
             }
         };
-        // Try to parse structured error response
         if let Ok(gemini_error) = serde_json::from_str::<GeminiErrorResponse>(&error_text) {
             return Self::from_gemini_error(gemini_error.error, status.as_u16());
         }
-        // Fallback to HTTP status code based errors
         Self::from_status_code(status, &error_text)
     }
@@ -96,13 +93,11 @@ impl GeminiError {
         match error_detail.status.as_str() {
             "RESOURCE_EXHAUSTED" => {
-                if let Some(retry_info) = details.iter().find(|d| d.retry_delay.is_some()) {
-                    if let Some(retry_delay) = &retry_info.retry_delay {
-                        if let Ok(seconds) = retry_delay.parse::<u32>() {
-                            return GeminiError::RateLimitExceeded { retry_after: seconds };
-                        }
-                    }
-                }
+                if let Some(retry_info) = details.iter().find(|d| d.retry_delay.is_some())
+                    && let Some(retry_delay) = &retry_info.retry_delay
+                    && let Ok(seconds) = retry_delay.parse::<u32>() {
+                    return GeminiError::RateLimitExceeded { retry_after: seconds };
+                }
                 if let Some(quota_info) = details.iter().find(|d| d.quota_limit.is_some()) {
                     let limit = quota_info.quota_limit.as_deref().unwrap_or("unknown");
@@ -177,7 +172,7 @@ impl GeminiError {
             500 => GeminiError::InternalError {
                 details: error_text.to_string()
             },
-            502 | 503 | 504 => GeminiError::ServiceUnavailable {
+            502..=504 => GeminiError::ServiceUnavailable {
                 reason: error_text.to_string()
             },
             _ => GeminiError::ApiError {
@@ -189,14 +184,14 @@ impl GeminiError {
     /// Check if this error is retryable
     pub fn is_retryable(&self) -> bool {
-        match self {
-            GeminiError::RateLimitExceeded { .. } => true,
-            GeminiError::ServiceUnavailable { .. } => true,
-            GeminiError::Timeout { .. } => true,
-            GeminiError::NetworkError(_) => true,
-            GeminiError::InternalError { .. } => true,
-            _ => false,
-        }
+        matches!(
+            self,
+            GeminiError::RateLimitExceeded { .. }
+                | GeminiError::ServiceUnavailable { .. }
+                | GeminiError::Timeout { .. }
+                | GeminiError::NetworkError(_)
+                | GeminiError::InternalError { .. }
+        )
     }
     /// Get retry delay for retryable errors
@@ -217,10 +212,9 @@ impl GeminiError {
     fn extract_model_name(message: &str) -> String {
         // Try to extract model name from error message
         // Example: "Model 'gemini-1.5-flash' not found"
-        if let Some(start) = message.find('\'') {
-            if let Some(end) = message[start + 1..].find('\'') {
-                return message[start + 1..start + 1 + end].to_string();
-            }
-        }
+        if let Some(start) = message.find('\'')
+            && let Some(end) = message[start + 1..].find('\'') {
+            return message[start + 1..start + 1 + end].to_string();
+        }
         "unknown".to_string()
     }

src/lib.rs

@@ -1,4 +1,5 @@
 pub mod cache;
+pub mod config;
 pub mod files;
 pub mod gemini;
 pub mod gemini_errors;

src/main.rs

@@ -1,36 +1,46 @@
+use clap::Parser;
 use colored::*;
 use futures::future::join_all;
 use noentropy::cache::Cache;
+use noentropy::config;
 use noentropy::files::{FileBatch, OrganizationPlan, execute_move};
 use noentropy::gemini::GeminiClient;
 use noentropy::gemini_errors::GeminiError;
-use std::path::{Path, PathBuf};
+use std::path::Path;
 use std::sync::Arc;
+#[derive(Parser, Debug)]
+#[command(author, version, about, long_about = None)]
+struct Args {
+    #[arg(short, long, help = "Preview changes without moving files")]
+    dry_run: bool,
+    #[arg(
+        short,
+        long,
+        default_value_t = 5,
+        help = "Maximum concurrent API requests"
+    )]
+    max_concurrent: usize,
+}
 #[tokio::main]
 async fn main() -> Result<(), Box<dyn std::error::Error>> {
-    dotenv::dotenv().ok();
-    let api_key = std::env::var("GEMINI_API_KEY")
-        .map_err(|_| "GEMINI_API_KEY environment variable not set. Please set it in your .env file.")?;
-    let download_path_var = std::env::var("DOWNLOAD_FOLDER")
-        .map_err(|_| "DOWNLOAD_FOLDER environment variable not set. Please set it in your .env file.")?;
-    // 1. Setup
-    let download_path: PathBuf = PathBuf::from(download_path_var.to_string());
+    let args = Args::parse();
+    let api_key = config::get_or_prompt_api_key()?;
+    let download_path = config::get_or_prompt_download_folder()?;
     let client: GeminiClient = GeminiClient::new(api_key);
-    // Initialize cache
     let cache_path = Path::new(".noentropy_cache.json");
     let mut cache = Cache::load_or_create(cache_path);
-    // Clean up old cache entries (older than 7 days)
     cache.cleanup_old_entries(7 * 24 * 60 * 60);
-    // 2. Get Files
     let batch = FileBatch::from_path(download_path.clone());
     if batch.filenames.is_empty() {
-        println!("No files found to organize!");
+        println!("{}", "No files found to organize!".yellow());
         return Ok(());
     }
@@ -39,7 +49,6 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
         batch.count()
     );
-    // 3. Call Gemini for Initial Categorization
     let mut plan: OrganizationPlan = match client
         .organize_files_with_cache(batch.filenames, Some(&mut cache), Some(&download_path))
         .await
@@ -51,22 +60,26 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
         }
     };
-    println!("Gemini Plan received! Performing deep inspection...");
-    // 4. Deep Inspection - Process files concurrently
+    println!("{}", "Gemini Plan received! Performing deep inspection...".green());
     let client = Arc::new(client);
+    let semaphore = Arc::new(tokio::sync::Semaphore::new(args.max_concurrent));
-    let tasks: Vec<_> = plan.files.iter_mut()
+    let tasks: Vec<_> = plan
+        .files
+        .iter_mut()
         .zip(batch.paths.iter())
         .map(|(file_category, path)| {
             let client = Arc::clone(&client);
             let filename = file_category.filename.clone();
             let category = file_category.category.clone();
             let path = path.clone();
+            let semaphore = Arc::clone(&semaphore);
             async move {
                 if noentropy::files::is_text_file(&path) {
-                    if let Some(content) = noentropy::files::read_file_sample(&path, 2000) {
+                    let _permit = semaphore.acquire().await.unwrap();
+                    if let Some(content) = noentropy::files::read_file_sample(&path, 5000) {
                         println!("Reading content of {}...", filename.green());
                         client.get_ai_sub_category(&filename, &category, &content).await
                     } else {
@@ -79,22 +92,26 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
         })
         .collect();
-    // Wait for all concurrent tasks to complete
     let sub_categories = join_all(tasks).await;
-    // Apply the results back to the plan
     for (file_category, sub_category) in plan.files.iter_mut().zip(sub_categories) {
         file_category.sub_category = sub_category;
     }
-    println!("Deep inspection complete! Moving Files.....");
-    // 5. Execute
-    execute_move(&download_path, plan);
-    println!("Done!");
+    println!("{}", "Deep inspection complete! Moving Files.....".green());
+    if args.dry_run {
+        println!(
+            "{} Dry run mode - skipping file moves.",
+            "INFO:".cyan()
+        );
+    } else {
+        execute_move(&download_path, plan);
+    }
+    println!("{}", "Done!".green().bold());
-    // Save cache before exiting
     if let Err(e) = cache.save(cache_path) {
-        println!("Warning: Failed to save cache: {}", e);
+        eprintln!("Warning: Failed to save cache: {}", e);
     }
     Ok(())