Merge pull request #6 from glitchySid/refactor

Refactor
2025-12-30 19:30:03 +05:30
parent 66a665f336 9843303d9a
commit 5a18edf3da
5 changed files with 112 additions and 70 deletions
--- a/HACKATHON_REVIEW.md
+++ b/HACKATHON_REVIEW.md
@@ -1,56 +0,0 @@
 # Hackathon Review: noentropy
 ## Overall Assessment
 **Score: 8.5/10**
 `noentropy` is a highly effective and impressive hackathon project. Its core concept of using a Large Language Model (LLM) to automate the tedious task of file organization is both innovative and genuinely useful. The project is well-scoped for a hackathon, demonstrating a complete and functional loop from analyzing files to executing a plan.
 ### Strengths
 *   **High "Wow" Factor:** Demonstrates a practical and intelligent use of AI that solves a common problem. It's the kind of project that gets people excited.
 *   **Practical Usefulness:** This isn't just a technical demo; it's a tool that people would actually want to use to manage their cluttered "Downloads" folders.
 *   **Solid Technical Foundation:** The choice of Rust with `tokio` for asynchronous API calls is a good one, showing technical competence. The interaction with the Gemini API is direct and effective.
 *   **Complete End-to-End Loop:** The program successfully scans files, communicates with an external API, parses the response, and acts on it.
 ## Suggested Improvements for a Winning Edge
 This project is already strong, but the following improvements could elevate it from a great project to a potential winner.
 ### High-Impact Improvements
 1.  **Configuration File for Categories:**
    *   **Problem:** The file categories (`Images`, `Documents`, etc.) are currently hardcoded in the prompt. This is inflexible.
    *   **Solution:** Create a `config.toml` file where users can define their own categories and maybe even provide rules (e.g., "all `.jpg` files go to `Photos`"). This would make the tool dramatically more powerful and personalizable.
 2.  **Dry-Run Mode:**
    *   **Problem:** Users, especially first-time users, will be hesitant to run a tool that automatically moves their files without knowing what it's going to do.
    *   **Solution:** Add a `--dry-run` command-line flag. In this mode, the tool should print out the proposed file movements without actually touching any files. For example: `[DRY RUN] Would move 'report.pdf' to 'Documents/'`.
 3.  **Interactive Mode:**
    *   **Problem:** The current process is fully automated. What if the AI makes a mistake?
    *   **Solution:** Add an `--interactive` flag. After getting the plan from Gemini, the tool could present the plan to the user and ask for confirmation for each move or for categories of moves. `Move 5 files to 'Images'? [Y/n]`.
 ### Technical & Robustness Improvements
 4.  **Correct the Model Name:**
    *   In `src/gemini.rs`, the model `gemini-3-flash-preview` is likely a typo. It should probably be `gemini-1.5-flash-preview` or another valid, available model.
 5.  **Robust API Response Parsing:**
    *   **Problem:** The code manually traverses the JSON response from Gemini. If the API response structure changes even slightly, the program will crash.
    *   **Solution:** Define Rust structs that mirror the *entire* Gemini API response and use `serde` to deserialize into them. This is far more resilient to API changes.
 6.  **Eliminate `.expect()`:**
    *   **Problem:** The code uses `.expect()` in several places (e.g., for environment variables and creating directories). This can cause the program to panic unexpectedly.
    *   **Solution:** Replace `.expect()` calls with proper `Result` handling and provide more user-friendly error messages. For example, if the `DOWNLOAD_FOLDER` isn't set, print a clear message telling the user how to set it.
 7.  **More Context for the LLM:**
    *   **Problem:** Sending only filenames might not be enough for accurate categorization. Is `resume.pdf` a document or something else?
    *   **Solution:** To improve accuracy, consider sending more metadata to Gemini. The prompt could include file size, creation date, or even the first few lines of text for file types like `.txt` or `.md`. (This would require more complex file handling but would make the AI's job easier).
 ### Feature Expansion
 8.  **Recursive Folder Processing:**
    *   Add a `--recursive` or `-r` flag to allow the tool to organize files in subdirectories as well, not just the top-level directory.
 By implementing a few of these suggestions, particularly the high-impact ones, `noentropy` could be a truly standout project. Great work!
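Item 6 of the review above (replacing `.expect()` with `Result` handling) can be illustrated with a minimal sketch. It is not code from this repository: `ConfigError` and `download_folder_from` are hypothetical names, and the only detail taken from the review is the `DOWNLOAD_FOLDER` environment variable. The lookup returns a typed error that carries a user-facing message instead of panicking.

```rust
use std::env;
use std::fmt;
use std::path::PathBuf;

/// Hypothetical error type; the project's real error types may differ.
#[derive(Debug, PartialEq)]
enum ConfigError {
    MissingDownloadFolder,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingDownloadFolder => write!(
                f,
                "DOWNLOAD_FOLDER is not set; point it at the directory to organize, e.g. `export DOWNLOAD_FOLDER=\"$HOME/Downloads\"`"
            ),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Turns an optional raw value into a path, treating a missing or blank
/// value as an error the caller can report instead of a panic.
fn download_folder_from(value: Option<String>) -> Result<PathBuf, ConfigError> {
    match value {
        Some(v) if !v.trim().is_empty() => Ok(PathBuf::from(v)),
        _ => Err(ConfigError::MissingDownloadFolder),
    }
}

fn main() {
    // `.ok()` also maps a non-Unicode value to `None`; a fuller version
    // might report that case separately.
    match download_folder_from(env::var("DOWNLOAD_FOLDER").ok()) {
        Ok(path) => println!("Organizing {}", path.display()),
        Err(err) => eprintln!("Error: {err}"),
    }
}
```

Taking the raw value as a parameter keeps the check testable without touching the process environment.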
--- a/README.md
+++ b/README.md
@@ -439,22 +439,61 @@ cargo check
 ## Project Structure
 NoEntropy follows a clean modular architecture for better maintainability and testability:
 ```
 noentropy/
 ├── .github/
 │   └── workflows/
 │       └── rust.yml              # CI/CD workflow
 ├── src/
-│   ├── main.rs           # Entry point and CLI handling
-│   ├── lib.rs            # Library exports
-│   ├── config.rs         # Configuration management
-│   ├── gemini.rs         # Gemini API client
-│   ├── gemini_errors.rs  # Error handling
-│   ├── cache.rs          # Caching system
-│   ├── files.rs          # File operations
-│   └── undo.rs          # Undo functionality
-├── Cargo.toml            # Dependencies
-├── config.example.toml    # Configuration template
-└── README.md             # This file
+│   ├── cli/
+│   │   ├── mod.rs                # CLI module exports
+│   │   ├── args.rs               # Command-line argument definitions
+│   │   └── orchestrator.rs       # Organization & undo orchestration
+│   ├── files/
+│   │   ├── mod.rs                # File module exports
+│   │   ├── batch.rs              # File batch processing
+│   │   ├── detector.rs           # File type detection
+│   │   ├── mover.rs              # File moving operations
+│   │   └── undo.rs               # Undo file operations
+│   ├── gemini/
 │   │   ├── mod.rs                # Gemini API module exports
 │   │   ├── client.rs             # Gemini API client
 │   │   ├── errors.rs             # Gemini error handling
 │   │   ├── prompt.rs             # AI prompt construction
 │   │   └── types.rs              # Gemini API types
 │   ├── models/
 │   │   ├── mod.rs                # Data models exports
 │   │   ├── metadata.rs           # File metadata structures
 │   │   ├── move_record.rs        # File move tracking
 │   │   └── organization.rs       # Organization plan structures
 │   ├── settings/
 │   │   ├── mod.rs                # Settings module exports
 │   │   ├── config.rs             # Configuration management
 │   │   ├── prompt.rs             # Interactive configuration prompts
 │   │   └── tests.rs              # Settings tests
 │   ├── storage/
 │   │   ├── mod.rs                # Storage module exports
 │   │   ├── cache.rs              # Caching system
 │   │   └── undo_log.rs           # Undo log management
 │   ├── main.rs                   # Application entry point
 │   └── lib.rs                    # Library exports
 ├── Cargo.toml                    # Dependencies and project metadata
 ├── Cargo.lock                    # Dependency lock file
 ├── config.example.toml           # Configuration template
 └── README.md                     # This file
 ```
 ### Module Overview
 - **cli/** - Command-line interface and orchestration logic for organizing and undoing operations
 - **files/** - File detection, batching, moving, and undo operations with concurrent processing
 - **gemini/** - Google Gemini API integration with retry logic and intelligent prompt engineering
 - **models/** - Core data structures for file metadata, move records, and organization plans
 - **settings/** - Configuration management with interactive prompts and XDG directory support
 - **storage/** - Persistent data layer for caching API responses and tracking undo history
 ## Future Enhancements
 Based on community feedback, we're planning:
--- a/src/cli/orchestrator.rs
+++ b/src/cli/orchestrator.rs
@@ -121,7 +121,7 @@ pub async fn handle_organization(
    );
    let mut plan: OrganizationPlan = match client
-        .organize_files_with_cache(batch.filenames, Some(&mut cache), Some(&download_path))
+        .organize_files_in_batches(batch.filenames, Some(&mut cache), Some(&download_path))
        .await
    {
        Ok(plan) => plan,
--- a/src/files/batch.rs
+++ b/src/files/batch.rs
@@ -45,7 +45,6 @@ impl FileBatch {
 mod tests {
    use super::*;
    use std::fs::{self, File};
    use std::path::Path;
    #[test]
    fn test_file_batch_from_path() {
--- a/src/gemini/client.rs
+++ b/src/gemini/client.rs
@@ -9,8 +9,9 @@ use std::path::Path;
 use std::time::Duration;
 const DEFAULT_MODEL: &str = "gemini-3-flash-preview";
-const DEFAULT_TIMEOUT_SECS: u64 = 30;
+const DEFAULT_TIMEOUT_SECS: u64 = 120;
 const MAX_RETRIES: u32 = 3;
 const BATCH_SIZE: usize = 50;
 pub struct GeminiClient {
    api_key: String,
@@ -89,6 +90,65 @@ impl GeminiClient {
        Ok(plan)
    }
    /// Organizes files in batches to handle large file lists efficiently.
    ///
    /// When the number of files exceeds BATCH_SIZE, splits them into smaller
    /// chunks to avoid API timeout and payload size issues. Each batch is
    /// processed sequentially with progress feedback.
    ///
    /// # Arguments
    /// * `filenames` - Vector of filenames to organize
    /// * `cache` - Optional cache for storing/retrieving results
    /// * `base_path` - Optional base path for cache keys
    ///
    /// # Returns
    /// A combined `OrganizationPlan` with all files categorized
    pub async fn organize_files_in_batches(
        &self,
        filenames: Vec<String>,
        mut cache: Option<&mut Cache>,
        base_path: Option<&Path>,
    ) -> Result<OrganizationPlan, GeminiError> {
        // No batching needed for small file lists
        if filenames.len() <= BATCH_SIZE {
            return self
                .organize_files_with_cache(filenames, cache, base_path)
                .await;
        }
        let total_files = filenames.len();
        let batches: Vec<Vec<String>> = filenames
            .chunks(BATCH_SIZE)
            .map(|chunk| chunk.to_vec())
            .collect();
        let total_batches = batches.len();
        println!(
            "Processing {} files in {} batches...",
            total_files, total_batches
        );
        let mut all_files = Vec::with_capacity(total_files);
        for (batch_index, batch) in batches.into_iter().enumerate() {
            let batch_num = batch_index + 1;
            println!(
                "Processing batch {}/{} ({} files)...",
                batch_num,
                total_batches,
                batch.len()
            );
            let plan = self
                .organize_files_with_cache(batch, cache.as_deref_mut(), base_path)
                .await?;
            all_files.extend(plan.files);
        }
        Ok(OrganizationPlan { files: all_files })
    }
    fn build_url(&self) -> String {
        format!("{}?key={}", self.base_url, self.api_key)
    }
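The batching approach added in `organize_files_in_batches` can be tried in isolation with a synchronous sketch like the one below. It is an illustration under stated assumptions, not the project's implementation: `organize_in_batches` and `by_extension` are hypothetical names, the closure stands in for the per-batch Gemini call, and a `(filename, category)` tuple stands in for the entries of `OrganizationPlan`.

```rust
/// Splits `filenames` into chunks of at most `batch_size`, runs `categorize`
/// on each chunk in order, and concatenates the results. The first failing
/// batch aborts the run, mirroring the `?` in the diff above.
fn organize_in_batches<F>(
    filenames: &[String],
    batch_size: usize,
    mut categorize: F,
) -> Result<Vec<(String, String)>, String>
where
    F: FnMut(&[String]) -> Result<Vec<(String, String)>, String>,
{
    // `slice::chunks` panics on a chunk size of zero, so reject it up front.
    if batch_size == 0 {
        return Err("batch_size must be greater than zero".to_string());
    }
    let mut all = Vec::with_capacity(filenames.len());
    for chunk in filenames.chunks(batch_size) {
        all.extend(categorize(chunk)?);
    }
    Ok(all)
}

/// Hypothetical stand-in for the API call: categorizes by file extension.
fn by_extension(batch: &[String]) -> Result<Vec<(String, String)>, String> {
    Ok(batch
        .iter()
        .map(|name| {
            let category = if name.ends_with(".jpg") { "Images" } else { "Documents" };
            (name.clone(), category.to_string())
        })
        .collect())
}

fn main() {
    let files: Vec<String> = (0..120).map(|i| format!("file{i}.pdf")).collect();
    let mut calls = 0;
    let plan = organize_in_batches(&files, 50, |batch| {
        calls += 1;
        by_extension(batch)
    })
    .unwrap();
    println!("{} files categorized in {} calls", plan.len(), calls);
}
```

Because the loop stops at the first error, results from batches that already succeeded are discarded; whether that or a partial plan is preferable is a design choice this commit does not discuss.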