Managing large collections of documents and images by hand does not scale, especially when the goal is AI-driven categorization and quality assessment. Python, combined with a generative AI model such as Gemini, offers a practical starting point for automating these file management tasks.

The Problem

Beginners with no prior programming experience often need to analyze, categorize, and manage thousands of PDF and image files. Typical tasks include renaming, duplicate removal, quality filtering, and thematic organization, ideally with help from a generative AI such as Google Gemini. What they need is a simple, direct way to use Python for these operations.

The Solution

The following Python script lists the PDF and image files in a directory and asks the Google Gemini API for category suggestions based on each file's name and type. It does not read file contents; it is a foundation for more advanced file management workflows.

import os
import google.generativeai as genai
from dotenv import load_dotenv

# Load environment variables (e.g., your GOOGLE_API_KEY)
load_dotenv()

# Configure the Google Gemini API with your API key
# Ensure GOOGLE_API_KEY is set in your environment variables or a .env file
try:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
except KeyError:
    print("Error: GOOGLE_API_KEY environment variable not set.")
    print("Please set your API key to access the Gemini API.")
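    raise SystemExit(1)  # Stop with a non-zero exit code; works even where the interactive exit() helper is unavailable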

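# Initialize the generative model
# Model names change over time; 'gemini-pro' may no longer be available,
# so check the current model list in the Gemini API documentation.
model = genai.GenerativeModel('gemini-pro')  # A multimodal model is needed for image input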

def analyze_and_categorize_files_basic(directory_path: str):
    """
    Lists PDF and common image files in a directory and prompts Gemini
    for a basic categorization suggestion based on filename and type.

    Args:
        directory_path (str): The path to the directory containing files.
    """
    if not os.path.isdir(directory_path):
        print(f"Error: Directory '{directory_path}' not found.")
        return

    print(f"Scanning directory: {directory_path}")
    processed_count = 0

    # Define file extensions to process
    file_extensions = ('.pdf', '.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff', '.webp')

    for root, _, files in os.walk(directory_path):
        for file in files:
            if file.lower().endswith(file_extensions):
                full_path = os.path.join(root, file)
                file_type = os.path.splitext(file)[1].lstrip('.').upper()

                print(f"\nProcessing file: {file}")

                # Construct a simple prompt for Gemini based on filename and type
                # For actual content analysis of PDFs/images, you would need
                # additional libraries to extract text/prepare images and
                # use a multimodal model (e.g., 'gemini-pro-vision').
                prompt = (
                    f"Given a file named '{file}' of type '{file_type}', "
                    "suggest a few thematic categories for it. "
                    "Also, briefly mention if it seems like a document or an image."
                )

                try:
                    response = model.generate_content(prompt)
                    print(f"Gemini's analysis: {response.text}")
                    # In a real scenario, you'd parse response.text to automate folder creation, etc.
                    processed_count += 1
                except Exception as e:
                    print(f"Failed to get Gemini response for '{file}': {e}")
                    print("This might be due to API rate limits, invalid API key, or network issues.")

                # Optional: add a short delay between requests to stay under API rate limits
                # (requires `import time` at the top of the script)
                # time.sleep(0.1)

    if processed_count == 0:
        print("No eligible PDF or image files found for processing.")
    else:
        print(f"\nCompleted basic analysis for {processed_count} files.")

# --- Execution Example ---
if __name__ == "__main__":
    # Create a dummy directory and files for testing if they don't exist
    test_dir = "files_to_process"
    if not os.path.exists(test_dir):
        os.makedirs(test_dir)
        # Create placeholder files (plain text, not valid PDFs or images)
        sample_names = ("meeting_minutes_2023.pdf", "vacation_photo_beach.jpg",
                        "project_report_v2.pdf", "receipt_groceries.png")
        for name in sample_names:
            with open(os.path.join(test_dir, name), "w") as f:
                f.write("dummy")
        print(f"Created a sample '{test_dir}' directory with dummy files.")

    # Call the function with your target directory
    # Replace 'files_to_process' with the actual path to your files
    analyze_and_categorize_files_basic(test_dir)
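
The script handles rate limits only with an optional fixed delay. A somewhat more forgiving approach is to retry a failed request with exponentially growing waits. The following is a minimal sketch of that idea, not part of the Gemini client library: the helper name call_with_backoff and its parameters (max_attempts, base_delay, sleep) are assumptions made for illustration. It retries on any exception, so errors that cannot succeed on retry, such as an invalid API key, will still fail after the last attempt.

```python
import random
import time


def call_with_backoff(func, *args, max_attempts=4, base_delay=1.0,
                      sleep=time.sleep, **kwargs):
    """Call func, retrying with exponentially growing waits when it raises.

    max_attempts, base_delay, and sleep are illustrative parameters chosen
    for this sketch; they are not part of any Gemini API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # Out of retries: let the caller see the original error
            # Wait base_delay, then 2x, then 4x, ... plus a little random jitter
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            sleep(delay)
```

In the script above, the call inside the try block could then become response = call_with_backoff(model.generate_content, prompt). The sleep parameter exists only so the waiting can be replaced in tests.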

Why It Works

  • File System Interaction (os module): The os module is a standard Python library providing functions for interacting with the operating system, including navigating directories (os.walk), joining paths (os.path.join), and checking for directory existence (os.path.isdir). This allows the script to efficiently locate and iterate through all specified files.
  • Generative AI Integration (google.generativeai): The google.generativeai library provides a Python client for Google’s Gemini API. Once configured with an API key, the script can send text prompts to a Gemini model (e.g., gemini-pro) and receive AI-generated responses. Because each prompt contains only the file name and type, the result is a rudimentary, name-based categorization rather than true content analysis.
  • Environment Variables (dotenv): Using python-dotenv (or directly setting environment variables) is a security best practice for managing sensitive information like API keys. It keeps credentials out of the main codebase, preventing accidental exposure.
  • Modular Design: The solution is encapsulated within a function, analyze_and_categorize_files_basic, promoting reusability and clarity. This structured approach facilitates incremental development for more complex features like full PDF text extraction or image content analysis using additional libraries.
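
As one example of the incremental development mentioned in the last point, a suggested category could be turned into an actual folder move. The sketch below uses only the standard library and assumes the model has been prompted to reply with a single short category name, which the script above does not yet do. The helper names category_to_folder_name and move_to_category are hypothetical.

```python
import re
import shutil
from pathlib import Path


def category_to_folder_name(category: str) -> str:
    """Turn free-form model text into a safe folder name (hypothetical helper)."""
    # Keep letters, digits, spaces, hyphens, and underscores; drop everything else
    cleaned = re.sub(r"[^A-Za-z0-9 _-]", "", category).strip()
    # Collapse runs of whitespace into single underscores
    cleaned = re.sub(r"\s+", "_", cleaned)
    return cleaned.lower() or "uncategorized"


def move_to_category(file_path: str, category: str, base_dir: str) -> Path:
    """Move file_path into base_dir/<category folder>, creating the folder if needed."""
    source = Path(file_path)
    target_dir = Path(base_dir) / category_to_folder_name(category)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    if target.exists():
        # Never overwrite silently; the caller decides how to resolve clashes
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.move(str(source), str(target))
    return target
```

Sanitizing the category first matters because model output is free text and may contain punctuation or path separators. It is worth trying this on a copy of the files before running it against the originals.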
