Anil-matcha/Open-Generative-AI

This document details Open Generative AI, a self-hostable, open-source AI image and video generation studio. Here's a comprehensive breakdown, categorized for clarity:

I. Core Philosophy & Differentiation

* Open Source & Community Driven: Unlike proprietary platforms, Open Generative AI is built on open-source principles and encourages community contributions and customization.
* Unfiltered Creativity: A key differentiator is that it applies no content filters, giving users full creative control.
* Self-Hostable: Users can run the entire studio locally, which keeps the application and their upload history on their own machine. Generation requests are still sent to the Muapi.ai API.
* Extensible & Customizable: The architecture is designed to be hacked and modified, allowing personalized workflows and integrations.
* Cost-Effective: The software itself is free and MIT-licensed, unlike subscription-based alternatives. Running it requires your own hosting and a Muapi.ai API key.
* Broad Model Support: Access to more than 200 open-source and commercial AI models.

II. Functionality & Features (The "Studio")

The core of Open Generative AI is the "Studio", offering various specialized interfaces:

* Image Studio (T2I/I2I): Text-to-image and image-to-image generation in one dual-mode interface.
* Video Studio (T2V/I2V): Text-to-video and image-to-video generation, also dual-mode.
* Lip Sync Studio: Creates talking-head videos from an image or video plus audio, using 9 different models. It offers two modes:
  * Image + Audio -> Talking Video
  * Video + Audio -> Lip-synced Video
* Cinema Studio: Photorealistic cinematic shot creation with professional camera controls:
  * Cameras: Modular 8K Digital, Full-Frame Cine Digital, Grand Format 70mm Film, etc.
  * Lenses: Creative Tilt, Compact Anamorphic, Vintage Prime, etc.
  * Focal lengths: 8mm to 85mm (ultra-wide to tight portrait).
  * Apertures: f/1.4 to f/11 (controls depth of field).
* Workflow Studio: A visual node-based editor for building and automating multi-step AI pipelines, including community templates.
* Upload History & Picker: Stores uploaded images locally for quick reuse and supports up to 14 images for multi-image models.
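As a rough illustration of the Cinema Studio and upload-picker constraints listed above, the TypeScript sketch below validates a shot against the stated focal-length (8-85mm) and aperture (f/1.4-f/11) ranges and caps reference images at 14. The `CinemaShot` type, the function names, and the shortened camera and lens lists (the source ends each list with "etc.") are hypothetical and not taken from the project's code.

```typescript
// Hypothetical validation for Cinema Studio shot settings and the upload picker.
// Numeric ranges and the 14-image cap come from the description above; the type
// and function names are illustrative, not the project's actual schema.

// Partial option lists: the source names these and ends each list with "etc.".
const CAMERAS = ["Modular 8K Digital", "Full-Frame Cine Digital", "Grand Format 70mm Film"] as const;
const LENSES = ["Creative Tilt", "Compact Anamorphic", "Vintage Prime"] as const;

type Camera = (typeof CAMERAS)[number];
type Lens = (typeof LENSES)[number];

interface CinemaShot {
  camera: Camera;
  lens: Lens;
  focalLengthMm: number; // 8 (ultra-wide) to 85 (tight portrait)
  aperture: number; // f-number from 1.4 to 11; lower values give shallower depth of field
}

const FOCAL_MIN_MM = 8;
const FOCAL_MAX_MM = 85;
const APERTURE_MIN = 1.4;
const APERTURE_MAX = 11;
const MAX_REFERENCE_IMAGES = 14; // upload picker limit for multi-image models

// Returns human-readable problems; an empty array means the shot is within range.
function validateShot(shot: CinemaShot): string[] {
  const errors: string[] = [];
  if (shot.focalLengthMm < FOCAL_MIN_MM || shot.focalLengthMm > FOCAL_MAX_MM) {
    errors.push(`focal length ${shot.focalLengthMm}mm is outside ${FOCAL_MIN_MM}-${FOCAL_MAX_MM}mm`);
  }
  if (shot.aperture < APERTURE_MIN || shot.aperture > APERTURE_MAX) {
    errors.push(`aperture f/${shot.aperture} is outside f/${APERTURE_MIN}-f/${APERTURE_MAX}`);
  }
  return errors;
}

// Keeps the first MAX_REFERENCE_IMAGES selections, preserving the user's order.
function selectReferenceImages(urls: string[]): string[] {
  return urls.slice(0, MAX_REFERENCE_IMAGES);
}
```

Keeping the limits as named constants means a change to the studio's supported ranges touches one place.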

III. Technical Architecture

* Monorepo: A Next.js monorepo with a shared `packages/studio` component library.
* Next.js 14: Uses the App Router and server components for performance and scalability.
* React 18: Used to build the Studio UI components.
* Tailwind CSS v3: Provides utility-first styling.
* npm Workspaces: Manages dependencies and builds the shared packages.
* Muapi.ai: The underlying API gateway through which the AI models are accessed.
* API Integration:
  * Submit: `POST /api/v1/{model-endpoint}` with the prompt and parameters.
  * Poll: `GET /api/v1/predictions/{request_id}/result` to check the request's status.
  * Authentication: The `x-api-key` header.
  * File Uploads: `POST /api/v1/upload_file` (multipart/form-data).
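The submit-then-poll flow can be sketched in TypeScript as follows. Only the two paths and the `x-api-key` header come from the description above. The base URL `https://api.muapi.ai`, the `request_id` field in the submit response, and the `status`/`outputs` fields in the poll response are assumptions, so check them against the Muapi.ai documentation. The HTTP client and `sleep` function are injected so the logic can run without network access, and file uploads are left out because multipart encoding depends on the runtime.

```typescript
// Sketch of the Muapi.ai submit -> poll flow. ASSUMPTIONS (not from the source):
// BASE_URL, the `request_id` submit field, and the `status`/`outputs` poll fields.
// The paths and the `x-api-key` header are the documented parts.

interface HttpRequest {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

interface HttpResponse {
  status: number;
  json: unknown;
}

// Injected transport, e.g. a thin wrapper around fetch in a real app.
type HttpClient = (req: HttpRequest) => Promise<HttpResponse>;

// Hypothetical poll-response shape.
interface PredictionResult {
  status: string; // assumed terminal values: "completed" or "failed"
  outputs?: string[];
}

const BASE_URL = "https://api.muapi.ai"; // assumed base URL

const isOk = (status: number): boolean => status >= 200 && status < 300;

// Submit: POST /api/v1/{model-endpoint} with the prompt and parameters as JSON.
function submitRequest(apiKey: string, modelEndpoint: string, params: Record<string, unknown>): HttpRequest {
  return {
    method: "POST",
    url: `${BASE_URL}/api/v1/${modelEndpoint}`,
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(params),
  };
}

// Poll: GET /api/v1/predictions/{request_id}/result.
function pollRequest(apiKey: string, requestId: string): HttpRequest {
  return {
    method: "GET",
    url: `${BASE_URL}/api/v1/predictions/${encodeURIComponent(requestId)}/result`,
    headers: { "x-api-key": apiKey },
  };
}

// Submits a job, then polls until it completes, fails, or runs out of attempts.
async function generate(
  http: HttpClient,
  sleep: (ms: number) => Promise<void>,
  apiKey: string,
  modelEndpoint: string,
  params: Record<string, unknown>,
  intervalMs = 2000,
  maxAttempts = 60,
): Promise<PredictionResult> {
  const submitted = await http(submitRequest(apiKey, modelEndpoint, params));
  if (!isOk(submitted.status)) throw new Error(`submit failed: HTTP ${submitted.status}`);
  const requestId = (submitted.json as { request_id: string }).request_id;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const polled = await http(pollRequest(apiKey, requestId));
    if (!isOk(polled.status)) throw new Error(`poll failed: HTTP ${polled.status}`);
    const result = polled.json as PredictionResult;
    if (result.status === "completed") return result;
    if (result.status === "failed") throw new Error(`prediction ${requestId} failed`);
    await sleep(intervalMs); // still in progress: wait before the next poll
  }
  throw new Error(`prediction ${requestId} unfinished after ${maxAttempts} polls`);
}
```

Injecting the transport keeps the polling logic testable; in an application you would pass a small wrapper around `fetch` and a `setTimeout`-based `sleep`.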

IV. Setup & Development

* Prerequisites: Node.js (v18+) and a Muapi.ai API key.
* Installation (for contributors):
  1. `git clone --recurse-submodules https://github.com/Anil-matcha/Open-Generative-AI.git`
  2. `cd Open-Generative-AI`
  3. `npm run setup` (builds workspace packages)
* Development:
  * `npm run electron:dev` (desktop app, recommended)
  * `npm run dev` (hosted web version)
* Production Build: `npm run build`, then `npm run start`.
* Desktop App Build: Commands are available for macOS, Windows, and Linux.
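A contributor could check the two prerequisites before running `npm run setup` with a small helper like the one below. It is a hypothetical sketch, not part of the repository, and reading the key from an environment variable named `MUAPI_API_KEY` (as in the usage note) is an assumption.

```typescript
// Hypothetical preflight check for the stated prerequisites (Node.js v18+ and a
// Muapi.ai API key). Not part of the repository; the caller decides where the
// key comes from (reading it from an env var such as MUAPI_API_KEY is an assumption).

const MIN_NODE_MAJOR = 18;

// Parses the major version from a string like "v18.19.0" (the format of process.version).
function parseNodeMajor(version: string): number {
  const major = parseInt(version.replace(/^v/, "").split(".")[0], 10);
  if (isNaN(major)) throw new Error(`unrecognized Node.js version: ${version}`);
  return major;
}

// Returns a list of problems; an empty list means setup can proceed.
function checkPrerequisites(nodeVersion: string, apiKey: string | undefined): string[] {
  const problems: string[] = [];
  if (parseNodeMajor(nodeVersion) < MIN_NODE_MAJOR) {
    problems.push(`Node.js ${nodeVersion} found; v${MIN_NODE_MAJOR} or newer is required`);
  }
  if (apiKey === undefined || apiKey.trim() === "") {
    problems.push("no Muapi.ai API key provided");
  }
  return problems;
}
```

In a Node.js script you might call `checkPrerequisites(process.version, process.env.MUAPI_API_KEY)` and exit with an error if the returned list is non-empty.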

V. Supported Model Categories (with examples)

* Text-to-Image (50+): Flux Dev, Nano Banana 2, Seedream 5.0, Midjourney v7, GPT-4o.
* Image-to-Image (55+): Nano Banana 2 Edit, Flux Kontext Pro, GPT-4o Edit, Seededit v3.
* Text-to-Video (40+): Kling v3, Sora 2, Veo 3, Wan 2.6, Seedance 2.0.
* Image-to-Video (60+): Kling v2.1 I2V, Veo3 I2V, Runway I2V, Seedance 2.0 I2V.
* Lip Sync (9): Infinite Talk I2V, Wan 2.2 Speech to Video, LTX 2.3 Lipsync.
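The dual-mode Image and Video Studios, together with Lip Sync, map onto these categories. The sketch below picks a category from the requested output type and the supplied inputs. The routing rule, type names, and function are assumptions for illustration, and the example model names are copied from the list above rather than being real endpoint identifiers.

```typescript
// Hypothetical category routing for the dual-mode studios. The rule below is an
// assumption for illustration, not the project's actual logic.

type Category = "text-to-image" | "image-to-image" | "text-to-video" | "image-to-video" | "lip-sync";

// Display names copied from the category list above (not API endpoint IDs).
const EXAMPLE_MODELS: Record<Category, string[]> = {
  "text-to-image": ["Flux Dev", "Nano Banana 2", "Seedream 5.0", "Midjourney v7", "GPT-4o"],
  "image-to-image": ["Nano Banana 2 Edit", "Flux Kontext Pro", "GPT-4o Edit", "Seededit v3"],
  "text-to-video": ["Kling v3", "Sora 2", "Veo 3", "Wan 2.6", "Seedance 2.0"],
  "image-to-video": ["Kling v2.1 I2V", "Veo3 I2V", "Runway I2V", "Seedance 2.0 I2V"],
  "lip-sync": ["Infinite Talk I2V", "Wan 2.2 Speech to Video", "LTX 2.3 Lipsync"],
};

interface GenerationInputs {
  output: "image" | "video";
  hasSourceImage: boolean; // an uploaded image switches a studio to its image-input mode
  hasAudio?: boolean; // audio plus an image or video means a Lip Sync job
}

function categoryFor(inputs: GenerationInputs): Category {
  if (inputs.hasAudio) return "lip-sync";
  if (inputs.output === "image") return inputs.hasSourceImage ? "image-to-image" : "text-to-image";
  return inputs.hasSourceImage ? "image-to-video" : "text-to-video";
}
```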

VI. Comparison with Other Platforms

| Feature | Other Providers | Open Generative AI |
|-------------------|---------------------|-----------------------|
| Cost | Subscription | Free (Open Source) |
| Content Filters | Yes | None |
| Restrictions | Platform Guardrails | Full Creative Freedom |
| Models | Proprietary | 200+ Open & Commercial |
| Multi-Image Input | Limited | Up to 14 images |
| Lip Sync | No | 9 models |
| Hosted Version | Subscription | Free at muapi.ai |
| Self-Hosting | No | Yes |
| Customizable | No | Fully Hackable |
| Data Privacy | Cloud-based | Your Data Stays Local |
| Source Code | Closed | MIT Licensed |

VII. Future Development (Mentioned in the document)

* "AI Influencer" Engine (details not provided)
* "Popcorn" Storyboarding Features (details not provided)

In conclusion, Open Generative AI presents a compelling alternative to closed-source AI image and video platforms. Its open-source nature, lack of content filtering, and self-hostable architecture empower users with unpreced...