gemini-api google-ai-hackathon google-generativeai streamlit-application

Multimodal image2story app (MI2S / MITS)

Logo image from: https://ideogram.ai/g/O8Jujvj3T5C4wftcDOgS8A/3

To-Do List

Additional features that I would like to add later

Chat With Story
Upload document to continue story

Motivation Behind Creating this Application

Hackathon Inspiration: This app was born from the challenges presented at the Google AI Hackathon. The aim was to create a cutting-edge app using Google's Generative AI tools, specifically focusing on pushing the boundaries of what Gen AI apps can do with Gemini.
Long-Held Passion: Since my teenage years, I've had a deep love for art, especially visual art. Although I'm not currently skilled in drawing, I've spent 2-3 years honing my basic art skills. While I can imagine stories, I prefer expressing them visually. With advancements in technology, like AI that can generate images from text prompts (text-to-image), I can now bring my creative narratives to life.
Embracing New Tech: The rise of advanced technologies such as multimodal models like Gemini-AI opens up endless possibilities. It enables me to turn any creative concept into reality. The idea is to use multimodal models to create stories inspired by images, removing the need for a custom model for image-to-story creation.

App Description

MI2S (Multimodal Image2Stories) is an innovative application designed to transform images into captivating narratives. This cutting-edge tool utilizes multimodal technology, combining visual and textual elements to generate short stories or even full-length novels based on input images. By leveraging Gemini-AI, MI2S analyzes the content, context, and emotions conveyed in the image to craft immersive and engaging storytelling experiences. Whether you're seeking to create compelling short stories or embark on novel-writing adventures, MI2S opens up endless possibilities for creative expression through the fusion of visual and literary arts.
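To make the image-to-story flow above a bit more concrete, here is a minimal sketch of how an image and a text instruction could be sent to Gemini through the google-generativeai package. This is not the app's actual code: the helper names `build_story_prompt` and `generate_story`, the prompt wording, the model name `gemini-1.5-flash`, and the `GOOGLE_API_KEY` environment variable are all assumptions made for illustration.

```python
import os


def build_story_prompt(genre: str = "fantasy", length: str = "short story") -> str:
    """Build the text instruction sent with the image (hypothetical wording)."""
    return (
        f"Look at the attached image and write a {length} in the {genre} genre. "
        "Base the plot on the content, context, and emotions the image conveys."
    )


def generate_story(image_path: str, prompt: str,
                   model_name: str = "gemini-1.5-flash") -> str:
    """Send one image plus a prompt to Gemini and return the generated text.

    The model name is an assumption; use whichever multimodal Gemini model
    your API key can access.
    """
    # Imported lazily so the prompt helper works without these packages.
    import google.generativeai as genai
    from PIL import Image

    # Assumes the API key is stored in the GOOGLE_API_KEY environment variable.
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_name)
    # A list mixing text and a PIL image forms a single multimodal request.
    response = model.generate_content([prompt, Image.open(image_path)])
    return response.text


if __name__ == "__main__":
    import sys

    # Usage: python story_sketch.py path/to/image.png
    if len(sys.argv) > 1:
        print(generate_story(sys.argv[1], build_story_prompt()))
```

Keeping the prompt builder separate from the API call means the story style (genre, length) can be changed from the UI without touching the request code.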

Technologies

Python
Streamlit (Python framework and deployment)
Gemini AI (access via API)
String-to-document converter libraries, such as io and libraries for docx, odt, and pdf output.
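As a rough illustration of the string-to-document step listed above, the sketch below packs a generated story into an in-memory plain-text file using only the standard io module. The function name `story_to_txt_buffer` and the title-then-body layout are hypothetical; the docx, odt, and pdf variants would need their own libraries and are not shown here.

```python
import io


def story_to_txt_buffer(story: str, title: str = "Untitled story") -> io.BytesIO:
    """Return the story as an in-memory UTF-8 text file (hypothetical helper)."""
    buffer = io.BytesIO()
    # Title on the first line, a blank line, then the story body.
    buffer.write(f"{title}\n\n{story}\n".encode("utf-8"))
    # Rewind so the next reader starts from the beginning of the file.
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = story_to_txt_buffer("Once upon a time.", title="Demo")
    print(demo.read().decode("utf-8"))
```

In a Streamlit page, a buffer like this could be handed to `st.download_button("Download story", data=buffer, file_name="story.txt", mime="text/plain")` so the user can save the result without the app writing anything to disk.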

Journals/Repositories/Apps: References & Inspiration

seiweiqing - image2story (GitHub Repo)
Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller (Journal)
LLaVA: Large Language and Vision Assistant (Journal & GitHub Repo)
Photo story apps (Website apps)

My team:

Muhammad Rizqi: me. I developed the app alone, and I take responsibility for any messy code.
Muhammad Azka Nabhan Sauqi: he helped me make a video about this app.

Damn, I did this all alone. I hope it works as expected. It's easier to finish a project when I handle it myself. I code directly on GitHub because my laptop sucks. That's why you'll find a lot of commits in this repo.

About

Create captivating stories from inspiring images. I made this for the Google AI Hackathon challenge.

https://multimodal-image2story.streamlit.app/



Kingki19 / Multimodal-image2story-app