Sequential Stories

The Show and Tell model is an image-to-text model for TensorFlow, developed by Google DeepMind and based on this paper, that takes an image as input and learns to describe its content. This experimental iOS app uses the model to generate a series of captions and combine them into a story.

Example using stills from 'The Grand Budapest Hotel' by Wes Anderson:

Setup

Install im2txt and its dependencies. I followed Edouard Fouché's setup and used the same pretrained model described in his instructions. The only change was in line 49 of im2txt/im2txt/inference_utils/vocabulary.py, where I changed this:

reverse_vocab = [line.split()[0] for line in reverse_vocab]
# to:
reverse_vocab = [eval(line.split()[0]).decode() for line in reverse_vocab]
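The change above suggests that the pretrained model's vocabulary file stores each word as a Python bytes literal (for example b'a' followed by a count). That format is an assumption here, not something stated in the instructions. Under that assumption, a minimal sketch of the same decoding step without eval could look like this. The helper name parse_vocab_token is hypothetical and is not part of im2txt.

```python
import ast


def parse_vocab_token(line):
    """Return the word at the start of a vocabulary line as a str.

    Assumes lines look like "b'word' 123" (bytes literal plus count) or
    "word 123" (plain token plus count). Hypothetical helper, not im2txt API.
    """
    token = line.split()[0]
    if token.startswith(("b'", 'b"')):
        try:
            # literal_eval only parses literals, so it cannot run arbitrary code.
            value = ast.literal_eval(token)
        except (ValueError, SyntaxError):
            return token
        if isinstance(value, bytes):
            return value.decode("utf-8")
    # Plain tokens are returned unchanged.
    return token


def load_reverse_vocab(lines):
    """Build the list of words in file order, skipping blank lines."""
    return [parse_vocab_token(line) for line in lines if line.strip()]
```

Unlike eval, ast.literal_eval refuses anything that is not a literal, which matters if the vocabulary file comes from an untrusted download.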

Download or clone this repo.
Install the app located in platforms/ios with Xcode. Alternatively, install Cordova first, then run cordova platform add ios from the repo root, followed by cordova prepare ios, and then upload.
Connect your phone to a Wi-Fi network. Your computer should be connected to the same network.
Open the file server_im2txt.py and change line 15: ip = '172.16.220.255' to match the IP address assigned to your computer by the network. (To find your IP address on macOS, type ifconfig | grep "inet " | grep -Fv 127.0.0.1 | awk '{print $2}')
Run python server_im2txt.py
Open the app and click the top left icon. Enter the same IP address as before. A green light should turn on in the top right corner.
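If ifconfig is not available, the address for the IP step above can also be looked up from Python. This is a minimal sketch, not part of the repo: the function name local_ip and the probe address 8.8.8.8 are assumptions chosen for illustration. Connecting a UDP socket sends no packets; it only asks the OS which local interface it would route through.

```python
import socket


def local_ip(probe_host="8.8.8.8", probe_port=80):
    """Return the local IPv4 address used for outbound traffic.

    Falls back to 127.0.0.1 when no route is available (e.g. offline).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # A UDP connect only selects a route; nothing is transmitted.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    print(local_ip())
```

The printed address is the value to try both in server_im2txt.py and in the app.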

Running on a 2014 MacBook Pro, it takes around 7 seconds to caption an image.

Dependencies:

Bazel
TensorFlow 1.0 or greater
NumPy
Natural Language Toolkit (NLTK)
Checkpoint

Versions

The file server_im2txt runs the im2txt model on every request to the /upload route and returns a string with a sentence for the story. The app uploads an image to the /upload folder.
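To make the request flow concrete, here is a minimal sketch of an upload endpoint that saves the posted image and answers with one sentence. It is not the code in server_im2txt.py. It uses only the Python standard library, assumes the image arrives as the raw POST body, and replaces the model with a hypothetical caption_image placeholder. All names in it are illustrative.

```python
import os
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def caption_image(path):
    """Hypothetical stand-in for im2txt inference; returns a fixed sentence."""
    return "a placeholder caption for " + os.path.basename(path)


def make_handler(upload_dir):
    """Create a request handler class that stores uploads in upload_dir."""

    class UploadHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/upload":
                self.send_error(404)
                return
            # Assumption: the body is the raw image, not multipart form data.
            length = int(self.headers.get("Content-Length", 0))
            data = self.rfile.read(length)
            path = os.path.join(upload_dir, "latest.jpg")
            with open(path, "wb") as f:
                f.write(data)
            body = caption_image(path).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the console quiet

    return UploadHandler


def start_server(upload_dir, host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), make_handler(upload_dir))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_image(url, data):
    """Send image bytes to the server and return the caption text."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, data=data, method="POST")
    with opener.open(request) as response:
        return response.read().decode("utf-8")
```

Calling post_image once per still and joining the returned sentences in order gives the kind of caption sequence the app turns into a story.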

The file server_lstm runs a classification model in Keras and then an LSTM network trained on the 25 most downloaded books from Project Gutenberg. This was the first approach to the app and it's still a WIP.

Outputs

Interaction

Links

Original Model: Show and Tell: A Neural Image Caption Generator
Paper: Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge
Configuration: Tensorflow - im2txt

TODO

~~Configure IP from app.~~
Create React Native version?
Add more NLP to the output or maybe add the LSTM version to it?

