🎉🎉🎉 Keras part is public now
Pull requests and issues: @litleCarl
A CoreML implementation of the image-to-text model described in the paper:
"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge."
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).
Full text available at: http://arxiv.org/abs/1609.06647
let showAndTell = ShowAndTell()
let results = showAndTell.predict(image: uiimage2predict, beamSize: 3, maxWordNumber: 30)
// Parameter explanation
// image: The image to generate a caption for.
// beamSize: Maximum number of candidate captions kept during beam search. (Greatly affects performance.)
// maxWordNumber: Maximum number of words in a predicted sentence.
class ShowAndTell {
...
func predict(image: UIImage, beamSize: Int = 3, maxWordNumber: Int = 20) -> PriorityQueue<Caption>
...
}
[Images: maxWordNumber = 20 | maxWordNumber = 30]
Line chart of time vs. beam size (when maxWordNumber = 30)
So it is recommended to set beamSize=1 on mobile devices, since it uses less GPU/CPU time and saves battery life.
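To illustrate why beamSize has such a large effect on running time, here is a minimal beam search sketch in Python. It is not the project's Swift implementation: `beam_search`, `toy_step`, and the toy probability table are hypothetical stand-ins, and `step_fn` plays the role of one decoder (LSTM) step. The point is only that the decoder runs once per live candidate for every word, so the work grows roughly with beam size times maximum word count.

```python
import heapq
import math


def beam_search(step_fn, start, end, beam_size=3, max_words=20):
    """Return up to beam_size captions as (log_prob, tokens), best first."""
    beams = [(0.0, [start])]  # live partial captions
    finished = []
    for _ in range(max_words):
        candidates = []
        for log_p, tokens in beams:
            # One decoder call per live beam: this is where the time goes.
            for word, p in step_fn(tokens).items():
                candidates.append((log_p + math.log(p), tokens + [word]))
        best = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        beams = [c for c in best if c[1][-1] != end]
        finished += [c for c in best if c[1][-1] == end]
        if not beams:
            break
    finished += beams  # captions cut off by max_words
    return heapq.nlargest(beam_size, finished, key=lambda c: c[0])


def toy_step(tokens):
    """Hypothetical decoder step: next-word probabilities given the last word."""
    table = {
        "<S>": {"a": 0.6, "the": 0.4},
        "a": {"dog": 0.4, "cat": 0.6},
        "the": {"dog": 0.95, "cat": 0.05},
        "dog": {"</S>": 1.0},
        "cat": {"</S>": 1.0},
    }
    return table[tokens[-1]]


print(beam_search(toy_step, "<S>", "</S>", beam_size=1)[0][1])  # ['<S>', 'a', 'cat', '</S>']
print(beam_search(toy_step, "<S>", "</S>", beam_size=2)[0][1])  # ['<S>', 'the', 'dog', '</S>']
```

With a beam size of 1 the search is greedy and settles on the locally best first word. A beam size of 2 finds a more probable caption in this toy table, at the cost of twice as many decoder calls per word.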
- iOS 11.0+
- Xcode 9.0+ (Swift 4.x)
This CoreML model is exported from a Keras model trained on the MSCOCO dataset for about 40k steps. It is not yet state of the art, so it is not recommended for production use. I trained it on a single GeForce GTX 1080 Ti for about 48 hours and currently don't have more time to train it further. I hope the community will keep improving it.
- Train
For example:
python ./train.py --weight_path WEIGHT_FILE_PATH_TO_CONTINUE_TRAINING --TFRecord_pattern TFRECORD_FILE_PATTERN
python ./train.py --weight_path ./keras_weight/weights_full.h5 --TFRecord_pattern "./tfrecords/train-?????-of-00256"
- Test
For example:
python ./inference.py --weight_path WEIGHT_FILE_PATH --image_path TEST_IMAGE_PATH --max_sentence_length 20
python ./inference.py --weight_path ./keras_weight/weights_full.h5 --image_path ./test.jpg --max_sentence_length 20
- Convert to CoreML Model
python ./convert_coreml.py --export_lstm False
export_lstm determines whether to export the Inception part or the LSTM part of the model. (The whole model is split into two parts: one for encoding the image, one for decoding words.)
The pretrained Keras weight file will be uploaded to Google Drive shortly.
We use the MS-COCO dataset. You can fetch the raw data and build it into TFRecords following the original TensorFlow im2txt.
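The `--TFRecord_pattern` value in the training example, `train-?????-of-00256`, looks like a glob pattern in which each `?` stands for exactly one character. The sketch below uses Python's standard `fnmatch` module to show which shard names such a pattern would select. The file names are illustrative, and it is an assumption that `train.py` resolves the pattern with glob-style matching. Quoting the pattern on the command line keeps the shell from expanding it before the script sees it.

```python
from fnmatch import fnmatch

pattern = "train-?????-of-00256"

# Illustrative shard names; only those with a five-character index match.
names = [
    "train-00000-of-00256",
    "train-00255-of-00256",
    "val-00000-of-00004",
    "train-0001-of-00256",
]

matched = [name for name in names if fnmatch(name, pattern)]
print(matched)  # ['train-00000-of-00256', 'train-00255-of-00256']
```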
- Train on the dataset to 100k steps. (currently 40k)
- Open-source the original Keras model used for training.
- More language support (Chinese).
- 曹佳鑫 (tsao): An iOS developer with deep learning experience, living in Shanghai.
- Pull requests and issues are welcome.
- Mail: cjx5813@foxmail.com
ShowAndTell is available under the MIT license. See the LICENSE file for more info.