Don't aim for success if you want it; just do what you love and believe in, and it will come naturally.
Pub. Date:April 21, 2017, 9:20 a.m. Topic:Computer Vision Tag:Tool


A tool has been implemented to merge caffemodels according to a configuration file. The source code can be found in this repo.

Under development...


Any version of Caffe that has been successfully compiled.


A configuration file should be prepared. Its content should look like this:

[path to dst prototxt] [path to dst caffemodel]
[path to src prototxt] [path to src caffemodel] [auto_flag (0 or 1)] [num_pairs(int)] [src_layer_name_(1):dst_layer_name_(1)] ... [src_layer_name_(num_pairs):dst_layer_name_(num_pairs)]
[path to src prototxt] [path to src caffemodel] [auto_flag (0 or 1)] [num_pairs(int)] [src_layer_name_(1):dst_layer_name_(1)] ... [src_layer_name_(num_pairs):dst_layer_name_(num_pairs)]

The first line gives the path to the prototxt of the desired (dst) caffemodel and the path where that caffemodel will be saved.

From the second line on, each line describes one source. The first two strings give the path to the source caffemodel's prototxt and the path to the source caffemodel. Then a flag (0 or 1) indicates whether weights are copied automatically when the src and dst layer names are the same (1 means true, any other value means false). The fourth parameter tells how many pairs of src and dst layers are to be processed. After that, the pairs are given in the format "src_layer_name:dst_layer_name".
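The source-line format described above can be sketched as a small parser. This is only an illustration of the format, not the tool's actual parsing code: the `SourceEntry` struct and the `ParseSourceLine` function are hypothetical names, and splitting each pair at its first colon is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one source line of the configuration file.
struct SourceEntry {
    std::string prototxt;
    std::string caffemodel;
    bool auto_copy = false;  // true only when the flag is exactly 1
    std::vector<std::pair<std::string, std::string>> pairs;  // (src, dst)
};

// Parses one whitespace-separated source line.
// Returns false when the line is malformed.
bool ParseSourceLine(const std::string& line, SourceEntry* entry) {
    assert(entry != nullptr);
    std::istringstream in(line);
    int auto_flag = 0;
    int num_pairs = 0;
    if (!(in >> entry->prototxt >> entry->caffemodel >> auto_flag >> num_pairs)) {
        return false;
    }
    if (num_pairs < 0) {
        return false;
    }
    entry->auto_copy = (auto_flag == 1);
    entry->pairs.clear();
    for (int i = 0; i < num_pairs; ++i) {
        std::string token;
        if (!(in >> token)) {
            return false;  // fewer pairs than announced
        }
        // Assumption: a pair is split at its first colon.
        const std::string::size_type colon = token.find(':');
        if (colon == std::string::npos || colon == 0 ||
            colon + 1 == token.size()) {
            return false;
        }
        entry->pairs.emplace_back(token.substr(0, colon),
                                  token.substr(colon + 1));
    }
    return true;
}
```

Calling `ParseSourceLine` on a line with the flag set to 1 and zero pairs yields an entry with `auto_copy` set and an empty pair list. A line that announces more pairs than it contains is rejected.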

An example can be found below

dst.prototxt dst.caffemodel
src0.prototxt src0.caffemodel 1 0
src1.prototxt src1.caffemodel 0 1 layer_0:layer_dst_0

This means that every layer in dst.caffemodel that shares its name with a layer in src0.caffemodel will be filled with the weights from src0.caffemodel, as long as the two layers have exactly the same number of parameters. In addition, layer_dst_0 in dst.caffemodel will be filled with the weights of layer_0 in src1.caffemodel.
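The copy rule can be sketched without Caffe by using a plain map from layer name to flattened float weights as a stand-in for a model. The `Weights` alias and the `AutoCopy` and `CopyPair` functions are hypothetical names for illustration. Checking the parameter count for explicitly paired layers is an assumption as well; the text above only states it for the automatic case.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for a model: layer name -> flattened float weights.
using Weights = std::map<std::string, std::vector<float>>;

// Copies weights from src into every dst layer that has the same name and
// the same number of parameters. Returns the number of layers copied.
int AutoCopy(const Weights& src, Weights* dst) {
    assert(dst != nullptr);
    int copied = 0;
    for (auto& layer : *dst) {
        const auto it = src.find(layer.first);
        if (it != src.end() && it->second.size() == layer.second.size()) {
            layer.second = it->second;
            ++copied;
        }
    }
    return copied;
}

// Copies one explicitly paired layer. Returns false if either layer is
// missing or (by assumption) the parameter counts differ.
bool CopyPair(const Weights& src, const std::string& src_name,
              Weights* dst, const std::string& dst_name) {
    assert(dst != nullptr);
    const auto s = src.find(src_name);
    const auto d = dst->find(dst_name);
    if (s == src.end() || d == dst->end()) {
        return false;
    }
    if (s->second.size() != d->second.size()) {
        return false;
    }
    d->second = s->second;
    return true;
}
```

In this sketch a dst layer whose name matches but whose size differs is left untouched, which mirrors the "exact same number of parameters" condition.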

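Example code is as follows:

#include <string>

#include "mergeModel.h"

int main() {
    // Path to the configuration file described above.
    std::string config_file_path = "test_config.txt";
    // Construct the merger from the configuration file.
    MergeModelClass test_class(config_file_path);

    return 0;
}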

Of course, you can write your own main function that takes its inputs from the command line.

NOTE: only the float type is supported.


A class is defined to merge caffemodels.

The project can now copy src weights to dst weights automatically if the two layers have the same name. Further tests will be carried out to confirm that weights can be copied from a src caffemodel using pairs.

Pub. Date:April 18, 2017, 6:42 p.m. Topic:Computer Vision Tag:Reading Note

TITLE: FastMask: Segment Multi-scale Object Candidates in One Shot

AUTHOR: Hexiang Hu, Shiyi Lan, Yuning Jiang, Zhimin Cao, Fei Sha

ASSOCIATION: UCLA, Fudan University, Megvii Inc.

FROM: arXiv:1703.03872


  1. A novel weight-shared residual neck module is proposed to zoom out the feature maps of a CNN while preserving calibrated feature semantics, which enables efficient multi-scale training and inference.
  2. A novel scale-tolerant head module is proposed, which takes advantage of an attention model and significantly reduces the impact of background noise caused by unmatched receptive fields.
  3. A framework capable of one-shot segment proposal, namely FastMask, is built. The proposed framework achieves state-of-the-art accuracy while running in near real time on the MS COCO benchmark.


Network Architecture

The network architecture is illustrated in the following figure.

Starting from the base feature map, a shared neck module is applied recursively to build feature maps at different scales. These feature maps are then fed to a one-by-one convolution to reduce their feature dimensionality. Then dense sliding windows are extracted from those feature maps, and batch normalization is applied across all windows to calibrate and redistribute the window feature maps. On a feature map downscaled by a factor of $m$, a sliding window of size $(k, k)$ corresponds to a patch of size $(m \times k, m \times k)$ in the original image. Finally, a unified head module is used to decode these window features and produce the output confidence score as well as the object mask.
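The relation between window size and image patch can be written down as a short sketch. The names `Patch`, `WindowToPatch` and `ScaleFactors` are hypothetical, and the assumption that each application of the neck halves the resolution is made here for illustration only; it is not stated in the note above.

```cpp
#include <cassert>
#include <vector>

// Size, in original-image pixels, of the patch covered by one window.
struct Patch {
    int height;
    int width;
};

// A k x k window on a feature map downscaled by factor m covers an
// (m * k) x (m * k) patch of the original image.
Patch WindowToPatch(int m, int k) {
    assert(m > 0 && k > 0);
    return Patch{m * k, m * k};
}

// Downscale factors of the base map and of the maps obtained by applying
// the neck 1..num_necks times.
// Assumption: each neck application halves the spatial resolution.
std::vector<int> ScaleFactors(int base_factor, int num_necks) {
    assert(base_factor > 0 && num_necks >= 0);
    std::vector<int> factors;
    int m = base_factor;
    for (int i = 0; i <= num_necks; ++i) {
        factors.push_back(m);
        m *= 2;
    }
    return factors;
}
```

With hypothetical values $m = 16$ and $k = 10$, `WindowToPatch` gives a 160 x 160 patch, so the same window size covers larger and larger objects as the neck is applied again.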

Residual Neck

The neck module is actually used to downscale the feature maps so that features with different scales can be extracted.

There are two other choices. One is the max pooling neck, which produces uncalibrated features during encoding, pushing the mean of the downscaled feature higher than that of the original. The other is the average pooling neck, which smooths out discriminative features during encoding, making the top feature maps appear blurry.
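The effect of the two pooling necks on the feature mean can be checked on a toy feature map. The sketch below uses 2x2 pooling with stride 2 on a single channel, which is an assumption chosen for simplicity; `Map2D`, `PoolType`, `Pool2x2` and `Mean` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Map2D = std::vector<std::vector<float>>;

enum class PoolType { kMax, kAverage };

// 2x2 pooling with stride 2 on a single-channel map with even dimensions.
Map2D Pool2x2(const Map2D& in, PoolType type) {
    assert(!in.empty() && in.size() % 2 == 0 && in[0].size() % 2 == 0);
    Map2D out(in.size() / 2, std::vector<float>(in[0].size() / 2, 0.f));
    for (std::size_t r = 0; r < out.size(); ++r) {
        for (std::size_t c = 0; c < out[r].size(); ++c) {
            const float a = in[2 * r][2 * c];
            const float b = in[2 * r][2 * c + 1];
            const float d = in[2 * r + 1][2 * c];
            const float e = in[2 * r + 1][2 * c + 1];
            out[r][c] = (type == PoolType::kMax)
                            ? std::max({a, b, d, e})
                            : (a + b + d + e) / 4.f;
        }
    }
    return out;
}

// Mean over all entries of the map.
float Mean(const Map2D& m) {
    assert(!m.empty() && !m[0].empty());
    float sum = 0.f;
    std::size_t count = 0;
    for (const auto& row : m) {
        for (float v : row) {
            sum += v;
            ++count;
        }
    }
    return sum / static_cast<float>(count);
}
```

Average pooling over non-overlapping blocks keeps the mean of the map unchanged, while max pooling can only keep it equal or raise it, which is the calibration issue mentioned above.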

Residual neck is then proposed to learn parametric necks that preserve feature semantics. The following figure illustrates the method.

Attentional Head

Given the feature map of a sliding window as input, a spatial attention map is generated through a fully connected layer, which takes the entire window feature and produces an attention score for each spatial location on the feature map. The spatial attention is then applied to the window feature map via element-wise multiplication across channels. This operation enables the head module to enhance features in the salient region, which is supposed to be the rough location of the target object. Finally, the enhanced feature map is fed into a fully connected layer to decode the segmentation mask of the object. This module is illustrated in the following figure.
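The attention step can be sketched on a tiny window feature stored as channels by flattened spatial locations. This is a simplified illustration rather than the paper's implementation: the names `Feature`, `AttentionScores` and `ApplyAttention` are hypothetical, the fully connected layer has no bias, and no normalization or nonlinearity is applied to the scores. All of these are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Window feature: feature[c][i] is channel c at flattened location i.
using Feature = std::vector<std::vector<float>>;

// Fully connected layer over the entire flattened window feature, producing
// one attention score per spatial location.
// fc_weights has shape [locations][channels * locations]; no bias (assumption).
std::vector<float> AttentionScores(
    const Feature& feature, const std::vector<std::vector<float>>& fc_weights) {
    assert(!feature.empty());
    const std::size_t locations = feature[0].size();
    std::vector<float> flat;
    for (const auto& channel : feature) {
        assert(channel.size() == locations);
        flat.insert(flat.end(), channel.begin(), channel.end());
    }
    assert(fc_weights.size() == locations);
    std::vector<float> scores(locations, 0.f);
    for (std::size_t i = 0; i < locations; ++i) {
        assert(fc_weights[i].size() == flat.size());
        for (std::size_t j = 0; j < flat.size(); ++j) {
            scores[i] += fc_weights[i][j] * flat[j];
        }
    }
    return scores;
}

// Multiplies every channel element-wise by the same spatial attention map.
Feature ApplyAttention(const Feature& feature,
                       const std::vector<float>& attention) {
    Feature out = feature;
    for (auto& channel : out) {
        assert(channel.size() == attention.size());
        for (std::size_t i = 0; i < channel.size(); ++i) {
            channel[i] *= attention[i];
        }
    }
    return out;
}
```

Because one attention map is shared by all channels, locations with a low score are suppressed in every channel at once, which is how background locations inside a window are down-weighted before the mask is decoded.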

The feature pyramid is sparse in this work because of the downscale operation. A sparse feature pyramid raises the probability that no suitable feature map exists for decoding an object, and also raises the risk of introducing background noise when the object is decoded from an unsuitable feature map with too large a receptive field. For this reason, attention to the salient region is introduced in the head. By attending to the salient region, a decoding head can reduce the noise from the background of a sliding window and thus produce high-quality segmentation results when the receptive field does not match the scale of the object. The salient region attention is also tolerant to shift disturbance.


  1. This work shares a similar idea with most one-shot algorithms: extracting sliding windows from the feature map and encoding them with a subsequent network.
  2. How to extract sliding windows?

Pub. Date:April 17, 2017, 10:44 p.m. Topic:Life Discovery Tag:Little Things

This photo reflects my life.

At first I just wanted to take a picture of the flowers. When I took a closer look at the photo, I found it interesting that it happened to capture an epitome of my life.

There are two books, one about algorithms and the other an introduction to sketching. I need to develop my own core ability to survive in this fiercely competitive world. On the other hand, I need a hobby to enjoy life, one that helps me forget my troubles for a while and look into my own heart to become a better man. With those two, beautiful flowers bloom in my life.

Pub. Date:April 15, 2017, 11 p.m. Topic:Life Discovery Tag:Little Things

Pub. Date:April 10, 2017, 8:13 p.m. Topic:Life Discovery Tag:Little Things


-- 杨坚华 (Yang Jianhua), 《遇见德国》 (Encountering Germany)