The discriminative power of the filter bank is further enhanced by enforcing features from the same category to be close to each other in the feature space, while features from different categories are kept far apart. We introduce a binary selection vector to adaptively select which filters to share, and among which categories. Not all patches from the same category are close to each other, as they are very diverse. Likewise, not all local patches from different classes should be forced to be separable. Directly learning features from image pixel values [4–9,14–18] has emerged as a hot research topic in computer vision because it yields data-adaptive features. Discriminative information can be critical for classification, and discriminative patterns can be learned. By multiplying W with x_i and applying an activation function F(·), we expect to generate a feature f_i = F(W x_i) that is discriminative and as compact as possible. We aim to minimize the distance between each feature and its positive nearest neighbours, while maximizing the distance between each feature and its negative nearest neighbours.
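The feature mapping f_i = F(W x_i) and the nearest-neighbour criterion described above can be illustrated with a minimal NumPy sketch. Several choices here are assumptions made for illustration only and are not taken from the text: the tanh activation standing in for F, the squared Euclidean distance, the neighbourhood size `k`, the unweighted difference between the two terms, and the hypothetical function names `extract_features` and `neighbour_objective`. The binary selection vector for filter sharing is not modelled.

```python
import numpy as np


def extract_features(W, X, activation=np.tanh):
    """Compute f_i = F(W x_i) for every column x_i of X.

    W: (n_filters, patch_dim) matrix; X: (patch_dim, n_patches).
    tanh is an assumed stand-in; the text only names a generic F.
    """
    return activation(W @ X)


def neighbour_objective(F, labels, k=2):
    """Sum of squared distances from each feature to its k nearest
    same-class features, minus the sum of squared distances to its
    k nearest other-class features (assumed, unweighted form).

    Lower values mean tighter same-class neighbourhoods and wider
    separation from the closest other-class features.
    """
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances between feature columns.
    sq_norms = np.sum(F * F, axis=0)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (F.T @ F)
    dist = np.maximum(dist, 0.0)  # clip tiny negatives from round-off

    total = 0.0
    for i in range(F.shape[1]):
        same = labels == labels[i]
        same[i] = False  # a feature is not its own neighbour
        diff = labels != labels[i]
        pos = np.sort(dist[i, same])[:k]  # positive nearest neighbours
        neg = np.sort(dist[i, diff])[:k]  # negative nearest neighbours
        total += pos.sum() - neg.sum()
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(25, 12))        # 12 flattened 5x5 patches
    labels = np.array([0] * 6 + [1] * 6)
    W = 0.1 * rng.normal(size=(8, 25))   # 8 filters
    feats = extract_features(W, X)
    print(feats.shape, neighbour_objective(feats, labels, k=2))
```

Restricting both terms to the k nearest neighbours, rather than to all same-class or other-class features, reflects the observation that patches within a category are diverse and that not every pair of patches from different classes needs to be separable. In a learning setting this quantity would be minimized with respect to W; the sketch only evaluates it for a fixed W.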