We have recently witnessed an explosive growth in machine learning research focused on modelling and real-world inference problems. Notably, deep learning models such as deep neural networks (DNNs) are a particularly powerful and biologically inspired class of learning algorithms that have consistently demonstrated state-of-the-art performance on tasks such as object recognition, image classification, image segmentation, and speech recognition. A particular type of DNN that has proven to be very effective in recent years is the convolutional neural network (CNN) (see (Hubel & Wiesel, 1968)), which is architecturally made up of layers of neurons modelled after simple and complex cells in the visual cortex.
In order to train a DNN for a task such as classification, the synaptic strengths of the network are optimized based on training data. Optimizing a large-scale artificial neural architecture such as a CNN for classification in a generalizable manner, however, requires a large number of input image samples. This may be prohibitive in many practical scenarios where labeled data is limited. To ameliorate this dependence, we explore whether it is possible to sidestep the training of a large portion of the learnable parameters (the synaptic strengths) in a neural network. More particularly, we are motivated by (Eliasmith et al., 2012), where strong modelling and inference performance was exhibited when random synaptic strengths were leveraged in a computational model of the functioning brain. This suggests that the inherent structure of deep neural networks may itself be enough to elicit powerful modelling and inference performance even when the formation of synaptic strengths is random.
In particular, we draw inspiration from a number of studies that investigated the distribution of synaptic strengths in the biological brain. For example, it has been observed that the synaptic strengths of certain synapses, such as the excitatory synapses, can be well modelled as random variables following well-known distributions such as truncated Gaussians (Barbour, Brunel, Hakim, & Nadal, 2007). Furthermore, Song et al. (Song, Sjöström, Reigl, Nelson, & Chklovskii, 2005) found that the underlying synaptic strengths follow a log-normal distribution. Other studies (Martinez & Alonso, 2003; Cheong, Tailby, Solomon, & Martin, 2013) suggested a correlated relationship between synaptic strengths in earlier layers of the visual cortex, specifically circular concentric receptive fields modelled after lateral geniculate nucleus (LGN) cells.
Inspired by the aforementioned observations (Song et al., 2005; Martinez & Alonso, 2003; Cheong et al., 2013), we perform an exploratory study of different uncorrelated and correlated probabilistic generative models for synaptic strength formation in deep neural networks, and of the potential influence of different distributions on modelling performance, particularly for the scenario associated with small datasets.
Here we model the synaptic strengths of the deep neural network as a set of random variables, one per synapse, that follow a synaptic strength distribution. In order to explore the effect of different probabilistic generative models for synaptic formation on modelling and inference performance in a focused manner, in this study we restrict the network architecture to be a convolutional neural network (CNN) architecture. More specifically, the synaptic strengths in the convolutional layers are synthesized based on this distribution and not fine-tuned, whereas the synaptic strengths of the fully connected layers are synthesized and then trained to reach their complete modelling capabilities. This setup allows us to localize the effect of the distribution on synaptic strengths and fairly compare the modelling and inference performance of different synaptic formations drawn from various underlying biologically inspired probability distributions. Furthermore, each random variable corresponding to a synaptic strength is drawn from a probabilistic generative model. In this study, we explore three different distribution models based on past biological studies: I) normal Gaussian, II) log-normal, and III) correlated center-surround.
This approach to synaptic strength formation can enable a drastic reduction in the number of parameters that need to be trained, which is an important factor in scenarios with a small amount of training data.
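As a rough illustration of how convolutional synaptic strengths could be synthesized from the three kinds of distribution, the sketch below draws a bank of kernels once and leaves it fixed. It is not the exact generative model used in this study: the function names, the distribution parameters (`mu`, `sigma`, `sigma_c`, `sigma_s`, `noise`), and in particular the choice of a noisy difference of two concentric Gaussians for the correlated center-surround case are assumptions made for illustration only.

```python
import math
import random


def normal_kernel(size, rng, mu=0.0, sigma=1.0):
    """Uncorrelated: each strength is an independent Gaussian draw."""
    return [[rng.gauss(mu, sigma) for _ in range(size)] for _ in range(size)]


def lognormal_kernel(size, rng, mu=0.0, sigma=1.0):
    """Uncorrelated: each strength is an independent log-normal draw (> 0)."""
    return [[rng.lognormvariate(mu, sigma) for _ in range(size)]
            for _ in range(size)]


def center_surround_kernel(size, rng, sigma_c=0.8, sigma_s=1.6, noise=0.05):
    """Correlated: concentric difference of Gaussians plus Gaussian noise.

    Assumed form, chosen for illustration. The polarity is picked at random,
    giving either an on-center or an off-center kernel.
    """
    mid = (size - 1) / 2.0
    sign = rng.choice([1.0, -1.0])

    def gaussian(r2, sigma):
        # Normalized 2-D isotropic Gaussian evaluated at squared radius r2.
        return math.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

    return [[sign * (gaussian((x - mid) ** 2 + (y - mid) ** 2, sigma_c)
                     - gaussian((x - mid) ** 2 + (y - mid) ** 2, sigma_s))
             + rng.gauss(0.0, noise)
             for x in range(size)]
            for y in range(size)]


def synthesize_bank(kind, n_kernels, size, seed=0):
    """Draw a bank of kernels once; the bank is then kept frozen."""
    samplers = {
        "normal": normal_kernel,
        "log-normal": lognormal_kernel,
        "center-surround": center_surround_kernel,
    }
    rng = random.Random(seed)
    return [samplers[kind](size, rng) for _ in range(n_kernels)]
```

For example, `synthesize_bank("center-surround", n_kernels=64, size=5)` would return 64 kernels of a hypothetical size 5 x 5. In the first two samplers every strength is drawn independently, whereas in the third the strengths within a kernel are spatially correlated through their distance from the kernel center.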
Experimental Setup
Following biological observations, the effect of three different synaptic strength distributions is examined on the same convolutional neural network (CNN) architecture: I) normal Gaussian distribution, II) log-normal distribution (from (Song et al., 2005)), and III) correlated center-surround distribution.
Table 1: Impact of different probabilistic generative models for synaptic strength generation on modelling performance for 3 small datasets (see text on how datasets were generated). The synaptic strengths of the convolutional layers were generated from distributions describing synaptic strengths in the visual cortex. The convolutional layer synapses are frozen and not trained, whereas the fully connected layers of the CNN are trained. Highest performing setups are in bold.
To examine the effect of different synaptic strength distributions on modelling performance, we use a CNN consisting of a convolutional layer comprising 64 kernels with receptive fields of size , a max-pooling layer with stride 2, and a rectified linear unit, as well as two fully connected layers that are inspired by LeNet’s fully connected layer architecture (LeCun, Bottou, Bengio, & Haffner, 1998) and have a structure (input - hidden - output).
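The fixed convolutional stage described above (convolution, max-pooling with stride 2, then rectification) can be sketched in plain Python as follows. This is only an illustrative sketch: the helper names, the 2 x 2 pooling window, the absence of padding and bias terms, and the kernel size passed in by the caller are all assumptions rather than settings taken from this study. The resulting feature vector would serve as the input to the trainable fully connected layers, which are not shown.

```python
def conv2d_valid(image, kernel):
    """2-D cross-correlation of one image with one square kernel, no padding."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for x in range(w - k + 1)]
            for y in range(h - k + 1)]


def max_pool(fmap, window=2, stride=2):
    """Max-pooling over square windows (window size is an assumption)."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[y + i][x + j]
                 for i in range(window) for j in range(window))
             for x in range(0, w - window + 1, stride)]
            for y in range(0, h - window + 1, stride)]


def relu(fmap):
    """Element-wise rectification: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def frozen_features(image, kernels):
    """Flattened features from fixed kernels: conv -> max-pool -> ReLU.

    The kernels are never updated; only a classifier placed on top of the
    returned feature vector would be trained.
    """
    features = []
    for kernel in kernels:
        fmap = relu(max_pool(conv2d_valid(image, kernel)))
        features.extend(v for row in fmap for v in row)
    return features
```

With a hypothetical 8 x 8 input and 64 kernels of size 3 x 3, each kernel yields a 6 x 6 response map that is pooled to 3 x 3, so `frozen_features` returns 64 x 9 = 576 values.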
In this exploratory study, we examined three standard, publicly available object classification datasets for the scenario of small training datasets: MNIST handwritten digits (LeCun et al., 1998), Street View House Numbers (SVHN) (Netzer et al., 2011), and the CIFAR-10 object recognition dataset (Krizhevsky & Hinton, 2009). To mimic such a scenario, 38 samples per class label (10 class labels for each dataset) were randomly selected from the available training data in each dataset to form a small dataset. To compute the test accuracy, however, the models are tested with all available testing samples. The reported results (mean and standard deviation) are computed over three runs.
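The construction of the small datasets and the aggregation over runs can be sketched as below. The function names and the seeding scheme are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not state which was reported.

```python
import random
import statistics
from collections import defaultdict


def sample_per_class(labels, per_class=38, seed=0):
    """Return sorted indices of `per_class` randomly chosen samples per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    chosen = []
    for label in sorted(by_label):
        # random.sample draws without replacement and raises ValueError
        # if a class has fewer than `per_class` examples.
        chosen.extend(rng.sample(by_label[label], per_class))
    return sorted(chosen)


def summarize_runs(accuracies):
    """Mean and sample standard deviation of test accuracy over repeated runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

For a 10-class dataset, `sample_per_class(train_labels)` selects 380 training indices in total, while the full test set is left untouched for evaluation.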
Table 1 summarizes the results of our experiments. We also report the classification performance of the same CNN architecture on these datasets where the CNN is completely trained, and all synaptic strengths are fine-tuned. As expected, the small number of training samples (i.e., 38 per class) results in the CNN’s relatively poor classification performance, as is evident from the right-most column of Table 1 named “Fully Trained”.
Interestingly, sampling the convolutional synaptic strengths from a normal Gaussian distribution (“Normal” column) yields a classification performance comparable to that of “Fully Trained” for CIFAR-10 and SVHN. The most surprising of the preliminary results can be seen in the “Log-Normal” and “Center-Surround” columns. One possibility that these results suggest is that sampling the synaptic strengths of a CNN from well-known distributions that model synaptic strengths in the visual cortex can result in a classification system that potentially outperforms carefully fine-tuned CNNs on small datasets. This may suggest that, in the scenario with very little data, learning a generalizable classification system may not be worth the effort put into training, as a trained network may be outperformed by one with random convolutional synaptic strengths. This result is a promising first step towards designing deep neural networks that do not require many data samples to learn, and that can sidestep or reduce the burden of current training procedures while maintaining or boosting classification and modelling performance. In future work, we are excited to explore this same effect on deeper networks with more synapses, and to investigate whether and how these synaptic strength distributions may be used to design more efficient architectures and training algorithms.
This work was supported by the Natural Sciences and Engineering Research Council of Canada, Ontario Ministry of Economic Development and Innovation and Canada Research Chairs Program. The authors also thank Nvidia for the GPU hardware used in this study through the Nvidia Hardware Grant Program.
Barbour, B., Brunel, N., Hakim, V., & Nadal, J.-P. (2007). What can we learn from synaptic weight distributions? Trends in Neurosciences, 30(12), 622–629.
Cheong, S. K., Tailby, C., Solomon, S. G., & Martin, P. R. (2013). Cortical-like receptive fields in the lateral geniculate nucleus of marmoset monkeys. Journal of Neuroscience, 33(16), 6864–6876.
Eliasmith, C., Stewart, T. C., Choo, X., Bekolay, T., DeWolf, T., Tang, Y., & Rasmussen, D. (2012). A large-scale model of the functioning brain. Science, 338(6111), 1202–1205.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195(1), 215–243.
Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Citeseer.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Martinez, L. M., & Alonso, J.-M. (2003). Complex receptive fields in primary visual cortex. The Neuroscientist, 9(5), 317–331.
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning (Vol. 2011, p. 5).
Song, S., Sjöström, P. J., Reigl, M., Nelson, S., & Chklovskii, D. B. (2005). Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology, 3(3), e68.