The process of identifying art styles and discovering artistic influences is essential to the study of art history, or more generally, the study of the visual history of humanity. Art historians have commonly sought answers to these questions through their extensive historical and cultural knowledge as well as exceptional skills in visual analysis. Now, the rise in computing power and advancement in machine learning algorithms have also allowed the machine to enter the field and contribute its own insight via data-driven methods. This study explores the potential of using convolutional neural networks (CNN) to conduct art style classification and study artistic influences among styles and artists. Several previous studies have shown the abilities of the machine in classifying art styles and generating features for analysis. However, the results of those studies fail to reflect the full capabilities of machine. Our study shows that a machine has the potential to perform art style classification better than humans without expertise in art history, and also help experts to discover underlying relationships among styles and artists.
To determine the style of a painting, art historians may consider the depicted subject matter, brushstrokes, the use of color, perspective, and its similarities to other paintings. In the first stage of this study, we evaluate how accurately machines can classify art styles. Elgammal et al.[1] attained a 20-class classification accuracy of 63.7% with 77K images of paintings using a pre-trained ResNet. However, we believe that a 20-class labeling may not be ideal, as there could be high similarities among certain styles, which tend to overcomplicate the problem and cause confusion for the classifier. For example, style element overlap exists between Post-Impressionism and Fauvism, as well as between Expressionism and Abstract-Expressionism. Meanwhile, we noticed that many artworks were originally mislabeled and thus require additional cleaning and polishing. To simplify the task, we reduced the number of classes to nine in order to better highlight distinctive elements between styles. We also manually relabeled some paintings to increase quality. With this curated dataset, we expect to achieve a better result even with less training data.
Our dataset contains 24,110 paintings, which are mostly accurately labeled into nine classes. We use state-of-art convolutional neural networks for art style classification and trained the models on GPUs initially with a low resolution to select viable models, and retrained high performing models on CPU in higher resolution to bump up the accuracy. In particular, we implemented VGG-16 [2], ResNet-152 [3], Inception V3[4] and Inception-ResNet V2 [5] and compare their performance on the testing data. To assess whether the networks are focusing on salient parts of the images when making decisions, we also implemented Grad-CAM [6] to create heat maps. Additionally, we visualize the filters of selected convolutional layers to better understand the mechanics of the model.
The best-performing model we found from the first stage is based on Inception V3. We use this as a proxy model in the second stage to study how machines may perceive the world. In order to explore relationships and determine similarities among styles and artists, we compute both average Euclidean and Cosine distances between every pair of artists using the output of the final fully connected layer with 512 neurons as a feature embedding for each painting. An artist network analysis is then performed to reveal Cosine linkages among nearby artists, as each artist may be considered to be connected to another artist when the distance between the two artists is minimized (for maximum similarity). Finally, we visualize the connections in chronological order to infer unidirectional influences among artists along the timeline.
We use images of paintings from the publicly available WikiArt database [7], which features a total of 250,000 artworks by 3,000 artists. With a focus on nine art styles ranging from the 13th century to the 21st century, we choose 24,110 paintings from 235 artists as our dataset for this study. For the purposes of this paper, style primarily refers to the visual appearance of a painting and that does
Table 1: Style classes summary
not necessarily relate it to other artists and works of any art movements. The details of our dataset are summarized in Table 1.
To promote quality of the analysis, we restrict the scope of our study to a subset of data. The labeling of the styles is based on both the information on WikiArt and our best knowledge. For artists featuring works of multiple styles, we designate them to one primary style and only keep images of paintings corresponding to the selected style. We have tried our best to conduct data cleaning based on the following rules:
∗ Removing images that are not paintings (e.g. sculptures & architecture) ∗ Removing images that contain large portions of non-paintings (e.g. frames & walls) ∗ Removing images of sketches and drawings ∗ Removing black and white images ∗ Removing images with significant distortion or are of poor quality
It is worth noting that due to the nature of art production and conservation across the centuries, paintings of certain styles are available in greater quantities than others; thus, the classes are not quite balanced. While augmenting examples from minority classes by rotating, flipping, shifting images may be an option, we decided not to apply it so that the model can more realistically reflect the distribution of style availability in the real world.
We randomly split 90% of the 24,110 paintings into a training dataset and use the remaining 10% for testing. As mentioned earlier, we experiment with four convolutional neural networks: VGG-16, ResNet-152, Inception V3, and Inception-ResNet V2. For each of the four models, the final softmax layer is replaced with a layer of nine softmax nodes, one for each style class. We train our models both with pre-trained weights from ImageNet and without pre-trained weights. It turns out that the models trained without using pre-trained weights perform better than those with pre-trained weights in terms of test accuracy, especially for ResNet-152, InceptionV3 and Inception-ResNet V2. This may be an effect of inadequate domain adaption where the domain of natural images that these models were trained on differs significantly from the abstract domain of artistic paintings.
For training efficiency, we explore the tradeoff of image size on model performance, as smaller models typically train faster. After several rounds of experiments, we realize that resizing the input image smaller than 256x256 pixels did not produce accurate models. Ideally, we would like to train the models in large batches with images of high resolution. However, since we are limited to 8 GB of memory on our NVIDIA GTX 1080 GPU, we decided to use an image size of 300x300 across all models as a way to balance image size and batch size. After selecting the initial GPU-trained models, we re-trained using the Intel AI DevCloud with a Xeon Gold 6128 CPU, Intel-optimized Tensorflow, and 192 GB memory to better understand how the model may perform with higher resolution. We were able to utilize a larger 400x400 size as we were no longer limited by memory size. The model classification results are summarized in Table 2 and Table 3. The increased resolution does pay off and deliver several additional percentage points of accuracy.
Table 2: Performance of Four Models on Art Style Classification (GPU)
Table 3: Performance of Four Models on Art Style Classification (CPU)
Figure 1: Inception V3 Training (solid in blue) and Validation Accuracy (dashed in pink)
Figure 2: Inception V3 Training (solid in black) and Validation Loss (dashed in yellow)
As Table 3 shows, Inception V3 is the best-performing model that reaches an exciting validation accuracy of 87.35% at an image size of 400x400, which marks a significant improvement from 63.7% achieved by the previous study [1]. We then retrain the Inception V3 with an even larger 500x500 size and achieved an even higher validation accuracy of 88.35%. For experts with adequate training in art history, we expect the human classification accuracy on a small sample of data would be near 100%. However, for those with a basic understanding of art styles, we expect the human accuracy to be around 70% to 80%. Generally, the results exceed our prior expectations and demonstrate that a machine is able to perform the classification task better than most non-professional human beings.
We hypothesize that one of the the underlying reasons behind misclassification for both human and machine is that some of the styles are quite similar aesthetically and iconically. For example, certain paintings of Early Renaissance & High Renaissance, Baroque & Realism, as well as Abstract Art & Cubism share a great amount of visual similarities and might pose challenges to both human and machine alike.
Figure 3: Style Classification Confusion Matrix
The confusion matrix depicted in Figure 3 largely supports this hypothesis. As it shows, the model predicts ~95% Ukiyo-e paintings correctly since this group is more aesthetically distinctive from other groups. Those styles sharing visual and iconic similarities not only pose classification challenges to humans, but to machines as well. For example, as indicated in the confusion matrix, the model predicts:
∗ 26 out of 401 (~6%) “Realism” paintings as “Baroque” paintings
∗ 7 out of 126 (~5%) “Cubism” paintings as “Abstract” paintings
∗ 13 out of 119 (~11%) “Early-Renaissance” paintings as “High-Renaissance” paintings
∗ 11 out of 105 (~11%) “Pop” paintings as “Abstract” paintings
To delve deeper into these issues, we also manually inspected several misclassified paintings and realized that some of the misclassifications can be explained logically. For example, in Figure 4 It¯o Jakuch¯u’s Phoenix and Sun is predicted as abstract art with a probability of ~0.52 instead of Ukiyo-e.
Figure 4: Highlights of Misclassified Paintings: It¯o Jakuch¯u Phoenix and Sun, mid-Edo period (left); Frans Hal, Portrait of a Woman, known as The Gypsy Girl, 1629 (middle); Fyodor Vasilyev, The Trunk of an Old Oak, 1867-1869 (right)
This prediction is reasonable since the red sun and monotonic background can be interpreted as large color chunks with sparse details, which resemble elements of abstract art. Even though the two red seals on the lower left indicate the painting’s East Asian origin, additional training and tuning is necessary before the machine can recognize those specific style patterns. The second example, Frans Hal’s The Gypsy Girl is classified as Realism with a probability of ~0.87 as opposed to Baroque. We believe this decision is also reasonable as the hard brushstrokes, which seem to be made with palette knives, on the collar and sleeves are similar to the certain Realism styles. The third work, a sketch by Fyodor Vasilyev, is classified as Impressionism art with a probability of ~0.58 as opposed to Realism. Here, the model actually catches a rather abstract and incomplete pencil sketch that was supposed to be removed from the dataset during data cleaning.
Furthermore, to assess whether the model focuses on salient parts of images during its decision generating process, we implement Gradient-weighted Class Activation Mapping (Grad-CAM) [6] to create heat maps. Figures 5-7 highlight a few correctly-classified examples from the testing dataset.
Figure 5: Leonardo da Vinci, Mona Lisa, 1503 - Correctly Predicted as High Renaissance Original, Color Heat Map, and B&W Heat Map using Grad-CAM
Figure 6: Claude Monet, Port of Dieppe, Evening, 1882 - Correctly Predicted as Impressionism Original, Color Heat Map, and B&W Heat Map using Grad-CAM
Figure 7: Pablo Picasso, Les Demoiselles d’Avignon, 1907 - Correctly Predicted as Cubism Original, Color Heat Map, and B&W Heat Map using Grad-CAM
According to the heat maps, it seems that the model is able to focus on certain important elements. For example, in Leonardo da Vinci’s well-known Mona Lisa, the model focuses on the woman’s face and upper chest (Figure 5). Regarding Claude Monet’s landscape Port of Dieppe, Evening, the model pays attention to several boats on the water, wave patterns as well as the lower skyline (Figure 6). We believe the choices made here are reasonable enough as most impressionist landscape paintings do not contain a single central object. When evaluating Pablo Picasso’s Les Demoiselles d’Avignon, the model expresses special interest in the faces and bodies of two figures on the right as well as the fruits in the foreground (Figure 7). These selected objects demonstrate clear cubic forms and mostly agree with where a human might attend to.
Figure 8: Selected Filter Visualizations from ‘conv2d_6’(3 out of the 48 filters)
Figure 9: Selected Filter Visualization from ‘conv2d_26’ (3 out of the 384 filters)
Figure 10: Selected Filter Visualization from ‘conv2d_56’ (3 out of the 160 filters)
Additionally, filter visualizations help us understand the behaviors of the network. Figure 8 shows that the network tends to focus on edges in the initial 2D convolutional layers. As the layers go deeper, the patterns on the visualizations become more abstract (Figure 9), indicating that the network can detect brushstrokes. However, visualizations of the ‘conv2d_56’ layer (Figure 10) do not feature recognizable object patterns as we expected even after 10,000 iterations. We believe the major reasons that affects the quality of filter visualisation here are the model complexity and model saturation, which are reflected in the high training accuracy (Figure 1).
Exploring connections among painting styles and artists is the primary focus of the second stage of our study. To better understand the similarities among paintings, we first implement t-distributed Stochastic Neighbor Embedding (T-SNE) [8], a nonlinear dimensionality reduction technique well-suited for embedding high-dimensional data for visualization in a low-dimensional space. By projecting the nine-class predicted probabilities into a two-dimensional space, we are able to see that the most clusters are well separated (Figure 11).
In order to improve model interpretability, we add a fully connected layer with 512 neurons before the softmax layers to the best-performing model based on Inception V3 in order to extract features of paintings. Aiming to expedite the re-training process and reuse as many weights from the first stage as possible, we freeze all the prior layers and only trained the fully connected layers. With the additional layer, the updated nine-class classification accuracy of the model changes to 87.77%, which is close to highest accuracy from stage 1 (88.35%). With this high accuracy, we believe that features extracted from the fully connected layer are representative and useful for conducting further analysis.
Figure 11: 2D T-SNE Plot on Painting Level with 9 Features
Figure 12: 2D T-SNE Plot on Painting Level with 512 Features
Interested in exploring connections among the 2412 paintings in the testing dataset on a painting level, we apply T-SNE to project the 512-dimensional features of each painting onto a two-dimensional space (Figure 12) and a three-dimensional space (Figure 13). The points on the 2D plot form a circular pattern around the center of the space. Focusing on the upper part of the plot, we are impressed to find that the model can uncover chronological trends of development for the five major styles. From Early-Renaissance moving in a counter-clockwise direction, High-Renaissance, Baroque, Realism and Impressionism are all sequentially following each other, depicting the developments of art styles across time. Overlapping points on the borders of the clusters may highlight
Figure 13: 3D T-SNE Plot on Painting Level with 512 Features
style overlap during periods of transition. Additionally, clusters of Abstract art, Pop art, and Cubism are also well separated from the Impressionism cluster, indicating gaps in time and differences in style. Yet, their proximity to each other implies commonalities among those styles emerged after the 20th century. It is also worth noting that Ukiyo-e, the only non-western style category in our study, is well separated from the rest clusters, highlighting its distinctive style.
Additionally, we analyze influence at an artist level. To generate features for each artist, we take the average of the feature values of paintings created by each artist. In other words, each artist would have 512 features averaged across their paintings. We utilize both the Euclidean distance
and cosine similarity
to estimate the similarity among artists. After comparing the results using two distance metrics, we realize that artist connections generated using cosine similarity are more plausible from an art historical perspective. One explanation could be that cosine similarity is more suitable for handling high-dimensional spaces, which enables it to better capture the similarities among artists.
Next, we perform network analysis to visualize the global connections and influences among artists. An artist A is considered connected to artist B if this pair has the maximum cosine similarity
Figure 14: Network Analysis of Artist Similarities by Style Using Cosine Distance (Index-Artist Mapping Available at Appendix A)
among all possible pairs involving A. Such connections are represented as linkages in the network graph (Figure 14). While each node on the graph represents an artist, the size of a particular node is proportional to the number of edges connecting with it. Different colors of the nodes correspond to different style classes that artists belong to.
Several linkages generated by the network are consistent with art historical explanations. From the network graph, we notice that Pablo Picasso (Index 99) is closely connected with Georges Braque (Index 92). This is reasonable since they worked closely in the early 20th century and developed Cubism together. To some extent, the style of their Cubist works were indistinguishable for years. As another example, the linkages among Duccio (Index 107), Ambrogio Lorenzetti (Index 100) and Pietro Lorenzetti (Index 119) also accord with art historical facts. Duccio and Pietro Lorenzetti is connected through Ambrogio Lorenzetti. While Pietro Lorenzetti was the younger brother of Ambrogio Lorenzetti, he was also a follower of Duccio and many believe that he also studied under Duccio. Additionally, the network also correctly discovers the connections among the French Impressionists. For example, Pierre-Auguste Renoir (Index 163) is connected with Camille Pissarro (Index 150) though Berthe Morisot (Index 149), and Mary Cassatt (Index 159) is connected to Edouard Manet (Index 154) and Edgar Degas (Index 153). This body of artists played crucial roles in the Impressionism movement and held series of independent exhibitions in the second half of the 19th century. Regarding abstract art, Paul Klee (Index 26) and Wassily Kandinsky (Index 36), who are regarded as the founding fathers of classical modernism and abstract art, are also connected on the graph. From Der Blaue Reiter avant-garde movement to Bauhaus, the two close friends critically engaged with each other’s work and explore new possibilities of virtual means.
Furthermore, certain non-obvious linkages that cannot be explained by existing studies in art history may shed light upon new directions for further exploration. For example, William Dobson (Index 87), a portraitist and one of the first notable English painters appears to be closely related to Caravaggio (Index 48), the famous Baroque artist known for his dramatic use of chiaroscuro. However, few works in the literature have explored the influence of Caravaggio on William Dobson; thus, this may be an interesting linkage that people may have overlooked. In addition, Théodore Rousseau (Index 217), a French landscape painter of the Barbizon school might have influenced Fyodor Vasilyev (Index 196), a Russian lyrical landscape painter. The hue, style, and subject matters of their paintings share a significant amount of visual similarities; yet, few people have studied possible connections between them yet.
Figure 15: Network Analysis of Artist Similarities by Styles Using Cosine Distance
Finally, in order to better uncover the influences among artists with respect to time, we also visualize the chronological artistic lineage by plotting the network graph along the timeline (Figure 15). The year corresponding with each artist is calculated by averaging production years of all her or his works in the dataset. The connections here were restricted to be unidirectional, since naturally only earlier artists can influence later ones. Certain generated linkages are also plausible from an art historical perspective. For example, Annibale Carracci is not only connected with his brother Agostino Carracci, but also with Caravaggio in Figure 16(a). Even though he and Caravaggio were rivals, they both reacted against the Mannersism and and favored a more naturalistic style. Additionally, while Giotto was a student of Cimabue, Duccio was also a follower of Cimabue and many believe that he also studied under Cimabue Figure 16(b). The three important artists all played important roles in the rise of individualism during the Early-Renaissance period.
Figure 16: Detail, Network Analysis of Artist Similarities by Styles Using Cosine Distance
Figure 17: 1850 - 1900, Primarily Realism and Impressionism
However, limitations do exist in our current approach. First, using the average of available production years may not be ideal especially when analyzing artists with their close contemporaries. In Figure 17, both Monet and Renoir were leading painters in the development of Impressionist style. Yet, since a high portion of works by Renoir in the dataset were created relatively later than Monet’s works, Renoir inaccurately appears to be a follower of Monet according to the plot. Secondly, for each artist A, we only pair it with one other artist, if their pairwise similarity is the highest among all possible pairs involving A. Thus, the results may not be able to capture the actual degree of complex interactions among artists. Nevertheless, the study does show the potential of using computational means to discover influences among artists.
In this study, we explore the potential of using convolutional neural networks (CNN) to semiautomatically classify paintings by styles and discover relationships among styles and artists. The model based on Inception V3, our best-performing algorithm reaches an exciting validation accuracy of 88.35%. Heat maps created using Grad-CAM confirm that the model successfully focuses on reasonable painting components during its decision-making process.
Moreover, we extract features from our classification model and perform network analysis looking for patterns of art style developments. From 2D and 3D T-SNE visualizations, we discover both chronological trends of development as well as distinctiveness among styles. On an artist level, our graphs based on the cosine similarity metric reveal not only certain connections consistent with the extensive studies of art history, but also non-obvious ones, which may point to new directions for further exploration by art historians. Even though there are still opportunities for further enhancements, our study has shown the great potential of using deep learning to help experts derive data-driven insights for art historical questions.
[1] Ahmed Elgammal, Marian Mazzone, Bingchen Liu, Diana Kim, and Mohamed Elhoseiny. The Shape of Art History in the Eyes of the Machine. arXiv e-prints, page arXiv:1801.07729, Jan 2018.
[2] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv e-prints, page arXiv:1409.1556, Sep 2014.
[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. arXiv e-prints, page arXiv:1512.03385, Dec 2015.
[4] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the Inception Architecture for Computer Vision. arXiv e-prints, page arXiv:1512.00567, Dec 2015.
[5] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv e-prints, page arXiv:1602.07261, Feb 2016.
[6] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradientbased Localization. arXiv e-prints, page arXiv:1610.02391, Oct 2016.
[7] WikiArt. http://wikiart.org. Accessed on 2019-04-10.
[8] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.
A Index - Artist Mapping