ShapeVis: High-dimensional Data Visualization at Scale

2020·Arxiv

ABSTRACT

ABSTRACT

We present ShapeVis, a scalable visualization technique for point cloud data inspired from topological data analysis. Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation. Much success has been reported by the data visualization technique Mapper, that discreetly approximates the Reeb graph of a filter function on the data. However, when using standard dimensionality reduction algorithms as the filter function, Mapper suffers from considerable computational cost. This makes it difficult to scale to high-dimensional data. Our proposed technique relies on finding a subset of points called landmarks along the data manifold to construct a weighted witness-graph over it. This graph captures the structural characteristics of the point cloud, and its weights are determined using a Finite Markov Chain. We further compress this graph by applying induced maps from standard community detection algorithms. Using techniques borrowed from manifold tearing, we prune and reinstate edges in the induced graph based on their modularity to summarize the shape of data. We empirically demonstrate how our technique captures the structural characteristics of real and synthetic data sets. Further, we compare our approach with Mapper using various filter functions like t-SNE, UMAP, LargeVis and show that our algorithm scales to millions of data points while preserving the quality of data visualization.

KEYWORDS

visualization, topological data analysis, manifold learning

1 INTRODUCTION

With ever-increasing amounts of data and advances in hardware to store and query such datasets, it has become critical to have scalable and robust systems for analyzing big data. Understanding and mining insights from the massive amount of data has made a significant impact in fields like marketing, business, education and healthcare. However, we continue to lack the tools and techniques to produce insight-generating visualizations of large-scale and high-dimensional data. Traditional visualization approaches like scatter plots and heat-maps have been proven to be effective only for small or intermediate sizes of data. These techniques are extremely intuitive and can be used to determine the bivariate relationships between variables. However, they require laying out the data points on a lower-dimensional space which is computationally intractable when data is large-scale and high-dimensional. Moreover, in some cases, these visualizations can lead to inconclusive results.

Dimensionality reduction is the transformation of high-dimensional data into a low dimensional representation while preserving some desirable structure of data. It is a core problem in machine learning and data mining since, generally, real-world data is found to lie on a low-dimensional manifold embedded in high-dimensional space [2]. Of late, the usage of dimensionality reduction for visualization of high-dimensional data has become common practice following the success of techniques such as PCA[50], MDS [23] t-SNE [31], tsNET [22], and UMAP[32]. These techniques are being applied in a wide range of fields and on ever-increasing sizes of datasets. Broadly, dimension reduction algorithms tend to fall into two categories. Algorithms such as PCA [50] and MDS [23] seek to preserve the distance structure within the data whereas algorithms like t-SNE [31], Isomap [38], LargeVis [44], UMAP [32] and Laplacian Eigenmaps [1] favor the preservation of local distances over global distance.

The class of techniques known as Stochastic Neighbor Embedding (SNE) [20] are considered the current state-of-the-art in dimension reduction for visualization. Intuitively, SNE [20] techniques encode local relationships in the original and embedding spaces as probability distributions. Then it minimizes the loss of information between the two distributions with respect to the locations of points in the map. SNE-based approaches have revealed many interesting structures in real-world and synthetic data [39].

However, there are issues with dimensionality reduction methods. Since these methods compress a large number of attributes down to a few, they suffer from projection losses. As a result, points well separated in high-dimensional space might appear to be in the same neighborhood in the lower-dimensional projection. In particular, due to local neighborhood preservation SNE techniques might miss structures at different sizes. Though SNE is fairly robust to changes in perplexity [31], multiple plots with different perplexities are typically needed to arrive at a conclusively useful embedding [49]. They are also known to not always produce similar output on successive runs making them hard to use. Moreover, application of SNE techniques to large datasets is problematic, as the computational complexity is usually ]. Using approximations it can be reduced to O(n log(n))[48]. Even when appropriately projected onto lower-dimensional spaces, interpreting and extracting insights can be cumbersome when there are too many data points or when the data is high-dimensional. This is largely because 2D and 3D representations do not provide any obvious means to explore and understand critical patterns and global relationships in the data.

Visualization approaches that make use of abstractions and concise representations are, therefore, essential in capturing and interpreting structural information in high-dimensional data. In this paper, we propose ShapeVis, that tries to address this issue by computing a compressed representation of large-scale and high-dimensional data in the form of a graph. One important goal of data visualization is to let the user obtain knowledge about the intrinsic structure of data, i.e. to understand how it is organized on a large scale. To this end, our proposed algorithm focuses on preserving global distances and topology. ShapeVis encodes the geometric properties of the original data by capturing small-neighborhood relationships within its nodes and topological features like connected components and loops in the graph structure. ShapeVis first finds a subset of points called landmarks along the data manifold and constructs a weighted graph over it that approximates a 1-witness complex[13] on the landmarks. A second level of landmarks are selected in such a manner that their neighborhoods partition the witness graph. The weights between these landmarks are determined by modelling the random movement of a hypothetical particle on the landmarks using a Finite Markov Chain similar to hSNE[37]. This graph is compressed by applying induced maps from standard community detection algorithms. We then prune and reinstate edges in the induced graph based on their modularity to summarize the shape of data. Given its construction, it is scalable to millions of data-points, and the final visualization graph is easy to analyze.

We perform extensive experiments on real-world, large-scale and high-dimensional data sets, including images, text (word embedding) and networks. Experimental results show that our proposed algorithm for constructing the compressed representation captures concise and intuitive visualizations. We compare our approach to Mapper[43] with standard dimensionality reduction functions like LargeVis, UMAP and tSNE as its filter function. We find ShapeVis to be much more efficient when the dataset size is huge. To summarize, we make the following contributions:

(1) We propose a visualization technique that captures the intrinsic shape of large-scale and high-dimensional data efficiently

(2) We propose a weighting scheme based on a Finite Markov Chain (FMC) built on a witness complex to encode similarities between landmarks

(3) We propose a manifold tearing procedure that captures essential loops and connectivity of the data manifold.

(4) We conduct experiments on large, real-world data sets and compare ShapeVis with Mapper on LargeVis, UMAP and tSNE, both computationally and visually.

2 RELATED WORK

Mapper. Techniques in Topological Data Analysis (TDA) compute and analyze topological features of generally high-dimensional and possibly noisy data sets. In particular, the mapper algorithm [43] is the most scalable among TDA approaches. It converts data into a graphical representation where nodes represent collections of similar data points and edges between nodes represent the existence of shared data points between the nodes. It discreetly approximates the Reeb graph[3] of a filter function on the manifold. Despite its success in several business problems, Mapper is an ad hoc technique requiring a significant amount of parameter tuning[8]. Also, Mapper only provides a global scale selection for the covers of data which often results in bad visualizations when data is of non-uniform density. Multimapper [15] tried to address this issue by a scale selection scheme sensitive to the local density of data points. Moreover, when using standard dimensionality reduction algorithms for filter function, Mapper suffers from considerable computational cost. TDA techniques like Mapper has been applied to a variety of domains namely shape analysis [10], sensor-network coverage [14], proteins [21], images [7] [29], social network analysis [9][36], computational neuroscience [25], genomics [6] [5], periodicity in time series [40], cancer [35], materials science [4] [27], financial networks [17] and more.

Landmark selection. Manifold landmarking has been widely used to find a subset of data-points that capture the structural characteristics of the underlying manifold. This helps in reducing the time and space complexity of the subsequent steps in the algorithm. So far, several landmark selection methods have been proposed. Landmarks can be selected randomly from the given data as given in [42]. Another interesting approach is the landmarking procedure described in Fast-Isomap proposed in [28] based on integer optimization. In [30], landmarks are chosen from a central part of the given data.

Witness complex. The witness complex[13] is a computationally feasible approximation of the underlying topological structure of a point cloud. It is built in reference to a subset of points, the landmarks, rather than considering all the points as in the Čech and Vietoris-Rips complexes [19]. The witness complex is a weak approximation[18] of the Delaunay triangulation[11] on the data.

Manifold tearing. Lee and Verleysen [26] introduce a manifold tearing procedure to cut out essential loops to aid in downstream dimensionality-reduction. First, a k-nearest neighbor (kNN) graph on the point cloud sample is constructed. Then, a minimum spanning tree (MST) or a shortest path tree (SPT) containing no cycles is computed on this kNN graph. Finally, edges not generating noncontractible cycles with more than 4 edges are reintroduced to form the final torn graph for downstream dimensionality reduction.

3 SHAPEVIS

Given a large-scale and high-dimensional dataset 1, 2, ..., N }, our goal is to create a compressed graphical representation G of X while preserving the intrinsic structure of the data. Below, we explain the components of our visualization technique.

3.1 Manifold landmarking

Our landmark selection procedure proceeds in two stages. In the first stage, for the given original input data with N records, points in d dimensional space, we uniformly sample M points, . We then construct an undirected, unweighted neighborhood graph , where each node , corresponding to the point , is connected to its k-nearest neighbors. Explicitly, an edge if is in the k-nearest neighborhood set of or vice versa. This graph is further augmented using the remaining points in to build a 1-witness complex. For a point be its 2-nearest neighbors from . Then, we say the point is witnessing the 1-simplex and we add an edge if not already present in the edge set

Figure 1: An example pipeline of ShapeVis. (a): Original point-cloud data. (b): Sampled landmarks in the point cloud. (c): Weighted graph GL built on it. (d): Final graph generated by community detection and manifold tearing on GL.

In the second stage, we select the landmarks inductive procedure similar to the one proposed in [41]. We start by selecting the first landmark from uniformly at random. At the i-th iteration, we mark the -neighbors of the previously selected landmarkas covered and remove them from -neighbors are termed as neighborhood set of landmark then inductively select another random point from the remaining set to beuntil all points in are marked. This algorithm ensures a selection of landmarks whose neighborhood-sets partition the graph.

3.2 Random-walk based weighting

Once we have sampled landmarks L covering the underlying manifold, we construct a weighted, undirected graphon this set using the graph to capture its topology. Let each node corresponds to the landmark . The edges and their weightsW are determined using a Finite Markov Chain to model the random movement of a hypothetical particle on the data manifold. The states are given by the landmarks. For each landmark random walks of fixed length now define

where denotes the number of random walks that started from landmark and have their endpoint in the neighborhood set of landmark . This method generates a sparse asymmetric matrix and the final weight matrix is given by the symmetric matrixis the Hadamard (or pointwise) product.

3.3 Nerve complex of the graph

Nerve of a cover[19]. An open cover of a space X is a collection of open sets such that each point in the space is in at least one of these open sets. We shall refer to the individual elements of open cover as bins. We can conceptualize covering a space as putting each element in the space in one or more of these bins. Given a cover U of a space X, the nerve N(U) is a simplicial complex constructed as follows:

• If k + 1 bins of U have a mutual non-empty intersection in X, N(U) contains a k-simplex with the corresponding nodes as its vertices A topological space is said to be contractible if it is homotopy equivalent[34] to a point. Basic examples of contractible spaces are the balls and more generally, the convex sets in . Open covers for which both elements and their intersections are contractible have the following remarkable property.

Theorem 3.1 (Nerve Theorem). [19, Corollary 4G.3] If U is an open cover of a paracompact space X such that every non-empty intersection of finitely many sets in U is contractible, then X is homotopy equivalent to the nerve N(U).

Graph obtained in Section 3.2 captures the shape of data, but we only want the higher-level homological features for insightful

Figure 2: Visualization of MNIST dataset using different approaches

Figure 3: Visualization of FMNIST dataset using different approaches

visualization. Nerve Theorem [19, Corollary 4G.3], as explained above, provides a way to capture the topological structure of a space through a covering of the space. Inspired by this, we want to find a good covering of X that captures its shape through the graph . We use community detection algorithms [12, 46] to partition the graph into well-separated communities. An induced graph is constructed on this partition whose nodes represent the communities and an edge of weight w exists between partitions if the sum of the weights of the edges between their elements is w. Using techniques from manifold tearing, we remove redundant/weak edges from the induced graph while preserving as much as possible the structural characteristics of the data manifold.

3.3.1 Community Detection on Landmark graph. Commu-

nity detection is performed on the graph obtained in Section 3.2 to obtain sets that cover the set . This cover partitions the graph such that each node belongs to only one community. Since our visualization technique is unsupervised, we use modularity-based community-detection algorithms that use network structure properties, such as edges and node degrees to find communities. We can use any of the following standard algorithms for community detection:

Louvain. [12] This method uses a greedy optimization method that maximizes the modularity of a partition of the network. The optimization is done in two phases. First, individual nodes are moved to a neighboring community that yields the largest increase in modularity. Then an induced graph is created based on the partition obtained in the first phase. Each community in this partition then becomes a node in the induced network. These two phases are repeated until the modularity cannot be increased further.

Leiden. [46] This method is similar to Louvain except for a refinement phase. Here the optimization proceeds in three phases. First, nodes are moved based on modularity optimizations. In the refinement phase, nodes are not necessarily greedily merged with the community that yields the largest increase in modularity. Instead, a node may be merged with any community for which the modularity simply increases. The community with which a node is merged is selected randomly. The aggregation phase then proceeds similarly to the one in Louvain.

Both these methods give rise to a dendrogram structure where each level is a partition of the graph nodes. At the zeroth level is the first partition, which contains the smallest communities, and the final level contains the coarsest communities. The higher the level is, the bigger are the communities. Let the induced graph at partition level

3.4 Modularity-based Manifold Tearing

The induced graph obtained in Section 3.3 is dense with spurious edges and hence does not lead to a comprehensible representation of the data. This step aims at determining a graph G = (V ; E), having the same vertices as the graph , but with a smaller edge set E, such that G represents the overall topological structure of . We introduce a two-phase tearing procedure to construct G.

In the first phase we construct a spanning sub-graph For each edge of the graph , its modularity is computed and inserted in an ordered heap of edges. We then iteratively pop from the heap and introduce the corresponding edge into if it results in increased connectivity of the graph until the graph has as many connected components as . This phase results in a sub-graph that spans the vertices of the induced graph. This procedure differs significantly from the classical manifold tearing one in the following sense. Instead of constructing a minimum spanning tree (MST) or a shortest path tree (SPT) with no cycles on the graph; our procedure constructs a spanning subgraph whose edges are chosen based on their modularity.

Once the spanning subgraph is computed, we execute the second phase of the tearing procedure. Whereas classical techniques cut out essential loops to aid in downstream dimensionality-reduction, we introduce as few loops as possible to capture the

Figure 4: Visualization of LiveJournal (Rows 1 and 2) and GoogleNews word vectors (Rows 3 and 4) using different approaches. Nodes are colored by the pseudo label whose data points are maximum in the node.

structure of the data manifold as much as possible. We initialize G with the spanning subgraph and gather the edges discarded during the first phase in a set . In this phase, our procedure only reintroduces those edges from S that generate essential loops. By essential loops we mean those cycles whose sum of edge modularities is more than or equal to c, a user-defined hyperparameter. The idea is to preserve the homological characteristics of the data manifold like connected components and loops.

Thus, the final output is the graph G = (V, E) where local coverings of data are represented as nodes and edges represent the geodesic proximity between them. Branches and loops reflect the topological features of data manifold and are useful in the interactive and unsupervised discovery of segments. In the next section, we show the visualization graph obtained through our approach on high-dimensional and large scale datasets. An example pipeline on synthetic data is shown in Fig 1.

4 EXPERIMENTS

In this section, we evaluate the effectiveness of our approach in visualizing high-dimensional and large-scale datasets. We compare our visualization approach with the following approaches:

• Mapper(t-SNE): Mapper with t-SNE filter function. • Mapper(UMAP): Mapper with UMAP filter function. • Mapper(LargeVis): Mapper with LargeVis filter function.

We choose these baselines as Mapper also returns a compressed visualization of data in the form of a graph similar to ShapeVis. Dimensionality reduction algorithms are a standard choice of filter function in Mapper to compress the data into a 2-dim space, and tSNE[31], UMAP[32], LargeVis[44] are widely used state-of-the-art dimensionality reduction algorithms. More details about Mapper algorithm can be found in [43]. In the next section, we show that visualization quality of our approach is comparable and sometimes on-par to the above-mentioned approaches while being scalable to millions of data-points.

4.1 Datasets

We use the following datasets for comparison and visualization purposes. It includes both high-dimensional and large-scale real-world datasets.

• MNIST [24]. The dataset consists of 70000 28x28images of handwritten digits (0-9). Each data-point is a 784-dimensional vector.

• F-MNIST [51]. Fashion MNIST is a dataset of 28x28images of fashion items like clothing, shoes etc. There are 10 classes and total of 70000 images.

• GoogleNews Vectors [33]. It is a dataset of 3 million words and phrases from GoogleNews dataset. Each word is embedded into a 300-dimensional vector space using word2vec [33] approach.

• LiveJournal [52]. It is a social network dataset from an online blogging community with around 4 million nodes. For common comparison with other methods, we first learn a 100-dimensional representation of each node using LINE [45] algorithm before visualizing it. Note that our algorithm can be easily modified to work directly over graphs as well.

4.2 Implementation Details

We explain our choice of hyperparameters for ShapeVis implementation. We initialize M, which is the number of points sampled for creating witness complex , as a minimum of 1 Million or N/3. N is the total number of data points. For the construction of k-nn graph we keep the number of nearest neighbours k fixed at 10 for all datasets. Note that k should be such that captures the local connectivity in the dataset and therefore a small value of k is good if the sample size is large enough. For finding k-nearest neighbors we use the nn-descent algorithm of [16]. Number of random walks is 1000 and is fixed as l/2, l where l = 50 for all datasets. We found the algorithm to be stable for various choices of greater than a threshold and chose the minimum to optimize time. For we choose partition level p as 0 and find it to work best for all datasets. Similarly, the parameter c during manifold tearing step is kept fixed at 2 (Modularity of

For Mapper, we show the best visualization obtained by tuning hyper-parameters wherever possible. For LargeVis and UMAP we use the reference implementation of [44] and [32], and for t-SNE, we use Multicore t-SNE package [47].

4.3 Qualitative Comparison

Labeled datasets. For comparing visualization quality on datasets with ground truth labels, we color each node in the visualization graph with its dominant label. Fig 2 shows the visualization obtained on MNIST dataset. We can see that ShapeVis as well as Mapper(UMAP) and Mapper(LargeVis) coherently capture the global relationship between different digits with clusters of (1,2), (3,5,8), (0,6) and (4,7,9). But ShapeVis also captures the local within-cluster relationships, e.g. the two branches of ’4’ in Fig 2 (a) corresponds to the two different ways of writing it: upside-down lower-case ’h’ and closed digit ’4’. Fig 3 shows the visualization obtained on F-MNIST dataset. Though all the visualizations show the separation between the two broad category clothing and footwear, ShapeVis captures the relationship between different classes more coherently. For example, Trouser class is connected to Dress class through a single node instead of being disconnected with the graph as is in Mapper(UMAP) and Mapper(LargeVis). Similarly Bag class is connected with T-shirt/Top as compared to Ankle-Boot class. The loop in ShapeVis visualization captures the similarity chain of the classes Dress, T-shirt, Pullover and Coat, which other approaches fail to capture. Another detail which ShapeVis captures more vividly is that images of sleeveless tees though labeled T-shirt is clubbed with Dress nodes because of its visual similarity to short dresses than the T-shirt node.

Unlabeled datasets. For LiveJournal and GoogleNews vectors, no ground truth class label is available. Therefore, for comparing the visualization quality of ShapeVis to other approaches, we assign

Table 1: Time comparison (in seconds) on different datasets of all approaches.

Figure 5: Running time (in seconds) of all approaches with increase in dataset size of points sampled from a uniform 25-dim sphere

pseudo labels to each data point and then color each with its dominant pseudo label. For assigning a pseudo label, we run Louvain community detection [12] on ShapeVis (Mapper) graph to partition it into segments, and each data point is assigned the label of the segment to which it belongs. Fig 4, rows 1 and 2, show the visualization on LiveJournal dataset when pseudo labels are assigned using segments of Mapper(UMAP) and ShapeVis, respectively. We can see in Fig 4 row 1, that the segments found through ShapeVis correspond well with the segments in Mapper(UMAP) and Mapper(LargeVis). Similarly, segments of Mapper (UMAP) align with the segments of ShapeVis and Mapper(LargeVis). This shows that visualization obtained through our approach is qualitatively similar to Mapper with UMAP or LargeVis filter functions. We do not show Mapper(t-SNE) as we were unable to run t-SNE on these datasets because of its huge time-complexity. Fig 4, rows 3 and 4, show the visualization on GoogleNews vectors. Both Mapper(UMAP) and Mapper(LargeVis) fail to bring any clear segmentation of dataset in the visualization. Moreover, we do not see any alignment between segments of different visualizations. We also compute the average cosine similarity between word-vectors belonging to the same segments for all three visualizations. For ShapeVis, Mapper(UMAP) and Mapper(LargeVis) average cosine similarity between words of a segment is 0.224, 0.186 and 0.132, respectively. Thus ShapeVis performs slightly better by this measure.

4.4 Time Comparison

We compare the running time of ShapeVis against other approaches on all the above mentioned datasets. All the results are executed on a machine with 48GB memory and 6 cores. For LiveJournal and GoogleNews vectors, we compare on 2 million and 1 million subsets respectively since UMAP returned memory overflow error on the complete dataset. Table 1 shows the running time of all approaches in seconds. We can see that ShapeVis significantly outperforms other methods for large datasets and is comparable on smaller datasets.

We further analyze the scalability of ShapeVis with dataset size by running it on random samples of points from a uniform sphere of 25-dimension. Fig 5 shows the plot of running time (in seconds) vs the number of sampled points for all approaches. It shows that as the dataset size increases ShapeVis is more and more efficient as compared to Mapper with t-SNE, UMAP or LargeVis filter functions.

5 CONCLUSION

In this paper, we proposed ShapeVis, a graph-based visualization technique that aims to preserve the topological structure of data in the form of a summarized graph. The 2-step landmark sampling in ShapeVis helps it to scale to millions of data-points with consistency. Experiments on labeled real-world datasets show that ShapeVis captures the global relationship between different labels coherently. For unlabeled datasets, the visualization of ShapeVis is qualitatively similar to existing approaches. It captures the relationship between different local neighourhoods of data in a concise manner and scales with significantly lower running time. In the future, we aim to incorporate hierarchical visualization into ShapeVis. Although we show only high level details in the current visualization, it can be easily extended to interactively explore the segments of the graph at a finer scale.

REFERENCES

[1] Mikhail Belkin and Partha Niyogi. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation 15, 6 (2003), 1373–1396.

[2] Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35, 8 (2013), 1798–1828.

[3] Silvia Biasotti, Daniela Giorgi, Michela Spagnuolo, and Bianca Falcidieno. 2008. Reeb graphs for shape analysis and applications. Theoretical computer science 392, 1-3 (2008), 5–22.

[4] Mickaël Buchet, Yasuaki Hiraoka, and Ippei Obayashi. 2018. Persistent homology and materials informatics. In Nanoinformatics. Springer, Singapore, 75–95.

[5] Pablo G Cámara. 2017. Topological methods for genomics: present and future directions. Current opinion in systems biology 1 (2017), 95–101.

[6] Pablo G Camara, Daniel IS Rosenbloom, Kevin J Emmett, Arnold J Levine, and Raul Rabadan. 2016. Topological data analysis generates high-resolution, genomewide maps of human recombination. Cell systems 3, 1 (2016), 83–94.

[7] Gunnar Carlsson, Tigran Ishkhanov, Vin De Silva, and Afra Zomorodian. 2008. On the local behavior of spaces of natural images. International journal of computer vision 76, 1 (2008), 1–12.

[8] Mathieu Carriere, Bertrand Michel, and Steve Oudot. 2018. Statistical analysis and parameter selection for Mapper. The Journal of Machine Learning Research 19, 1 (2018), 478–516.

[9] Corrie J Carstens and Kathy J Horadam. 2013. Persistent homology of collaboration networks. Mathematical problems in engineering 2013 (2013).

[10] Anne Collins, Afra Zomorodian, Gunnar Carlsson, and Leonidas J Guibas. 2004. A barcode shape descriptor for curve point cloud data. Computers & Graphics 28, 6 (2004), 881–894.

[11] Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars. 2008. Delaunay triangulations: height interpolation. Computational Geometry: Algorithms and Applications (2008), 191–218.

[12] Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, and Alessandro Provetti. 2011. Generalized louvain method for community detection in large networks. In 2011 11th International Conference on Intelligent Systems Design and Applications. IEEE, 88–93.

[13] Vin de Silva and Gunnar Carlsson. 2004. Topological estimation using witness complexes. In Proceedings of the First Eurographics conference on Point-Based Graphics. Eurographics Association, 157–166. https://doi.org/10.2312/SPBG/ SPBG04/157-166

[14] Vin de Silva and Robert Ghrist. 2007. Coverage in sensor networks via persistent homology. Algebr. Geom. Topol. 7, 1 (2007), 339–358. https://doi.org/10.2140/agt. 2007.7.339

[15] Bishal Deb, Ankita Sarkar, Nupur Kumari, Akash Rupela, Piyush Gupta, and Balaji Krishnamurthy. 2018. Multimapper: Data Density Sensitive Topological Visualization. In 2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 1054–1061.

[16] Wei Dong, Charikar Moses, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World wide web. ACM, 577–586.

[17] Marian Gidea. 2017. Topological data analysis of critical transitions in financial networks. In International Conference and School on Network Science. Springer, 47–59.

[18] Leonidas J Guibas and Steve Y Oudot. 2008. Reconstruction using witness complexes. Discrete & computational geometry 40, 3 (2008), 325–356.

[19] Allen Hatcher. 2002. Algebraic topology. Cambridge University Press, Cambridge. xii+544 pages.

[20] Geoffrey E Hinton and Sam T Roweis. 2003. Stochastic neighbor embedding. In Advances in neural information processing systems. 857–864.

[21] Violeta Kovacev-Nikolic, Peter Bubenik, Dragan Nikolić, and Giseon Heo. 2016. Using persistent homology and dynamical distances to analyze protein binding. Statistical applications in genetics and molecular biology 15, 1 (2016), 19–38.

[22] Johannes F Kruiger, Paulo E Rauber, Rafael M Martins, Andreas Kerren, Stephen Kobourov, and Alexandru C Telea. 2017. Graph Layouts by t-SNE. In Computer Graphics Forum, Vol. 36. Wiley Online Library, 283–294.

[23] Joseph B Kruskal and Myron Wish. 1978. Multidimensional scaling. Vol. 11. Sage.

[24] Yan Lecun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, and L.D. Jackel. 1989. Backpropagation applied to handwritten zip code recognition.

[25] Hyekyoung Lee, Moo K Chung, Hyejin Kang, Bung-Nyun Kim, and Dong Soo Lee. 2011. Discriminative persistent homology of brain networks. In 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE, 841– 844.

[26] John Aldo Lee and Michel Verleysen. 2005. Nonlinear dimensionality reduction of data manifolds with essential loops. Neurocomputing 67 (2005), 29–53.

[27] Yongjin Lee, Senja D Barthel, Paweł Dłotko, Seyed Mohamad Moosavi, Kathryn Hess, and Berend Smit. 2018. High-throughput screening approach for nanoporous materials genome using topological data analysis: application to zeolites. Journal of chemical theory and computation 14, 8 (2018), 4427–4437.

[28] Ying-Ke Lei, Zhu-Hong You, Tianbao Dong, Yun-Xiao Jiang, and Jun-An Yang. 2013. Increasing reliability of protein interactome by fast manifold embedding. Pattern Recognition Letters 34, 4 (2013), 372–379.

[29] David Letscher and Jason Fritts. 2007. Image segmentation using topological persistence. In International Conference on Computer Analysis of Images and Patterns. Springer, 587–595.

[30] Dong Liang, Chen Qiao, and Zongben Xu. 2015. Enhancing both efficiency and representational capability of isomap by extensive landmark selection. Mathematical Problems in Engineering 2015 (2015).

[31] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, Nov (2008), 2579–2605.

[32] Leland McInnes, John Healy, and James Melville. 2018. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018).

[33] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.

[34] James Munkres. 2014. Topology. Pearson Education.

[35] Monica Nicolau, Arnold J Levine, and Gunnar Carlsson. 2011. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences 108, 17 (2011), 7265–7270.

[36] Alice Patania, Giovanni Petri, and Francesco Vaccarino. 2017. The shape of collaborations. EPJ Data Science 6, 1 (2017), 18.

[37] Nicola Pezzotti, Thomas Höllt, B Lelieveldt, Elmar Eisemann, and Anna Vilanova. 2016. Hierarchical stochastic neighbor embedding. In Computer Graphics Forum, Vol. 35. Wiley Online Library, 21–30.

[38] Ashutosh Saxena, Abhinav Gupta, and Amitabha Mukerjee. 2004. Non-linear dimensionality reduction by locally linear isomaps. In International Conference on Neural Information Processing. Springer, 1038–1043.

[39] Michael Sedlmair, Tamara Munzner, and Melanie Tory. 2013. Empirical Guidance on Scatterplot and Dimension Reduction Technique Choices. IEEE Trans. Vis. Comput. Graph. 19, 12 (2013), 2634–2643. http://dblp.uni-trier.de/db/journals/ tvcg/tvcg19.html#SedlmairMT13

[40] Lee M Seversky, Shelby Davis, and Matthew Berger. 2016. On time-series topological data analysis: New data and opportunities. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 59–67.

[41] Hao Shi, Baoqun Yin, Yu Kang, Chao Shao, and Jie Gui. 2017. Robust l-Isomap with a novel landmark selection method. Mathematical Problems in Engineering 2017 (2017).

[42] Vin D Silva and Joshua B Tenenbaum. 2003. Global versus local methods in nonlinear dimensionality reduction. In Advances in neural information processing systems. 721–728.

[43] Gurjeet Singh, Facundo Mémoli, and Gunnar E Carlsson. 2007. Topological methods for the analysis of high dimensional data sets and 3d object recognition.. In SPBG. 91–100.

[44] Jian Tang, Jingzhou Liu, Ming Zhang, and Qiaozhu Mei. 2016. Visualizing large-scale and high-dimensional data. In Proceedings of the 25th international conference on world wide web. International World Wide Web Conferences Steering Committee, 287–297.

[45] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web. International World Wide Web Conferences Steering Committee, 1067–1077.

[46] Vincent A Traag, Ludo Waltman, and Nees Jan van Eck. 2019. From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports 9 (2019).

[47] Dmitry Ulyanov. 2016. Multicore-TSNE. https://github.com/DmitryUlyanov/ Multicore-TSNE.

[48] Laurens Van Der Maaten. 2014. Accelerating t-SNE using tree-based algorithms. The Journal of Machine Learning Research 15, 1 (2014), 3221–3245.

[49] Martin Wattenberg, Fernanda ViÃľgas, and Ian Johnson. 2016. How to Use t-SNE Effectively. Distill (2016). https://doi.org/10.23915/distill.00002

[50] Svante Wold, Kim Esbensen, and Paul Geladi. 1987. Principal component analysis. Chemometrics and intelligent laboratory systems 2, 1-3 (1987), 37–52.

[51] Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv:cs.LG/cs.LG/1708.07747

[52] Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42, 1 (2015), 181–213.