b

DiscoverSearch
About
My stuff
Region adaptive graph fourier transform for 3d point clouds
2020·arXiv
ABSTRACT
ABSTRACT

We introduce the Region Adaptive Graph Fourier Transform (RAGFT) for compression of 3D point cloud attributes. The RA-GFT is a multiresolution transform, formed by combining spatially localized block transforms. We assume the points are organized by a family of nested partitions represented by a rooted tree. At each resolution level, attributes are processed in clusters using block transforms. Each block transform produces a single approximation (DC) coefficient, and various detail (AC) coefficients. The DC coefficients are promoted up the tree to the next (lower resolution) level, where the process can be repeated until reaching the root. Since clusters may have a different numbers of points, each block transform must incorporate the relative importance of each coefficient. For this, we introduce the Q-normalized graph Laplacian, and propose using its eigenvectors as the block transform. The RA-GFT achieves better complexity-performance trade-offs than previous approaches. In particular, it outperforms the Region Adaptive Haar Transform (RAHT) by up to 2.5 dB, with a small complexity overhead.

Index Termsgraph fourier transform, 3D point cloud compression, graph signal, multiresolution transform, block transform

Driven by applications in virtual and augmented reality, remote sensing, and autonomous vehicles, it is now possible to capture, in real time and at low cost, time varying 3D scenes, public spaces with moving objects, and people. The preferred representation for such data are 3D point clouds, which consist of 1) a list of 3D point coordinates, and 2) attributes associated with those coordinates, such as color. In many applications, large point clouds need to be compressed for storage and transmission, leading to the recent development of a standard by the moving pictures expert group (MPEG) [1].

We present a new transform for attribute compression, which often takes up more than half of the overall bit budget for typical point clouds. Motivated by transforms used in image, video and point cloud compression [2, 3, 4], we construct our transform with the goal of achieving three fundamental properties: 1) orthogonality, 2) a constant basis function, and 3) low complexity. Orthogonality ensures that errors in the transform and point cloud attribute domains are equal. A constant basis function guarantees that an attribute with the same value at all points (the smoothest signal) has the most compact representation (1-sparse signal) in the transform domain. Finally, for the transform to scale to point clouds with a large number of points, N, we require a complexity of O(N log N), instead of a naive implementation that uses matrix vector products, which would require  O(N 2)complexity, if the transform is available explicitly, or  O(N 3), if it has to be obtained via eigendecomposition.

We propose the Region Adaptive Graph Fourier Transform (RAGFT), a multiresolution transform formed by combining spatially localized block transforms, where each block corresponds to a cube in 3D space. The points are organized as a set of nested partitions represented by a tree. Leaf nodes represent points in the original point cloud, while each internal node represents all points within the corresponding subtree (see Fig. 1). Resolution levels are determined by the levels of the tree, with higher resolution corresponding to the deepest level (representing single points), and the coarsest resolution corresponding to the root (representing all points). At each resolution level, attributes belonging to the same group (points that have the same parent in the tree) are passed through an orthogonal transform for decorrelation. Each block transform produces a single approximation (DC) coefficient, and several detail (AC) coefficients. The DC coefficients are promoted to a lower resolution level where the same process can be repeated until reaching the root1.

Since each internal node in the tree represents a set, possibly containing a different number of points, the block transforms should incorporate the relative importance of the nodes, based on their respective number of descendants. To address this issue, we propose a new graph Fourier transform (GFT) given by the eigenvectors of the Q-normalized graph Laplacian  LQ = Q−1/2LQ−1/2, whereQ is a diagonal matrix with diagonal entries containing the number of descendants of each node, and L is the combinatorial graph Laplacian. In contrast to the normalized or combinatorial Laplacian matrices [5], our new variation operator encodes the local geometry (distances between points) in L as well as the relative importance of a given set of points. The proposed transform is closely related to the Irregularity Aware Graph Fourier Transform (IAGFT) [6]. Our code is available at: https://github.com/STAC-USC/RA-GFT.

Multiresolution decompositions for point cloud coding have been proposed based on graph filter bank theory [7, 8, 9]. However, they often lack orthogonality, and the multiresolution representations are built through complex graph partitioning and reduction algorithms, which make them impractical for large point clouds. A Haar-like basis was proposed in [10] for any data that can be represented by a hierarchical tree. This construction is orthogonal and has a constant basis function. Although the other basis functions are spatially localized, they do not exploit local geometry information (distances between points). In addition, there is no efficient algorithm for computing transform coefficients, and matrix vector products need to be used. Also inspired by the Haar transform, [11] proposed sub-graph based filter banks, where a graph is partitioned into connected sub-graphs. For each sub-graph, a local Laplacian based GFT is used. Although [11] and the RA-GFT follow a similar strategy based on nested partitions, the graph construction, transforms, coefficient arrangement, and design goals are different.

The RA-GFT can be applied to any type of dataset as long as a nested partition is available. For 3D point clouds, a natural choice is the octree decomposition [12, 13], which can be used to implement RA-GFT for point clouds with O(N log N) complexity. This data structure has already been used to design transforms for point cloud attributes [4, 14, 15, 16, 17]. In the block based graph Fourier transform (block-GFT) [14], the voxel space is partitioned into small blocks, a graph is constructed for the points in each block, and the corresponding graph Fourier (GFT) transform is used to represent attributes in the block. Another popular approach is RAHT [4], which is a multiresolution transform formed by the composition of  2 × 2orthogonal transforms. The block-GFT can achieve excellent performance if the block size is large enough (more points per block), but this has a significant computational cost, since it requires computing GFTs of graphs with possibly hundreds of points. On the other hand, RAHT has an extremely fast implementation, with a competitive coding performance. Our proposed RA-GFT combines ideas from block-GFT and RAHT: it generalizes the block-GFT approach to multiple levels, while RAHT can be viewed as a special case of RA-GFT that is separable and uses only  2 × 2 × 2blocks. We demonstrate through point cloud attribute compression experiments that when RA-GFT is implemented with small block transforms, it can outperform RAHT by up to 2.5dB, with a small computational overhead. When the RA-GFT is implemented with larger blocks, we outperform the block-GFT with a negligible complexity overhead.

The rest of this paper is organized as follows. Section 2 introduces the RA-GFT for arbitrary datasets, while Section 3 explains how to implement it for 3D point clouds. Compression experiments and conclusions are presented in Sections 4 and 5 respectively.

image

Fig. 1: Nested partition represented by a hierarchical tree. Each level of the tree partitions the nodes in the next level. Thus at level  ℓ = 1,v11, v12, and v13 respectively represent the sets  V11 = {v1, v2, v3},V12 = {v4, v5, v6, v7}, and V13 = {v8, v9}, while at level  ℓ = 0,v01 represents the set  V01 = {v11, v12, v13}.

2.1. Notation and preliminaries

We use lowercase normal (e.g.  α, a), lowercase bold (e.g. v) and uppercase bold (e.g. A) for scalars, vectors and matrices respectively. Vectors and matrices may also be denoted using their entries as x = [xi], or A = [aij]. Let G = (V, E, W)denote a weighted undirected graph with vertex set V, edge set E and edge weight matrix W ≥ 0. An edge weight is positive, that is  wij = wji > 0 if andonly if  (i, j) ∈ E. The graph has |V| = n nodes. Let  D = diag(di),with  di = �j wij, be the n × ndegree matrix and let  L = D − Wbe the  n × ncombinatorial graph Laplacian matrix. L is symmetric positive semidefinite, and has eigendecomposition  L = ΦΛΦ⊤,where the eigenvalues matrix is  Λ = diag(λ1, · · · , λn), and theeigenvalues are  λ1 = 0 ≤ λ2, · · · , ≤ λn. For connected graphs, λ2 > 0. The eigenvector associated to  λ1 is (1/√n)1.

2.2. RA-GFT block transform

The Region-Adaptive Graph Fourier Transform (RA-GFT) is an orthonormal transform formed from the composition of smaller block transforms. We start by describing the latter. Let G = (V, E, W, Q) denote a graph as defined in Section 2.1. In addition, define the n×nnode weight matrix  Q = diag(qi) (qi > 0), the Q-normalized Laplacian  LQ, and its eigendecomposition are given by

image

where  Φis the matrix of eigenvectors of  LQ and Λ = diag(λi) isthe matrix of eigenvalues. Since  LQis symmetric and positive semi-definite,  λi ≥ 0 and Φis orthonormal. Moreover, if we order the eigenvectors by their eigenvalues, and assume a connected graph, we have  0 = λ1 < λ2 ≤ · · · ≤ λnand the first eigenvector is proportional to  φ0 = [√q1, . . . , √qn]⊤. Hence Φ⊤ maps the vector  φ0to  [√q1 + · · · + qn, 0, . . . , 0]. We define Φ⊤ to be the elementary block transform of the RA-GFT with inverse  Φ.

2.3. Relation to other transforms

Relation to RAHT. As a special case, consider the two-node graph with  V = {v1, v2, E = {(1, 2)}, edge weights  w12 = w21 = 1,and node weights  q1 > 0 and q2 > 0. Then L =� 1 −1−1 1�, andQ = diag(q1, q2), hence the Q-normalized Laplacian is

image

where  a = √q1/√q1 + q2, b = √q2/√q1 + q2, and λmax =q−11 + q−12 . The 2 × 2 matrix Φ⊤ =� a b−b a�is the RAHT butterfly [4]. Hence RAHT is the  2 × 2case of the RA-GFT.

Relation to IAGFT. Consider  Z = Q−1L, the fundamental matrix of the (L, Q)-IAGFT [6]. Clearly Z is related to  LQ by asimilarity transform [6, Remark 1]:  Z = Q−1/2LQQ1/2. ThusZ and LQhave the same set of eigenvalues,  Λ, and the matrix U of eigenvectors of Z is related to  Φ by U = Q−1/2Φ, thatis,  Z = UΛU−1 = Q−1/2(ΦΛΦ⊤)Q1/2.It can be shown [6, Thm. 1] that the columns of U are orthonormal under the Q-norm, i.e.,  U⊤QU = I. The IAGFT is defined as  F = U−1. So F =U⊤Q = Φ⊤Q1/2, or Φ⊤ = FQ−1/2. Hence the IAGFT is a block transform of the RA-GFT applied to a  Q1/2-weighted signal.

2.4. Definition of full RA-GFT

We now show how the RA-GFT block transforms are composed to form the full RA-GFT. Consider a list of N points V, a real-valued attribute signal a and node weights q on those points:

image

Here,  viis an abstract point, and can be considered a node or vertex of a discrete structure. Now suppose we are given a hierarchical partition or nested refinement of the points, as illustrated in Fig. 1. Let L be the depth of the hierarchy, and for each level  ℓfrom the root (ℓ = 0) to the leaves (ℓ = L), let vℓi be the ith of Mℓ nodes at level ℓ. At level L, we have ML = N and vLi = vi. For level  ℓ < L, letCℓi ⊆ {1, . . . , Mℓ+1}denote the indices of the children of node  vℓi

image

Fig. 2: RA-GFT for point cloud represented by tree structured partition of Figure 1. Circles represent DC coefficients that are further processed by block transforms, while squares contain RA-GFT coefficients.

and let  Dℓi ⊆ {1, . . . , N}denote the indices of the descendants of vℓi. We are also given for all  ℓ < L and i ∈ {1, . . . , Mℓ}, a graphGℓi = (Vℓi , Eℓi , Wℓi, Qℓi), where Vℓi = {vℓ+1k : k ∈ Cℓi }is the set of children of node  vℓi, Eℓi is a set of edges between the children,  Wℓiis a matrix of edge weights, and  Qℓi = diag(qℓ+1k : k ∈ Cℓi ) is thediagonal matrix of child node weights, where  qℓ+1k = �j∈Dℓ+1k qjis the sum of the weights of all  nℓ+1k = |Dℓ+1k |descendants of child vℓ+1k . Let Φ⊤ℓ,i be the RA-GFT block transform of the graph  Gℓi .

The full  N × N RA-GFT Tis a composition of  N × N or-thogonal transforms,  ˆa = Ta = T0T1 · · · TL−1a, with inverse

image

is an  N × Northonormal block diagonal matrix and block  Tℓ,i isan  nℓi × nℓi orthonormal matrix for transforming the  nℓi descendants of the  ith node vℓi at level ℓ. Specifically,

image

where  Pℓ,i is an nℓi × nℓi permutation that collects the lowpass co- efficients of the child nodes  vℓ+1k , k ∈ Cℓi for processing by the RA-GFT block transform  Φ⊤ℓ,i to produce lowpass and highpass co- efficients for parent node  vℓi. When the RA-GFT is implemented using all levels of the tree, it produces a single approximation or low pass (DC) coefficient, and  N − 1detail or high pass (AC) co-efficients. Figure 2 depicts the application of the RA-GFT for the nested partition depicted in Figure 1. It can be seen inductively that T maps the signal  [√qi]to a single lowpass (DC) coefficient equal to  (�i qi)1/2, while all other, highpass (AC), coefficients are equal to 0. Thus the first, lowpass (DC) basis function of T is proportional to  [√qi]. In the usual case when  qi = 1 for all i, the firstbasis function of T is constant, as desired. This can be verified using Figure 2. Assume the attributes  aiand weights  qiare all equal to 1. Since the block transforms at level  ℓ = 1have sizes 3, 4 and 2, the only non zero coefficients at level  ℓ = 1 are ˆa11,0 =√3,ˆa12,0 = √4 and ˆa13,0 = √2. Then at level  ℓ = 0, the weight matrix is  Q00 = diag(3, 4, 2), hence the first column of  Φ0,1 is(1√9)[√3,√4,√2]⊤. Then the only nonzero transform coefficient produced at level  ℓ = 0 is ˆa01,0 =√9. Therefore the RA-GFT of the a = 1 vector is a 1-sparse vector.

In a point cloud, the vertices  V = [vi]represent the coordinates (x, y, z) of real points in space; the attributes  a = [ai] representcolors or other attributes of the points; and the weights  q = [qi] rep-resent the relative importance of the points. The weights are usually set to be constant (qi = 1), but may be adjusted to reflect different regions of interest [18]. We assume points are voxelized. A voxel is a volumetric unit of the domain of a 3-dimensional signal, analogous to a pixel in the 2-dimensional case. Let J be a positive integer, and partition the space into  2J × 2J × 2J voxels. We say a point cloud is voxelized with depth J if all the point coordinates take values in the integer grid  {0, 1, . . . , 2J − 1}. A voxelized point cloud can be organized into a hierarchical structure. The process is described in [4, 17]. The voxel space  {0, 1, . . . , 2J − 1}3 is hierarchically partitioned into sub-blocks of size  bℓ × bℓ × bℓ, where bℓis a power of 2. These block sizes allow for a hierarchy with L levels, where �Lℓ=1 bℓ = 2J. Levels are ordered according to resolution. The par- tition is constructed by generating a point cloud for each resolution level. That is, beginning with  VL = V, we have for  ℓ < L:

image

where the unique function removes points with equal coordinates. Each point  vℓi in Vℓhas children

image

With the children we form a graph  Gℓi = (Vℓi , Eℓi , Wℓi, Qℓi), wherethere is an edge between nodes  vℓ+1j , vℓ+1k ∈ Vℓi if the distance between  vℓ+1j and vℓ+1kis less than a threshold, in which case the weight  wjkis set to a decreasing function of that distance. Using this hierarchy and set of graphs, the RA-GFT is constructed and applied to the attributes. The point hierarchies described in (6) and (7) can be obtained in O(N log N) time [4, 17]. At resolution  ℓ+1 we needto construct  Mℓ = O(N)block transforms. For constant block sizes (bℓ = O(1)), each block transform can be obtained and applied in O(1) time. Since there are L = O(log N) resolution levels, and each level is O(N), the RA-GFT has complexity O(N log N).

In this section, we evaluate the RA-GFT in compression of color at- tributes of the “8iVFBv2” point cloud dataset2 [19], which consists of four sequences: “longdress”, “redandblack”, “loot” and “soldier”, and compare its performance to block-GFT [14] and RAHT [4]. Colors are transformed from RGB to YUV spaces, and each of the Y, U, and V components are processed independently by the transform. For all transforms, we perform uniform quantization and entropy code the coefficients using the adaptive run-length Golomb-Rice algorithm [20]. Distortion for the Y component is given by

image

image

Table 1: Complexity estimate of the RA-GFT and block-GFT as a function of the block size.

image

Fig. 3: Distortion rate curves for color compression.

where T = 300 is the number of frames in the sequence,  Nt is thenumber of points in the t-th frame, and  Yt and ˆYtrepresent the original and decoded signals of frame t. Rate is reported in bits per voxel [bpv]  R =��Tt=1 Bt�/��Tt=1 Nt�, where Btis the number of bits used to encode the YUV components of the t-th frame.

4.1. Compression of color attributes

Each point cloud in the 8iVFB dataset is voxelized with depth J = 10. Therefore  L ≤ 10and its value will depend on the block sizes. We implement several RA-GFTs, each with a different block size at the highest resolution (level L), but with the same block sizes  bℓ = 2for all other resolutions (ℓ < L). When bLis equal to 2, 4, 8, and 16, L is equal to 10, 9, 8 and 7 respectively. For the block-GFT we choose block sizes b = 8 and b = 16. Graphs are constructed by adding edges if the distance between a pair of point coordinates is below a fixed threshold, while edge weights are the reciprocal of the distance, as in [14]. Distortion rate curves are shown in Figure 3.

The RA-GFT provides substantial gains over RAHT. When the block size is smallest  bL = 2, the corresponding RA-GFT outperforms RAHT up to 0.5db for the “longdress” sequence, and up to 1db for the “loot” sequence. Similar results were obtained for other sequences, not shown due to lack of space. Coding performance improves as the block size  bLincreases, up to 2.5 dB over RAHT on both sequences. The block-GFT also has this property, however, it requries a large block size (b = 16) to consistently outperform the RAHT for all sequences. This could occur because for smaller blocks, the block-GFT DC coefficients may still be highly correlated. Since the RA-GFT with small blocks can be viewed as an extension of the block-GFT to multiple levels, the transform coeffi-cients of the proposed transform are less correlated.

4.2. Complexity analysis

At each level of RA-GFT, multiple GFTs of different sizes are constructed, so that the overall computational complexity is dominated by the number and size of those transforms. At resolution level  ℓ,the ith transform is a  |Vℓi |×|Vℓi |matrix, which is obtained by eigen- decomposition in roughly  |Vℓi |3operations. As a proxy for the total number of operations required by the RA-GFT we use

image

Eq. (8) is consistent with Section 3, since  |Vℓi |3 = O(1), Mℓ =O(N) and L ≤ log(N). We consider a collection of T point clouds. The t-th point cloud has  Ntpoints, and the quantity (8) for the RAGFT on that point cloud is denoted by  Kt. The complexity proxy for the RA-GFT with a given tree structured nested partition is defined as  C = (�Tt=1 Kt)/(�Tt=1 Nt). We compute this quantity for the RA-GFT and block-GFT depicted in Figure 3. Our results are shown in Table 1, and are computed from the first 10 point clouds of each sequence of the “8iVFB” dataset, for a total of T = 40 point clouds. For  bL = 8, the increase in complexity from the block-GFT to the RA-GFT is only 2.5%; for  bl = 16the increase is negligible. More importantly, for smaller blocks, the complexity of the RA-GFT is orders of magnitude lower than of block-GFT.

By allowing multiple block sizes, and multiple levels of resolution, the proposed RA-GFT can be viewed as a generalization of the block-GFT and the RAHT, reaching coding efficiency comparable to block-GFT, with computational complexity slightly higher than RAHT. Thefore, the RA-GFT achieves better performancecomplexity trade-offs than these transforms. By using a nonseparable transform on larger blocks the RA-GFT can exploit local geometry more efficiently than the RAHT. By applying transforms with small blocks at multiple resolutions, the RA-GFT can approach the performance of the block-GFT with a reduced complexity. In addition, for large transform blocks at resolution L, the RA-GFT outperforms the block-GFT, with a negligible complexity increase.

[1] S. Schwarz, M. Preda, V. Baroncini, M. Budagavi, P. Ce- sar, P.A. Chou, R.A. Cohen, M. Krivoku´ca, S. Lasserre, and Z. Li, “Emerging MPEG standards for point cloud compression,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 1, pp. 133–148, 2018.

[2] M. Vetterli and J. Kovacevic, Wavelet and Subband Coding, Englewood Cliffs, NY: Prentice-Hall, 1995.

[3] N. Ahmed, T. Natarajan, and K.R. Rao, “Discrete cosine trans- form,” IEEE transactions on Computers, vol. 100, no. 1, pp. 90–93, 1974.

[4] R.L. De Queiroz and P.A. Chou, “Compression of 3D point clouds using a region-adaptive hierarchical transform,” IEEE Transactions on Image Processing, vol. 25, no. 8, pp. 3947– 3956, 2016.

[5] D.I Shuman, S.K. Narang, P. Frossard, A Ortega, and P. Van- dergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” Signal Processing Magazine, IEEE, vol. 30, no. 3, pp. 83–98, May 2013.

[6] B. Girault, A. Ortega, and S.S. Narayanan, “Irregularity-aware graph fourier transforms,” IEEE Transactions on Signal Processing, vol. 66, no. 21, pp. 5746–5761, 2018.

[7] A. Anis, P.A. Chou, and A. Ortega, “Compression of dynamic 3D point clouds using subdivisional meshes and graph wavelet transforms,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 6360–6364.

[8] D. Thanou, P.A. Chou, and P. Frossard, “Graph-based com- pression of dynamic 3D point cloud sequences,” IEEE Transactions on Image Processing, vol. 25, no. 4, pp. 1765–1778, 2016.

[9] P.A. Chou, M. Koroteev, and M. Krivoku´ca, “A volumetric approach to point cloud compression, part I: Attribute compression,” IEEE Transactions on Image Processing, 2019.

[10] M. Gavish, B. Nadler, and R.R. Coifman, “Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, 2010, pp. 367–374.

[11] N. Tremblay and P. Borgnat, “Subgraph-based filterbanks for graph signals,” IEEE Transactions on Signal Processing, vol. 64, no. 15, pp. 3827–3840, 2016.

[12] C.L. Jackins and S. L. Tanimoto, “Oct-trees and their use in representing three-dimensional objects,” Computer Graphics and Image Processing, vol. 14, no. 3, pp. 249–270, 1980.

[13] D. Meagher, “Geometric modeling using octree encoding,” Computer graphics and image processing, vol. 19, no. 2, pp. 129–147, 1982.

[14] C. Zhang, D. Florencio, and C. Loop, “Point cloud attribute compression with graph transform,” in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 2066–2070.

[15] R.A. Cohen, D. Tian, and A. Vetro, “Attribute compression for sparse point clouds using graph transforms,” in 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016, pp. 1374–1378.

[16] R.A. Cohen, D. Tian, and A. Vetro, “Point cloud attribute com- pression using 3-D intra prediction and shape-adaptive transforms,” in 2016 Data Compression Conference (DCC). IEEE, 2016, pp. 141–150.

[17] E. Pavez, P.A. Chou, R.L. De Queiroz, and A. Ortega, “Dy- namic polygon clouds: representation and compression for VR/AR,” APSIPA Transactions on Signal and Information Processing, vol. 7, 2018.

[18] G. P. Sandri, V. Figueiredo, P. A. Chou, and R. L. de Queiroz, “Point cloud coding incorporating region of interest coding,” in 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019.

[19] E. dEon, B. Harrison, T. Myers, and P.A. Chou, “8i voxelized full bodies-a voxelized point cloud dataset,” ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006, 2017.

[20] H.S. Malvar, “Adaptive run-length/Golomb-Rice encoding of quantized generalized gaussian sources with unknown statistics,” in Data Compression Conference (DCC’06). IEEE, 2006, pp. 23–32.


Designed for Accessibility and to further Open Science