Deep HyperNetwork-Based MIMO Detection

2020·arXiv

Abstract

Optimal symbol detection for multiple-input multiple-output (MIMO) systems is known to be an NP-hard problem. Conventional heuristic algorithms are either too complex to be practical or suffer from poor performance. Recently, several approaches have tried to address these challenges by implementing the detector as a deep neural network. However, they either still achieve unsatisfactory performance on practical spatially correlated channels, or are computationally demanding since they require retraining for each channel realization. In this work, we address both issues by training an additional neural network (NN), referred to as the hypernetwork, which takes as input the channel matrix and generates the weights of the NN-based detector. Results show that the proposed approach achieves near state-of-the-art performance without the need for retraining.

Index Terms—MIMO Detection, Deep Learning, Hypernetworks, spatial channel correlation

I. INTRODUCTION

To keep up with ever-increasing mobile user traffic, cellular communication systems have been driven by continuous innovation since the introduction of the first generation in 1979. Attention is now turning from the fifth to the sixth generation, which some predict should be able to deliver data rates of up to 1 Tb/s with high energy efficiency [1]. A key enabler is to serve multiple single-antenna users on the same time-frequency resource using a base station (BS) equipped with a large number of antennas. However, optimal detection in such multiple-input multiple-output (MIMO) systems is known to be NP-hard [2], and approaches introduced in recent years suffer from unsatisfactory performance or become impractical when the number of antennas or users is large. Examples of recent approaches include the iterative algorithm AMP [3] and its extension to correlated channels, OAMP [4].

Recently, advances in MIMO detection have been made by using machine learning (ML) in conjunction with or in place of standard algorithms [5], [6]. A promising approach is to add trainable parameters to traditional iterative algorithms and interpret the whole structure as a neural network (NN) [7]. However, these schemes still suffer either from a performance drop on correlated channels or from high complexity. One of these approaches is the recently proposed MMNet [8], which achieves state-of-the-art performance on correlated channels. However, it needs to be retrained for each channel realization, which makes its practical implementation challenging.

In this work, we alleviate this issue by leveraging the emerging idea of hypernetworks [9], [10]. Applied to our setup, this consists of having a secondary NN, referred to as the hypernetwork, that generates for a given channel matrix an optimized set of weights for an NN-based detector. This scheme, which we refer to as HyperMIMO, is illustrated in Fig. 1. Used with the MMNet detector from [8], HyperMIMO replaces the training procedure that would be required for each channel realization by a single inference of the hypernetwork.

Fig. 1. HyperMIMO: A hypernetwork generates the parameters of an NN-based detector (MMNet [8] in this work)

We have evaluated the proposed approach using exhaustive simulations on spatially correlated channels. Our results show that HyperMIMO achieves a performance close to that of MMNet trained for each channel realization, and outperforms the recently proposed OAMPNet [7]. Our results also reveal that HyperMIMO is robust to user mobility up to a certain point, which is encouraging for practical use.

Notations: Matrices and column vectors are denoted by bold upper- and lower-case letters, respectively. $x_i$ is the $i$-th element of the vector $\mathbf{x}$, and $X_{i,j}$ the $(i,j)$-th element of the matrix $\mathbf{X}$. $\operatorname{diag}(\mathbf{x})$ is the diagonal matrix composed of the elements of $\mathbf{x}$, and $\mathbf{I}$ the identity matrix. $\|\mathbf{X}\|_F$ is the Frobenius norm of $\mathbf{X}$, and $\mathbf{X}^H$ its conjugate transpose.

II. BACKGROUND

A. Problem formulation

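We consider a conventional MIMO uplink channel. We denote by $N_u$ the number of single-antenna users that aim to reliably transmit symbols from a constellation $\mathcal{X}$ to a BS equipped with $N_r$ antennas. The channel transfer function is

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \tag{1}$$

where $\mathbf{x} \in \mathcal{X}^{N_u}$ is the vector of transmitted symbols, $\mathbf{y} \in \mathbb{C}^{N_r}$ is the vector of received distorted symbols, $\mathbf{H} \in \mathbb{C}^{N_r \times N_u}$ is the channel matrix, and $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$ is the independent and identically distributed (i.i.d.) complex Gaussian noise with power $\sigma^2$ in each complex dimension. It is assumed that $\mathbf{H}$ and $\sigma$ are perfectly known to the receiver. The optimal receiver would implement the maximum likelihood detector

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathcal{X}^{N_u}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2. \tag{2}$$

Unfortunately, solving (2) is known to be an NP-hard problem due to the finite alphabet constraint [2]. One well-known scheme is the linear minimum mean squared error (LMMSE) estimator, which aims to minimize the mean squared error (MSE)

$$\mathbb{E}\left[\|\hat{\mathbf{x}} - \mathbf{x}\|_2^2\right] \tag{3}$$

over estimators $\hat{\mathbf{x}}$ that are linear in $\mathbf{y}$. For uncorrelated unit-energy symbols, the minimizer is

$$\hat{\mathbf{x}} = \left(\mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}\right)^{-1}\mathbf{H}^H\mathbf{y}. \tag{4}$$

Because the transmitted symbols are known to belong to the finite alphabet $\mathcal{X}$, the closest symbol is typically selected for each user:

$$\tilde{x}_i = \arg\min_{x \in \mathcal{X}} |\hat{x}_i - x|, \quad i = 1, \dots, N_u. \tag{5}$$

Although sub-optimal, this approach has the benefit of being computationally tractable. Multiple schemes have been proposed to achieve a better performance-complexity trade-off, among which ML-based algorithms form a particularly promising lead.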
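As an illustration of the LMMSE estimate followed by the per-user nearest-symbol decision described above, the following is a minimal NumPy sketch. The function name `lmmse_detect`, the unit-energy QPSK constellation, and the array sizes are assumptions made for this example only.

```python
import numpy as np

# QPSK constellation with unit average energy (an assumption of this sketch).
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def lmmse_detect(y, H, sigma2, constellation=QPSK):
    """LMMSE estimate followed by a per-user nearest-symbol decision."""
    n_users = H.shape[1]
    # Solve (H^H H + sigma^2 I) x_soft = H^H y instead of forming the inverse.
    gram = H.conj().T @ H + sigma2 * np.eye(n_users)
    x_soft = np.linalg.solve(gram, H.conj().T @ y)
    # Hard decision: closest constellation point for each user.
    idx = np.argmin(np.abs(x_soft[:, None] - constellation[None, :]), axis=1)
    return constellation[idx], x_soft

rng = np.random.default_rng(0)
n_rx, n_users, sigma2 = 8, 4, 1e-3   # illustrative sizes, chosen arbitrarily
H = (rng.standard_normal((n_rx, n_users))
     + 1j * rng.standard_normal((n_rx, n_users))) / np.sqrt(2)
x = QPSK[rng.integers(0, 4, n_users)]
# Complex noise with variance sigma2 per complex dimension.
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))
x_hat, x_soft = lmmse_detect(H @ x + n, H, sigma2)
```

At this low noise level the hard decisions `x_hat` coincide with the transmitted symbols `x`; the soft output `x_soft` is what an iterative detector would refine further.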

B. Machine learning-based MIMO detectors

ML has been leveraged to perform MIMO detection in multiple ways. In [5], Chaudhari et al. used an NN to select a traditional detection algorithm from a predefined set. The algorithm with lowest complexity that enables a block error rate (BLER) lower than a predefined threshold is chosen.

Another technique is to design an NN that performs the detection. One example is DetNet [6], which can be viewed as an unfolded recurrent neural network (RNN). Although it achieves encouraging results on Rayleigh channels, DetNet's performance on correlated channels is not satisfactory and it suffers from prohibitive complexity. In [11], Mohammad et al. partially addressed this drawback through weight pruning.

A promising approach is to enhance existing schemes by adding trainable parameters. Traditional iterative algorithms are particularly suitable since they can be viewed as NNs once unfolded. Typically, each iteration aims to further reduce the MSE and comprises a linear step followed by a non-linear denoising step. The estimate $\hat{\mathbf{x}}^{(t+1)}$ at the $(t+1)$-th iteration is

$$\mathbf{z}^{(t)} = \hat{\mathbf{x}}^{(t)} + \mathbf{A}^{(t)}\left(\mathbf{y} - \mathbf{H}\hat{\mathbf{x}}^{(t)}\right) + \mathbf{b}^{(t)}, \qquad \hat{\mathbf{x}}^{(t+1)} = \eta^{(t)}\left(\mathbf{z}^{(t)}, \tau^{(t)}\right) \tag{6}$$

where the superscript $(t)$ is used to refer to the $t$-th iteration and $\hat{\mathbf{x}}^{(0)}$ is set to $\mathbf{0}$. $\tau^{(t)}$ denotes the estimated variance of the components of the noise vector $\mathbf{z}^{(t)} - \mathbf{x}$ at the input of the denoiser, which is assumed to be i.i.d. Iterative algorithms differ by their choices of matrices $\mathbf{A}^{(t)}$, bias vectors $\mathbf{b}^{(t)}$, and denoising functions $\eta^{(t)}(\cdot)$. A limitation of most detection schemes is their poor performance on correlated channels. OAMP [4] mitigates this issue by constraining both the linear step and the denoiser. OAMPNet [7] improves the performance of OAMP by adding two trainable parameters per iteration, which respectively scale the matrix $\mathbf{A}^{(t)}$ and the channel noise variance $\sigma^2$. MMNet [8] goes one step further by making all matrices $\mathbf{A}^{(t)}$ trainable and by relaxing the constraint on $\mathbf{z}^{(t)} - \mathbf{x}$ being identically distributed. Although MMNet achieves state-of-the-art performance on spatially correlated channels, it needs to be retrained for each channel matrix, which makes it impractical.

C. Hypernetworks

Hypernetworks were introduced in [12] as NNs that generate the parameters of other NNs. The concept was first used in [9] in the context of image recognition. The goal was to predict the parameters of an NN given a new sample so that it could recognize other objects of the same class without the need for training. More recently, this same idea was leveraged to generate images of talking heads [10]. In this latter work, a single picture of a person is fed to a hypernetwork that computes the weights of a second NN. This second NN then generates realistic images of the same person with different facial expressions. Motivated by these recent achievements, we propose in this work to use hypernetworks to remove the need for MMNet to be retrained for each channel realization.

III. HYPERMIMO

The key idea of this work is to replace the training process required by MMNet for each channel realization by a single inference through a trained hypernetwork. This section first presents a variation of MMNet which reduces its number of parameters. The second part of this section introduces the architecture of the hypernetwork, where a relaxed form of weight sharing is used to decrease its output dimension. Both reducing the number of parameters of MMNet and weight sharing in the hypernetwork are crucial to obtain a system of reasonable complexity. The combination of the hypernetwork and MMNet forms the HyperMIMO system shown in Fig. 1.

A. MMNet with fewer parameters

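Fig. 2. Detailed architecture of HyperMIMO

To reduce the number of parameters of MMNet, we leverage the QR-decomposition of the channel matrix, $\mathbf{H} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is an $N_r \times N_r$ unitary matrix and $\mathbf{R}$ an $N_r \times N_u$ upper triangular matrix. It is assumed that $N_r > N_u$, and therefore

$$\mathbf{Q} = \begin{bmatrix} \mathbf{Q}_A & \mathbf{Q}_B \end{bmatrix}, \qquad \mathbf{R} = \begin{bmatrix} \mathbf{R}_A \\ \mathbf{0} \end{bmatrix}$$

where $\mathbf{R}_A$ has size $N_u \times N_u$ and $\mathbf{Q}_A$ has size $N_r \times N_u$. We define $\bar{\mathbf{y}} := \mathbf{Q}_A^H \mathbf{y}$ and $\bar{\mathbf{n}} := \mathbf{Q}_A^H \mathbf{n}$, and rewrite (1) as

$$\bar{\mathbf{y}} = \mathbf{R}_A \mathbf{x} + \bar{\mathbf{n}}. \tag{7}$$

Note that $\bar{\mathbf{n}} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$. MMNet sets $\mathbf{b}^{(t)}$ to $\mathbf{0}$ for all $t$ and uses the same denoiser $\eta$ for all iterations, which are defined by

$$\mathbf{z}^{(t)} = \hat{\mathbf{x}}^{(t)} + \mathbf{\Theta}^{(t)}\left(\bar{\mathbf{y}} - \mathbf{R}_A\hat{\mathbf{x}}^{(t)}\right), \qquad \hat{\mathbf{x}}^{(t+1)} = \eta\left(\mathbf{z}^{(t)}, \boldsymbol{\tau}^{(t)}\right) \tag{8}$$

where $\mathbf{\Theta}^{(t)}$ is an $N_u \times N_u$ complex matrix whose components need to be optimized for each channel realization. The main benefit of leveraging the QR-decomposition is that the dimension of the matrices to be optimized is $N_u \times N_u$ instead of $N_u \times N_r$, which is the dimension of $\mathbf{A}^{(t)}$ in (6). This is significant since the number of active users is typically much smaller than the number of antennas of the BS.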
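The dimensionality reduction obtained from the QR-decomposition can be checked numerically. The sketch below uses NumPy's reduced QR factorization on a randomly drawn channel; the sizes, the i.i.d. channel, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rx, n_users, sigma2 = 8, 4, 0.1   # illustrative sizes, chosen arbitrarily

H = (rng.standard_normal((n_rx, n_users))
     + 1j * rng.standard_normal((n_rx, n_users))) / np.sqrt(2)
x = (rng.choice([-1.0, 1.0], n_users) + 1j * rng.choice([-1.0, 1.0], n_users)) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))
y = H @ x + n

# Reduced QR: Q_A is n_rx x n_users with orthonormal columns,
# R_A is n_users x n_users and upper triangular.
Q_A, R_A = np.linalg.qr(H, mode="reduced")

y_bar = Q_A.conj().T @ y   # n_users entries instead of n_rx
n_bar = Q_A.conj().T @ n   # same per-entry variance as n because Q_A^H Q_A = I

# The reduced model y_bar = R_A x + n_bar holds up to round-off.
residual = y_bar - (R_A @ x + n_bar)
```

A matrix acting on the reduced observation `y_bar` has `n_users * n_users` entries, whereas one acting on `y` would need `n_users * n_rx`.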

The noise at the input of the denoiser is assumed to be independent but not identically distributed in MMNet. The vector of estimated variances at the $t$-th iteration is denoted by $\boldsymbol{\tau}^{(t)}$ and computed by

(9) where , and needs to be optimized for each channel realization. Further details on the origin of this equation can be found in [4]. The denoising function in MMNet is the same for all iterations, and is chosen to minimize the MSE assuming the noise is independent and Gaussian distributed. This is achieved by applying element-wisely to

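$\mathbf{z}^{(t)}$ and $\boldsymbol{\tau}^{(t)}$ the function

$$\eta(z, \tau) = \frac{1}{Z} \sum_{x \in \mathcal{X}} x \exp\left(-\frac{|z - x|^2}{\tau}\right) \tag{10}$$

where $Z = \sum_{x \in \mathcal{X}} \exp\left(-\frac{|z - x|^2}{\tau}\right)$. MMNet consists of $T$ layers performing (8), and a hard decision as in (5) to predict the final estimate $\hat{\mathbf{x}}$. One could also use $\hat{\mathbf{x}}^{(T)}$ to predict bitwise log-likelihood ratios (LLRs).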
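To make the layer structure concrete, the sketch below implements one linear step followed by an element-wise Gaussian posterior-mean denoiser. It is an illustration under assumptions rather than the reference implementation: the variance vector `tau_t` is passed in directly instead of being computed from the variance estimate, and the choice `Theta_t = inv(R_A)` in the demo is a hypothetical stand-in for trained weights.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def gaussian_denoiser(z, tau, constellation=QPSK):
    """Element-wise posterior mean of x given z = x + CN(0, tau) noise."""
    # Squared distances to every constellation point, shape (n_users, |X|).
    d2 = np.abs(z[:, None] - constellation[None, :]) ** 2
    logits = -d2 / tau[:, None]
    # Subtract the row maximum before exponentiating for numerical stability.
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)   # division by the normalizer Z
    return w @ constellation

def mmnet_layer(x_hat, y_bar, R_A, Theta_t, tau_t):
    """One iteration: linear step with Theta_t, then the denoiser."""
    z = x_hat + Theta_t @ (y_bar - R_A @ x_hat)
    return gaussian_denoiser(z, tau_t)

rng = np.random.default_rng(0)
n_users = 4
R_A = np.triu(rng.standard_normal((n_users, n_users))
              + 1j * rng.standard_normal((n_users, n_users)))
R_A += 3 * np.eye(n_users)   # keep the toy system well conditioned
x = QPSK[rng.integers(0, 4, n_users)]
y_bar = R_A @ x              # noiseless observation for the demo
x1 = mmnet_layer(np.zeros(n_users, dtype=complex), y_bar, R_A,
                 np.linalg.inv(R_A), np.full(n_users, 0.05))
```

With a small `tau_t` the denoiser behaves almost like a hard decision, while a very large `tau_t` pulls the output toward the constellation mean, which is zero for QPSK.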

B. HyperMIMO architecture

Fig. 2 shows in detail the architecture of HyperMIMO. As our variant of MMNet operates on $\bar{\mathbf{y}}$, the hypernetwork is fed with $\mathbf{R}_A$ and the channel noise standard deviation $\sigma$. Note that because $\mathbf{R}_A$ is upper triangular, only its non-zero elements need to be fed to the hypernetwork. Moreover, using this matrix as input instead of $\mathbf{H}$ has been found to be critical to achieve high performance. As detailed previously, the number of parameters that need to be optimized in MMNet was reduced by leveraging the QR-decomposition. To further decrease the number of outputs of the hypernetwork, we adopt a relaxed form of weight sharing inspired by [9]. Instead of computing the elements of each $\mathbf{\Theta}^{(t)}$, the hypernetwork outputs a single matrix $\mathbf{\Theta}$ as well as $T$ vectors $\boldsymbol{\theta}^{(t)}$. For each iteration $t$, $\mathbf{\Theta}^{(t)}$ is computed by

The idea is that all matrices $\mathbf{\Theta}^{(t)}$ differ by a per-column scaling that is different for each iteration. We have experimentally observed that scaling the rows instead leads to worse performance.
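A per-column scaling of a shared matrix can be sketched as follows. This example assumes the simplest form, a right-multiplication of the shared matrix by a diagonal matrix built from the per-iteration vector; whether the original scheme adds an offset or normalization to the scaling is not recoverable from the text, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, T = 4, 3   # illustrative sizes, chosen arbitrarily

# Shared complex matrix and one real scaling vector per iteration.
Theta = (rng.standard_normal((n_users, n_users))
         + 1j * rng.standard_normal((n_users, n_users)))
theta = rng.standard_normal((T, n_users))

# Right-multiplying by a diagonal matrix scales columns; broadcasting does the
# same without building diag(theta[t]) explicitly.
Theta_per_iter = Theta[None, :, :] * theta[:, None, :]   # (T, n_users, n_users)

# Number of generated quantities with and without sharing
# (each complex matrix entry is counted once).
shared = n_users**2 + T * n_users
unshared = T * n_users**2
```

Scaling rows instead would correspond to `theta[:, :, None]` in the broadcast, i.e., a left-multiplication by the diagonal matrix.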

Because $\mathbf{R}_A$ is complex-valued, an R2C layer maps the complex elements of $\mathbf{R}_A$ to real ones by concatenating the real and imaginary parts of the complex scalar elements. To generate a complex-valued matrix $\mathbf{\Theta}$, a C2R layer does the reverse operation of R2C.

The hypernetwork also needs to compute the values of the $T$ vectors $\boldsymbol{\psi}^{(t)}$ used in (9). Because the elements of these vectors must be positive, a small constant is added and an absolute-value activation function is used in the last layer, as shown in Fig. 2.
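The two conversion layers and the positivity constraint can be sketched in a few lines. The function names, the ordering of real and imaginary parts, and the value of the small constant `eps` are assumptions made for illustration.

```python
import numpy as np

def complex_to_real(v):
    """Concatenate real and imaginary parts: n complex values -> 2n real values."""
    return np.concatenate([v.real, v.imag])

def real_to_complex(r):
    """Inverse mapping: first half is the real part, second half the imaginary part."""
    half = r.shape[0] // 2
    return r[:half] + 1j * r[half:]

def positive_output(a, eps=1e-3):
    """Absolute value plus a small constant; eps is a hypothetical value."""
    return np.abs(a) + eps

n_users = 4
R_A = np.triu(np.arange(1, 17).reshape(n_users, n_users) * (1 + 0.5j))
# Only the upper-triangular entries carry information.
upper = R_A[np.triu_indices(n_users)]
features = complex_to_real(upper)
```

For `n_users = 4` the upper triangle holds 10 complex entries, so `features` has 20 real entries before the noise standard deviation is appended.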

HyperMIMO, which comprises the hypernetwork and MMNet, is trained by minimizing the MSE

$$\mathcal{L} = \mathbb{E}\left[\left\|\hat{\mathbf{x}}^{(T)} - \mathbf{x}\right\|_2^2\right].$$

Note that this loss differs from the one of [8], which is

$$\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[\left\|\hat{\mathbf{x}}^{(t)} - \mathbf{x}\right\|_2^2\right].$$

When training HyperMIMO, the hypernetwork and MMNet form a single NN, such that the outputs of the hypernetwork are the weights of MMNet. The only trainable parameters are therefore those of the hypernetwork. When performing gradient descent, their gradients are backpropagated through the parameters of MMNet.
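The difference between a loss on the final estimate only and a loss averaged over all layers can be illustrated numerically. The sketch assumes that the loss of [8] averages the per-layer squared errors; the function names and the array layout `(T, batch, n_users)` are choices made for this example.

```python
import numpy as np

def final_layer_mse(x_hat_layers, x):
    """Squared error of the last layer only, averaged over the batch."""
    err = np.abs(x_hat_layers[-1] - x) ** 2       # (batch, n_users)
    return err.sum(axis=-1).mean()

def all_layers_mse(x_hat_layers, x):
    """Squared error averaged over every layer and over the batch."""
    err = np.abs(x_hat_layers - x[None]) ** 2     # (T, batch, n_users)
    return err.sum(axis=-1).mean()

# Toy example: two layers, batch of 2, 3 users, errors shrinking with depth.
x = np.zeros((2, 3), dtype=complex)
layers = np.stack([x + 1.0, x + 0.5])
loss_final = final_layer_mse(layers, x)
loss_all = all_layers_mse(layers, x)
```

Here `loss_final` only sees the error of 0.5 per user at the last layer, whereas `loss_all` also penalizes the larger error of the first layer.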

IV. EXPERIMENTS

HyperMIMO was evaluated by simulations. This section starts by introducing the considered spatially correlated channel model. Next, details on the simulation setting and training process are provided. Finally, the obtained results are presented and discussed.

A. Channel model

The local scattering model with spatial correlation presented in [13, Ch. 2.6] and illustrated in Fig. 3 is considered. The BS is assumed to be equipped with a uniform linear array of $N_r$ antennas, located at the center of a 120° cell sector in which $N_u$ single-antenna users are dropped with random nominal angles $\varphi_u$. Perfect power allocation is assumed, leading to all users appearing to be at the same distance $r$ from the BS and an average gain of one. The BS is assumed to be elevated enough to have no scatterers in its near field, such that the scattering is only located around the users. Given a user $u$, the multipath components reach the BS with normally distributed angles with mean $\varphi_u$ and variance $\sigma_\varphi^2$.

Fig. 3. Considered channel model. The BS has no scatterers in its near field, and scattering is only located near users.

Fig. 4. Ten randomly generated user drops

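For small enough $\sigma_\varphi$, a valid approximation of the channel covariance matrix is $\mathbf{C}_u$ with components

$$[\mathbf{C}_u]_{l,m} = e^{2\pi j d (l-m) \sin(\varphi_u)} \, e^{-\frac{\sigma_\varphi^2}{2} \left(2\pi d (l-m) \cos(\varphi_u)\right)^2}$$

where $d$ is the antenna spacing measured in multiples of the wavelength. For a given user $u$, a random channel vector $\mathbf{h}_u$ is sampled by computing

$$\mathbf{h}_u = \mathbf{U}_u \mathbf{\Lambda}_u^{1/2} \mathbf{e}$$

where $\mathbf{e}$ is sampled from $\mathcal{CN}(\mathbf{0}, \mathbf{I})$ and $\mathbf{C}_u = \mathbf{U}_u \mathbf{\Lambda}_u \mathbf{U}_u^H$ is the eigenvalue decomposition of $\mathbf{C}_u$. The signal-to-noise ratio (SNR) of the transmission is defined by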
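The covariance approximation and the sampling step of the channel model can be sketched as follows. The sketch assumes the Gaussian small-angle approximation of the local scattering model from [13, Ch. 2.6] with unit average gain; the half-wavelength spacing `d=0.5`, the angles, and the array size are illustrative values, not the simulation parameters.

```python
import numpy as np

def local_scattering_cov(n_rx, phi, sigma_phi, d=0.5):
    """Approximate covariance for nominal angle phi and angular std sigma_phi (radians)."""
    idx = np.arange(n_rx)
    diff = idx[:, None] - idx[None, :]            # antenna index difference l - m
    phase = np.exp(2j * np.pi * d * diff * np.sin(phi))
    decay = np.exp(-0.5 * (sigma_phi * 2 * np.pi * d * diff * np.cos(phi)) ** 2)
    return phase * decay

def sample_channel(C, rng):
    """Draw h = U sqrt(Lambda) e with e ~ CN(0, I), so that E[h h^H] = C."""
    lam, U = np.linalg.eigh(C)
    lam = np.clip(lam, 0.0, None)                 # remove tiny negative round-off
    n = C.shape[0]
    e = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return U @ (np.sqrt(lam) * e)

rng = np.random.default_rng(0)
n_rx = 8
C = local_scattering_cov(n_rx, phi=np.deg2rad(30.0), sigma_phi=np.deg2rad(10.0))
h = sample_channel(C, rng)
```

A small angular spread concentrates the energy of `C` in a few eigenvalues, which is what makes the resulting channel matrices strongly correlated.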

B. Simulation setting

Fig. 5. Symbol error rate (SER) achieved by different schemes

The number of antennas that equip the BS was set to , and the number of users to $N_u = 6$. Quadrature phase-shift keying (QPSK) modulation was considered. The standard deviation of the multipath angle distribution was set to , which results in highly correlated channel matrices. The number of layers of MMNet in the HyperMIMO detector was set to . The hypernetwork was made of 3 dense layers (see Fig. 2). The first layer had a number of units matching the number of inputs, the second layer 75 units, and the last layer a number of units corresponding to the number of parameters required by the detector. The first two dense layers used exponential linear unit (ELU) activation functions, and the last dense layer linear activation functions.
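The forward pass of such a three-layer hypernetwork can be sketched with plain NumPy. The weights below are random placeholders for trained parameters, and the way the output vector is split into a complex matrix, per-column scalings, and positive variance scalings is an assumed layout chosen for this example; only the layer widths and activations follow the description above.

```python
import numpy as np

def elu(a):
    """Exponential linear unit."""
    return np.where(a > 0, a, np.expm1(np.minimum(a, 0.0)))

def init_dense(rng, n_in, n_out):
    """Random weights and zero biases, standing in for trained parameters."""
    return rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out)

def hypernetwork(params, features, n_users, T, eps=1e-3):
    (W1, b1), (W2, b2), (W3, b3) = params
    h = elu(features @ W1 + b1)
    h = elu(h @ W2 + b2)
    out = h @ W3 + b3                             # linear last layer
    n_mat = n_users * n_users
    # First 2 * n_mat outputs: real and imaginary parts of the shared matrix.
    Theta = (out[:n_mat] + 1j * out[n_mat:2 * n_mat]).reshape(n_users, n_users)
    rest = out[2 * n_mat:].reshape(2, T, n_users)
    theta = rest[0]                               # per-column scalings
    psi = np.abs(rest[1]) + eps                   # kept strictly positive
    return Theta, theta, psi

n_users, T = 4, 3                                 # illustrative sizes
n_in = n_users * (n_users + 1) + 1                # upper triangle (re, im) plus sigma
n_out = 2 * n_users**2 + 2 * T * n_users
rng = np.random.default_rng(3)
params = [init_dense(rng, n_in, n_in),
          init_dense(rng, n_in, 75),
          init_dense(rng, 75, n_out)]
Theta, theta, psi = hypernetwork(params, rng.standard_normal(n_in), n_users, T)
```

In a trainable version the same computation would be written in an automatic-differentiation framework so that gradients of the detection loss flow back into `params`.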

Our experiments revealed that training with randomly sampled user drops leads to sub-optimal results. Therefore, HyperMIMO was trained with fixed channel statistics, i.e., fixed user positions. Although this might seem unpromising, our results show that HyperMIMO is still robust to user mobility (see Section IV-C). Moreover, our scheme only has more parameters than MMNet as proposed in [8], which allows it to be quickly re-trained in the background when the channel statistics change significantly. Note that this is different from MMNet, which needs to be retrained for each channel matrix and is therefore considerably more computationally demanding. Moreover, it is possible that further investigations on the hypernetwork architecture alleviate this issue.

Given a user drop, HyperMIMO was trained by randomly sampling channel matrices H, SNRs from the range [0,10]dB, and symbols from a QPSK constellation for each user. Training was performed using the Adam [14] optimizer with a batch size of 500 and a learning rate decaying from to .
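One training batch for a fixed user drop could be generated along the following lines. The sketch assumes unit-energy QPSK symbols and the relation $\sigma^2 = 10^{-\mathrm{SNR}_{\mathrm{dB}}/10}$ between SNR and noise variance, which is an assumption since the SNR definition is not reproduced here; `cov_sqrt` is a hypothetical input holding one matrix square root of the covariance per user.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def sample_batch(cov_sqrt, batch_size, rng, snr_db_range=(0.0, 10.0)):
    """cov_sqrt has shape (n_users, n_rx, n_rx) with cov_sqrt[u] @ cov_sqrt[u]^H = C_u."""
    n_users, n_rx, _ = cov_sqrt.shape

    def cn(*shape):
        # Circularly symmetric complex Gaussian samples with unit variance.
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    # Column u of each channel matrix is cov_sqrt[u] @ e with e ~ CN(0, I).
    e = cn(batch_size, n_users, n_rx)
    H = np.einsum("uij,buj->biu", cov_sqrt, e)    # (batch, n_rx, n_users)
    x = QPSK[rng.integers(0, 4, (batch_size, n_users))]
    snr_db = rng.uniform(*snr_db_range, batch_size)
    sigma2 = 10.0 ** (-snr_db / 10.0)             # assumed: SNR = 1 / sigma^2
    noise = np.sqrt(sigma2)[:, None] * cn(batch_size, n_rx)
    y = np.einsum("biu,bu->bi", H, x) + noise
    return H, x, y, sigma2

# Demo with identity covariances (uncorrelated channels) and a batch of 500.
cov_sqrt = np.tile(np.eye(8, dtype=complex), (4, 1, 1))
H, x, y, sigma2 = sample_batch(cov_sqrt, 500, np.random.default_rng(0))
```

For correlated channels, `cov_sqrt[u]` would be replaced by a square root of the covariance matrix of user `u`, for instance obtained from its eigenvalue decomposition.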

C. Simulation results

All presented results were obtained by averaging over 10 randomly generated drops of 6 users, shown in Fig. 4. Fig. 5 shows the SER achieved by HyperMIMO, LMMSE, OAMPNet with 10 iterations, MMNet with 10 iterations and trained for each channel realization, and the maximum likelihood detector. As expected, MMNet trained for each channel realization achieves a performance close to that of maximum likelihood. One can see that the performance of OAMPNet is close to that of LMMSE on these highly correlated channels. HyperMIMO achieves an SER slightly worse than MMNet, but outperforms OAMPNet and LMMSE. More precisely, to achieve a SER of , HyperMIMO exhibits a loss of 0.65 dB compared to MMNet, but a gain of 1.85 dB over OAMPNet and 2.85 dB over LMMSE.

Fig. 6. SER achieved by the compared approaches under mobility

The robustness of HyperMIMO to user mobility was tested by evaluating the achieved SER when users undergo angular mobility (Fig. 6a) or move in random 2D directions (Fig. 6b) from the positions for which the system was trained. Fig. 6a was generated by moving all users by a given angle, and evaluating HyperMIMO for these new user positions (and therefore new channel spatial correlation matrices) without retraining. Note that averaging was done over the two possible directions (clockwise or counterclockwise) for each user. One can see that the SER achieved by HyperMIMO degrades gracefully as the angular displacement increases, and never becomes worse than that of LMMSE or OAMPNet.

Fig. 6b was generated by moving the users in random 2D directions. Users were located at an initial distance of . The SER was computed by averaging over 100 randomly generated displacements. As in Fig. 6a, the SER achieved by HyperMIMO degrades gracefully as the displacement distance increases. These results are encouraging as they show that, despite having been trained for a particular set of user positions, HyperMIMO is robust to mobility.

V. CONCLUSION

This work proposed to leverage the recent idea of hypernetworks to alleviate the need for retraining ML-based MIMO detectors for each channel realization, while still achieving competitive performance. The proposed system, referred to as HyperMIMO, uses a variation of the state-of-the-art MMNet detector [8]. To reduce the complexity of the hypernetwork, MMNet was modified to decrease its number of trainable parameters, and a form of weight sharing was leveraged. Simulations revealed that HyperMIMO achieves near state-of-the-art performance under highly correlated channels when trained on fixed user positions. We also showed that its performance degrades slowly under user mobility, indicating that it is sufficient to re-train our scheme in the background when the channel statistics change significantly.

REFERENCES

[1] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y. A. Zhang, “The Roadmap to 6G: AI Empowered Wireless Networks,” IEEE Commun. Mag., vol. 57, no. 8, pp. 84–90, Aug 2019.

[2] A. D. Pia, S. S. Dey, and M. Molinaro, “Mixed-Integer Quadratic Programming is in NP,” Math. Program., vol. 162, no. 1–2, pp. 225–240, Mar. 2017.

[3] C. Jeon, R. Ghods, A. Maleki, and C. Studer, “Optimality of Large MIMO Detection via Approximate Message Passing,” IEEE Int. Symp. on Inf. Theory (ISIT), pp. 1227–1231, Jun 2015.

[4] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020–2033, 2017.

[5] S. Chaudhari, H. Kwon, and K.-B. Song, “Reliable and Low-Complexity MIMO Detector Selection using Neural Network,” arXiv:1910.05369, Oct 2019.

[6] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO Detection,” arXiv:1706.01151, Jun 2017.

[7] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “A Model-Driven Deep Learning Network for MIMO Detection,” IEEE Global Conf. on Signal and Inf. Process. (GlobalSIP), pp. 584–588, 2018.

[8] M. Khani, M. Alizadeh, J. Hoydis, and P. Fleming, “Adaptive Neural Signal Detection for Massive MIMO,” arXiv:1906.04610, Jun 2019.

[9] L. Bertinetto, J. F. Henriques, J. Valmadre, P. Torr, and A. Vedaldi, “Learning Feed-Forward One-Shot Learners,” in Advances in Neural Inf. Process. Syst. 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds., 2016, pp. 523–531.

[10] E. Zakharov, A. Shysheya, E. Burkov, and V. Lempitsky, “Few-Shot Adversarial Learning of Realistic Neural Talking Head Models,” arXiv:1905.08233, May 2019.

[11] A. Mohammad, C. Masouros, and Y. Andreopoulos, “Complexity-Scalable Neural Network Based MIMO Detection With Learnable Weight Scaling,” arXiv:1909.06943, Sep 2019.

[12] D. Ha, A. Dai, and Q. V. Le, “Hypernetworks,” arXiv:1609.09106, Sep 2016.

[13] E. Björnson, J. Hoydis, and L. Sanguinetti, “Massive MIMO Networks: Spectral, Energy, and Hardware Efficiency,” Foundations and Trends in Signal Processing, vol. 11, no. 3-4, pp. 154–655, 2017. [Online]. Available: http://dx.doi.org/10.1561/2000000093

[14] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2015, pp. 1–15.
