Distributed Learning for Channel Allocation Over a Shared Spectrum

2019·Arxiv

Abstract

Abstract

I. INTRODUCTION

Channel allocation in wireless communication is one of the fundamental management tasks. It and has been widely studied for various wireless networks [1]–[5]. In the traditional centralized systems, Orthogonal Frequency Division Multiplexing Access (OFDMA) was investigated extensively to meet the high demand for efficient spectrum utilization. If users can be assigned to sub-channels efficiently, certain gains can be derived from the diversity of the channel. The main issue for the OFDMA systems is joint power and sub-carrier allocation in the downlink direction [6]–[9] and sub-carrier assignment in the uplink direction [10]–[12]. Due to the global view of the whole network, the centralized approach is able to obtain the optimal

S.~M.~ Zafaruddin was with the Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel (e-mail: smzafar@biu.ac.il). Currently, he is with the Department of Electrical and Electronics Engineering, BITS Pilani, Pilani-333031 (email: syed.zafaruddin@pilani.bits-pilani.ac.in). I. Bistritz is with the Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel (e-mail: ilaibist@gmail.com). A. Leshem is with the Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel (e-mail: leshema@eng.biu.ac.il). Dusit (Tao) Niyato is with School of Computer Science and Engineering (SCSE), Nanyang Technological University, Singapore 639798. (e-mail: dniyato@ntu.edu.sg).

This research was supported by the ISF-NRF Joint research Program, under grant ISF 2277/16. S. M. Zafaruddin was partially funded by the Israeli Planning and Budget Committee (PBC) post-doctoral fellowship.

solution of a desired performance metric. The optimal channel allocation can be computed using the well-known Hungarian method [13].

However, there are some disadvantages that limit the practicality of the centralized approach such as significant signaling overhead, increased implementation complexity and higher latency in dealing with resource allocation problems. Moreover, emerging wireless networking paradigms such as cognitive radio networks, ad-hoc networks, and D2D communications are inherently distributed. A complete information about the network state is typically not available online, which makes the computation of optimal policies intractable for these networks. Hence, it is desirable to develop a distributed learning algorithm for dynamic spectrum access that can effectively adapt for general complex real-world settings in dense and heterogeneous wireless environments.

Open sharing model employs spectrum sharing among peer users as the basis for managing a spectral band. Advocates of this model draw support from the phenomenal success of wireless services operating in the unlicensed industrial, scientific, and medical (ISM) radio band (e.g., WiFi). Centralized and distributed spectrum sharing strategies have been initially investigated to address technological challenges under this spectrum management model.

The center of the channel allocation task is the combinatorial optimization assignment problem. Solving the assignment problem distributively is a major challenge that has received considerable attention. The famous auction algorithm [14] proposed a distributed method to solve the assignment problem where users send their bids to an auctioneer. In [15] a fully distributed version of the auction algorithm was suggested that exploits carrier sense multiple access (CSMA) in order to avoid the need for an auctioneer.

If the resources (channels) values are not known in advance by the users, they have to learn these values online. Learning the CSI in real-time comes at the expense of using the best known channels so far. This introduces the well-known trade off between exploration and exploitation, that is captured by the multi-armed bandit (MAB) problem. In this case, there are several decision makers facing this problem, and when two or more choose the same channel, they receive zero reward. Similarly to other MAB problems, the performance is measured by the expected difference between the actual sum of rewards and the sum of rewards that could have been achieved if the users had perfect knowledge of the CSI. However, as opposed to classical MAB problems, the interaction between the users significantly complicates the learning aspects of the problem. To address that, deep reinforcement learning and Q-learning methods have been proposed for these problems [16]–[18], and have been shown to perform well for small-size models. However, for large-scale networks these methods perform poorly since the number of states of the learning algorithm increases exponentially in the number of users.

In [19], the auction algorithm [14] was used as a basis for a distributed algorithm that achieves an expected sum regret of O(log T ). However, since it relies on [14] , this algorithm requires communication between users in order to communicate the bids and deduce which player won each auction. To implement this algorithm, users need to know which user transmitted on which channel. In this manner, they can use their public channel choices as a signaling method. In practice, this knowledge requires that users decode at least part of the transmission to identify the ID of the transmitting users. Besides being computationally demanding, this might be highly non-trivial when multiple users transmit on the same channel and all their IDs need to be decoded from the mixture.

In this paper, we overcome this requirement by proposing a distributed algorithm that relies on [15] instead of [14]. The algorithm in [15] assumes that the CSI is known. It also uses a continuous back-off time and assumes no tied bids. We lift all of these assumptions in our novel MAC protocol. Our protocol achieves an expected sum of regret of O(log T ), but in contrast to [19], only requires each user to sense the channel that the user is using and detect if there are other transmissions on this channel. Users do not need to know which user transmitted on which channel or how many of them did. Therefore, our algorithm offers the same order optimal performance as [19] but with dramatically simpler implementation.

A. Related Works

Developing multi-armed bandit (MAB)-based methods for solving dynamic spectrum allocation (DSA) problems is a relatively new research direction, motivated by recent developments of MAB in various other fields, and many works have been done in this direction recently. A couple of these works [20]–[23] considered a cognitive radio scenario where a set of channels can be either empty or occupied by a primary user that interferes all secondary users. A generalized scenario was considered in [24]–[26], where the channel qualities are not binary, but still all users have the same vector of channel qualities. Recently, the case of a full channel allocation scenario where different users have different channel qualities (a matrix of channel qualities) have been considered in [27], and later improved in [19], by the same authors, to have an order optimal sum-regret of O (log T ).

Recently, it has been shown in [28] (which improved [29]) that achieving a sum-regret of near-O (log T ) is possible even without communication between users and with a matrix of expected rewards. The algorithm in [28] is general and has a slow convergence rate in T that makes it unsuited for realistic communication scenarios. In this paper, we adopt a more practical and communication oriented approach and achieve an order optimal sum-regret of O (log T ). Our algorithm still does not require any communication between users, and each device only needs to sense a single channel at a time (instead of simultaneously all of them as in [19]). It is made possible by adding assumptions that are always valid from a practical perspective - the expected rewards (QoS) are integer multiplications of a common resolution , and a device can choose not to transmit on any channel and instead only to sense a single channel of its choice. Our algorithm is much easier and less costly to implement than that of [19] and has a much better convergence time that that of [28].

The literature on distributed channel allocation without learning, where the CSI is assumed to be known, is vast and we can only cover part of it here. Recently there has been growing interest in distributed spectrum optimization for frequency selective channels, where the assignment problem arises. However, most of the work done in this field relies on explicit exchange of CSI. Several suboptimal approaches that do not require information sharing have been suggested. In [30], a greedy approach to the channel assignment problem was introduced. In [31] and [32], the use of opportunistic carrier sensing was combined with the Gale-Shapley algorithm for stable matching [33] to provide a fully distributed stable channel assignment. This solution basically achieves the greedy channel assignment and analysis of this technique for Rayleigh fading channels was done in [34].

Game theory is often used to design distributed channel allocation algorithms. In [35] the channel assignment problem was formulated as a many-to-one matching game under the limitation that each primary channel can only be assigned to one secondary user. In [36], an algorithm was proposed based on a game with utility design that leads to an asymptotically optimal performance in all Nash equilibria. In [37] the spectrum sharing problem between D2D pairs and multiple co-located cellular networks was formulated as a Bayesian non-transferable utility overlapping coalition formation game. Nash bargaining solutions for channel allocation were considered in [38]–[40], and distributed allocation using multichannel ALOHA and potential games was considered in [41], [42].

The auction algorithm has been extensively used to solve a variety of assignment problems. It gets its name from operating similarly to an auction. As in this paper and many others, the auction algorithm may have nothing to do with actual auctions that rely on economic and game-theoretic principles, as was done in [43]–[46]. In [47] the auction algorithm was used to solve the channel assignment problem for the uplink, using the base station as the auctioneer. In [48] a distributed auction algorithm with shared memory was used for switch scheduling. In [49] it was shown that a modification of the auction algorithm is equivalent to max product belief propagation. However, all these modified auction algorithms require a base station or shared memory, which prevents them from being fully distributed. In addition, all these algorithms, including [15] that is being used here, assume that the CSI is known to the users. Our algorithm generalizes the distributed CSMA auction algorithm [15] to an online learning framework.

B. Outline

The paper is organized as follows: in Section II we describe the system model and our network assumptions. Section III discusses the novel MAC protocol we propose. Section IV and Section V analyze the exploration phase and auction phases of our algorithm, respectively. Section VI provides simulation results of our algorithm on practical LTE channels, together with a performance comparison. Finally, Section VII concludes the paper.

II. SYSTEM MODEL

We consider an Ad-Hoc network with a set of transmitter-receiver pairs (links) N = {1, . . . , N} and a set of channels K = {1, . . . , K}, where . Each channel consists of several OFDMA subcarriers and each link uses a single channel. In the case of more users than channels (N > K), a combined OFDMA-TDMA can be used instead in order to have enough resources for all users. However, since this is a trivial consequence of our analysis which only complicates the notation, we choose to avoid considering TDMA. The number of channels K is chosen by the protocol designer to be large enough to support N links in an environment with outside interferers where some of the channels can be very poor and practically unavailable. The identity and number of subcarriers that constitute each channel can also be optimized with respect to the typical channels used by the significant interferes. Links may use multiple-input multiple-output (MIMO) transmission, with different capabilities for each link. Time is slotted and indexed by t, such that in each time slot L OFDM symbols are transmitted. The number of OFDM symbols per time slot L can be designed to match the coherence time of the channel, such that the CSI typically changes every time slot. Hence, we assume a fast-fading scenario where the coherence time is proportional to an OFDM symbol duration. The links are active for a total of T time slots, where T is unknown in advance by the links. We assume that each link can sense a single channel at each time slot, which is the channel they use, and detect whether other links are transmitting on this channel. The chosen channel of link n at time t is denoted by . Naturally, links can choose not to transmit at all at a given time slot, which is denoted . Non-transmitting links can still sense transmissions on a single chosen channel.

The links are located in a geographical proximity in an area that typically includes other coexisting networks nearby. As a result, each receiver experiences alien interference from the transmission of other protocols. Due to the geometry of the links and the different channels used by different interferers, the average interference is different for each receiver in our network. A toy example of our network with K = N = 6 is depicted in Fig. 1. The channel used by each link is indicated by the color of the arrow between its transmitter and receiver. Outside the area of the network there are four major interferers that use four of the six available channels. In this example, links successfully avoid using channels with significant interference at their receiver side.

This outside transmissions can be constant over time or bursty, and may overlap any part of the subcarriers used by a particular link. In addition, the fading of the channel may cause significant changes to the channel gains of the subcarriers. As any modern device, the transmitter and receiver of each link adopt techniques such as adaptive beamforming and modulation together with interleaving and coding for fading channels in order to provide a stable (on average) and reliable communication for the users. However, since the channel statistics and the interference pattern are initially unknown, each link needs to learn them online as fast as possible in order to deduce which Quality of Service (QoS) it can support. As in any practical system, there is some resolution for the supported QoS (e.g., 100Kbps), we denote by . The supported QoS set is where for each for a non-negative integer and . The QoS experienced by link n using channel i is denoted by . A value in this set may represent the weighted quality of a combination of parameters, e.g., 1Mbps for internet, 256kbps for voice and 10Mbaps for video. In general, different links have a subset of different possible QoS values from Q due to different capabilities, e.g., number of transmitting and receiving antennas. Being part of the standard of the protocol, we assume that the parameters and are known to all links.

In each time slot t, each link measures the instantaneous QoS by using a finer resolution than that of Q, in order for the estimation of the average to be accurate. We model as an i.i.d. sequence in time, independent for different n or i. The distribution of is bounded since , and can be either discrete or continuous due to arbitrarily fine measurements.

Define the set of links that are transmitting on channel i at time t by

Define the no-collision indicator of channel i at time t by

The instantaneous reward of link n at time t from transmitting on channel is

Fig. 1. System Model

The theoretical guarantees of our algorithm are formulated using the well-known notion of regret, defined as follows.

Definition 1. The total regret is defined as the random variable

The expected total regret is the average of (4) over the randomness of the rewards , that dictate the random channel choices .

III. PROTOCOL DESCRIPTION

We design a novel MAC protocol that each link runs distributedly in order to maximize the accumulated sum of QoS. In the original auction algorithm, an auctioneer is needed to collect the bids and compute the highest bidder. Such an auctioneer is not available in a distributed wireless network. The algorithm in [15] exploits the CSMA mechanism to bypass the need for an auctioneer and by doing that, implements the auction algorithm distributedly. For this purpose, links compute a continuous back-off time that is decreasing with their bid. The highest bidder for a particular channel is simply the first link that accesses this channel. Since we assume all links can sense the channel they chose, all links will agree on which link was the highest bidder for their channel. Note that we are not analyzing selfish links but devices that are programmed to run our designed MAC protocol.

The key advantage of our algorithm is that it only requires from each receiver to sense if there are transmissions on a single channel, which is a basic requirement. We assume that all links are of a sensing distance from each other (a fully-connected network). However, as opposed to [19], links do not know which transmission belongs to which link. This is the scenario in practice with wireless links located in close enough proximity. In our protocol, links do not need to distinguish between the transmission of other links, which might have required decoding an ID for each link. Moreover, it can be extremely demanding in practice to separate colliding transmissions and discern the IDs involved. Sensing a single channel at a time instead of all the K channels is another major advantage of our algorithm over [19].

Definition 2. We divide the T time slots into packets with a dynamic length, one starting immediately after the other. Each packet is further divided into three phases. In the k-th packet:

1) Exploration Phase - this phase has a length of time slots in each packet, and is used for estimating the expected

2) Auction Phase - this phase has a length oftime slots in the k-th packet, which is the

3) Exploitation Phase - this phase has a length of time slots for some constant . During this phase, the links

The fact that the exploitation phase takes an exponential number of time slots does not mean it takes a long time in practice. In fact, it only means that the lengths of the exploration and auction phases are much shorter. Note that T is finite and can be limited by the designer, so even the last (longest) exploitation phase can still consist of just a couple of thousands of OFDM symbols, which amounts to only a few milliseconds. From a practical point of view, this is the desirable packet structure since the actual transmission takes the vast majority of the OFDM symbols while the equivalents of the synchronization header do not cause a significant overhead. The overhead caused by the exploration and auction phases is naturally measured by the sum of regrets as in (4). The structure of the k-th packet of our algorithm is depicted in Fig. 2.

Our main Theorem is formulated as follows.

Theorem 3 (Main Theorem). Assume that the instantaneous QoS are independent in n and i.i.d. in time t, with expectations such that for a non-negative integer and a positive , and

. Denote . Let each link run Algorithm 1 with and an exploration phase length of

Then, the expected sum of regrets is .

Fig. 2. The k-th packet of Algorithm 1

used for the CSMA back-off quantization, then the exploitation phase contributes no regret to the sum of regret. Lemma 5 in Section IV, proved in Appendix C, bounds from above the error probability of the exploration phase, showing that it decreases exponentially with k. The proof follows by bounding from above the expected regret using these two facts. For details see Appendix A.

A. Implementation Issues

In the problem formulation, the length of the time slots is not specified. This is done in order to keep the theoretical framework identical to other multi-armed bandits algorithms and measure the regret using the same scale. However, when implementing Algorithm 1 in practice, there is no need to assume that all time slots are of equal length. In particular, the time slots used to implement the CSMA back-off time can be much shorter than time slots that are used to transmit a frame of L OFDM symbols. The result, depicted in Fig. 2, is the well-known structure of a CSMA frame, like that used in WiFi. At the beginning of the k-th frame, a contention window of short slots is used, followed by the transmission in the chosen channel, over L OFDM symbols. During the exploration and exploitation phases, no contention window is needed, which makes the overhead of the contention window negligible compared to T .

We also note that the computational complexity of running Algorithm 1 for each link is O (K), since maximization over a K-sized vectors is required.

IV. EXPLORATION PHASE - ESTIMATION OF THE QOS

In this section, we analyze the performance of the exploration phase and its contribution to the expected sum-regret. The distributed algorithm of [15] assumes each link knows its CSI, or the possible QoS each channel supports. Our algorithm lifts this assumption by working on online estimations of the CSI (or QoS) instead. Each link obtains these estimations by randomly exploring the different K channels and averaging the instantaneous measurements of the QoS of each channel.

The exploration phase does not require the links to know the total number of links N or the total duration of transmission T . Hence, links cannot use a single long enough exploration phase at the beginning, since they want the exploration error

probability to be designed according to T and N. The packet structure in Fig. 2 maintains the required balance. In each packet, only a constant number of time slots is dedicated to exploration, but the estimation of the k-th exploration phase uses all the previous exploration phases.

The estimated QoS of the channels is needed for the next auction phase to converge to the optimal allocation. However, due to its distributed nature, ties cannot be arbitrarily broken. Hence, the exploration phase needs to output accurate enough estimates that guarantee that there will be no ties in the bids in the auction algorithm. For that purpose, after the estimation of the expected QoS is completed, artificial dither noise is added to the estimated values. This dither values are generated in advance independently and uniformly at random on a small interval. The following lemma characterizes the required estimation accuracy of the exploration phase, taking into account the dither noise.

Lemma 4. Denote the dithered estimations of the expected QoS values in packet k by. Assume that

for each link n and channel i for some positive . If then

assignment onand must be identical. For details see Appendix B.

The following lemma concludes this section by providing an upper bound for the probability that the estimation for packet k failed. The fact that this error probability exponentially vanishes with k, allows us to limit the number of exploration time slots to , keeping the overhead caused by the exploration phase negligible.

Lemma 5 (Exploration Error Probability). Denote the dithered estimations of the expected QoS values in packet k by.

For details see Appendix C.

V. AUCTION PHASE - CONVERGING TO THE OPTIMAL ALLOCATION

We adopt the distributed auction in [15] as the basis for our auction phase. The multi-armed bandit problem uses a discrete time axis. Hence, a continuous back-off time as used in [15] is not possible. From a practical perspective, links cannot implement a truly continuous delay but a quantized one. With integer quantized delays, it is possible that two links use the same delay for the same channel although their continuous bids are different. In this case, they cannot agree on which of them won the bid and got the channel. It is clear that for a fine enough quantization, these bidding collisions will be avoided. However, due to the distributed nature of the problem, links do not know in advance what is considered a fine enough quantization. We propose a collision resolution algorithm that increases the quantization bits, described in step 3 in the Auction phase in Algorithm 1. Links coordinate their quantization by employing a “voting turn” that only uses the fact that all links can sense a single channel of their choice. In this special time slot, links listen to channel 1 which is used to signal if a collision occurred for some of the links.

Another issue to be resolved is where the continuous bids of two links m and n are identical, . Since there is no auctioneer, the links cannot agree on an arbitrary tie braking without communication. Hence, identical bids can prevent the CSMA auction algorithm from converging to the optimal solution. In order to avoid this problem, the auction phase uses a noisy version of the estimated expected rewards from the exploration phase. This noise is an artificial dither added by the links independently such that the probability for identical bids will be zero.

Lemma 6. After the k-th exploration phase we have for any and any i, j.

is zero. Since any bid is a linear combination of rewards and , also the

probability that at a certain iteration of the auction algorithm is zero.

We emphasize that Lemma 6, and Lemma 7 below, only help to show (in Lemma 8) that Algorithm 1 eventually converges to the optimal solution. Links start transmitting data from the first packet, using a possibly suboptimal allocation in the exploitation phase. Hence, Algorithm 1 is likely to perform well much before convergence to the optimal allocation occurred. Nevertheless, our simulations in Section VI suggest that convergence to the optimal allocation occurs very fast, already in the first or the second packet.

Lemma 7. Algorithm 1 converges to some final value , i.e., there exists a such that for all .

bits we have . In this case, the links will detect a collision after the auction phase and will increase the number of bits used for quantization. Since is a sum of rewards and some multiplication of , for large enough , we have for any m, n, i, j such that and . Hence, b (k) will not increase above , since collisions between cannot occur with . Collisions from identical bids do not occur simply because their probability is zero, as shown in Lemma 6.

Lemma 8. Assume that for all . If the k-th exploration phase succeeded, then the k-th auction phase converges

to an allocation such that

details see Appendix D.

VI. SIMULATION RESULTS

In this section, we demonstrate the performance of Algorithm 1 using computer simulations. We compared Algorithm 1 with the centralized Hungarian method, random channel selection and the E3 algorithm in [19]. The Hungarian method requires some central entity to know the CSI of all users. Requiring much less information, the E3 algorithm assumes that each user can decode which channel each of the other users has chosen. Our algorithm requires even much less information - each user only needs to sense if there is a transmission on a given channel. The role of the simulations of this section is to show that despite our much stricter information constraints, our algorithm performs almost exactly well as the E3 algorithm and even the optimal Hungarian algorithm. The comparison with the random channel selection assures that an algorithm that does not strive to converge to the optimal allocation performs very badly. This serves to show that the problem is far from being degenerated or trivial.

We verified our algorithm under various network scenarios consisting of different path losses and fading environments. The channel was divided into N sub-channels and we used N = K = 10. The transmit power spectral density (PSD) was fixed at 12dBm for each user. The users were assumed to be moving at a speed of 3km/h. We used a transmission duration of time slots, with a single OFDM symbol per time slot (L = 1). Our transmission packet (see Fig. 2) has an exploration phase of 800 OFDM symbols and an auction phase of 500 OFDM symbols. Each experiment consists of averaging 100 independent realizations.

First, we considered an ad-hoc network of N links that are uniformly distributed on disk with a radius of 500 m. The central carrier frequency was 2 GHz with a per-user transmission bandwidth of 200 KHz. The path loss was computed using path loss exponent of . We considered two types of channel models: i.i.d. Rayleigh fading channel and the extended pedestrian A model (EPA) of the LTE standard with 9 random taps. In Fig. 3 the sum-regret of our algorithm is compared to that of the E3 algorithm [19] under an i.i.d. Rayleigh fading channel. It is evident that the performance of both algorithms is essentially identical, despite the fact that our algorithm uses no communication between users as the E3 algorithm [19] does. Both algorithms have an expected sum-regret that increases like log T and both converge to the optimal allocation already at the first packets. In Fig. 5, we present the spectral efficiency performance of both algorithms together with the confidence intervals of 90% and 95%, where again all performances are very similar between our algorithm and the E3 algorithm [19]. It also shows that the proposed algorithm approaches the optimal performance within a few packets, does much better than a random selection and behaves very similarly in all realizations. We have repeated the above experiment for the more realistic scenario of LTE channels. Fig. 4 again confirms that our performance is identical to that of the E3 algorithm [19].

Next, in Fig. 5 we demonstrate the performance of the proposed algorithm in the presence of alien interference for LTE channels. In this scenario, we considered four interferers that use four out of K = 10 available channels. These interfering nodes are randomly located outside the network disk and within a distance of 500 m from the annular region of the disk. It can be seen from the right graph in Fig. 5 that the spectral efficiency is reduced by ~2 bits/sec/Hz. However, the proposed algorithm achieves the optimal performance within few thousand symbols similar to the interference-free case, as shown in Fig. 4. This scenario again confirms that our performance is identical to that of the E3 algorithm [19].

Finally, we considered a 5G system with more realistic channel scenarios consisting of pathloss, short-term fading, and long-term shadowing. We computed the path loss from empirical models of urban macro (UMa) in the distance range of 45m to 1429m and urban micro-street canyon (UMi-SC) in the distance range of 19m to 272m [51], [52]. The shadowing factor is 6dB and 7.8dB for the UMa and UMi-SC models, respectively. The fading channel consists of tapped delay line (TDL-A) model with 23 taps. The central carrier frequency was 6GHz with a per-user transmission bandwidth of 720KHz. The results in Fig. 6 demonstrate that the all the realistic channel phenomena we simulated do not prevent the proposed algorithm from quickly converging to the optimal solution.

The simulations in this section provide an additional solid support that our algorithm offers the same performance as [19] but with significantly less requirements from the devices. Specifically, we require no information exchange between different links as required in [19] and we only use the sensing of a single channel each time slot instead of sensing all channels simultaneously.

VII. CONCLUSIONS

In this paper, we suggested a distributed algorithm for channel allocation with time varying-channels where links initially have no estimation for the statistics of the channels. Learning the statistics of the channels in real-time (exploration) comes at

Fig. 3. Performance evaluation over i.i.d. Rayleigh fading channel. Simulation parameters are: N = K = 10, explore length= 800 OFDM symbols, and auction length = 500 OFDM symbols.

Fig. 4. Performance evaluation over LTE fading channel. Simulation parameters are: N = K = 10, explore length= 800 OFDM symbols, and auction length = 500 OFDM symbols.

the expanse of using the best known channels (exploitation). The scenario is described by a multi-armed bandit game where a collision occurs if two or more links transmit on the same channel. We proved that our algorithm achieves the optimal order of regret - O (log T ). Our algorithm is based on a distributed auction algorithm that uses CSMA to avoid the need for an auctioneer (base station). In contrast to the state-of-the-art algorithms, our algorithm requires neither centralized management nor any communication between different links, which makes it very relevant to cognitive ad-hoc networks. Our algorithm only requires sensing a single channel at each time slot, which is K times less than the state-of-the-art algorithms, where K is the number of channels. Only a detection of whether there are transmissions on this channel is required, and no decoding and demixing operations are needed to discern which user chose which channel. From a practical point of view, this results in a significant complexity reduction of the physical layer design. Simulations show that our algorithm performs very well on realistic LTE and 5G channels.

Fig. 5. Performance evaluation over LTE fading channel with alien interference. Simulation parameters: N = K = 10, explore length= 500 OFDM symbols, and auction length = 500 OFDM symbols.

Fig. 6. Performance evaluation over 5G fading channels with different path loss models. Simulation parameters: N = K = 10, explore length= 800 OFDM symbols, and auction length = 500 OFDM symbols.

We compute the expected total regret as follows:

where is the expected total regret of packet k and is a constant with respect to T . Denote by the error probability of the exploration of packet k. In Lemma 8, we prove that if the exploration phase succeeded and the number of quantization bits b (k) for the CSMA delay is large enough, then the auction phase is guaranteed to converge to the optimal solution of (5) for any . This optimal allocation is played in the exploitation phase, which adds no additional regret to the total regret. We prove in Lemma 5 that if (6) holds then, . Hence, we obtain for a large enough k such that b (k)

is sufficiently large that

for some constant . We conclude that

where in (a) we used the fact that completing the last packet to be a full packet only increases . In (b) we used T > , which yields .

and we assume that . In the perturbed assignment problem, an optimal assignment

performs at least as well as

Any non optimal assignment a performs at most as well as

where is an assignment with the second best objective. For any two assignments with a different sum of QoS we

We conclude that for any non optimal a

where (a) follows from (12) and (13), (b) from (14) and (c) holds for .

. Let be the indicator that is equal to one if only link n chose channel i at time slot t. Also define

Denote by E the event in which there exists a link n that has for some channel i. We have

where (a) follows by taking the union bound over all links and channels and (b) from using Hoeffding’s inequality for bounded

variables [53]. Since the exploration phase consists of uniform and independent arm choices we have

Therefore

where (a) follows from the union bound, (b) from Hoeffding’s inequality for Bernoulli random variables and (c) since

where (a) follows from (17) and (19). We choose and

bounded by

Hence

Denote by the allocation that the auction phase converges to. If , then Theorem 1 in [15] guarantees that

which, due to (24), is only possible if

where (a) follows from Lemma 4 since we assume that the k-th exploration phase succeeded.

REFERENCES

[1] I. Katzela and M. Naghshineh, “Channel assignment schemes for cellular mobile telecommunication systems: A comprehensive survey,” IEEE Communications Surveys Tutorials, vol. 3, no. 2, pp. 10–31, Second 2000.

[2] S. Chieochan, E. Hossain, and J. Diamond, “Channel assignment schemes for infrastructure-based 802.11 WLANs: A survey,” IEEE Communications Surveys Tutorials, vol. 12, no. 1, pp. 124–136, First 2010.

[3] G. Ku and J. M. Walsh, “Resource allocation and link adaptation in LTE and LTE advanced: A tutorial,” IEEE Communications Surveys Tutorials, vol. 17, no. 3, pp. 1605–1633, thirdquarter 2015.

[4] E. Z. Tragos, S. Zeadally, A. G. Fragkiadakis, and V. A. Siris, “Spectrum assignment in cognitive radio networks: A comprehensive survey,” IEEE Communications Surveys Tutorials, vol. 15, no. 3, pp. 1108–1135, Third 2013.

[5] M. E. Tanab and W. Hamouda, “Resource allocation for underlay cognitive radio networks: A survey,” IEEE Communications Surveys Tutorials, vol. 19, no. 2, pp. 1249–1276, Secondquarter 2017.

[6] C. Y. Wong, R. S. Cheng, K. B. Lataief, and R. D. Murch, “Multiuser OFDM with adaptive subcarrier, bit, and power allocation,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 10, pp. 1747–1758, Oct 1999.

[7] Z. Shen, J. G. Andrews, and B. L. Evans, “Adaptive resource allocation in multiuser OFDM systems with proportional rate constraints,” IEEE Transactions on Wireless Communications, vol. 4, no. 6, pp. 2726–2737, Nov 2005.

[8] S. Sadr, A. Anpalagan, and K. Raahemifar, “Radio resource allocation algorithms for the downlink of multiuser OFDM communication systems,” IEEE Communications Surveys Tutorials, vol. 11, no. 3, pp. 92–106, rd 2009.

[9] S. Huberman, C. Leung, and T. Le-Ngoc, “Dynamic spectrum management (DSM) algorithms for multi-user xDSL,” IEEE Communications Surveys Tutorials, vol. 14, no. 1, pp. 109–130, First 2012.

[10] L. Gao and S. Cui, “Efficient subcarrier, power, and rate allocation with fairness consideration for OFDMA uplink,” IEEE Transactions on Wireless Communications, vol. 7, no. 5, pp. 1507–1511, May 2008.

[11] J. Huang, V. G. Subramanian, R. Agrawal, and R. Berry, “Joint scheduling and resource allocation in uplink OFDM systems for broadband wireless access networks,” IEEE Journal on Selected Areas in Communications, vol. 27, no. 2, pp. 226–234, February 2009.

[12] E. Yaacoub and Z. Dawy, “A survey on uplink resource allocation in OFDMA wireless networks,” IEEE Communications Surveys Tutorials, vol. 14, no. 2, pp. 322–337, Second 2012.

[13] C. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Dover, 1998.

[14] D. P. Bertsekas, “The auction algorithm: A distributed relaxation method for the assignment problem,” Annals of Operations Research, vol. 14, no. 1, pp. 105–123, Dec 1988. [Online]. Available: https://doi.org/10.1007/BF02186476

[15] O. Naparstek and A. Leshem, “Fully distributed optimal channel assignment for open spectrum access,” IEEE Transactions on Signal Processing, vol. 62, no. 2, pp. 283–294, Jan 2014.

[16] U. Challita, L. Dong, and W. Saad, “Proactive resource management in LTE-U systems: A deep learning perspective,” arXiv preprint arXiv:1702.07031, June 2017.

[17] S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep reinforcement learning for dynamic multichannel access in wireless networks,” IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 2, pp. 257–265, June 2018.

[18] O. Naparstek and K. Cohen, “Deep multi-user reinforcement learning for distributed dynamic spectrum access,” arXiv preprint arXiv:1704.02613, 2017.

[19] N. Nayyar, D. Kalathil, and R. Jain, “On regret-optimal learning in decentralized multi-player multi-armed bandits,” IEEE Transactions on Control of Network Systems, vol. 5, no. 1, pp. 597–606, 2018.

[20] O. Avner and S. Mannor, “Concurrent bandits and cognitive radio networks,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2014, pp. 66–81.

[21] A. Anandkumar, N. Michael, A. K. Tang, and A. Swami, “Distributed algorithms for learning and cognitive medium access with logarithmic regret,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 4, pp. 731–745, 2011.

[22] K. Liu and Q. Zhao, “Cooperative game in dynamic spectrum access with unknown model and imperfect sensing,” IEEE Transactions on Wireless Communications, vol. 11, no. 4, pp. 1596–1604, April 2012.

[23] ——, “Indexability of restless bandit problems and optimality of whittle index for dynamic multichannel access,” IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5547–5567, Nov 2010.

[24] S. Vakili, K. Liu, and Q. Zhao, “Deterministic sequencing of exploration and exploitation for multi-armed bandit problems,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 5, pp. 759–767, 2013.

[25] J. Rosenski, O. Shamir, and L. Szlak, “Multi-player bandits–a musical chairs approach,” in International Conference on Machine Learning, 2016, pp. 155–163.

[26] K. Liu and Q. Zhao, “Distributed learning in multi-armed bandit with multiple players,” IEEE Transactions on Signal Processing, vol. 58, no. 11, pp. 5667–5681, 2010.

[27] D. Kalathil, N. Nayyar, and R. Jain, “Decentralized learning for multiplayer multiarmed bandits,” IEEE Transactions on Information Theory, vol. 60, no. 4, pp. 2331–2345, 2014.

[28] I. Bistritz and A. Leshem, “Game of thrones: Fully distributed learning for multi-player bandits,” arXiv:1810.11162.

[29] ——, “Distributed multi-player bandits-a game of thrones approach,” in Advances in Neural Information Processing Systems, 2018, pp. 7222–7232.

[30] H. Kwon, S. Kim, and B. G. Lee, “Opportunistic multi-channel CSMA protocol for OFDMA systems,” IEEE Transactions on Wireless Communications, vol. 9, no. 5, pp. 1552–1557, May 2010.

[31] Y. Yaffe, A. Leshem, and E. Zehavi, “Stable matching for channel access control in cognitive radio systems,” in 2010 2nd International Workshop on Cognitive Information Processing, June 2010, pp. 470–475.

[32] A. Leshem, E. Zehavi, and Y. Yaffe, “Multichannel opportunistic carrier sensing for stable channel access control in cognitive radio systems,” IEEE Journal on Selected Areas in Communications, vol. 30, no. 1, pp. 82–95, January 2012.

[33] D. Gale and L.Shapley, “College admissions and the stability of marriage,” The American Mathematical Monthly, vol. 69, no. 1, pp. 9–15, 1962.

[34] O. Naparstek and A. Leshem, “Bounds on the expected optimal channel assignment in Rayleigh channels,” in 2012 IEEE 13th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), June 2012, pp. 294–298.

[35] R. Mochaourab, B. Holfeld, and T. Wirth, “Distributed channel assignment in cognitive radio networks: Stable matching and Walrasian equilibrium,” IEEE Transactions on Wireless Communications, vol. 14, no. 7, pp. 3924–3936, July 2015.

[36] I. Bistritz and A. Leshem, “Game theoretic dynamic channel allocation for frequency-selective interference channels,” IEEE Transactions on Information Theory, 2018, DOI: 10.1109/TIT.2018.2868440.

[37] Y. Xiao, K. C. Chen, C. Yuen, Z. Han, and L. A. DaSilva, “A bayesian overlapping coalition formation game for device-to-device spectrum sharing in cellular networks,” IEEE Transactions on Wireless Communications, vol. 14, no. 7, pp. 4034–4051, July 2015.

[38] E. Zehavi and A. Leshem, “Bargaining solution for partial orthogonal transmission over frequency selective interference channel,” in 2011 IEEE International Symposium on Information Theory Proceedings, July 2011, pp. 2701–2705.

[39] Z. Han, Z. Ji, and K. J. R. Liu, “Fair multiuser channel allocation for OFDMA networks using Nash bargaining solutions and coalitions,” IEEE Transactions on Communications, vol. 53, no. 8, pp. 1366–1376, Aug 2005.

[40] A. Leshem and E. Zehavi, “Smart carrier sensing for distributed computation of the generalized nash bargaining solution,” in 2011 17th International Conference on Digital Signal Processing (DSP), July 2011, pp. 1–5.

[41] K. Cohen, A. Leshem, and E. Zehavi, “Game theoretic aspects of the multi-channel ALOHA protocol in cognitive radio networks,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 11, pp. 2276–2288, November 2013.

[42] K. Cohen and A. Leshem, “Distributed throughput maximization for multi-channel ALOHA networks,” in 2013 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec 2013, pp. 456–459.

[43] J. Sun, E. Modiano, and L. Zheng, “Wireless channel allocation using an auction algorithm,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 5, pp. 1085–1096, May 2006.

[44] Z. Han, R. Zheng, and H. V. Poor, “Repeated auctions with Bayesian nonparametric learning for spectrum access in cognitive radio networks,” IEEE Transactions on Wireless Communications, vol. 10, no. 3, pp. 890–900, March 2011.

[45] A. Mukherjee and H. M. Kwon, “General auction-theoretic strategies for distributed partner selection in cooperative wireless networks,” IEEE Transactions on Communications, vol. 58, no. 10, pp. 2903–2915, October 2010.

[46] H. B. Chang and K. C. Chen, “Auction-based spectrum management of cognitive radio networks,” IEEE Transactions on Vehicular Technology, vol. 59, no. 4, pp. 1923–1935, May 2010.

[47] K. Yang, N. Prasad, and X. Wang, “An auction approach to resource allocation in uplink OFDMA systems,” IEEE Transactions on Signal Processing, vol. 57, no. 11, pp. 4482–4496, Nov 2009.

[48] M. Bayati, B. Prabhakar, D. Shah, and M. Sharma, “Iterative scheduling algorithms,” in IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications, May 2007, pp. 445–453.

[49] M. Bayati, D. Shah, and M. Sharma, “Max-product for maximum weight matching: Convergence, correctness, and LP duality,” IEEE Transactions on Information Theory, vol. 54, no. 3, pp. 1241–1251, March 2008.

[50] O. Naparstek and A. Leshem, “A fast matching algorithm for asymptotically optimal distributed channel assignment,” in 2013 18th International Conference on Digital Signal Processing (DSP), July 2013, pp. 1–6.

[51] J. Meredith, “Study on channel model for frequency spectrum above 6 ghz,” 3GPP TR 38.900, Jun, Tech. Rep., 2016.

[52] S. Sun, T. S. Rappaport, S. Rangan, T. A. Thomas, A. Ghosh, I. Z. Kovacs, I. Rodriguez, O. Koymen, A. Partyka, and J. Jarvelainen, “Propagation path loss models for 5g urban micro-and macro-cellular scenarios,” in Vehicular Technology Conference (VTC Spring), 2016 IEEE 83rd. IEEE, 2016, pp. 1–6.

[53] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” Journal of the American Statistical Association, vol. 58, no. 301, pp. 13–30, 1963.

designed for accessibility and to further open science