Remote Contextual Bandits / Pase, F.; Gunduz, D.; Zorzi, M. - (2022), pp. 1665-1670. (Paper presented at the 2022 IEEE International Symposium on Information Theory, ISIT 2022, held in Finland, 2022) [DOI: 10.1109/ISIT50566.2022.9834399].
Remote Contextual Bandits
Pase F.; Gunduz D.; Zorzi M.
2022
Abstract
We consider a remote contextual multi-armed bandit (CMAB) problem, in which the decision-maker observes the context and the reward, but must communicate the actions to be taken by the agents over a rate-limited communication channel. This can model, for example, a personalized ad placement application, where the content owner observes the individual visitors to its website, and hence has the context information, but must convey the ads to be shown to each visitor to a separate entity that manages the marketing content. In this remote CMAB (R-CMAB) problem, the constraint on the communication rate between the decision-maker and the agents imposes a trade-off between the number of bits sent per agent and the acquired average reward. We are particularly interested in characterizing the rate required to achieve sub-linear regret. Consequently, this can be considered a policy compression problem, where the distortion metric is induced by the learning objectives. We first study the fundamental information-theoretic limits of this problem by letting the number of agents go to infinity, and analyze the regret achieved when the Thompson sampling strategy is adopted. In particular, we identify two distinct rate regions resulting in linear and sub-linear regret behavior, respectively. Then, we provide upper bounds for the achievable regret when the decision-maker can reliably transmit the policy without distortion.
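As a concrete illustration of the setting described in the abstract, the sketch below runs a standard discrete-context Thompson sampling loop with Bernoulli rewards and Beta posteriors. It is a minimal sketch, not the paper's algorithm: the instance sizes (`n_contexts`, `n_arms`, `horizon`) and the reward means are illustrative assumptions, and the sampled action is delivered to the agent losslessly, which corresponds to the distortion-free regime for which the abstract reports regret upper bounds rather than to the rate-limited channel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance (illustrative assumptions, not taken from the paper):
# discrete contexts, Bernoulli rewards with unknown per-(context, arm) means.
n_contexts, n_arms, horizon = 3, 4, 5000
true_means = rng.uniform(0.1, 0.9, size=(n_contexts, n_arms))

# Beta(1, 1) priors for every (context, arm) pair.
alpha = np.ones((n_contexts, n_arms))
beta = np.ones((n_contexts, n_arms))

regret = 0.0
for t in range(horizon):
    ctx = rng.integers(n_contexts)           # decision-maker observes the context
    theta = rng.beta(alpha[ctx], beta[ctx])  # one posterior sample per arm
    action = int(np.argmax(theta))           # Thompson sampling action
    # In the R-CMAB setting this action (or the policy it is drawn from) would be
    # conveyed to the agent over a rate-limited channel; here it is sent losslessly.
    reward = float(rng.random() < true_means[ctx, action])
    alpha[ctx, action] += reward
    beta[ctx, action] += 1.0 - reward
    regret += true_means[ctx].max() - true_means[ctx, action]

print(f"cumulative regret after {horizon} rounds: {regret:.1f}")
```

Under a rate constraint, the decision-maker would instead send a compressed description of the action (or of the sampled policy), and the distortion introduced by that compression is what separates the linear and sub-linear regret regions identified in the abstract.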
| File | Type | Size | Format | Access |
|---|---|---|---|---|
| Remote_Contextual_Bandits.pdf | VOR - Version published by the publisher | 1.07 MB | Adobe PDF | Restricted access |
| 2202.05182.pdf | AO - Author's original version submitted for publication | 359.06 kB | Adobe PDF | Open access |