1b:["$","$L29",null,{"isWhiteLabelled":false,"children":["$","$Lb",null,{"pt":{"compact":0,"expanded":3},"children":[["$","$L2a",null,{"noStar":true,"publisher":true,"task":true,"params":true,"size":"xl","product":{"id":"eyJwYXBlcklEIjoiMjAwMS4wNTQ5NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","updated":"2020-01-15T19:00:00.000Z","paperID":"2001.05497","published":"2020-01-15T19:00:00.000Z","authors":"[\"Max Hopkins\",\"Daniel Kane\",\"Shachar Lovett\",\"Gaurav Mahajan\"]","title":"Noise-tolerant, Reliable Active Classification with Comparison Queries","scoreTrending":null,"summary":"With the explosion of massive, widely available unlabeled data in the past\nyears, finding label and time efficient, robust learning algorithms has become\never more important in theory and in practice. We study the paradigm of active\nlearning, in which algorithms with access to large pools of data may adaptively\nchoose what samples to label in the hope of exponentially increasing\nefficiency. By introducing comparisons, an additional type of query comparing\ntwo points, we provide the first time and query efficient algorithms for\nlearning non-homogeneous linear separators robust to bounded (Massart) noise.\nWe further provide algorithms for a generalization of the popular Tsybakov low\nnoise condition, and show how comparisons provide a strong reliability\nguarantee that is often impractical or impossible with only labels - returning\na classifier that makes no errors with high probability.","lastCheckedForCode":"2022-09-04T19:36:05.475Z","links":[{"id":"eyJ1cmwiOiJodHRwczovL3BhcGVyc3dpdGhjb2RlLmNvbS9wYXBlci9ub2lzZS10b2xlcmFudC1yZWxpYWJsZS1hY3RpdmUtY2xhc3NpZmljYXRpb24ifQ==","type":"pwc","url":"https://paperswithcode.com/paper/noise-tolerant-reliable-active-classification","data":null}],"reposConnection":{"edges":[]},"models":[],"tags":[{"id":"eyJuYW1lIjoiYWN0aXZlIGxlYXJuaW5nIiwidHlwZSI6InRhc2sifQ==","name":"active learning","description":"In active learning, the model 
queries the user for labels on specific data points it finds difficult to classify. This method is used when labeled data is scarce or expensive to obtain, allowing the model to learn effectively with fewer labeled examples.","scoreTrending":null,"count":{"stars":5418,"papers":2595,"models":1881},"__typename":"Tag"}],"summaries":[],"emailsConnection":{"edges":[{"author":"max hopkins","node":{"id":"eyJhZGRyZXNzIjoibm1ob3BraW5AZW5nLnVjc2QuZWR1In0=","address":"nmhopkin@eng.ucsd.edu","name":null,"avatar":null,"linkedin":null,"bio":null,"site":null,"override":null,"membership":[],"paper":[{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}}],"github":[],"scholar":[{"thirdPartyID":"4MjpulEAAAAJ"}],"twitter":[],"location":[],"owner":[{"id":"eyJ1aWQiOiJkOWJhZTFmMi0xZmNhLTRkYmItOGM2Ny03MGJiYmUzNmViMmEifQ==","name":"max 
hopkins","github":[],"email":[],"authored":[{"id":"eyJwYXBlcklEIjoiMTkwNy4wMzgxNiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1907.03816"},{"id":"eyJwYXBlcklEIjoiMTcwOS4wMDY0OSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1709.00649"},{"id":"eyJwYXBlcklEIjoiMjAwMS4wNTQ5NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2001.05497"},{"id":"eyJwYXBlcklEIjoiMjExMS4wNDc0NiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2111.04746"},{"id":"eyJwYXBlcklEIjoiMjAwNC4xMTM4MCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2004.11380"},{"id":"eyJwYXBlcklEIjoiMjEwMi4wNTA0NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2102.05047"},{"id":"eyJwYXBlcklEIjoiMjIwMS4wOTQzMyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2201.09433"},{"id":"eyJwYXBlcklEIjoiMjMwMi4wNjI4NSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2302.06285"},{"id":"eyJwYXBlcklEIjoiMTkxMC4wODA3NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1910.08077"},{"id":"eyJwYXBlcklEIjoiNTM4NjgiLCJwdWJsaXNoZXIiOiJuZXVyaXBzIn0=","publisher":"neurips","paperID":"53868"}]}]}},{"author":"daniel kane","node":{"id":"eyJhZGRyZXNzIjoiZGFrYW5lQGVuZy51Y3NkLmVkdSJ9","address":"dakane@eng.ucsd.edu","name":"Daniel Kane","avatar":"https://img.fullcontact.com/static/ab42ed791ee97bd60dc1b719ddee5457_1db86a10ac89629591430a80f222a265fbe074ff58d56c980b1741f468e52f04","linkedin":"https://www.linkedin.com/in/danielkane","bio":null,"site":"http://jacobsschoolofengineering.blogspot.com/","override":null,"membership":[{"name":"UC San Diego Jacobs School of 
Engineering"}],"paper":[{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}}],"github":[],"scholar":[{"thirdPartyID":"DulpV-cAAAAJ"}],"twitter":[],"location":[],"owner":[{"id":"eyJ1aWQiOiI1M2M4NjgyYi0zNjhhLTQ2ZTktYjU5ZC02ODM0NDlmYWIzNjYifQ==","name":"daniel m kane","github":[],"email":[{"avatar":"https://img.fullcontact.com/static/ab42ed791ee97bd60dc1b719ddee5457_1db86a10ac89629591430a80f222a265fbe074ff58d56c980b1741f468e52f04"}],"authored":[{"id":"eyJwYXBlcklEIjoiMTcwNC4wMzU2NCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1704.03564"},{"id":"eyJwYXBlcklEIjoiMTkxMS4wNzk3MSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1911.07971"},{"id":"eyJwYXBlcklEIjoiMTYwNi4wNzM4NCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1606.07384"},{"id":"eyJwYXBlcklEIjoiMjAxMi4wMjExOSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2012.02119"},{"id":"eyJwYXBlcklEIjoiMjAwNS4wNjQxNyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2005.06417"},{"id":"eyJwYXBlcklEIjoiMTcwNS4wMTcyMCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1705.01720"},{"id":"eyJwYXBlcklEIjoiMTcxMS4wNTg5MyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1711.05893"},{"id":"eyJwYXBlcklEIjoiMTkwNy4wMzgxNiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1907.03816"},{"id":"eyJwYXBlcklEIjoiMjEwOC4wODc2NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2108.08767"},{"id":"eyJwYXBlcklEIjoiMTkwMi4wNTg3NiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1902.05876"},{"id":"eyJwYXBlcklEIjoiMjMwMi4wNjUxMiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2302.06512"},{"id":"eyJwYXBlcklEIjoiMjIwNy4xNDI2NiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2207.142
66"},{"id":"eyJwYXBlcklEIjoiMjAwMS4wNTQ5NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2001.05497"},{"id":"eyJwYXBlcklEIjoiMjAxMC4wMTcwNSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2010.01705"},{"id":"eyJwYXBlcklEIjoiMTkxMS4wODA4NSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1911.08085"},{"id":"eyJwYXBlcklEIjoiMTkwMi4wNDcyOCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1902.04728"},{"id":"eyJwYXBlcklEIjoiMjExMS4wNDc0NiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2111.04746"},{"id":"eyJwYXBlcklEIjoiMjMxMi4xNjYxNiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2312.16616"},{"id":"eyJwYXBlcklEIjoiMjEwNi4wNzc3OSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2106.07779"},{"id":"eyJwYXBlcklEIjoiMjIwMi4wNTQ0NCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2202.05444"},{"id":"eyJwYXBlcklEIjoiMjAwNC4xMTM4MCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2004.11380"},{"id":"eyJwYXBlcklEIjoiMjQwNi4wMjYyOCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2406.02628"},{"id":"eyJwYXBlcklEIjoiMjEwMi4wNTA0NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2102.05047"},{"id":"eyJwYXBlcklEIjoiMjEwMi4wNTYyOSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2102.05629"},{"id":"eyJwYXBlcklEIjoiMjMwMy4wNTQ4NSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2303.05485"},{"id":"eyJwYXBlcklEIjoiMjMwNi4xNjM1MiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2306.16352"},{"id":"eyJwYXBlcklEIjoiMjMwMi4wNjI4NSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2302.06285"},{"id":"eyJwYXBlcklEIjoiMjMxMS4xMzE1NCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2311.13154"},{"id":"eyJwYXBlcklEIjoiMjMwMi4xMjk0MCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2302.12940"},{"id":"eyJwYXBlcklEIjoiMjMwNy4wODQzOCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publ
isher":"arxiv","paperID":"2307.08438"},{"id":"eyJwYXBlcklEIjoiMjMwOC4wMDA4OSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2308.00089"},{"id":"eyJwYXBlcklEIjoiNTI5MDMiLCJwdWJsaXNoZXIiOiJuZXVyaXBzIn0=","publisher":"neurips","paperID":"52903"},{"id":"eyJwYXBlcklEIjoiMjMxMC4xNTkzMiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2310.15932"},{"id":"eyJwYXBlcklEIjoiNzEyNzkiLCJwdWJsaXNoZXIiOiJuZXVyaXBzIn0=","publisher":"neurips","paperID":"71279"},{"id":"eyJwYXBlcklEIjoiNzA1NTYiLCJwdWJsaXNoZXIiOiJuZXVyaXBzIn0=","publisher":"neurips","paperID":"70556"},{"id":"eyJwYXBlcklEIjoiMjQwNC4wMDUyOSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2404.00529"},{"id":"eyJwYXBlcklEIjoiMjQwOC4xNzE2NSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2408.17165"}]}]}},{"author":"shachar lovett","node":{"id":"eyJhZGRyZXNzIjoic2xvdmV0dEBjcy51Y3NkLmVkdSJ9","address":"slovett@cs.ucsd.edu","name":null,"avatar":null,"linkedin":null,"bio":null,"site":null,"override":null,"membership":[],"paper":[{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}}],"github":[],"scholar":[{"thirdPartyID":"f6JF7BkAAAAJ"}],"twitter":[],"location":[],"owner":[{"id":"eyJ1aWQiOiJhNThhMjA3OC1jODFlLTQzOGItYjUzOS03N2Y2YTI0ZmNjYWYifQ==","name":"shachar 
lovett","github":[],"email":[],"authored":[{"id":"eyJwYXBlcklEIjoiMTcwNC4wMzU2NCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1704.03564"},{"id":"eyJwYXBlcklEIjoiMjEwMy4xMDg5NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2103.10897"},{"id":"eyJwYXBlcklEIjoiMTcwNS4wMTcyMCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1705.01720"},{"id":"eyJwYXBlcklEIjoiMTkwNy4wMzgxNiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1907.03816"},{"id":"eyJwYXBlcklEIjoiMjAwMS4wNTQ5NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2001.05497"},{"id":"eyJwYXBlcklEIjoiMjExMS4wNDc0NiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2111.04746"},{"id":"eyJwYXBlcklEIjoiMjIwMi4wNTQ0NCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2202.05444"},{"id":"eyJwYXBlcklEIjoiMjAwNC4xMTM4MCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2004.11380"},{"id":"eyJwYXBlcklEIjoiMjEwMi4wNTA0NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2102.05047"},{"id":"eyJwYXBlcklEIjoiMjMwMi4wNjI4NSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2302.06285"},{"id":"eyJwYXBlcklEIjoiMjMwMi4xMjk0MCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2302.12940"}]}]}},{"author":"gaurav 
mahajan","node":{"id":"eyJhZGRyZXNzIjoiZ21haGFqYW5AZW5nLnVjc2QuZWR1In0=","address":"gmahajan@eng.ucsd.edu","name":null,"avatar":null,"linkedin":null,"bio":null,"site":null,"override":null,"membership":[],"paper":[{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}}],"github":[{"avatar":"https://avatars.githubusercontent.com/u/30613119?v=4","username":"gomahajan"}],"scholar":[{"thirdPartyID":"3kvq284AAAAJ"}],"twitter":[],"location":[],"owner":[{"id":"eyJ1aWQiOiJjNjNlMGRiMS05OWUzLTQ1NWMtYWMzYS1kMjAzZWYzZGUxOTEifQ==","name":"gaurav mahajan","github":[],"email":[],"authored":[{"id":"eyJwYXBlcklEIjoiMTkwOC4wMDI2MSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"1908.00261"},{"id":"eyJwYXBlcklEIjoiMjEwMy4xMDg5NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2103.10897"},{"id":"eyJwYXBlcklEIjoiMjAwMi4wNzEyNSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2002.07125"},{"id":"eyJwYXBlcklEIjoiMjAwMS4wNTQ5NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2001.05497"},{"id":"eyJwYXBlcklEIjoiMjExMS4wNDc0NiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2111.04746"},{"id":"eyJwYXBlcklEIjoiMjIwMi4wNTQ0NCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2202.05444"},{"id":"eyJwYXBlcklEIjoiMjAwNC4xMTM4MCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2004.11380"},{"id":"eyJwYXBlcklEIjoiMjMwMi4wNjI4NSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2302.06285"},{"id":"eyJwYXBlcklEIjoiMjIwMi4xMDY0MCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2202.10640"},{"id":"eyJwYXBlcklEIjoiMjIwMS4wMzgwNiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2201.
03806"},{"id":"eyJwYXBlcklEIjoiMjMwMi4xMjk0MCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2302.12940"}]}]}}]},"__typename":"paper","authorArray":["Max Hopkins","Daniel Kane","Shachar Lovett","Gaurav Mahajan"]}}],["$","$L18",null,{"container":true,"columns":100,"spacing":{"compact":0,"expanded":2,"large":3},"children":[["$","$L18",null,{"size":{"compact":100,"expanded":100,"large":68},"children":[["$","$7",null,{"children":["$","$L2b",null,{"publisher":"arxiv","paperID":"2001.05497","product":{"paper":"$1b:props:children:props:children:0:props:product","models":"$1b:props:children:props:children:0:props:product:models"},"isWhiteLabelled":false}]}],["$","$7",null,{"children":["$","$L2c",null,{"article":"$L2d","model":"$undefined"}]}]]}],["$","$L18",null,{"size":"grow","children":["$","$L2e",null,{}]}]]}],["$","$7",null,{"children":null}],[["$","audio",null,{"id":"tts"}],["$","$L2f",null,{"paperID":"2001.05497","publisher":"arxiv","paperJSON":{"title":"Noise-tolerant, Reliable Active Classification with Comparison Queries","paperID":"2001.05497","avgLineHeight":12.01,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"With the explosion of massive, widely available unlabeled data in the past years, finding label and time efficient, robust learning algorithms has become ever more important in theory and in practice. We study the paradigm of active learning, in which algorithms with access to large pools of data may adaptively choose what samples to label in the hope of exponentially increasing efficiency. By introducing comparisons, an additional type of query comparing two points, we provide the first time and query efficient algorithms for learning non-homogeneous linear separators robust to bounded (Massart) noise. 
We further provide algorithms for a generalization of the popular Tsybakov low noise condition, and show how comparisons provide a strong reliability guarantee that is often impractical or impossible with only labels - returning a classifier that makes no errors with high probability.","element":"span"}]]},{"heading":"1 Introduction","paragraphs":[[{"text":"Due to the now ubiquitous presence of massive unlabeled datasets, recent years have seen an explosion in the search for computationally efficient, noise tolerant learning strategies that minimize the required amount of labeled data to learn a classifier. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Active learning ","element":"span"},{"text":"is a formalization of the PAC-learning paradigm for unlabeled data. In active learning, the learning algorithm has access both to a stream or pool of unlabeled data and to an oracle which can label the data on request. The complexity of learning certain classes is then defined by their query complexity, the number of oracle calls required to almost learn the classifier with high probability. The goal in active learning is to adaptively choose data to send to the oracle in such a way that one uses far fewer queries than in the labeled case.","element":"span"}],[{"text":"While active learning saw initial success in the noise-free regime with simple concept classes such as thresholds in one dimension, lower bounds [","element":"span"},{"href":"#id-0","referenceIndex":1,"text":"1","element":"a"},{"text":"] soon showed that important classes such as linear separators gave no improvement over PAC-learning, even in only two dimensions. However, subsequent work showed that slight tweaks to the model could overcome this barrier. 
Balcan and Long [","element":"span"},{"href":"#id-1","referenceIndex":2,"text":"2","element":"a"},{"text":"] showed that, when the data is drawn from a log-concave distribution (a wide class that includes Gaussian distributions and uniform distributions over convex sets), homogeneous (through the origin) linear separators can be learned with exponentially fewer queries than in the PAC model. Later, Balcan and Zhang [","element":"span"},{"href":"#id-2","referenceIndex":3,"text":"3","element":"a"},{"text":"] extended this to the more general class of s-concave distributions, a generalization of log-concavity that includes fat-tailed distributions as well. Rather than restricting the power of the adversary, Kane, Lovett, Moran, and Zhang [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"] studied the effect on query complexity of empowering the learner. By allowing the learner to ask more complicated questions of the oracle, such as comparing two points, Kane et al. [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"] showed that non-homogeneous linear separators in two dimensions can be learned with exponentially fewer labeled samples than in the PAC case. Later, Kane, Lovett, and Moran [","element":"span"},{"href":"#id-4","referenceIndex":5,"text":"5","element":"a"},{"text":"] extended this to higher dimensions using a complicated set of queries, and Hopkins, Kane, and Lovett [","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"] did the same by assuming weak concentration and anti-concentration on the distribution – conditions once again satisfied by s-concave distributions.","element":"span"}],[{"text":"While query efficient algorithms in high dimensions are an important step towards the use of active learning on real world data, it is equally important that algorithms be computationally efficient and noise tolerant. 
In an early work, Castro and Nowak [","element":"span"},{"href":"#id-6","referenceIndex":7,"text":"7","element":"a"},{"text":"] provided query efficient algorithms for thresholding in one dimension in the presence of bounded (Massart [","element":"span"},{"href":"#id-7","referenceIndex":8,"text":"8","element":"a"},{"text":"]) and unbounded (Tsybakov [","element":"span"},{"href":"#id-8","referenceIndex":9,"text":"9","element":"a"},{"text":"]) noise under the uniform distribution on ","element":"span"},{"text":"[0","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1]","element":"span"},{"text":". Soon after, Balcan, Broder, and Zhang [","element":"span"},{"href":"#id-9","referenceIndex":10,"text":"10","element":"a"},{"text":"] extended these results to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"-dimensional homogeneous hyperplanes over a uniform distribution on a ball. Years later, Hanneke [","element":"span"},{"href":"#id-10","referenceIndex":11,"text":"11","element":"a"},{"text":"] offered a more general analysis for Tsybakov noise based on the disagreement coefficient, a distributional complexity measure, and later with Yang provided a distribution-free analysis [","element":"span"},{"href":"#id-11","referenceIndex":12,"text":"12","element":"a"},{"text":"]. 
In another vein of work, Balcan and Long [","element":"span"},{"href":"#id-1","referenceIndex":2,"text":"2","element":"a"},{"text":"] provided an algorithm for learning ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"-dimensional homogeneous hyperplanes over nearly isotropic log-concave distributions with optimal query complexity for Tsybakov noise [","element":"span"},{"href":"#id-12","referenceIndex":13,"text":"13","element":"a"},{"text":"], a result which was later extended by Awasthi, Balcan, Haghtalab and Urner [","element":"span"},{"href":"#id-13","referenceIndex":14,"text":"14","element":"a"},{"text":"] to be computationally efficient for Massart noise when the distribution is restricted to the uniform distribution over the unit ball. Similarly, Balcan and Zhang [","element":"span"},{"href":"#id-2","referenceIndex":3,"text":"3","element":"a"},{"text":"] gave a computationally efficient algorithm for learning under the more difficult adversarial noise model over s-concave distributions. Concurrently, Xu et al. [","element":"span"},{"href":"#id-14","referenceIndex":15,"text":"15","element":"a"},{"text":"] proposed using comparison queries as a subroutine in previous algorithms to deal with noise in a computationally efficient manner, improving the overall query complexity along the way.","element":"span"}],[{"text":"The comparison based methods of Xu et al. [","element":"span"},{"href":"#id-14","referenceIndex":15,"text":"15","element":"a"},{"text":"], however, do not carry over to the algorithmic technique proposed by Kane et al. [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"] for learning non-homogeneous linear separators. Kane et al.’s technique is based upon logical inference. Viewing concept classes as the sign of an underlying family of functions, they build a learner via a linear program with constraints given by query answers. 
As a result, the learners created by Kane et al.’s method actually fall into a stronger model than PAC-learning called Reliably and Probably Useful (RPU)-learning [","element":"span"},{"href":"#id-15","referenceIndex":16,"text":"16","element":"a"},{"text":"], variants of which have been studied more recently under a variety of names (e.g. KWIK learning [","element":"span"},{"href":"#id-16","referenceIndex":17,"text":"17","element":"a"},{"text":"], perfect selective classification [","element":"span"},{"href":"#id-17","referenceIndex":18,"text":"18","element":"a"},{"text":"], or confident learning [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"]). In this model, the learner is not allowed to err, but may instead output “I don’t know” a small fraction of the time. While Kane et al.’s RPU-learner is computationally efficient, it is not tolerant to noise – the linear program is sensitive to errors in both labels and comparisons. This raises a natural question: can the inference based algorithms of Kane et al. be extended to noisy scenarios, and if so, does a strong reliability guarantee remain? In this work we answer these questions in the positive for Massart and Tsybakov noise. In both cases our algorithms satisfy a noisy version of RPU-learning: with high probability the learner makes no errors at all. Due to their similarity to RPU-learners, we call learners that satisfy this property ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Almost Reliable and Probably Useful ","element":"span"},{"text":"(ARPU). Indeed, taking the limit of our reliability condition returns exactly the RPU model. Our work provides the first query and computationally efficient algorithm for PAC or ARPU-learning non-homogeneous linear separators in the presence of Massart noise over s-concave distributions, as well as more generally for hypothesis classes with finite inference dimension or small average inference dimension. 
In addition, we provide the first algorithm for ARPU-learning non-homogeneous linear separators under the Tsybakov Low Noise Condition.","element":"span"}],[{"text":"Similar to how Xu et al. ","element":"span"},{"text":"[","element":"span"},{"href":"#id-14","referenceIndex":15,"text":"15","element":"a"},{"text":"] use comparisons as a subroutine for correcting label errors, we use an approximate sorting scheme (modified from the seminal work of Braverman and Mossel [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":"] on sorting with noisy comparisons) to create a small set of points whose labels ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"text":"comparisons are correct with high probability. We then feed this cleaned set into an inference LP, and repeat the process in a boosting-style algorithm based on the framework of [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"]. By carefully curating the cleaned set at each step, we are able to use a symmetry argument from [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"] to prove that our learners have good coverage, while the guarantees of [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":"] and the inference framework give reliability.","element":"span"}],[{"text":"Our algorithms require the use of comparison queries, an addition which we show is necessary in many cases for active PAC and ARPU-learning. 
Along with recalling lower bounds from [","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"] which show comparisons are necessary for efficiently active learning non-homogeneous hyperplanes, we show that in the noiseless case it is impossible to ARPU-learn the uniform distribution over ","element":"span"},{"style":{"height":13.38},"width":42.73,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-0.png","element":"img","alt":" S1","inline":true,"padRight":true},{"text":"in a finite number of label queries. Further, even with the addition of a margin assumption we show the existence of simple distributions which require a number of label queries that is exponential in dimension. Because Massart noise and certain instantiations of Tsybakov noise subsume the noiseless case, these results prove the existence of a large gap between labels and comparisons for noisy ARPU-learning.","element":"span"}],[{"text":"Our paper proceeds as follows. ","element":"span"},{"text":"In Sections ","element":"span"},{"href":"#id-19","text":"1.1, ","element":"a"},{"href":"#id-20","text":"1.2, ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-21","text":"1.3 ","element":"a"},{"text":"we cover preliminaries, our main results, and our main techniques respectively. In Section ","element":"span"},{"text":"2 ","element":"span"},{"text":"we present query and computationally efficient algorithms for ARPU-learning hypothesis classes with finite inference dimension or super exponential average inference dimension under the Massart noise model, as well as a lower bound for ARPU-learning ","element":"span"},{"style":{"height":13.38},"width":42.74,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-1.png","element":"img","alt":" S1 ","inline":true,"padRight":true},{"text":"using only labels. 
In Section ","element":"span"},{"text":"3 ","element":"span"},{"text":"we present algorithms for ARPU-learning linear separators with margin and finite inference dimension or over distributions with weak distributional conditions under the Tsybakov Low Noise Condition, as well as a lower bound for ARPU-learning a corresponding distribution with margin using only labels.","element":"span"}],[{"id":"id-19","style":{"fontWeight":"bold"},"text":"1.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Preliminaries","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.1.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Basic definitions","element":"span"}],[{"text":"A hypothesis class is a pair ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":")","element":"span"},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"is a set, and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"is a class of functions ","element":"span"},{"style":{"height":11.2},"width":178.88,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-2.png","element":"img","alt":" h: X → R","inline":true},{"text":". Each function ","element":"span"},{"style":{"height":12},"width":116.52,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-3.png","element":"img","alt":"h ∈ H","inline":true,"padRight":true},{"text":"is called a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"hypothesis","element":"span"},{"text":". 
We refer to ","element":"span"},{"style":{"height":16},"width":432.12,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-4.png","element":"img","alt":" CH = {sign(h): h ∈ H}","inline":true,"padRight":true},{"text":"as the associated ","element":"span"},{"style":{"fontStyle":"italic"},"text":"concept ","element":"span"},{"text":"class. For example, when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"is the class of ","element":"span"},{"style":{"height":13.79},"width":138.38,"height":34.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-5.png","element":"img","alt":" Rd → R","inline":true,"padRight":true},{"text":"affine functions, then the associated concept class ","element":"span"},{"style":{"height":13.99},"width":55.48,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-6.png","element":"img","alt":" CH","inline":true,"padRight":true},{"text":"is the class of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"-dimensional half-spaces.","element":"span"}],[{"text":"We consider the binary classification problem, where we want to predict the binary label ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"for each instance ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":". 
We assume access to an underlying unknown distribution ","element":"span"},{"style":{"height":13.19},"width":59.99,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-7.png","element":"img","alt":" DX","inline":true,"padRight":true},{"text":"over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"and a label oracle ","element":"span"},{"style":{"height":14},"width":53.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-8.png","element":"img","alt":" QL","inline":true},{"text":". Querying ","element":"span"},{"style":{"height":14},"width":53.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-9.png","element":"img","alt":" QL","inline":true,"padRight":true},{"text":"with unlabeled ","element":"span"},{"style":{"height":11.6},"width":106.56,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-10.png","element":"img","alt":" x ∈ X","inline":true,"padRight":true},{"text":"generates a label ","element":"span"},{"style":{"height":16},"width":109.57,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-11.png","element":"img","alt":" QL(x)","inline":true},{"text":", drawn from unknown distribution ","element":"span"},{"style":{"height":16},"width":209.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-12.png","element":"img","alt":" P(QL(x)|x).","inline":true,"padRight":true},{"text":"Note that querying ","element":"span"},{"style":{"height":14},"width":53.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-13.png","element":"img","alt":" QL","inline":true,"padRight":true},{"text":"on the same point again would generate the same answer. 
We use notation ","element":"span"},{"style":{"height":13.19},"width":54.99,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-14.png","element":"img","alt":" DL","inline":true,"padRight":true},{"text":"to denote the joint distribution over examples ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"and labels from ","element":"span"},{"style":{"height":14},"width":66.36,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-15.png","element":"img","alt":" QL:","inline":true}],[{"style":{"width":"33%"},"width":619,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/2-16.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"1.1.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"PAC-Learning","element":"span"}],[{"text":"Probably Approximately Correct (PAC) learning is a probabilistic framework due to Valiant [","element":"span"},{"href":"#id-22","referenceIndex":20,"text":"20","element":"a"},{"text":"] and Vapnik and Chervonenkis [","element":"span"},{"href":"#id-23","referenceIndex":21,"text":"21","element":"a"},{"text":"] for learning adversarially chosen classifiers and input distributions. 
In this model, given a set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"and a set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"of hypotheses ","element":"span"},{"style":{"height":11.2},"width":183.28,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-0.png","element":"img","alt":" h : X → R","inline":true},{"text":", an adversary first chooses distribution ","element":"span"},{"style":{"height":13.19},"width":271.05,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-1.png","element":"img","alt":" DL over X × Y","inline":true,"padRight":true},{"text":"with the marginal distribution ","element":"span"},{"style":{"height":16},"width":855.77,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-2.png","element":"img","alt":" DX over X. If Y = sign(h⋆(X)) for some h⋆ ∈ H","inline":true},{"text":", we call this realizable case learning. 
With no knowledge of the choice of distribution, the learner draws labeled samples from ","element":"span"},{"style":{"height":13.19},"width":55,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-3.png","element":"img","alt":" DL","inline":true,"padRight":true},{"text":"with the goal of outputting ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"text":"= sign(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":") ","element":"span"},{"text":"for some hypothesis ","element":"span"},{"style":{"height":12},"width":105.68,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-4.png","element":"img","alt":" h ∈ H","inline":true,"padRight":true},{"text":"that minimizes the loss over ","element":"span"},{"style":{"height":13.19},"width":67.85,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-5.png","element":"img","alt":" DL:","inline":true}],[{"style":{"width":"26%"},"width":502,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-6.png","element":"img"}],[{"text":"In the realizable case, a hypothesis class ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"text":"is called PAC-learnable if ","element":"span"},{"style":{"height":14.8},"width":77.44,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-7.png","element":"img","alt":" ∀ε, δ","inline":true},{"text":", there exists a learner ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":" which, no matter the choice of the adversary, outputs a concept 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":") ","element":"span"},{"text":"such that:","element":"span"}],[{"style":{"width":"24%"},"width":463,"height":66,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-8.png","element":"img"}],[{"text":"Here ","element":"span"},{"style":{"height":16},"width":187.98,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-9.png","element":"img","alt":" n = n(ε, δ)","inline":true,"padRight":true},{"text":"is called the sample complexity, and must be ","element":"span"},{"style":{"height":19.37},"width":366.86,"height":48.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-10.png","element":"img","alt":" poly( 1ε, 1δ ) for (X, H)","inline":true,"padRight":true},{"text":"to be PAC-learnable.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.1.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"RPU-Learning","element":"span"}],[{"text":"Reliable and Probably Useful (RPU) learning is an alternative learning framework in which the learner is not allowed to make errors, but may instead respond “I don’t know”, notated by “","element":"span"},{"style":{"height":10.8},"width":31,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-11.png","element":"img","alt":"⊥","inline":true},{"text":"”. Introduced by Rivest and Sloan [","element":"span"},{"href":"#id-15","referenceIndex":16,"text":"16","element":"a"},{"text":"], RPU learning was later studied under the name of Perfect Selective Classification by El-Yaniv and Weiner [","element":"span"},{"href":"#id-17","referenceIndex":18,"text":"18","element":"a"},{"text":"], and confident learning by Kane, Lovett, Moran, and Zhang [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"]. 
Since it is easy to make a reliable learner by simply always outputting “","element":"span"},{"style":{"height":10.8},"width":31,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-12.png","element":"img","alt":"⊥","inline":true},{"text":"”, our learner must also be useful: with high probability, it cannot output “","element":"span"},{"style":{"height":10.8},"width":31,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-13.png","element":"img","alt":"⊥","inline":true},{"text":"” more than a small fraction of the time. Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"be a reliable learner and let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":") ","element":"span"},{"text":"be the concept returned by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"on training sample ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":". We then define the loss of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":") ","element":"span"},{"text":"as the measure of unlearned samples:","element":"span"}],[{"style":{"width":"35%"},"width":673,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-14.png","element":"img"}],[{"text":"We will commonly refer to ","element":"span"},{"style":{"height":16},"width":688.5,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-15.png","element":"img","alt":" 1 − LDL(A(S)) as the coverage of A(S)","inline":true},{"text":". Sample complexity and learnability are then defined analogously to PAC-learning. 
Note that any point that is not labeled “","element":"span"},{"style":{"height":10.8},"width":31,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-16.png","element":"img","alt":"⊥","inline":true},{"text":"” by an RPU-learner is labeled correctly.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.1.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Comparison Queries","element":"span"}],[{"text":"Following the framework of [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"], our learner will have access to more information than just the label of a point. We focus on one particularly natural additional query: the ability to compare points. A comparison query measures the relative distance of two points to the decision boundary. For example, suppose our goal is to distinguish photographs of diseased patients from those of healthy patients. A comparison query asks: “which patient looks healthier?” Formally, given an underlying function ","element":"span"},{"style":{"height":12},"width":105.68,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-17.png","element":"img","alt":" h ∈ H","inline":true,"padRight":true},{"text":"and two points ","element":"span"},{"style":{"height":10},"width":97.14,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-18.png","element":"img","alt":" x1, x2","inline":true},{"text":", a comparison query asks which of ","element":"span"},{"style":{"height":16},"width":207.43,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-19.png","element":"img","alt":" h(x1), h(x2)","inline":true,"padRight":true},{"text":"is larger. 
Equivalently:","element":"span"}],[{"style":{"width":"22%"},"width":427,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-20.png","element":"img"}],[{"text":"Similar to our label oracle ","element":"span"},{"style":{"height":14},"width":53.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-21.png","element":"img","alt":" QL","inline":true},{"text":", we define a comparison oracle ","element":"span"},{"style":{"height":14.4},"width":315.02,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-22.png","element":"img","alt":" QC. Querying QC","inline":true,"padRight":true},{"text":"with two points ","element":"span"},{"style":{"height":14},"width":182.81,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-23.png","element":"img","alt":" x1, x2 ∈ X","inline":true,"padRight":true},{"text":"generates a comparison result ","element":"span"},{"style":{"height":15.6},"width":187.51,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-24.png","element":"img","alt":" QC(x1, x2)","inline":true},{"text":", which is drawn from an unknown distribution ","element":"span"},{"style":{"height":16},"width":363.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-25.png","element":"img","alt":" P(QC(x1, x2)|x1, x2).","inline":true,"padRight":true},{"text":"Along with their added theoretical power [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":", ","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"], comparison queries are already used in practice in recommender systems ","element":"span"},{"href":"#id-24","referenceIndex":22,"text":"[22] ","element":"a"},{"text":"and ranking systems ","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"[19]","element":"a"},{"text":", and in some scenarios have better accuracy 
than label queries ","element":"span"},{"href":"#id-14","referenceIndex":15,"text":"[15]","element":"a"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.1.5 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Inference Dimension","element":"span"}],[{"text":"Inference dimension is a combinatorial complexity measure introduced by Kane et al. [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"] to characterize the query complexity in active learning when the learner is allowed to ask a more complicated set of questions. Given a set of binary queries ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"text":", let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":") ","element":"span"},{"text":"denote the answers to all such queries on the sample ","element":"span"},{"style":{"height":13.2},"width":240.08,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-26.png","element":"img","alt":" S. Let S ⊆ X","inline":true,"padRight":true},{"text":"be an unlabeled sample. 
For ","element":"span"},{"style":{"height":14.4},"width":373.2,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-27.png","element":"img","alt":" x ∈ X and h ∈ H, let","inline":true}],[{"style":{"width":"11%"},"width":218,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/3-28.png","element":"img"}],[{"text":"denote the statement that the answers to binary queries from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"on the sample ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"determine the label of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":", when the learned concept is ","element":"span"},{"text":"sign(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"))","element":"span"},{"text":", corresponding to a hypothesis ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":". As shorthand, we will often say that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"“infers” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":", and sometimes drop the underlying classifier ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":". In that case, the underlying function is assumed to be the Bayes optimal classifier. Inference dimension with respect to some query set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"is defined as follows.","element":"span"}],[{"id":"id-29","style":{"fontWeight":"bold"},"text":"Definition 1.1 ","element":"span"},{"text":"(Inference dimension)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"The inference dimension of ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is the minimal number ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"style":{"fontStyle":"italic"},"text":"such that for every ","element":"span"},{"style":{"height":14.4},"width":275.48,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-0.png","element":"img","alt":" S ⊆ X of size k","inline":true},{"style":{"fontStyle":"italic"},"text":", and every ","element":"span"},{"style":{"height":12},"width":105.69,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-1.png","element":"img","alt":" h ∈ H","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"there exists ","element":"span"},{"style":{"height":12},"width":272.09,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-2.png","element":"img","alt":" x ∈ S such that","inline":true}],[{"style":{"width":"17%"},"width":328,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"If no such ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"style":{"fontStyle":"italic"},"text":"exists then the inference dimension of ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is defined as 
","element":"span"},{"style":{"height":7.2},"width":51.85,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-4.png","element":"img","alt":" ∞.","inline":true}],[{"text":"Inference dimension is a worst case measure. Since we will be dealing with varying levels of distribution dependence, we will also take advantage of an average case version of inference dimension introduced in ","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"[6]","element":"a"},{"text":".","element":"span"}],[{"id":"id-34","style":{"fontWeight":"bold"},"text":"Definition 1.2 ","element":"span"},{"text":"(Average Inference Dimension)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"We say ","element":"span"},{"style":{"height":16},"width":198.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-5.png","element":"img","alt":" (DX, X, H)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has average inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":", if:","element":"span"}],[{"style":{"width":"51%"},"width":966,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-6.png","element":"img"}],[{"text":"Average inference dimension is used to prove that the inference dimension of a finite sample drawn from ","element":"span"},{"style":{"height":13.19},"width":59.99,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-7.png","element":"img","alt":"DX","inline":true,"padRight":true},{"text":"cannot be too large with high probability. 
This allows us to build query-efficient algorithms for hypothesis classes with infinite inference dimension by proving that, with high probability, large finite samples do not take too many queries to learn.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.1.6 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Noisy Learning","element":"span"}],[{"text":"Before we discuss our relaxation of RPU-learning, we formalize the presence of noise in our distributions. Given a hypothesis class ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":")","element":"span"},{"text":", we assume the Bayes optimal classifier is some hypothesis ","element":"span"},{"style":{"height":12},"width":129.19,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-8.png","element":"img","alt":" h⋆ ∈ H","inline":true,"padRight":true},{"text":"with decision boundary ","element":"span"},{"style":{"height":15.6},"width":166.96,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-9.png","element":"img","alt":" h⋆(x) = 0","inline":true},{"text":". Note that ","element":"span"},{"style":{"height":10.8},"width":36.96,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-10.png","element":"img","alt":" h⋆ ","inline":true,"padRight":true},{"text":"itself can have non-zero error. 
To measure the noise in our model we define the conditional probability distributions ","element":"span"},{"style":{"height":14.4},"width":197.49,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-11.png","element":"img","alt":" βL and βC:","inline":true}],[{"style":{"width":"54%"},"width":1018,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-12.png","element":"img"}],[{"text":"Note that for all the noise models discussed below, querying ","element":"span"},{"style":{"height":14},"width":53.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-13.png","element":"img","alt":" QL","inline":true,"padRight":true},{"text":"on the same point again (and similarly querying ","element":"span"},{"style":{"height":14},"width":55.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-14.png","element":"img","alt":" QC","inline":true,"padRight":true},{"text":"with the same pair of points again) would generate the same answer. This is a realistic model for the case where the oracle is a human expert who may err with some probability across different inputs, but will always return the same answer on the same input.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Massart Noise ","element":"span"},{"text":"Massart, or bounded noise, is a well studied model of noise throughout statistics and learning theory [","element":"span"},{"href":"#id-1","referenceIndex":2,"text":"2","element":"a"},{"text":", ","element":"span"},{"href":"#id-7","referenceIndex":8,"text":"8","element":"a"},{"text":", ","element":"span"},{"href":"#id-14","referenceIndex":15,"text":"15","element":"a"},{"text":"]. 
Massart noise is a tractable and realistic generalization of the standard random classification noise model [","element":"span"},{"href":"#id-25","referenceIndex":23,"text":"23","element":"a"},{"text":"], where the oracle flips its response with probability ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p < ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"2","element":"span"},{"text":". Similar to [","element":"span"},{"href":"#id-13","referenceIndex":14,"text":"14","element":"a"},{"text":", ","element":"span"},{"href":"#id-14","referenceIndex":15,"text":"15","element":"a"},{"text":"], we say “noisy” oracles ","element":"span"},{"style":{"height":14},"width":53.51,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-15.png","element":"img","alt":" QL","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":14},"width":55.51,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-16.png","element":"img","alt":" QC","inline":true,"padRight":true},{"text":"satisfy Massart noise with parameter ","element":"span"},{"style":{"height":11.6},"width":96.38,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-17.png","element":"img","alt":" λ > 0","inline":true,"padRight":true},{"text":"if the conditional label and comparison distributions are such that","element":"span"}],[{"style":{"width":"34%"},"width":645,"height":178,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-18.png","element":"img"}],[{"text":"Equivalently, we say that ","element":"span"},{"style":{"height":16},"width":237.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-19.png","element":"img","alt":" QL (resp. 
QC","inline":true},{"text":") satisfies Massart noise with parameter ","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-20.png","element":"img","alt":" λ","inline":true},{"text":", if an adversary constructs ","element":"span"},{"style":{"height":14},"width":53.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-21.png","element":"img","alt":"QL","inline":true,"padRight":true},{"text":"(resp. ","element":"span"},{"style":{"height":14},"width":55.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-22.png","element":"img","alt":" QC","inline":true},{"text":") by first taking the “clean” oracle ","element":"span"},{"style":{"height":16.83},"width":53.5,"height":42.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-23.png","element":"img","alt":"¯QL","inline":true,"padRight":true},{"text":"(resp. ","element":"span"},{"style":{"height":16.83},"width":55.51,"height":42.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-24.png","element":"img","alt":"¯QC","inline":true},{"text":") and then flipping the result of the oracle with probability at most ","element":"span"},{"style":{"height":19.37},"width":103.62,"height":48.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/4-25.png","element":"img","alt":"12 − λ.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Tsybakov Low Noise Condition ","element":"span"},{"text":"Massart error is restrictive in that the distributions ","element":"span"},{"style":{"height":14.4},"width":44.54,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-0.png","element":"img","alt":" βL","inline":true,"padRight":true},{"text":"and 
","element":"span"},{"style":{"height":14.4},"width":46.54,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-1.png","element":"img","alt":" βC","inline":true,"padRight":true},{"text":"are bounded away from ","element":"span"},{"style":{"height":19.37},"width":16,"height":48.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-2.png","element":"img","alt":"12","inline":true,"padRight":true},{"text":"– in reality, this may not be the case as examples approach the decision boundary. ","element":"span"},{"text":"The Tsybakov Low Noise Condition (TNC) [","element":"span"},{"href":"#id-8","referenceIndex":9,"text":"9","element":"a"},{"text":"] offers an alternative: the closer an example is to the decision boundary, the closer its error to ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"2","element":"span"},{"text":". There is a natural extension of this intuition to comparison queries as well: comparisons made between arbitrarily close points should be arbitrarily noisy. A number of variants of TNC have been studied in the literature. Here we will follow the variant studied in [","element":"span"},{"href":"#id-6","referenceIndex":7,"text":"7","element":"a"},{"text":", ","element":"span"},{"href":"#id-26","referenceIndex":24,"text":"24","element":"a"},{"text":"]. Let ","element":"span"},{"style":{"height":10.8},"width":36.96,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-3.png","element":"img","alt":" h⋆","inline":true,"padRight":true},{"text":"be the Bayes optimal classifier. 
We say ","element":"span"},{"style":{"height":14},"width":53.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-4.png","element":"img","alt":" QL","inline":true,"padRight":true},{"text":"satisfies the Tsybakov Low Noise Condition with parameters ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m < M,","element":"span"}],[{"style":{"width":"85%"},"width":1597,"height":245,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-5.png","element":"img"}],[{"text":"In other words, far away from the decision boundary, ","element":"span"},{"style":{"height":16},"width":100.98,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-6.png","element":"img","alt":" βL(x)","inline":true,"padRight":true},{"text":"satisfies Massart noise, but approaches ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"2 ","element":"span"},{"text":"at a polynomial rate as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"approaches the decision boundary. 
Similarly, ","element":"span"},{"style":{"height":14},"width":55.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-7.png","element":"img","alt":" QC","inline":true,"padRight":true},{"text":"satisfies the Tsybakov Low Noise Condition with parameters, ","element":"span"},{"style":{"height":16},"width":993.62,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-8.png","element":"img","alt":" m < M, ε0 > 0, and κ ≥ 1 (TNC(m, M, κ, ε0)) if ∀x1, x2:","inline":true}],[{"style":{"width":"92%"},"width":1740,"height":177,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-9.png","element":"img"}],[{"text":"Similar to the label case, ","element":"span"},{"style":{"height":15.6},"width":178.54,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-10.png","element":"img","alt":" βC(x1, x2)","inline":true,"padRight":true},{"text":"satisfies Massart noise for pairs of points ","element":"span"},{"style":{"height":10},"width":97.14,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-11.png","element":"img","alt":" x1, x2","inline":true,"padRight":true},{"text":"which differ greatly with respect to ","element":"span"},{"style":{"height":10.99},"width":38.96,"height":27.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-12.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"and approaches 1/2 at a polynomial rate as ","element":"span"},{"style":{"height":16},"width":275.07,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-13.png","element":"img","alt":" h∗(x1) − h∗(x2)","inline":true,"padRight":true},{"text":"approaches ","element":"span"},{"text":"0","element":"span"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Generalized Tsybakov Low Noise Condition ","element":"span"},{"text":"The Tsybakov Low Noise Condition upper and lower bounds 
correctness by a particular function of distance. We will consider the direct generalization of this model where these bounds are replaced with arbitrary monotone increasing functions ","element":"span"},{"style":{"height":16},"width":459.01,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-14.png","element":"img","alt":" gL ≤ gU : [0, ε0] → [0, 1/2]","inline":true},{"text":". We say ","element":"span"},{"style":{"height":14},"width":53.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-15.png","element":"img","alt":" QL","inline":true,"padRight":true},{"text":"satisfies the Generalized Tsybakov Low Noise Condition with parameters ","element":"span"},{"style":{"height":16},"width":297.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-16.png","element":"img","alt":" (gL, gU, ε0) if ∀x:","inline":true}],[{"style":{"width":"84%"},"width":1581,"height":177,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-17.png","element":"img"}],[{"text":"Similarly, we say ","element":"span"},{"style":{"height":14},"width":55.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-18.png","element":"img","alt":" QC","inline":true,"padRight":true},{"text":"satisfies the Generalized Tsybakov Low Noise Condition with parameters ","element":"span"},{"style":{"height":16},"width":230.26,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-19.png","element":"img","alt":" (gL, gU, ε0) if","inline":true}],[{"id":"id-56","style":{"width":"99%"},"width":1867,"height":242,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-20.png","element":"img"}],[{"text":"For notational convenience, we will sometimes write ","element":"span"},{"style":{"height":16},"width":264.02,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-21.png","element":"img","alt":" gL(x) = 
gL(ε0)","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"height":11.19},"width":112.38,"height":27.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-22.png","element":"img","alt":" x > ε0","inline":true},{"text":". In addition, since we will often need to compose ","element":"span"},{"style":{"height":19.1},"width":194.96,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-23.png","element":"img","alt":" gL and g−1U ","inline":true,"padRight":true},{"text":", we will use the simplified notation:","element":"span"}],[{"id":"id-73","style":{"width":"20%"},"width":380,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-24.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"text":"is some constant.","element":"span"}],[{"id":"id-28","style":{"fontWeight":"bold"},"text":"1.1.7 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"ARPU-Learning","element":"span"}],[{"text":"RPU learning suffers from an inability to deal with noise. We introduce the learning framework ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Almost Reliable and Probably Useful Learning ","element":"span"},{"text":"(ARPU-Learning), a relaxation of RPU-learning that allows for noise, but keeps stronger reliability guarantees than PAC-learning. 
Recall that given a distribution ","element":"span"},{"style":{"height":13.6},"width":280.38,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-25.png","element":"img","alt":" DL over X × Y ,","inline":true,"padRight":true},{"text":"for a reliable learner ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"and sample ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":", we define the loss of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":") ","element":"span"},{"text":"as the measure of unlearned samples:","element":"span"}],[{"style":{"width":"35%"},"width":673,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/5-26.png","element":"img"}],[{"text":"We will commonly refer to ","element":"span"},{"style":{"height":16},"width":265.01,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-0.png","element":"img","alt":" 1 − LDL(A(S))","inline":true,"padRight":true},{"text":"as the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"coverage ","element":"span"},{"text":"of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":")","element":"span"},{"text":". 
A model is a pair ","element":"span"},{"style":{"height":16},"width":143.05,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-1.png","element":"img","alt":" (Q, DX)","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"is a set of oracles ","element":"span"},{"style":{"height":16},"width":163.18,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-2.png","element":"img","alt":" (QL, QC)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":13.19},"width":57.74,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-3.png","element":"img","alt":" DX","inline":true,"padRight":true},{"text":"is a set of distributions over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":". In ARPU-Learning, given a hypothesis class ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"text":"and a model ","element":"span"},{"style":{"height":16},"width":142.57,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-4.png","element":"img","alt":" (Q, DX)","inline":true},{"text":", an adversary chooses a distribution ","element":"span"},{"style":{"height":13.59},"width":227.09,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-5.png","element":"img","alt":" DX from DX","inline":true,"padRight":true},{"text":"and the “noisy” oracles ","element":"span"},{"style":{"height":16},"width":302.83,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-6.png","element":"img","alt":"(QL, QC) from Q","inline":true},{"text":", which induces a distribution 
","element":"span"},{"style":{"height":18.03},"width":442.66,"height":45.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-7.png","element":"img","alt":"˜DL over X × Y given by:","inline":true}],[{"style":{"width":"33%"},"width":631,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-8.png","element":"img"}],[{"id":"id-30","style":{"fontWeight":"bold"},"text":"Definition 1.3 ","element":"span"},{"text":"(ARPU-Learnable)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"We say that a hypothesis class ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":16},"width":418.41,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-9.png","element":"img","alt":"(Q, DX) if ∀δr, δu, ε > 0","inline":true},{"style":{"fontStyle":"italic"},"text":", there exists a learner ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"style":{"fontStyle":"italic"},"text":"which is","element":"span"}],[{"id":"id-27","style":{"width":"97%"},"width":1818,"height":468,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-10.png","element":"img"}],[{"text":"Note that in both Equations ","element":"span"},{"href":"#id-27","text":"(5) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-27","text":"(6) ","element":"a"},{"text":"the probability is over the randomness of the algorithm, sample ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":", and noisy oracles 
","element":"span"},{"style":{"height":14},"width":135.06,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-11.png","element":"img","alt":" QL, QC","inline":true,"padRight":true},{"text":"chosen by the adversary. Also, in comparison to PAC learning, all point which are not labeled “","element":"span"},{"style":{"height":10.8},"width":31,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-12.png","element":"img","alt":"⊥","inline":true},{"text":"” by an ARPU-learner are labeled correctly with high probability and setting ","element":"span"},{"style":{"height":13.99},"width":108.62,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-13.png","element":"img","alt":" δr = 0","inline":true,"padRight":true},{"text":"reduces exactly to RPU learning. Sample complexity and learnability are then defined equivalently to PAC-learning. Finally, we will refer to learners that satisfy condition ","element":"span"},{"href":"#id-27","text":"(5) ","element":"a"},{"text":"as ","element":"span"},{"style":{"height":13.99},"width":36.71,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-14.png","element":"img","alt":" δu","inline":true},{"text":"-useful, and learners that satisfy condition ","element":"span"},{"href":"#id-27","text":"(6) ","element":"a"},{"text":"as ","element":"span"},{"style":{"height":13.99},"width":32.71,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-15.png","element":"img","alt":" δr","inline":true},{"text":"-reliable. 
While the logical inference techniques previously used to build RPU learners [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":", ","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"] are very sensitive to noise, we show in later sections how to modify those techniques to build ARPU-learners.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.1.8 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Passive vs Active learning","element":"span"}],[{"text":"PAC-learning is traditionally applied to supervised learning, where the learning algorithm receives pre-labeled samples. We call this paradigm passive learning. In contrast, active learning refers to the case where the learner receives unlabeled samples and may adaptively query a labeling or comparison oracle. As in the passive case, for active learning we define the query complexity as the minimum number of queries needed to learn some pair ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"text":"in either the PAC, RPU, or ARPU-learning model. In general, passive learners learn concept classes up to error ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-16.png","element":"img","alt":" ε","inline":true,"padRight":true},{"text":"in ","element":"span"},{"style":{"height":16},"width":122.26,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-17.png","element":"img","alt":" Θ(1/ε)","inline":true,"padRight":true},{"text":"samples. 
We add to a long line of work [","element":"span"},{"href":"#id-1","referenceIndex":2,"text":"2","element":"a"},{"text":"–","element":"span"},{"href":"#id-4","referenceIndex":5,"text":"5","element":"a"},{"text":", ","element":"span"},{"href":"#id-6","referenceIndex":7,"text":"7","element":"a"},{"text":", ","element":"span"},{"href":"#id-13","referenceIndex":14,"text":"14","element":"a"},{"text":"] showing that active learning can achieve the same error in only polylog","element":"span"},{"style":{"height":16},"width":89.93,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-18.png","element":"img","alt":"(1/ε)","inline":true,"padRight":true},{"text":"queries on important concept classes.","element":"span"}],[{"id":"id-20","style":{"fontWeight":"bold"},"text":"1.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Our Results","element":"span"}],[{"text":"In this work, we study ARPU-learning (Section ","element":"span"},{"href":"#id-28","text":"1.1.7) ","element":"a"},{"text":"under two widely studied noise models: Massart Noise and the Generalized Tsybakov Low Noise Condition.","element":"span"}],[{"id":"id-32","style":{"fontWeight":"bold"},"text":"1.2.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Notation","element":"span"}],[{"text":"We use notation where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"is the instance space, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"is the set of hypotheses from ","element":"span"},{"style":{"height":14},"width":210.51,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-19.png","element":"img","alt":" X → R, Hd","inline":true,"padRight":true},{"text":"is the class of linear separators in 
","element":"span"},{"style":{"height":13.38},"width":45.78,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-20.png","element":"img","alt":" Rd","inline":true,"padRight":true},{"text":"(corresponding to affine functions ","element":"span"},{"style":{"height":13.78},"width":207.88,"height":34.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-21.png","element":"img","alt":" h : Rd → R","inline":true},{"text":"), and ","element":"span"},{"style":{"height":15.59},"width":77.18,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-22.png","element":"img","alt":" Hd,γ","inline":true,"padRight":true},{"text":"is the class of linear separators in ","element":"span"},{"style":{"height":13.38},"width":45.78,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-23.png","element":"img","alt":" Rd ","inline":true,"padRight":true},{"text":"with margin ","element":"span"},{"style":{"height":14.8},"width":163.52,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-24.png","element":"img","alt":" γ from X","inline":true},{"text":". Since previous work [","element":"span"},{"href":"#id-1","referenceIndex":2,"text":"2","element":"a"},{"text":", ","element":"span"},{"href":"#id-13","referenceIndex":14,"text":"14","element":"a"},{"text":"] refers to the class of homogeneous linear separators as simply “linear separators,” we will often refer to ","element":"span"},{"style":{"height":13.19},"width":50.12,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-25.png","element":"img","alt":" Hd","inline":true,"padRight":true},{"text":"as “non-homogeneous linear separators” to differentiate our results. 
For noise models, ","element":"span"},{"style":{"height":16},"width":98.06,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-26.png","element":"img","alt":" M(λ)","inline":true,"padRight":true},{"text":"is the set of all oracles which satisfy Massart noise with parameter ","element":"span"},{"style":{"height":16},"width":358.75,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-27.png","element":"img","alt":" λ, GTNC(gL, gU, ε0)","inline":true,"padRight":true},{"text":"is the set of all oracles which satisfy the Generalized Tsybakov Low Noise Condition with parameters ","element":"span"},{"style":{"height":15.6},"width":595.79,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-28.png","element":"img","alt":" (gL, gU, ε0), and TNC(m, M, κ, ε0)","inline":true,"padRight":true},{"text":"is the set of all oracles satisfying the Tsybakov Low Noise Condition with parameters (","element":"span"},{"style":{"height":14},"width":186.46,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-29.png","element":"img","alt":"m, M, κ, ε0","inline":true},{"text":"). A model is a pair ","element":"span"},{"style":{"height":16},"width":142.97,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-30.png","element":"img","alt":" (Q, DX)","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"is a set of oracles ","element":"span"},{"style":{"height":16},"width":310.86,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-31.png","element":"img","alt":"(QL, QC) and DX","inline":true,"padRight":true},{"text":"is a set of distributions over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":". 
For distributions over instance space ","element":"span"},{"style":{"height":16.58},"width":156.59,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-32.png","element":"img","alt":" X or Rd,","inline":true}],[{"text":"1. ","element":"span"},{"style":{"height":13.19},"width":47.98,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/6-33.png","element":"img","alt":" CX","inline":true,"padRight":true},{"text":"is the class of all continuous distributions over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":",","element":"span"}],[{"text":"2. ","element":"span"},{"style":{"height":13.19},"width":67.79,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-0.png","element":"img","alt":" LCd","inline":true,"padRight":true},{"text":"is the class of all log-concave distributions on ","element":"span"},{"style":{"height":16.59},"width":58.36,"height":41.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-1.png","element":"img","alt":" Rd,","inline":true}],[{"text":"3. ","element":"span"},{"style":{"height":13.19},"width":67.43,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-2.png","element":"img","alt":" SCd","inline":true,"padRight":true},{"text":"is the class of all s-concave distributions on ","element":"span"},{"style":{"height":20.97},"width":317.82,"height":52.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-3.png","element":"img","alt":" Rd for s ≥ −1/(2d+3),","inline":true}],[{"text":"4. 
","element":"span"},{"style":{"height":13.19},"width":92.07,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-4.png","element":"img","alt":" ISCd","inline":true,"padRight":true},{"text":"is the class of all isotropic s-concave distributions on ","element":"span"},{"style":{"height":20.97},"width":317.82,"height":52.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-5.png","element":"img","alt":" Rd for s ≥ −1/(2d+3),","inline":true}],[{"text":"5. ","element":"span"},{"style":{"height":16.39},"width":171.99,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-6.png","element":"img","alt":" ACCd,c1,c2","inline":true,"padRight":true},{"text":"is the class of all continuous distributions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"which satisfy the following concentration and anti-concentration inequalities:","element":"span"}],[{"style":{"width":"60%"},"width":1127,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-7.png","element":"img"}],[{"text":"6. 
For hypothesis class ","element":"span"},{"style":{"height":17.68},"width":357.59,"height":44.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-8.png","element":"img","alt":" (X, H), A(X,H),a,f(d)","inline":true,"padRight":true},{"text":"is the class of all continuous distributions ","element":"span"},{"style":{"height":13.19},"width":59.99,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-9.png","element":"img","alt":" DX","inline":true,"padRight":true},{"text":"over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"such that ","element":"span"},{"style":{"height":16},"width":198.18,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-10.png","element":"img","alt":" (DX, X, H)","inline":true,"padRight":true},{"text":"has average inference dimension ","element":"span"},{"style":{"height":26.07},"width":326.29,"height":65.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-11.png","element":"img","alt":" g(n) ≤ 2^(−Ω(n^(1+a)/f(d))).","inline":true}],[{"text":"We will call an algorithm sample (respectively time) efficient if it uses ","element":"span"},{"style":{"height":20.97},"width":288.35,"height":52.42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-12.png","element":"img","alt":" poly(d, 1/ε, 1/δr, 1/δu)","inline":true,"padRight":true},{"text":"samples (respectively ","element":"span"},{"text":"time), and query efficient if it uses ","element":"span"},{"style":{"height":20.97},"width":464.32,"height":52.42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-13.png","element":"img","alt":" poly(d, log(1/ε), log(1/δr), log(1/δu))","inline":true,"padRight":true},{"text":"queries. Finally, for some parameter ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"(e.g. 
","element":"span"},{"text":"dimension, error) and function ","element":"span"},{"style":{"height":14},"width":176.89,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-14.png","element":"img","alt":" f : R → R","inline":true},{"text":", for the sake of readability we will often use the notation ","element":"span"},{"style":{"height":18.83},"width":142.86,"height":47.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-15.png","element":"img","alt":"˜O(f(n))","inline":true,"padRight":true},{"text":"to ignore multiplicative factors that are logarithmic in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.2.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Massart Noise","element":"span"}],[{"text":"To begin, we show that under the Massart noise model, finite inference dimension (Definition ","element":"span"},{"href":"#id-29","text":"1.1) ","element":"a"},{"text":"implies computationally efficient ARPU-learning with exponentially better query complexity than any passive PAClearner","element":"span"},{"style":{"height":17.39},"width":264.46,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-16.png","element":"img","alt":"1. 
Recall M(λ)","inline":true,"padRight":true},{"text":"is the set of all oracles which satisfy Massart noise with parameter ","element":"span"},{"style":{"height":14},"width":95.57,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-17.png","element":"img","alt":" λ, CX","inline":true,"padRight":true},{"text":"is the class of all continuous distributions over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":", and a model is a pair ","element":"span"},{"style":{"height":16},"width":143.07,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-18.png","element":"img","alt":" (Q, DX)","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"is a set of oracles ","element":"span"},{"style":{"height":16},"width":163.15,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-19.png","element":"img","alt":" (QL, QC)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":13.19},"width":57.74,"height":32.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-20.png","element":"img","alt":" DX","inline":true,"padRight":true},{"text":"is a set of distributions over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":". 
Note that in the ARPU-Learning model (Definition ","element":"span"},{"href":"#id-30","text":"1.3)","element":"a"},{"text":", given a hypothesis class ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"text":"and a model ","element":"span"},{"style":{"height":16},"width":143.1,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-21.png","element":"img","alt":" (Q, DX)","inline":true},{"text":", an adversary chooses a distribution ","element":"span"},{"style":{"height":13.19},"width":59.99,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-22.png","element":"img","alt":" DX","inline":true,"padRight":true},{"text":"from ","element":"span"},{"style":{"height":13.19},"width":57.74,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-23.png","element":"img","alt":" DX","inline":true,"padRight":true},{"text":"and the “noisy” oracles ","element":"span"},{"style":{"height":16},"width":313.38,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-24.png","element":"img","alt":" (QL, QC) from Q.","inline":true}],[{"id":"id-31","style":{"fontWeight":"bold"},"text":"Theorem 1.4 ","element":"span"},{"text":"(Finite Inference Dimension ","element":"span"},{"style":{"height":8.8},"width":64.2,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-25.png","element":"img","alt":" =⇒","inline":true,"padRight":true},{"text":"ARPU-Learning under Massart Noise)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let the hypothesis class ","element":"span"},{"style":{"height":17.38},"width":275.63,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-26.png","element":"img","alt":" (X, H), X ⊆ Rd","inline":true},{"style":{"fontStyle":"italic"},"text":", have inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with respect to comparison queries. Then, ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is ARPUlearnable under model ","element":"span"},{"style":{"height":35.5},"width":868.33,"height":88.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-27.png","element":"img","alt":" (M(λ), CX) in time poly�d, k, 1δr , 1ε, log( 1δu )� ˜O( 1λ5 )","inline":true},{"style":{"fontStyle":"italic"},"text":", uses only ","element":"span"},{"style":{"height":28.8},"width":540.48,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-28.png","element":"img","alt":" poly�k, 1λ, 1ε, log( 1δr ), log( 1δu ))�","inline":true}],[{"style":{"fontStyle":"italic"},"text":"unlabeled samples, and has a query complexity of","element":"span"}],[{"style":{"width":"39%"},"width":743,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-29.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for ","element":"span"},{"style":{"height":16},"width":160.27,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-30.png","element":"img","alt":" δr ≤ 1/2.","inline":true}],[{"text":"To put this result into context, we note two lower bounds which together with Theorem ","element":"span"},{"href":"#id-31","text":"1.4 
","element":"a"},{"text":"show a separation between passive and active learning, and label only and comparison based ARPU-learning. In the case of passive, comparison based PAC-learning, we recall the ","element":"span"},{"style":{"height":19.37},"width":97.71,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-31.png","element":"img","alt":" Ω� 1ε�","inline":true},{"text":"lower bound from [","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"]. For label only APRU-learning, we present a lower bound novel to this work:","element":"span"}],[{"id":"id-42","style":{"fontWeight":"bold"},"text":"Lemma 1.5. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The query complexity of ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4","element":"span"},{"style":{"fontStyle":"italic"},"text":"-reliably, ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"8","element":"span"},{"style":{"fontStyle":"italic"},"text":"-usefully ARPU learning ","element":"span"},{"style":{"height":17.38},"width":463.84,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-32.png","element":"img","alt":" (S1, H2) with 1/2-coverage","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"under model ","element":"span"},{"style":{"height":16},"width":197.73,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-33.png","element":"img","alt":" (M(λ), CX)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is infinite:","element":"span"}],[{"style":{"width":"18%"},"width":355,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-34.png","element":"img"}],[{"text":"Together, these bounds show that comparison based active learning provides not only an exponential 
improvement in query complexity over any passive PAC-learner, but also an infinite improvement over any active ARPU-learner using only labels. Further, Theorem ","element":"span"},{"href":"#id-31","text":"1.4 ","element":"a"},{"text":"provides the first algorithm for learning noisy non-homogeneous linear separators in two dimensions which is time, sample, and query efficient in the sense of Section ","element":"span"},{"href":"#id-32","text":"1.2.1, ","element":"a"},{"text":"since the inference dimension of ","element":"span"},{"style":{"height":16},"width":147.18,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/7-35.png","element":"img","alt":" (R2, H2)","inline":true,"padRight":true},{"text":"is 5 [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"]. If the instance space has bounded bit-complexity or minimal-ratio, the result also implies an efficient learner for higher dimensional","element":"span"}],[{"text":"non-homogeneous linear separators.","element":"span"}],[{"text":"Bounded bit-complexity and minimal-ratio, however, are assumptions that may not hold on real-world data. Instead, we will take a path inspired by the recent explosion of work in data science [","element":"span"},{"href":"#id-33","referenceIndex":25,"text":"25","element":"a"},{"text":"] that focuses on weakly restricting the distribution over data to beat lower bounds based off of improbable adversarial examples. While inference dimension itself is not applicable in this scenario, we will employ its average case variant, average inference dimension (Definition ","element":"span"},{"href":"#id-34","text":"1.2)","element":"a"},{"text":". 
In particular, we provide a computationally efficient algorithm for learning under Massart noise, assuming that the hypothesis class and distribution have super-exponential average inference dimension, a property that holds for non-homogeneous linear separators and comparison queries across a wide range of distributions [","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"]. Given a hypothesis class ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":")","element":"span"},{"text":", recall that ","element":"span"},{"style":{"height":17.28},"width":216,"height":43.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-0.png","element":"img","alt":" A(X,H),a,f(d)","inline":true,"padRight":true},{"text":"is the class of all continuous distributions ","element":"span"},{"style":{"height":16},"width":585.57,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-1.png","element":"img","alt":" DX over X such that (DX, X, H)","inline":true,"padRight":true},{"text":"has average inference dimension ","element":"span"},{"style":{"height":26.07},"width":583.58,"height":65.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-2.png","element":"img","alt":" g(n) ≤ 2^(−Ω(n^(1+a)/f(d))) for some a > 0","inline":true,"padRight":true},{"text":"and function of dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":")","element":"span"},{"text":". 
Then,","element":"span"}],[{"id":"id-36","style":{"fontWeight":"bold"},"text":"Theorem 1.6 ","element":"span"},{"text":"(Average Inference Dimension ","element":"span"},{"style":{"height":8.8},"width":64.98,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-3.png","element":"img","alt":" =⇒","inline":true,"padRight":true},{"text":"ARPU-Learning under Massart Noise)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider any hypothesis class ","element":"span"},{"style":{"height":17.39},"width":302.27,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-4.png","element":"img","alt":" (X, H), X ⊆ Rd","inline":true},{"style":{"fontStyle":"italic"},"text":", and corresponding class of distributions ","element":"span"},{"style":{"height":17.28},"width":216,"height":43.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-5.png","element":"img","alt":" A(X,H),a,f(d)","inline":true},{"style":{"fontStyle":"italic"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Then, ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":17.68},"width":365.84,"height":44.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-6.png","element":"img","alt":" (M(λ), A(X,H),a,f(d))","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"in time ","element":"span"},{"style":{"height":35.5},"width":548.12,"height":88.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-7.png","element":"img","alt":" poly�f(d), 1δr , 1ε, log( 1δu )� ˜O( 1λ5 )","inline":true},{"style":{"fontStyle":"italic"},"text":", uses only","element":"span"}],[{"style":{"height":28.8},"width":676.48,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-8.png","element":"img","alt":"poly�f(d), 1λ, log( 1ε), log( 1δr ), log( 1δu ))�","inline":true},{"style":{"fontStyle":"italic"},"text":"unlabeled samples, and has a query complexity of","element":"span"}],[{"style":{"width":"48%"},"width":907,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-9.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for small enough ","element":"span"},{"style":{"height":13.99},"width":47.36,"height":34.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-10.png","element":"img","alt":" δr.","inline":true}],[{"text":"To see the applicability of Theorem ","element":"span"},{"href":"#id-35","text":"2.8, ","element":"a"},{"text":"we note that Hopkins et al. 
[","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"] proved that a wide range of distributions lie in ","element":"span"},{"style":{"height":17.63},"width":285.01,"height":44.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-11.png","element":"img","alt":" A(Rd,Hd),1,d log(d)","inline":true},{"text":". In particular, following [","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"], we say two distributions ","element":"span"},{"style":{"height":16.58},"width":315.8,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-12.png","element":"img","alt":" D, D′ over Rd are","inline":true,"padRight":true},{"text":"affinely equivalent if there is an invertible affine map ","element":"span"},{"style":{"height":17.38},"width":690.46,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-13.png","element":"img","alt":" f : Rd → Rd such that D(x) = D′(f(x))","inline":true},{"text":". Hopkins et al. [","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"] proved that distributions which may be affinely transformed to a distribution with anti-concentration and concentration (i.e. 
to a distribution in ","element":"span"},{"style":{"height":16.39},"width":171.99,"height":40.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-14.png","element":"img","alt":" ACCd,c1,c2","inline":true},{"text":") lie in ","element":"span"},{"style":{"height":17.63},"width":285.01,"height":44.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-15.png","element":"img","alt":" A(Rd,Hd),1,d log(d)","inline":true},{"text":", a condition satisfied by s-concave distributions","element":"span"},{"style":{"height":7.6},"width":16,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-16.png","element":"img","alt":"2","inline":true},{"text":". Recall that ","element":"span"},{"style":{"height":13.19},"width":67.42,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-17.png","element":"img","alt":" SCd","inline":true,"padRight":true},{"text":"is the class of all s-concave distributions, ","element":"span"},{"style":{"height":20.97},"width":183.71,"height":52.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-18.png","element":"img","alt":" s ≥ −1/(2d+3)","inline":true},{"text":", on ","element":"span"},{"style":{"height":13.38},"width":45.78,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-19.png","element":"img","alt":" Rd","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":13.19},"width":50.12,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-20.png","element":"img","alt":" Hd","inline":true,"padRight":true},{"text":"is the ","element":"span"},{"text":"class of both homogeneous and non-homogeneous linear separators in ","element":"span"},{"style":{"height":13.38},"width":45.78,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-21.png","element":"img","alt":" Rd","inline":true},{"text":". 
Then, as a direct corollary to Theorem ","element":"span"},{"href":"#id-36","text":"1.6, ","element":"a"},{"text":"we have","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Corollary 1.7. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The hypothesis class ","element":"span"},{"style":{"height":17.39},"width":148.28,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-22.png","element":"img","alt":" (Rd, Hd)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":16},"width":355.18,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-23.png","element":"img","alt":" (M(λ), SCd) in time","inline":true}],[{"style":{"width":"99%"},"width":1872,"height":212,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-24.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for small enough ","element":"span"},{"style":{"height":13.99},"width":47.36,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-25.png","element":"img","alt":" δr.","inline":true}],[{"text":"Previous work showed a similar result for homogeneous linear separators over nearly isotropic log-concave distributions [","element":"span"},{"href":"#id-37","referenceIndex":26,"text":"26","element":"a"},{"text":"] and isotropic s-concave distributions [","element":"span"},{"href":"#id-2","referenceIndex":3,"text":"3","element":"a"},{"text":"] with label queries. 
However, their techniques cannot be extended to the non-homogeneous case due to a ","element":"span"},{"style":{"height":19.37},"width":132.73,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/8-26.png","element":"img","alt":" poly(1/ε)","inline":true,"padRight":true},{"text":"lower bound on the query complexity of ","element":"span"},{"text":"active label-only learners [","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"]. Thus it is only by leveraging the additional power of comparison queries that we extend efficient learning to non-homogeneous linear separators over (not necessarily isotropic) s-concave distributions.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.2.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Generalized Tsybakov Low Noise Condition","element":"span"}],[{"text":"While Massart noise is a clean theoretical model, its assumption that the noise is bounded away from ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"2 ","element":"span"},{"text":"is not necessarily reflective of the real world. This motivates us to study a variant of the Tsybakov Low Noise Condition, a model in which the noise may approach 1/2 as data approaches the decision boundary of the Bayes optimal classifier. ","element":"span"},{"text":"However, learning in this unbounded regime is harder, as evidenced by the polynomial query lower bounds of [","element":"span"},{"href":"#id-11","referenceIndex":12,"text":"12","element":"a"},{"text":", ","element":"span"},{"href":"#id-12","referenceIndex":13,"text":"13","element":"a"},{"text":", ","element":"span"},{"href":"#id-14","referenceIndex":15,"text":"15","element":"a"},{"text":"]. In order to ARPU-learn in this regime, we need to introduce several restrictions not present for our Massart algorithms. 
First, instead of allowing any hypothesis class with finite inference dimension, we will only consider (non-homogeneous) linear separators. Second, we will either assume some margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-0.png","element":"img","alt":" γ","inline":true},{"text":", or that the distribution satisfies certain weak concentration and anti-concentration bounds. To begin, we consider learning hypothesis classes over any continuous distribution with finite inference dimension and margin. Recall ","element":"span"},{"style":{"height":16},"width":310.58,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-1.png","element":"img","alt":" GTNC(gL, gU, ε0)","inline":true,"padRight":true},{"text":"is the set of all oracles which satisfy the Generalized Tsybakov Low Noise Condition with parameters ","element":"span"},{"style":{"height":16.79},"width":293.84,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-2.png","element":"img","alt":" (gL, gU, ε0), Hd,γ","inline":true,"padRight":true},{"text":"is the class of linear separators in ","element":"span"},{"style":{"height":13.39},"width":45.78,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-3.png","element":"img","alt":" Rd","inline":true,"padRight":true},{"text":"with margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-4.png","element":"img","alt":" γ","inline":true,"padRight":true},{"text":"from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":", and ","element":"span"},{"style":{"height":13.19},"width":47.98,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-5.png","element":"img","alt":" CX","inline":true,"padRight":true},{"text":"is the class of 
all continuous distributions over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":".","element":"span"}],[{"id":"id-39","style":{"fontWeight":"bold"},"text":"Theorem 1.8 ","element":"span"},{"text":"(Finite Inference Dimension and Margin ","element":"span"},{"style":{"height":8.8},"width":64.57,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-6.png","element":"img","alt":" =⇒","inline":true,"padRight":true},{"text":"ARPU-Learning under GTNC)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":15.78},"width":135.07,"height":39.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-7.png","element":"img","alt":" X ⊆ Rd","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":16.79},"width":163.08,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-8.png","element":"img","alt":" (X, Hd,γ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"have inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with respect to comparison queries. 
Then for small enough ","element":"span"},{"style":{"height":13.99},"width":32.71,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-9.png","element":"img","alt":" δr","inline":true},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"style":{"height":16.79},"width":162.77,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-10.png","element":"img","alt":"(X, Hd,γ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":16},"width":410.65,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-11.png","element":"img","alt":" (GTNC(gL, gU, ε0), CX)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with query complexity:","element":"span"}],[{"style":{"width":"66%"},"width":1248,"height":170,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-12.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Where","element":"span"}],[{"style":{"width":"38%"},"width":727,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-13.png","element":"img"}],[{"text":"We prove in addition that while ARPU-learning may no longer be impossible using only labels when margin is introduced, it still suffers from query inefficiency due to the curse of dimensionality.","element":"span"}],[{"id":"id-54","style":{"height":14.18},"width":584.29,"height":35.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-14.png","element":"img","alt":"Lemma 1.9. 
Let X ∈ Rd be the d","inline":true},{"style":{"fontStyle":"italic"},"text":"-dimensional hypercube ","element":"span"},{"style":{"height":17.38},"width":113.62,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-15.png","element":"img","alt":" {0, 1}d ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"modified to have a ball of radius ","element":"span"},{"style":{"height":22.73},"width":213.97,"height":56.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-16.png","element":"img","alt":"14√d centered","inline":true}],[{"style":{"fontStyle":"italic"},"text":"about each point. The query complexity of ARPU-learning ","element":"span"},{"style":{"height":22.04},"width":202.72,"height":55.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-17.png","element":"img","alt":" (X, Hd, 14√d )","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"under model ","element":"span"},{"style":{"height":22.73},"width":440.48,"height":56.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-18.png","element":"img","alt":" (GTNC(gL, gU, 14√d), CX)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is at least:","element":"span"}],[{"style":{"width":"21%"},"width":412,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-19.png","element":"img"}],[{"text":"In the above example, ","element":"span"},{"style":{"height":22.44},"width":204.22,"height":56.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-20.png","element":"img","alt":" (X, Hd, 14√d )","inline":true,"padRight":true},{"text":"has inference dimension ","element":"span"},{"style":{"height":18.83},"width":85.26,"height":47.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-21.png","element":"img","alt":" ˜O(d)","inline":true,"padRight":true},{"text":"by a minimal-ratio argument from 
[","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"]. ","element":"span"},{"text":"Theorem ","element":"span"},{"href":"#id-31","text":"1.4 ","element":"a"},{"text":"thus gives an algorithm using only ","element":"span"},{"text":"poly(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":") ","element":"span"},{"text":"queries, demonstrating the exponential gap in query complexity between label-only and comparison-based ARPU-learning with Tsybakov noise. Since margin bounds the error of label queries, another way to view this result is that comparison queries with unbounded error give an exponential improvement in query complexity over ARPU-learning using only labels with bounded error.","element":"span"}],[{"text":"Similar to the case of Massart noise, we may drop the restrictive assumptions of finite inference dimension and margin by assuming weak distributional requirements. Unlike in the case of Massart, here we deal with the requirements directly rather than assuming average inference dimension. Recall ","element":"span"},{"style":{"height":16.39},"width":171.99,"height":40.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-22.png","element":"img","alt":" ACCd,c1,c2","inline":true,"padRight":true},{"text":"is the class of all continuous distributions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"with the following properties:","element":"span"}],[{"text":"1. ","element":"span"},{"style":{"height":17.5},"width":541.04,"height":43.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-23.png","element":"img","alt":" ∀α > 0, Prx∼D[||x|| > dα] ≤ c1α","inline":true}],[{"text":"2. 
","element":"span"},{"style":{"height":17.38},"width":1078.29,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-24.png","element":"img","alt":" ∀α > 0, v ∈ Rd, ∥v∥ = 1, b ∈ Rd, Prx∼D[|⟨x, v⟩ + b| ≤ α] ≤ c2α","inline":true}],[{"id":"id-72","style":{"fontWeight":"bold"},"text":"Theorem 1.10 ","element":"span"},{"text":"(Concentration and Anti-Concentration ","element":"span"},{"style":{"height":8.8},"width":64.97,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-25.png","element":"img","alt":" =⇒","inline":true,"padRight":true},{"text":"ARPU-learning under ","element":"span"},{"text":"GTNC","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"For small enough ","element":"span"},{"style":{"height":13.99},"width":32.71,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-26.png","element":"img","alt":" δr","inline":true},{"style":{"fontStyle":"italic"},"text":", the hypothesis class ","element":"span"},{"style":{"height":17.38},"width":146.96,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-27.png","element":"img","alt":" (Rd, Hd)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":16.39},"width":619.2,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-28.png","element":"img","alt":" (GTNC(gL, gU, ε0), ACCd,c1,c2) with","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"query 
complexity:","element":"span"}],[{"style":{"width":"69%"},"width":1311,"height":218,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/9-29.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where","element":"span"}],[{"style":{"width":"19%"},"width":358,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-0.png","element":"img"}],[{"text":"Since isotropic s-concave distributions satisfy these conditions [","element":"span"},{"href":"#id-2","referenceIndex":3,"text":"3","element":"a"},{"text":", ","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"], we get an immediate corollary for TNC noise under isotropic s-concave distributions. Recall that ","element":"span"},{"style":{"height":13.19},"width":92.07,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-1.png","element":"img","alt":" ISCd","inline":true,"padRight":true},{"text":"is the class of all isotropic (","element":"span"},{"text":"0 ","element":"span"},{"text":"mean, identity variance) s-concave distributions on ","element":"span"},{"style":{"height":16.58},"width":198.63,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-2.png","element":"img","alt":" Rd, and Hd","inline":true,"padRight":true},{"text":"is the class of non-homogeneous linear separators in ","element":"span"},{"style":{"height":13.39},"width":58.36,"height":33.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-3.png","element":"img","alt":"Rd.","inline":true}],[{"id":"id-38","style":{"fontWeight":"bold"},"text":"Corollary 1.11. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"The hypothesis class ","element":"span"},{"style":{"height":17.38},"width":148.5,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-4.png","element":"img","alt":" (Rd, Hd)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":16},"width":450.29,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-5.png","element":"img","alt":" (TNC(m, M, κ, ε0), ISCd)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with query complexity:","element":"span"}],[{"style":{"width":"51%"},"width":974,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Where","element":"span"}],[{"style":{"width":"17%"},"width":330,"height":75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-7.png","element":"img"}],[{"text":"This result similarly extends previous work on homogeneous linear separators over isotropic log-concave distributions [","element":"span"},{"href":"#id-1","referenceIndex":2,"text":"2","element":"a"},{"text":", ","element":"span"},{"href":"#id-12","referenceIndex":13,"text":"13","element":"a"},{"text":"] to the non-homogeneous case. 
In comparison to Hanneke and Yang’s [","element":"span"},{"href":"#id-11","referenceIndex":12,"text":"12","element":"a"},{"text":"] distribution free algorithm for label only PAC-learning, Corollary ","element":"span"},{"href":"#id-38","text":"1.11 ","element":"a"},{"text":"provides an improved query complexity for ","element":"span"},{"style":{"height":19.77},"width":201.1,"height":49.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-8.png","element":"img","alt":" 1 < κ < 1514,","inline":true,"padRight":true},{"text":"and more importantly provides the reliability guarantees of the ARPU-learning model.","element":"span"}],[{"text":"Finally, note that unlike Theorems ","element":"span"},{"href":"#id-31","text":"1.4, ","element":"a"},{"href":"#id-36","text":"1.6, ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-39","text":"1.8, ","element":"a"},{"text":"Corollary ","element":"span"},{"href":"#id-38","text":"1.11 ","element":"a"},{"text":"has polynomial rather than polylogarithmic dependence on ","element":"span"},{"style":{"height":13.38},"width":59.49,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-9.png","element":"img","alt":" ε−1","inline":true},{"text":". This is unavoidable, as we prove a lower bound also polynomial in ","element":"span"},{"style":{"height":13.38},"width":72.37,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-10.png","element":"img","alt":" ε−1.","inline":true}],[{"id":"id-70","style":{"fontWeight":"bold"},"text":"Lemma 1.12. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"The query complexity of actively PAC-learning ","element":"span"},{"style":{"height":17.39},"width":145.56,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-11.png","element":"img","alt":" (R2, H2)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"under model ","element":"span"},{"style":{"height":15.6},"width":422.36,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-12.png","element":"img","alt":" (TNC(m, M, κ, ε0), SC2)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is at least","element":"span"}],[{"style":{"width":"63%"},"width":1192,"height":185,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-13.png","element":"img"}],[{"text":"Thus the main advantage of comparisons in this regime is their added reliability.","element":"span"}],[{"id":"id-21","style":{"fontWeight":"bold"},"text":"1.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Techniques","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.3.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Inference Dimension","element":"span"}],[{"text":"Our algorithms will follow the form of the learning technique for hypothesis classes with finite inference dimension (Definition ","element":"span"},{"href":"#id-29","text":"1.1) ","element":"a"},{"text":"introduced in [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"]. Drawing and querying a subsample ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":", Kane et al. 
build a weak learner by defining a Linear Program (LP) with constraints given by the query responses ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":")","element":"span"},{"text":", and objective function defined by the input point to be labeled. Through a symmetry argument, Kane et al. are able to show that if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"is large enough with respect to the inference dimension, the coverage of this weak learner will be at least ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4","element":"span"},{"text":". Since we will rely on this argument throughout our paper, we offer a brief description here.","element":"span"}],[{"text":"The expected coverage of the learner may be viewed as the probability that a randomly drawn point from the distribution is inferred by the LP. Since our weak learner is built from some finite sample from the same distribution, symmetry gives that this is equivalent to the probability that any of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"+ 1 ","element":"span"},{"text":"points can be inferred from the other ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"text":". Kane et al. 
then provide the following observation for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"and inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"which proves that setting ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= 4","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"gives coverage at least ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4","element":"span"},{"text":".","element":"span"}],[{"id":"id-49","style":{"fontWeight":"bold"},"text":"Observation 1.13 ","element":"span"},{"text":"(Observation 3.4 [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"])","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let the hypothesis class ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":", have inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"style":{"fontStyle":"italic"},"text":"for the set of binary queries ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"style":{"fontStyle":"italic"},"text":". Then ","element":"span"},{"style":{"height":12},"width":132.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-14.png","element":"img","alt":" ∀h ∈ H","inline":true},{"style":{"fontStyle":"italic"},"text":", there exists a subset ","element":"span"},{"style":{"height":11.6},"width":121.88,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-15.png","element":"img","alt":" S′ ⊂ S","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"of size ","element":"span"},{"style":{"height":12},"width":167.84,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-16.png","element":"img","alt":" n − k + 1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that","element":"span"}],[{"style":{"width":"58%"},"width":1097,"height":93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/10-17.png","element":"img"}],[{"text":"Inference dimension on its own, however, is restrictive. 
Using only comparisons and labels, the inference dimension of linear separators in three or more dimensions is infinite, which implies the existence of realizable distributions with ","element":"span"},{"style":{"height":19.37},"width":83.84,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-0.png","element":"img","alt":" Ω( 1ε)","inline":true,"padRight":true},{"text":"query complexity [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"]. To get around this barrier, we will introduce weak distributional ","element":"span"},{"text":"assumptions and instead employ the framework of average inference dimension introduced in [","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"]. Average inference dimension (Definition ","element":"span"},{"href":"#id-34","text":"1.2) ","element":"a"},{"text":"allows us to build algorithms for hypothesis classes with infinite inference dimension, as long as the distribution it is over is sufficiently nice. We will take advantage of a reduction from average to worst case inference dimension to prove such results:","element":"span"}],[{"id":"id-53","style":{"fontWeight":"bold"},"text":"Observation 1.14 ","element":"span"},{"text":"(Observation 3.6 [","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"])","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"D, X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"have average inference dimension ","element":"span"},{"style":{"height":15.6},"width":322.05,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-1.png","element":"img","alt":" g(n), and S ∼ Dn.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Then ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"has inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with probability:","element":"span"}],[{"style":{"width":"50%"},"width":940,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-2.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"1.3.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Noisy Sorting","element":"span"}],[{"text":"The linear program used as a weak learner relies heavily on the correctness of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":")","element":"span"},{"text":", making noisy oracles a challenging problem. 
To retain correctness and reliability, we use extra points outside of the linear program to help identify the true answers ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":")","element":"span"},{"text":". This idea is not altogether new. Contemporaneously with Kane et al., Xu, Zhang, Singh, Miller and Dubrawski [","element":"span"},{"href":"#id-14","referenceIndex":15,"text":"15","element":"a"},{"text":"] suggested using noisy comparisons as a sub-routine in older active learning algorithms to correct for noise in labels. However, as Xu et al. [","element":"span"},{"href":"#id-14","referenceIndex":15,"text":"15","element":"a"},{"text":"] point out, this technique does not work for Kane et al.’s [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"] algorithm, which requires corrected comparisons as well. Instead, we adopt and adapt a noisy sorting algorithm from Braverman and Mossel ","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"[19]","element":"a"},{"text":".","element":"span"}],[{"text":"Braverman and Mossel [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":"] study the problem of recovering the best possible ranking from an ordered set with access to a noisy comparison oracle ","element":"span"},{"style":{"height":14},"width":55.51,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-3.png","element":"img","alt":" QC","inline":true},{"text":". 
In particular, given a ground set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", Braverman and Mossel aim to find an order ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-4.png","element":"img","alt":" π","inline":true,"padRight":true},{"text":"that minimizes the number of discrepancies with the measured comparisons ","element":"span"},{"style":{"height":16},"width":116.53,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-5.png","element":"img","alt":"QC(S)","inline":true},{"text":", denoted by the order relation ","element":"span"},{"style":{"height":9.6},"width":42,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-6.png","element":"img","alt":"�<:","inline":true}],[{"style":{"width":"46%"},"width":865,"height":66,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-7.png","element":"img"}],[{"text":"If the oracle ","element":"span"},{"style":{"height":14},"width":55.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-8.png","element":"img","alt":" QC","inline":true,"padRight":true},{"text":"flips comparisons with probability exactly ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p < ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"2 ","element":"span"},{"text":"and the true ordering has a uniform prior, Braverman and Mossel [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":"] note that this scoring function has a nice probabilistic interpretation: it is a Maximum Likelihood 
ordering","element":"span"}],[{"style":{"width":"19%"},"width":369,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-9.png","element":"img"}],[{"text":"Braverman and Mossel [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":"] call finding such an ordering the Noisy Signal Aggregation (NSA) problem, and provide a randomized algorithm that uses only ","element":"span"},{"style":{"height":16},"width":220.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-10.png","element":"img","alt":" Oλ(n log(n))","inline":true,"padRight":true},{"text":"comparisons for oracles satisfying Massart noise with parameter ","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-11.png","element":"img","alt":" λ","inline":true},{"text":". Further, they provide an important structural insight into MLE orderings: with high probability, no point in an MLE order has moved further than ","element":"span"},{"style":{"height":15.6},"width":188.67,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-12.png","element":"img","alt":" Oλ(log(n))","inline":true,"padRight":true},{"text":"from its position in the true order.","element":"span"}],[{"id":"id-44","style":{"fontWeight":"bold"},"text":"Theorem 1.15 ","element":"span"},{"text":"(Optimal Ranking [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":"])","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a set of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with underlying order ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . , n ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-13.png","element":"img","alt":" σ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"an MLE order for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"under comparisons given by an oracle ","element":"span"},{"style":{"height":14},"width":55.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-14.png","element":"img","alt":" QC","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfying Massart noise with parameter ","element":"span"},{"style":{"height":10.8},"width":142.51,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-15.png","element":"img","alt":" λ. 
Then","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with probability at least ","element":"span"},{"style":{"height":11.6},"width":100.85,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-16.png","element":"img","alt":" 1 − δ:","inline":true}],[{"style":{"width":"37%"},"width":702,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-17.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"as long as ","element":"span"},{"style":{"height":19.37},"width":110.38,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-18.png","element":"img","alt":" n or 1δ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is at least exponential in ","element":"span"},{"style":{"height":13.39},"width":78.03,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/11-19.png","element":"img","alt":" λ−1.","inline":true}],[{"text":"This pointwise movement allows us to determine with high probability comparisons between points that are well-separated throughout an MLE order. By using only such separated points to build our inference LP, our algorithms are almost reliable – a point can only be mislabeled if some well-separated comparison is wrong, a low probability event.","element":"span"}],[{"text":"While Braverman and Mossel’s algorithm is query efficient and has a strong pointwise movement guarantee, its exponential time complexity in the error parameter is the main limiting factor in the computational efficiency of our algorithm for Massart noise. 
The existence of an efficient (polynomial in error) sorting scheme that retains some sub-linear (not necessarily logarithmic) point-wise movement bound under Massart noise would immediately imply computationally and query efficient algorithms for Massart noise for ","element":"span"},{"style":{"height":13.38},"width":64.15,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/12-0.png","element":"img","alt":" λ−1","inline":true,"padRight":true},{"text":"poly-logarithmic in ","element":"span"},{"style":{"height":19.37},"width":16,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/12-1.png","element":"img","alt":"1ε","inline":true},{"text":", rather than for ","element":"span"},{"style":{"height":19.53},"width":375.66,"height":48.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/12-2.png","element":"img","alt":" λ−1 = ˜O(log1/5(1/ε))","inline":true,"padRight":true},{"text":"as we require. Follow-up works on Braverman ","element":"span"},{"text":"and Mossel’s algorithm [","element":"span"},{"href":"#id-40","referenceIndex":27,"text":"27","element":"a"},{"text":"–","element":"span"},{"href":"#id-41","referenceIndex":29,"text":"29","element":"a"},{"text":"] made progress in this direction, providing algorithms with significantly improved time complexity, but these only work for ","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/12-3.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"bounded from below by some constant. 
Providing an algorithm that remains efficient while ","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/12-4.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"goes to 0 is an open problem.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.3.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Cluster Detection and Inference","element":"span"}],[{"text":"Braverman and Mossel’s noisy sorting algorithm works well in the case of bounded error, but noise models with unbounded error require a different approach. The particular model we examine in this case, the Generalized Tsybakov Low Noise Condition, is a distance based error metric. This means that as points approach each other in function value, their comparisons “look random”. We can use this fact to detect clusters of points close in function value by testing whether comparisons between them look like they have been drawn at random. In particular, we define a natural measure of randomness that we call equitability:","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 1.16 ","element":"span"},{"text":"(Equitability)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a set with comparisons denoted by ","element":"span"},{"style":{"height":9.6},"width":31,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/12-5.png","element":"img","alt":"�<","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"on each pair of elements. 
For an element ","element":"span"},{"style":{"height":16},"width":255.7,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/12-6.png","element":"img","alt":" x ∈ S, let v(x)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denote the number of elements ","element":"span"},{"style":{"height":14.4},"width":590.61,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/12-7.png","element":"img","alt":" y ∈ S such that y� 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that for any ","element":"span"},{"style":{"height":16},"width":229.2,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/16-24.png","element":"img","alt":" 1/2 > δr > 0","inline":true},{"style":{"fontStyle":"italic"},"text":", there exists a weak learner that ","element":"span"},{"style":{"height":13.99},"width":52.92,"height":34.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/16-25.png","element":"img","alt":" 3δr","inline":true},{"style":{"fontStyle":"italic"},"text":"-reliably learns ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":", has coverage ","element":"span"},{"style":{"height":9.19},"width":33.25,"height":22.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/16-26.png","element":"img","alt":" c1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with probability ","element":"span"},{"style":{"height":12.8},"width":75.31,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/16-27.png","element":"img","alt":" ≥ c1","inline":true},{"style":{"fontStyle":"italic"},"text":", makes at most 
","element":"span"},{"style":{"height":16},"width":120.9,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/16-28.png","element":"img","alt":" qwl(δr)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"queries, and runs in time ","element":"span"},{"style":{"height":23.3},"width":420.12,"height":58.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/16-29.png","element":"img","alt":" poly(k, 1δr ) ˜O(λ−5), where","inline":true}],[{"style":{"width":"62%"},"width":1162,"height":315,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/16-30.png","element":"img"}],[{"text":"Following Lemmas ","element":"span"},{"href":"#id-47","text":"2.5 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-48","text":"2.6, ","element":"a"},{"text":"we will slot a second, i.i.d. drawn set ","element":"span"},{"style":{"height":10.8},"width":40.74,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/16-31.png","element":"img","alt":" S′","inline":true,"padRight":true},{"text":"of points into our MLE order where ","element":"span"},{"style":{"height":16},"width":262.07,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/16-32.png","element":"img","alt":" |S′| = 32k + 16","inline":true},{"text":". 
Then with constant probability we can find a subset of ","element":"span"},{"style":{"height":14.4},"width":260.06,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/16-33.png","element":"img","alt":" 4k points in S′ ","inline":true,"padRight":true},{"text":"which may be","element":"span"}],[{"text":"correctly ordered and labeled with probability at least ","element":"span"},{"style":{"height":13.99},"width":114.99,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-0.png","element":"img","alt":" 1 − δr.","inline":true}],[{"text":"We are now in position to apply the symmetry argument from [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"] to show that this subset gives constant coverage with constant probability. The expected coverage is given by the probability that an additional, independently drawn point ","element":"span"},{"style":{"height":13.19},"width":135.9,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-1.png","element":"img","alt":" x ∼ DX","inline":true,"padRight":true},{"text":"is inferred:","element":"span"}],[{"style":{"width":"61%"},"width":1153,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-2.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":11.6},"width":150.32,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-3.png","element":"img","alt":" S′ and x","inline":true,"padRight":true},{"text":"are drawn randomly, the right hand side is equivalent to the probability that any point in the sample can be inferred from the rest:","element":"span"}],[{"style":{"width":"57%"},"width":1077,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-4.png","element":"img"}],[{"text":"Recall that with probability at least 
","element":"span"},{"style":{"height":19.77},"width":16,"height":49.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-5.png","element":"img","alt":"59","inline":true,"padRight":true},{"text":"we can find and, with probability ","element":"span"},{"style":{"height":13.99},"width":101.41,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-6.png","element":"img","alt":" 1 − δr","inline":true},{"text":", correctly order and label a ","element":"span"},{"text":"subset of ","element":"span"},{"text":"4","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"points from ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-7.png","element":"img","alt":" S′","inline":true},{"text":". By Observation ","element":"span"},{"href":"#id-49","text":"1.13, ","element":"a"},{"text":"at least ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"of these can be inferred from the rest, bounding the right hand side by:","element":"span"}],[{"style":{"width":"68%"},"width":1292,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-8.png","element":"img"}],[{"text":"where we have assumed ","element":"span"},{"style":{"height":16},"width":148.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-9.png","element":"img","alt":" δr < 1/2","inline":true},{"text":". 
Then for any constant ","element":"span"},{"style":{"height":13.59},"width":269.67,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-10.png","element":"img","alt":" c1 > 0 we have:","inline":true}],[{"style":{"width":"57%"},"width":1074,"height":83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-11.png","element":"img"}],[{"text":"which for small enough ","element":"span"},{"style":{"height":14.4},"width":143.72,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-12.png","element":"img","alt":" c1 gives:","inline":true}],[{"style":{"width":"31%"},"width":583,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-13.png","element":"img"}],[{"text":"Accounting for the fact that we have assumed our comparisons and labels are correct, our weak learner has coverage ","element":"span"},{"style":{"height":11.19},"width":75.31,"height":27.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-14.png","element":"img","alt":" > c1","inline":true,"padRight":true},{"text":"with probability at least ","element":"span"},{"style":{"height":19.37},"width":477.48,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-15.png","element":"img","alt":" (1 − δr)2c1 > c1 for δr < 12.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Query Complexity: ","element":"span"},{"text":"Now, we compute the number of queries made by the weak learner. 
Let ","element":"span"},{"style":{"height":16},"width":249.18,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-16.png","element":"img","alt":" c3 = m2/ log n","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"height":9.19},"width":50.99,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-17.png","element":"img","alt":" m2","inline":true,"padRight":true},{"text":"is the point-wise movement as defined in Observation ","element":"span"},{"href":"#id-46","text":"2.4. ","element":"a"},{"text":"Using the same notation as [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":"], we let (setting ","element":"span"},{"style":{"height":10.8},"width":110.77,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-18.png","element":"img","alt":" α = O","inline":true}],[{"style":{"width":"99%"},"width":1864,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-19.png","element":"img"}],[{"text":"Using [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":", Lemmas 31 and 32], the number of queries made in sorting ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"points (which includes the dynamic programming step on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"points and slotting ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"points) and slotting the additional ","element":"span"},{"style":{"height":16},"width":263.66,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-20.png","element":"img","alt":" |S′| = 32k + 16","inline":true,"padRight":true},{"text":"points 
are","element":"span"}],[{"style":{"width":"68%"},"width":1275,"height":212,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-21.png","element":"img"}],[{"text":"with probability ","element":"span"},{"style":{"height":13.99},"width":101.42,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-22.png","element":"img","alt":" 1 − δr","inline":true},{"text":". Since we do not want our number of queries to be probabilistic, if our learner does not complete after ","element":"span"},{"style":{"height":16},"width":120.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-23.png","element":"img","alt":" qwl(δr)","inline":true,"padRight":true},{"text":"queries, we stop and output all 0’s. This increases our error probability by ","element":"span"},{"style":{"height":13.99},"width":46.36,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/17-24.png","element":"img","alt":" δr.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Time Complexity: ","element":"span"},{"text":"Using an algorithm from [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":", Theorem 30], we can sort ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"points with noisy comparisons in time ","element":"span"},{"style":{"height":10.59},"width":52.16,"height":26.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-0.png","element":"img","alt":" nc4","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"height":21.77},"width":656.72,"height":54.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-1.png","element":"img","alt":" c4 = O(λ−5 log 1λ(1 + (log 1δr )( 1log n)))","inline":true,"padRight":true},{"text":"with probability 
","element":"span"},{"style":{"height":13.99},"width":103.26,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-2.png","element":"img","alt":" 1 − δr","inline":true},{"text":". Since slotting a point in ","element":"span"},{"text":"worst case takes ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":") ","element":"span"},{"text":"time, we can slot ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":") ","element":"span"},{"text":"points in time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"kn","element":"span"},{"text":")","element":"span"},{"text":". This gives us the total time taken by the weak learner as","element":"span"}],[{"style":{"width":"65%"},"width":1229,"height":120,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-3.png","element":"img"}],[{"text":"Therefore, the time complexity of the algorithm is ","element":"span"},{"style":{"height":25.12},"width":285.72,"height":62.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-4.png","element":"img","alt":" poly(k, 1δr )˜O( 1λ5 )","inline":true},{"text":". 
Once again taking the strategy of outputting all 0’s if the algorithm does not complete in time ","element":"span"},{"style":{"height":16},"width":126.49,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-5.png","element":"img","alt":" Twl(δr)","inline":true},{"text":", we lose another error factor of ","element":"span"},{"style":{"height":13.99},"width":32.71,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-6.png","element":"img","alt":" δr","inline":true},{"text":", making the algorithm altogether ","element":"span"},{"style":{"height":13.99},"width":205.84,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-7.png","element":"img","alt":" 3δr-reliable.","inline":true}],[{"text":"With our weak learner in hand, all that is left for the proof of Theorem ","element":"span"},{"href":"#id-50","text":"2.3 ","element":"a"},{"text":"is Step 3: stringing together copies of the weak learner through rejection sampling.","element":"span"}],[{"href":"#id-50","style":{"height":15.54},"width":653.86,"height":38.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-8.png","element":"img","alt":"Proof of Theorem 2.3. Let δwr and δwu ","inline":true,"padRight":true},{"text":"be reliability and usefulness parameters for our weak learner. 
Recall ","element":"span"},{"text":"that Lemma ","element":"span"},{"href":"#id-51","text":"2.7 ","element":"a"},{"text":"gives a ","element":"span"},{"style":{"height":15.54},"width":62.2,"height":38.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-9.png","element":"img","alt":" 3δwr","inline":true,"padRight":true},{"text":"-reliable weak learner with coverage ","element":"span"},{"style":{"height":9.19},"width":33.25,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-10.png","element":"img","alt":" c1","inline":true,"padRight":true},{"text":"with probability ","element":"span"},{"style":{"height":9.19},"width":33.24,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-11.png","element":"img","alt":" c1","inline":true},{"text":". Applying this weak ","element":"span"},{"text":"learner ","element":"span"},{"style":{"height":16},"width":231.67,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-12.png","element":"img","alt":" O(log(1/δwu ))","inline":true,"padRight":true},{"text":"times then amplifies this probability to at least ","element":"span"},{"style":{"height":15.54},"width":124.65,"height":38.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-13.png","element":"img","alt":" 1 − δwu .","inline":true}],[{"text":"Restricting to the distribution of un-inferred points via rejection sampling, we repeat the above process until our coverage reaches ","element":"span"},{"style":{"height":10.8},"width":87.64,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-14.png","element":"img","alt":" 1 − ε","inline":true},{"text":". 
Assuming each repetition is successful, after ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"steps our coverage is:","element":"span"}],[{"style":{"width":"17%"},"width":321,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-15.png","element":"img"}],[{"text":"Setting ","element":"span"},{"style":{"height":16},"width":278.23,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-16.png","element":"img","alt":" t to O(log(1/ε))","inline":true,"padRight":true},{"text":"is then sufficient to set the right hand side to ","element":"span"},{"style":{"height":10.4},"width":86.56,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-17.png","element":"img","alt":" 1 − ε","inline":true},{"text":". However, each repetition in this process degrades the overall probability of usefulness. In order to get an overall guarantee of ","element":"span"},{"style":{"height":13.99},"width":36.71,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-18.png","element":"img","alt":" δu","inline":true},{"text":", we must adjust our initial ","element":"span"},{"style":{"height":15.54},"width":104.72,"height":38.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-19.png","element":"img","alt":" δwu to:","inline":true}],[{"style":{"width":"18%"},"width":344,"height":120,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-20.png","element":"img"}],[{"text":"Similarly, since we apply the weak learner ","element":"span"},{"style":{"height":16},"width":379.21,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-21.png","element":"img","alt":" O(log(1/ε) log(1/δwu ))","inline":true,"padRight":true},{"text":"times, we adjust our 
","element":"span"},{"style":{"height":15.54},"width":93.8,"height":38.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-22.png","element":"img","alt":" δwr to","inline":true}],[{"style":{"width":"27%"},"width":511,"height":150,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-23.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Query Complexity: ","element":"span"},{"text":"In total, we run our weak learner at most ","element":"span"},{"style":{"height":28.8},"width":367.41,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-24.png","element":"img","alt":" O(log(1/ε) log(1/δwu))","inline":true},{"text":"times, giving a query complexity of:","element":"span"}],[{"style":{"width":"52%"},"width":976,"height":320,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-25.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Sample Complexity: ","element":"span"},{"text":"At each step of our algorithm, we restrict to the distribution of un-inferred points through rejection sampling. By itself, this poses a problem: what if we have inferred much of the space early and our algorithm continually rejects points? To combat this, we note that we can estimate the measure of remaining un-inferred points by how many samples we have to draw before finding one. 
Formally, if at any step we draw ","element":"span"},{"style":{"height":16},"width":225.72,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-26.png","element":"img","alt":" 2 log(1/δu)/ε","inline":true,"padRight":true},{"text":"inferred points in a row, then by a Chernoff bound the coverage of our learner is ","element":"span"},{"style":{"height":10.4},"width":86.12,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-27.png","element":"img","alt":"1 − ε","inline":true,"padRight":true},{"text":"with probability at least ","element":"span"},{"style":{"height":13.99},"width":228.07,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/18-28.png","element":"img","alt":" 1 − δu. Let n","inline":true,"padRight":true},{"text":"be the sample size as defined in Lemma ","element":"span"},{"href":"#id-51","text":"2.7. ","element":"a"},{"text":"Since our algorithm","element":"span"}],[{"style":{"width":"100%"},"width":1874,"height":93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-0.png","element":"img"}],[{"text":"our algorithm stops after rejecting ","element":"span"},{"style":{"height":16},"width":239.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-1.png","element":"img","alt":" 2 log(N/δu)/ε","inline":true,"padRight":true},{"text":"points in a row. This means that we can bound the total number of samples drawn by","element":"span"}],[{"style":{"width":"30%"},"width":574,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-2.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Time Complexity: ","element":"span"},{"text":"The time complexity of our algorithm has two main components: the complexity of finding an MLE order in the weak learner, and the complexity of rejection sampling. 
We already computed the time complexity of the weak learner in Lemma ","element":"span"},{"href":"#id-51","text":"2.7 ","element":"a"},{"text":"as ","element":"span"},{"style":{"height":25.12},"width":558.28,"height":62.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-3.png","element":"img","alt":" Twl(δr) = poly(k, log(1/δr)) ˜O(1/λ^5)","inline":true},{"text":". Since","element":"span"}],[{"text":"we run our weak learner at most ","element":"span"},{"style":{"height":28.8},"width":312.2,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-4.png","element":"img","alt":" O(log(1/ε) log(1/δwu))","inline":true}],[{"style":{"width":"31%"},"width":596,"height":59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-5.png","element":"img"}],[{"text":"It remains to compute the time complexity of rejection sampling. Recall that we sample at most ","element":"span"},{"style":{"height":16},"width":183.63,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-6.png","element":"img","alt":"n(ε, δr, δu)","inline":true,"padRight":true},{"text":"points total in our process. For each point, we run an LP in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"+ 1 ","element":"span"},{"text":"variables with constraints given by our previous queries in that round. 
Since the queries our weak learner uses in each round only involve ","element":"span"},{"style":{"height":18.83},"width":88.26,"height":47.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-7.png","element":"img","alt":"˜O(n)","inline":true,"padRight":true},{"text":"points, the time complexity of sampling is at most:","element":"span"}],[{"style":{"width":"59%"},"width":1119,"height":208,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-8.png","element":"img"}],[{"text":"Since the total time complexity is on the order of the sum of sampling and sorting, we get an algorithm that runs in time ","element":"span"},{"style":{"height":25.12},"width":522.28,"height":62.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-9.png","element":"img","alt":" poly(d, k, 1/δr, 1/ε, log(1/δu)) ˜O(1/λ^5).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"2.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Average Inference Dimension","element":"span"}],[{"text":"While inference dimension allows us to work over arbitrary continuous distributions, as a complexity parameter it is rather restrictive, barring for instance the learning of linear separators in dimensions above two. To generalize to a broader range of classifiers, we will use the framework of average inference dimension introduced in [","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"]. In particular, we show that any hypothesis class and distribution with super-exponential average inference dimension may be efficiently learned under Massart noise. 
","element":"span"},{"text":"As a result, we provide the first computationally and query efficient learner for non-homogeneous linear separators over s-concave distributions with Massart noise.","element":"span"}],[{"id":"id-35","style":{"fontWeight":"bold"},"text":"Theorem 2.8 ","element":"span"},{"text":"(Restatement of Theorem ","element":"span"},{"href":"#id-36","text":"1.6)","element":"a"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider any hypothesis class ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and corresponding class of distributions ","element":"span"},{"style":{"height":17.28},"width":216,"height":43.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-10.png","element":"img","alt":" A(X,H),a,f(d)","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Then, ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":17.68},"width":365.83,"height":44.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-11.png","element":"img","alt":" (M(λ), A(X,H),a,f(d))","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"in time","element":"span"}],[{"style":{"height":25.12},"width":523.01,"height":62.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-12.png","element":"img","alt":"poly(f(d), 1δu , 1ε, log( 1δr ))˜O( 1λ5 )","inline":true},{"style":{"fontStyle":"italic"},"text":", uses only ","element":"span"},{"style":{"height":20.97},"width":649.44,"height":52.42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-13.png","element":"img","alt":" poly(f(d), 1λ, log( 1ε), log( 1δr ), log( 1δu )))","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"unlabeled samples, and has a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"query complexity of","element":"span"}],[{"style":{"width":"47%"},"width":883,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-14.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for small enough ","element":"span"},{"style":{"height":13.99},"width":47.36,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-15.png","element":"img","alt":" δr.","inline":true}],[{"text":"Average inference dimension gives a high probability bound on the inference dimension of a finite sample. 
However, shifting our strategy to work directly with finite samples introduces a new problem: since our algorithm corrects noise via extra helper points, we may not be able to learn the entire sample. Our first step will be to show that learning most of a finite sample in few queries with high probability is sufficient to learn the entire distribution.","element":"span"}],[{"id":"id-52","style":{"fontWeight":"bold"},"text":"Lemma 2.9. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a hypothesis class, and ","element":"span"},{"style":{"height":13.19},"width":59.99,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-16.png","element":"img","alt":" DX","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"a distribution over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be an active, inference based learner taking in finite samples ","element":"span"},{"style":{"height":15.19},"width":148.35,"height":37.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/19-17.png","element":"img","alt":" S ∼ DnX","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with the property that for sufficiently large ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"style":{"fontStyle":"italic"},"text":"learns a ","element":"span"},{"style":{"height":15.6},"width":133.92,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-0.png","element":"img","alt":"(1 − ε1)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"fraction of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with probability ","element":"span"},{"style":{"height":11.6},"width":86.28,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-1.png","element":"img","alt":" 1 − δ","inline":true},{"style":{"fontStyle":"italic"},"text":", while querying at most an ","element":"span"},{"style":{"height":9.59},"width":34.58,"height":23.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-2.png","element":"img","alt":" ε2","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"fraction of the points. 
The expected coverage of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"style":{"fontStyle":"italic"},"text":"over the entirety of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is at least:","element":"span"}],[{"style":{"width":"32%"},"width":605,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"To find the expected coverage of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"over the entire distribution ","element":"span"},{"style":{"height":13.19},"width":59.99,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-4.png","element":"img","alt":" DX","inline":true,"padRight":true},{"text":"based on samples ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", we look at the probability that an additional randomly drawn point is inferred:","element":"span"}],[{"style":{"width":"58%"},"width":1089,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-5.png","element":"img"}],[{"text":"We can bound the right hand term by looking at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"applied to samples ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-6.png","element":"img","alt":" S′","inline":true,"padRight":true},{"text":"of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"+ 1","element":"span"},{"text":". 
In particular, if ","element":"span"},{"style":{"height":10.79},"width":82.94,"height":26.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-7.png","element":"img","alt":"xn+1","inline":true,"padRight":true},{"text":"is learned but not queried by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":", then because ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is an inference based learner, it must be the case that ","element":"span"},{"style":{"height":16},"width":195.89,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-8.png","element":"img","alt":"{x1, . . . xn}","inline":true,"padRight":true},{"text":"infer ","element":"span"},{"style":{"height":10.79},"width":82.94,"height":26.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-9.png","element":"img","alt":" xn+1","inline":true},{"text":". Since the points of ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-10.png","element":"img","alt":" S′","inline":true,"padRight":true},{"text":"are drawn i.i.d from ","element":"span"},{"style":{"height":13.19},"width":60,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-11.png","element":"img","alt":" DX","inline":true},{"text":", the probability that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"queries or learns any given point ","element":"span"},{"style":{"height":9.19},"width":33.78,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-12.png","element":"img","alt":" xi","inline":true,"padRight":true},{"text":"is the same for all ","element":"span"},{"style":{"height":13.2},"width":234.2,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-13.png","element":"img","alt":" 1 ≤ 
i ≤ n + 1","inline":true},{"text":". Because a ","element":"span"},{"style":{"height":13.19},"width":103.71,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-14.png","element":"img","alt":" 1 − ε1","inline":true,"padRight":true},{"text":"fraction of points are learned with probability ","element":"span"},{"style":{"height":11.6},"width":82.76,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-15.png","element":"img","alt":" 1 − δ","inline":true,"padRight":true},{"text":"and only an ","element":"span"},{"style":{"height":9.59},"width":34.58,"height":23.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-16.png","element":"img","alt":" ε2","inline":true,"padRight":true},{"text":"fraction of points are queried, the probability that a point is learned but not queried is at least ","element":"span"},{"style":{"height":13.99},"width":255.92,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-17.png","element":"img","alt":" 1 − δ − ε1 − ε2","inline":true,"padRight":true},{"text":"by a union bound, which gives the desired bound on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"’s coverage.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-35","style":{"fontStyle":"italic"},"text":"2.8. ","element":"a"},{"text":"We will argue that the learner presented in Theorem ","element":"span"},{"href":"#id-50","text":"2.3 ","element":"a"},{"text":"satisfies the properties of Lemma ","element":"span"},{"href":"#id-52","text":"2.9 ","element":"a"},{"text":"for a large enough sample size. To prove this, we first examine learning a specific sample with small inference dimension. 
The coverage over all samples will then follow from the fact that almost all samples have small inference dimension by Observation ","element":"span"},{"href":"#id-53","text":"1.14 ","element":"a"},{"text":"[","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"] and our assumption on average inference dimension.","element":"span"}],[{"text":"Because we are considering a fixed sample ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":", the weak learner draws uniformly without replacement from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"(denoted ","element":"span"},{"style":{"height":10.8},"width":108.76,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-18.png","element":"img","alt":" x ∼ S","inline":true},{"text":") rather than from the distribution itself. All required symmetry arguments still hold in this regime, as the order in which points are drawn is still uniformly random. 
The expected coverage of our learner over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"is thus the same as for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"in Lemma ","element":"span"},{"href":"#id-51","text":"2.7 ","element":"a"},{"text":"adjusted for the fact that we sample without replacement:","element":"span"}],[{"style":{"width":"78%"},"width":1474,"height":154,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-19.png","element":"img"}],[{"text":"and hence","element":"span"}],[{"style":{"width":"61%"},"width":1154,"height":95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-20.png","element":"img"}],[{"text":"Assume for now that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"is large enough that the subtracted term is negligible. To analyze the remaining coverage probability, assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"satisfies the constraints of Lemma ","element":"span"},{"href":"#id-51","text":"2.7 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":19.53},"width":455.65,"height":48.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-21.png","element":"img","alt":" k = ˜Θ(f(d)1/a log1/a(|S|))","inline":true},{"text":", and further that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"has inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":". Then by the arguments in Lemma ","element":"span"},{"href":"#id-51","text":"2.7, ","element":"a"},{"text":"this probability over the sample itself and noisy oracles is constant. 
Further, as long as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"is sufficiently large, we can get coverage ","element":"span"},{"style":{"height":10.8},"width":88.78,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-22.png","element":"img","alt":"1 − ε","inline":true,"padRight":true},{"text":"with probability ","element":"span"},{"style":{"height":10.8},"width":88.78,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-23.png","element":"img","alt":" 1 − ε","inline":true,"padRight":true},{"text":"by applying the same argument restricted to the subset of un-inferred points ","element":"span"},{"style":{"height":20.23},"width":213.69,"height":50.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-24.png","element":"img","alt":"O(log^2(1/ε))","inline":true},{"text":"times. This argument only fails when there are no longer ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"remaining points for our weak learner to use, but as long as ","element":"span"},{"style":{"height":17.5},"width":191.88,"height":43.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-25.png","element":"img","alt":" |S| = ω(n/ε)","inline":true},{"text":", this will not affect our coverage. 
Since Lemma ","element":"span"},{"href":"#id-52","text":"2.9 ","element":"a"},{"text":"also only allows the ","element":"span"},{"text":"learner to query a ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-26.png","element":"img","alt":" ε","inline":true,"padRight":true},{"text":"fraction of points, we set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"to:","element":"span"}],[{"style":{"width":"37%"},"width":704,"height":237,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-27.png","element":"img"}],[{"text":"which also validates our assumption that ","element":"span"},{"style":{"height":24.43},"width":154.27,"height":61.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-28.png","element":"img","alt":"Oλ(log(n))|S|","inline":true,"padRight":true},{"text":"is negligible (we lose less than ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-29.png","element":"img","alt":" ε","inline":true,"padRight":true},{"text":"over all iterations). 
To apply Lemma ","element":"span"},{"href":"#id-52","text":"2.9, ","element":"a"},{"text":"it is sufficient to have a learner ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"such that:","element":"span"}],[{"style":{"width":"34%"},"width":640,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/20-30.png","element":"img"}],[{"text":"Because ","element":"span"},{"style":{"height":19.37},"width":204.24,"height":48.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-0.png","element":"img","alt":" |S| > Ω(1/ε)","inline":true},{"text":", S has inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"with probability at least ","element":"span"},{"style":{"height":10.8},"width":89.84,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-1.png","element":"img","alt":" 1 − ε","inline":true,"padRight":true},{"text":"by Observation ","element":"span"},{"href":"#id-53","text":"1.14 ","element":"a"},{"text":"[","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"]. 
Combining this with the fact that our algorithm has a ","element":"span"},{"style":{"height":10.8},"width":87.72,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-2.png","element":"img","alt":" 1 − ε","inline":true,"padRight":true},{"text":"probability of achieving ","element":"span"},{"style":{"height":10.8},"width":107.84,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-3.png","element":"img","alt":" 1 − 2ε","inline":true,"padRight":true},{"text":"coverage when the inference dimension is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"proves this claim.","element":"span"}],[{"text":"Finally, by Lemma ","element":"span"},{"href":"#id-52","text":"2.9, ","element":"a"},{"text":"our learner has expected coverage ","element":"span"},{"style":{"height":13.6},"width":167.88,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-4.png","element":"img","alt":" ≥ 1 − 5ε","inline":true,"padRight":true},{"text":"over the entire space. ","element":"span"},{"text":"To get the desired coverage probability, we run the algorithm over ","element":"span"},{"style":{"height":16},"width":224.46,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-5.png","element":"img","alt":" O(log(1/δu))","inline":true,"padRight":true},{"text":"samples, setting ","element":"span"},{"style":{"height":16},"width":319.79,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-6.png","element":"img","alt":" δr to δr/ log(1/δu)","inline":true,"padRight":true},{"text":"to offset the degradation of correctness over repetition. 
Then by the same argument as Theorem ","element":"span"},{"href":"#id-50","text":"2.3, ","element":"a"},{"text":"our query complexity is:","element":"span"}],[{"style":{"width":"56%"},"width":1054,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-7.png","element":"img"}],[{"text":"Sample and time complexity follow similarly to Theorem ","element":"span"},{"href":"#id-50","text":"2.3.","element":"a"}]]},{"heading":"3 Generalized Tsybakov Noise Condition","paragraphs":[[{"text":"The Massart noise model does well to capture situations with adversarial bounded noise, but even in a realistic non-adversarial scenario, error may not be bounded away from ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"2","element":"span"},{"text":". One might think, for instance, that label noise should be bounded as a function of the distance to the Bayes optimal classifier, reaching purely random labels on the decision boundary itself. Likewise, comparisons between arbitrarily close points should be difficult, with error approaching ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"2 ","element":"span"},{"text":"as well. This motivates us to study the Tsybakov Low Noise condition, a popular instantiation of distance-based noise. However, learning in this unbounded regime is harder, as evidenced by polynomial query lower bounds [","element":"span"},{"href":"#id-12","referenceIndex":13,"text":"13","element":"a"},{"text":", ","element":"span"},{"href":"#id-14","referenceIndex":15,"text":"15","element":"a"},{"text":"], and the lack of computationally efficient algorithms for the model. In order to ARPU-learn in this regime, we need to introduce more stringent restrictions than for Massart noise. 
First, instead of allowing any set system with finite inference dimension, we will only consider non-homogeneous linear separators. Second, we will either assume some margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-8.png","element":"img","alt":" γ","inline":true},{"text":", or that the distribution satisfies certain weak concentration and anti-concentration bounds, a property which implies our earlier assumption for Massart noise of super-exponential average inference dimension.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"3.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Finite Inference Dimension and Margin","element":"span"}],[{"text":"In this section, we will consider ARPU-learning hyperplanes over any continuous distribution with finite inference dimension and margin. Note that in the GTNC model, introducing margin bounds the error on label queries away from ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"2","element":"span"},{"text":". Thus our results should informally be viewed as saying the following: comparison queries with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"unbounded error ","element":"span"},{"text":"exponentially improve query complexity over label queries with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"bounded error ","element":"span"},{"text":"in the ARPU-learning model. Indeed, although we have picked a specific model of bounded label error in this case, trading for another model such as Massart noise on labels causes no significant change to our upper or lower bound.","element":"span"}],[{"text":"As in the case of Massart noise, we will first show the gap in query complexity between label only and comparison ARPU-learning. 
Our previous method showed an infinite gap between the two regimes, but the assumption of a non-zero margin requires a different argument. In this case, we will show a family of examples in which comparisons provide an exponential improvement.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Lemma 3.1 ","element":"span"},{"text":"(Restatement of Lemma ","element":"span"},{"href":"#id-54","text":"1.9)","element":"a"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":14.18},"width":130.63,"height":35.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-9.png","element":"img","alt":" X ∈ Rd","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"style":{"fontStyle":"italic"},"text":"-dimensional hypercube ","element":"span"},{"style":{"height":17.38},"width":114.88,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-10.png","element":"img","alt":" {0, 1}d","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"modified to have a ball of radius ","element":"span"},{"style":{"height":22.73},"width":59.07,"height":56.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-11.png","element":"img","alt":"14√d","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"centered about each point. 
The query complexity of ARPU-learning ","element":"span"},{"style":{"height":22.44},"width":204.34,"height":56.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-12.png","element":"img","alt":" (X, Hd, 14√d )","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"under model ","element":"span"},{"style":{"height":22.73},"width":442.4,"height":56.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-13.png","element":"img","alt":" (GTNC(gL, gU, 14√d), CX)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is at least:","element":"span"}],[{"style":{"width":"21%"},"width":412,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-14.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"For simplicity, the adversary will pick the uniform distribution from ","element":"span"},{"style":{"height":13.19},"width":47.98,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-15.png","element":"img","alt":" CX","inline":true},{"text":", and the noiseless case from ","element":"span"},{"style":{"height":22.73},"width":441.9,"height":56.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-16.png","element":"img","alt":"(GTNC(gL, gU, 14√d), CX)","inline":true},{"text":". 
Further, by Yao’s minimax principle it is sufficient to show there is a distribution ","element":"span"},{"text":"over hyperplanes in ","element":"span"},{"style":{"height":21.25},"width":114.39,"height":53.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/21-17.png","element":"img","alt":" Hd, 14√d","inline":true,"padRight":true},{"text":"for which no learner can achieve at least ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4 ","element":"span"},{"text":"coverage with perfect correctness","element":"span"}],[{"text":"with greater than ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4 ","element":"span"},{"text":"probability. Let the adversary pick the uniform distribution over the ","element":"span"},{"style":{"height":13.39},"width":37.09,"height":33.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-0.png","element":"img","alt":" 2d","inline":true,"padRight":true},{"text":"hyperplanes which truncate corners of the hypercube, e.g.","element":"span"}],[{"style":{"width":"11%"},"width":219,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-1.png","element":"img"}],[{"text":"Note that these hyperplanes have margin ","element":"span"},{"style":{"height":22.73},"width":59.07,"height":56.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-2.png","element":"img","alt":"14√d","inline":true},{"text":", so they lie in ","element":"span"},{"style":{"height":21.24},"width":114.39,"height":53.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-3.png","element":"img","alt":" Hd, 14√d","inline":true,"padRight":true},{"text":", and that each one may be seen as ","element":"span"},{"text":"selecting a single ball to be negative. 
Given any set strategy, the learner can only query points in ","element":"span"},{"style":{"height":13.38},"width":77.56,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-4.png","element":"img","alt":" 2d−1","inline":true,"padRight":true},{"text":"out of ","element":"span"},{"style":{"height":13.39},"width":37.32,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-5.png","element":"img","alt":" 2d","inline":true,"padRight":true},{"text":"balls. The probability that one of the balls the learner queries is the negative ball is at most ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"2","element":"span"},{"text":". If the learner does not locate the negative ball, to have coverage ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4 ","element":"span"},{"text":"it must label half of the remaining space with no additional queries. However, any set strategy from the learner in this case will have an incorrect label with probability at least ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"2 ","element":"span"},{"text":"since the negative ball is uniformly distributed over the remaining balls. 
Thus any learner that has ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4 ","element":"span"},{"text":"coverage with probability more than ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4 ","element":"span"},{"text":"must incorrectly label some point, violating the conditions of ARPU-learning.","element":"span"}],[{"text":"By an argument based on minimal-ratio (margin normalized by the maximum function value) from [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"], the inference dimension of the above hypothesis class is ","element":"span"},{"style":{"height":18.83},"width":85.04,"height":47.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-6.png","element":"img","alt":"˜O(d)","inline":true},{"text":". We will prove that this implies a comparison-based algorithm that only makes ","element":"span"},{"text":"poly(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":") ","element":"span"},{"text":"queries.","element":"span"}],[{"id":"id-68","style":{"fontWeight":"bold"},"text":"Theorem 3.2 ","element":"span"},{"text":"(Restatement of Theorem ","element":"span"},{"href":"#id-39","text":"1.8)","element":"a"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":15.78},"width":135.08,"height":39.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-7.png","element":"img","alt":" X ⊆ Rd","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":16.79},"width":162.91,"height":41.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-8.png","element":"img","alt":" (X, Hd,γ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"have inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with respect to comparison queries. Then, ","element":"span"},{"style":{"height":16.39},"width":161.48,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-9.png","element":"img","alt":" (X, Hd,γ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":15.6},"width":495.32,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-10.png","element":"img","alt":" (GTNC(gL, gU, ε0), CX) with","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"query complexity:","element":"span"}],[{"style":{"width":"66%"},"width":1248,"height":170,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-11.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Where","element":"span"}],[{"style":{"width":"38%"},"width":727,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-12.png","element":"img"}],[{"text":"See Algorithm ","element":"span"},{"href":"#id-55","text":"2. 
","element":"a"},{"text":"Unlike the Massart case, we can no longer directly rely on the sorting algorithm of [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":"], as the point-wise movement guarantees rely on bounded noise. Instead, we rely on the fact that we can, with high probability, check the level of noise of a drawn sample. If the sample is not too noisy, we can modify the bounds of [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":"] and apply the same technique. On the other hand, if the sample is very noisy, we use this to infer structural information about the sample and thus learn some fraction of the instance space. Informally, our algorithm follows a similar three-step process to the Massart case:","element":"span"}],[{"style":{"width":"51%"},"width":970,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-13.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Step 2a (high noise): ","element":"span"},{"text":"If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"measures as noisy, we identify a subset ","element":"span"},{"style":{"height":11.6},"width":117.1,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-14.png","element":"img","alt":" S′ ⊂ S","inline":true,"padRight":true},{"text":"of points which are close with respect to the underlying hypothesis. 
Using additional randomly drawn points, we create an inference LP based on the structure of ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-15.png","element":"img","alt":" S′ ","inline":true,"padRight":true},{"text":"to learn a fraction of the instance space.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Step 2b (low noise): ","element":"span"},{"text":"If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"measures as having only a small amount of noise, sort ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"into an MLE order, and apply the same learning strategy as for Massart.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Step 3: ","element":"span"},{"text":"Restrict ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"(by rejection sampling) to points un-inferred by the LP in step 2a/b, and repeat steps 1 and 2a/b until coverage has reached ","element":"span"},{"style":{"height":10.8},"width":98.22,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/22-16.png","element":"img","alt":" 1 − ε.","inline":true}],[{"text":"At the core of this technique is the ability to detect subsets with high levels of noise, and to certify that they are highly structured. With this in mind, we show that if comparisons on a subset of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"look sufficiently random, then almost all points in this subset are clustered together in function value. 
Formally, we define a cluster as:","element":"span"}],[{"id":"id-55","style":{"width":"103%"},"width":1932,"height":1756,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/23-0.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Definition 3.3 ","element":"span"},{"text":"(Cluster)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a set system. Given ","element":"span"},{"style":{"height":12},"width":105.7,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/23-1.png","element":"img","alt":" h ∈ H","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and a sample ","element":"span"},{"style":{"height":14.4},"width":431.98,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/23-2.png","element":"img","alt":" S ⊆ X, S is an ε-cluster","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with respect to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"style":{"fontStyle":"italic"},"text":"if","element":"span"}],[{"style":{"width":"27%"},"width":521,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/23-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"We will often omit “with respect to ","element":"span"},{"style":{"height":11.6},"width":179.47,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/23-4.png","element":"img","alt":" h” when h","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the function underlying the Bayes optimal classifier.","element":"span"}],[{"text":"We will detect clusters by a 
measure of randomness we term equitability, the condition that every element is bigger than about half of the other elements.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 3.4 ","element":"span"},{"text":"(Equitability)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a set with comparisons denoted by ","element":"span"},{"style":{"height":9.6},"width":31,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/23-5.png","element":"img","alt":"�<","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"on each pair of elements. For an element ","element":"span"},{"style":{"height":16},"width":255.7,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/23-6.png","element":"img","alt":" x ∈ S, let v(x)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denote the number of elements ","element":"span"},{"style":{"height":14.4},"width":590.61,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/23-7.png","element":"img","alt":" y ∈ S such that y� 2g−1L (ε)","inline":true},{"text":", and further that the middle point must be at least ","element":"span"},{"style":{"height":19.1},"width":113.61,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/25-7.png","element":"img","alt":" g−1L (ε)","inline":true,"padRight":true},{"text":"far ","element":"span"},{"text":"from either ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":", or ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":". 
Since our argument will be symmetric, assume this to be ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"without loss of generality. Our strategy will be to bound the random variable","element":"span"}],[{"style":{"width":"20%"},"width":385,"height":250,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/25-8.png","element":"img"}],[{"text":"and use an averaging argument to show that there exists a value ","element":"span"},{"style":{"height":16},"width":636.66,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/25-9.png","element":"img","alt":" 1 ≤ x ≤ c s.t. v(x) < |S′|(1/2 − ε/4)","inline":true,"padRight":true},{"text":"We can decompose ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"text":") ","element":"span"},{"text":"into","element":"span"}],[{"style":{"width":"36%"},"width":679,"height":143,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/25-10.png","element":"img"}],[{"text":"where the first term is always","element":"span"},{"style":{"height":19.21},"width":52.15,"height":48.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/25-11.png","element":"img","alt":"�c2�","inline":true},{"text":". 
Because each point left of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"is at least ","element":"span"},{"style":{"height":19.1},"width":199.9,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/25-12.png","element":"img","alt":" g−1L (ε) ≤ ε0","inline":true,"padRight":true},{"text":"far away from the right ","element":"span"},{"text":"half of ","element":"span"},{"style":{"height":16},"width":59.98,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/25-13.png","element":"img","alt":" |S′|","inline":true},{"text":", we can bound the second term as for any ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"80%"},"width":1501,"height":280,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/25-14.png","element":"img"}],[{"text":"A Chernoff bound gives","element":"span"}],[{"style":{"width":"38%"},"width":718,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/25-15.png","element":"img"}],[{"text":"Then an averaging argument shows that","element":"span"}],[{"style":{"width":"87%"},"width":1634,"height":137,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/25-16.png","element":"img"}],[{"text":"We are not quite done with our cluster detection algorithm, as our goal will be to test for clusters sublinear in the size of our main sample ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":". Lemma ","element":"span"},{"href":"#id-57","text":"3.5 ","element":"a"},{"text":"is enough to show that if such a cluster exists some subset will measure as equitable, but we need to prove that any equitable subset of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"contains a cluster. 
For large enough ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"text":", this is true with high probability.","element":"span"}],[{"id":"id-58","style":{"fontWeight":"bold"},"text":"Corollary 3.6. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a sample of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":", and ","element":"span"},{"style":{"height":21.63},"width":168.14,"height":54.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-0.png","element":"img","alt":" ε ≤ gL(ε0)4","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":". For all subsets ","element":"span"},{"style":{"height":13.2},"width":117.05,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-1.png","element":"img","alt":" S′ ⊆ S","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"of size ","element":"span"},{"style":{"height":16},"width":266.95,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-2.png","element":"img","alt":" |S′| = (2c + m)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfying:","element":"span"}],[{"style":{"width":"22%"},"width":417,"height":86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"the following guarantees hold:","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"1. 
If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"contains a ","element":"span"},{"style":{"height":19.1},"width":153.86,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-4.png","element":"img","alt":" g−1U (ε/2)","inline":true},{"style":{"fontStyle":"italic"},"text":"-cluster of size ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"style":{"fontStyle":"italic"},"text":", then at least one ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-5.png","element":"img","alt":" S′","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-6.png","element":"img","alt":" ε","inline":true},{"style":{"fontStyle":"italic"},"text":"-equitable with probability at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"least ","element":"span"},{"style":{"height":11.6},"width":99.85,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-7.png","element":"img","alt":" 1 − δ.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"2. 
For all ","element":"span"},{"style":{"height":14},"width":282.72,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-8.png","element":"img","alt":" ε-equitable S′, C","inline":true},{"style":{"fontStyle":"italic"},"text":", the middle ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"style":{"fontStyle":"italic"},"text":"elements of ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-9.png","element":"img","alt":" S′ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with respect to the true order, is a ","element":"span"},{"style":{"height":19.11},"width":281.05,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-10.png","element":"img","alt":" 2g−1L (4ε)-cluster","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with probability at least ","element":"span"},{"style":{"height":11.6},"width":99.85,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-11.png","element":"img","alt":" 1 − δ.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Both statements follow from applying Lemma ","element":"span"},{"href":"#id-57","text":"3.5 ","element":"a"},{"text":"to subsets ","element":"span"},{"style":{"height":12.4},"width":383.78,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-12.png","element":"img","alt":" S′ ⊂ S of size 2c + m.","inline":true}],[{"text":"Proof of (1). 
By assumption, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"contains at least one subset ","element":"span"},{"style":{"height":10.8},"width":40.74,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-13.png","element":"img","alt":" S′","inline":true,"padRight":true},{"text":"of size ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"which is a cluster. Applying statement (1) of Lemma ","element":"span"},{"href":"#id-57","text":"3.5 ","element":"a"},{"text":"to ","element":"span"},{"style":{"height":14.4},"width":275.86,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-14.png","element":"img","alt":" S′ gives that S′ ","inline":true,"padRight":true},{"text":"is equitable with probability at least ","element":"span"},{"style":{"height":11.6},"width":98.85,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-15.png","element":"img","alt":" 1 − δ.","inline":true}],[{"text":"Proof of (2). 
We prove statement (2) by the contrapositive: with probability ","element":"span"},{"style":{"height":11.6},"width":87.66,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-16.png","element":"img","alt":" 1 − δ","inline":true},{"text":", all subsets ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-17.png","element":"img","alt":" S′","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is not a ","element":"span"},{"style":{"height":19.1},"width":154.26,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-18.png","element":"img","alt":" 2g−1L (4ε)","inline":true},{"text":"-cluster are not equitable. This follows from statement (2) of Lemma ","element":"span"},{"href":"#id-57","text":"3.5 ","element":"a"},{"text":"and union ","element":"span"},{"text":"bounding over all","element":"span"},{"style":{"height":22.01},"width":87.07,"height":55.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-19.png","element":"img","alt":"� n|S′|�","inline":true},{"text":"possible subsets.","element":"span"}],[{"text":"We can now explain step 1 of our algorithm, cluster detection, in a bit more detail.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Step 1: ","element":"span"},{"text":"Draw a sample ","element":"span"},{"style":{"height":15.59},"width":443,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-20.png","element":"img","alt":" S ∼ DnX, and set c and m","inline":true,"padRight":true},{"text":"corresponding to the desired cluster sizes for testing. 
For ","element":"span"},{"text":"every subset ","element":"span"},{"style":{"height":11.6},"width":117.04,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-21.png","element":"img","alt":" S′ ⊂ S","inline":true,"padRight":true},{"text":"of size ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":", check whether ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-22.png","element":"img","alt":" S′","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-23.png","element":"img","alt":" ε","inline":true},{"text":"-equitable. By the contrapositive of Corollary ","element":"span"},{"href":"#id-58","text":"3.6 ","element":"a"},{"text":"(1), if no such ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-24.png","element":"img","alt":" S′","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-25.png","element":"img","alt":" ε","inline":true},{"text":"-equitable, then no ","element":"span"},{"style":{"height":19.1},"width":153.4,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-26.png","element":"img","alt":" g−1U (ε/2)","inline":true},{"text":"-cluster exists in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":". 
Similarly, by Corollary ","element":"span"},{"href":"#id-58","text":"3.6 ","element":"a"},{"text":"(2) if ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-27.png","element":"img","alt":" S′","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-28.png","element":"img","alt":"ε","inline":true},{"text":"-equitable, then it contains a ","element":"span"},{"style":{"height":19.1},"width":506.1,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-29.png","element":"img","alt":" 2g−1L (4ε)-cluster C of size m.","inline":true}],[{"text":"With step 1 out of the way, we will prove that steps 2a and 2b provide reliable learners with good coverage as long as the cluster assumption from step 1 holds. Since our focus has been on clusters, we will begin by showing how to build the learner for step 2a. Recall that to apply the symmetry argument of [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"] for Massart noise, we had to slot a set of extra points. We will adhere to a similar strategy for step 2a in which we slot an extra set of points and find a cluster there rather than in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"itself. 
To find this cluster, our first goal will be to prove that additionally drawn points measure as equitable with ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-30.png","element":"img","alt":" S′ ","inline":true,"padRight":true},{"text":"if and only if they are in the same cluster as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":".","element":"span"}],[{"id":"id-62","style":{"fontWeight":"bold"},"text":"Lemma 3.7. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-31.png","element":"img","alt":" ε","inline":true},{"style":{"fontStyle":"italic"},"text":"-equitable set of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"+ 2","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"style":{"fontStyle":"italic"},"text":"satisfying the conditions of Corollary ","element":"span"},{"href":"#id-58","style":{"fontStyle":"italic"},"text":"3.6. 
","element":"a"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be the subset of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"which is the ","element":"span"},{"style":{"height":19.1},"width":153.6,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-32.png","element":"img","alt":" 2g−1L (4ε)","inline":true},{"style":{"fontStyle":"italic"},"text":"-cluster specified in Corollary ","element":"span"},{"href":"#id-58","style":{"fontStyle":"italic"},"text":"3.6, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and let ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-33.png","element":"img","alt":" S′","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a set of independently ","element":"span"},{"style":{"fontStyle":"italic"},"text":"drawn points. 
Further, choose ","element":"span"},{"style":{"height":10.4},"width":71.3,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-34.png","element":"img","alt":" ε, m","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"to satisfy:","element":"span"}],[{"style":{"width":"26%"},"width":503,"height":454,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-35.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"The following guarantees hold ","element":"span"},{"style":{"height":11.6},"width":134.36,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-36.png","element":"img","alt":" ∀x ∈ S′ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with probability at least ","element":"span"},{"style":{"height":11.6},"width":99.85,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-37.png","element":"img","alt":" 1 − δ.","inline":true}],[{"style":{"width":"76%"},"width":1424,"height":128,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/26-38.png","element":"img"}],[{"style":{"width":"99%"},"width":1870,"height":157,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-0.png","element":"img"}],[{"text":"Then GTNC allows us to bound ","element":"span"},{"style":{"height":16},"width":380.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-1.png","element":"img","alt":" ηC(x, y) for all y ∈ C:","inline":true}],[{"style":{"width":"36%"},"width":690,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-2.png","element":"img"}],[{"text":"To show that ","element":"span"},{"style":{"height":16},"width":363.72,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-3.png","element":"img","alt":" v(x) ≤ |S|(1/2 + λ1)","inline":true},{"text":", we assume the worst case – that all elements of 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"are smaller than ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":". Since ","element":"span"},{"style":{"height":16},"width":138.31,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-4.png","element":"img","alt":" C ∪ {x}","inline":true,"padRight":true},{"text":"is a cluster, we can bound ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":") ","element":"span"},{"text":"by the following Binomial:","element":"span"}],[{"style":{"width":"32%"},"width":610,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-5.png","element":"img"}],[{"text":"The probability that ","element":"span"},{"style":{"height":16},"width":357.63,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-6.png","element":"img","alt":" v(x) > |S|(1/2 + λ1)","inline":true,"padRight":true},{"text":"is then given by a Chernoff bound as","element":"span"}],[{"style":{"width":"32%"},"width":601,"height":66,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-7.png","element":"img"}],[{"text":"We can bound the probability that ","element":"span"},{"style":{"height":16},"width":356.78,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-8.png","element":"img","alt":" v(x) < |S|(1/2 − λ1)","inline":true,"padRight":true},{"text":"by rehashing the same argument for ","element":"span"},{"style":{"height":16},"width":251.26,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-9.png","element":"img","alt":" |S| − v(x), the","inline":true,"padRight":true},{"text":"number of elements ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"is less than. 
Thus the probability that ","element":"span"},{"style":{"height":16},"width":301.65,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-10.png","element":"img","alt":" C ∪ {x} is not λ1","inline":true},{"text":"-equitable is","element":"span"}],[{"style":{"width":"39%"},"width":739,"height":66,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-11.png","element":"img"}],[{"text":"Union bounding over ","element":"span"},{"style":{"height":10.8},"width":40.74,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-12.png","element":"img","alt":" S′ ","inline":true,"padRight":true},{"text":"completes the proof.","element":"span"}],[{"text":"Proof of (2). Similar to the proof of statement (2) of Corollary ","element":"span"},{"href":"#id-58","text":"3.6, ","element":"a"},{"text":"we prove the contrapositive: that all ","element":"span"},{"style":{"height":16},"width":138.31,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-13.png","element":"img","alt":" C ∪ {x}","inline":true,"padRight":true},{"text":"which are not ","element":"span"},{"style":{"height":19.1},"width":357.82,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-14.png","element":"img","alt":" g−1L (2λ1) + 2g−1L (4ε)","inline":true},{"text":"-clusters are not ","element":"span"},{"style":{"height":13.19},"width":39.25,"height":32.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-15.png","element":"img","alt":" λ1","inline":true},{"text":"-equitable with high probability. Assume ","element":"span"},{"style":{"height":19.1},"width":652.73,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-16.png","element":"img","alt":"C ∪ {x} is not a g−1L (2λ1) + 2g−1L (4ε)","inline":true},{"text":"-cluster. 
Since ","element":"span"},{"style":{"height":19.1},"width":697.52,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-17.png","element":"img","alt":" C is a 2g−1L (4ε)-cluster, ∀y ∈ C we have","inline":true}],[{"style":{"width":"24%"},"width":461,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-18.png","element":"img"}],[{"text":"Since we have assumed ","element":"span"},{"style":{"height":19.1},"width":242.98,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-19.png","element":"img","alt":" g−1L (2λ1) < ε0","inline":true},{"text":", GTNC gives ","element":"span"},{"style":{"height":14},"width":134.15,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-20.png","element":"img","alt":" ∀y ∈ C:","inline":true}],[{"style":{"width":"20%"},"width":388,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-21.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":16},"width":138.32,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-22.png","element":"img","alt":" C ∪ {x}","inline":true,"padRight":true},{"text":"is not a cluster, it must either be the case that ","element":"span"},{"style":{"height":14},"width":234.67,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-23.png","element":"img","alt":" ∀y ∈ C, x > y","inline":true},{"text":", or ","element":"span"},{"style":{"height":14},"width":234.67,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-24.png","element":"img","alt":" ∀y ∈ C, x < y","inline":true},{"text":". Assume the latter without loss of generality. 
We can bound ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":") ","element":"span"},{"text":"by a Binomial:","element":"span"}],[{"style":{"width":"99%"},"width":1871,"height":236,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-25.png","element":"img"}],[{"text":"Knowing that additionally drawn points which measure as equitable with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"come from a cluster, we can feed them into an inference LP based on this assumption. However, to infer remaining points in the instance space, the LP must also know the label of the cluster we feed in. Since we are assuming our points have some margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-26.png","element":"img","alt":" γ","inline":true},{"text":", we can solve for the label of the cluster with high probability by majority vote.","element":"span"}],[{"id":"id-63","style":{"fontWeight":"bold"},"text":"Lemma 3.8 ","element":"span"},{"text":"(Cluster Labeling)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Assume that a set ","element":"span"},{"style":{"height":24.43},"width":294.68,"height":61.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-27.png","element":"img","alt":" S, |S| ≥ 2 log(1/δ)gL(γ)2","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":", consists entirely of one label and has ","element":"span"},{"style":{"fontStyle":"italic"},"text":"margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-28.png","element":"img","alt":" γ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with respect to the decision boundary. The probability that this true label differs from the majority label measured by the oracle ","element":"span"},{"style":{"height":14},"width":53.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-29.png","element":"img","alt":" QL","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is at most ","element":"span"},{"style":{"height":11.6},"width":31.22,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-30.png","element":"img","alt":" δ.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"This follows from applying a Chernoff bound to the fact that each point has at least a ","element":"span"},{"style":{"height":16},"width":207.38,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-31.png","element":"img","alt":" 1/2 + gL(γ)","inline":true,"padRight":true},{"text":"probability of being correct.","element":"span"}],[{"text":"Finally, we need to show that the LP based upon the structure and label of clustered points has good coverage. 
We do this by an argument inspired by inference dimension: that given a ","element":"span"},{"style":{"height":16},"width":63.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-32.png","element":"img","alt":" γ/d","inline":true},{"text":"-cluster ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"of large enough size, there exists a point ","element":"span"},{"style":{"height":11.6},"width":102.48,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-33.png","element":"img","alt":" x ∈ C","inline":true,"padRight":true},{"text":"such that the knowledge that ","element":"span"},{"style":{"height":16},"width":142.7,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/27-34.png","element":"img","alt":" C − {x}","inline":true,"padRight":true},{"text":"is a cluster is sufficient to infer the label of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":". This will allow us to use the symmetry argument of [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"] to show that step 2a has good coverage.","element":"span"}],[{"style":{"width":"36%"},"width":676,"height":556,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-0.png","element":"img"}],[{"text":"Figure 1: The above image illustrates the construction of sets ","element":"figcaption","subtype":"caption"},{"style":{"height":13.59},"width":260.77,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-1.png","element":"img","alt":" Cmin and Cmax","inline":true,"padRight":true},{"text":"which sandwich the cluster ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"C","element":"figcaption","subtype":"caption"},{"text":".","element":"figcaption","subtype":"caption"}],[{"id":"id-64","style":{"fontWeight":"bold"},"text":"Lemma 3.9. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a set, and ","element":"span"},{"style":{"height":15.59},"width":77.18,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-2.png","element":"img","alt":" Hd,γ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"the set of hyperplanes with margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-3.png","element":"img","alt":" γ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with respect to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider a query set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"style":{"fontStyle":"italic"},"text":"containing a cluster query along with the standard label queries. Given a subset ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"style":{"fontStyle":"italic"},"text":", a cluster-query returns 1 if ","element":"span"},{"style":{"height":16},"width":174.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-4.png","element":"img","alt":" S is a γ/d","inline":true},{"style":{"fontStyle":"italic"},"text":"-cluster, and ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"otherwise. 
Then for any ","element":"span"},{"style":{"height":16},"width":321.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-5.png","element":"img","alt":" γ/d-cluster C ⊆ X","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"of size at least ","element":"span"},{"text":"24","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"log(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"+1)","element":"span"},{"style":{"fontStyle":"italic"},"text":":","element":"span"}],[{"style":{"width":"38%"},"width":722,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-6.png","element":"img"}],[{"style":{"height":16},"width":669.38,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-7.png","element":"img","alt":"Proof. A γ/d-cluster C = {x1, . . . , xn}","inline":true,"padRight":true},{"text":"infers a point y if there is a solution to the following system of linear equations:","element":"span"}],[{"id":"id-59","style":{"width":"57%"},"width":1069,"height":216,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-8.png","element":"img"}],[{"text":"Informally, because ","element":"span"},{"style":{"height":16},"width":176.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-9.png","element":"img","alt":" C is a γ/d","inline":true},{"text":"-cluster and all points have margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-10.png","element":"img","alt":" γ","inline":true},{"text":", it infers the labels not just of points in its convex hull, but in a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"times expansion of the hull. 
We will show that a large enough cluster ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"must contain some point ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"s.t. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"style":{"fontStyle":"italic"},"text":"\\ {","element":"span"},{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"style":{"fontStyle":"italic"},"text":"} ","element":"span"},{"text":"infers ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"text":". Our strategy relies on the fact that if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"does not infer ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"text":", adding ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"expands the volume of its convex hull by a multiplicative factor. Since we can upper bound the volume of the convex hull of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"by the volume of the largest simplex times the size of a decomposition of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"into simplices (a triangulation), this multiplicative volume expansion contradicts the upper bound for large enough ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":".","element":"span"}],[{"text":"In order to prove that adding a point multiplicatively expands the volume of the convex hull, we will need to prove the existence of a certain affine linear function. 
In particular, if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"are such that this system of equations has no solution, then there exists an affine function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L ","element":"span"},{"text":"such that:","element":"span"}],[{"style":{"width":"100%"},"width":1878,"height":262,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-11.png","element":"img"}],[{"text":"linear combination of the inequalities and a real linear combination of the equalities that sum to the contradiction ","element":"span"},{"style":{"height":13.2},"width":101.77,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-12.png","element":"img","alt":"1 ≤ 0","inline":true,"padRight":true},{"text":"by LP-duality. Since ","element":"span"},{"style":{"height":10},"width":99.09,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-13.png","element":"img","alt":" ai, xi,","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"do not appear in this contradiction, the linear combinations of Equations ","element":"span"},{"href":"#id-59","text":"(8) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-59","text":"(9) ","element":"a"},{"text":"must cancel. 
To see this explicitly, let the linear combination of ","element":"span"},{"href":"#id-59","text":"(8) ","element":"a"},{"text":"be denoted ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":"; then the equality becomes:","element":"span"}],[{"id":"id-61","style":{"width":"17%"},"width":335,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/28-14.png","element":"img"}],[{"text":"Note that Equation ","element":"span"},{"href":"#id-59","text":"(9) ","element":"a"},{"text":"in a truly linear form is a set of ","element":"span"},{"style":{"height":13.39},"width":37.1,"height":33.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-0.png","element":"img","alt":" 2^d","inline":true,"padRight":true},{"text":"equations ","element":"span"},{"style":{"height":16},"width":112.59,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-1.png","element":"img","alt":"∑ aiei","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"height":17.39},"width":213.03,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-2.png","element":"img","alt":" e ∈ {−1, 1}^d","inline":true},{"text":". The positive real linear combination of these terms is then of the form","element":"span"}],[{"style":{"width":"7%"},"width":136,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-3.png","element":"img"}],[{"text":"for some ","element":"span"},{"style":{"height":14.18},"width":111.59,"height":35.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-4.png","element":"img","alt":" b ∈ R^d","inline":true},{"text":". 
Since these two sums must cancel, we get that the ","element":"span"},{"style":{"height":13.19},"width":28.1,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-5.png","element":"img","alt":" bi","inline":true,"padRight":true},{"text":"are in fact ","element":"span"},{"style":{"height":16},"width":127.14,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-6.png","element":"img","alt":" −T(xi)","inline":true},{"text":". Summing the two equations then gives:","element":"span"}],[{"style":{"width":"51%"},"width":960,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-7.png","element":"img"}],[{"text":"Now define ","element":"span"},{"style":{"height":10.8},"width":140.32,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-8.png","element":"img","alt":" L = −T","inline":true},{"text":", which remains an affine linear function. This sign only affects the left-hand term, and thus we get:","element":"span"}],[{"style":{"width":"49%"},"width":918,"height":213,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-9.png","element":"img"}],[{"text":"Noting that ","element":"span"},{"style":{"height":16},"width":337.86,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-10.png","element":"img","alt":" L(xmax) − L(xmin)","inline":true,"padRight":true},{"text":"is at most ","element":"span"},{"style":{"height":22.04},"width":223.6,"height":55.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-11.png","element":"img","alt":" 2 maxi |L(xi)|","inline":true,"padRight":true},{"text":"proves the claim.","element":"span"}],[{"text":"Using the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L ","element":"span"},{"text":"we can show how ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"expands in volume when adding an 
un-inferred point:","element":"span"}],[{"id":"id-60","style":{"width":"63%"},"width":1186,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-12.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-60","text":"(11)","element":"a"},{"style":{"fontStyle":"italic"},"text":": ","element":"span"},{"text":"Our strategy will be to sandwich the convex hull of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"in the difference of two cones defined by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L ","element":"span"},{"text":"with apex ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"text":". For arbitrary points ","element":"span"},{"style":{"height":11.6},"width":102.54,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-13.png","element":"img","alt":" x ∈ C","inline":true},{"text":", let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x, y","element":"span"},{"text":") ","element":"span"},{"text":"be the line passing through ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":16},"width":190.9,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-14.png","element":"img","alt":" y, N(xmin)","inline":true,"padRight":true},{"text":"be the plane given by ","element":"span"},{"style":{"height":16},"width":272.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-15.png","element":"img","alt":" L(x) = L(xmin)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16},"width":151.75,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-16.png","element":"img","alt":" 
N(xmax)","inline":true,"padRight":true},{"text":"be the plane given by ","element":"span"},{"style":{"height":16},"width":278.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-17.png","element":"img","alt":" L(x) = L(xmax)","inline":true},{"text":". The cone which does not contain ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is then defined by its apex ","element":"span"},{"style":{"height":14.4},"width":305.58,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-18.png","element":"img","alt":" y and base Cmax:","inline":true}],[{"style":{"width":"54%"},"width":1017,"height":43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-19.png","element":"img"}],[{"text":"Likewise, we define the cone that contains both Cone(","element":"span"},{"style":{"height":14},"width":131.84,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-20.png","element":"img","alt":"Cmax, y","inline":true},{"text":") and ConvHull(","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":") as the cone with apex y and base ","element":"span"},{"style":{"height":13.19},"width":100.73,"height":32.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-21.png","element":"img","alt":" Cmin:","inline":true}],[{"style":{"width":"54%"},"width":1017,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-22.png","element":"img"}],[{"text":"We refer to these cones respectively as Cone(","element":"span"},{"style":{"height":14},"width":131.85,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-23.png","element":"img","alt":"Cmax, y","inline":true},{"text":") and 
Cone(","element":"span"},{"style":{"height":14},"width":127.46,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-24.png","element":"img","alt":"Cmin, y","inline":true},{"text":"). Note that ","element":"span"},{"style":{"height":13.19},"width":92.06,"height":32.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-25.png","element":"img","alt":" Cmax","inline":true,"padRight":true},{"text":"is similar to ","element":"span"},{"style":{"height":13.19},"width":88.04,"height":32.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-26.png","element":"img","alt":" Cmin","inline":true,"padRight":true},{"text":"and that Equation ","element":"span"},{"href":"#id-61","text":"(10) ","element":"a"},{"text":"bounds the ratio in volume of these cones:","element":"span"}],[{"style":{"width":"84%"},"width":1581,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-27.png","element":"img"}],[{"text":"Further, since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is sandwiched between the two cones we have ","element":"span"},{"style":{"height":15.6},"width":790.98,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-28.png","element":"img","alt":" ConvHull(C) ⊂ Cone(Cmin, y) − Cone(C2, y),","inline":true,"padRight":true},{"text":"and can bound the ratio in volume between the Convex Hull of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"and the smaller cone:","element":"span"}],[{"style":{"width":"55%"},"width":1049,"height":211,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-29.png","element":"img"}],[{"text":"Finally, because the Convex Hull of ","element":"span"},{"style":{"height":16},"width":134.14,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-30.png","element":"img","alt":" C ∪ 
{y}","inline":true,"padRight":true},{"text":"contains both Cone(","element":"span"},{"style":{"height":14},"width":131.84,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/29-31.png","element":"img","alt":"Cmax, y","inline":true},{"text":") and ConvHull(","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":"), this allows us to","element":"span"}],[{"text":"lower bound the expansion factor of including ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"into ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":":","element":"span"}],[{"style":{"width":"55%"},"width":1048,"height":407,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-0.png","element":"img"}],[{"text":"Using Equation ","element":"span"},{"href":"#id-60","text":"(11)","element":"a"},{"text":", we can build our contradiction on the volume of the convex hull for large enough ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":". For analysis, we denote the size of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". 
To start, we note a simple upper bound on the volume of the convex hull of any ","element":"span"},{"style":{"height":16.59},"width":341.83,"height":41.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-1.png","element":"img","alt":" n point set C ∈ Rd:","inline":true}],[{"style":{"width":"27%"},"width":524,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-2.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":13.19},"width":86.83,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-3.png","element":"img","alt":" Vmax","inline":true,"padRight":true},{"text":"is the volume of the largest simplex with vertices in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":". This follows from choosing any vertex ","element":"span"},{"style":{"height":11.6},"width":103.01,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-4.png","element":"img","alt":"x ∈ C","inline":true,"padRight":true},{"text":"and noting that choosing every simplex which contains ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"is a triangulation of ConvHull(","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":"). While this triangulation is certainly not optimal, it is sufficient for our purposes.","element":"span"}],[{"text":"Since there exists some ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"s.t. 
no point in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"can be inferred from the rest, every point added to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"after the largest simplex multiplies the volume by at least ","element":"span"},{"style":{"height":22.18},"width":71.58,"height":55.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-5.png","element":"img","alt":"e^2/(e^2−1)","inline":true},{"text":". This gives a lower bound on the volume of ConvHull(","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":") ","element":"span"},{"text":"of:","element":"span"}],[{"style":{"width":"37%"},"width":696,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-6.png","element":"img"}],[{"text":"Together, these bounds give the equation:","element":"span"}],[{"style":{"width":"21%"},"width":397,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-7.png","element":"img"}],[{"text":"Setting ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n > ","element":"span"},{"text":"24","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"log(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"+ 1) ","element":"span"},{"text":"gives a contradiction.","element":"span"}],[{"text":"With Lemmas ","element":"span"},{"href":"#id-62","text":"3.7, ","element":"a"},{"href":"#id-63","text":"3.8, ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-64","text":"3.9 ","element":"a"},{"text":"in hand, we can now give a more detailed explanation of step 2a:","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Step 2a (high noise): ","element":"span"},{"text":"It is assumed by step 1 that we have detected an 
","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-8.png","element":"img","alt":" ε","inline":true},{"text":"-equitable subset ","element":"span"},{"style":{"height":10.8},"width":40.74,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-9.png","element":"img","alt":" S′","inline":true},{"text":". Draw an additional set of points ","element":"span"},{"style":{"height":16},"width":222.18,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-10.png","element":"img","alt":" {x1, . . . , xm}","inline":true},{"text":", and for each point test whether ","element":"span"},{"style":{"height":16},"width":237.46,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-11.png","element":"img","alt":" S′ ∪{xi} is λ1","inline":true},{"text":"-equitable. By Lemma ","element":"span"},{"href":"#id-62","text":"3.7, ","element":"a"},{"text":"the points which measure as equitable with ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-12.png","element":"img","alt":" S′","inline":true,"padRight":true},{"text":"make up a cluster. Using Lemma ","element":"span"},{"href":"#id-63","text":"3.8 ","element":"a"},{"text":"to label these points, build an LP based on the labels and cluster structure. Applying Lemma ","element":"span"},{"href":"#id-64","text":"3.9 ","element":"a"},{"text":"and the symmetry argument of ","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"[4] ","element":"a"},{"text":"shows that this LP has good coverage.","element":"span"}],[{"text":"It is left to show that step 2b has good coverage. ","element":"span"},{"text":"Step 2b will follow a similar strategy to the Massart case, using points well-separated in an MLE ordering to build our LP. 
However, since we are still in the regime of unbounded error, we will need to exploit the fact that our sample has no large clusters to show that this LP infers correctly with high probability. Notice that a sample with no clusters consists mostly of pairs of points whose comparisons are bounded in error. With this in mind, we modify the pointwise movement bounds of ","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"[19] ","element":"a"},{"text":"to differentiate between pairs of points with bounded and unbounded comparison error.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 3.10. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a set with a noisy comparison oracle ","element":"span"},{"style":{"height":14},"width":55.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-13.png","element":"img","alt":" QC","inline":true},{"style":{"fontStyle":"italic"},"text":". We call a comparison between points ","element":"span"},{"style":{"height":14},"width":174.14,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-14.png","element":"img","alt":"x, y ∈ S λ","inline":true},{"style":{"fontStyle":"italic"},"text":"-far if the probability that ","element":"span"},{"style":{"height":14},"width":55.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-15.png","element":"img","alt":" QC","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"returns the correct comparison is at least ","element":"span"},{"style":{"height":16},"width":130.21,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-16.png","element":"img","alt":" 1/2 + λ","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Otherwise we call the comparison ","element":"span"},{"style":{"height":11.6},"width":133.04,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/30-17.png","element":"img","alt":" λ-close.","inline":true}],[{"text":"To prove a point-wise movement bound, we will follow exactly the strategy of [","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"19","element":"a"},{"text":"]. First, we prove that it is unlikely that an ordering which disagrees on many far comparisons from the true order is an MLE ordering. Second, we use this to upper bound the total number of wrong far comparisons in any MLE order with high probability. Finally, we prove that as long as no large cluster exists, a single point cannot move too far without contradicting the upper bound on total far errors.","element":"span"}],[{"id":"id-65","style":{"height":11.2},"width":367.96,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-0.png","element":"img","alt":"Lemma 3.11. Let σ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a permutation which differs from the true order on ","element":"span"},{"style":{"height":13.19},"width":76.24,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-1.png","element":"img","alt":" σc λ","inline":true},{"style":{"fontStyle":"italic"},"text":"-close comparisons, and ","element":"span"},{"style":{"height":11.59},"width":39.77,"height":28.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-2.png","element":"img","alt":" σf","inline":true},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-3.png","element":"img","alt":"λ","inline":true},{"style":{"fontStyle":"italic"},"text":"-far comparisons. 
The probability that ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-4.png","element":"img","alt":" σ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is an MLE order is","element":"span"}],[{"style":{"width":"23%"},"width":441,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"To be an MLE order, ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-6.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"must beat the true order on half or more of the comparisons on which they differ. We can bound this probability by the Poisson Binomial:","element":"span"}],[{"style":{"width":"46%"},"width":866,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-7.png","element":"img"}],[{"text":"A Chernoff bound then gives the desired result.","element":"span"}],[{"text":"Using this upper bound, we show that any order which disagrees with the true ordering on more than ","element":"span"},{"style":{"height":18.83},"width":134.29,"height":47.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-8.png","element":"img","alt":"˜Ω(n3/2)","inline":true,"padRight":true},{"text":"comparisons is not an MLE ordering with high probability.","element":"span"}],[{"id":"id-66","style":{"fontWeight":"bold"},"text":"Lemma 3.12 ","element":"span"},{"text":"(Total Far Movement)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"The probability that an MLE order disagrees with the identity on ","element":"span"},{"style":{"height":16.57},"width":146.39,"height":41.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-9.png","element":"img","alt":"c1n3/2 λ","inline":true},{"style":{"fontStyle":"italic"},"text":"-far comparisons, where","element":"span"}],[{"style":{"width":"25%"},"width":473,"height":89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-10.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"is ","element":"span"},{"style":{"height":14},"width":61.06,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-11.png","element":"img","alt":" ≤ δ","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"For a given permutation ","element":"span"},{"style":{"height":18.97},"width":389.11,"height":47.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-12.png","element":"img","alt":" σ, assume σf > c1n3/2","inline":true},{"text":". 
By Lemma ","element":"span"},{"href":"#id-65","text":"3.11, ","element":"a"},{"text":"the probability that ","element":"span"},{"style":{"height":15.99},"width":239.53,"height":39.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-13.png","element":"img","alt":" σf is an MLE","inline":true,"padRight":true},{"text":"order is at most:","element":"span"}],[{"style":{"width":"100%"},"width":1876,"height":293,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-14.png","element":"img"}],[{"text":"Finally, we show a bound on point-wise movement by proving that any point which moves more than ","element":"span"},{"style":{"height":18.83},"width":134.29,"height":47.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-15.png","element":"img","alt":"˜Ω(n3/4)","inline":true,"padRight":true},{"text":"from its true position creates ","element":"span"},{"style":{"height":18.83},"width":134.29,"height":47.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-16.png","element":"img","alt":" ˜Ω(n3/2)","inline":true,"padRight":true},{"text":"total far errors.","element":"span"}],[{"id":"id-67","style":{"fontWeight":"bold"},"text":"Lemma 3.13 ","element":"span"},{"text":"(Point-wise Far Movement)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Given a sample ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"style":{"fontStyle":"italic"},"text":"of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":16},"width":193.06,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-17.png","element":"img","alt":" λ ≤ gL(ε0)","inline":true},{"style":{"fontStyle":"italic"},"text":", assume that the sample does not have a ","element":"span"},{"style":{"height":19.1},"width":117.93,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-18.png","element":"img","alt":" g−1L (λ)","inline":true},{"style":{"fontStyle":"italic"},"text":"-cluster of size ","element":"span"},{"style":{"height":18.18},"width":408.04,"height":45.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-19.png","element":"img","alt":" m. Let l = (2c1)1/2n3/4","inline":true},{"style":{"fontStyle":"italic"},"text":". Then with probability at least ","element":"span"},{"style":{"height":14.8},"width":119.56,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-20.png","element":"img","alt":" 1 − 2δ,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"no point moves by further than ","element":"span"},{"style":{"height":9.19},"width":86.12,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-21.png","element":"img","alt":" c2m2","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"in an MLE order, where","element":"span"}],[{"style":{"width":"37%"},"width":707,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-22.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"Assume without loss of generality that the true order on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"is the identity ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . , n","element":"span"},{"text":". Denote by ","element":"span"},{"style":{"height":16.39},"width":54.16,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-23.png","element":"img","alt":" Aij","inline":true,"padRight":true},{"text":"the event that ","element":"span"},{"style":{"height":16},"width":317.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-24.png","element":"img","alt":" i maps to σ(i) = j","inline":true,"padRight":true},{"text":"in an MLE order, ","element":"span"},{"style":{"height":16},"width":242.45,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-25.png","element":"img","alt":" |i − j| > c2m2","inline":true},{"text":", and at most ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l ","element":"span"},{"text":"elements from outside the range ","element":"span"},{"style":{"height":16},"width":613.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-26.png","element":"img","alt":" [i − l − m, j + l + m] map into [i, j]","inline":true},{"text":". Note that if more than ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l ","element":"span"},{"text":"of such elements map into ","element":"span"},{"text":"[","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, j","element":"span"},{"text":"] ","element":"span"},{"text":"then the order must differ on at least ","element":"span"},{"style":{"height":22.19},"width":66.35,"height":55.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-27.png","element":"img","alt":"l22 λ","inline":true},{"text":"-far comparisons from the identity. 
This follows from the fact that each such ","element":"span"},{"text":"element must shift ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l ","element":"span"},{"text":"places towards ","element":"span"},{"text":"[","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, j","element":"span"},{"text":"]","element":"span"},{"text":", but has at most ","element":"span"},{"style":{"height":10.8},"width":71.26,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-28.png","element":"img","alt":" m λ","inline":true},{"text":"-close comparisons in that direction, and that each comparison is counted at most twice.","element":"span"}],[{"text":"For ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"to be in slot ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"in an MLE order, it must beat the identity on more than half of the elements in between. 
Since we have assumed all but ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l ","element":"span"},{"text":"of the elements between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"in the order are from ","element":"span"},{"style":{"height":16},"width":372.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/31-29.png","element":"img","alt":" [i − l − m, j + l + m],","inline":true,"padRight":true},{"text":"then this range must contain at least ","element":"span"},{"style":{"height":16},"width":264.36,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-0.png","element":"img","alt":" c2m2/2 − l − 1","inline":true,"padRight":true},{"text":"incorrect comparisons with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":". This further implies that at least ","element":"span"},{"style":{"height":16},"width":362.03,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-1.png","element":"img","alt":" c2m2/2 − 2l − m − 1","inline":true,"padRight":true},{"text":"comparisons with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"must be incorrect in the range ","element":"span"},{"text":"[","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, j ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"]","element":"span"},{"text":". 
By our assumption on cluster size, all but ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"of these comparisons are ","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-2.png","element":"img","alt":" λ","inline":true},{"text":"-far, so we can bound the probability of ","element":"span"},{"style":{"height":16.39},"width":54.16,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-3.png","element":"img","alt":" Aij","inline":true,"padRight":true},{"text":"by the Poisson Binomial:","element":"span"}],[{"style":{"width":"60%"},"width":1134,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-4.png","element":"img"}],[{"text":"Combining our assumptions on ","element":"span"},{"style":{"height":9.19},"width":50.99,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-5.png","element":"img","alt":" m2","inline":true,"padRight":true},{"text":"with a Chernoff bound then gives:","element":"span"}],[{"style":{"width":"21%"},"width":407,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-6.png","element":"img"}],[{"text":"Union bounding over pairs ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, j ","element":"span"},{"text":"then gives that if any point moves by more than ","element":"span"},{"style":{"height":9.19},"width":86.11,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-7.png","element":"img","alt":" c2m2","inline":true,"padRight":true},{"text":"in an MLE ordering, the total number of wrong ","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-8.png","element":"img","alt":" λ","inline":true},{"text":"-far comparisons is more than 
","element":"span"},{"style":{"height":16.58},"width":107.26,"height":41.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-9.png","element":"img","alt":" c1n3/2 ","inline":true,"padRight":true},{"text":"with probability ","element":"span"},{"style":{"height":11.6},"width":87.66,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-10.png","element":"img","alt":" 1 − δ","inline":true},{"text":". By Lemma ","element":"span"},{"href":"#id-66","text":"3.12, ","element":"a"},{"text":"the probability that this occurs is ","element":"span"},{"style":{"height":14},"width":61.06,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-11.png","element":"img","alt":" ≤ δ","inline":true},{"text":", giving the desired result.","element":"span"}],[{"text":"With a point-wise movement bound in hand, step 2b essentially follows the same strategy as Lemma ","element":"span"},{"href":"#id-51","text":"2.7 ","element":"a"},{"text":"with a different set of parameters.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Step 2b (low noise): ","element":"span"},{"text":"Draw an additional sample of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"points, and use the labels and comparisons of all pairs of points separated by ","element":"span"},{"style":{"height":18.83},"width":218.66,"height":47.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-12.png","element":"img","alt":"˜Ω(n3/4) in S","inline":true,"padRight":true},{"text":"to build an inference LP. This LP correctly infers points with high probability by Lemma ","element":"span"},{"href":"#id-67","text":"3.13, ","element":"a"},{"text":"and has large coverage due to the space’s finite inference dimension.","element":"span"}],[{"text":"All that remains is step 3, which repeats steps 1 and 2 until reaching the desired coverage. 
Tying all of these together, we present the proof of Theorem ","element":"span"},{"href":"#id-68","text":"3.2: ","element":"a"},{"text":"learning margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-13.png","element":"img","alt":" γ","inline":true},{"text":", finite inference dimension non-homogeneous linear separators with GTNC noise.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"(Proof of Theorem ","element":"span"},{"href":"#id-68","text":"3.2)","element":"a"}],[{"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"be the subsample described in step 1 of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"the parameters defining the size of subsets we check for ","element":"span"},{"style":{"height":9.59},"width":41.58,"height":23.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-14.png","element":"img","alt":" εT","inline":true,"padRight":true},{"text":"-equitability. Further, in the case that some subset tests as equitable, let ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-15.png","element":"img","alt":" S′ ","inline":true,"padRight":true},{"text":"be the additionally drawn points. 
To begin, we set ","element":"span"},{"style":{"height":9.59},"width":41.58,"height":23.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-16.png","element":"img","alt":" εT","inline":true,"padRight":true},{"text":"such that if we measure an equitable subset ","element":"span"},{"style":{"height":15.59},"width":137.49,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-17.png","element":"img","alt":" Seq ⊂ S","inline":true},{"text":", points ","element":"span"},{"style":{"height":11.6},"width":113.4,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-18.png","element":"img","alt":" x ∈ S′","inline":true,"padRight":true},{"text":"s.t. ","element":"span"},{"style":{"height":19.21},"width":492.28,"height":48.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-19.png","element":"img","alt":"Seq ∪ {x} is 2gU(2g−1L (4εT ))","inline":true},{"text":"-equitable make up a ","element":"span"},{"style":{"height":16},"width":63.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-20.png","element":"img","alt":" γ/d","inline":true},{"text":"-cluster (see Lemma ","element":"span"},{"href":"#id-62","text":"3.7)","element":"a"},{"text":":","element":"span"}],[{"style":{"width":"72%"},"width":1360,"height":294,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-21.png","element":"img"}],[{"text":"Note that this also satisfies the requirement on ","element":"span"},{"style":{"height":9.59},"width":41.58,"height":23.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-22.png","element":"img","alt":" εT","inline":true,"padRight":true},{"text":"from Lemma ","element":"span"},{"href":"#id-62","text":"3.7. 
","element":"a"},{"text":"To satisfy Lemmas ","element":"span"},{"href":"#id-58","text":"3.6, ","element":"a"},{"href":"#id-62","text":"3.7, ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-63","text":"3.8, ","element":"a"},{"text":"we set ","element":"span"},{"style":{"height":16},"width":298.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-23.png","element":"img","alt":" c, m, and |S′| to:","inline":true}],[{"style":{"width":"74%"},"width":1387,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-24.png","element":"img"}],[{"text":"Note that ","element":"span"},{"style":{"height":9.19},"width":33.24,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-25.png","element":"img","alt":" c1","inline":true,"padRight":true},{"text":"is a simplified (and somewhat larger) version of the parameter from Lemma ","element":"span"},{"href":"#id-66","text":"3.12 ","element":"a"},{"text":"where ","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-26.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"has been set to ","element":"span"},{"style":{"height":19.1},"width":267.61,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-27.png","element":"img","alt":" (gL(g−1U (εT /2))","inline":true},{"text":". 
We must further set parameters ","element":"span"},{"style":{"height":13.59},"width":176.86,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-28.png","element":"img","alt":" c2 and m2","inline":true,"padRight":true},{"text":"to satisfy Lemma ","element":"span"},{"href":"#id-67","text":"3.13:","element":"a"}],[{"style":{"width":"30%"},"width":566,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-29.png","element":"img"}],[{"text":"Finally, we must select the sample size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"itself. To employ the same slotting strategy as Theorem ","element":"span"},{"href":"#id-50","text":"2.3, ","element":"a"},{"text":"we need ","element":"span"},{"style":{"height":16},"width":82.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-30.png","element":"img","alt":" Ω(k)","inline":true,"padRight":true},{"text":"blocks of size ","element":"span"},{"style":{"height":9.19},"width":86.11,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-31.png","element":"img","alt":" c2m2","inline":true},{"text":". 
This gives the requirement on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":":","element":"span"}],[{"style":{"width":"35%"},"width":671,"height":239,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/32-32.png","element":"img"}],[{"text":"To satisfy this condition, it is enough to let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"be:","element":"span"}],[{"style":{"width":"42%"},"width":801,"height":152,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-0.png","element":"img"}],[{"text":"where the additional factor in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"ensures that ","element":"span"},{"style":{"height":13.59},"width":176.73,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-1.png","element":"img","alt":" m and m2","inline":true,"padRight":true},{"text":"satisfy the constraints of Lemmas ","element":"span"},{"href":"#id-62","text":"3.7 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-67","text":"3.13.","element":"a"}],[{"text":"We will now structure our analysis as in the three-step informal explanation.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Step 1: ","element":"span"},{"text":"Draw the sample ","element":"span"},{"style":{"height":15.19},"width":139.86,"height":37.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-2.png","element":"img","alt":" S ∼ DnX","inline":true},{"text":", where in later iterations ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"is restricted to un-inferred points by rejection ","element":"span"},{"text":"sampling. 
Check ","element":"span"},{"style":{"height":13.59},"width":142.58,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-3.png","element":"img","alt":" S for εT","inline":true,"padRight":true},{"text":"-equitable subsets of size ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Step 2a (high noise): ","element":"span"},{"text":"Assume that at least one subset, ","element":"span"},{"style":{"height":15.59},"width":54.55,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-4.png","element":"img","alt":" Seq","inline":true},{"text":", is equitable with true cluster ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":". Draw an additional set ","element":"span"},{"style":{"height":10.8},"width":40.73,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-5.png","element":"img","alt":" S′ ","inline":true,"padRight":true},{"text":"and test for each ","element":"span"},{"style":{"height":21.63},"width":541.59,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-6.png","element":"img","alt":" x ∈ S′ whether Seq ∪ x is gL(γ′)2","inline":true,"padRight":true},{"text":"-equitable. 
With probability ","element":"span"},{"style":{"height":15.6},"width":233.13,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-7.png","element":"img","alt":" 1 − O(δr), we","inline":true,"padRight":true},{"text":"can identify by Lemma ","element":"span"},{"href":"#id-62","text":"3.7 ","element":"a"},{"text":"and correctly label by Lemma ","element":"span"},{"href":"#id-63","text":"3.8 ","element":"a"},{"text":"at least ","element":"span"},{"style":{"height":24.43},"width":663.69,"height":61.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-8.png","element":"img","alt":" 96d log(d + 1) + 2 log(1/δr)gL(γ)2 points of S′","inline":true}],[{"text":"which are in a ","element":"span"},{"style":{"height":16},"width":63.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-9.png","element":"img","alt":" γ/d","inline":true,"padRight":true},{"text":"cluster. We build our learner based on this cluster. Recall that the expected coverage of the learner is given by the probability that an additional point is inferred. To compute this, we first note that the probability an additional point lands inside the cluster is at least ","element":"span"},{"style":{"height":16},"width":139.86,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-10.png","element":"img","alt":" Ω(m/n)","inline":true},{"text":". Assuming this occurs, Lemma ","element":"span"},{"href":"#id-64","text":"3.9 ","element":"a"},{"text":"and the symmetry argument of [","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"4","element":"a"},{"text":"] give the point a ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4","element":"span"},{"text":" probability of being inferred. 
Together with our high probability assumptions, this gives an expected coverage of ","element":"span"},{"style":{"height":16},"width":137.23,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-11.png","element":"img","alt":" Ω(m/n)","inline":true,"padRight":true},{"text":"for small enough ","element":"span"},{"style":{"height":14.4},"width":160.74,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-12.png","element":"img","alt":" δr. Thus,","inline":true,"padRight":true},{"text":"the probability that the coverage of our weak learner is ","element":"span"},{"style":{"height":16},"width":137.22,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-13.png","element":"img","alt":" Ω(m/n)","inline":true,"padRight":true},{"text":"is at least ","element":"span"},{"style":{"height":16},"width":137.23,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-14.png","element":"img","alt":" Ω(m/n)","inline":true,"padRight":true},{"text":"by the Markov inequality.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Step 2b (low noise): ","element":"span"},{"text":"Assume instead that no subset was ","element":"span"},{"style":{"height":9.59},"width":41.58,"height":23.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-15.png","element":"img","alt":" εT","inline":true,"padRight":true},{"text":"-equitable. 
By statement 1 of Corollary ","element":"span"},{"href":"#id-58","text":"3.6, ","element":"a"},{"text":"this implies that no ","element":"span"},{"style":{"height":19.1},"width":176.55,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-16.png","element":"img","alt":" g−1U (εT /2)","inline":true},{"text":"-cluster of size ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"exists in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":". Sort ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"into an MLE order. By Lemma ","element":"span"},{"href":"#id-67","text":"3.13, ","element":"a"},{"text":"no point in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"has moved by further than ","element":"span"},{"style":{"height":9.19},"width":86.12,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-17.png","element":"img","alt":" c2m2","inline":true,"padRight":true},{"text":"from its true position with probability at least ","element":"span"},{"style":{"height":13.99},"width":246.59,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-18.png","element":"img","alt":" 1 − δr. 
S is of","inline":true,"padRight":true},{"text":"the appropriate size to apply the argument from Lemma ","element":"span"},{"href":"#id-48","text":"2.6, ","element":"a"},{"text":"so slotting ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":") ","element":"span"},{"text":"extra points gives constant coverage with constant probability.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Step 3: ","element":"span"},{"text":"Steps 1 and 2 build a weak learner which we must string together to get coverage ","element":"span"},{"style":{"height":14.4},"width":263.75,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-19.png","element":"img","alt":" 1 − ε mirroring","inline":true,"padRight":true},{"text":"Theorem ","element":"span"},{"href":"#id-50","text":"2.3. ","element":"a"},{"text":"Our worst case per-step coverage is ","element":"span"},{"style":{"height":16},"width":140,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-20.png","element":"img","alt":" Ω(m/n)","inline":true,"padRight":true},{"text":"with probability ","element":"span"},{"style":{"height":16},"width":140,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-21.png","element":"img","alt":" Ω(m/n)","inline":true},{"text":". 
After repeating the learner ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"times, the coverage becomes:","element":"span"}],[{"style":{"width":"39%"},"width":742,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-22.png","element":"img"}],[{"text":"Denoting the reliability and usefulness parameters again as ","element":"span"},{"style":{"height":15.54},"width":42.22,"height":38.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-23.png","element":"img","alt":" δwr","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":15.54},"width":42.22,"height":38.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-24.png","element":"img","alt":" δwu","inline":true,"padRight":true},{"text":", setting ","element":"span"},{"style":{"height":28.8},"width":326.56,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-25.png","element":"img","alt":" t = Õ(n log(1/δwu)/m)","inline":true},{"text":"is then sufficient to give this coverage with probability at least ","element":"span"},{"style":{"height":15.54},"width":124.65,"height":38.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-26.png","element":"img","alt":" 1 − δwu .","inline":true}],[{"text":"Restricting to the distribution of un-inferred points via rejection sampling, repeating the above ","element":"span"},{"style":{"height":28.8},"width":235.38,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-27.png","element":"img","alt":" O(n log(1/ε)/m)","inline":true}],[{"text":"times will have coverage ","element":"span"},{"style":{"height":10.4},"width":69.52,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-28.png","element":"img","alt":" 1−ε","inline":true,"padRight":true},{"text":"with probability 
","element":"span"},{"style":{"height":28.8},"width":330.91,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-29.png","element":"img","alt":" 1−O�n log(1/ε)m δwu�","inline":true},{"text":", and correctness ","element":"span"},{"style":{"height":28.85},"width":499.18,"height":72.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-30.png","element":"img","alt":" 1−O�n2 log(1/ε) log(1/δwu )m2 δwr�.","inline":true,"padRight":true},{"text":"Thus setting ","element":"span"},{"style":{"height":15.54},"width":177.98,"height":38.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-31.png","element":"img","alt":" δwr and δwu ","inline":true,"padRight":true},{"text":"of our weak learner to:","element":"span"}],[{"style":{"width":"21%"},"width":394,"height":365,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/33-32.png","element":"img"}],[{"text":"gives the desired coverage and error by union bounding over the number of applications.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Query Complexity: ","element":"span"},{"text":"Now we compute the Query complexity of our algorithm. Because we check equitability for every subset, at each iteration our algorithm must make ","element":"span"},{"style":{"height":17.39},"width":106.3,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-0.png","element":"img","alt":" O(n2)","inline":true,"padRight":true},{"text":"comparisons. This is dominated by the slotting complexity, which we upper bound as ","element":"span"},{"style":{"height":18.83},"width":126.64,"height":47.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-1.png","element":"img","alt":"˜O(dn2)","inline":true,"padRight":true},{"text":"for simplicity. 
The worst-case number of iterations for our algorithm is ","element":"span"},{"style":{"height":16},"width":209.86,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-2.png","element":"img","alt":" α log(α/δu),","inline":true,"padRight":true},{"text":"giving a total query complexity of:","element":"span"}],[{"style":{"width":"39%"},"width":739,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-3.png","element":"img"}],[{"text":"For sample complexity, we follow the same argument as Theorem ","element":"span"},{"href":"#id-50","text":"2.3, ","element":"a"},{"text":"ending our algorithm if we reject too many samples in a row. Letting ","element":"span"},{"style":{"height":28.8},"width":512.17,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-4.png","element":"img","alt":" N = O(d log(d)nα log(α/δu))","inline":true},{"text":", the sample complexity is then:","element":"span"}],[{"style":{"width":"16%"},"width":315,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-5.png","element":"img"}],[{"text":"Our time complexity, however, diverges from the Massart case due to our need to test all subsets for equitability. In particular, we check all","element":"span"},{"style":{"height":20.81},"width":119.13,"height":52.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-6.png","element":"img","alt":"(n choose 2c+m)","inline":true},{"text":"subsets, which is exponential in inference dimension and noise parameters, and quasi-polynomial in the error parameter ","element":"span"},{"style":{"height":13.99},"width":32.71,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-7.png","element":"img","alt":" δr","inline":true},{"text":". 
Further, with unbounded error we cannot employ the sorting algorithm from ","element":"span"},{"href":"#id-18","referenceIndex":19,"text":"[19]","element":"a"},{"text":", making sorting an exponentially expensive step as well.","element":"span"}],[{"text":"As a direct corollary, we show that this gives us a query efficient","element":"span"},{"style":{"height":7.6},"width":16,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-8.png","element":"img","alt":"3 ","inline":true,"padRight":true},{"text":"algorithm for the special case of TNC.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Corollary 3.14. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let the hypothesis class ","element":"span"},{"style":{"height":16.79},"width":162.81,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-9.png","element":"img","alt":" (X, Hd,γ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"have inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
Then ","element":"span"},{"style":{"height":16.79},"width":162.81,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-10.png","element":"img","alt":" (X, Hd,γ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model (TNC","element":"span"},{"style":{"height":16},"width":279.53,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-11.png","element":"img","alt":"(m, M, κ, ε0),CX","inline":true},{"style":{"fontStyle":"italic"},"text":") with query complexity:","element":"span"}],[{"style":{"width":"58%"},"width":1103,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-12.png","element":"img"}],[{"text":"As an example of an explicit concept class, consider the query complexity of half-spaces with fixed minimal ratio (the ratio between the closest and furthest points from the decision boundary), a case studied in ","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"[4]","element":"a"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Example 3.15. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":15.79},"width":135.89,"height":39.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-13.png","element":"img","alt":" X ⊆ Rd","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be an instance space, and ","element":"span"},{"style":{"height":15.59},"width":103.08,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-14.png","element":"img","alt":" Hd,γ,η","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"the class of hyperplanes with margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-15.png","element":"img","alt":" γ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and minimal ratio ","element":"span"},{"style":{"height":10.4},"width":20,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-16.png","element":"img","alt":" η","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with respect to ","element":"span"},{"style":{"height":16.39},"width":356.15,"height":40.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-17.png","element":"img","alt":" X. 
Then (X, Hd,γ,η)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":15.6},"width":414.4,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-18.png","element":"img","alt":" (TNC(m, M, κ, ε0), CX)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with query complexity:","element":"span"}],[{"style":{"width":"64%"},"width":1209,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-19.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"3.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"GTNC with Weak Distributional Conditions","element":"span"}],[{"text":"Our algorithm for learning with GTNC noise introduced an additional restrictive condition on the set system: margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-20.png","element":"img","alt":" γ","inline":true},{"text":". We will show that this assumption and the assumption of finite inference dimension may be replaced with weak concentration and anti-concentration conditions on the distribution. In this case, however, it is difficult to show a gap between label-only and comparison ARPU-learning for two reasons. The first is that learning in this regime is simply harder: it is the first case we show where comparisons do not provide an exponential improvement in the active PAC setting over its passive counterpart. The second is that in the membership query setting, label queries in the TNC model can give comparison-like information, making it difficult to apply our lower-bounding techniques. 
We will begin by proving this first statement by showing a lower bound polynomial in ","element":"span"},{"style":{"height":13.39},"width":59.49,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-21.png","element":"img","alt":" ε−1 ","inline":true,"padRight":true},{"text":"for active PAC learning with labels and comparisons.","element":"span"}],[{"id":"id-71","style":{"fontWeight":"bold"},"text":"Lemma 3.16. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":19.91},"width":387.09,"height":49.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-22.png","element":"img","alt":" s = min(1, gL^{−1}(1/8))","inline":true},{"style":{"fontStyle":"italic"},"text":", and ","element":"span"},{"style":{"height":25.83},"width":788.36,"height":64.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-23.png","element":"img","alt":" c1 = max_{a∈[2ε,s]}(8gL(4ε), 2(gL(a) − gL(a − 2ε)))","inline":true},{"style":{"fontStyle":"italic"},"text":". The query ","element":"span"},{"style":{"fontStyle":"italic"},"text":"complexity of actively PAC-learning ","element":"span"},{"style":{"height":17.39},"width":146.87,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-24.png","element":"img","alt":" (R2, H2)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"under model ","element":"span"},{"style":{"height":16},"width":427.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-25.png","element":"img","alt":" (GTNC(gL, gU, ε0), SC2)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is at least","element":"span"}],[{"style":{"width":"58%"},"width":1098,"height":188,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/34-26.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"The adversary begins by choosing the distribution over ","element":"span"},{"style":{"height":13.39},"width":44.78,"height":33.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-0.png","element":"img","alt":" R2 ","inline":true,"padRight":true},{"text":"to be uniform over the square ","element":"span"},{"style":{"height":17.39},"width":187.35,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-1.png","element":"img","alt":" S = [0, s]2.","inline":true,"padRight":true},{"text":"We will use ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"a, b","element":"span"},{"text":") ","element":"span"},{"text":"to denote points in ","element":"span"},{"style":{"height":13.39},"width":44.78,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-2.png","element":"img","alt":" R2","inline":true},{"text":". Consider two parallel hyperplanes ","element":"span"},{"style":{"height":14},"width":78.63,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-3.png","element":"img","alt":" h, hε","inline":true,"padRight":true},{"text":"defined as:","element":"span"}],[{"style":{"width":"25%"},"width":477,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-4.png","element":"img"}],[{"text":"We denote the region between the two hyperplanes by ","element":"span"},{"style":{"height":16},"width":515.48,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-5.png","element":"img","alt":" ∆ := {(a, b) ∈ S : 0 ≤ a ≤ 2ε}","inline":true},{"text":", and twice the region as","element":"span"}],[{"style":{"width":"24%"},"width":458,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-6.png","element":"img"}],[{"text":"By Yao’s minimax principle it is enough to show that the adversary may pick a distribution over hyperplanes 
such that no learner can learn the labels with ","element":"span"},{"style":{"height":9.6},"width":61.08,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-7.png","element":"img","alt":" < ε","inline":true,"padRight":true},{"text":"error with probability ","element":"span"},{"style":{"height":16},"width":102.2,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-8.png","element":"img","alt":" ≥ 7/8","inline":true},{"text":". In particular, the adversary considers a uniform distribution over hyperplanes ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":13.19},"width":37.96,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-9.png","element":"img","alt":" hε","inline":true},{"text":". Note that any algorithm which correctly labels more than half of the points between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":13.19},"width":37.96,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-10.png","element":"img","alt":" hε","inline":true,"padRight":true},{"text":"(i.e. at least ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-11.png","element":"img","alt":" ε","inline":true,"padRight":true},{"text":"mass of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":") can be seen as identifying the hyperplane ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"or ","element":"span"},{"style":{"height":13.19},"width":37.96,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-12.png","element":"img","alt":" hε","inline":true},{"text":". 
We now show how to lower bound the number of label or comparison queries needed to identify the target hyperplane ","element":"span"},{"style":{"height":13.19},"width":136.06,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-13.png","element":"img","alt":" h or hε.","inline":true}],[{"text":"Given a set of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"query responses ","element":"span"},{"style":{"height":14},"width":189.44,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-14.png","element":"img","alt":" Q1, . . . , Qn","inline":true,"padRight":true},{"text":"from the learner, we argue that the learner cannot succeed with probability greater than:","element":"span"}],[{"id":"id-69","style":{"width":"38%"},"width":722,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-15.png","element":"img"}],[{"text":"since it can do no better than simply picking the more likely hyperplane given the set of queries. Taking the maximum over all possible sets of query responses then gives a lower bound on the number of samples. 
In other words, to show that the learner must make at least ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"queries, it suffices to show that this maximum is less than ","element":"span"},{"text":"7","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"8","element":"span"},{"text":":","element":"span"}],[{"style":{"width":"73%"},"width":1377,"height":62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-16.png","element":"img"}],[{"text":"Using Bayes theorem, we can rewrite these probabilities as:","element":"span"}],[{"style":{"width":"44%"},"width":839,"height":237,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-17.png","element":"img"}],[{"text":"Note in this case that query response ","element":"span"},{"style":{"height":14},"width":55.77,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-18.png","element":"img","alt":" Qi,","inline":true,"padRight":true},{"text":"which rolls together both the point or pair of points being queried and the value which the oracle returns, is dependent on ","element":"span"},{"style":{"height":14},"width":221.62,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-19.png","element":"img","alt":" Qi−1, . . . , Q1","inline":true,"padRight":true},{"text":"due to being in an active setting–the chosen point or pair is dependent on the previous responses ","element":"span"},{"style":{"height":14},"width":221.62,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-20.png","element":"img","alt":" Qi−1, . . . , Q1","inline":true},{"text":". 
We can now rewrite Equation ","element":"span"},{"href":"#id-69","text":"(13) ","element":"a"},{"text":"as:","element":"span"}],[{"style":{"width":"48%"},"width":908,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-21.png","element":"img"}],[{"text":"To analyze this, note that each term in the product is simply the ratio of probabilities that a label query on some point ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"or comparison on pair of points ","element":"span"},{"style":{"height":14},"width":136.16,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-22.png","element":"img","alt":" x, y ∈ S","inline":true,"padRight":true},{"text":"(where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x, y ","element":"span"},{"text":"are determined by ","element":"span"},{"style":{"height":14},"width":221.62,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-23.png","element":"img","alt":" Qi−1, . . . , Q1","inline":true},{"text":") will return ","element":"span"},{"style":{"height":14},"width":42.5,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-24.png","element":"img","alt":" Qi","inline":true},{"text":". Then we can bound this product from above and below by looking at the maximum and minimum such ratio across all points and pairs in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":". Recall that these probabilities are chosen by the adversary from a range defined by the GTNC parameters. 
For simplicity, when the ranges on a query for ","element":"span"},{"style":{"height":13.59},"width":150.42,"height":33.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-25.png","element":"img","alt":" h and hε","inline":true,"padRight":true},{"text":"overlap, we let the adversary choose the same probability, but otherwise always choose the lower bound ","element":"span"},{"style":{"height":10},"width":53.86,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-26.png","element":"img","alt":" gL.","inline":true}],[{"style":{"width":"96%"},"width":1811,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-27.png","element":"img"}],[{"text":"or comparison for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":", as this will always have the larger ratio. For a point ","element":"span"},{"style":{"height":16},"width":161.3,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-28.png","element":"img","alt":" (a, b) ∈ S","inline":true},{"text":", the ratio for the correct label ","element":"span"},{"style":{"height":16},"width":257.15,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-29.png","element":"img","alt":" (Qi = +) for h","inline":true,"padRight":true},{"text":"is given by:","element":"span"}],[{"style":{"width":"54%"},"width":1013,"height":123,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-30.png","element":"img"}],[{"text":"For comparisons, we only have to consider pairs ","element":"span"},{"style":{"height":15.6},"width":361.96,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-31.png","element":"img","alt":" (a1, b1), (a2, b2) ∈ 2∆","inline":true},{"text":", since the adversary will otherwise pick a ratio of ","element":"span"},{"text":"1","element":"span"},{"text":". 
In this case, the maximum is given by the correct comparison with ratio:","element":"span"}],[{"style":{"width":"31%"},"width":587,"height":95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/35-32.png","element":"img"}],[{"text":"Thus we can bound the product of the ratios from above by:","element":"span"}],[{"style":{"width":"89%"},"width":1676,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-0.png","element":"img"}],[{"text":"To bound the ratio from below, we look at the probability for the incorrect label or comparison. For labels, this is:","element":"span"}],[{"style":{"width":"54%"},"width":1015,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-1.png","element":"img"}],[{"text":"Likewise, the minimum ratio for comparisons is:","element":"span"}],[{"style":{"width":"31%"},"width":587,"height":95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-2.png","element":"img"}],[{"text":"Thus we can also bound the product of the ratios from below as:","element":"span"}],[{"style":{"width":"91%"},"width":1711,"height":310,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-3.png","element":"img"}],[{"text":"Recalling that ","element":"span"},{"style":{"height":16},"width":148.11,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-4.png","element":"img","alt":" c1 < 1/2","inline":true,"padRight":true},{"text":"due to the initial values of ","element":"span"},{"style":{"height":14.4},"width":366.8,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-5.png","element":"img","alt":" s and ε, setting n to:","inline":true}],[{"style":{"width":"9%"},"width":184,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-6.png","element":"img"}],[{"text":"satisfies this and in turn Equation 
","element":"span"},{"href":"#id-69","text":"(13)","element":"a"},{"text":", completing the proof.","element":"span"}],[{"text":"Note that for notational simplicity the adversary has chosen a non-isotropic distribution, but the bound is easily modified to hold for a distribution in ","element":"span"},{"style":{"height":13.19},"width":91.07,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-7.png","element":"img","alt":" ISC2","inline":true},{"text":". Specifying to the Tsybakov Low Noise condition gives the following lower bound.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Corollary 3.17 ","element":"span"},{"text":"(Restatement of Lemma ","element":"span"},{"href":"#id-70","text":"1.12)","element":"a"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The query complexity of actively PAC-learning ","element":"span"},{"style":{"height":17.39},"width":147.18,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-8.png","element":"img","alt":" (R2, H2)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"under model ","element":"span"},{"style":{"height":16},"width":424.29,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-9.png","element":"img","alt":" (TNC(m, M, κ, ε0), SC2)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is at least","element":"span"}],[{"style":{"width":"63%"},"width":1192,"height":217,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-10.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Observe that for ","element":"span"},{"style":{"height":17.38},"width":897.34,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-11.png","element":"img","alt":" f(x) = mxκ−1, |∇f(x)| ≤ m(κ − 1) for all x ∈ [0, s]","inline":true},{"text":". 
By the mean value theorem,","element":"span"}],[{"style":{"width":"43%"},"width":817,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-12.png","element":"img"}],[{"text":"Specifying to the TNC model from GTNC, we have ","element":"span"},{"style":{"height":16},"width":238.55,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-13.png","element":"img","alt":" gL(x) = f(x)","inline":true},{"text":", and thus that ","element":"span"},{"style":{"height":16},"width":384.81,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-14.png","element":"img","alt":" gL(x) − gL(x − 2ε) =","inline":true},{"style":{"height":16},"width":540.99,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-15.png","element":"img","alt":"f(x) − f(x − 2ε) ≤ 2m(κ − 1)ε","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"height":16},"width":168.51,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-16.png","element":"img","alt":" x ∈ [2ε, s]","inline":true},{"text":", and ","element":"span"},{"style":{"height":17.38},"width":328.24,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-17.png","element":"img","alt":" 8gL(4ε) = Θ(εκ−1)","inline":true},{"text":". 
Plugging this into Lemma ","element":"span"},{"href":"#id-71","text":"3.16 ","element":"a"},{"text":"then gives the desired bound.","element":"span"}],[{"text":"Note that this bound is tight with respect to ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-18.png","element":"img","alt":" ε","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"height":11.6},"width":102.9,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-19.png","element":"img","alt":" κ > 2","inline":true},{"text":", and not far off for ","element":"span"},{"style":{"height":11.6},"width":183.16,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-20.png","element":"img","alt":" 1 < κ < 2","inline":true},{"text":", as Hanneke and Yang [","element":"span"},{"href":"#id-11","referenceIndex":12,"text":"12","element":"a"},{"text":"] provide a label-only active PAC-learning algorithm with ","element":"span"},{"style":{"height":20.32},"width":106.24,"height":50.81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-21.png","element":"img","alt":"Õd(1/ε)","inline":true,"padRight":true},{"text":"queries and ","element":"span"},{"style":{"height":23.5},"width":236.42,"height":58.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-22.png","element":"img","alt":" Õd((1/ε)^{2−2/κ})","inline":true,"padRight":true},{"text":"queries, respectively. 
However, while comparison queries alone may not be enough to exponentially improve the query complexity over passive PAC-learning (which is also polynomial in ","element":"span"},{"href":"#id-7","referenceIndex":8,"style":{"height":17.39},"width":105.55,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/36-23.png","element":"img","alt":" ε−1 [8","inline":true},{"text":"]), we will show that they are sufficient for ARPU-learning.","element":"span"}],[{"id":"id-74","style":{"fontWeight":"bold"},"text":"Theorem 3.18 ","element":"span"},{"text":"(Restatement of Theorem ","element":"span"},{"href":"#id-72","text":"1.10)","element":"a"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The hypothesis class ","element":"span"},{"style":{"height":17.39},"width":148.6,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-0.png","element":"img","alt":" (Rd, Hd)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":16.79},"width":534.9,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-1.png","element":"img","alt":" (GTNC(gL, gU, ε0), ACCd,c1,c2)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with query complexity:","element":"span"}],[{"style":{"width":"68%"},"width":1284,"height":218,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-2.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for small enough ","element":"span"},{"style":{"height":14.8},"width":160.76,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-3.png","element":"img","alt":" δr, where","inline":true}],[{"style":{"width":"41%"},"width":770,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-4.png","element":"img"}],[{"text":"The margin condition is 
necessary for Lemmas ","element":"span"},{"href":"#id-63","text":"3.8 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-64","text":"3.9–","element":"a"},{"text":"we cannot reliably label points or infer from clusters lying close to the decision boundary. If we were only interested in keeping our guarantee on coverage, it would be enough to set a fake margin ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-5.png","element":"img","alt":" γ","inline":true,"padRight":true},{"text":"such that anti-concentration gives that the set of points with such a margin has ","element":"span"},{"style":{"height":16},"width":81.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-6.png","element":"img","alt":" O(ε)","inline":true,"padRight":true},{"text":"probability mass. However, we also require that our algorithm is reliable, and thus with high probability cannot err on points close to the decision boundary. This suggests the following strategy: if a cluster is found in step 1, before using it for inference, test whether it is too close to the decision boundary. Because the error on our labels is proportional to their distance from the decision boundary, we can build a test similar to Lemma ","element":"span"},{"href":"#id-57","text":"3.5 ","element":"a"},{"text":"to detect this by measuring the relative sizes of the subsets with different labels.","element":"span"}],[{"id":"id-75","style":{"fontWeight":"bold"},"text":"Lemma 3.19 ","element":"span"},{"text":"(Margin Detection)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a ","element":"span"},{"style":{"height":16},"width":63.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-7.png","element":"img","alt":" γ/d","inline":true},{"style":{"fontStyle":"italic"},"text":"-cluster with respect to the hyperplane ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"fontStyle":"italic"},"text":"of size at least","element":"span"}],[{"style":{"width":"58%"},"width":1087,"height":176,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Further, let ","element":"span"},{"style":{"height":16.39},"width":135.75,"height":40.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-9.png","element":"img","alt":" LDif(C)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denote the difference in size between the sets ","element":"span"},{"style":{"height":16},"width":749.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-10.png","element":"img","alt":" {x ∈ C : QL(x) = 1} and {x ∈ C : QL(x) =","inline":true,"padRight":true},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"style":{"fontStyle":"italic"},"text":". With probability at least ","element":"span"},{"style":{"height":11.6},"width":100.85,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-11.png","element":"img","alt":" 1 − δ:","inline":true}],[{"style":{"width":"67%"},"width":1262,"height":120,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-12.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"Assume without loss of generality that the true label of the majority of points in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is 1.","element":"span"}],[{"text":"Proof of (1): If there exists a point ","element":"span"},{"style":{"height":11.6},"width":122.26,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-13.png","element":"img","alt":" x ∈ C","inline":true,"padRight":true},{"text":"with ","element":"span"},{"style":{"height":16},"width":173.12,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-14.png","element":"img","alt":" f(x) < γ","inline":true},{"text":", then the entire cluster lies within margin ","element":"span"},{"style":{"height":16},"width":240.3,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-15.png","element":"img","alt":" γ + γ/d < 2γ","inline":true},{"text":". By assumption ","element":"span"},{"style":{"height":14.4},"width":137.43,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-16.png","element":"img","alt":" 2γ < ε0","inline":true},{"text":", so the probability that a point measures as 1 is at most ","element":"span"},{"style":{"height":16},"width":231.66,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-17.png","element":"img","alt":"1/2 + gU(2γ)","inline":true,"padRight":true},{"text":"using Equation ","element":"span"},{"href":"#id-73","text":"(1)","element":"a"},{"text":". 
The probability that more than ","element":"span"},{"style":{"height":16},"width":336.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-18.png","element":"img","alt":" (1/2 + 2gU(2γ))|C|","inline":true,"padRight":true},{"text":"points label as 1 is then given by a Chernoff bound:","element":"span"}],[{"style":{"width":"54%"},"width":1023,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-19.png","element":"img"}],[{"text":"Since we have assumed the majority label is 1, the probability that more than ","element":"span"},{"style":{"height":16},"width":330.46,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-20.png","element":"img","alt":" (1/2 + 2gU(2γ))|C|","inline":true,"padRight":true},{"text":"label as 0 is upper bounded by this as well.","element":"span"}],[{"text":"Proof of (2): Assume ","element":"span"},{"style":{"height":11.6},"width":136.96,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-21.png","element":"img","alt":" ∀x ∈ C","inline":true,"padRight":true},{"text":"we have ","element":"span"},{"style":{"height":19.1},"width":379.5,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-22.png","element":"img","alt":" f(x) < g−1L (4gU(2γ))","inline":true},{"text":". 
Since ","element":"span"},{"style":{"height":19.1},"width":335.7,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-23.png","element":"img","alt":" g−1L (4gU(2γ)) < ε0","inline":true,"padRight":true},{"text":"by assumption, ","element":"span"},{"text":"the probability that any point measures as 1 is at least ","element":"span"},{"style":{"height":16},"width":249.3,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-24.png","element":"img","alt":" 1/2 + 4gU(2γ)","inline":true,"padRight":true},{"text":"using Equation ","element":"span"},{"href":"#id-73","text":"(1)","element":"a"},{"text":". The probability that fewer than ","element":"span"},{"style":{"height":16},"width":332.27,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-25.png","element":"img","alt":" (1/2 + 2gU(2γ))|C|","inline":true,"padRight":true},{"text":"points label as 1 is then given by a Chernoff bound:","element":"span"}],[{"style":{"width":"75%"},"width":1411,"height":138,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-26.png","element":"img"}],[{"text":"The idea is now to follow the structure of Theorems ","element":"span"},{"href":"#id-68","text":"3.2 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-35","text":"2.8 ","element":"a"},{"text":"with the one exception that we will check the closeness of every cluster to the decision boundary by checking whether ","element":"span"},{"style":{"height":16.79},"width":558.07,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/37-27.png","element":"img","alt":" |LDif(C)| ≥ (1/2 + 2gU(2γ)). If","inline":true,"padRight":true},{"text":"a cluster measures as too close, we will avoid labeling the points, preserving the reliability of the algorithm.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"(Proof of Theorem ","element":"span"},{"href":"#id-74","text":"3.18)","element":"a"}],[{"text":"To ensure that our coverage is wide enough, we will need to set the margin parameter such that for any hyperplane, the probability mass of points within margin ","element":"span"},{"style":{"height":19.1},"width":256.16,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-0.png","element":"img","alt":" 2g−1L (4gU(2γ))","inline":true,"padRight":true},{"text":"is at most ","element":"span"},{"style":{"height":16},"width":58.51,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-1.png","element":"img","alt":" ε/2","inline":true},{"text":". ","element":"span"},{"text":"By our anti-concentration bound, it is enough to let ","element":"span"},{"style":{"height":14.8},"width":88.05,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-2.png","element":"img","alt":" γ be:","inline":true}],[{"style":{"width":"36%"},"width":692,"height":150,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-3.png","element":"img"}],[{"text":"Our goal is to learn the rest of the space up to ","element":"span"},{"style":{"height":16},"width":58.5,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-4.png","element":"img","alt":" ε/2","inline":true,"padRight":true},{"text":"error via Theorem ","element":"span"},{"href":"#id-68","text":"3.2, ","element":"a"},{"text":"assuming for the moment that the modification from Lemma ","element":"span"},{"href":"#id-75","text":"3.19 ","element":"a"},{"text":"will cause at most an overall loss of ","element":"span"},{"style":{"height":16},"width":58.51,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-5.png","element":"img","alt":" ε/2","inline":true,"padRight":true},{"text":"coverage. Noting that our space has good average inference dimension, i.e. 
","element":"span"},{"href":"#id-5","referenceIndex":6,"style":{"height":17.28},"width":420.25,"height":43.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-6.png","element":"img","alt":" ACCd,c1,c2 ⊂ A(X,H),1 [6","inline":true},{"text":"], we will achieve this by applying Lemma ","element":"span"},{"href":"#id-52","text":"2.9. ","element":"a"},{"text":"Thus we need to prove that Theorem ","element":"span"},{"href":"#id-68","text":"3.2 ","element":"a"},{"text":"can be used to learn a ","element":"span"},{"style":{"height":16},"width":156.47,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-7.png","element":"img","alt":" (1 − ε/6)","inline":true,"padRight":true},{"text":"fraction of random samples ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"with probability at least ","element":"span"},{"style":{"height":16},"width":158.56,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-8.png","element":"img","alt":" (1 − ε/6)","inline":true,"padRight":true},{"text":"while querying only ","element":"span"},{"style":{"height":16},"width":288.78,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-9.png","element":"img","alt":" (1 − ε/6) points.","inline":true}],[{"text":"To begin, we must set ","element":"span"},{"style":{"height":16},"width":444.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-10.png","element":"img","alt":" εT to detect γ/d-clusters:","inline":true}],[{"style":{"width":"36%"},"width":693,"height":488,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-11.png","element":"img"}],[{"text":"and set the size of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"such that the algorithm in Theorem ","element":"span"},{"href":"#id-68","text":"3.2 ","element":"a"},{"text":"only queries an 
","element":"span"},{"style":{"height":16},"width":58.51,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-12.png","element":"img","alt":" ε/6","inline":true,"padRight":true},{"text":"fraction of points. Letting ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N ","element":"span"},{"text":"be the total number of points queried as given in Theorem ","element":"span"},{"href":"#id-68","text":"3.2, ","element":"a"},{"text":"it is then sufficient for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"to satisfy:","element":"span"}],[{"style":{"width":"45%"},"width":847,"height":215,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-13.png","element":"img"}],[{"text":"Note that due to the distributional conditions, the inference dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"of our sample is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"log(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":") log(","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"text":")) ","element":"span"},{"text":"with probability at least ","element":"span"},{"style":{"height":16},"width":152.97,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-14.png","element":"img","alt":" 1 − ε/12","inline":true,"padRight":true},{"text":"[","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"6","element":"a"},{"text":"]. 
Applying the same argument from Theorem ","element":"span"},{"href":"#id-35","text":"2.8 ","element":"a"},{"text":"then gives that the learner of Theorem ","element":"span"},{"href":"#id-68","text":"3.2 ","element":"a"},{"text":"satisfies the conditions of Lemma ","element":"span"},{"href":"#id-52","text":"2.9. ","element":"a"},{"text":"Thus to have coverage ","element":"span"},{"style":{"height":16},"width":132.36,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-15.png","element":"img","alt":" 1 − ε/2","inline":true,"padRight":true},{"text":"with probability ","element":"span"},{"style":{"height":13.99},"width":105.43,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-16.png","element":"img","alt":" 1 − δu","inline":true,"padRight":true},{"text":"and reliability ","element":"span"},{"style":{"height":13.99},"width":101.43,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-17.png","element":"img","alt":" 1 − δr","inline":true},{"text":", it is sufficient to set our ","element":"span"},{"style":{"height":13.99},"width":32.71,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-18.png","element":"img","alt":" δr","inline":true,"padRight":true},{"text":"to ","element":"span"},{"style":{"height":28.8},"width":228.52,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-19.png","element":"img","alt":" O(δr/log(1/δu))","inline":true},{"text":"and run the algorithm","element":"span"}],[{"text":"from Theorem ","element":"span"},{"href":"#id-68","text":"3.2 ","element":"a"},{"style":{"height":16},"width":342.14,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-20.png","element":"img","alt":" O(log(1/δu)) times.","inline":true}],[{"text":"We have ignored, up until now, the modification to Theorem ","element":"span"},{"href":"#id-68","text":"3.2 ","element":"a"},{"text":"in 
the cluster step. If a subset measures as equitable, after slotting our extra points to obtain the ","element":"span"},{"style":{"height":16},"width":234.05,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-21.png","element":"img","alt":" γ/d-cluster C","inline":true},{"text":", we use Lemma ","element":"span"},{"href":"#id-75","text":"3.19 ","element":"a"},{"text":"to test the margin of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":". If the cluster has margin at least ","element":"span"},{"style":{"height":19.1},"width":232.64,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-22.png","element":"img","alt":" g−1L (4gU(2γ))","inline":true},{"text":", the test passes with high probability. Likewise, ","element":"span"},{"text":"if the cluster has margin less than ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-23.png","element":"img","alt":" γ","inline":true},{"text":", the test fails with high probability. If the test fails, we skip the iteration of the weak learner.","element":"span"}],[{"text":"How does this modification affect our reliability and coverage? A point can only be mislabeled if the test passes on a cluster with margin less than ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-24.png","element":"img","alt":" γ","inline":true},{"text":". Over all iterations of the learner, the probability of this occurring is less than ","element":"span"},{"style":{"height":13.99},"width":99.82,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-25.png","element":"img","alt":" 1 − δr","inline":true},{"text":", so our reliability guarantee is maintained up to a constant. 
To analyze coverage, note that Lemma ","element":"span"},{"href":"#id-64","text":"3.9 ","element":"a"},{"text":"only infers points within the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"-convex hull of the cluster C. Thus, if C is a ","element":"span"},{"style":{"height":16},"width":63.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-26.png","element":"img","alt":" γ/d","inline":true},{"text":"-cluster which does not have margin ","element":"span"},{"style":{"height":19.1},"width":235.84,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-27.png","element":"img","alt":" g−1L (4gU(2γ))","inline":true},{"text":", it infers points within at most a ","element":"span"},{"style":{"height":19.1},"width":256.16,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/38-28.png","element":"img","alt":" 2g−1L (4gU(2γ))","inline":true,"padRight":true},{"text":"margin. 
Since ","element":"span"},{"text":"we set ","element":"span"},{"style":{"height":10.4},"width":22,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-0.png","element":"img","alt":" γ","inline":true,"padRight":true},{"text":"such that this region has at most ","element":"span"},{"style":{"height":16},"width":58.51,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-1.png","element":"img","alt":" ε/2","inline":true,"padRight":true},{"text":"probability mass, we lose at most ","element":"span"},{"style":{"height":16},"width":58.5,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-2.png","element":"img","alt":" ε/2","inline":true,"padRight":true},{"text":"coverage for skipping clusters with margin less than ","element":"span"},{"style":{"height":19.1},"width":235.06,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-3.png","element":"img","alt":" g−1L (4gU(2γ))","inline":true},{"text":". We are left then with the loss in coverage caused by our test ","element":"span"},{"text":"failing on a cluster with margin at least ","element":"span"},{"style":{"height":19.1},"width":235.84,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-4.png","element":"img","alt":" g−1L (4gU(2γ))","inline":true},{"text":". 
Since this only occurs with probability ","element":"span"},{"style":{"height":13.99},"width":103.93,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-5.png","element":"img","alt":" 1 − δr","inline":true,"padRight":true},{"text":"by ","element":"span"},{"text":"Lemma ","element":"span"},{"href":"#id-75","text":"3.19, ","element":"a"},{"text":"for small enough ","element":"span"},{"style":{"height":13.99},"width":32.71,"height":34.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-6.png","element":"img","alt":" δr","inline":true,"padRight":true},{"text":"this only changes the constant on the coverage probability of our weak learner, and thus has no asymptotic effect.","element":"span"}],[{"text":"The total query complexity is then given by the complexity for running Theorem ","element":"span"},{"href":"#id-68","text":"3.2 ","element":"a"},{"style":{"height":16},"width":221.54,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-7.png","element":"img","alt":" O(log(1/δu))","inline":true,"padRight":true},{"text":"times with the appropriate parameters:","element":"span"}],[{"style":{"width":"80%"},"width":1509,"height":173,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-8.png","element":"img"}],[{"text":"Since s-concave distributions satisfy the requisite distributional properties ","element":"span"},{"href":"#id-2","referenceIndex":3,"text":"[3]","element":"a"},{"text":",","element":"span"}],[{"id":"id-76","style":{"fontWeight":"bold"},"text":"Corollary 3.20. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"The hypothesis class ","element":"span"},{"style":{"height":17.39},"width":148.34,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-9.png","element":"img","alt":" (Rd, Hd)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is ARPU-learnable under model ","element":"span"},{"style":{"height":16},"width":458.9,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-10.png","element":"img","alt":" (TNC(m, M, κ, ε0), ISCd)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with query complexity:","element":"span"}],[{"style":{"width":"50%"},"width":951,"height":189,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-11.png","element":"img"}],[{"text":"When compared with the query complexity of the label only PAC-learning algorithm of [","element":"span"},{"href":"#id-11","referenceIndex":12,"text":"12","element":"a"},{"text":"], Corollary ","element":"span"},{"href":"#id-76","text":"3.20 ","element":"a"},{"text":"only shows improvement for a small range of parameters ","element":"span"},{"style":{"height":19.77},"width":185.81,"height":49.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.05497/images/39-12.png","element":"img","alt":" 1 < κ < 1514","inline":true},{"text":". However, it is not clear to the authors ","element":"span"},{"text":"that Hanneke and Yang’s algorithm can be extended to an ARPU learner without substantially increasing the query complexity with respect to dimension.","element":"span"}]]},{"heading":"References","paragraphs":[[{"id":"id-0","text":"[1] ","element":"span"},{"text":"Sanjoy Dasgupta. Analysis of a greedy active learning strategy. 
In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in neural information processing systems","element":"span"},{"text":", pages 337–344, 2005.","element":"span"}],[{"id":"id-1","text":"[2] ","element":"span"},{"text":"Maria-Florina Balcan and Phil Long. Active and passive learning of linear separators under log-concave distributions. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Conference on Learning Theory","element":"span"},{"text":", pages 288–316, 2013.","element":"span"}],[{"id":"id-2","text":"[3] ","element":"span"},{"text":"Maria-Florina F Balcan and Hongyang Zhang. Sample and computationally efficient learning algorithms under s-concave distributions. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", pages 4796–4805, 2017.","element":"span"}],[{"id":"id-3","text":"[4] ","element":"span"},{"text":"Daniel M Kane, Shachar Lovett, Shay Moran, and Jiapeng Zhang. Active classification with comparison queries. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)","element":"span"},{"text":", pages 355–366. IEEE, 2017.","element":"span"}],[{"id":"id-4","text":"[5] ","element":"span"},{"text":"Daniel Kane, Shachar Lovett, and Shay Moran. Generalized comparison trees for point-location problems. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Colloquium on Automata, Languages and Programming","element":"span"},{"text":", 2018.","element":"span"}],[{"id":"id-5","text":"[6] ","element":"span"},{"text":"Max Hopkins, Daniel M Kane, and Shachar Lovett. The power of comparisons for actively learning linear classifiers. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1907.03816","element":"span"},{"text":", 2019.","element":"span"}],[{"id":"id-6","text":"[7] Rui M Castro and Robert D Nowak. 
Upper and lower error bounds for active learning.","element":"span"}],[{"id":"id-7","text":"[8] ","element":"span"},{"text":"Pascal Massart, Élodie Nédélec, et al. Risk bounds for statistical learning. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 34 (5):2326–2366, 2006.","element":"span"}],[{"id":"id-8","text":"[9] ","element":"span"},{"text":"Enno Mammen, Alexandre B Tsybakov, et al. Smooth discrimination analysis. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 27(6):1808–1829, 1999.","element":"span"}],[{"id":"id-9","text":"[10] ","element":"span"},{"text":"Maria-Florina Balcan, Andrei Broder, and Tong Zhang. Margin based active learning. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Computational Learning Theory","element":"span"},{"text":", pages 35–50. Springer, 2007.","element":"span"}],[{"id":"id-10","text":"[11] ","element":"span"},{"text":"Steve Hanneke et al. Rates of convergence in active learning. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 39(1):333–361, 2011.","element":"span"}],[{"id":"id-11","text":"[12] ","element":"span"},{"text":"Steve Hanneke and Liu Yang. Minimax analysis of active learning. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Journal of Machine Learning Research","element":"span"},{"text":", 16(1):3487–3602, 2015.","element":"span"}],[{"id":"id-12","text":"[13] ","element":"span"},{"text":"Yining Wang and Aarti Singh. Noise-adaptive margin-based active learning and lower bounds under tsybakov noise condition. 
In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Thirtieth AAAI Conference on Artificial Intelligence","element":"span"},{"text":", 2016.","element":"span"}],[{"id":"id-13","text":"[14] ","element":"span"},{"text":"Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, and Ruth Urner. Efficient learning of linear separators under bounded noise. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Conference on Learning Theory","element":"span"},{"text":", pages 167–190, 2015.","element":"span"}],[{"id":"id-14","text":"[15] ","element":"span"},{"text":"Yichong Xu, Hongyang Zhang, Kyle Miller, Aarti Singh, and Artur Dubrawski. Noise-tolerant interactive learning using pairwise comparisons. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", pages 2431–2440, 2017.","element":"span"}],[{"id":"id-15","text":"[16] ","element":"span"},{"text":"Ronald L Rivest and Robert H Sloan. Learning complicated concepts reliably and usefully. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"AAAI","element":"span"},{"text":", pages 635–640, 1988.","element":"span"}],[{"id":"id-16","text":"[17] ","element":"span"},{"text":"Lihong Li, Michael L Littman, Thomas J Walsh, and Alexander L Strehl. Knows what it knows: a framework for self-aware learning. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Machine learning","element":"span"},{"text":", 82(3):399–443, 2011.","element":"span"}],[{"id":"id-17","text":"[18] Ran El-Yaniv and Yair Wiener. Active learning via perfect selective classification. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Machine Learning Research","element":"span"},{"text":", 13(Feb):255–279, 2012.","element":"span"}],[{"id":"id-18","text":"[19] ","element":"span"},{"text":"Mark Braverman and Elchanan Mossel. Sorting from noisy information. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:0910.1191","element":"span"},{"text":", 2009.","element":"span"}],[{"id":"id-22","text":"[20] ","element":"span"},{"text":"Leslie G Valiant. A theory of the learnable. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the sixteenth annual ACM symposium on Theory of computing","element":"span"},{"text":", pages 436–445. ACM, 1984.","element":"span"}],[{"id":"id-23","text":"[21] Vladimir Vapnik and Alexey Chervonenkis. Theory of pattern recognition, 1974.","element":"span"}],[{"id":"id-24","text":"[22] ","element":"span"},{"text":"Benjamin Satzger, Markus Endres, and Werner Kiessling. A preference-based recommender system. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 7th International Conference on E-Commerce and Web Technologies","element":"span"},{"text":", EC-Web’06, pages 31–40, Berlin, Heidelberg, 2006. Springer-Verlag. ISBN 3-540-37743-3, 978-3-540-37743-6.","element":"span"}],[{"id":"id-25","text":"[23] ","element":"span"},{"text":"Dana Angluin and Philip Laird. Learning from noisy examples. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Machine Learning","element":"span"},{"text":", 2(4):343–370, 1988.","element":"span"}],[{"id":"id-26","text":"[24] ","element":"span"},{"text":"Aaditya Ramdas and Aarti Singh. Optimal rates for stochastic convex optimization under tsybakov noise condition. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Machine Learning","element":"span"},{"text":", pages 365–373, 2013.","element":"span"}],[{"id":"id-33","text":"[25] ","element":"span"},{"text":"Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien. Semi-supervised learning (chapelle, o. et al., eds.; 2006)[book reviews]. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Transactions on Neural Networks","element":"span"},{"text":", 20(3):542–542, 2009.","element":"span"}],[{"id":"id-37","text":"[26] ","element":"span"},{"text":"Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, and Hongyang Zhang. Learning and 1-bit compressed sensing under asymmetric noise. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Conference on Learning Theory","element":"span"},{"text":", pages 152–192, 2016.","element":"span"}],[{"id":"id-40","text":"[27] ","element":"span"},{"text":"Tomáš Gavenčiak, Barbara Geissmann, and Johannes Lengler. Sorting by swaps with noisy comparisons. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Algorithmica","element":"span"},{"text":", 81(2):796–827, 2019.","element":"span"}],[{"text":"[28] Barbara Geissmann, Stefano Leucci, Chih-Hung Liu, and Paolo Penna. Optimal sorting with persistent comparison errors. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1804.07575","element":"span"},{"text":", 2018.","element":"span"}],[{"id":"id-41","text":"[29] ","element":"span"},{"text":"Rolf Klein, Rainer Penninger, Christian Sohler, and David P Woodruff. Tolerant algorithms. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"European Symposium on Algorithms","element":"span"},{"text":", pages 736–747. Springer, 2011.","element":"span"}]]}],"_version":"3.3.2"},"paperNode":"$1b:props:children:props:children:0:props:product"}]]]}]}]