Today we proved the upper and lower bound theorems relating sample size for PAC-learning and VC-dimension.
We also presented the Chernoff and Hoeffding bounds, which we will use other times during the course (and are very often used in algorithmic design and complexity).
In fact, it was needed at one point of the proof above.
Exercise 9: Complete the proof that Pr(A) <= 2Pr(B) that we left pending in the proof of the upper bound theorem. This uses Hoeffding. Additional exercise, not stated in class:
Exercise 10: The upper and bound theorems presented in class gave conditions on the sample size for consistent algorithms, that is, learning algorithms that always achieve 0 errors on the sample. Not algorithms are like consistent. Extend the upper bound theorem to show that for sufficiently
large sample size, we have
|error_D(c) - error_S(c)| < epsilon with probability 1-delta
This is a more general form of generalization bound.
(Hint: go over the proof for the consistent case, see where the consistency hypothesis (error_S(c) = 0) is used, and see what you can claim for non-consistent algorithms. The Chernoff or Hoeffding bound will be used. You should get a dependence on epsilon of the form 1/epsilon^2 rather than 1/epsilon).
Saturday, October 30, 2010
2010-10-26 Generalization bounds and the VC-dimension
Today we defined version space, growth function, and VC-dimension of a concept class.
We stated and proved Sauer's lemma, which tells us how the growth function grows (sorry for the redundance).
We also stated two theorems essentially saying that the sample size needed for PAC learning a class is both upper and lower bounded by the VC-dimension, up to constants, when we fix the epsilon and delta parameters.
Exercise 8:
- find the VC-dimension of hyperrectangles in R^d
- find the VC-dimension of unions of at most k rectangles in R^2
- find the VC-dimension of linear halfspaces in R^2
- find the VC-dimension of the class of sinusoid concepts S_{a,b,c} on R. Concept S_{a,b,c} is the set of points such that a*sin(b*x)+c > 0.
We stated and proved Sauer's lemma, which tells us how the growth function grows (sorry for the redundance).
We also stated two theorems essentially saying that the sample size needed for PAC learning a class is both upper and lower bounded by the VC-dimension, up to constants, when we fix the epsilon and delta parameters.
Exercise 8:
- find the VC-dimension of hyperrectangles in R^d
- find the VC-dimension of unions of at most k rectangles in R^2
- find the VC-dimension of linear halfspaces in R^2
- find the VC-dimension of the class of sinusoid concepts S_{a,b,c} on R. Concept S_{a,b,c} is the set of points such that a*sin(b*x)+c > 0.
Friday, October 22, 2010
2010-10-22 More on PAC learning
Things we covered in today's class:
Exercise 5: Generalize the PAC algorithm for rectangles to
n-dimestional hyperrectangle.
Exercise 6: Show that an algorithm that is always able to output
consistent hypotheses is a PAC learning algorithm, given that the
sample size is at least $\frac{1}{\epsilon} \ln \frac{|H|}{\delta}$.
Exercise 7: Show that all four alternative models are indeed equivalent.
You may find the proofs to part of this in the original paper
Equivalence of models for polynomial learnability).
- finished proof of PAC learning monotone conjunctions, and PAC learning algorithm for rectangles (see chapter 1 of Introduction to Computational Learning Theory)
- definition of alternative PAC learning oracle-based models:
- the sample-based model that in addition receives as input the size of the target concept
- the one-oracle model using random example generator EX()
- the two-oracle model using positive random example generatos EX+() and negative random example generator EX-()
- reduction of online to PAC learning, more information on this in:
- Angluin's 1988 paper Queries and concept learning contains a similar reduction from the query learning model to PAC
- Littlestone's paper From on-line to batch learning
Exercise 5: Generalize the PAC algorithm for rectangles to
n-dimestional hyperrectangle.
Exercise 6: Show that an algorithm that is always able to output
consistent hypotheses is a PAC learning algorithm, given that the
sample size is at least $\frac{1}{\epsilon} \ln \frac{|H|}{\delta}$.
Exercise 7: Show that all four alternative models are indeed equivalent.
You may find the proofs to part of this in the original paper
Equivalence of models for polynomial learnability).
Tuesday, October 19, 2010
2010-10-19 PAC learning model
We finished the proof of the Weighted Majority Algorithm.
Exercise 4: Re-do analysis for WMA but instead of predicting according to the weighted majority, predict as follows: choose an expert at random according to probability w_i/W, and predict according to the chosen expert. You should get a better constant in the bound.
Then, we covered the definitions of representation class and PAC learning. The model was originally introduced in the classic 1984 paper by Valiant A theory of the learnable. We also saw one example of a learning algorithm that works in the PAC model: the elimination algorithm for learning monotone conjunctions. The proof is left for next class.
This material is well covered in the first chapter of the Kearns and Vazirani's textbook An Introduction to Computational Learning Theory.
Exercise 4: Re-do analysis for WMA but instead of predicting according to the weighted majority, predict as follows: choose an expert at random according to probability w_i/W, and predict according to the chosen expert. You should get a better constant in the bound.
Then, we covered the definitions of representation class and PAC learning. The model was originally introduced in the classic 1984 paper by Valiant A theory of the learnable. We also saw one example of a learning algorithm that works in the PAC model: the elimination algorithm for learning monotone conjunctions. The proof is left for next class.
This material is well covered in the first chapter of the Kearns and Vazirani's textbook An Introduction to Computational Learning Theory.
Friday, October 15, 2010
Scribes and exercises
Dear all:
You can find various information on the course, including scribing assignments and problems with their due dates at:
http://www.lsi.upc.edu/~marias/teaching/colt-10.html
You can find various information on the course, including scribing assignments and problems with their due dates at:
http://www.lsi.upc.edu/~marias/teaching/colt-10.html
2010-10-15: Winnow and Weighted Majority Algorithms
We covered two classic online algorithms:
Exercise 3: Re-do analysis for winnow but instead of cutting weights in half in a demotion step, set weights to 0. This is, in fact, the "elimination algorithm".
- The Winnow algorithm that is able to learn in an attribute efficient way. Originally introduced in this paper by Littlestone: Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm
- The Weighted Majority Algorithm that is able to learn from multiple experts. Introduced in this paper by Littlestone and Warmuth: The Weighted Majority Algorithm
Exercise 3: Re-do analysis for winnow but instead of cutting weights in half in a demotion step, set weights to 0. This is, in fact, the "elimination algorithm".
Monday, October 11, 2010
2010-10-08: More on the mistake-bounded model
Today:
Later in the course, we will define formally this idea of "learning reduction". But, for the time being, we proposed:
Exercise 1: Show that 2-CNF are learnable with O(n^2) mistakes by reduction to conjunctions of boolean variables.
Exercise 2: Show that n-variable, degree-d polynomials over a field are learnable from O(n^d) mistakes, by reduction to the degree-1 case.
- we finished the proof that linear functions F^n --> F for a field F are learnable with n mistakes
- we showed that subspaces of F^n are learnable with n mistakes (furthermore, they're positive mistakes)
- we showed that if a concept class C is learnable from b positive mistakes, then the class DIFF_k(C) of nested differences of at most k concepts in C is learnable from b*k mistakes.
- we argued (by informal "reductions") that the set of conjunctions of literals on n variables is learnable with 2n mistakes, and that concepts specified by non-homogeneous linear equations over a field are learnable with n+1 mistakes.
Later in the course, we will define formally this idea of "learning reduction". But, for the time being, we proposed:
Exercise 1: Show that 2-CNF are learnable with O(n^2) mistakes by reduction to conjunctions of boolean variables.
Exercise 2: Show that n-variable, degree-d polynomials over a field are learnable from O(n^d) mistakes, by reduction to the degree-1 case.
Tuesday, October 5, 2010
Last year's course
If you are curious, last year's blog for the course is:
http://coltlsiupc09.blogspot.com/
http://coltlsiupc09.blogspot.com/
2010-10-05 Introduction and the Mistake-Bounded model
We covered:
- administrative issues and mostly the evaluation method
- the goals and methods of Computational Learning Theory
- various "features" that may distinguish models of learning
- the Mistake Bounded (MB) model, one of the 3 main formal models of learning we will see in the course
- an algorithm for learning conjunctions of boolean variables in the MB model, and proved its correctness
- an algorithm for learning linear functions over fields in the MB model; its correctness proof is pending.
Sunday, October 3, 2010
Some references for the course
The class does not follow any textbook in particular, and throughout the course we will give pointers to relevant papers and other resources. If you do want a book, a classical reference is An Introduction to Computational Learning Theory by Michael Kearns and Umesh Vazirani; the book is partially available from Google Books.
Here is a list of courses from other institutions. Let us know if any of the links are broken, or if you find some other relevant course:
- Avrim Blum (Carnegie Mellon University):
- Machine Learning Theory Course (2004, 2007 versions)
- Vasant Hanovar (Iowa State University):
- Tugkan Batu (London School of Economics):
- Roni Khardon (Tufts University):
- Rob Schapire (Princeton University):
- Theoretical Machine Learning (2003 version)
- Yishai Mansour (Tel Aviv Univeristy):
- Amos Beimel (Ben-Gurion University):
- Nader Bshouty (Tel-Aviv University):
- Rocco Servedio (Columbia University):
- Adam Klivans (University of Texas):
Welcome to the COLT course, fall 2010
The course will start on tuesday 5th. We will meet on tuesdays and fridays, 10:00-12:00, in room Omega s2-216.
Ricard and Marta
Ricard and Marta
Subscribe to:
Posts (Atom)