Tuesday, December 14, 2010

2010-12-14 Boosting

Today we saw how to obtain strong PAC learning algorithms from weak ones. We showed how to boost the accuracy using the Adaboost algorithm.

The first boosting algorithm is due to Rob Schapire and is described in his paper The Strength of Weak Learnability. Freund followed this work with a more practical boosting algorithm described in Boosting a Weak Learning Algorithm by Majority. Later on they combined their ideas into Adaboost, which was first described in the article A decision-theoretic generalization of on-line learning and an application to boosting (Freund and Schapire received the Gödel Prize for this work in 2003). There are many many *many* papers on boosting from the machine learning and statistics communities; more literature can be found in the page http://www.boosting.org.

Exercise 26: Re-do the proof for AdaBoost seen in class, completing all the details and the holes that were left out.

Friday, December 10, 2010

2010-12-10 Learning using the Fourier Spectrum (continued) and Boosting

In the first part of the class, we have stated without proof that (AND,OR,NOT)-circuits of size s and depth d have "low" Fourier degree, from which it follows that they can be learned in time n^((log s)^d); this is due to Linial-Mansour-Nisan. We have then shown how to use the low-degree algorithm to learn decision lists and decision trees, and mentioned the Kushilevitz-Mansour algorithm that uses Membership Queries and can learn decision trees in polynomial time.

In the second part of the class we saw two variations on the PAC learning definition, one that fixes the confidence parameter $\delta = 1/2$, and one that fixes the accuracy parameter $\epsilon = 1/2$. We showed that the first variation is equivalent to the original one.

Exercise 24: Give depth-d circuits of size about 2^{n^{O(1/d)}} for parity of n variables. Hint: use divide-and-conquer.

Exercise 25: Use the [LMN] result on the Fourier degree of AC^0 to show that
depth-d circuits for parity of n variables must have size about 2^{n^\Omega(1/(d-1))}.
Of course, the more precise you can be in your bounds, the better solution.

Tuesday, November 30, 2010

2010-11-30 Learning using the Fourier spectrum

Today we presented the basics of Fourier analysis of boolean functions, and showed an algorithm that learns functions with (epsilon,d)-degree (= whose Fourier coefficients for parities of more than d variables have low squared sum).

This technique (as well as the application to constant-depth circuits that
we will see next day) is from:
- Constant-depth circuits, Fourier transform and learnability, N. Linial Y. Mansour and N. Nisan, Jour. Assoc. Comput. Mach., 40 (1993) 607 - 620.

The exposition in this and next lecture largely follows:

- Learning Boolean Functions via the Fourier Transform, by Y. Mansour (1994).

Exercise 21: Prove Parseval's identity with the hint given in class.

Exercise 22: Prove that the Fourier transform of the Fourier transform of f is f.

Exercise 23: Prove Claim 1 in the proof of correctness of the (epsilon,d)-degree learning algorithm.

NO CLASS ON FRIDAY DEC. 3RD AND TUESDAY DEC. 7TH

We will resume our classes on Friday, Dec. 10th.