Monday, November 26, 2007

[Thinking Cap] on learning.. (May be the last or penultimate chance to don that cap) [4th qn added]

Qn 4. added below.


Qn 0. [George Costanza qn] Consider two learners that are trying to solve the same classification problem with two classes (+ and -). L1 seems to be averaging about 50% accuracy on the test cases, while L2 seems to be averaging 25% accuracy. Which learner is good?  Why is this called the George Costanza question? ;-)
 
 
Qn 1. Consider a scenario where the training-set examples have been labeled by a slightly drunk teacher--and thus they sometimes have wrong labels (e.g., positive instances wrongly labelled as negative).  Of course, for the learning to be doable, the percentage of these mislabelled instances should be quite small.  We have two learners, L1 and L2. L1 seems to be 100% correct on the *training* examples. L2 seems to be 90% correct on the training examples. Which learner is likely to do well on test cases?
 
Qn 2.  Compression involves using the pattern in the data to reduce the storage requirements of the data.  One way of doing this would be to find the rule underlying the data, and keep the rule and throw the data out. Viewed this way, compression and learning seem one and the same. After all, learning too seems to take the training examples, find a hypothesis ("pattern"/"rule") consistent with the examples, and use that hypothesis instead of the training examples.  What, if any, differences do you see between Compression and Learning?
 
Qn 3. We said that most human learning happens in the context of prior knowledge. Can we view prior knowledge as a form of bias?
In particular, can you say that our prior knowledge helps us focus on certain hypotheses as against other ones in explaining the data?
 
 
Qn 4.  We test the effectiveness of a learner on test cases. Obviously, the test can be made "unfair" in that the test examples are somehow unconnected with the training examples.
 
One way of ensuring fairness, one that the learner would love, is to pick a subset of the training examples and give them back as the test examples. (This is like your good old grade-school days, where the teacher would hand out a "review" sheet and the exam would ask a subset of the questions from that sheet.) Obviously, this makes the test a little too dull (from the sadistic teacher's point of view).
 
A way of ensuring fairness that the teacher would love is to actually give test cases that have not been seen in the training cases (e.g. set exams that don't just repeat homework questions). However, it is too easy to get into unfair tests this way.
 
***What is the most general restriction you can put on the test so that it is considered "FAIR"? Can you argue that your definition works well with your gut feeling about "fair" vs. "unfair" exams/tests?
 
Anecdote on "unfair tests":  When my wife took a class on algorithms and complexity in her UofMD days, the teacher--whom I won't name--spent one class at the end of the semester on NP-completeness proofs, and then set the entire final exam on NP-completeness proofs. She never realized how riled up she was about this until, at a DARPA meeting 15 years later, she met another student from the same class--now a faculty member at Duke--and the two of them found themselves talking about the sheer unfairness of that test within the first couple of seconds. Apparently we remember unfair tests quite well ;-).
 
that is all for now.
Rao
 

6 comments:

sukru said...

The first question has a trick. A 50% accuracy value in a boolean scenario is "totally clueless guessing" (you'd expect 50% from random guessing anyway).

On the other hand, negating the output of a 25% accurate learner will actually give us a 75% accurate learner!

Subbarao Kambhampati said...

Sukru is right in his answer to the zeroth question, of course. The worst learner is one which has 50% accuracy in a 2-class classification problem. A 25%-accuracy learner can actually be used to get 75% accuracy (by just doing the opposite of what it says).


I called it the "Costanza question" in honor of the Seinfeld episode "The Opposite", where George decides to do the opposite of what his instincts tell him to do since, as he puts it:

"Why did it all turn out like this for me? I had so much promise. I was personable, I was bright. Oh, maybe not academically speaking, but ... I was perceptive. I always know when someone's uncomfortable at a party. It became very clear to me sitting out there today, that every decision I've ever made, in my entire life, has been wrong. My life is the opposite of everything I want it to be. Every instinct I have, in every of life, be it something to wear, something to eat ... It's all been wrong."

http://www.seinfeldscripts.com/TheOpposite.htm
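
A quick Python sketch of the flip trick (the data here is simulated, purely for illustration):

    import random

    random.seed(0)

    # Hypothetical 2-class test set: true labels are +1 or -1.
    truth = [random.choice([+1, -1]) for _ in range(10000)]

    # A deliberately bad learner: right only 25% of the time.
    predictions = [t if random.random() < 0.25 else -t for t in truth]

    def accuracy(preds, labels):
        return sum(p == l for p, l in zip(preds, labels)) / len(labels)

    print(accuracy(predictions, truth))                # ~0.25
    print(accuracy([-p for p in predictions], truth))  # ~0.75 -- the Costanza flip

Since every wrong answer in a 2-class problem is necessarily the other class, negating the predictions turns each of the 75% errors into a hit.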

Ina said...

Q.1. I would guess L2 would be better, as L1 fits the incorrectly labeled data exactly. That means there are rules learnt by L1 that are definitely bogus, and it would make inaccurate predictions/classifications on the test cases too.
Q.3. Yes, our biases let us readily accept ideas that are consistent with our existing knowledge base, whereas ideas which may be true but inconsistent with our knowledge base may be explained away with other "reasonings" (consistent with our knowledge base) or even ignored.
Q.4. A fair exam would be one that has questions solvable by applying the rules/concepts learned in the course, giving due importance to the time spent on grasping a particular topic (to be tested).
-Ina
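
A small scikit-learn sketch of Ina's Q.1 point (the dataset, the 10% noise rate, and the depth limit are all arbitrary choices made for this illustration): flip some training labels and compare a learner that memorizes them against one that stops short.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # The "slightly drunk teacher": flip 10% of the training labels.
    rng = np.random.default_rng(0)
    flip = rng.random(len(y_tr)) < 0.10
    y_noisy = np.where(flip, 1 - y_tr, y_tr)

    l1 = DecisionTreeClassifier(random_state=0).fit(X_tr, y_noisy)               # grows until it fits everything
    l2 = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_noisy)  # refuses to chase mislabels

    print("L1 train/test:", l1.score(X_tr, y_noisy), l1.score(X_te, y_te))
    print("L2 train/test:", l2.score(X_tr, y_noisy), l2.score(X_te, y_te))

Typically L1 hits 1.0 on the noisy training set but does worse on the clean test set than the depth-limited L2 -- exactly the "bogus rules" effect Ina describes.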

sukru said...
This comment has been removed by the author.
sukru said...

(typo fix)

For question four, the idea could be to make sure the "topic distribution" (i.e., the output-value frequencies) is the same in the training and test data sets.

For example, if we have 80% 1s and 20% 0s in the training data of a binary classifier, then having 50% 1s and 50% 0s in the test data would be unfair. We can fix it by removing 3/4 of the 0s to restore the balance, as sketched below.
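
A minimal numpy sketch of this rebalancing (the counts are just the example numbers above):

    import numpy as np

    rng = np.random.default_rng(0)
    y_test = np.array([1] * 500 + [0] * 500)   # "unfair" test: 50% 1s, 50% 0s

    ones = np.flatnonzero(y_test == 1)
    zeros = np.flatnonzero(y_test == 0)

    # Training ratio is 80:20 = 4:1, so keep one 0 for every four 1s,
    # i.e., drop 3/4 of the 0s as suggested above.
    keep_zeros = rng.choice(zeros, size=len(ones) // 4, replace=False)
    fair_idx = np.sort(np.concatenate([ones, keep_zeros]))

    y_fair = y_test[fair_idx]
    print(y_fair.mean())   # 0.8 -- the test now matches the training distribution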

Tuan A. Nguyen said...

Q2. The difference between compression and learning:

In compression, the patterns/rules found in the original (somewhat like "training") data are used to represent that data itself in the most compact manner. If we are given other data, new corresponding patterns must be extracted.

On the other hand, what an agent learns from training data is used to improve its performance in new situations (e.g., a face-recognition system must be tested with a new set of face images).
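
A toy contrast along these lines (the linear rule and the numbers are my own example): both uses fit the same rule, but compression also keeps the residuals so the given data can be reconstructed exactly, while learning keeps only the rule and applies it to an unseen input.

    import numpy as np

    xs = np.array([0.0, 1.0, 2.0, 3.0])
    ys = 2 * xs + 1 + np.array([0.01, -0.02, 0.0, 0.01])   # data = rule + tiny noise

    slope, intercept = np.polyfit(xs, ys, 1)    # the "pattern"/"rule"
    residuals = ys - (slope * xs + intercept)   # the exceptions to the rule

    # Compression: store (rule, residuals), throw the ys away; the originals
    # come back exactly -- but only for the xs we already had.
    assert np.allclose(slope * xs + intercept + residuals, ys)

    # Learning: keep only the rule and use it on an x never seen before.
    x_new = 10.0
    print(slope * x_new + intercept)   # ~21: a prediction, not a reconstruction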