www.emsdn.com

Statistical significance of a classifier


>From: Martin C. Martin
>
>Hi,
>
>I have a bunch of data points x from two classes A & B, and I'm creating
>a classifier. So I have a function f(x) which estimates the probability
>that x is in class A. (I have an equal number of examples of each, so
>p(class) = 0.5.)
>
>One way of seeing how well this does is to compute the error rate on the
>test set, i.e. if f(x)>0.5 call it A, and see how many times I
>misclassify an item. That's what MASS does. But we should

Surely you mean `99% of dataminers/machine learners' rather than `MASS'?

>be able to do better: misclassifying should be more of a problem if the
>regression is confident than if it isn't.
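Martin's point, that a confident mistake should cost more than a hesitant one, is what a proper scoring rule such as log loss captures. The thread does not name a particular score; log loss is our choice for illustration. Below is a minimal Python sketch, assuming test labels coded 1 for class A and 0 for class B, with `probs` holding f(x) for each test item. The helpers `error_rate` and `log_loss` are hypothetical names, not functions from MASS.

```python
import math

# Assumption: label 1 = class A, label 0 = class B; probs[i] = f(x_i).

def error_rate(probs, labels, threshold=0.5):
    """Fraction misclassified when f(x) > threshold is called class A."""
    wrong = sum((p > threshold) != (y == 1) for p, y in zip(probs, labels))
    return wrong / len(labels)

def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-likelihood; a confident mistake costs the most."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -math.log(p) if y == 1 else -math.log(1 - p)
    return total / len(labels)

labels = [1, 1, 0, 0]
confident = [0.9, 0.9, 0.1, 0.99]  # last item: confident mistake
hesitant = [0.9, 0.9, 0.1, 0.51]   # last item: marginal mistake
print(error_rate(confident, labels), error_rate(hesitant, labels))  # 0.25 0.25
print(log_loss(confident, labels) > log_loss(hesitant, labels))     # True
```

Both forecasters have the same error rate, but log loss separates them. With balanced classes, a forecaster that always answers 0.5 scores ln 2 (about 0.693), which gives a natural "chance" baseline to compare against.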

>How can I show that my f(x) = P(x is in class A) does better than chance?
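For the thresholded rule alone, one standard way to ask "better than chance" with balanced classes is an exact one-sided binomial test of the number of correct test-set predictions against a success probability of 0.5. This is a sketch under stated assumptions: test items are independent, and it tests the 0.5-threshold classifier, not the quality of the probabilities themselves. The helper `binomial_p_value` and the 60-of-100 figure are hypothetical.

```python
from math import comb  # Python 3.8+

def binomial_p_value(correct, n, chance=0.5):
    """One-sided exact p-value: P(X >= correct) for X ~ Binomial(n, chance).

    chance=0.5 assumes equal class sizes, so guessing is right half the time.
    """
    return sum(comb(n, k) * chance**k * (1 - chance)**(n - k)
               for k in range(correct, n + 1))

# Hypothetical test-set result: 60 of 100 items classified correctly.
print(round(binomial_p_value(60, 100), 4))
```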

It depends on what you mean by `better'. For some problems, people are
perfectly happy with the misclassification rate. For others, the estimated
probabilities count a lot more. One possibility is to look at the ROC
curve. Another possibility is to look at the calibration curve (see MASS,
the book).
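Andy's two suggestions can be sketched in a few lines. The ROC (receiver operating characteristic) curve traces the true-positive rate against the false-positive rate as the threshold on f(x) moves, and its area (AUC) equals the probability that a randomly chosen class-A item gets a higher f(x) than a randomly chosen class-B item, so 0.5 corresponds to chance. A calibration curve bins the predicted probabilities and compares each bin's mean prediction with the observed fraction of class A. This is a minimal Python sketch, assuming labels 1 = A and 0 = B and both classes present in the test set; the helper names and the five equal-width bins are illustrative choices, not taken from MASS.

```python
# Assumption: label 1 = class A, label 0 = class B; both classes present.

def roc_points(probs, labels):
    """(FPR, TPR) pairs as the threshold sweeps down through each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(probs, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def calibration_table(probs, labels, bins=5):
    """Rows of (mean predicted, observed fraction of A, count) per non-empty bin."""
    groups = [[] for _ in range(bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * bins), bins - 1)  # p == 1.0 goes in the top bin
        groups[idx].append((p, y))
    table = []
    for g in groups:
        if g:
            table.append((sum(p for p, _ in g) / len(g),
                          sum(y for _, y in g) / len(g),
                          len(g)))
    return table

# Hypothetical scores: one class-A item (0.4) ranks below a class-B item (0.6).
probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc(probs, labels))  # 8/9, about 0.889
for row in calibration_table(probs, labels):
    print(row)
```

The trapezoidal area under `roc_points` matches `auc` here, because half-credit for ties in the Mann-Whitney count corresponds to a diagonal ROC segment. For a well-calibrated f(x), the observed fraction in each row should roughly track the mean prediction; with only a handful of items per bin, as in this toy example, the table is very noisy.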
Andy
>Thanks,
>Martin
R-help (AT) stat (DOT) math.ethz.ch mailing list
PLEASE do read the posting guide!


Reply #1 (2008-5-12)

Liaw, Andy wrote:

>>From: Martin C. Martin
>>
>>Hi,
>>
>>I have a bunch of data points x from two classes A & B, and I'm creating
>>a classifier. So I have a function f(x) which estimates the probability
>>that x is in class A. (I have an equal number of examples of each, so
>>p(class) = 0.5.)
>>
>>One way of seeing how well this does is to compute the error rate on the
>>test set, i.e. if f(x)>0.5 call it A, and see how many times I
>>misclassify an item. That's what MASS does. But we should
>
>Surely you mean `99% of dataminers/machine learners' rather than `MASS'?


That was my impression, but I didn't want to presume to speak for most
dataminers/machine learners.


>>be able to do better: misclassifying should be more of a problem if the
>>regression is confident than if it isn't.
>>
>>How can I show that my f(x) = P(x is in class A) does better than chance?
>
>It depends on what you mean by `better'. For some problems, people are
>perfectly happy with the misclassification rate. For others, the estimated
>probabilities count a lot more. One possibility is to look at the ROC
>curve. Another possibility is to look at the calibration curve (see MASS,
>the book).


Thanks, those are getting closer to what I want. I think the bottom
line is that I can't really assign a p-value the way I want to, since
the problem I'm thinking of is ill-posed.

Thanks,
Martin

