Development

  • Statistical significance of a classifier


    From: Martin C. Martin
    >Hi,
    >
    >I have a bunch of data points x from two classes A & B, and I'm creating
    >a classifier. So I have a function f(x) which estimates the probability
    >that x is in class A. (I have an equal number of examples of each, so
    >p(class) = 0.5.)
    >
    >One way of seeing how well this does is to compute the error rate on the
    >test set, i.e. if f(x) > 0.5 call it A, and see how many times I
    >misclassify an item. That's what MASS does. But we should

    Surely you mean `99% of dataminers/machine learners' rather than `MASS'?

    >be able to do better: misclassifying should be more of a problem if the
    >regression is confident than if it isn't.
    >
    >How can I show that my f(x) = P(x is in class A) does better than chance?

    It depends on what you mean by `better'. For some problems, people are
    perfectly happy with the misclassification rate. For others, the estimated
    probabilities count a lot more. One possibility is to look at the ROC
    curve. Another possibility is to look at the calibration curve (see MASS,
    the book).

    Andy

    >Thanks,
    >Martin
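
For the plain error-rate reading of "better than chance", one possible check is an exact one-sided binomial test: with balanced classes, a classifier that guesses at random gets each test item right with probability 0.5. The sketch below is only an illustration of that idea, not something either poster proposed. The helper names (`count_correct`, `binomial_p_value`) and the ten-item test set are hypothetical, and the test assumes the test items are independent and were not used for fitting.

```python
from math import comb

def count_correct(probs, labels, threshold=0.5):
    """Count test items classified correctly.

    probs  -- f(x) for each test item, the estimated P(x is in class A)
    labels -- True if the item really is in class A, False for class B
    """
    return sum((p > threshold) == y for p, y in zip(probs, labels))

def binomial_p_value(n_correct, n_total, chance=0.5):
    """Exact one-sided p-value: P(X >= n_correct) for X ~ Binomial(n_total, chance)."""
    return sum(
        comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )

# Hypothetical test set with five items from each class.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.65, 0.35]
labels = [True, True, True, False, False, False, False, False, True, True]

n_correct = count_correct(probs, labels)
p_value = binomial_p_value(n_correct, len(labels))
print(n_correct, p_value)
```

This test looks only at which side of 0.5 each f(x) falls on, so it ignores how confident the classifier was, which is exactly the limitation raised in the question.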
    R-help (AT) stat (DOT) math.ethz.ch mailing list
    PLEASE do read the posting guide!
  • No.1

    Liaw, Andy wrote:

    >>From: Martin C. Martin
    >>
    >>Hi,
    >>
    >>I have a bunch of data points x from two classes A & B, and I'm creating
    >>a classifier. So I have a function f(x) which estimates the probability
    >>that x is in class A. (I have an equal number of examples of each, so
    >>p(class) = 0.5.)
    >>
    >>One way of seeing how well this does is to compute the error rate on the
    >>test set, i.e. if f(x) > 0.5 call it A, and see how many times I
    >>misclassify an item. That's what MASS does. But we should
    >
    >Surely you mean `99% of dataminers/machine learners' rather than `MASS'?


    That was my impression, but I didn't want to presume to speak for most
    dataminers/machine learners.


    >>be able to do better: misclassifying should be more of a problem if the
    >>regression is confident than if it isn't.
    >>
    >>How can I show that my f(x) = P(x is in class A) does better than chance?
    >
    >It depends on what you mean by `better'. For some problems, people are
    >perfectly happy with the misclassification rate. For others, the estimated
    >probabilities count a lot more. One possibility is to look at the ROC
    >curve. Another possibility is to look at the calibration curve (see MASS,
    >the book).


    Thanks, those are getting closer to what I want. I think the bottom
    line is that I can't really assign a p-value the way I want to, since
    the problem I'm thinking of is ill-posed.

    Thanks,
    Martin
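
One way the "confident mistakes should cost more" idea could be made concrete is to score f(x) with the log loss, which penalizes a wrong prediction more heavily the closer f(x) is to 0 or 1, and then compare the observed score against scores obtained after shuffling the class labels. The sketch below is an assumption about one workable formalization, not the method either poster had in mind. The function names and the `n_perm` and `seed` parameters are hypothetical, and the resulting p-value only addresses the null hypothesis that f(x) carries no information about the class.

```python
import math
import random

def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the labels under the predicted probabilities.

    probs  -- f(x), the estimated P(x is in class A), for each test item
    labels -- True if the item really is in class A, False for class B
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p if y else 1 - p)
    return total / len(probs)

def permutation_p_value(probs, labels, n_perm=2000, seed=0):
    """Fraction of label shufflings that score at least as well as the real labels."""
    rng = random.Random(seed)
    observed = log_loss(probs, labels)
    shuffled = list(labels)
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if log_loss(probs, shuffled) <= observed:
            as_good += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (as_good + 1) / (n_perm + 1)

# A constant guess of 0.5 gives a log loss of log(2), a natural chance baseline.
print(log_loss([0.5, 0.5], [True, False]), math.log(2))
```

Shuffling keeps the class balance fixed, so the comparison is against classifiers that see the same number of A and B items but have no real link between f(x) and the label.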


