AdventureTime25
New member
- Joined: Jan 28, 2015
- Messages: 2
I found an article (starting on page 8) that gives a neat method for finding the line/plane/hyperplane that maximizes the separation between two groups of data points in n-dimensions. It uses Fisher's linear discriminant as a measure of the separation and then finds the solution that maximizes that expression using some clever substitutions and eigenvectors. I was really happy to find this method, since it would be a much more efficient method than what I've been doing in my computer program: brute-force looping through every possible angle in n-space.
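For context, here is a minimal sketch of what the two-class case boils down to, as I understand it: the optimal direction is w = S_W⁻¹(m₁ − m₂), where S_W is the within-class scatter matrix. (This is the textbook two-class Fisher formula, not necessarily the article's exact derivation, and I'm assuming NumPy is available.)

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class Fisher direction: w = S_W^{-1} (m1 - m2).

    X1, X2 are (n_points, n_dims) arrays, one row per data point.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of outer products of centered points
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # Fails with LinAlgError when S_W is singular (zero eigenvalue)
    return np.linalg.solve(S_W, m1 - m2)
```

On well-behaved data this works fine, but on the data set below it raises a LinAlgError because S_W is singular, which is exactly my problem.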
However, I ran into a problem on page 9, where the author states "assuming the eigenvalues are nonzero..." and doesn't explain what to do when one or more eigenvalues are zero. Literally the first simple data set I tested had a zero eigenvalue in one of the matrices using this method:
data set 1 = {(0, 1), (1, 2), (2, 3)}
data set 2 = {(0, -1), (1, 0), (2, 1)}
The solution for this data set should be the vector (-1, 1): if you project every point onto that vector, each data set collapses to a single point and the separation between the two sets is maximized. However, the author's method fails for this data.
The substitution trick the author describes fails when an eigenvalue of one of the matrices is zero, but I suspect that a zero eigenvalue implies something about the data that would actually simplify the problem. For example, the data set above consists of two parallel lines. If I perturb one point slightly so the lines are no longer parallel, the offending eigenvalue is no longer zero and the method works. However, I need a robust computer program that can find a solution for any data set.
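My current guess at a workaround (I'd appreciate confirmation that this is sound): when the within-class scatter matrix S_W is singular, check whether the mean difference m₁ − m₂ has a component in the null space of S_W. Along any such direction the within-class variance of the projections is zero, so the Fisher ratio is unbounded and that null-space component should itself be the optimal direction. Otherwise, fall back to the pseudoinverse. A sketch (my own code, not from the article):

```python
import numpy as np

def robust_fisher_direction(X1, X2, tol=1e-10):
    """Fisher direction that also handles a singular within-class scatter."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    d = m1 - m2
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # S_W is symmetric, so use a symmetric eigendecomposition
    vals, vecs = np.linalg.eigh(S_W)
    null = vecs[:, vals < tol]  # eigenvectors with (near-)zero eigenvalue
    if null.size:
        # Component of the mean difference lying in the null space of S_W.
        # Along this direction the within-class variance is zero, so the
        # Fisher ratio is unbounded: take it as the optimal direction.
        w = null @ (null.T @ d)
        if np.linalg.norm(w) > tol:
            return w / np.linalg.norm(w)
    # Nonsingular case (or mean difference orthogonal to the null space):
    # pinv coincides with the ordinary inverse when S_W is invertible.
    w = np.linalg.pinv(S_W) @ d
    return w / np.linalg.norm(w)
```

On the parallel-lines data above, S_W = [[4, 4], [4, 4]], its null space is spanned by (-1, 1), and the mean difference (0, 2) projects onto (-1, 1), so this returns the expected direction.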
Can anyone help me out or point me to a resource that explains what to do in this case?