What are the similarity measures available for binary data?
Similarity measures available for binary data include the following. Russell and Rao: this is a binary version of the inner (dot) product. Equal weight is given to matches and nonmatches. This is the default for binary similarity data. Simple matching: this is the ratio of matches to the total number of values.
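As a rough illustration, the two measures can be sketched in Python using their commonly cited textbook definitions: Russell and Rao as the number of joint 1s divided by the total number of values, and simple matching as the number of agreeing positions (joint 1s plus joint 0s) divided by the total. The function names and sample vectors below are made up for this example and are not taken from any particular statistics package.

```python
def binary_counts(x, y):
    """Return (joint 1s, joint 0s, total length) for two 0/1 sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    both_one = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    both_zero = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return both_one, both_zero, len(x)


def russell_rao(x, y):
    """Joint 1s divided by the total number of values."""
    both_one, _, n = binary_counts(x, y)
    return both_one / n


def simple_matching(x, y):
    """Agreeing positions (joint 1s and joint 0s) divided by the total."""
    both_one, both_zero, n = binary_counts(x, y)
    return (both_one + both_zero) / n


# Hypothetical sample data for illustration only.
x = [1, 0, 1, 1, 0, 0]
y = [1, 0, 0, 1, 0, 1]

print(russell_rao(x, y))      # 2 joint 1s out of 6 values
print(simple_matching(x, y))  # 4 agreeing positions out of 6 values
```

Note that under this definition Russell and Rao only reaches 1 when both objects consist entirely of 1s, whereas simple matching reaches 1 whenever the two objects are identical.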
What is a similarity coefficient?
Similarity measures between objects that contain only binary attributes are called similarity coefficients, and typically have values between 0 and 1. A value of 1 indicates that the two objects are completely similar, while a value of 0 indicates that the objects are not at all similar.
Why are similarity and dissimilarity important in data mining?
Similarity and dissimilarity are important because they are used by a number of data mining techniques, such as clustering, nearest neighbor classification, and anomaly detection. The term proximity is used to refer to either similarity or dissimilarity.
How to measure similarity/dissimilarity?
How you measure similarity or dissimilarity depends on the nature of your data set and on what exactly you want to capture (distance, dependence, correlation, difference between distributions, ...).
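To make the point concrete, here is a small Python sketch comparing two possible choices for numeric data: Euclidean distance and Pearson correlation. The helper names and the sample vectors are invented for this illustration. The two vectors are far apart in distance terms yet perfectly correlated, so which measure is appropriate depends on whether magnitude or pattern matters for your task.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_correlation(u, v):
    """Linear correlation between two equal-length numeric vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical sample data: v is u scaled by a factor of 10.
u = [1, 2, 3, 4]
v = [10, 20, 30, 40]

print(euclidean_distance(u, v))   # large: the magnitudes differ a lot
print(pearson_correlation(u, v))  # close to 1: the pattern is the same
```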