The Earth Mover's Distance (EMD) is one of the most-widely used
distance functions to measure the similarity between two multimedia objects.
While providing good search results, the EMD is too much time consuming to be
used in large multimedia databases. To solve the problem, we propose an
approximate k-nearest neighbor (k-NN) search method based on the EMD. In the
proposed method, the overhead for both disk accesses and EMD computations is
reduced significantly, thanks to the approximation. First, the proposed method
builds an index using the M-tree, a distance-based multi-dimensional index
structure, to reduce the disk access overhead. When building the index, we reduce
the number of features in the multimedia objects through dimensionalityreduction. When performing the k-NN search on the M-tree, we find a small set of
candidates from the disk using the index and then perform the post-processing on
them. Second, the proposed method uses the approximate EMD for index retrieval
and post-processing to reduce the computational overhead of the EMD. To
compensate the errors due to the approximation, the method provides a way of
accuracy improvement of the approximate EMD. We performed extensive
experiments to show the efficiency of the proposed method. As a result, the
method achieves significant improvement in performance with only small errors:
the proposed method outperforms the previous method by up to 67.3% with only
3.5% error.