As the commonly used representation of a feature-rich data object has evolved from a single feature vector to a set of feature vectors, a key challenge in building a content-based search engine for feature-rich data is to match feature-sets efficiently. Although substantial progress has been made during the past few years, existing approaches are still inef- ficient and inflexible for building a search engine for massive datasets. This paper presents a randomized algorithm to embed a set of features into a single high-dimensional vec- tor to simplify the feature-set matching problem. The main idea is to project feature vectors into an auxiliary space using locality sensitive hashing and to represent a set of features as a histogram in the auxiliary space. A histogram is simply a high dimensional vector, and efficient similarity measures like L1 and L2 distances can be employed to approximate feature-set distance measures. We evaluated the proposed approach under three differ- ent task settings, i.e. content-based image search, image object recognition and near-duplicate video clip detection. The experimental results show that the proposed approach is indeed effective and flexible. It can achieve accuracy com- parable to the feature-set matching methods, while requir- ing significantly less space and time. For object recognition with Caltech 101 dataset, our method runs 25 times faster to achieve the same precision as Pyramid Matching Kernel, the state-of-the-art feature-set matching method.