Audio-visual (AV) biometrics exploit complementary information sources, and the use of both voice and facial images for biometric authentication has recently become economically feasible. Multi-modal adaptive fusion of audio and visual information therefore offers an effective tool for substantially improving classification performance. In terms of implementation, we propose to integrate an audio classifier (based on Gaussian mixture models) and a visual classifier (based on FaceIT, a commercially available software package) into a well-established mixture-of-experts fusion architecture. In addition, a consistent fusion strategy is introduced as a baseline fusion scheme, which establishes the lower bound of the "consistent region" in the FAR-FRR ROC. Our simulation results indicate that the prediction performance of the proposed adaptive fusion schemes falls within the consistent region. More importantly, the notion of consistent fusion can also facilitate the selection of the best modalities to fuse.
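
To illustrate the general idea of score-level audio-visual fusion and the FAR-FRR trade-off mentioned above, the following is a minimal, hedged sketch. It uses synthetic match scores and a fixed fusion weight purely for illustration; the actual system described in the paper learns the combination adaptively via a mixture-of-experts architecture, and the function names (`fuse`, `far_frr`) and score distributions here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic match scores (illustrative only): genuine users tend to
# score higher than impostors in each modality.
genuine_audio = rng.normal(0.70, 0.15, 500)   # audio (GMM-style) scores
impostor_audio = rng.normal(0.30, 0.15, 500)
genuine_visual = rng.normal(0.65, 0.20, 500)  # visual (face) scores
impostor_visual = rng.normal(0.35, 0.20, 500)

def fuse(audio, visual, w_audio=0.6):
    """Weighted score-level fusion. In a mixture-of-experts setup the
    weight would be produced adaptively by a gating network rather
    than fixed as it is here."""
    return w_audio * audio + (1.0 - w_audio) * visual

def far_frr(genuine, impostor, threshold):
    """FAR = fraction of impostor scores accepted;
    FRR = fraction of genuine scores rejected."""
    far = float(np.mean(impostor >= threshold))
    frr = float(np.mean(genuine < threshold))
    return far, frr

fused_genuine = fuse(genuine_audio, genuine_visual)
fused_impostor = fuse(impostor_audio, impostor_visual)
far, frr = far_frr(fused_genuine, fused_impostor, threshold=0.5)
print(f"FAR = {far:.3f}, FRR = {frr:.3f}")
```

Sweeping the threshold traces out the FAR-FRR ROC curve; a "consistent" fusion scheme, as described above, should not perform worse than the better of the two single-modality classifiers at any operating point.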