Three-dimensional (3D) transesophageal echocardiography (TEE) is one of the most significant advances in cardiac imaging. Although TEE provides real-time 3D visualization of heart tissues and blood vessels without ionizing radiation, X-ray fluoroscopy still dominates the guidance of cardiac interventions because TEE has a limited field of view and visualizes surgical instruments poorly. Fusing 3D echo with live X-ray images can therefore provide a better guidance solution. This paper proposes a novel framework for image fusion that detects the pose of the TEE probe in X-ray images in real time. The framework does not require any manual initialization. Instead, it uses a cascade classifier to compute the position and in-plane rotation angle of the TEE probe. The remaining degrees of freedom (DOFs) are determined by fast matching against a template library. The proposed framework is validated on phantom and patient data. The target registration error (TRE) for the phantom was 2.1 mm. In addition, 10 patient datasets, seven acquired during cardiac electrophysiology procedures and three during trans-catheter aortic valve implantation procedures, were used to test clinical feasibility and accuracy. A mean registration error of 2.6 mm was achieved, which is well within typical clinical requirements.
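
As a rough illustration of the template-library step, the following sketch picks the out-of-plane angles whose stored template best resembles a probe image patch. It is a minimal sketch under stated assumptions, not the method used in this work: the similarity measure (normalized cross-correlation), the exhaustive search, the `(pitch, roll)` parameterization, and the names `normalized_cross_correlation` and `match_out_of_plane` are all hypothetical, and the patch is assumed to be already cropped and rotated using the position and in-plane angle from the detection stage.

```python
import numpy as np

def normalized_cross_correlation(a, b):
    """Return the zero-mean normalized cross-correlation of two equal-size images."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A constant image carries no structure to correlate against.
        return 0.0
    return float((a * b).sum() / denom)

def match_out_of_plane(patch, library):
    """Exhaustively search a template library for the best-matching entry.

    `library` is a list of ((pitch_deg, roll_deg), template) pairs, a
    hypothetical layout chosen for this sketch. Returns the angles of the
    best template and its similarity score.
    """
    best_angles, best_score = None, -np.inf
    for angles, template in library:
        score = normalized_cross_correlation(patch, template)
        if score > best_score:
            best_angles, best_score = angles, score
    return best_angles, best_score

# Toy usage with synthetic random templates standing in for rendered probe images.
rng = np.random.default_rng(0)
library = [((pitch, roll), rng.random((32, 32)))
           for pitch in (-10, 0, 10) for roll in (-20, 0, 20)]
# Simulate an observed patch: the (0, 0) template plus mild noise.
patch = library[4][1] + 0.05 * rng.random((32, 32))
angles, score = match_out_of_plane(patch, library)
```

In practice the library would hold many more entries, so a real-time system would likely need a faster search than this linear scan, for example a coarse-to-fine search over the angle grid.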