Tourism promotional videos are a popular way to inform potential tourists about destinations, especially now that people can watch a wide range of videos on online platforms. In recent years, multimodal discourse analysis has emerged as an effective tool for analyzing different media of communication, including videos and images. The aim of this study was to investigate the ideational and representational meaning of the tourism promotional video (TPV) Xi’an China and to explore the relationship between its verbal and visual elements from the perspective of multimodal discourse analysis. The study adopted a qualitative research design and employed three analytical frameworks: a transitivity analysis of the verbal data based on Systemic Functional Linguistics; an exploration of the representational meaning of the visual data based on Visual Grammar; and Royce’s Intersemiotic Complementarity to analyze the relationship between the verbal texts and visual images of the video. The data obtained from these processes were analyzed using NVivo 12. The study found that, first, the video’s verbal modality conveyed information about the history, culture, landscape, trade, industry and development of Xi’an; second, the visual modality represented the natural beauty, entertainment, events, and changes in Xi’an from the past to the present; and third, the most frequently used semantic relations between the verbal and visual modes were repetition and synonymy. The cooperation of the two modes helped to construct the overall meaning of the Xi’an China TPV, thereby publicizing the tourist appeal of Xi’an.