Abstract
In multimodal applications, the corresponding image data for a piece of text is not always directly available. However, for the Chinese language, since Chinese characters originate from pictorial forms, character images are closely linked to textual information. To extract useful information from different modalities, this paper proposes a model based on a character-image modality and multi-label auxiliary information to improve the accuracy of Chinese sentiment analysis. First, oracle bone script characters are treated as an image modality. A multi-input, multi-output network architecture is then designed to handle both text sentiment analysis and speech recognition. The overall text sentiment analysis task is divided into three unimodal sub-tasks and one multimodal sub-task, which utilize textual features, image features, text-image features, and speech features, respectively. Finally, a multi-task joint training framework guided by multiple labels is constructed, including a parameter-sharing optimization mechanism, in which the speech recognition labels and text sentiment analysis labels direct the model's attention to the corresponding features. Comparison and ablation experiments demonstrate competitive performance, indicating that even without a traditional image-modality dataset, the proposed model effectively advances Chinese sentiment analysis research and enhances the generalization capability of multimodal models across different datasets.
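The multi-task joint training described above can be sketched in miniature: a shared encoder feeds several task heads, and a weighted sum of per-task losses forms one joint objective. Everything below (the function names, the toy squared-error loss, and the task weights) is an illustrative assumption, not the paper's actual architecture.

```python
# Minimal sketch of multi-task joint training with parameter sharing.
# A single shared weight stands in for the shared encoder; each sub-task
# (e.g. a unimodal sentiment head or the speech recognition head)
# contributes its own labeled loss to the joint objective.

def shared_encoder(features, w_shared):
    # Parameter-sharing layer: the same weight serves every sub-task.
    return [w_shared * x for x in features]

def task_loss(hidden, label):
    # Placeholder squared-error loss for one task head.
    pred = sum(hidden) / len(hidden)
    return (pred - label) ** 2

def joint_loss(features, labels, w_shared, task_weights):
    # Multi-label guidance: each task's label contributes, with its own
    # weight, to a single joint objective optimized over shared parameters.
    hidden = shared_encoder(features, w_shared)
    return sum(a * task_loss(hidden, y)
               for a, y in zip(task_weights, labels))

loss = joint_loss([0.2, 0.4, 0.6], labels=[1.0, 0.0],
                  w_shared=0.5, task_weights=[0.7, 0.3])
```

In a real implementation each head would be a network branch and the weights would be tuned (or learned), but the structure of the objective is the same: one backward pass updates the shared parameters under the guidance of all task labels at once.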
| Original language | English |
|---|---|
| Pages (from-to) | 144632-144649 |
| Number of pages | 18 |
| Journal | IEEE Access |
| Volume | 13 |
| DOIs | |
| Publication status | Published - 13 Aug 2025 |
Bibliographical note
This work is licensed under a Creative Commons Attribution 4.0 License.
Funding
This work was supported in part by the General Project of Wenzhou Science and Technology Bureau under Grant S2023013, in part by the Second Batch of Teaching Reform Projects for Zhejiang Province’s Higher Vocational Education During the ‘‘14th Five-Year Plan’’ Period under Project jg20240277, in part by the Major Project of Wenzhou Science and Technology Bureau under Grant ZG2023022, and in part by the Open Project of the State Key Laboratory of Computer Aided Design and Computer Graphics of Zhejiang University under Grant A2313.
| Funders | Funder number |
|---|---|
| Science and Technology Plan Project of Wenzhou Municipality | jg20240277, S2023013, ZG2023022 |
| State Key Laboratory of Computer Aided Design and Computer Graphics | A2313 |
Keywords
- Chinese sentiment analysis
- Multimodal
- character image modality
- feature fusion
- joint training
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering