Research – Paper 192

Encoding Category Correlations into Bilingual Topic Modeling for Cross-Lingual Taxonomy Alignment

Tianxing Wu, Lei Zhang, Guilin Qi, Xuan Cui and Kang Xu

October 23, 2017, 11:00
Stolz 1

Abstract

Cross-lingual taxonomy alignment (CLTA) refers to mapping each category in a source taxonomy in one language onto a ranked list of the most relevant categories in a target taxonomy in another language. Recently, vector similarities based on bilingual topic models have achieved state-of-the-art performance on CLTA. However, these models capture only the textual context of categories and ignore explicit category correlations, such as correlations between categories and their co-occurring words in text, or correlations among categories in ancestor-descendant relationships in a taxonomy. In this paper, we propose a unified solution that encodes category correlations into bilingual topic modeling for CLTA, yielding two novel category-correlation-based bilingual topic models, CC-BiLDA and CC-BiBTM. Experiments on two real-world datasets show that our proposed models significantly outperform the state-of-the-art baselines on CLTA (at least +10.9% in each evaluation metric).
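
As a rough illustration of the ranking step the abstract describes, the sketch below scores target categories against one source category by cosine similarity of their topic vectors. It is not the CC-BiLDA or CC-BiBTM model itself: the topic vectors are made-up toy values, and the function names (`cosine`, `rank_target_categories`) and category labels are hypothetical, introduced only for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_target_categories(source_vec, target_vecs, top_k=3):
    """Return up to top_k (name, score) pairs, most similar target category first."""
    scored = [(name, cosine(source_vec, vec)) for name, vec in target_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy topic distributions over three shared topics (assumed values, not from the paper).
source = [0.7, 0.2, 0.1]
targets = {
    "target_A": [0.6, 0.3, 0.1],
    "target_B": [0.1, 0.1, 0.8],
    "target_C": [0.3, 0.4, 0.3],
}

ranking = rank_target_categories(source, targets)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a real setting the vectors would come from a trained bilingual topic model, so that categories from both languages live in the same topic space before they are compared.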

Svitlana

Bilingual topic models: cc-BiBTM & cc-BiLDA for cross-lingual taxonomy alignment modeling category distribution with structural correlations based on path length

Tianxing Wu

Hi, thanks for your comments! There are indeed three kinds of category correlations: co-occurrence correlations, structural correlations based on information content, and structural correlations based on path length. More details will be given in the paper. 🙂