Applying Rank-Order-Correlation-Based Feature Vector Transformation to Learning to Rank for Information Retrieval

 

Feature-Level Context Transformation for Learning to Rank for Information Retrieval Based on Rank-Order Correlation

 

葉鎮源[1]

Jen-Yuan Yeh

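Operations, Collections and Information Division, National Museum of Natural Science

1 Guanqian Rd.

North Dist., Taichung City 404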

jenyuan@mail.nmns.edu.tw

 

林忠億

Jung-Yi Lin

Innovation Digital System Business Group, Hon Hai Precision Industry Co., Ltd.

32 Jihu Rd.

Neihu Dist., Taipei City 114

jungyilin@gmail.com

 

鄭培成

Pei-Cheng Cheng

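Department of Information Management, Chien Hsin University of Science and Technology

229 Jianxing Rd.

Zhongli City, Taoyuan County 320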

pccheng@uch.edu.tw

 

楊維邦

Wei-Pang Yang

Department of Information Management, National Dong Hwa University

1, Sec. 2, Da Hsueh Rd.

Shoufeng Township, Hualien County 974

wpyang@mail.ndhu.edu.tw

 

 

Abstract (Chinese version)

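Learning to rank typically represents input instances as feature vectors and applies machine learning to analyze and generalize from correctly ranked (or classified) data, automatically constructing an effective ranking model or set of rules. This paper uses rank-order correlation coefficients to measure how closely different features are related in their ranking behavior. These feature correlations then serve as an intermediate transformation that maps feature vectors into the feature correlation space, lifting them from a first-order raw representation to a second-order context representation. With the second-order feature vectors as the data representation, RankSVM is used to build a binary classifier in the form of a linear function that judges whether a pair of documents is correctly ordered, from which the ranking of all documents is derived. Experiments use the LETOR 4.0 dataset, with MAP and MeanNDCG as evaluation metrics and RankSVM as the baseline, to compare the effect of second-order feature vectors on learning to rank; the results indicate that the proposed method is feasible.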

 

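Keywords: document retrieval, learning to rank, feature correlation computation, second-order representation transformation of feature vectors, ranking prediction and evaluation.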
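
As an illustration of the second-order transformation summarized in the abstract, the following minimal sketch computes a feature-by-feature rank-order correlation matrix and projects a raw feature vector through it. Two points are assumptions made for this sketch and are not stated in the abstract: the coefficient is taken to be Spearman's rho, and the projection is taken to be the product of the correlation matrix with the raw vector. The function names and the toy data are hypothetical.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(a, b):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return 0.0  # constant feature: rho is undefined, treated as 0 in this sketch
    return cov / (va * vb) ** 0.5

def correlation_matrix(X):
    """X: list of instances (rows). Returns the feature-by-feature rho matrix."""
    cols = list(zip(*X))
    return [[spearman(ci, cj) for cj in cols] for ci in cols]

def to_second_order(x, C):
    """Assumed projection: each output component is a correlation-weighted sum of x."""
    return [sum(c * v for c, v in zip(row, x)) for row in C]

# Toy data (hypothetical): three instances, three features.
X = [[1, 10, 3], [2, 20, 2], [3, 30, 1]]
C = correlation_matrix(X)
second_order = to_second_order(X[0], C)
print(second_order)
```

In this toy data the first two features order the instances identically and the third orders them in reverse, so the correlation matrix contains only +1 and -1 entries.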

 

Abstract

In practice, methods of learning to rank for information retrieval typically represent training instances as feature vectors and use supervised learning to automatically produce an effective ranking model (or retrieval function). This paper measures the relationships between features using rank-order correlation coefficients, and derives second-order context vectors by projecting first-order raw vectors into the feature correlation space. A novel learning method based on these second-order context vectors is then proposed, which trains a ranking model (in the form of a linear function) with RankSVM. The proposed method was evaluated on the LETOR 4.0 dataset and found to perform well in terms of MAP and MeanNDCG.

 

Keywords: document retrieval, learning to rank, feature correlation extraction, feature-level second-order representation transformation, ranking prediction and evaluation
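
The pairwise view behind RankSVM mentioned in the abstract can be sketched as follows: documents retrieved for the same query are turned into difference vectors labeled by which document should rank higher, and a linear scoring function then orders the documents. This is only a sketch under stated assumptions. It does not train an SVM; the weight vector `w`, the helper names, and the toy data are hypothetical stand-ins for what a trained model would provide.

```python
from itertools import combinations

def pairwise_examples(docs):
    """docs: list of (feature_vector, relevance) for a single query.

    Returns (difference_vector, label) pairs, with label +1 when the first
    document of the pair is more relevant and -1 when it is less relevant.
    Pairs with equal relevance carry no ordering information and are skipped.
    """
    pairs = []
    for (xa, ya), (xb, yb) in combinations(docs, 2):
        if ya == yb:
            continue
        diff = [a - b for a, b in zip(xa, xb)]
        pairs.append((diff, 1 if ya > yb else -1))
    return pairs

def rank_documents(docs, w):
    """Order document indices by the linear score w . x, highest first."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return sorted(range(len(docs)), key=lambda i: score(docs[i][0]), reverse=True)

# Toy query (hypothetical): three documents with graded relevance labels.
docs = [([1.0, 0.0], 2), ([0.5, 0.5], 1), ([0.0, 1.0], 1)]
pairs = pairwise_examples(docs)

# Hypothetical weights; in practice they would come from the trained classifier.
w = [1.0, -1.0]
ordering = rank_documents(docs, w)
print(pairs)
print(ordering)
```

A binary classifier trained on such difference vectors yields a weight vector, and sorting by the resulting linear score recovers a full ranking from the pairwise decisions.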

 



[1] Corresponding author.