Yen-Liang Chen, Li-Chen Cheng, and Yun-Ling Cheng. Using position, fonts and cited references to retrieve scientific documents. Journal of Information Science, 2007, 33: 492-508.
As more and more documents become available on the internet, finding documents that fit users' needs is becoming increasingly important. A scientific document is a structured text with features that can be used to improve retrieval. This work investigates three such features for that purpose: fonts, position, and cited references. Together, these three factors can improve retrieval performance. The work first investigates the relationships among them and then uses the discovered relationships to design a novel retrieval method. Empirical results show that using the position factor alone achieves the same performance as considering the position and font factors together. Citation similarity is useful only when the similarity is high.