WSEAS Transactions on Information Science and Applications
Print ISSN: 1790-0832, E-ISSN: 2224-3402
Volume 11, 2014
Semi-Supervised Taxonomy Aware Integration of Catalogs
Authors: ,
Abstract: A central task of online commercial portals and business search engines is integrating products from various providers into their own product catalog. The commercial portal maintains a master taxonomy, while each data provider classifies its products under its own provider taxonomy. Taxonomy-aware classification of provider products into the master catalog proceeds in two steps: a simple text-based classifier first classifies the products from their textual representations, and the provider's taxonomy information is then used to adjust the classifier's results so that products grouped together in the provider catalog remain close in the master catalog. This taxonomy-aware calibration is performed by tuning three parameters, k, θ and γ. The major problem in classifying products into the master taxonomy is identifying candidate products for labeling. In this paper, we propose a semi-supervised learning methodology that overcomes this problem by incrementally retraining the base classifier with the parameters chosen during the taxonomy-aware calibration. Semi-supervised learning is a learning paradigm concerned with how computers and natural systems such as human beings acquire knowledge in the presence of both labeled and unlabeled data. The proposed system considers each candidate parameter θi and, for each, finds the parameter γ that maximizes accuracy on the validation set. Experimental results show that the semi-supervised learning algorithm is efficient and thus applicable to large data sets on the web.
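The parameter search described in the abstract (for each candidate θi, find the γ that maximizes validation accuracy) can be illustrated with a minimal sketch. The abstract does not give the calibration procedure itself, so the sketch treats it as a caller-supplied function: `validation_accuracy`, `select_parameters` and `best_gamma_for_theta` are hypothetical names introduced here, and the exhaustive search over a fixed list of γ values is an assumption, not necessarily the authors' method.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical signature: given (theta, gamma), run the taxonomy-aware
# calibration and return accuracy on the validation set. The calibration
# itself is not specified in the abstract and is assumed to exist.
AccuracyFn = Callable[[float, float], float]


def best_gamma_for_theta(
    theta: float,
    gamma_candidates: Iterable[float],
    validation_accuracy: AccuracyFn,
) -> Tuple[Optional[float], float]:
    """For a fixed theta, return the (gamma, accuracy) pair with the highest accuracy."""
    best_gamma, best_acc = None, float("-inf")
    for gamma in gamma_candidates:
        acc = validation_accuracy(theta, gamma)
        if acc > best_acc:
            best_gamma, best_acc = gamma, acc
    return best_gamma, best_acc


def select_parameters(
    theta_candidates: Iterable[float],
    gamma_candidates: Iterable[float],
    validation_accuracy: AccuracyFn,
) -> Tuple[float, Optional[float], float]:
    """Try each candidate theta, pair it with its best gamma, keep the best pair overall."""
    gammas = list(gamma_candidates)  # reused for every theta
    best = None
    for theta in theta_candidates:
        gamma, acc = best_gamma_for_theta(theta, gammas, validation_accuracy)
        if best is None or acc > best[2]:
            best = (theta, gamma, acc)
    if best is None:
        raise ValueError("theta_candidates must not be empty")
    return best


def toy_accuracy(theta: float, gamma: float) -> float:
    """Stand-in accuracy surface for illustration only; it peaks at theta=0.5, gamma=0.25."""
    return 1.0 - (theta - 0.5) ** 2 - (gamma - 0.25) ** 2


if __name__ == "__main__":
    theta, gamma, acc = select_parameters([0.1, 0.5, 0.9], [0.0, 0.25, 0.5], toy_accuracy)
    print(theta, gamma, acc)
```

In a semi-supervised setting, the selected pair would then be used when retraining the base classifier in the next iteration; that retraining loop is outside the scope of this sketch.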
Pages: 169-176
WSEAS Transactions on Information Science and Applications, ISSN / E-ISSN: 1790-0832 / 2224-3402, Volume 11, 2014, Art. #18