Towards A General Technique for Transformation of Nominal Features into Numeric Features in Supervised Learning

Zdravevski, Eftim and Lameski, Petre and Kulakov, Andrea (2012) Towards A General Technique for Transformation of Nominal Features into Numeric Features in Supervised Learning. In: Proceedings of the Ninth Conference on Informatics and Information Technology. Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Macedonia, Skopje, Macedonia, pp. 133-138. ISBN 978-608-4699-01-9

Official URL: http://ciit.finki.ukim.mk

Abstract

Almost all machine learning problems require data preprocessing. This stage is especially important for problems where the datasets contain features of mixed types (i.e. nominal and numeric). A common practice in such cases is to transform each nominal feature into many dummy (i.e. binary) features. Also, many classification algorithms prefer numeric attributes over nominal attributes, and sometimes the distance between different data points cannot be estimated if the values of the attributes are not numeric and normalized. One way to transform nominal features into numeric ones is to use the Weight of Evidence (WoE) technique. WoE has some properties that make it a very useful tool for transforming attributes, but unfortunately some preconditions need to be met in order to calculate it. Additionally, WoE originally works only on supervised learning problems where the data is labelled with two classes. In this paper we propose a modified calculation of the Weight of Evidence that overcomes these preconditions and additionally makes it usable for test examples that were not present in the training set. The proposed transformation can be used for all supervised learning problems and an arbitrary number of classes. This paper establishes the theoretical background for such modifications and does not present any comparative results with other similar techniques.
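
For readers unfamiliar with WoE, the following is a minimal sketch of the conventional two-class calculation, not the modified calculation proposed in the paper. It assumes the common definition WoE(c) = ln(P(c | positive) / P(c | negative)); the sign convention varies between sources. The additive smoothing constant `alpha`, the function names `fit_woe` and `transform`, and the choice to map unseen categories to 0.0 (neutral evidence) are illustrative assumptions rather than details taken from the paper.

```python
import math
from collections import Counter


def fit_woe(values, labels, alpha=0.5):
    """Map each category to a smoothed two-class WoE score.

    values: nominal feature values; labels: 0/1 class labels.
    alpha: additive smoothing constant (an assumption of this sketch);
           it keeps the logarithm finite when a category never occurs
           in one of the two classes.
    """
    pos = Counter(v for v, y in zip(values, labels) if y == 1)
    neg = Counter(v for v, y in zip(values, labels) if y == 0)
    categories = set(pos) | set(neg)
    k = len(categories)
    n_pos = sum(pos.values())
    n_neg = sum(neg.values())

    table = {}
    for c in categories:
        # Smoothed estimates of P(c | positive) and P(c | negative).
        p_pos = (pos[c] + alpha) / (n_pos + alpha * k)
        p_neg = (neg[c] + alpha) / (n_neg + alpha * k)
        table[c] = math.log(p_pos / p_neg)
    return table


def transform(values, table, default=0.0):
    """Replace each nominal value with its WoE score.

    Categories not seen during fitting receive `default`
    (0.0 here, i.e. no evidence for either class; an assumption).
    """
    return [table.get(v, default) for v in values]
```

Without smoothing (`alpha = 0`), a category that appears in only one class would lead to a division by zero or a logarithm of zero, which illustrates the kind of precondition the abstract refers to.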

Item Type: Book Section
Uncontrolled Keywords: Weight of evidence, WoE, data transformation, nominal features, numeric features, smoothing, multiclass supervised learning
Subjects: International Conference on Informatics and Information Technologies > Intelligent Systems
International Conference on Informatics and Information Technologies > Robotics
International Conference on Informatics and Information Technologies > Bioinformatics
Depositing User: Vangel Ajanovski
Date Deposited: 28 Oct 2016 00:15
Last Modified: 28 Oct 2016 00:15
URI: http://eprints.finki.ukim.mk/id/eprint/11093
