Package smile.data

Data and attribute encapsulation classes.

Package smile.data Description

Data and attribute encapsulation classes. A data set is a collection of datum objects, each usually defined by attribute-value pairs. A datum object can be very sparse, in which case it is stored as a list to save space. A datum object may have an associated class label (for classification) or a real-valued response (for regression). Optionally, a datum object or an attribute may carry a positive weight, whose meaning depends on the application; most machine learning methods, however, cannot make use of this weight. Generally speaking, there are two major types of attributes:
Qualitative variables:
The data values are non-numeric categories. Examples: Blood type, Gender.
Quantitative variables:
The data values are counts or numerical measurements. A quantitative variable can be either discrete, such as the number of students receiving an 'A' in a class, or continuous, such as GPA or salary.
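As an illustration of the description above, the following minimal sketch stores a sparse datum as a list of (attribute index, value) pairs with an optional class label and weight, and encodes a qualitative attribute as a category index. All names here (SparseDatumSketch, Entry, BLOOD_TYPES) are hypothetical and are not the classes of this package; the convention that -1 means "unlabeled" is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the smile.data API.
class SparseDatumSketch {
    // One stored attribute-value pair.
    static final class Entry {
        final int index;     // attribute (column) index
        final double value;  // attribute value
        Entry(int index, double value) {
            this.index = index;
            this.value = value;
        }
    }

    // Categories of a qualitative attribute; a value is stored as its position in this list.
    static final List<String> BLOOD_TYPES = Arrays.asList("A", "B", "AB", "O");

    final List<Entry> entries = new ArrayList<>(); // only non-zero attributes are stored
    int label = -1;       // class label for classification; -1 means unlabeled (assumption)
    double weight = 1.0;  // optional positive weight; its meaning is application-specific

    // Stores a value; this sketch assumes each index is set at most once.
    void set(int index, double value) {
        if (value != 0.0) {
            entries.add(new Entry(index, value));
        }
    }

    // Attributes that were never stored are implicitly zero.
    double get(int index) {
        for (Entry e : entries) {
            if (e.index == index) {
                return e.value;
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        SparseDatumSketch x = new SparseDatumSketch();
        x.set(0, BLOOD_TYPES.indexOf("AB")); // qualitative: stored as a category index
        x.set(7, 3.6);                        // quantitative and continuous: a GPA
        x.label = 1;
        System.out.println(x.get(0) + " " + x.get(7) + " " + x.get(3));
    }
}
```

A linear scan keeps the sketch short; a real implementation would more likely keep the entries sorted by index or use a map.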
Another way of classifying data is by measurement scale. In statistics, four measurement scales are in general use:
Nominal data:
data values are non-numeric group labels. For example, a Gender variable can be coded as male = 0 and female = 1, where the codes are arbitrary labels with no numeric meaning.
Ordinal data:
data values are categorical and can be ranked in a meaningful order. For example, responses from strongly disagree to strongly agree may be coded as 1 to 5.
Continuous data:
Interval data: data values range over a real interval, which can be as large as from negative infinity to positive infinity. The difference between two values is meaningful, but the ratio of two values is not. For example, temperature (in degrees Celsius or Fahrenheit), IQ.
Ratio data: both the difference and the ratio of two values are meaningful. For example, salary, weight.
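The four scales above differ in which comparisons are meaningful. The following sketch records that as three flags per scale and uses a temperature conversion to show why ratios of interval data do not carry meaning. The enum ScaleSketch and its fields are hypothetical names chosen for illustration and are not part of this package.

```java
// Hypothetical illustration only; not the smile.data API.
enum ScaleSketch {
    NOMINAL(false, false, false),
    ORDINAL(true, false, false),
    INTERVAL(true, true, false),
    RATIO(true, true, true);

    final boolean ordered;               // can values be ranked?
    final boolean differenceMeaningful;  // is a - b meaningful?
    final boolean ratioMeaningful;       // is a / b meaningful?

    ScaleSketch(boolean ordered, boolean differenceMeaningful, boolean ratioMeaningful) {
        this.ordered = ordered;
        this.differenceMeaningful = differenceMeaningful;
        this.ratioMeaningful = ratioMeaningful;
    }

    // Celsius and Fahrenheit are two interval scales for the same quantity.
    static double toFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    public static void main(String[] args) {
        for (ScaleSketch s : values()) {
            System.out.println(s + ": ordered=" + s.ordered
                    + ", difference=" + s.differenceMeaningful
                    + ", ratio=" + s.ratioMeaningful);
        }
        // The same two temperatures give different ratios in different units,
        // so the ratio says nothing about the temperatures themselves.
        double a = 10.0, b = 20.0;
        System.out.println("Celsius ratio: " + (b / a));
        System.out.println("Fahrenheit ratio: " + (toFahrenheit(b) / toFahrenheit(a)));
    }
}
```

The ratio changes with the unit because both temperature scales place zero at an arbitrary point; a ratio scale such as weight has a true zero, so its ratios are the same in any unit.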
Author:
Haifeng Li

Copyright © 2015. All rights reserved.