A feature function that maps an HCatRecord directly to a feature vector. Each column becomes a feature in the vector,
with the value of the feature obtained using the value mapper for that column.
public ColumnFeatureFunction(int[] featurePositions,
FeatureValueMapper[] valueMappers,
int labelColumnPos,
int numFeatures,
double defaultLabel)
Feature positions and value mappers are parallel arrays: featurePositions[i] gives the position of the i-th feature in
the HCatRecord, and valueMappers[i] gives the value mapper used to map that feature to a Double value.
Parameters:
featurePositions - positions of the feature columns in the HCatRecord
valueMappers - value mapper for each feature column, in the same order as featurePositions
labelColumnPos - position of the label column
numFeatures - number of features in the feature vector
defaultLabel - default label to be used for null records
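The parallel-array contract described above can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: `ColumnMappingSketch`, `ValueMapper`, and `Mapped` are hypothetical stand-ins for ColumnFeatureFunction, FeatureValueMapper, and LabeledPoint, and a plain `Object[]` stands in for the HCatRecord. The all-zero vector returned for a null record and the cast of the label column to `Number` are assumptions made only for this example.

```java
import java.util.Arrays;

public class ColumnMappingSketch {

    /** Hypothetical stand-in for FeatureValueMapper: turns one raw column value into a double. */
    public interface ValueMapper {
        double map(Object columnValue);
    }

    /** Hypothetical stand-in for LabeledPoint: a label plus a dense feature array. */
    public static final class Mapped {
        public final double label;
        public final double[] features;

        Mapped(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Maps one record (an Object[] standing in for an HCatRecord) to a label and features. */
    public static Mapped map(Object[] record, int[] featurePositions, ValueMapper[] valueMappers,
                             int labelColumnPos, int numFeatures, double defaultLabel) {
        // Parallel arrays: entry i of each array describes the same feature.
        if (featurePositions.length != valueMappers.length || featurePositions.length > numFeatures) {
            throw new IllegalArgumentException("featurePositions and valueMappers must be parallel");
        }
        double[] features = new double[numFeatures];
        if (record == null) {
            // Assumption: a null record yields the default label and an all-zero vector.
            return new Mapped(defaultLabel, features);
        }
        for (int i = 0; i < featurePositions.length; i++) {
            // featurePositions[i] selects the column; valueMappers[i] converts its value.
            features[i] = valueMappers[i].map(record[featurePositions[i]]);
        }
        // Assumption: the label column already holds a numeric value.
        double label = ((Number) record[labelColumnPos]).doubleValue();
        return new Mapped(label, features);
    }

    public static void main(String[] args) {
        Object[] record = {"id-7", 3, "yes", 1.0};
        int[] featurePositions = {1, 2};
        ValueMapper[] valueMappers = {
            v -> ((Number) v).doubleValue(),   // numeric column passed through
            v -> "yes".equals(v) ? 1.0 : 0.0   // string flag encoded as 0/1
        };
        Mapped m = map(record, featurePositions, valueMappers, 3, 2, -1.0);
        System.out.println(m.label + " " + Arrays.toString(m.features));
    }
}
```

Keeping the two arrays the same length and in the same order is what ties each column position to its mapper; the sketch rejects mismatched lengths up front rather than failing midway through a record.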
Method Detail
call
public org.apache.spark.mllib.regression.LabeledPoint call(scala.Tuple2<org.apache.hadoop.io.WritableComparable,org.apache.hive.hcatalog.data.HCatRecord> tuple)
throws Exception
Specified by:
call in interface org.apache.spark.api.java.function.Function<scala.Tuple2<org.apache.hadoop.io.WritableComparable,org.apache.hive.hcatalog.data.HCatRecord>,org.apache.spark.mllib.regression.LabeledPoint>