Q:

One-hot encoder that maps a column of category indices to a column of binary vectors

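# One-hot encoder that maps a column of category indices to a column of binary vectors.
# Written against the Spark 2.x API, where OneHotEncoder is a Transformer;
# in Spark 3.0+ it is an Estimator and must be fit() before transform().
# Assumes stringIndDf (a DataFrame with a string "label" column) and temp_path already exist.
from pyspark.ml.feature import OneHotEncoder, StringIndexer
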
stringIndexer = StringIndexer(inputCol="label", 
                              outputCol="indexed")
model = stringIndexer.fit(stringIndDf)
td = model.transform(stringIndDf)
encoder = OneHotEncoder(inputCol="indexed", 
                        outputCol="features")
encoder.transform(td).head().features
# SparseVector(2, {0: 1.0})
encoder.setParams(outputCol="freqs").transform(td).head().freqs
# SparseVector(2, {0: 1.0})
params = {encoder.dropLast: False, encoder.outputCol: "test"}
encoder.transform(td, params).head().test
# SparseVector(3, {0: 1.0})
onehotEncoderPath = temp_path + "/onehot-encoder"
encoder.save(onehotEncoderPath)
loadedEncoder = OneHotEncoder.load(onehotEncoderPath)
loadedEncoder.getDropLast() == encoder.getDropLast()
# True
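
To make the vector sizes in the comments above easier to follow, here is a minimal pure-Python sketch of the encoding rule, with no Spark involved. The helper name `one_hot` and its `(size, {position: 1.0})` return shape are hypothetical, chosen only to mirror how a SparseVector prints. It also assumes the "label" column has three distinct categories, which is consistent with the size-3 vector shown when dropLast is False.

```python
def one_hot(index, num_categories, drop_last=True):
    """Illustrative helper (not part of PySpark): one-hot encode a category index.

    Returns (size, {position: 1.0}), loosely mirroring a sparse vector.
    """
    if not 0 <= index < num_categories:
        raise ValueError("index out of range")
    # With drop_last, the vector has one fewer slot and the last
    # category is represented by the all-zeros vector.
    size = num_categories - 1 if drop_last else num_categories
    active = {index: 1.0} if index < size else {}
    return size, active


print(one_hot(0, 3))                   # (2, {0: 1.0})
print(one_hot(0, 3, drop_last=False))  # (3, {0: 1.0})
print(one_hot(2, 3))                   # (2, {})
```

Dropping the last category keeps the encoded columns from summing to one in every row, which is why the default output has size 2 rather than 3.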