In the new Release it says that Text Features

Question

In the new release it says that text features are now supported in the R version. But the load_pool function has no mention of text features. How can text features be used then? Are there any plans to include embedding features in the R version too?

#english #programming


28.08.2021

2 answers

136 views

Thomas Wolf · Question author

Thanks for the info. That's actually quite user-friendly, and especially easy to use with R ML packages, e.g. mlr3.


31.08.2021

Mikhail Rudakov · Accepted Answer

Hello! Currently, text features can be used only when the dataset is provided as a data.frame. All columns that contain character values (not factors!) are treated as text columns. A simple example of such usage:

    library(catboost)

    # stringsAsFactors = FALSE keeps 'phrase' a character column;
    # before R 4.0 the default would turn it into a factor.
    dfTrain <- data.frame(
      height = c(150, 120, 30),
      weight = c(200, 220, 150),
      phrase = c('hello good I am good I hello good',
                 'good I hello I am good hello',
                 'bad bad bad bad'),
      eye = c(2, 1, 15),
      y_train = c(0, 0, 1),
      stringsAsFactors = FALSE
    )

    # Split features and labels
    dfTrainx <- dfTrain[, !(names(dfTrain) %in% c('y_train'))]
    labels <- dfTrain[, c('y_train')]

    pool <- catboost.load_pool(data = dfTrainx, label = labels)
    params <- list(loss_function = 'Logloss', iterations = 100)
    model <- catboost.train(pool, params = params)

One more thing to mention: if the texts in your dataset are too short, you can run into the following error:

    catboost/private/libs/feature_estimator/text_feature_estimators.cpp:89: Dictionary size is 0, check out data or try to decrease occurrence_lower_bound parameter

This means that too few word combinations (n-grams) were found. By default, occurrence_lower_bound is 3, so some 2-word n-gram has to occur at least 3 times. Unfortunately, changing this parameter is not yet supported.
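Since only character columns (not factors) are picked up as text, it can help to check the column types before building the pool. The following is a minimal base-R sketch, not part of CatBoost: `text_columns` and `factors_to_character` are hypothetical helper names introduced here for illustration, and the data frame is a shortened copy of the example above.

```r
# Shortened copy of the example data. stringsAsFactors = FALSE keeps
# `phrase` as character (before R 4.0 the default produced a factor).
dfTrain <- data.frame(
  height = c(150, 120, 30),
  phrase = c("hello good I am good I hello good",
             "good I hello I am good hello",
             "bad bad bad bad"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the columns stored as character vectors,
# i.e. the ones that would be treated as text according to the answer.
text_columns <- function(df) {
  names(df)[vapply(df, is.character, logical(1))]
}

# Hypothetical helper: turn the given factor columns back into character.
factors_to_character <- function(df, cols) {
  df[cols] <- lapply(df[cols], as.character)
  df
}

print(text_columns(dfTrain))
```

If a column you intended as text is missing from the result, it is probably a factor; passing its name to `factors_to_character` before calling `catboost.load_pool` should make it a character column again.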
