are now supported in the R version. But the load_pool function makes no mention of text features. How can text features be used, then? Are there any plans to include embedding features in the R version too?
Hello! Currently, text features can be used only when the dataset is provided as a data.frame. All columns that contain character values (not factors!) are treated as text columns. A simple example of such usage:

    library(catboost)

    # stringsAsFactors = FALSE keeps 'phrase' as character;
    # before R 4.0, data.frame() converted strings to factors by default.
    dfTrain <- data.frame(
      height = c(150, 120, 30),
      weight = c(200, 220, 150),
      phrase = c('hello good I am good I hello good',
                 'good I hello I am good hello',
                 'bad bad bad bad'),
      eye = c(2, 1, 15),
      y_train = c(0, 0, 1),
      stringsAsFactors = FALSE
    )

    # Split the features from the label column.
    dfTrainx <- dfTrain[, !(names(dfTrain) %in% c('y_train'))]
    labels <- dfTrain[, c('y_train')]

    pool <- catboost.load_pool(data = dfTrainx, label = labels)
    params <- list(loss_function = 'Logloss', iterations = 100)
    model <- catboost.train(pool, params = params)

One more thing to mention: if the texts in your dataset are too short, you may hit the following error:

    catboost/private/libs/feature_estimator/text_feature_estimators.cpp:89: Dictionary size is 0, check out data or try to decrease occurrence_lower_bound parameter

This means that too few word combinations (n-grams) have been found. By default, occurrence_lower_bound is 3, so at least one 2-word n-gram must occur at least 3 times. Unfortunately, changing this parameter is not yet supported.
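For anyone who wants to check both points from the answer before training, here is a minimal base R sketch. It lists the columns that would count as text (character, not factor) and counts repeated 2-word n-grams. Note that count_bigrams is a hypothetical helper, not part of catboost, and its plain whitespace tokenization is only an assumption about how the library splits words, so treat the counts as a rough guide. The threshold of 3 is the default quoted in the answer.

```r
# Same toy data as in the answer; stringsAsFactors = FALSE keeps 'phrase'
# as character on R versions older than 4.0 as well.
df <- data.frame(
  height = c(150, 120, 30),
  phrase = c("hello good I am good I hello good",
             "good I hello I am good hello",
             "bad bad bad bad"),
  stringsAsFactors = FALSE
)

# Columns holding character values are the ones treated as text features.
text_cols <- names(df)[vapply(df, is.character, logical(1))]

# Hypothetical helper (not a catboost function): count every 2-word n-gram
# across all texts, splitting on whitespace only.
count_bigrams <- function(texts) {
  tokens <- strsplit(texts, "[[:space:]]+")
  bigrams <- unlist(lapply(tokens, function(words) {
    words <- words[nzchar(words)]
    if (length(words) < 2) return(character(0))
    # Pair each word with the word that follows it.
    paste(head(words, -1), tail(words, -1))
  }))
  table(bigrams)
}

counts <- count_bigrams(df$phrase)

# Assumed threshold: the default occurrence_lower_bound of 3 from the answer.
has_enough_repeats <- any(counts >= 3)

print(text_cols)
print(counts)
print(has_enough_repeats)
```

If has_enough_repeats comes back FALSE, the texts are probably too short or too varied, and the "Dictionary size is 0" error is likely.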
Thanks for the info. That's actually quite user-friendly, and especially easy to use with R ML packages, e.g. mlr3.