I have an imbalanced dataset.
To train a model, I did a train/test split stratified on y, ran stratified cross-validation on the train set, and tuned the hyperparameters. I now have a finalized model with tuned hyperparameters.
To build the final model, I'll have to train the finalized model with the tuned hyperparameters on the full dataset, which is the original train + test.
Now my question is: how should I give the data to the model?
Do I have to provide the original dataset, which is imbalanced? Or do I have to give a balanced version of the original dataset?
In my opinion, there is no yes/no kind of answer. If you've done a stratified train/test split on the target you are trying to predict, then the final model, where train + test is used as the training set, will be trained on imbalanced data by construction. If you applied over/under-sampling techniques up front, then that is where the balancing happened, and in that case I doubt the model would generalize that well.
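To make the "imbalanced by construction" point concrete, here is a minimal sketch assuming scikit-learn and NumPy. The data is made up for illustration (1000 rows, 10% positives); it is not the dataset from the question. It only shows that a stratified split keeps the class ratio in both parts, so train + test put back together has the same imbalance as the original.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 900 negatives, 100 positives (10% minority).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.array([0] * 900 + [1] * 100)

# Stratifying on y keeps the class ratio (roughly) equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Recombine train and test, as you would for the final fit.
y_full = np.concatenate([y_train, y_test])

print("minority share, train:", y_train.mean())
print("minority share, test: ", y_test.mean())
print("minority share, full: ", y_full.mean())
```

All three shares should come out close to the original 10%, i.e. nothing in the split itself rebalances the data.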
Give it the same kind of data you used for hyperparameter tuning. Otherwise you are training on different data, since it will have been modified (balanced).
Yes, thank you. I have been thinking the same thing, that I should use the same data. Otherwise, what is the point of doing hyperparameter tuning and cross-validation? Any idea how I can see how much the final model's performance deviates from the cross-validation score?
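One possible way to look at that deviation, sketched here under assumptions: compare the mean cross-validation score of the best hyperparameters with the score on the held-out test set before refitting on everything. The sketch assumes scikit-learn; the synthetic data, the logistic regression model, the `C` grid and the F1 metric are all placeholders, not what was used in the thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Hypothetical imbalanced data (about 10% positives), a stand-in for the real dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Tune on the imbalanced training data with stratified folds, no resampling.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # placeholder grid
    scoring="f1",
    cv=cv,
)
search.fit(X_train, y_train)

# Mean CV score of the best setting vs. score on the untouched test set.
cv_score = search.best_score_
test_score = f1_score(y_test, search.predict(X_test))
gap = test_score - cv_score
print(f"CV F1: {cv_score:.3f}  test F1: {test_score:.3f}  gap: {gap:+.3f}")

# Final model: same hyperparameters, refit on train + test, still imbalanced.
final_model = LogisticRegression(max_iter=1000, **search.best_params_)
final_model.fit(np.vstack([X_train, X_test]), np.concatenate([y_train, y_test]))
```

Note that once the model is refit on train + test there is no held-out data left to score it on, so the test score measured before the refit is the estimate you keep.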