I have a problem with my work. Could you please help me with it when you have time?
Problem:
I have a large dataset of questions and answers. When a user enters a new question, I need to return the answer to the question in the dataset that is most similar to the user's question.
What I did:
I used cosine similarity, but the accuracy is very low.
What should I do now? I would be very grateful if you could help me #question #nlp
Just an idea out of the blue 🤪 Have you tried Euclidean distance or dot product instead of cosine similarity?
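A quick way to check whether swapping the metric can help at all: a minimal sketch in plain Python, using made-up toy vectors (the `query` and `docs` values are purely illustrative, not from the dataset in question). On unit-length vectors, cosine similarity, dot product, and Euclidean distance produce the same ranking, so the metric only matters when the vectors are not normalized.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit length.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank(query, docs, score, reverse):
    # Return document indices ordered from best to worst match.
    return sorted(range(len(docs)), key=lambda i: score(query, docs[i]), reverse=reverse)

# Toy vectors (hypothetical). docs[1] is long but points away from the query.
query = [1.0, 2.0, 0.5]
docs = [[2.0, 4.0, 1.0], [6.0, 0.2, 0.4], [0.5, 1.0, 2.0], [-1.0, 0.5, 0.3]]

unit_query = normalize(query)
unit_docs = [normalize(d) for d in docs]

# On unit vectors: higher cosine/dot is better, lower distance is better.
by_cosine = rank(unit_query, unit_docs, cosine, reverse=True)
by_dot = rank(unit_query, unit_docs, dot, reverse=True)
by_euclid = rank(unit_query, unit_docs, euclidean, reverse=False)

# On raw vectors the dot product also rewards vector length.
by_raw_dot = rank(query, docs, dot, reverse=True)

print(by_cosine, by_dot, by_euclid, by_raw_dot)
```

If the three normalized rankings agree, as they do here, low accuracy is more likely caused by how the vectors are computed than by the choice of metric.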
Thank you for your attention, I will try it now
1) What is the language of the questions? 2) What features did you use to compute the vectors that you later compare with cosine similarity? For English questions, one of the best encoders to get the vectors is https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1. For 50 other languages, a good model is https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2, but it is not optimized for questions specifically.
Uzbek is not in that list, as far as I know. But it is included, for example, in LaBSE: https://huggingface.co/sentence-transformers/LaBSE.
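The encoder suggestion above could be wired up roughly as follows. This is a sketch, not a tested pipeline: it assumes the `sentence-transformers` package is installed and that LaBSE is the chosen model; `make_encoder` and `most_similar` are hypothetical helper names. The encoder is passed in as a plain function so the retrieval logic can be tried with any vectorizer.

```python
def make_encoder(model_name="sentence-transformers/LaBSE", normalize=True):
    # Imported lazily so most_similar() can be used without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # With normalize_embeddings=True the vectors have unit length,
    # so the dot product used below equals cosine similarity.
    return lambda texts: model.encode(texts, normalize_embeddings=normalize)

def most_similar(query, questions, answers, encode, top_k=1):
    """Return the top_k (question, answer, score) triples closest to the query."""
    # For a large dataset, encode the stored questions once and cache the
    # vectors; they are re-encoded here only to keep the sketch short.
    question_vectors = encode(questions)
    query_vector = encode([query])[0]
    scores = [
        float(sum(q * d for q, d in zip(query_vector, vec)))
        for vec in question_vectors
    ]
    best = sorted(range(len(questions)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [(questions[i], answers[i], scores[i]) for i in best]
```

Usage would look like `most_similar(user_question, questions, answers, make_encoder())`. For a model tuned for dot-product scoring, such as multi-qa-mpnet-base-dot-v1, passing `normalize=False` is probably the better fit, though that is an assumption worth checking against the model card.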
Thank you for your attention.