Question: You are building semantic search for your company in a specific domain, like legal or e-commerce. You take a model such as BERT or another transformer from Hugging Face, feed in your corpus to generate embeddings, then run kNN on sample queries, and you find that recall and precision are both bad. What do you do? I'm not sure how to approach this beyond: (1) train on a larger, more domain-specific corpus, and (2) use larger embedding dimensions. What else could I say?
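For context, recall@k and precision@k over a set of labeled queries are the usual way to quantify the problem described above. A minimal sketch (the doc ids and relevance set here are made up for illustration):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant docs that appear in the top-k retrieved list."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved docs that are actually relevant."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / k

# One query: kNN returned doc ids [3, 7, 1, 9]; ground truth is {7, 2}.
print(recall_at_k([3, 7, 1, 9], {7, 2}, k=4))     # 0.5
print(precision_at_k([3, 7, 1, 9], {7, 2}, k=4))  # 0.25
```

Averaging these over a held-out query set gives you a baseline number to beat as you try the fixes suggested below.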
I'm not a hardcore ML engineer, but these are my thoughts: if you use larger embedding dimensions you risk running into the curse of dimensionality, which kNN is especially vulnerable to. On that front I think you should focus on feature engineering instead. Something else I would consider is the distance metric your kNN model uses; evaluate your options there, or perhaps use a different model altogether, especially because kNN gives you little insight into the target's relationship with the data.
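On the distance-metric point: for text embeddings, cosine similarity on L2-normalized vectors usually behaves better than raw Euclidean distance, and it's a cheap thing to try first. A small sketch (the toy corpus and query vectors are made up):

```python
import numpy as np

def knn_cosine(query, corpus_embs, k=5):
    """Return indices of the k nearest corpus vectors by cosine similarity.

    Both sides are L2-normalized first, so the dot product equals cosine
    similarity; this discards magnitude, which is often noise for
    transformer embeddings.
    """
    q = query / np.linalg.norm(query)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q                   # cosine similarity to each document
    return np.argsort(-sims)[:k]   # highest similarity first

# Toy example: 4 "documents" in a 3-d embedding space.
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(knn_cosine(query, corpus, k=2))  # the two docs pointing the same way
```

If you're using a library like scikit-learn or FAISS instead of rolling your own, the same idea applies: normalize the vectors and use inner-product/cosine search rather than plain L2.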
A few buckets to experiment with:
1. The model used to generate embeddings - try other SOTA embedding models.
2. How you chunked your corpus (sentence, paragraph, tokenization, etc.) before generating embeddings.
3. All the tunable stuff in kNN - distance metric, choice of k.
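On bucket 2, chunking granularity is easy to A/B test: embed sentence-level chunks and paragraph-level chunks separately and compare retrieval metrics. A minimal chunker sketch (the splitting heuristics and `max_chars` cap are illustrative choices, not a standard API):

```python
import re

def chunk_text(text, mode="sentence", max_chars=200):
    """Split a corpus document into chunks before embedding.

    mode="sentence": naive split on sentence-ending punctuation.
    mode="paragraph": split on blank lines.
    Overlong chunks are further broken at max_chars so one embedding
    doesn't have to summarize too much text.
    """
    if mode == "paragraph":
        parts = [p.strip() for p in text.split("\n\n") if p.strip()]
    else:
        parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    for p in parts:
        while len(p) > max_chars:
            chunks.append(p[:max_chars])
            p = p[max_chars:]
        chunks.append(p)
    return chunks

doc = "Clause 1 applies. Clause 2 is void.\n\nSchedule A lists fees."
print(chunk_text(doc, mode="sentence"))
```

Smaller chunks tend to help precision (each embedding is about one thing), while larger chunks help recall for queries that span multiple sentences, so it's worth measuring both settings on your own queries.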
Look at lectures on contrastive methods.
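To make the pointer above concrete: contrastive fine-tuning with in-batch negatives (an InfoNCE-style loss) is the standard way to adapt an off-the-shelf encoder to a domain where its embeddings retrieve poorly. A minimal NumPy sketch of the loss itself (in practice you'd compute this in a framework like PyTorch and backpropagate through the encoder):

```python
import numpy as np

def info_nce_loss(query_embs, pos_embs, temperature=0.05):
    """Batch contrastive (InfoNCE) loss with in-batch negatives.

    Row i of query_embs should match row i of pos_embs; every other row
    in the batch serves as a negative. Minimizing this pulls true
    query/document pairs together and pushes the rest apart.
    """
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    p = pos_embs / np.linalg.norm(pos_embs, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature             # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability of the diagonal (the true pairs).
    return -np.mean(np.diag(log_probs))
```

The training pairs can come from your own logs (query, clicked document) or from weak supervision like (title, body) pairs in the domain corpus.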
You can use a stacked approach on top of candidate retrieval: add a logistic regression model or a small NN to predict the rank of each document. You can also perform query expansion to add more context-specific fields to the query (RAG, if you're using an LLM).
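A sketch of that stacked second stage, with hand-picked weights standing in for a trained logistic-regression model (the feature names and example documents are hypothetical):

```python
import numpy as np

def rerank(candidates, feature_fn, weights, bias=0.0):
    """Second-stage ranker stacked on top of first-stage kNN retrieval.

    candidates: documents returned by the candidate-retrieval step.
    feature_fn: maps a candidate to a feature vector (e.g. embedding
    similarity, query-term overlap, recency). In practice weights/bias
    come from a logistic regression trained on click or relevance labels.
    """
    feats = np.array([feature_fn(c) for c in candidates])
    scores = feats @ weights + bias
    probs = 1.0 / (1.0 + np.exp(-scores))  # P(relevant | features)
    order = np.argsort(-probs)
    return [candidates[i] for i in order]

# Hypothetical features: embedding cosine similarity and query-term overlap.
docs = [{"id": "a", "cos": 0.70, "overlap": 0.1},
        {"id": "b", "cos": 0.65, "overlap": 0.9}]
ranked = rerank(docs, lambda d: [d["cos"], d["overlap"]],
                weights=np.array([2.0, 3.0]))
print([d["id"] for d in ranked])  # term overlap promotes doc "b"
```

The point of the stack is that the cheap kNN stage only has to produce a decent candidate pool; the reranker can then use richer features (including expanded-query matches) that would be too expensive to score against the whole corpus.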