Hey all, I work for a vertical AI startup in a highly specialized domain and recently led a project collecting expert labels to help train our machine learning models. How do you deal with lack of quality domain knowledge when bootstrapping and testing your models/products? What pain points do you face when dealing with it?
People misunderstand each other sometimes, and that’s OK. I am happy to clarify more for you via email. I vetted this question with multiple ML engineers at startups I know and had quite productive discussions about it.
What do you mean by “anything you want to improve in your approach”? Do you need input on how to approach this problem?
Sorry for the confusion, I meant areas of improvement in your current approach to solving this problem. For example, your current approach is too expensive and you’d like to bring the cost down; or it takes too long; or it’s too manual, etc.
Sorry I still don’t get what you are trying to ask here.
Instead of asking the expensive, high-quality experts to label all the examples, perhaps you can ask them to write a set of rules. Then your bootstrap goal is to beat the performance of those rules. In normal scenarios like e-commerce, I'd go to production with either the rules or the models that beat the rules (or, better, A/B test both) and use customer feedback to improve my models. Your domain is expensive, though.
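To make the "beat the rules" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the field names (`debt_to_income`, `missed_payments`), the thresholds, the tiny holdout set, and `candidate_model`, which is just a stand-in for whatever trained model you have. The point is only that the expert-written rules and the model are scored on the same small expert-labeled holdout.

```python
from typing import Callable, Dict, List

Application = Dict[str, float]


def expert_rules(app: Application) -> str:
    """Hypothetical expert-written rules; thresholds are made up."""
    if app["debt_to_income"] > 0.5:
        return "high"
    if app["missed_payments"] >= 2:
        return "high"
    return "low"


def candidate_model(app: Application) -> str:
    """Stand-in for a trained model (assumption, not a real model)."""
    score = app["debt_to_income"] + 0.2 * app["missed_payments"]
    return "high" if score > 0.5 else "low"


def accuracy(predict: Callable[[Application], str],
             examples: List[Application],
             labels: List[str]) -> float:
    """Fraction of examples where the prediction matches the expert label."""
    correct = sum(predict(x) == y for x, y in zip(examples, labels))
    return correct / len(labels)


# Small expert-labeled holdout: the only place expensive labels are spent.
holdout = [
    {"debt_to_income": 0.60, "missed_payments": 0},
    {"debt_to_income": 0.20, "missed_payments": 3},
    {"debt_to_income": 0.30, "missed_payments": 0},
    {"debt_to_income": 0.45, "missed_payments": 1},
]
expert_labels = ["high", "high", "low", "high"]

rules_acc = accuracy(expert_rules, holdout, expert_labels)
model_acc = accuracy(candidate_model, holdout, expert_labels)

# Only consider shipping the model if it beats the rule baseline.
ship_model = model_acc > rules_acc
print(f"rules={rules_acc:.2f} model={model_acc:.2f} ship_model={ship_model}")
```

In practice you would want a much larger holdout and probably a metric that reflects the cost of each kind of mistake, not plain accuracy.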
A second thought on rules, and this is pure speculation: your data might have a natural 80-20 split. 80 percent of the cases can be approved/denied by rules, while 20 percent need expert knowledge. You want to build models that are certain about the 80 percent "easy" cases and either deny or manually process the remaining 20 percent. Then slowly increase your model's exposure to that remaining 20% when you can afford it.
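A rough sketch of that routing idea in Python, assuming your model outputs a probability of approval. The function names, the 0.9 threshold, and the example probabilities are all hypothetical; the 80-20 split here is baked into the made-up numbers, not something measured.

```python
from typing import List


def route(p_approve: float, threshold: float = 0.9) -> str:
    """Automate only confident cases; send the rest to a human expert."""
    if p_approve >= threshold:
        return "auto_approve"
    if p_approve <= 1 - threshold:
        return "auto_deny"
    return "manual_review"


def automation_rate(probs: List[float], threshold: float) -> float:
    """Share of cases decided without an expert at a given threshold."""
    decisions = [route(p, threshold) for p in probs]
    automated = sum(d != "manual_review" for d in decisions)
    return automated / len(probs)


# Made-up model outputs for ten applications.
probs = [0.98, 0.95, 0.03, 0.50, 0.91, 0.08, 0.70, 0.99, 0.02, 0.97]

# Strict threshold: only the "easy" cases are automated.
print(automation_rate(probs, threshold=0.9))

# Relaxing the threshold later increases the model's exposure.
print(automation_rate(probs, threshold=0.7))
```

Lowering `threshold` over time is one way to "slowly increase exposure": the manually reviewed cases also give you fresh expert labels for exactly the examples the model is least sure about.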
If my understanding of your question is correct, here are some suggestions that might help: 1. Provide prerequisite training material (perhaps also a test prior to the labeling work; those who fail the test don't get to do the work); 2. Build a simple and intuitive UI with clear instructions for the workers.
“Expert labels” aka workers in India being paid $5.50 an hour to click on things?
No, more like reviewing a credit application and determining the applicant's risk level. Mostly tasks that require some domain knowledge, not just common sense.
This is also just a bad question overall, filled with buzzwords. Pretty evident you don’t have an in-depth knowledge of ML.