Has anyone been assigned a task that requires you to build a model, where the available training data isn’t great (lots of nulls, features not correlated with target)? What do you do when you just can’t achieve good performance metrics on a model?
Lots of data cleanup
You are talking about 90% of the real world DS problems😛
Does every data scientist have a therapist on standby or what
You might already know, but it will help to understand how the data was collected. If you know what the underlying distribution looks like, you can run a Monte Carlo simulation to simulate your responses and use that to build your model. Always know how much random effect is affecting your dataset before you simulate your responses. Extensive EDA combined with domain knowledge will help you do wonders. Good luck
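Not necessarily this commenter's exact setup, but a rough sketch of the Monte Carlo idea, assuming you already have a reasonable guess at the feature distributions and noise level from how the data was collected. The distributions, coefficients, and `noise_sd` below are made-up placeholders:

```python
# Sketch: simulate responses from assumed distributions, then see how well a model
# can possibly do at that noise level. All numbers here are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 5_000

# Simulate features from the assumed underlying distributions
x1 = rng.normal(loc=50, scale=10, size=n)   # e.g. a roughly normal measurement
x2 = rng.exponential(scale=2.0, size=n)     # e.g. a skewed, count-like feature

# Simulate responses with a chosen amount of irreducible noise
noise_sd = 5.0
y = 0.8 * x1 - 1.5 * x2 + rng.normal(scale=noise_sd, size=n)

X = np.column_stack([x1, x2])
model = LinearRegression()

# Cross-validated R^2 on the simulated data acts as a ceiling: even a correctly
# specified model can't beat what the noise level allows.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"Simulated-data R^2: {scores.mean():.3f} ± {scores.std():.3f}")
```

If the best score on simulated data at a realistic noise level is close to what you're seeing on the real data, the limit is the signal-to-noise ratio, not your model.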
What methods are you using to figure out how much random effect is affecting your dataset?
This means you do not have adequate data. You need to go out and get better data.
How to predict stock prices?
Talk to the stakeholders and explain the situation
Have you tried feature engineering? Maybe you can derive new features from the existing ones that correlate better with the target.
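For what it's worth, a minimal sketch of that idea, assuming a tabular dataset read with pandas. The file name and columns (`price`, `qty`, `signup_date`, `target`) are hypothetical stand-ins for whatever is in your data:

```python
# Derive candidate features from existing columns and rank them by how strongly
# they correlate with the target. Column names are placeholders.
import numpy as np
import pandas as pd

df = pd.read_csv("training_data.csv")  # hypothetical file

# Common derived features: interactions, ratios, and dates turned into numbers
df["revenue"] = df["price"] * df["qty"]
df["price_per_unit"] = df["price"] / df["qty"].replace(0, np.nan)
df["signup_month"] = pd.to_datetime(df["signup_date"]).dt.month

# Rank raw and derived numeric features by absolute correlation with the target
numeric = df.select_dtypes("number").drop(columns=["target"])
corr = numeric.corrwith(df["target"]).abs().sort_values(ascending=False)
print(corr.head(10))
```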
I'd probably break down the features that go into the model and how they're derived from the data. Just univariate tests showing predictive value would be enough to demonstrate feasibility. After that, figure out next steps. From the sound of it, you should come up with a plan to improve data collection. If that seems like a dead end, then look to other avenues to add value to the business.
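One way those univariate checks could look, as a sketch rather than a prescription: score each numeric feature on its own against the target using `f_regression` and `mutual_info_regression` from scikit-learn. The file and column names are placeholders, and the median imputation is just to keep the example self-contained:

```python
# Univariate feature screening: each feature is scored independently against the
# target, which is a quick way to show whether there's any signal to build on.
import pandas as pd
from sklearn.feature_selection import f_regression, mutual_info_regression

df = pd.read_csv("training_data.csv")  # hypothetical file

# Numeric features only, with nulls filled so the tests can run
X = df.drop(columns=["target"]).select_dtypes("number")
X = X.fillna(X.median())
y = df["target"]

f_stat, p_values = f_regression(X, y)               # linear association per feature
mi = mutual_info_regression(X, y, random_state=0)   # also picks up nonlinear signal

report = pd.DataFrame(
    {"f_stat": f_stat, "p_value": p_values, "mutual_info": mi},
    index=X.columns,
).sort_values("mutual_info", ascending=False)
print(report)
```

If nothing shows meaningful signal here, that's concrete evidence for the stakeholder conversation about improving data collection.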
Garbage in, garbage out