What tools do you use for gathering training data?
Open data sets are fantastic, but some problems require very diverse, broad-spectrum content that, for various reasons, hasn’t been gathered and vetted. Sometimes search engines alone aren’t enough to find data, especially if the content is more prevalent on the deep or dark web, or if you can’t think of every possible thing to look for.
You can write scripts to gather data all day, but jobs this big seem to call for big tools and services for data acquisition. What does your team use?