Open data sets are fantastic, but some problems require very diverse, broad-spectrum content that hasn't been gathered and vetted, for various reasons. Sometimes it's not enough to rely on search engines alone to find data, especially if the content is more prevalent on the deep/dark web or if you can't think of every possible thing to look for. You can write scripts to gather data all day, but jobs this big seem to need big tools and services for data acquisition. What does your team use?
You'd be surprised how much manual tagging of data goes on in the industry. Hire a bunch of workers paid $5/hour in another country, have each item tagged 3x to validate the labels, and you've got a very reliable, large, and relatively affordable supervised learning training set.
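The 3x redundancy described above boils down to majority voting over each item's labels; items with no clear majority get kicked back for review. A minimal sketch (the `annotations` data is hypothetical):

```python
from collections import Counter

def majority_label(labels):
    """Return the label chosen by a strict majority of annotators,
    or None if no label wins more than half the votes."""
    (top, top_count), = Counter(labels).most_common(1)
    return top if top_count > len(labels) / 2 else None

# Three annotators per item; unresolved items go back for re-annotation.
annotations = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "cat", "bird"],
}
resolved = {item: majority_label(tags) for item, tags in annotations.items()}
# "img_001" resolves to "cat"; "img_002" has no majority and returns None
```

In practice you'd also track per-annotator agreement rates so you can weight or drop unreliable workers, but simple majority vote is the core of the validation step.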
☝️