Misc. Oct 25, 2018
NewEiWT57

How can I detect anomalies in text data?

I have several tables. I'm trying to build an anomaly detector that can scan a column (numerical or text data) and determine whether a given entry is anomalous. For example, an SSN showing up in an Employee ID column (same number of digits). I was thinking of regex at the low end of the complexity spectrum and autoencoders at the other end. Any thoughts on how to approach this problem? Training data of valid entries is available for certain classes. #machinelearning

Uber sinisteras Oct 25, 2018

We built something similar for detecting PII in log data streams. There are a lot of Python libraries and tools available that you can reuse in a pipeline for this.
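
For the low-complexity end of the spectrum mentioned in the question, one possible starting point is a "shape" profile learned from the valid training entries rather than a hand-written regex. The sketch below is only an illustration under assumptions: the function names, the sample Employee IDs, and the `min_freq` threshold are all hypothetical, and it is not the pipeline described in this comment.

```python
from collections import Counter


def shape(value: str) -> str:
    """Collapse a value into a coarse pattern: digits -> 9, letters -> A, other chars kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def fit_shapes(valid_values):
    """Learn the relative frequency of each shape from known-valid entries."""
    counts = Counter(shape(v) for v in valid_values)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def find_anomalies(column, shape_freq, min_freq=0.01):
    """Return entries whose shape is rare or unseen in the valid training data."""
    return [v for v in column if shape_freq.get(shape(v), 0.0) < min_freq]


# Hypothetical data: valid Employee IDs are assumed to look like a letter plus 8 digits.
valid_ids = ["E12345678", "E87654321", "E00012345"]
profile = fit_shapes(valid_ids)

column = ["E11122233", "123-45-6789", "E99988877", "123456789"]
flagged = find_anomalies(column, profile)
print(flagged)
```

This only catches entries whose format differs from the valid ones. If valid Employee IDs and SSNs really share the same shape (for example, both are nine bare digits), a shape profile cannot separate them, and something that looks at value distributions or learned representations would be needed instead.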

Capital One cofisblind Oct 25, 2018

PII?

Homesite Group Pou4 Oct 25, 2018

Personally Identifiable Information