I have several tables. I'm trying to build an anomaly detector that can scan through a column (numerical or text data) and determine whether a given entry is anomalous. For example, an SSN showing up in an Employee ID column (same number of digits). I was thinking of regex at the low end of the complexity spectrum and autoencoders at the other end. Any thoughts on how to approach this problem? Training data of valid entries for certain classes is available. #machinelearning
We built something similar for detecting PII in log data streams. There are a lot of Python libraries and tools available that you can reuse in a pipeline for this.
PII?
Personally Identifiable Information
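For the low-complexity end of the spectrum mentioned in the question, here is a minimal sketch of one possible starting point, not what anyone in this thread actually built. It learns coarse "shapes" (digit/letter patterns) from the valid training entries and flags values whose shape was never seen. The function names (`shape`, `fit_shapes`, `find_anomalies`), the `min_share` threshold, and the sample employee IDs are all made up for illustration.

```python
from collections import Counter


def shape(value):
    """Reduce a value to a character-class signature: digits -> 9, letters -> A."""
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation such as '-' as-is
    return "".join(out)


def fit_shapes(valid_entries, min_share=0.01):
    """Collect the shapes that make up at least min_share of the valid entries."""
    counts = Counter(shape(v) for v in valid_entries)
    total = sum(counts.values())
    return {s for s, n in counts.items() if n / total >= min_share}


def find_anomalies(column, known_shapes):
    """Return (row index, value) pairs whose shape is not among the known shapes."""
    return [(i, v) for i, v in enumerate(column) if shape(v) not in known_shapes]


# Hypothetical data: employee IDs assumed to look like one letter plus five digits.
valid_ids = ["E10234", "E20991", "E31877"]
column = ["E44012", "123-45-6789", "E5501"]

known = fit_shapes(valid_ids)
anomalies = find_anomalies(column, known)
print(anomalies)
```

One caveat: an SSN written without dashes that has the same number of digits as a valid ID gets the same shape, so this check alone cannot separate the two. That case needs extra signals (value ranges, checksums, neighboring columns) or a learned model such as the autoencoder idea.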