How can I detect anomalies in text data?

New EiWT57
Oct 25, 2018 3 Comments

I have several tables. I'm trying to build an anomaly detector that can scan through a column (numerical or text data) and determine whether a given entry is anomalous. For example, an SSN showing up in an Employee ID column (same number of digits).
I was thinking of regex at the low end of the complexity spectrum and autoencoders at the other end. Any thoughts on how to approach this problem? Training data of valid entries is available for certain classes.
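
To make the low-complexity end concrete, here is a minimal sketch that combines a hand-written regex with a "shape" profile learned from the valid entries. Everything in it is an assumption for illustration: the function names (`shape`, `fit_shapes`, `is_anomalous`), the `min_share` threshold, the dashed SSN pattern, and the sample Employee ID format are hypothetical, not something taken from an existing library.

```python
import re
from collections import Counter

def shape(value: str) -> str:
    """Collapse a value to a character-class signature, e.g. 'E12345' -> 'A99999'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)

def fit_shapes(valid_entries, min_share=0.01):
    """Return the set of shapes that cover at least min_share of the valid entries."""
    counts = Counter(shape(v) for v in valid_entries)
    total = sum(counts.values())
    return {s for s, c in counts.items() if c / total >= min_share}

# Hypothetical hand-written rule: the common dashed SSN layout.
SSN_DASHED = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def is_anomalous(value: str, known_shapes) -> bool:
    """Flag a value if it matches the SSN rule or has a shape unseen in training."""
    return bool(SSN_DASHED.match(value)) or shape(value) not in known_shapes

# Assumed Employee ID format for the example: one letter followed by five digits.
known = fit_shapes(["E12345", "E54321", "E00017"])
print(is_anomalous("E99999", known))
print(is_anomalous("123-45-6789", known))
```

One limit of this sketch: if the Employee ID is itself nine bare digits, an undashed SSN has the same shape and passes, so that case needs richer features (value ranges, digit distributions, or a learned model such as the autoencoder mentioned above).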


TOP 3 Comments
  • We built something similar for detecting PII in log data streams. There are a lot of Python libraries and tools available that you can reuse in a pipeline for this.
    Oct 25, 2018 2
    • Capital One cofisblind
      Oct 25, 2018
    • Homesite Group / Product Pou4
      Personally Identifiable Information
      Oct 25, 2018
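
For the log-stream case mentioned in the first comment, a rough sketch using only the Python standard library might look like the following. The commenter did not name the libraries they used, so this is not their pipeline; the `PII_PATTERNS` table and the `scan_lines` function are hypothetical names, and the two patterns are illustrative rather than complete.

```python
import re
from typing import Iterable, Iterator, Tuple

# Illustrative patterns only; real PII detection needs much broader coverage.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def scan_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line number, PII label, matched text) for every hit in a stream of lines."""
    for lineno, line in enumerate(lines, start=1):
        for label, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(line):
                yield lineno, label, match.group()

sample = ["user login ok", "contact bob@example.com ssn 123-45-6789"]
for hit in scan_lines(sample):
    print(hit)
```

Because `scan_lines` is a generator, it can consume a file handle or any other line iterator without loading the whole stream into memory.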