AI experts/developers: your valuable opinion, please!

Dec 18, 2018 16 Comments

I am a beginner in the world of AI; so far I have just been going through some basic material. I am an experienced C/C++ developer.

I am going to start implementing something as a proof of concept. Experts, your input could save me months, so please help! I have ample time over the holiday weeks and would ideally like something working within a month (~100-150 hours).

Here is the basic idea:

1) Input will primarily be voice: spoken commands (statements/phrases), no more than ~1000 of them.

2) My hardware, probably a Raspberry Pi with built-in AI, will analyze the spoken words and respond with one of a limited set of (~1000) answers. The model will store new questions/answers and keep learning from the growing set. It's an offline setup for now, not connected to the internet!

3) It's essentially a conversational model of information exchange. The number of possible commands and answers will be limited (< ~1000).

4) At some point I plan to add visual recognition too, but probably not now.

I was thinking of using something like a Raspberry Pi with TensorFlow Lite. I will need some voice recognition software (perhaps I can pick it up from an open-source project?). Can someone please share their thoughts/ideas on the best and quickest way to use the right tools to implement this prototype?

Thanks!!

comments

  • LinkedIn / Eng
    Gill Bates
    BIO
    [Insert epic sax here]
    Here, I'll save you some time. https://dialogflow.com

    Voice recognition is tricky to get right. From your post, it sounds like you want to build things from scratch. If that's the case, here are a couple of things you should read about:

    Phoneme detection
    Recurrent neural networks
    Hidden Markov models
    LSTM and DTNN
    Language modeling
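
    To make the hidden Markov model item a little more concrete: HMM-based recognizers search for the most likely sequence of hidden states behind a sequence of observations, usually with the Viterbi algorithm. Here is a minimal Python sketch; the two states (silence/speech), the low/high energy observations, and all probabilities are made up for illustration, whereas a real recognizer would use phoneme states and acoustic features:

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an HMM, computed in log space."""
    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model: all names and numbers are illustrative assumptions.
states = ("silence", "speech")
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}
observations = ["low", "low", "high", "high", "low"]
path, log_prob = viterbi(observations, states, start_p, trans_p, emit_p)
print(path)  # ['silence', 'silence', 'speech', 'speech', 'silence']
```

    Working in log probabilities avoids numerical underflow on long utterances; note that the sketch assumes no zero probabilities, since math.log(0) raises an error.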
    Dec 18, 2018
    • LinkedIn / Eng
      Gill Bates
      You can use Sphinx if you really want something open source, but it takes a lot of fiddling to get it working right.

      To put it in C++ terms, this is like "oh I'd like to write a distributed database in C++ in 100 hours, it takes text and outputs arrays, how hard can that be?"
      Dec 18, 2018
    • LinkedIn / Eng
      Gill Bates
      By outsourcing I mean using something that already exists, not hiring TCS or Wipro.
      Dec 18, 2018
    • OP
      OK, yes, I will use existing stuff as much as possible. Time to market is the most important factor.
      Dec 18, 2018
    • LinkedIn / Eng
      Gill Bates
      FYI, someone approached me to build a voice recognition system and we ended up using API.AI (now Dialogflow).

      It's really amazing: it lets you focus on writing useful features instead of going down the rabbit hole of speech recognition and NLP. I kind of feel I wasted my time learning about speech processing, but whatever.
      Dec 18, 2018
    • OP
      Knowledge is never wasted. Thanks for sharing!
      Dec 18, 2018
  • Google / Eng UUKg61
    A Raspberry Pi is too constrained a system for any serious ML work. But since you have already planned for it, here is one way of doing this:

    1) Get a Raspberry Pi with a mic and speaker.
    2) Build your speech model with TensorFlow on a server/laptop powerful enough to train it. https://www.tensorflow.org/tutorials/sequences/audio_recognition is a great start.
    3) Freeze the trained model and convert it to TensorFlow Lite.
    4) Run it on the Raspberry Pi.
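
    The audio recognition tutorial in step 2 turns raw audio into spectrogram-style features before any neural network sees it. As a dependency-free illustration of that front end (not the tutorial's code; the 8 kHz sample rate, 256-sample frames and 128-sample hop are assumptions), this Python sketch slices a signal into overlapping Hann-windowed frames and takes each frame's DFT magnitude:

```python
import math

def frames(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def hann(n):
    """Symmetric Hann window, tapering each frame's edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 of one windowed frame."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spectrum.append(math.hypot(re, im))
    return spectrum

def spectrogram(signal, frame_len=256, hop=128):
    """One magnitude spectrum per frame, each with frame_len // 2 + 1 bins."""
    return [magnitude_spectrum(f) for f in frames(signal, frame_len, hop)]

SAMPLE_RATE = 8000  # assumed sample rate in Hz
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]  # 0.1 s of a 1 kHz tone
spec = spectrogram(tone)
print(len(spec), len(spec[0]))  # 5 129
```

    With 256-sample frames at 8 kHz each bin is 31.25 Hz wide, so the 1 kHz test tone peaks in bin 32 of every frame. The naive DFT is O(N^2) per frame; on a real device you would use an FFT routine instead.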

    At this stage you would be able to identify words. To build a question/answer system, you would need to understand sequence modelling. RNNs, LSTMs and Seq2Seq are successively higher-level abstractions; to get results faster, you can start with Seq2Seq directly. One way to do this is to follow this blog: https://blog.kovalevskyi.com/how-to-create-a-chatbot-with-tf-seq2seq-for-free-e876ea99063c

    You can of course go deeper and work with LSTMs or even plain RNNs, which are lower-level primitives, but at that level you would effectively be doing research, and results are not guaranteed.
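
    To see what the lowest-level primitive looks like: a vanilla (Elman) RNN cell keeps a hidden state and updates it at every step as h_t = tanh(W_x x_t + W_h h_{t-1} + b). A minimal pure-Python forward pass follows; the 2x2 weights are hand-picked placeholders rather than trained values, and in practice TensorFlow provides optimized RNN and LSTM layers:

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_step(x, h_prev, w_x, w_h, b):
    """One Elman RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [math.tanh(u + r + c) for u, r, c in zip(matvec(w_x, x), matvec(w_h, h_prev), b)]

def rnn_forward(inputs, w_x, w_h, b):
    """Run the cell over a sequence, starting from a zero state; return every hidden state."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Hypothetical weights: 2-dim inputs, 2-dim hidden state.
W_X = [[0.5, -0.5], [0.3, 0.8]]
W_H = [[0.1, 0.0], [0.0, 0.1]]
B = [0.0, 0.0]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = rnn_forward(sequence, W_X, W_H, B)
print(states[0])  # [tanh(0.5), tanh(0.3)]
```

    Because h_{t-1} feeds back into each step, the second hidden state already depends on the first input; LSTM cells add gates to this same loop so that information can survive across longer sequences.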

    Assuming you are confident with Python and can pick up the basics of TensorFlow in a week, a basic toy model can be built within 2 weeks, fairly from the ground up.
    Dec 18, 2018
    • OP
      Thank you for the detailed answer! I thought of the Raspberry Pi only because of its small form factor, its low price, my need to run in offline mode, and because it's known to run (probably pre-trained) TensorFlow models. Are there any similar alternatives?
      Dec 18, 2018
    • Google / Eng UUKg61
      Not in the $35 price range. But there are options around $100 that are much more powerful and suitable for production use. Udoo.org is one such site you can try. The Jetson TX2 is another very capable SoM (system-on-module), but it would be expensive for toy applications.
      Dec 18, 2018
    • OP
      Thanks again
      Dec 18, 2018
  • Facebook Whateverrs
    Break it up into small pieces and see which ones you can actually complete. E.g., maybe start with just learning the responses to textual representations of the questions, and worry about adding voice and fitting it onto a Pi later.
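
    One way to get the text-only piece working quickly, given OP's limit of ~1000 known questions and answers, is plain retrieval: answer a new question with the stored answer of its most similar known question, and store new pairs as they arrive. This bag-of-words cosine-similarity baseline in Python is a swapped-in technique rather than anything proposed in this thread; the QAStore class, the 0.5 threshold and the sample commands are hypothetical:

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, pull out word tokens, and count them."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

class QAStore:
    """Hypothetical store: reply with the answer of the nearest stored question."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed cut-off below which we return None ("don't know")
        self.pairs = []  # list of (word-count vector of question, answer)

    def learn(self, question, answer):
        """Add a new question/answer pair; this is the 'growing set'."""
        self.pairs.append((bag_of_words(question), answer))

    def answer(self, question):
        query = bag_of_words(question)
        best_score, best_answer = 0.0, None
        for vector, ans in self.pairs:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, ans
        return best_answer if best_score >= self.threshold else None

store = QAStore()
store.learn("turn on the kitchen lights", "Turning on the kitchen lights.")
store.learn("what is the weather like today", "Sorry, I am offline and cannot check the weather.")
print(store.answer("Please turn on the lights in the kitchen"))  # Turning on the kitchen lights.
print(store.answer("sing me a song"))  # None
```

    A bag-of-words match ignores word order and synonyms; once this baseline works, the same learn/answer interface could sit on top of a trained classifier or a Seq2Seq model.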

    To learn effectively, you may need GPUs; a Raspberry Pi may not be enough for the training. Most such systems send the questions/commands to a server that returns the response. If you forget about the Pi for now, you don't have to write any network services. It may also be possible to do the training on GPUs and then just run the trained network on the Pi, but that still adds complexity to your project.
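
    For the "train on GPUs, run on the Pi" route, a common trick for small devices is 8-bit quantization, which TensorFlow Lite supports: each real value r is stored as an integer q with r ≈ scale * (q - zero_point). The Python sketch below shows only that arithmetic, not the TensorFlow Lite converter; it uses a per-tensor asymmetric scheme with min/max calibration for simplicity, whereas TensorFlow Lite's int8 spec, for example, quantizes weights symmetrically (zero_point = 0). The weight values are made up:

```python
def quant_params(values, qmin=-128, qmax=127):
    """Per-tensor int8 scale/zero_point from min/max; range widened to include 0.0."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map reals to clamped int8 codes: q = round(r / scale) + zero_point."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(codes, scale, zero_point):
    """Recover approximate reals: r = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in codes]

weights = [-0.5, 0.0, 0.3, 1.0]  # hypothetical trained weights
scale, zero_point = quant_params(weights)
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(q)  # [-128, -43, 8, 127]
```

    For these weights each value comes back to within half a quantization step (scale / 2), and 0.0 maps exactly to zero_point, so zeros (e.g. padding) stay exact after quantization.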

    As others have suggested, you can probably use something open source for speech-to-text (and text-to-speech for the responses), but it still adds complexity and time. I'd do it last, if you have time left.

    If you're really new to this, just learning the concepts may suck up a lot of your development time.
    Dec 18, 2018
    • OP
      Thank you for your thoughts. A phased approach makes sense. To start, I'll build a trained (text-based) model on an available GPU. Once that works, using existing sources as much as possible, I can think about adding voice, putting the trained model on a device, etc.
      Dec 18, 2018