Hey all, unfortunately I've been stuck on this task for a couple of weeks now and would really like to figure it out soon. Since most people here are in tech, I was curious whether anyone could provide some assistance.

Background: I'm on Windows, using Python and TensorFlow to run inference on 3 deep learning models. I have a function, let's call it "func_load", that loads the 3 models into memory (takes a few minutes). Then I have another function, let's call it "func_infer", that runs inference on the 3 models sequentially. My goal is to run the 3 models' inference in parallel using either multiprocessing or multithreading. I've already tried both.

1. For multithreading, I can't seem to get it running in parallel. All I need to do is call session.run() in TensorFlow to get the inference. If I'm understanding correctly, session.run() releases the Python Global Interpreter Lock (GIL), so if I call it on the 3 different sessions in 3 different threads, theoretically it should all run in parallel - but it still takes the same amount of time as running sequentially! For reference, this is what I'm doing for my 3 threads (note the sess=sess default argument - without it, Python's late-binding closures make every lambda capture the last session, so all 3 threads would hit the same model):

workers = [lambda input_list, sess=sess: sess.run(y_op, {x_inp: input_list, tflag_op: False}) for sess in sessions]
t = threading.Thread(target=workers[idx], args=(input_list,))
t.start()
...
t.join()

2. For multiprocessing, I'd like to create 3 different processes using multiprocessing.Process() and call func_load() in each of them with one model each. Then I'd like to use those same processes to call func_infer(). I haven't been able to figure out a way to do that with processes, though - calling 2 different functions at different points in the script.

Does anyone have suggestions on how to approach this? Any help would be greatly appreciated! I've actually already posted about this on SO, but haven't gotten much further there either, btw.
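For point 2, one common pattern is a persistent worker process: each process loads its model once, then sits in a loop pulling inference requests off a queue, so you can "call two functions at different points in the script" by sending messages instead. Here's a minimal stdlib sketch of that idea - the load and inference steps are hypothetical placeholders standing in for func_load/func_infer (real TF sessions would be built inside the worker, since they generally can't be pickled across processes):

```python
import multiprocessing as mp

def worker(model_path, task_q, result_q):
    # Placeholder "load" step: in the real script this would build the
    # TF graph/session for ONE model (the slow, minutes-long part),
    # done exactly once per process.
    model = f"loaded:{model_path}"  # hypothetical stand-in for func_load

    # Then serve inference requests until told to stop.
    while True:
        item = task_q.get()
        if item is None:          # sentinel: shut this worker down
            break
        idx, payload = item
        # Placeholder inference: stands in for session.run(...)
        result_q.put((idx, f"{model}->{payload}"))

def main():
    paths = ["model_a", "model_b", "model_c"]   # hypothetical model paths
    task_qs = [mp.Queue() for _ in paths]       # one request queue per worker
    result_q = mp.Queue()                       # shared result queue

    procs = [mp.Process(target=worker, args=(path, q, result_q))
             for path, q in zip(paths, task_qs)]
    for p in procs:
        p.start()                 # all 3 "loads" now happen in parallel

    # Later in the script: dispatch one inference request to each worker;
    # the 3 inferences run in parallel in separate processes (no GIL issue).
    for i, q in enumerate(task_qs):
        q.put((i, "input"))
    results = sorted(result_q.get() for _ in paths)

    # Clean shutdown.
    for q in task_qs:
        q.put(None)
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(main())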
Check out Ray. Under the hood it uses cloudpickle for serialization/deserialization (unlike the multiprocessing module, which has some deficiencies when it comes to TF models).
What kind of models are you trying to run in parallel on CPU? I think you simply have a hardware bottleneck here, hence you don't see any speedup.
Consider using Apache Beam for inference. It is easy to parallelize, and you can find examples in the tfx project under tensorflow.
Thanks for the suggestion! This is my first time hearing of Apache Beam. I took a look at it, but I haven't been able to find any examples of parallel TensorFlow inference. Do you know if it can do that?
Also, I'd like all of this to happen locally. Apache Beam seems to be targeted at the cloud.