
I am currently learning how to build and implement a CNN (AlexNet) + LSTM model to predict on video, but I got stuck at the prediction step.

When I try to predict, I get this error:

ValueError: Input 0 is incompatible with layer model_1: expected shape=(None, 10, 384, 384, 3), found shape=(1, 270, 480)

I know my width and height are different, but how do I add the timesteps (10) to the prediction input so that it matches my model?

Here is my code:

import time

import cv2
import numpy as np
from tensorflow import keras

# `classes`, `label_color`, `draw_box` and `draw_caption` come from my
# detection utilities (not shown here).

model_path = 'CCTV_10Frame_SGD_Model_1e4_b16_l21e2_Terbaru.h5'
model = keras.models.load_model(model_path, compile=True)

vid = cv2.VideoCapture('Data16_116.mp4')

font = cv2.FONT_HERSHEY_SIMPLEX
prev_frame_time = 0
total_frame = 0
while vid.isOpened():
    ret, frame = vid.read()
    if not ret:
        break
    total_frame += 1
    draw = frame.copy()
    draw = cv2.cvtColor(draw, cv2.COLOR_BGR2GRAY)

    scale_percent = 25  # percent of original size
    scale = scale_percent / 100
    width = int(frame.shape[1] * scale)
    height = int(frame.shape[0] * scale)
    dim = (width, height)
    frame_set = cv2.resize(draw, dim, interpolation=cv2.INTER_AREA)

    # This call raises the ValueError: the model expects
    # (batch, timesteps, height, width, channels) = (None, 10, 384, 384, 3),
    # but a single resized grayscale frame gives (1, 270, 480).
    boxes, scores, labels = model.predict_on_batch(
        np.expand_dims(frame_set, axis=0))
    boxes /= scale

    start = time.time()
    for i_iterate, (box, score, label) in enumerate(
            zip(boxes[0], scores[0], labels[0])):
        if score < 0.5 or i_iterate > 0:
            break

        fps = 1 / (start - prev_frame_time)
        prev_frame_time = start

        cv2.putText(draw, "%.2f" % fps, (7, 70), font,
                    1, (100, 255, 0), 3, cv2.LINE_AA)

        color = label_color(label)
        b = box.astype(int)
        draw_box(draw, b, color=color)
        caption = "{} {:.3f}".format(classes[label], score)
        draw_caption(draw, b, caption)
        print("=================================")
        print("[INFO] Score : ", score)
        print("[INFO] Label : ", classes[label])

    print("=================================")
    cv2.imshow('Result', draw)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

vid.release()
cv2.destroyAllWindows()

I hope anyone with experience in this can help me.

Thank you so much!

1 Answer

In my opinion, what you can do is keep a buffer: for the first 10 frames, just append them; from the 11th frame onward, pop from the start and append to the end.

That way you always have a sliding window of 10 frames from which the model predicts the next frame.
I think the model is trained to predict one frame from the past 10 frames.
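The sliding window described above can be sketched roughly like this. The 10-timestep window and the 384×384×3 frame size are taken from the error message in the question; everything else (synthetic frames instead of `cv2` reads, the commented-out predict call) is just to keep the sketch self-contained:

```python
from collections import deque

import numpy as np

TIMESTEPS = 10          # timesteps the model was trained with
H, W, C = 384, 384, 3   # frame size from the error message

# deque with maxlen drops the oldest frame automatically once full,
# giving the pop-from-start / append-to-end behaviour described above.
buffer = deque(maxlen=TIMESTEPS)

def make_batch(window):
    """Stack the window into the (1, timesteps, H, W, C) shape the model expects."""
    assert len(window) == TIMESTEPS, "need a full window before predicting"
    return np.expand_dims(np.stack(window, axis=0), axis=0)

# Simulate reading 12 frames; in the real loop each frame would be
# cv2.resize(frame, (W, H)) converted to 3 channels first.
for i in range(12):
    frame = np.zeros((H, W, C), dtype=np.float32)
    buffer.append(frame)
    if len(buffer) == TIMESTEPS:
        batch = make_batch(buffer)
        # prediction = model.predict_on_batch(batch)  # model from the question

print(batch.shape)  # (1, 10, 384, 384, 3)
```

Prediction only starts once 10 real frames have been seen; after that, every new frame slides the window forward by one.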

Also, if you want to predict from just the first frame, check what the model's input looked like during training in the single-frame case. It may be something like 9 frames filled with some arbitrary constant value, followed by the 10th frame with the actual values.
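If the training setup really did pad short windows with a constant, the padding could look like the sketch below. `PAD_VALUE` here is a hypothetical placeholder; the real value has to match whatever was used during training:

```python
import numpy as np

TIMESTEPS = 10
H, W, C = 384, 384, 3
PAD_VALUE = -99.0  # hypothetical filler; must match the training setup

def pad_window(frames):
    """Left-pad a short list of frames with constant frames up to TIMESTEPS."""
    n_missing = TIMESTEPS - len(frames)
    pad = [np.full((H, W, C), PAD_VALUE, dtype=np.float32)] * n_missing
    # Constant frames first, then the real frames, stacked into
    # the (1, timesteps, H, W, C) shape the model expects.
    return np.expand_dims(np.stack(pad + list(frames), axis=0), axis=0)

first_frame = np.zeros((H, W, C), dtype=np.float32)
batch = pad_window([first_frame])
print(batch.shape)  # (1, 10, 384, 384, 3)
```

This gives a full 10-timestep input even when only one real frame is available, but it only makes sense if the model saw the same kind of padding during training.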


2 Comments

Yes, but the predict call still asks for the same input shape. My model's input shape is (batch, timesteps, w, h, channels), where timesteps is the number of frames used per prediction, but I still don't understand how to give my test video the same input shape as my model.
For that, you have to look at the model's source and training code. It is highly likely that your model was trained to accept 9 timesteps of a filler value (e.g. every value at those timesteps being -99 or similar) and 1 timestep holding the first frame. If that helped, then kindly upvote.
