By Scott May 7, 2012 7 Comments

I wanted to control my Romo (the cellbot from Romotive) from my PC.  But I like Romotive’s control app and didn’t really want to write my own at this point using their SDK.  Instead, I just wanted to control their app from a script on my PC which hopefully would let it access all the functionality.  Of course, this meant using the protocol they use between their controller app and the app which is actually running on the device mounted on the Romo.

When you launch a Romo app and it says “Waiting for controller devices”, it sends out a multicast packet which advertises that the Romo is present and gives its name.  That’s how controller apps know which Romos they can select for control.  The packet also gives the IP address of the Romo.  If you choose to control a specific Romo in the controller mode of the app, it starts sending UDP packets to communicate with the Romo.  These packets tell the Romo to show different emotions, to flip which camera is used, to move in one of 8 ways, and to start streaming video.  If video is requested, a sequence of JPG images is streamed back from the Romo to the controller using that same UDP port (port 21133).

Note: As this protocol is internal to the Romotive app, it may very well change if they need to restructure how they do things (for instance, if they add motor speed control instead of just direction).
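
As a rough illustration of the UDP exchange described above, here is a minimal Python 3 sketch of opening a socket on port 21133 and sending raw bytes to a Romo whose address is held as a list of four octets.  The helper names (open_udp_socket, send_packet) and the timeout value are my own assumptions for this example and are not taken from the Romotive app or from the full script.  The contents of the payload are deliberately left to the caller, since the packet layout is specific to the app.

```python
import socket

ROMO_PORT = 21133  # UDP port named in the article


def open_udp_socket(bind_ip="", port=ROMO_PORT, timeout=2.0):
    """Create a UDP socket bound to (bind_ip, port) with a receive timeout.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_ip, port))
    sock.settimeout(timeout)
    return sock


def send_packet(sock, ip_octets, payload, port=ROMO_PORT):
    """Send raw bytes to an address given as four octets, e.g. [192, 168, 123, 3].

    Returns the number of bytes handed to the network stack.
    """
    address = ".".join(str(octet) for octet in ip_octets)
    return sock.sendto(payload, (address, port))
```

Binding to the same port the Romo replies on means any streamed packets arrive on this one socket, which matches the single-port behaviour described above.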

I recently learned Python (actually because I had fun taking Sebastian Thrun’s self-driving car class on Udacity and that was the language used for the course).  In an attempt to learn it better, I’ve been trying to use it more often when I need a quick program.  So I cranked out a program to do the Romo control.  Keep in mind this is a programmatic way to control the Romo from any machine that can route UDP to the Romo, not a UI (although there are a number of ways to expand it to be a cross-platform UI).  (Note: a UI version is now available; see the updates at the bottom of this article.)  You can download the full file here.  The full listing has more comments and instructions.  Here’s an abbreviated listing with mostly just the code:

import socket
import time
import threading

#Edit the following 3 variables to match your configuration

image_directory="e:/romo-images" #directory to store images
my_ip=[192,168,123,2] #your IP
romo_ip=[192,168,123,3] #IP of the romo


for oct in my_ip:
for oct in romo_ip:
for i in range(4):




flipcam="\x8c" #flips the camera from front to back or back to front

#moves (for some reason the first move may need to be sent twice)
s="\x70" #stop
fl="\x69" #forward left
f="\x64" #forward straight
fr="\x65" #forward right
rl="\x78" #rotate left
rr="\x68" #rotate right
bl="\x77" #back left
b="\x7c" #back straight
br="\x7b" #back right

#control commands



#for type 3 face and move commands
def send(cmd):

#for type 2 commands like init, startvideo, shutdown
def ctrl(cmd):
    if cmd==shutdown:
        abort.set() #let thread shut down
    elif cmd==startvideo:

class ThreadClass(threading.Thread):
    def run(self):
        print("Image fetching thread started.")
        while not abort.isSet():
            data,addr=vsock.recvfrom(15000) #Photos are less than 15K
            for i in range(4-len(scnt)):
            if ord(data[0])==5 and ord(data[1])==0: #JPEG Image
        print("Image fetching thread finished.")


#ctrl(shutdown) will shutdown the image capture thread, disconnect from the romo, and allow one to quit() the python interpreter
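
One thing to watch for in a receive loop like the one in the listing: recvfrom blocks until a packet arrives, so unless the socket has a timeout set, a loop that never receives a packet never gets back to checking the abort flag.  The sketch below is a Python 3 variation (not the code from the full file) that sets a short timeout so the thread can notice a shutdown request even when no video is arriving.  The class name PacketReceiver, the callback argument, and the 0.5 second timeout are assumptions made for this example.  The check on the first two bytes (5 and 0) mirrors the test in the listing.  In Python 3, indexing a bytes object gives an integer, so ord() is not needed.

```python
import socket
import threading


class PacketReceiver(threading.Thread):
    """Background thread that passes image packets to a callback.

    Hypothetical class for illustration; on_packet receives the whole
    packet, and removing any header is left to the caller.
    """

    def __init__(self, sock, on_packet, max_size=15000):
        super().__init__(daemon=True)
        self.sock = sock
        self.on_packet = on_packet
        self.max_size = max_size  # photos are less than 15K per the listing
        self.abort = threading.Event()

    def run(self):
        # A short timeout lets the loop re-check the abort flag regularly.
        self.sock.settimeout(0.5)
        while not self.abort.is_set():
            try:
                data, addr = self.sock.recvfrom(self.max_size)
            except socket.timeout:
                continue  # nothing arrived; check abort and wait again
            except OSError:
                break  # socket was closed underneath us
            # Same test as the listing: bytes 5, 0 mark a JPEG image packet.
            if len(data) >= 2 and data[0] == 5 and data[1] == 0:
                self.on_packet(data)

    def stop(self):
        """Ask the thread to finish and wait for it to exit."""
        self.abort.set()
        self.join()
```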

A typical session that tells the Romo to smile, express love, move forward, and then stop might be:

python -i

A video capture session which starts video capture, moves forward, flips to the back camera, moves backward, and shuts down might be:

python -i

Be careful in your use of video capture: the frames come rather quickly and are a little less than 15K each.  When you turn on video capture in the script, it starts a background thread which captures them and writes them out to the specified directory as a sequence of JPEG files (the counter in the file name is only padded to 4 digits).  If you write a UI based on this, you’d normally just refresh the image every time a new one comes in.  You could also do the same if you just want to add a command to copy out whatever the last received frame is (rather than keeping them all).  Have fun!
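
To make the two storage options above concrete, here is a small Python 3 sketch: one helper that builds a file name with a zero-padded 4 digit counter, and one holder that keeps only the most recent frame so it can be copied out on request.  The names frame_path and LatestFrame, and the "frame" file name prefix, are assumptions for this example; the full script may name its files differently.

```python
import os
import threading


def frame_path(directory, count, digits=4, prefix="frame"):
    """Build a path like <directory>/frame0007.jpg (prefix is hypothetical)."""
    name = f"{prefix}{count:0{digits}d}.jpg"
    return os.path.join(directory, name)


class LatestFrame:
    """Keep only the most recently received frame instead of every frame."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = None
        self.count = 0  # how many frames have been seen so far

    def update(self, jpeg_bytes):
        """Called from the receiving thread for each new frame."""
        with self._lock:
            self._data = jpeg_bytes
            self.count += 1

    def get(self):
        """Return the latest frame bytes, or None if nothing has arrived."""
        with self._lock:
            return self._data

    def save(self, path):
        """Write the latest frame to path; return False if there is none."""
        data = self.get()
        if data is None:
            return False
        with open(path, "wb") as handle:
            handle.write(data)
        return True
```

The lock matters because the frames are written by the background thread while the save request would normally come from the interactive session.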

Update: I couldn’t resist putting together a simple Python UI with a real-time video window and keyboard controls.  You can download the full romo-ui.pyw program here.  The UI requires a standard Python release which includes the Tkinter UI modules.  It also requires the PIL library for the JPG image handling.  The details about how to get these are in the comments to the program.  If you’ve installed the Python environment with Tkinter, you can just launch the romo-ui.pyw program to start it — make sure the phone on your Romo is already waiting for connections.  You can exit the app either by typing “q” or deleting the window.  Both should exit gracefully.  The steering controls take advantage of the 9 key keypad (if you’ve got one).  It should work in either numeric or non-numeric mode (see the code for the key mappings).  Flipping the camera view is the “-” key and the faces are controlled by generally logical letter keys.  It works great on my Romo (even with my hyper speed motor mod).

Update2: I added automated discovery capability for Romos.  You can download the full romo-ui2.pyw program here.  You currently can’t choose from multiple Romos (unless you hardcode the address) but it will auto-discover the first one it finds and connect to it.  This involves both announcing that it’s looking for Romos on the multicast channel as well as listening for responses (the default is only to look for about 5 seconds — you can lengthen that time by setting it in the script).  This was tested with the latest Romo iPhone app (v1.05).
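
For readers curious about the listening half of discovery, the following Python 3 sketch joins a multicast group and waits up to about 5 seconds for the first packet.  It is only an outline under stated assumptions: the group address and port must be supplied by the caller because they are not given in this article, the function names are made up for the example, and the step where the program announces that it is looking for Romos is not shown.  Treating the sender address of the first packet as the Romo's address is also an assumption.

```python
import socket


def membership_request(group, interface="0.0.0.0"):
    """Pack the 8-byte value expected by the IP_ADD_MEMBERSHIP option.

    It is the multicast group address followed by the local interface
    address, each as 4 raw bytes.
    """
    return socket.inet_aton(group) + socket.inet_aton(interface)


def listen_for_announcement(group, port, timeout=5.0):
    """Wait for one multicast packet; return (sender_ip, data) or None.

    group and port are placeholders the caller must fill in.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(
            socket.IPPROTO_IP,
            socket.IP_ADD_MEMBERSHIP,
            membership_request(group),
        )
        sock.settimeout(timeout)
        try:
            data, (sender_ip, sender_port) = sock.recvfrom(1500)
        except socket.timeout:
            return None  # nothing announced itself within the time limit
        return sender_ip, data
    finally:
        sock.close()
```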


  1. Hi Scott,
    Thanks much for this, helps us play around with romo. We are trying to get the camera/video working but having some difficulties (changed directory for images to my machine etc), but we never get any pictures in the directory and the “Image fetching thread finished” is never printed, script just hangs. Are there timing issues with running this in non-interactive mode?

    • Nancy,

      I don’t recall seeing timing issues myself. You could also try the later programs I posted, which display the video in a UI. It’s always possible Romotive has changed the software in more recent versions of the app (I should re-test with their latest). You’ll need to have the Python program send the appropriate 02 type command to tell the Romo to start transmitting video. You’ll also need to make sure the machine you’re receiving it on doesn’t have a firewall that’s blocking the incoming UDP transmissions from the Romo.

  2. Hi Scott,

    Do you know if this would work with the new Romo? I’ve tried and the app seems to run but I’m not a python guy at all… When I run the .pyw file it loads a white screen and if I press keys it logs them in the terminal but nothing happens in the white screen or with Romo…?

    • I’m not sure. I don’t have the new Romo, but generally the question is whether the latest version of their application is different; perhaps they changed the protocol a bit. I should probably get the latest software and check everything out again. Otherwise, the trick is to monitor the network communications between Romotive’s two apps using a tool like Wireshark and see how things might have changed.

  3. Yeh I’ve tried using Wireshark but don’t really know what I’m doing! I’ll just wait for the SDK!

Copyright © 2016 SWB Labs All Rights Reserved.