Control Romo with Python from Your PC

May 2012

I wanted to control my Romo (the cellbot from Romotive) from my PC. But I like Romotive's control app and didn't really want to write my own at this point using their SDK. Instead, I just wanted to drive their app from a script on my PC, which hopefully would give the script access to all the functionality. Of course, this meant using the protocol they use between their controller app and the app that is actually running on the device mounted on the Romo.

When you launch a Romo app and it says "Waiting for controller devices", it sends out a multicast packet (on 235.8.13.21:21135) that advertises that the Romo is present and gives its name. That's how controller apps know which Romos they can select for control. The packet also gives the IP address of the Romo. If you choose to control a specific Romo in the controller mode of the app, it starts sending UDP packets to communicate with the Romo. These packets tell the Romo to show different emotions, to flip which camera is used, to move in one of 8 ways, and to start streaming video. If video is requested, a sequence of JPG images is streamed back from the Romo to the controller using that same UDP port (port 21133).

Note: As this protocol is internal to the Romotive app, it may very well change if they need to restructure how they do things (for instance, if they add motor speed control instead of just direction).

I recently learned Python (actually because I had fun taking Sebastian Thrun's self-driving car class on Udacity and that was the language used for the course). In an attempt to learn it better, I've been trying to use it more often when I need a quick program. So I cranked out a program to do the Romo control. Keep in mind this is a programmatic way to control the Romo from any machine that can route UDP to the Romo -- not a UI (although there are a number of ways to expand it to be a cross-platform UI). (Note: a UI version is now available -- see the updates at the bottom of this article.)

You can download the full romo.py file here. The full listing has more comments and instructions. Here's an abbreviated listing with mostly just the code:

    import socket
    import time
    import threading

    #Edit the following 3 variables to match your configuration
    image_directory="e:/romo-images" #directory to store images
    my_ip=[192,168,123,2] #your IP
    romo_ip=[192,168,123,3] #IP of the romo

    romo_port=21133
    romo_ip_str=".".join(map(str,romo_ip))
    my_ip_str=".".join(map(str,my_ip))

    #address block appended to every command: my IP, romo IP, 4 zero bytes
    buf=""
    for oct in my_ip:
        buf+=chr(oct)
    for oct in romo_ip:
        buf+=chr(oct)
    for i in range(4):
        buf+=chr(0)

    #faces
    smile="È"
    love="É"
    excited="Ê"
    wink="Ë"
    random="Ì"
    frown="Í"
    angry="Î"
    confused="Ï"
    hush="Ð"
    crying="Ñ"

    #camera
    flipcam="" #flips the camera from front to back or back to front

    #moves (for some reason the first move may need to be sent twice)
    s="p" #stop
    fl="i" #forward left
    f="d" #forward straight
    fr="e" #forward right
    rl="x" #rotate left
    rr="h" #rotate right
    bl="w" #back left
    b="|" #back straight
    br="{" #back right

    #control commands
    init=""
    startvideo=""
    shutdown=""

    abort=threading.Event()
    sock=socket.socket(socket.AF_INET,socket.SOCK_DGRAM)

    #for type 3 face and move commands
    def send(cmd):
        tmp=""+cmd+buf
        sock.sendto(tmp,(romo_ip_str,romo_port))

    #for type 2 commands like init, startvideo, shutdown
    def ctrl(cmd):
        if cmd==shutdown:
            abort.set() #let thread shut down
            time.sleep(1)
        elif cmd==startvideo:
            t=ThreadClass()
            t.start()
        tmp=""+cmd+buf
        sock.sendto(tmp,(romo_ip_str,romo_port))

    class ThreadClass(threading.Thread):
        def run(self):
            vsock=socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
            vsock.bind((my_ip_str,romo_port))
            cnt=0
            print("Image fetching thread started.")
            while not abort.isSet():
                data,addr=vsock.recvfrom(15000) #Photos are less than 15K
                cnt+=1
                scnt=str(cnt)
                for i in range(4-len(scnt)):
                    scnt="0"+scnt
                if ord(data[0])==5 and ord(data[1])==0: #JPEG Image
                    #binary mode so the JPEG bytes are written unchanged
                    f=open(image_directory+"/image"+scnt+".jpg","wb")
                    f.write(data[14:])
                    f.close()
            vsock.close()
            print("Image fetching thread finished.")

    ctrl(init)
    #time.sleep(1)
    #ctrl(startvideo)

    #ctrl(shutdown) will shutdown the image capture thread, disconnect from the romo, and allow one to quit() the python interpreter
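The listing builds buf one character at a time, which relies on Python 2 strings doubling as byte strings. As a rough sketch only, here is how the same 12-byte address block (controller IP, Romo IP, four zero bytes) might be built with Python 3 bytes. The function name address_trailer is hypothetical and not part of romo.py, and the command prefix and command bytes are deliberately left out.

```python
def address_trailer(my_ip, romo_ip):
    """Return the 12-byte block: controller IP, Romo IP, then four zero bytes."""
    for ip in (my_ip, romo_ip):
        if len(ip) != 4:
            raise ValueError("expected four octets, got %r" % (ip,))
    # bytes() itself raises ValueError if an octet is outside 0..255.
    # bytes(4) is four zero bytes, matching the chr(0) loop in the listing.
    return bytes(my_ip) + bytes(romo_ip) + bytes(4)

trailer = address_trailer([192, 168, 123, 2], [192, 168, 123, 3])
print(trailer.hex())  # c0a87b02c0a87b0300000000
```

In a Python 3 port, every value passed to sock.sendto() would need to be bytes as well, so the command constants would have to be converted the same way.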

A typical session that tells the Romo to smile, express love, move forward, and then stop might be:

    python -i romo.py
    >>>send(smile)
    >>>send(love)
    >>>send(f)
    >>>send(s)
    >>>quit()
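Since the point of the script is programmatic control, a session like this can also be scripted with pauses between commands. The helper below is a hypothetical sketch, not part of romo.py: run_sequence and the pause values are my own assumptions, and a plain list stands in for send() so the example runs without a robot.

```python
import time

def run_sequence(send, steps, sleep=time.sleep):
    """Call send(cmd) for each (cmd, pause_seconds) pair, pausing after each."""
    for cmd, pause in steps:
        send(cmd)
        if pause > 0:
            sleep(pause)

# Stand-in for romo.py's send(): record the commands instead of sending them.
sent = []
run_sequence(sent.append, [("smile", 0), ("love", 0), ("f", 0.1), ("s", 0)])
print(sent)  # ['smile', 'love', 'f', 's']
```

Inside a python -i romo.py session, the equivalent call would be something like run_sequence(send, [(smile, 1), (love, 1), (f, 2), (s, 0)]).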

A video capture session which starts video capture, moves forward, flips to the back camera, moves backward, and shuts down might be:

    python -i romo.py
    >>>ctrl(startvideo)
    >>>send(f)
    >>>send(s)
    >>>send(flipcam)
    >>>send(b)
    >>>send(s)
    >>>ctrl(shutdown)
    >>>quit()
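The capture thread in the listing treats a packet as a video frame when its first two bytes are 5 and 0, and writes everything after the first 14 bytes as the JPEG. Here is a minimal Python 3 sketch of that check, paired with a small holder that keeps only the most recent frame instead of writing every one to disk. The names extract_jpeg and LatestFrame are hypothetical, and the meaning of the 14 header bytes is not known here beyond what the listing does with them.

```python
import threading

HEADER_LEN = 14  # the listing writes data[14:] to the image file

def extract_jpeg(packet):
    """Return the JPEG payload if the packet looks like a frame, else None."""
    if len(packet) > HEADER_LEN and packet[0] == 5 and packet[1] == 0:
        return packet[HEADER_LEN:]
    return None

class LatestFrame:
    """Thread-safe holder for the most recently received frame."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = None
        self.count = 0

    def update(self, packet):
        """Store the frame carried by packet; return False if it is not one."""
        jpeg = extract_jpeg(packet)
        if jpeg is None:
            return False
        with self._lock:
            self._data = jpeg
            self.count += 1
        return True

    def save(self, path):
        """Write the latest frame to path; return False if none has arrived."""
        with self._lock:
            data = self._data
        if data is None:
            return False
        with open(path, "wb") as out:  # binary mode for JPEG bytes
            out.write(data)
        return True
```

A receive loop would call update(data) for each packet from recvfrom(), and a separate command could call save() whenever a snapshot is wanted.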

Be careful in your use of video capture: the frames come rather quickly and are a little less than 15K each. When you turn on video capture in the script, it starts a background thread which captures them and writes them out to the specified directory as a sequence of JPEG files (I only pad for a 4 digit count). If you write a UI based on this, you'd normally just refresh the image every time a new one comes in. You could also do the same if you just want to add a command to copy out whatever the last received frame is (rather than keeping them all).

Have fun!

Update: I couldn't resist putting together a simple Python UI with a real-time video window and keyboard controls. You can download the full romo-ui.pyw program here. The UI requires a standard Python release which includes the Tkinter UI modules. It also requires the PIL library for the JPG image handling. The details about how to get these are in the comments to the program. If you've installed the Python environment with Tkinter, you can just launch the romo-ui.pyw program to start it -- make sure the phone on your Romo is already waiting for connections. You can exit the app either by typing "q" or by closing the window. Both should exit gracefully. The steering controls take advantage of the 9 key keypad (if you've got one). It should work in either numeric or non-numeric mode (see the code for the key mappings). Flipping the camera view is the "-" key and the faces are controlled by generally logical letter keys. It works great on my Romo (even with my hyper speed motor mod).

Update2: I added automated discovery capability for Romos. You can download the full romo-ui2.pyw program here. You currently can't choose from multiple Romos (unless you hardcode the address) but it will auto-discover the first one it finds and connect to it. This involves both announcing that it's looking for Romos on the multicast channel and listening for responses (the default is to look for only about 5 seconds -- you can lengthen that time by setting it in the script). This was tested with the latest Romo iPhone app (v1.05).
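For the listening half of discovery, a minimal Python 3 sketch might join the multicast group mentioned earlier (235.8.13.21, port 21135) and wait about 5 seconds for one announcement. This is not the code from romo-ui2.pyw, and the function names are hypothetical. The announcement is returned undecoded because its layout is not described in this article, and the announcing half is left out for the same reason.

```python
import socket

GROUP = "235.8.13.21"  # multicast address from the article
PORT = 21135           # multicast port from the article

def membership_request(group):
    """Build the 8-byte ip_mreq value: group address, then any interface."""
    return socket.inet_aton(group) + socket.inet_aton("0.0.0.0")

def listen_for_romo(group=GROUP, port=PORT, timeout=5.0):
    """Return (sender_ip, raw_payload) for the first packet, or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        data, (sender_ip, _sender_port) = sock.recvfrom(1500)
        return sender_ip, data
    except socket.timeout:
        return None
    finally:
        sock.close()

# Example (needs a Romo waiting for controllers on the same network):
# print(listen_for_romo())
```

The sender address reported by recvfrom() is usually enough to fill in romo_ip without parsing the payload, assuming the Romo sends the announcement from its own address.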