Project Sikuli, a machine-vision research project from MIT, allows users to write scripts that automate UI-driven tasks. This tool, while very powerful and simple, can cause a number of headaches if you’re not careful. In this post, I’ll talk about some issues you might encounter, as well as how to avoid them. If you’ve never used Sikuli before, you can see demos here, or download the IDE here.
5. Not wait()ing
If you’re a long-time user of a particular program or operating system, you can probably describe many common tasks from memory. For example, I might describe how to access the “Uninstall or change a program” dialog in Windows 7 as follows:
- Click on the start menu
- Click on “Control Panel”
- Click on “Uninstall a Program” under the “Programs” heading
If you’re new to Sikuli, you might be tempted to script this simple process like this:
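A minimal sketch of that naive version follows. The image filenames are hypothetical placeholders for your own captures, and `screen` is assumed to be a Sikuli `Screen` or `Region` object (inside the IDE you would more likely call `click()` directly):

```python
# Hypothetical image filenames; substitute your own captured screenshots.
START_MENU = "start_menu.png"
CONTROL_PANEL = "control_panel.png"
UNINSTALL_LINK = "uninstall_a_program.png"


def open_uninstall_dialog_naive(screen):
    """Click through the three steps with no explicit waiting."""
    screen.click(START_MENU)
    screen.click(CONTROL_PANEL)   # may fail if the menu is slow to appear
    screen.click(UNINSTALL_LINK)  # may fail if Control Panel is slow to open
```

Inside a Sikuli script this could be called as `open_uninstall_dialog_naive(Screen())`.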
You could then run your script and not notice a single issue for a long time. Then, one day, your computer is busy running multiple background tasks and you run your script again. The start menu takes a few seconds to display its contents, and your script raises an exception. The problem is that Sikuli doesn’t automatically wait for an image to be visible on-screen before trying to click() it, so if it doesn’t see “Control Panel” on your screen very soon after clicking on the start menu, it will raise an exception. In order to force it to behave properly, you’ll need to insert wait() statements:
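One way this might look, as a sketch rather than the exact script from the post: each target is waited for before it is clicked. The image filenames and the 10-second default are assumptions, and `screen` is assumed to be a Sikuli `Screen` or `Region` object:

```python
# Hypothetical image filenames; substitute your own captured screenshots.
START_MENU = "start_menu.png"
CONTROL_PANEL = "control_panel.png"
UNINSTALL_LINK = "uninstall_a_program.png"


def open_uninstall_dialog(screen, timeout=10):
    """Wait up to `timeout` seconds for each target, then click it."""
    for image in (START_MENU, CONTROL_PANEL, UNINSTALL_LINK):
        # In Sikuli, wait() raises FindFailed if the image never appears.
        screen.wait(image, timeout)
        screen.click(image)
```

Passing a larger `timeout` is the simplest knob to turn when a slow machine still trips the script.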
If you’ve added wait()s and you’re still having problems, try setting a longer wait time.
4. Having trouble with context-sensitive/popup menus
If you tried to follow along with the previous example, you likely ran into a problem when you tried to capture the “Control Panel” option in the start menu. After opening the start menu, switching focus back to the Sikuli IDE will cause the start menu to close, thwarting your effort to capture an image. Now, you could use PrintScreen while the start menu is open, paste the image into an image processor, and then use Sikuli to capture the image from the image processor, but thankfully, there’s a better way.
Sikuli installs hotkeys for common tasks like capturing an image (CTRL + SHIFT + 2 by default), and they don’t cause the current program to lose focus. So you can simply open the start menu/context-sensitive menu of your choice and use the hotkey to capture the screen. That way the menu won’t disappear in the process.
3. click()ing when you should be type()ing
Sikuli’s pretty good at finding stuff on-screen, but it’s still a costly and error-prone process. If you can navigate a user interface by emulating keystrokes rather than clicks, you’ll save yourself a lot of trouble. Typing also has the benefit of sending events sequentially to the current program’s event queue, so if your program is stalling on something, your type() commands will wait for that task to be done, meaning you’ll run into problem #5 a lot less often. If you were to instead use click()s, you would have to tell your script in advance how long to wait() before it can take its next action, giving you inconsistent results if your machine is running slower than expected. Keep in mind that Sikuli can emulate key modifiers, function keys, arrow keys, etc., so you can specify some pretty complex interactions using only type() commands.
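As a rough illustration, the earlier "Uninstall a Program" task could be driven entirely from the keyboard. This sketch assumes `screen` is a Sikuli `Screen` or `Region` object, that the caller passes Sikuli's Windows-key modifier (`KeyModifier.WIN`, or `KEY_WIN` in older builds), and that `"\n"` is accepted by type() as the Enter key; the function name is made up for this example:

```python
def open_uninstall_dialog_by_keyboard(screen, win_modifier, enter="\n"):
    """Open the Run dialog with Win+R, then launch the applet by name.

    `enter` is assumed to be equivalent to Sikuli's Key.ENTER.
    """
    screen.type("r", win_modifier)     # Win+R opens the Run dialog
    screen.type("appwiz.cpl" + enter)  # Programs and Features applet
```

No screenshots are needed here, so there is nothing for the image matcher to miss.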
2. Forgetting that you’re using Python
Sikuli is a powerful and flexible tool, but remember that Sikuli scripts are written in an incredibly powerful and flexible language. Before writing that fancy UI-driven script to do something simple like change the system date/time, consider that the same task can be done in about three lines in Python or a shell script, saving you a great deal of time and headache. Whenever a task seems unnecessarily complex in Sikuli, ask yourself if it might be better solved programmatically, rather than visually.
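For instance, setting the date on Windows might need nothing more than a call to the built-in `date` command. This is a sketch under assumptions: the function names are invented for illustration, the date string must match the machine's locale format, and the command needs an elevated prompt to succeed:

```python
import subprocess


def build_set_date_command(date_text):
    """Return the Windows command line that sets the system date.

    `date_text` must use the locale's format, e.g. "06-25-2010".
    """
    return ["cmd", "/c", "date", date_text]


def set_system_date(date_text, run=subprocess.call):
    """Run the command and return its exit code (0 means success)."""
    return run(build_set_date_command(date_text))
```

The `run` parameter exists only so the function can be tried with a stand-in before it touches a real clock.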
1. Not knowing where to find help
Since Sikuli is still in the early stages of development, finding online resources to help you can be very difficult, to say the least. Here are some of the pages that I find most useful when I’m writing Sikuli scripts:
- Documentation / Guide – Perhaps the best Sikuli reference out there. Complete documentation of Sikuli, along with some example code.
- Bug Reporting / Tracking – Sikuli is still in beta. See if that problem you’re having is really a bug in Sikuli, not your script.
- Blog – Contains useful code examples and news.
- Q&A – If you’re having an issue, there might be someone else who’s been there before.
[Mario is a summer intern with Baydin, and he is spending part of the summer automating functional tests for Boomerang using Project Sikuli]
June 25, 2010
#5 is not correct. In fact, all actions implicitly call wait() before delivering their mouse or key events, so you don’t need an extra wait() before each action. You just need to set a longer wait timeout using setAutoWaitTimeout().
March 3, 2013
I didn’t know about the setAutoWaitTimeout feature. I’m not sure that means #5 is incorrect; it’s more that there’s a more efficient way to do it. I’ve seen several of the regular Sikuli support people use traditional wait() commands that specify seconds.
October 1, 2013
There are times when the first handful of scans just can’t find an object on the screen; half a dozen or more scans may be needed. If you reduce the Pattern similarity below 0.7, false positives will occur. Set it higher than 0.7 and the likelihood of finding the pattern is greatly reduced. This is a particular problem in lower-resolution situations. To combat that problem I’ve taken to using wait("img.png", FOREVER), and then I reduce the scan overhead by reusing the match: object = getLastMatch()
So I don’t see anything specifically wrong with wait() as it might be needed just to get a match on an object that is already on the screen.
January 10, 2018
Do not expect Sikuli to work on a headless machine, e.g. a VM or server where there is no physical monitor available. A good example is a cloud VM where you run your scripts using Jenkins.