Editor: Lisa Gelijns ● lisa.gelijns@mail.com
It’s the fall of 2016 and time for test events and conferences. In Holland, a lot of these events are about test automation. The reason seems to be the need for speed in DevOps, new insights on the test approach and new developments in, for example, self-learning algorithms.
Michael Bolton (Toronto) agreed to an interview with TNN on the subject of test automation. The interview took place on October 27th.
Lisa Gelijns (LG): From earlier articles and interviews I understand that, in your view, people are not looking at the whole picture of test automation; they are looking at it in a simplistic way.
Michael Bolton (MB): Some people are certainly doing that. Let’s put it this way: automation is getting machinery to do something. Now, in testing, what is it that we ask machinery to do?
LG: We ask it to test the test object.
MB: I don’t think that’s what we are asking, because that’s not possible.
To test, at least to test as I think about testing, means to evaluate something by learning about it through exploration and experimentation. That includes studying the system and modeling it, modeling its functionality, and the users, and the ways they use the system, and the test data. Testing includes designing experiments. Testing includes figuring out what’s important to test and what might not be so important. Testing means exploring the product. Machinery can’t do that, not on its own. People explore, and we use machinery to help us explore.
There are wonderful tools that assist with testing in a lot of powerful ways. We use tools all the way through development to help with programming and testing: tools for design and modeling; tools to monitor and probe the product; tools to generate and manipulate test data; tools to visualize patterns of output; tools to help with recording and reporting and management. We can also use tools to drive the product, which can be helpful in various ways.
There is a specific part of testing, an activity that can be automated. My colleague James Bach and I call that ‘checking’. To check, to us, means to operate and observe the product, and then to apply decision rules, such that all of that work can be done by algorithms, programs, machinery. This is exactly equivalent to the spell-checking function in your word processor. You can use a spell-checker to identify words that are apparently misspelled, based on a dictionary. You can even use a grammar checker to identify some violations of certain grammatical rules, although in my experience, most grammar checkers miss lots of problems and raise too many false alarms. But only a human can decide whether a correctly spelled word is being used appropriately, or whether syntactically correct text makes sense.
Typically people use checks to operate the product and compare its output to some expected or calculated result. But you could also use checks to determine whether specific files are in the right places, or to determine whether specific resources are available, or to alert you if a transaction is taking more than a specified amount of time. Those are just a few examples.
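As a minimal sketch of what such checks can look like in code: each one operates something, observes a result, and applies a decision rule. Python is used purely for illustration; `order_total`, the helper names, and the file names are hypothetical and do not come from the interview.

```python
import os
import tempfile


def order_total(prices_in_cents):
    """Hypothetical product function that the check operates."""
    return sum(prices_in_cents)


def check_output(actual, expected):
    """Decision rule: pass only when the observed output equals the expected one."""
    return actual == expected


def check_files_present(root, required_paths):
    """Return the required paths that are missing under root (empty list means pass)."""
    return [p for p in required_paths
            if not os.path.isfile(os.path.join(root, p))]


# Output check: compare a calculated result with an expected value.
output_ok = check_output(order_total([250, 199]), 449)

# File check: build a throwaway directory that has one of two required files.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "config.ini"), "w").close()
    missing = check_files_present(root, ["config.ini", "license.txt"])
```

Both checks produce a verdict a machine can compute. Neither one can say whether 449 cents was the right thing to charge or whether `license.txt` is a file that matters; that evaluation stays with a person.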
Checking can be really powerful, especially when it is focused on program logic and lower-level functionality. Programmers who develop checks along with their code can get fast feedback as they’re developing and changing the product. When a product is built with interfaces that can be driven by code, we can check business rules and exercise lots of variations with scripting. I’m not so sure about GUI checking, though.
LG: Why not test through the GUI?
MB: We can test at the GUI; that is, we can evaluate, learn, explore, and experiment at the GUI. We can check at the GUI too. But would GUI checking be worthwhile? Remember, GUI stands for Graphical User Interface.
GUI checking gets a lot of attention, perhaps because it’s dazzling. People are fascinated by watching the screens go by as the machine presses its own buttons faster than any human could.
But even though the machine may perform certain behaviours quickly, the scripts can take a long time and a lot of effort to develop and maintain. Checking at the GUI level can be really fussy. A GUI is set up for a human user, not so much for machinery. Machinery doesn’t deal very well with ambiguity, so it needs very specific instructions to find the right elements on the screen and to handle them in the right way. Machinery doesn’t deal with unanticipated changes very well either, so when you change something about the UI, you often have to change the checking code too. Now, you can make GUI checking a lot easier and less fragile by designing and building and maintaining the product to be testable. It’s important for testers to advocate for that kind of testability if you’ve decided that automated GUI checking is worthwhile.
Humans repair imprecision and ambiguity quickly and effortlessly, often without even noticing it. So machinery can be sensitive to subtle or invisible changes that people might not notice, but it might also be sensitive to problems that don’t matter to humans. GUI checking tends to go better when your application is stable and doesn’t change much, but if that’s the case, why bother? There may be faster and cheaper ways to stay alert to trouble, or to avoid it altogether. Worried about broken links in your Web-based product? You could automate actions at the GUI, and try to make the machine behave like a fast, precise, robotic, but not very observant human. On the other hand, you might want to run a static link checker on each build, or build one into the program code. Are you worried about links that aren’t broken, but that are pointing to the wrong page? You could program that from the GUI, or underneath it. Either way, you’ll need to program checks that can reliably tell the difference between the right page and the wrong page. Yet even if you’ve got a page that’s right in some sense, might there be something extra or missing on the page?
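A static link check of the kind mentioned above might be sketched like this, assuming the site's pages are available as a mapping from path to HTML at build time. The `site` data and the function names are hypothetical; the parsing uses Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def broken_links(pages):
    """pages maps a site-relative path to its HTML.

    Returns (page, href) pairs for internal links whose target is not a known page.
    """
    broken = []
    for path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            if href.startswith("/") and href not in pages:
                broken.append((path, href))
    return broken


# Hypothetical two-page site with two dangling internal links.
site = {
    "/": '<a href="/cart">Cart</a> <a href="/help">Help</a>',
    "/cart": '<a href="/">Home</a> <a href="/checkout">Pay</a>',
}
result = broken_links(site)
```

This sketch only reports links that point nowhere. A link that resolves to the wrong page passes, and so does a page with something extra or missing on it, which is exactly the gap described above.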
Do you want to write code that actually walks through the site? You might, but would the value of that approach be worth the cost? It takes skilful people with both programming skill and testing skill to do all this really well; and it takes a pretty stable product.
Sometimes we can fool ourselves with tools, too. Let me give you an example. A few years ago I was buying something on the Web. There was a button on the checkout page that allowed you to sign up for PayPal after purchasing an item. There was an HTML <div> element positioned on top of the button, such that when I clicked, nothing happened unless I clicked precisely on the top or bottom edge of the button. I wondered whether a tool would have the same problem as I did. Back then I wrote a script using Ruby and Watir. The script was able to click the button just fine. I tried again more recently with Selenium 3, and the button click worked fine there too. The tools were apparently able to see the fraction of the button that was still clickable. So it’s possible for a check to miss a usability problem that a human would recognize easily while interacting directly with the product.
LG: Is this not a problem within the tool? That you don’t know exactly how it works?
MB: The problem is not with the tool as such, but with trust in the tool, trust that might be misplaced. It’s tempting to think that a tool will act like a really fast, really precise human being, but it doesn’t. It’s different in ways that are sometimes subtle. If you’re not alert to their limitations, tools can help you to go to sleep. I try to stay worried about that. We need tools to extend our reach. If we fail to use tools, we’ll miss things. But if we simply delegate our interactions and observations to tools, we’ll miss other things. So we must remain vigilant.
LG: The knowledge of a test object will disappear when you leave it to a tool, is that the idea?
MB: Your observation may become sufficiently focused that some things disappear, yes. Think of a telescope. A telescope lets you see things that are very far away, but when you’re using the telescope, your field of view is narrow. Binoculars give you a wider, 3-D field of view, but they don’t magnify things as much as telescopes do. Your naked eye will let you see things around you that you’ll miss while you’re staring through the binoculars, and so forth.
When we instrument our observation with a tool, we’ll notice some things and be oblivious to others. So it’s important, I think, to diversify, and to look at our product in many ways. We won’t notice the gaps if we don’t look for them. When a tool is performing thousands of checks a minute, it’s impressive. It’s dazzling. It’s easy to forget what the tool might not be doing.
There’s another kind of cost to think about, by the way: transfer cost, the time or effort it takes to transfer knowledge from one person to the next. If your check code is clearly written and structured, or if people are working in close collaboration, transfer cost tends to be lower. But when we’re coding in a rush, alone, we might not place enough emphasis on readability and maintainability. If you’re writing checks or building tools that you don’t intend to throw away, try explaining them to other people, so they’ll be able to take advantage of what you’ve done.
LG: Do you see applications creating and executing tests on themselves?
MB: Most application code does checking to some degree. I’d say any kind of internal error checking, consistency checking, checksums, timeout checking, or data validation is self-checking in the sense that you’re asking about. That kind of stuff has been around in programs forever, of course. But there is a catch: that kind of code can only deal with conditions that it’s been programmed to deal with. If an internal check isn’t there at all, we may get data corruption or a crash or an inaccessible feature. If the check is there, the program might be able to handle a problem… or we might get an unhelpful error message.
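One small illustration of such an internal check is a checksum guarding a stored record. This is a sketch under assumptions: the record layout (payload followed by a four-byte CRC-32) and the function names are invented for the example, and `zlib.crc32` comes from Python's standard library.

```python
import zlib


def pack(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unpack(record: bytes) -> bytes:
    """Return the payload if its checksum matches; otherwise raise ValueError."""
    if len(record) < 4:
        raise ValueError("record too short to contain a checksum")
    payload, stored = record[:-4], int.from_bytes(record[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: record is corrupted")
    return payload


record = pack(b"order 42")
round_trip = unpack(record)

# Simulate corruption by changing the first byte of the stored record.
corrupted = b"X" + record[1:]
try:
    unpack(corrupted)
    corruption_detected = False
except ValueError:
    corruption_detected = True
```

The check handles the one condition it was programmed for. A record that was packed correctly but contains the wrong order number passes without complaint, and the wording of the error message is still a decision someone has to make with the user in mind.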
I got this wonderful error message from an application once. The title on the caption bar was just ‘DLL’, so I had no immediate idea where the message was coming from. And the message was: ‘An attempt was made to load a program in an unsupported format’. Huh? What kind of attempt? What program? What ‘unsupported format’, and what would be a supported format? And then there was only one button available: ‘OK’. So, how is that supposed to help me?
I don’t know how that program was developed or tested, but it doesn’t look as though someone with a critical mind asked ‘Is this going to help a user or is it just going to annoy and frustrate them?’. Machinery isn’t able to ask or answer that question, and won’t in the foreseeable future. Machinery doesn’t have social awareness; it can’t decide whether a human will be happy or upset.
LG: Machinery cannot deal with emotions.
MB: Right, it can’t deal with emotions, and it doesn’t have emotions. Through evolution, Mother Nature has provided us with emotions that give us bad feelings when trouble is around. Emotions are powerful tools for a tester, when we pay attention to them! (I wrote about that here: http://www.developsense.com/blog/2015/09/oracles-from-the-inside-out-part-2/)
Now, we can check mechanically for specific conditions that might make someone upset. As a trivial example: we could infer that people will begin to get annoyed when they have to wait for the product to respond, seven seconds or longer, say. A check might help to alert us when seven-second delays are happening, and that can be useful.
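A check along those lines could be sketched as follows. The seven-second threshold is the figure from the example above; `timed_check` and the injected fake clocks are hypothetical, used here so the sketch runs without actually waiting.

```python
import time

SLOW_THRESHOLD_SECONDS = 7.0  # the delay at which we assume annoyance begins


def timed_check(operation, threshold=SLOW_THRESHOLD_SECONDS, clock=time.perf_counter):
    """Run operation and report (result, elapsed seconds, too_slow flag)."""
    start = clock()
    result = operation()
    elapsed = clock() - start
    return result, elapsed, elapsed >= threshold


# Fake clocks stand in for real time: the first call returns the start
# reading, the second call returns the end reading.
slow_ticks = iter([0.0, 8.5])
_, slow_elapsed, slow_flag = timed_check(lambda: "done", clock=lambda: next(slow_ticks))

fast_ticks = iter([0.0, 0.2])
_, fast_elapsed, fast_flag = timed_check(lambda: "done", clock=lambda: next(fast_ticks))
```

The flag tells us that a delay crossed the threshold, not that anyone was actually annoyed. The threshold itself is an inference a person made about other people.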
LG: Do you think machinery, like a robot, will be able to deal with human emotions in the future? For example, with the help of self-learning algorithms.
MB: Well, I read science fiction. Maybe in forty years we’ll have robots with feelings and empathy. But we’re living in today’s world, trying to deal with problems for today’s customers. I think we can deal with those problems extremely powerfully by taking our cognitive, emotional, intellectual capabilities and extending our capabilities with the power of tools.
And let’s remind ourselves that we, and not the tools, remain in control. This is a bigger deal than just in testing. I’m increasingly hearing about algorithms that decide what news you’re going to see on social media. Algorithms are applied to deciding who is going to stay in prison and who is to be released on parole. When people apply for a mortgage, algorithms are being used to decide whether they qualify or get rejected. Those algorithms come with biases, and they reinforce those biases. They’re not self-correcting.
Meanwhile, a human can spot things, serious personal problems or strengths, that algorithms can’t know about. I’m disturbed that algorithms are being used to replace the human element instead of supplementing it. An algorithm will never wonder if it can be wrong. It doesn’t exercise critical thinking or self-doubt. Machinery is not self-aware.
Part of our genius as people is our ability to use all of what our senses could be telling us, and then to think critically about how our conclusions might be wrong or inappropriate. What looks to be true might not be true. Being skeptical helps us to make progress. We’ve made great scientific progress in the last 300 years, since Galileo. Galileo showed in many ways how our models and our senses could fool us, and that really jump-started science in the West. But we also have to realize that science and instruments and algorithms and models can help us to fool ourselves too. Remember any interesting cases of polls being fooled lately?
When some people talk about ‘test automation’, they seem to push people away from the centre of the discussion. Tools absolutely help us, but tools do not DO the testing. So I think it would be helpful to move away from the idea of ‘automated testing’. Testing can’t be automated. Instead, let’s talk about automated checking: checking can be automated, as a potentially useful form of tool-assisted testing. Then let’s also think about all the other ways in which tools can help us.