Generally, I’m a big fan of writing test. They are great: once you add them, you cannot break logic silently anymore. The time you’ve spent on writing tests pays off completely in that manner you won’t need to fix production on Friday’s night. Tests cannot guarantee your code is free from errors completely, though. But having properly designed tests greatly reduces the number of potential mistakes.

For a long time, I’ve been developing only server-side tests whereas the UI behavior was tested manually. That has been worked until the moment when we understood we break the UI too often. That’s how the question about having UI tests has appeared.

TL;DR: I didn’t anticipate maintaining client-side tests (also known as integration test) would be so costly. Yet I managed to get stable UI tests, it took long time of debugging, reading the source code of webdrivers and Google/StackOverflow surfing. Here are some random thoughts on the subject.

UI tests are complex

The first issue you might face when dealing with UI tests (and it has never been written clearly in any documentation or manual) is they are unstable and flaking by their nature. In technical terms, they are fragile and difficult to maintain. Testing UI involves lots of technologies and protocols. Under the hood, there is an entire stack of nested calls. Let’s consider a typical PHP/JS-driven web-application with Selenium-based tests. Now follow the testing pipeline.

Before you run a test, you should launch the application server (probably your REST backend) and Selenium server. Then, once you’ve started a test, it sends request to Selenium to start a new browser session. Java spawns a new process of webdriver – a special program that helps to automate browsers. The driver tries to discover a browser installed on your machine. If everything is alright, the browser process starts. Just a note: you have not done anything important yet, but there are four processes are running already (the app, Selenium, webdriver, browser).

Now then, every time you call for something like

>>> driver.find_element_by_id("my_button")

, it turns into a stack with the steps look like approximately the following:

  1. your PHP/Python/whatever code calls Selenium server.
  2. Selenium remaps the incoming data and calls webdriver server.
  3. The webdriver reads JSON body and sends corresponding binary commands to the browser via socket channel.
  4. The browser performs those actions with some delay. There is no guarantee about timings, the numbers depend on your OS, number of cores, amount of open tabs and so on.
  5. If everything is alright, the data flows in the opposite way: from the browser to the webdriver, to Selenium and finally to the app.

Remember, it was only a simplest API call. More complicated API expand into series of calls and success only when all of sub-calls managed to finish successfully.

Versioning hell

What exactly may occur during the process? Well, everything.

The most obvious thing is the more software you are trying to work together the more unstable the system becomes. That might not be a problem when all the components are under your control. Say, if it was a micro-service architecture with each node developed by your people. But if fact, they all come from different companies with different points on the subject. Your testing library is written by some enthusiasts. Selenium is the second group. The browser is the third one and they do not listen to anybody, that’s for sure. Developers who have brought the webdriver are the fourth group.

Now imagine that every product has its own version, release schedule and roadmap. And the version 4.2.1 of this requires at least version 3.2.1 of that. In the release 5.4 the feature X doesn’t work, you should use 5.5-beta instead. And more many of that. These bits of information are never shared in the documentation, but only in comments on Github.

It reminds me The Planet Parade sometimes when the system performs only if some special version’s numbers are set together. Otherwise, it falls apart.

They run for too long

Another issue with integration tests that you might not be aware about is their long execution time. It becomes new to me after long period of maintaining server tests that used to pass in a minute or two. Even having a small set of integration tests, it would take much more time. If we care about running each tests independently without affecting them each other, you need to turn the app to its initial state. That involves resetting the database, logging out, wiping session, preparing local files and so forth. It takes sensible amount time all together. Be prepared your CI pipeline becomes more time-consuming.

Referencing a missing element

What I’d like to highlight more in that topic is a strange error named “element stale”. It will definitely rich you when dealing with single page application (aka SPA). It is so common that the official Selenium site has a dedicated page and an exception class named after if. The sad thing, those text is quite stingy on describing what could be the reason of getting such an error. And more important, how to prevent it appears. Hopefully, after long debugging sessions, I’ve got answers to both of those questions.

Inside browser, every DOM element has its own unique machinery-wise identifier. There is no a single standard of representing it. Depending on browser, you might get either 0.950889431100737-1 or c1ee22f1-b96e-5245-bd85-9e56e1781cbd for the same button. But it does not matter. What really does is it’s quite easy to refer an element that has been removed from the DOM tree when the app page has been re-rendered.

That’s the root of the problem: such modern frameworks as React, Vue or whatever else continuously update the page components inserting new HTML nodes. Even when you have a component with a button with unique “id” attribute and it has been re-rendered, an new DOM element instance will be created. From the browser’s prospective, it will the another node, completely different to the previous one.

Each modern testing framework has high-level functions to operate on page elements without knowing their IDs. Consider Python examples:

>>> driver.click_on_element("my_button")

In fact, this API call expands into at leas two low-level Webdriver API calls, namely:

  1. Try to discover an element ID by the search term.
  2. Try to click on that element.

Looks well, but here is what takes place under the hood. You’ve got a long ID of a link or a button, that’s great. But in a millisecond before you clicked on it, the UI has been re-rendered. The reason of that is not so important: maybe, some event has been triggered, or a user hovered some element when moving mouse to it. Or maybe, the framework’s internal algorithm just decided somehow the content should be updated. That means, the original button element you’ve got the ID for was detached from the DOM thee and moved into a temporary array or any other kind of data structure to wait before it is wiped completely from memory. Instead of it, a new DOM node that looks exactly the same took those place. But willing to click on the button, use send the outdated ID. There is no sense in clicking on such kind of elements that are almost dead. The browser returns corresponding error message that expands into Selenium exception.

I wish that paragraph was on those page. It would save me a week maybe or even more.

A good solution would be to implement some kind of transactions in your code. Let every complex API be implemented with try/catch clause with some sensible strategy to re-call it in a case of failure, say a number of times or by timeouts.

High Hopes

OK, let’s have some final thoughts on the subject, some kind of summary. Testing UI is really far from the server-side ideal world. They are unstable, difficult to develop and maintain. There a lots of undocumented traps you may suffer from during the journey. But still, there are definitely some good things happening nowadays.

First, the official Webdriver protocol is developing actively. This protocol is subject to substitute the outdated Selenium JSONWire protocol. It has become a part of W3C so I hope it will be developed so forth.

More Selenium-free libraries appear on Github: Nightwatch, Capybara, Puppeteer… That is great because having the only one monopolist in such specific area stops the progress.

But there are lots of things to be done. What I’d like to see at first place is some kind of cookbook with common advise and good practices on how to test UI without suffering. Those bits should not depend on specific language or library although there could be platform-specific sections. I have never met such kind of a cookbook across the Internet; please correct me if there are any of them.

When writing a library for browser automation, it would be great to not only cover the low-level calls but also design an abstract layer of high-level API to make it easy to use by those who are not familiar with all the issues I mentioned above. Say, a high-level function (or method if we talk on OOP terms) that clicks on a button should take into account the case when the element is stale. From my prospectives, such a functional language powered with immutable data structures and homoiconicity as Clojure (or any Lisp-family one) would be a great choice.

Ok, let’s finish for now. Thank you for listening out. Next time, I’ll say more on technical side of testing code.