The last two posts explained how to get Selenium up and running, and how to execute tests on one or more machines in parallel. The major drawback of those methods, however, was that each machine needed to be configured individually, which required either physical access to the machine or a remote desktop session. Once we need to manage more than a handful of machines, that approach becomes too time-consuming, and procuring the necessary hardware becomes expensive.
A very reasonably priced (or even free) alternative is to use the Amazon EC2 service. This post will explain how to set up a single EC2 instance, enable Selenium to run when the machine starts, and use a virtual display driver to launch the browsers in.
Start an EC2 Instance
1. An AWS account is required in order to create an EC2 instance. To sign up for a new account, go to http://aws.amazon.com, click Create an AWS account, and follow the instructions to create an account. Once an account is successfully set up, log into AWS Console, and click the EC2 link:
2. Once the Amazon EC2 Console loads, click the Launch Instance button. Select the Classic Wizard option, then click Continue.
3. On the Choose an AMI page, click the Select button next to Ubuntu Server 12.04.1 LTS.
Note: By default, the 64-bit version of the OS is selected. The 32-bit version may also be used; however, the number of available instance types will be reduced to those supporting 32-bit operating systems.
4. Select the appropriate Instance Type and click Continue. For purposes of this test, I am using the Micro instance, which is covered by the free tier. Choose an instance based on the number of simultaneous browsers that need to run on each machine.
5. Leave the settings on Advanced Instance Options and Storage Device Configuration pages as they are, and click Continue.
6. Enter a value for the Name key (e.g. selenium) and add any other key/value pairs if necessary, then click Continue.
7. A new Key Pair will need to be created if one has not been created before. Select the Create a new Key Pair option, and follow the instructions for creating and saving a key pair, then click Continue. Do not forget where the Key Pair file (.pem) is saved, as we will need it later for configuring the SSH connection. In my case, I have named my new Key Pair selenium.
8. A new Security Group will need to be created in order to allow incoming traffic on port 8080, as well as on ports 7054 and up. Enter a group name and description (e.g. selenium). By default, port 22 (SSH) will be added to the list of inbound rules. Add TCP port 8080 from source 0.0.0.0/0 (you can restrict the source to the IP of your machine if necessary). Add a TCP range for ports 7054-7153, which will allow up to 50 clients. Click Continue when finished.
Note: the number of ports required, starting at 7054, is 2x the number of clients (browser instances).
9. Review the instance details, editing anything that may not be correct, then click Launch when ready.
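The port range entered in step 8 follows directly from the "two ports per client" rule in the note. As a quick check, the arithmetic can be sketched in Python; selenium_port_range is a hypothetical helper name, and the rule itself is taken from the note above rather than from any Selenium documentation.

```python
def selenium_port_range(clients, first_port=7054):
    """Return the (first, last) TCP ports to open for a number of clients.

    Assumes two ports per client, starting at first_port, as described
    in the note for step 8.
    """
    if clients < 1:
        raise ValueError("clients must be at least 1")
    last_port = first_port + 2 * clients - 1
    return first_port, last_port


# 50 clients need 100 ports: 7054 through 7153.
print(selenium_port_range(50))
```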
Connecting to the Instance
After a short time, the instance should start and we should be able to connect to it. You can check instance status using the EC2 Console, and clicking the Instances link. Ensure instance state is running before attempting to connect.
AWS documentation has a great walkthrough on connecting to a Linux instance using either a Java-based SSH client or a standalone client, such as PuTTY. Follow the instructions in the walkthrough to connect to the instance we created earlier. Important: Walkthrough instructions use root username to connect. Ubuntu does not allow connecting as root, so you must specify ubuntu as the username instead. I’ve used both PuTTY and the Java-based client, and recommend the Java client as it does not require any form of setting up, unlike other standalone SSH clients.
Configuring the Instance
Once successfully connected to the instance, we can continue with the configuration of the instance. Initially we want to configure the instance to run Selenium Server using a virtual display driver. Once we have ensured Selenium Server runs properly, and that we can issue requests against it, we will configure the instance to run Selenium Server on machine start-up.
Installing Required Software
We need to install a few software packages to run Selenium Server properly. Run the listed commands on the console to install the required software.
1. Update the packages available on the system:
sudo apt-get update
2. Headless Java Runtime Environment, necessary for running Selenium Server:
sudo apt-get -y install openjdk-7-jre-headless
3. X virtual framebuffer (the virtual display driver), an X11 GUI server that does not require a display device. Allows us to run a browser without needing a desktop to run the browser on.
sudo apt-get -y install xvfb
4. X Server Core
sudo apt-get -y install xserver-xorg-core
5. Fonts required by X Server:
sudo apt-get -y install xfonts-100dpi xfonts-75dpi xfonts-scalable xfonts-cyrillic
6. Chrome and Chromedriver:
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
sudo apt-get update
sudo apt-get -y install google-chrome-stable
wget http://chromedriver.googlecode.com/files/chromedriver_linux64_23.0.1240.0.zip
sudo apt-get -y install unzip
unzip chromedriver_linux64_23.0.1240.0.zip
sudo cp chromedriver /usr/local/bin
7. Firefox (if desired, MUCH slower than Chrome):
sudo apt-get -y install firefox
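8. Selenium Server: download selenium-server-standalone-2.25.0.jar (the file name used in the commands that follow) into the ubuntu user's home directory, /home/ubuntu.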
Starting Selenium Server
Once everything is installed, we are ready to try testing Selenium Server. First, we need to create a new virtual display using xvfb, and configure it as the default display, then we can start Selenium Server and have it use the newly created virtual display:
1. Start xvfb:
Xvfb :99 -screen 0 1024x768x24 -ac >/dev/null 2>&1 &
2. Set the new virtual display as default:
export DISPLAY=:99
3. Run Selenium Server:
java -jar selenium-server-standalone-2.25.0.jar -port 8080 -maxSession 10
At this point in time, you should see console output similar to starting Selenium Server on a local machine. To test the server and ensure everything is working correctly, first we must return to EC2 Console and get the Public DNS of the instance. On the EC2 Console, select the instance we created earlier, and in the information panel located below the list of instances locate the Public DNS field. Copy the field value (in my case ec2-107-22-134-4.compute-1.amazonaws.com); it will be inserted into the controller.py script we created in the last post.
Note: Selenium Server takes a minute or so to fully start after the command is run. Make sure you wait until you see the following lines appear on the console, indicating that the server is fully started:
20:35:53.487 INFO - Started SocketListener on 0.0.0.0:8080
20:35:53.490 INFO - Started org.openqa.jetty.jetty.Server@6c87ac60
Open the controller.py script and replace the hosts array entries with a single entry containing the Public DNS value copied above. Run the script, and make sure it executes correctly and outputs the number of search results returned by Google. If all went well, the script should execute exactly the same way as in the earlier setup.
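Because the server takes a minute or so to start, it can be handy to have the controller wait for port 8080 to accept connections instead of watching the console by hand. The following is a minimal sketch using only the Python standard library; wait_for_port is a hypothetical helper, not part of controller.py or Selenium, and an open port only shows that something is listening, not that a browser can actually be launched.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Poll host:port until a TCP connection succeeds or timeout elapses.

    Returns True as soon as a connection is accepted, False otherwise.
    """
    deadline = time.time() + timeout
    while True:
        try:
            conn = socket.create_connection((host, port), timeout=interval)
            conn.close()
            return True
        except (socket.error, socket.timeout):
            # Not listening yet (or not reachable); retry until the deadline.
            if time.time() >= deadline:
                return False
            time.sleep(interval)


# Example (uses the Public DNS value from the EC2 Console):
#   if wait_for_port("ec2-107-22-134-4.compute-1.amazonaws.com", 8080):
#       print("Selenium Server is accepting connections")
```

If the call returns False, check that the Security Group allows port 8080 and that the instance state is running.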
Running Selenium Server On Start-Up
Now that we have a working instance of Selenium Server, we would ideally like to start it automatically, to avoid having to log into every machine and start it by hand. Fortunately, Linux makes running scripts at start-up easy. We can add the commands required to start xvfb and Selenium Server to /etc/rc.local.
1. Open /etc/rc.local for editing using nano (or your favourite editor):
sudo nano /etc/rc.local
2. Enter the following lines before the last exit 0 line. Note: the java command must be entered as a single line.
Xvfb :99 -screen 0 1024x768x24 -ac >/dev/null 2>&1 &
export DISPLAY=:99
java -jar /home/ubuntu/selenium-server-standalone-2.25.0.jar -port 8080 -maxSession 50 -Dwebdriver.chrome.driver=/usr/local/bin/chromedriver > /home/ubuntu/selenium.log &
The above commands are very similar to the ones we used to launch xvfb and Selenium Server earlier, with two slight alterations: full paths are used because the script runs before the PATH variable is initialized, and an & is appended to start the process in the background.
3. Press Ctrl+X to exit nano and Y when prompted to save changes.
4. Restart the server by running sudo shutdown -r now. Once the server has restarted, run controller.py to ensure Selenium Server started successfully. Note: In a default setup, Amazon EC2 instances do not have permanent public IPs or DNS hostnames. Every time an instance starts, it is assigned a new Public DNS name, which may differ from the one it was previously assigned. Always verify the hostname using the EC2 Console. If there are any errors, check /home/ubuntu/selenium.log for logging information on the last execution of Selenium Server.
Multiple EC2 Instances
At this point we should have a single fully functioning instance with Selenium Server starting at boot time. We need to save the instance configuration as an AMI (Amazon Machine Image), which will allow us to clone the original instance without having to go through the above steps for every machine. Note: Keeping an instance alive, whether it is running or stopped, and storing an AMI are not free. Amazon charges a set rate for each GB of storage used. See the Amazon EC2 Pricing page for full details, or the pricing summary later in this post. If you do not want to be charged for anything, make sure you terminate the instance and delete any AMIs and EBS snapshots/volumes.
Create an AMI
1. Open the EC2 Console and switch to list of instances. Right click on the instance you created earlier and click Create Image (EBS AMI):
2. Enter an Image Name for the AMI, and click Yes, Create. The default options do not need to be changed.
3. In the EC2 Console, switch to AMIs page. Once the AMI is ready, the status will be set to Available.
4. Right click on the AMI once it is ready, and click Launch Instances. Specify the Number of Instances required and select the Instance Type, then click Continue. Follow the steps used earlier to create the instance, making sure the right Security Group is selected. To make identifying instances easier, use the same name for every instance (e.g. selenium). Note: It will take a few minutes before the AMI is fully created. Make sure not to terminate the instance the AMI is being created from during this time.
Once our instances are running, we need to modify controller.py and add logic for sending requests to all available instances. Amazon provides APIs to access AWS information, including a list of running instances. Boto is a Python interface to Amazon Web Services that we will use to get the necessary information:
1. Install boto on the machine where the controller runs:
pip install boto
2. Add an import to top of controller.py for the required EC2 Boto object:
from boto.ec2.connection import EC2Connection
3. Locate the AWS Access Key and Secret Key in the AWS control panel. From any AWS page, click your name at the top, then click Security Credentials:
On the Access Credentials page, click the Show button under Secret Access Key heading to show the key.
4. Set the Access Key and Secret Key in controller.py (add to top of file below the import statements):
aws_access_key = '********************'
aws_secret_key = '****************************************'
5. Add the code to retrieve a list of Public DNS hostnames of all running instances with the name selenium (the name part of the query is only required if you have instances not devoted to Selenium testing that you want to filter). The following code replaces the earlier hosts declaration:
ec2conn = EC2Connection(aws_access_key, aws_secret_key)
reservations = ec2conn.get_all_instances()
hosts = [i.public_dns_name + ":8080"
         for r in reservations
         for i in r.instances
         if i.state == u'running' and i.key_name == u'selenium']
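First we create a new EC2 connection using our access key and secret key. Then we get a list of reservations; in our case, each reservation should contain a single instance. Next, we use a list comprehension to select every instance in the list of reservations that is running and whose key_name is selenium. Note that key_name is the name of the instance's key pair rather than its Name tag; the filter works here because both were set to selenium earlier.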
6. Modify controller.py to use Chrome. As mentioned earlier, Chrome is considerably faster and more efficient than Firefox. Locate the following line, and change FIREFOX to CHROME:
driver = webdriver.Remote(url, webdriver.DesiredCapabilities.FIREFOX)
Now we can run controller.py, and if all went well, the program should output the number of Google results to the console.
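If you would rather filter on the Name tag entered when launching the instance than on the key pair name, a helper along the following lines could replace the list comprehension in step 5. This is a minimal sketch under stated assumptions: selenium_hosts and hub_url are hypothetical helper names, each instance object is assumed to expose a tags dictionary (as boto's tagged EC2 objects do), and /wd/hub is the path on which Selenium Server normally serves WebDriver requests. How controller.py builds its url is not shown in this post, so hub_url is only illustrative.

```python
def selenium_hosts(reservations, name="selenium", port=8080):
    """Return 'host:port' strings for running instances whose Name tag matches."""
    hosts = []
    for reservation in reservations:
        for instance in reservation.instances:
            # Instances without tags are treated as having an empty tag set.
            tags = getattr(instance, "tags", None) or {}
            if instance.state == "running" and tags.get("Name") == name:
                hosts.append("%s:%d" % (instance.public_dns_name, port))
    return hosts


def hub_url(host):
    """Build the WebDriver endpoint URL for a 'host:port' string."""
    return "http://%s/wd/hub" % host


# With the connection from step 5 (assumed usage, not run here):
#   hosts = selenium_hosts(ec2conn.get_all_instances())
#   urls = [hub_url(h) for h in hosts]
```

Keeping the filter in a plain function makes it easy to test with stand-in objects, without starting real instances.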
At this point we should have a fully functioning environment in which we can execute one set of tests on as many instances as we require. Once our initial configuration of an instance is successful, we are only limited by Amazon as to how many instances we require. There is a short discussion about limits and pricing below.
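Since controller.py from the earlier post is not reproduced here, the following is only a rough sketch of how one task could be fanned out to every host in parallel. run_on_all is a hypothetical helper, and task stands in for whatever function opens a remote browser against a host and returns a result.

```python
import threading


def run_on_all(hosts, task):
    """Run task(host) in one thread per host and return {host: result}.

    If task raises, the exception object is stored as that host's result
    so that one failing instance does not hide the others.
    """
    results = {}
    lock = threading.Lock()

    def worker(host):
        try:
            outcome = task(host)
        except Exception as exc:
            outcome = exc
        with lock:
            results[host] = outcome

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Example with a stand-in task that just echoes the host name:
print(run_on_all(["host-a:8080", "host-b:8080"], lambda h: "ran on " + h))
```

One thread per host is the simplest model; with many hosts or several browsers per host, a bounded worker pool may be a better fit.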
Security of the solution we’ve designed is very limited. We can, and should, increase the security by only allowing requests from a certain range of IPs. If we need to secure the connection itself, we would need to set up an SSH tunnel or similar, which, while entirely possible, is out of scope of this discussion. There is plenty of room for other improvements: additional browsers can be installed on each instance; the Python script can be converted to most major languages, including .NET, Java, Ruby, etc.; the threading model can be improved; screenshots can be taken at certain points in the test process, and so on. The post should provide a developer with enough information to accomplish the most difficult step, the initial configuration; the rest is up to the individual.
The comparison in the chart below shows Chrome clearly winning over Firefox in the number of concurrent tests that can be run on a single instance. The micro instance is not suitable for running more than 2-3 tests, as it only has up to 2 cores available, 1 of which is meant for short bursts of processing. The High-CPU instances are better suited, with the medium instance having 2 virtual cores with 2.5 EC2 Compute Units each, and the extra-large instance having 8 virtual cores with 2.5 EC2 Compute Units each (the XL being 4x the price of the medium for on-demand instances). The High-CPU Medium instance is only recommended for up to around 10 simultaneous tests, while the XL should reliably support 20-30 concurrent tests.
Amazon EC2 Pricing page provides full details on the pricing scheme. Here is a quick summary of ESTIMATED pricing amounts for a full test solution. Actual costs will vary based on the number and type of instances started:
1. Free Tier – Amazon provides a free tier, which can be used with Micro instances. The free tier is useful for initial configuration and limited testing, as the Micro instance provides very limited resources. You can launch multiple Micro instances under the free tier, provided the total number of hours all instances run each month does not exceed 750. Note: You may still get charged for other services under the free tier, such as EBS storage (an AMI snapshot and individual instance EBS volume), or data transfer overages.
2. On-Demand Instance Pricing – On-demand instance pricing varies depending on the type of instance required. Check the EC2 Instance Types page for information about each instance type. The High-CPU Medium and Extra Large instances are likely the ones that will be used the most for testing, and they cost $0.165/hr and $0.660/hr respectively. The High-CPU Medium instance has 1.7GB of memory and 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each), while the Extra Large instance has 7GB of memory and 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each). Amazon prices in full-hour increments; billing begins when the instance is launched and ends when the instance is terminated. The default instance limit imposed by Amazon is 20; however, Amazon can be contacted with a request to increase the limit, and they should approve the request within a short amount of time.
3. EBS storage – Amazon charges $0.125 per GB per month of data stored as snapshots (i.e. for the AMI). Amazon will continue charging until the AMI is deleted. Amazon will also charge $0.10 per GB per month of provisioned EBS storage (if a volume is provisioned with each instance), and $0.10 per 1 million I/O requests.
4. Data Transfer – The first 1 GB/month is free. Overages are charged at $0.120 per GB.
5. Spot Instance Requests – Amazon also allows bidding for unused EC2 capacity, which usually results in lower-cost instances. Instances are charged at the current spot price, up to the specified maximum. Amazon will only service the request for instances if it can start all of the requested instances. Spot instances will run until they are terminated, or the current spot price exceeds the maximum spot price specified in the request, making spot instances risky for instances that need to remain running for extended periods of time. Current spot price is $0.018/hr for a High-CPU extra large instance.
Spot instances can be a considerably cheaper alternative to on-demand instances. I have run 3 different spot instances (Small, High CPU Medium, and High CPU XL) for the price of $0.01, $0.02, and $0.07/hr. On-demand instances would have cost $0.080, $0.165, and $0.660/hr. These are considerable savings when taking into account that a stress test would require 20 or more instances.
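To put these numbers together, here is a small back-of-the-envelope calculation. It is only a sketch: the function names are hypothetical, the hourly rates are the ones quoted in this post (they change over time), the figure of 10 tests per High-CPU Medium instance comes from the recommendation above, and EBS storage and data transfer charges are left out.

```python
import math

# Hourly rates in USD for a High-CPU Medium instance, as quoted in this post.
ON_DEMAND_MEDIUM = 0.165
SPOT_MEDIUM = 0.02


def instances_needed(concurrent_tests, tests_per_instance):
    """Number of instances required, rounding up to a whole instance."""
    return int(math.ceil(concurrent_tests / float(tests_per_instance)))


def estimate_cost(instances, hours, hourly_rate):
    """Instance cost in USD, billing each started hour as a full hour."""
    billed_hours = int(math.ceil(hours))
    return instances * billed_hours * hourly_rate


# Example: 200 concurrent tests for 1.5 hours, 10 tests per instance.
count = instances_needed(200, 10)
print("instances: %d" % count)
print("on-demand: $%.2f" % estimate_cost(count, 1.5, ON_DEMAND_MEDIUM))
print("spot:      $%.2f" % estimate_cost(count, 1.5, SPOT_MEDIUM))
```

Spot prices fluctuate, so the spot figure should be read as what the run would have cost at the rate I happened to pay, not as a guaranteed price.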