As part of my work, I studied the James 3.5 distributed version. To determine how many concurrent users it can support on the existing hardware configuration, I ran a stress test against it with JMeter.
1. Hardware
Hardware 1: two physical machines hosting seven virtual machine nodes; each node has 8G RAM / 4 cores and local disk.
Hardware 2: six Aliyun ECS instances (ecs.hfg6.large), 2 cores / 8G RAM.
2. James Setup
To better understand James, I compiled it from source and deployed it as a traditional Java application rather than using Docker.
2.1 Step 1: download the source code
Clone the James master branch locally:
git clone https://github.com/apache/james-project.git
2.2 Step 2: compile and configure
Run the command mvn package; after about two hours, the compilation completes successfully.
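If you only need the packaged artifacts and want to shorten the two-hour build, the test phase can usually be skipped with a standard Maven flag (a sketch, not the project's official build instructions):

```shell
# skip unit tests to shorten the long build (standard Maven flag)
mvn package -DskipTests
```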
2.2.1 Copy the build artifacts under server/container/guice/cassandra-rabbitmq-guice/target to a directory (assuming it is copied to the dist directory):
#copy james
cp server/container/guice/cassandra-rabbitmq-guice/target/james-server-cassandra-rabbitmq-guice.jar dist/
cp -R server/container/guice/cassandra-rabbitmq-guice/target/james-server-cassandra-rabbitmq-guice.lib dist/
2.2.2 Copy the cassandra-rabbitmq configuration files to the dist directory
cp -R dockerfiles/run/guice/cassandra-rabbitmq/destination/ dist
2.2.3 Copy the CLI to dist
#copy cli&lib
cp -R server/container/cli/target/james-server-cli.lib dist/
cp server/container/cli/target/james-server-cli.jar dist/
2.2.4 Modify run_james.sh so that its content is as follows:
java -Dlogback.configurationFile=conf/logback.xml -Dworking.directory=./ $JVM_OPTIONS $GLOWROOT_OPTIONS -jar james-server-cassandra-rabbitmq-guice.jar
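The script above references $JVM_OPTIONS and $GLOWROOT_OPTIONS, which can be set before launching. For example (heap values here are illustrative and should be tuned to your nodes, not a recommendation from the James project):

```shell
# illustrative heap settings for an 8G node; adjust to your hardware
export JVM_OPTIONS="-Xms2g -Xmx4g"
echo "$JVM_OPTIONS"
```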
2.2.5 Start James
After adjusting the Cassandra / Elasticsearch / RabbitMQ settings in conf, execute run_james.sh and James will start.
If there is a problem in the startup process, you can perform the corresponding troubleshooting according to the error prompt.
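Before the stress test, the accounts that JMeter will log in with must exist in James. A sketch of provisioning them with the bundled CLI, assuming the default CLI port 9999 and example domain/user names:

```shell
# provision a test domain and account with the bundled James CLI
# (port 9999 is the default; domain and user names are examples)
java -jar james-server-cli.jar -h 127.0.0.1 -p 9999 adddomain example.com
java -jar james-server-cli.jar -h 127.0.0.1 -p 9999 adduser user1@example.com password1
java -jar james-server-cli.jar -h 127.0.0.1 -p 9999 listusers
```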
The above steps can be completed at one time by executing the following script:
#copy james
cp server/container/guice/cassandra-rabbitmq-guice/target/james-server-cassandra-rabbitmq-guice.jar dist/
cp -R server/container/guice/cassandra-rabbitmq-guice/target/james-server-cassandra-rabbitmq-guice.lib dist/
#copy cli&lib
cp -R server/container/cli/target/james-server-cli.lib dist/
cp server/container/cli/target/james-server-cli.jar dist/
#copy conf files
cp -R dockerfiles/run/guice/cassandra-rabbitmq/destination/ dist/
3. Test Process
3.1 JMeter Setup
To simulate realistic user access, I used JMeter to run stress tests at 50 and 100 concurrent users. The steps are as follows:
3.1.1 Configure JMeter
3.1.1.1 Create Mail Reader Sampler
Right-click the thread group and add a Mail Reader Sampler.
3.1.1.2 Configure mail server information
After adding, fill in the mail server information in the corresponding position, as shown in the figure below.
Note: parameters such as server host / port / username / password can be parameterized so that each thread takes different values.
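One simple way to parameterize these fields is a CSV file read by JMeter's CSV Data Set Config element. The script below generates 100 hypothetical accounts; the user/password pattern is an assumption and must match the accounts actually provisioned in James:

```shell
# generate users.csv: one "username,password" pair per line (example values)
rm -f users.csv
for i in $(seq 1 100); do
  echo "user${i}@example.com,password${i}" >> users.csv
done
head -n 1 users.csv   # first generated line
```

In JMeter, point a CSV Data Set Config at this file with the variable names username,password, then reference ${username} and ${password} in the Mail Reader Sampler fields.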
3.1.1.3 Configure user concurrency
Select the Thread Group node and configure Number of Threads (users), Ramp-Up Period (seconds), and Loop Count.
Now, JMeter's mail reading configuration is complete.
3.1.1.4 Save Configuration
Save the test plan as stress.config.jmx for subsequent use.
3.2 Run the test
With the preparation from the previous section complete, the simulation is 50 concurrent users, with all 50 requests sent within 10 seconds; the stress test loops 40 times.
The stress test can be started directly from the GUI (although this is strongly discouraged); it is better to close the GUI and start the test from the command line. The startup command is as follows:
jmeter -n -t stress.config.jmx -l testlog.csv
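The same non-GUI run can also generate JMeter's HTML dashboard report, using the standard -e/-o flags (note that the output directory must not already exist):

```shell
# non-GUI run that also writes an HTML report into ./report
jmeter -n -t stress.config.jmx -l testlog.csv -e -o report/
```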
3.2.1 Test scenarios
The following two groups of tests were carried out.
Test scenario 1: the number of inbox messages of users in James is 0
Test scenario 2: the number of inbox emails of users in James is 100+
In each of the above two scenarios, run the JMeter stress test from Section 3.2 to measure James's performance under 100 concurrent users.
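For scenario 2, each inbox must first be pre-filled with 100+ messages. One way to seed them is curl's SMTP support; the hostname and addresses below are assumptions and should match your deployment and provisioned accounts:

```shell
# seed the inbox with 100 messages over SMTP (curl supports smtp://)
printf 'Subject: seed mail\r\n\r\nhello\r\n' > seed.eml
for i in $(seq 1 100); do
  curl -s smtp://james.example.com:25 \
       --mail-from sender@example.com \
       --mail-rcpt user1@example.com \
       --upload-file seed.eml
done
```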
3.2.2 Hardware configuration 1 test results (self-built virtual machines, 100 concurrent users)
3.2.2.1 When there is no mail in the inbox
3.2.2.2 When there are 100+ messages in the inbox
3.2.2.3 Test result analysis of hardware configuration one
The results show a very high error rate whether the inbox holds one message or 100+; with 100+ messages the error rate exceeds 80%, which is clearly unacceptable.
3.2.3 Hardware configuration 2 test results (Aliyun ECS)
To determine whether the bottleneck was James or the hardware, I purchased six ECS virtual machines on Aliyun, each configured with 2 cores / 8G RAM.
3.2.3.1 When there is no mail in the inbox
3.2.3.2 When there are 100+ messages in the inbox
3.2.3.3 Test result analysis of hardware configuration 2
Compared with hardware environment 1, hardware configuration 2 shows greatly improved TPS and error rate. This suggests that my self-built environment may be poorly tuned: network or disk I/O could explain the very high error rate, the very low TPS, and average response times of 20s+. Next, I will tune hardware configuration 1 further to confirm the cause.
4. Summary
Even on the low-spec Aliyun environment, performance is many times better than on the self-built virtual machines: the error rate drops to 0.x% and TPS reaches 10/s.
Although the average response time is still about 10s, I believe the results would improve further with a better machine configuration: with higher specs, James could reach 6000+ req/s, average response times within 100ms, and an even lower error rate.
Many thanks to Tellier (https://github.com/chibenwa) for his help throughout the whole testing process.