Skip to content
This repository has been archived by the owner on Nov 29, 2018. It is now read-only.

Selenium 2.0 RC2 gets constantly slower during test suite #1854

Closed
lukeis opened this issue Mar 2, 2016 · 49 comments
Closed

Selenium 2.0 RC2 gets constantly slower during test suite #1854

lukeis opened this issue Mar 2, 2016 · 49 comments

Comments

@lukeis
Copy link
Member

lukeis commented Mar 2, 2016

Originally reported on Google Code with ID 1854

With Selenium 2.0 RC2 the tests seem to be noticeably slower than with Selenium 2.0
b3. Additionally the performance seems to degrade during the run of the test suite.
The browser memory usage increases constantly.

What steps will reproduce the problem?
1. Start a test suite based on the Selenium-API which run at least some minutes. I
don't known, if it is important: My tests do not close the browser window between the
execution of the tests.

What is the expected output? What do you see instead?
My tests run fine (and fast) with Selenium 2.0 b3. With Selenium 2.0 RC2 the tests
getting slower and slower and the browser memory usage increases. Finally the browser
windows closes and Selenium does not open a new one. So the the tests run up to some
point and all subsequent tests fail, due to the closed browser window.

What version of the product are you using? On what operating system?
Selenium 2.0 RC2, on Windows XP SP3, with Firefox 3 & 4 and Chrome 12.

Please provide any additional information below.
For me it looks like some kind of memory leak.

Reported by markus@markusheiden.de on 2011-06-15 16:09:29

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Sounds eerily similar to the issue I reported with HtmlUnit (1777).

Reported by gmike1 on 2011-06-15 16:36:58

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Yeah we have this same problem.  We have a long running test that usually takes about
2 hours.  After 7 hours it still doesn't complete so we end up having to cancel the
test.  Something is definitely up

Reported by japearson@agiledigital.com.au on 2011-06-17 00:44:18

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Just in-case it's not clear from this ticket, this relates to Selenium Server.  I have
no idea if this affects the non-RemoteWebDriver selenium.

Reported by japearson@agiledigital.com.au on 2011-06-17 00:55:29

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

We have had to implement closing the browser (driver.quit) after each test case when
using RemoteWebDriver (with both desired capabilities of internetExplorer and iphone).
 Otherwise the increased memory usage eventually crashes/hangs the browser.  We do
NOT see this issue with non-RemoteWebDriver test cases.

Reported by mclean.bryce on 2011-06-17 15:00:58

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I forgot to mention, that I use Java Selenium RC and the Selenium standalone server.
On our tests servers we got OutOfMemoryExceptions so the memory leak may be in the
java code.

Reported by markus@markusheiden.de on 2011-06-17 18:12:08

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I'm observing similar thing, opened this ticket long time ago http://jira.openqa.org/browse/SRC-732
, might be related

Reported by shurikk on 2011-06-17 18:37:16

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

This simply makes an upgrade to Selenium 2 impossible at the moment.

Our setup:

Plain Selenium API (no webdriver or webdriverbackselenium), Selenium RC, JAVA, Firefox
3.x, no OOM recieved (we might simply not hit the roof before the tests finish though).

Before rc2-jar: ~ 1h 50m.
After rc2-jar: ~6h.

So roughly 3x the time on all tests. The tests result itself is same as always, meaning
it doesn't seem to be affected by the slowdown at all, it just takes longer.

Reported by manadrainer on 2011-06-20 06:57:06

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Is this issue with Firefox 3 too? FF4 leaks memory like crazy and will not handle 2
hours in Ine browser, with or without selenium

Reported by kristian.rosenvold on 2011-06-20 12:10:14

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I have ~ 80 selenium tests that run within a suite (I also reuse the browser window
by resetting my app state - clearing some fields, dynamic refresh some components etc.)
to save time.  It was taking 8.25 minutes on my integration server, and 11.5 minutes
on my laptop with Selenium 2.0b3 (using Firefox 4 and webdriver).  With Selenium 2.0rc2
it is taking ~ 140 minutes on my laptop!!  I didn't look at memory consumption (I'll
launch it again tonight and watch this).  But I did notice a lot more activity in the
page going on.  For example if you have a selection drop-down with 50 items and you
want item 50, it manually drops through each one until it gets to the 50th one.  With
b3 if it was in the DOM it clicked it.  All of my tests and helpers are using xpath
lookup.

Reported by jadrake75 on 2011-06-20 14:11:56

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I just saw defect 185 - http://code.google.com/p/selenium/issues/detail?id=185

I know my helpers will do a lot of checks for elements (as well since my application
is GXT based I often need to wait for elements to appear so have waitFor() type loops
checking for the element, sleeping and checking again in a small loop.  Is it possible
that each of these "failed to find elements" is taking a screen shot?  that would certainly
leak a lot of memory (esp. if they do not release the file resources).  

Reported by jadrake75 on 2011-06-20 14:17:40

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I have observed the same behaviour. We launch new browsers for a group of tests and
tests get slower after 10-15 test have ran. It's particularly slow when doing waiting
for elements as per defect 185.

Reported by stan.wozniak on 2011-06-20 16:08:08

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Is there a way (short of debugging) to see if it is trying to capture screenshots and
how often this is occuring? (some sort of logger output).

Stangely enough I am actually seeing reasonable performance on my Linux server running
2.0rc2 and Firefox 4.0.1. My suite was taking ~ 8.25 minutes on 2.0b3 and it is now
taking 13.0 minutes with 2.0rc2 (note I have two failures which I am working on which
likely add a little overhead - call it max 1.0 minutes).  This system is relatively
low on memory (2.0GB ram + 6.0GB swap).  However it never peaked beyond 1.8GB even
though it is running MySQL, glassfish (with hudson + 3 versions of my GWT app) and
Selenium through the browser in a VNC Server session)

However on my windows 7-64bit system with 6GB of RAM, again using Firefox 4.0.1 and
2.0rc2 I am seeing drastically different times ranging from 18 to 23 minutes (with
100% passrates) vs. 11.5 minutes before (not sure what caused the 2 hr run last time).
 Memory is getting up there (5.5GB total consumed) but Firefox is never exceeding 400MB
and Selenium is peaking at around ~ 500MB.  

One thing I am noticing is with 2.0rc2 it seems some elements are randomly "highlighted".
 As an example, if I say go to this text field and clear the value, I'll see the table
that is about 50px below it get a row table header get highlighted (I am using GXT)
almost like a mouse was moved over that table row.  I am also seeing cases where with
drop downs for example once displayed it will go through each value to get the to value
specified in the xpath. These changes might account for the differences as in general
the data entry (which most of my tests are around) seem to be reacting much slower.
 Not sure if this is by design or not.

Reported by jadrake75 on 2011-06-21 17:43:14

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

It could be related to screenshots actually.  The selenium test that is running slowly
for us is a crawler. It goes through all the pages it can find on our site and takes
screenshots of every page.  It also uses the same selenium session though. So I cannot
say which one is causing the slow downs.

All our other selenium tests have short sessions and only take screenshots on error.

Reported by japearson@agiledigital.com.au on 2011-06-22 01:01:10

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I have set up a testcase to try to reproduce this and it appears to be related to something
your test does, it doesn't happen with my google case ;)

Can someone who is experiencing the slowdown over time do the following:

jps -l and identify process id of the "server standalone" process.

Run one test, then run "jstack <process id> >good.txt" afterwards. Run enough tests
go get a decent slowdown, then do "jstack <process id> >bad.txt". Attach these files
here please. You may also do a few "jstack command <process_id> >>whileRunning.txt"
while a sluggish test is running and attach it.

Reported by kristian.rosenvold on 2011-06-23 06:33:49

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Reported by dave.hunt on 2011-06-23 16:44:55

  • Labels added: Component-Grid, Component-RC

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

We saw this issue at Mozilla when upgrading to Selenium Grid 2. I was able to replicate
the issue with a simple test.

Setup 1:
Grid 1 hub running on 1.0.8
Grid 1 node running on 2.0b3
A test with 500 commands took an average of 15.878 seconds

Setup 2:
Grid 2 hub running on 2.0rc2
Grid 2 node running in RC mode on 2.0rc2
A test with 500 commands took an average of 74.832 seconds (almost 5x slower)

I also tried combining Grid 2 hub with Grid 1 nodes (2.0b3) and the same test took
on average 18.81 seconds.

Thanks Kristian for the steps to debug. I will take a look into generating something
today. As it is, we really can't use Grid 2.

Reported by dave.hunt on 2011-06-23 17:07:38

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I have attached the output from jstack for both hub and rc node, for the following:

* After 10 commands
* After 500 commands
* After 1000 commands
* While running slowly (during a 1000 command run)

Let me know if you need anything else.

Reported by dave.hunt on 2011-06-23 19:42:07


- _Attachment: [hub_after_10_commands.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-17/hub_after_10_commands.txt)_ - _Attachment: [rc_after_10_commands.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-17/rc_after_10_commands.txt)_ - _Attachment: [hub_after_500_commands.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-17/hub_after_500_commands.txt)_ - _Attachment: [rc_after_500_commands.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-17/rc_after_500_commands.txt)_ - _Attachment: [hub_after_1000_commands.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-17/hub_after_1000_commands.txt)_ - _Attachment: [rc_after_1000_commands.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-17/rc_after_1000_commands.txt)_ - _Attachment: [hub_while_running_slowly.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-17/hub_while_running_slowly.txt)_ - _Attachment: [rc_while_running_slowly.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-17/rc_while_running_slowly.txt)_

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Hello,

have you tried with just plain RC from 2rc2 ?
"I also tried combining Grid 2 hub with Grid 1 nodes (2.0b3) and the same test took
on average 18.81 seconds." seems to indicate it is a rc issue, not a grid issue.

thanks,
françois

Reported by francois.reynaud on 2011-06-23 19:51:15

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Dave, are you basically running single thread (i.e. one client, one active browser)
in this testcase ?

Reported by kristian.rosenvold on 2011-06-23 20:27:29

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Hey Kristian. Yes, to simplify reproducing the issue I ran one test in one RC node with
one browser environment.

Reported by dave.hunt on 2011-06-23 20:39:00

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Dave:

Could you try running Grid 1 with the 2.0 RC2 JAR?  I highly skeptical that this has
anything to do with grid and is actually an issue with the Selenium Server.  If you
see the problem with Grid 1 and a 2.0 RC2 server, we could verify or debunk that.

It probably makes sense to use the RC3 JAR, too, just to make sure you have the latest
available release.

Reported by nirvdrum on 2011-06-23 21:01:45

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Can I run Grid 1 with the RC2 JAR? As I mentioned I couldn't replicate the issue with
Grid 2 hub and Grid 1 nodes (2.0b3) so I am in agreement. The RC3 JAR has a (fixed)
NPE with launching the hub so I'm avoiding that atm.

Reported by dave.hunt on 2011-06-23 21:09:29

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

The thread dumps seem perfectly healthy to me, it looks like a healthy system. This
rules out the real baddies ;)

Next step is to run with jvisualvm attached to the hub and the rc. Just start "jvisualvm"
and attach to the rc and/or the grid process (maybe one at a time) while repeating
the same exercise. Look at the "monitors" tab.

I am especially interested if you manage to observe any corrolations between slowdowns
and "changes" in any of the monitors. All four displays are relevant and as long as
you're running a reasonably stable test load the curves should not change much over
time. The best way to see this is if you're able to apply a consistent sustained load
that just runs and runs.


Reported by kristian.rosenvold on 2011-06-23 21:16:25

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Okay, I'm getting somewhere! The results from jvisualvm were not concerning at all,
but AutomatedTester had the inspired idea that the issue may be in the atoms... 

We already had the thought that it may be related to specific commands, and I seem
to have narrowed it down to isVisible. I hope this helps! :)

Reported by dave.hunt on 2011-06-24 22:47:25

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Just to give some numbers - the test suite we have which contains 50+ test cases used
to take around 3 hours to complete on selenium-server-standalone-b3. With rc2, it's
now taking more than 16 hours to complete, which is unacceptable. Hence this is a serious
issue, and the priority should be increased to Critical.

Reported by anishsam2008 on 2011-06-25 02:42:39

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

While I appreciate the interest, the "my tests are slow, too" comments are no longer
helpful and we're amply aware of the severity of the issue.  Any profiling information
you have that may indicate the source of the problem would be very much appreciated,
however.  It looks like Dave and David have made good progress there.

Reported by nirvdrum on 2011-06-25 02:58:03

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I'm also quite certain the server is ok.

isVisible is definitely a candidate since we're not using any kind of isVisible logic
in our tests, and we are *not* experiencing any slowdown on our *huge* test suite.

(Technically, since we have "solved" the halting problem, we have very few "waiting"
issues other all over).

Reported by kristian.rosenvold on 2011-06-25 07:19:36

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Yes I would definitely think this is a possibility here as well.  My GXT/GWT
application test's use a lot of visibility checks as elements load
dynamically etc.

Is there a class we think could be responsible?  I use maven, but if I add a
class to my eclipse project and move it up in the classpath I can run with
that.

-Jason

Reported by jadrake75 on 2011-06-25 17:18:14

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Reported by nirvdrum on 2011-06-26 04:46:46

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Reported by nirvdrum on 2011-06-26 04:50:13

  • Labels added: Priority-Critical, Performance
  • Labels removed: Priority-Medium, Component-Grid

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I have had a look at isShown() method in dom.js as that is the command that was showing
the issue. I have created a simple tests and outputs the timedelta. I have attached
txt files with the results and a simple python script to replicate the issue.

isShown() doesnt appear to have changed so tried swapping the Closure library as that
has changed between b3 and Rc2. 

This is what I have got so far. I will continue looking at it.

Reported by david.burns@theautomatedtester.co.uk on 2011-06-26 04:58:42


- _Attachment: [b3.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-32/b3.txt)_ - _Attachment: [rc2.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-32/rc2.txt)_ - _Attachment: [rc3.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-32/rc3.txt)_ - _Attachment: [changinglibrary.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-32/changinglibrary.txt)_ - _Attachment: [changingcompiler.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-32/changingcompiler.txt)_ - _Attachment: [1854.py](https://storage.googleapis.com/google-code-attachments/selenium/issue-1854/comment-32/1854.py)_

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Reported by david.burns@theautomatedtester.co.uk on 2011-06-26 15:53:56

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I'd suggest a methodical approach to dealing with this. We know that rc2 ain't grand,
and that b3 was fine. I'd start by doing a binary search through the versions to try
and pinpoint what change started to introduce this problem (tediously long-winded though
it might be)

Can someone please try that out?

Reported by simon.m.stewart on 2011-06-27 12:36:36

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I am working on a proxy for WebDriver/WebElement that records the timings for each invocation
of everything, aggregates statistics and saves a CSV file. I'm planning to make it
capable of comparing N different runs.  Should be done before 4th of july ;)

Reported by kristian.rosenvold on 2011-06-27 12:41:48

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Looks like the issue even persists with rc3 and the tests run dead slow with IE8. Any
help appreciated.

Reported by javasparkle on 2011-06-27 15:45:43

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Reported by david.burns@theautomatedtester.co.uk on 2011-06-27 18:06:35

  • Status changed: Accepted

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

The issue came in between B3 and RC1. Unfortunately there are a lot of commits in that
range that we need to go through

Reported by david.burns@theautomatedtester.co.uk on 2011-06-27 20:23:36

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

It appears that r12294 is the culprit for this using my examples above.

Reported by david.burns@theautomatedtester.co.uk on 2011-06-28 17:17:57

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I am also seeing eye-gougingly slow performance with SeleniumServer 2.0rc3, but I must
ask: is it *really* necessary for Firefox to be checking the grammar and spelling of
the Command History?

Reported by rankincj on 2011-06-28 22:02:49

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I have applied r12653. I can't see what this line is doing and removing it causes performance
be flat instead of degrade.

Reported by david.burns@theautomatedtester.co.uk on 2011-06-28 23:10:06

  • Status changed: Started

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

In response to comment 41: Firefox will be checking the spelling of words by default.

Reported by simonstewart@google.com on 2011-06-29 02:27:14

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

This has been rolled out into the Mozilla grid and we have seen a marked improvement!

Reported by david.burns@theautomatedtester.co.uk on 2011-07-08 13:14:22

  • Status changed: Fixed

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

this fix actually breaks some test in IE and slow down the test as well, if the web
page contains frame or iframe. It comment out //win[this.uniqueId] = 1. This will cause
a frame to do modifyWindowToRecordPopUpDialogs on every test step, this will slow down
the test process, and also cause JS error when open new pages from a frame.

Reported by adwu73@hotmail.com on 2011-11-10 01:29:12

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

please can I have a reduced test case so I can reproduce our IE tests with frames all
pass at the moment 

Reported by david.burns@theautomatedtester.co.uk on 2011-11-10 08:53:14

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Tks for quick feedback, I will debug further and get back to you asap

Reported by adwu73@hotmail.com on 2011-11-10 23:32:37

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Hi David,

I don't have a reduced test but I can reproduce this on my web app very time. I read
through the related code, here is how the issues happen in my web pages. 

win[this.uniqueId] now is used for deciding whether the web page needs modification
and poll for load. 

In line 710 in the browserbot.js, win[uniqueId] is used to determine whether WindowsNeedModifying.

In line 1102, the win[this.uniqueId] is used to determine whether the page is polling
for load if the target is not a frame. If the target is a frame, frameElement["frame"+this.uniqueId]
will be used as a marker.

It seems that is marker is set in modifySeparateTestWindowToDetectPageLoads. In line
839, windowObject[this.uniqueId] = marker, the flag is set when the target is a window;
and in 831 & 835, frameElement["frame"+this.uniqueId] = marker; frameElement[this.uniqueId]
= marker; two flags is set when the target is a frame.

The error in my app is two folds. First, when my page has iframes, the frame is modified,
and the flag is set by line 835, frameElement[this.uniqueId] = marker. However, when
page is updated, WindowsNeedModifying is called, win[uniqueId] didn't detected the
flag. So the window has been modified twice which cause the JS error when opening new
windows within an iframe. I fixed this by adding an another flag set action after line
835: windowObject[this.uniqueId] = marker.

The second issue is caused by somehow after modifyWindowToRecordPopUpDialogs, but isPollingForLoad
return ture, modifySeparateTestWindowToDetectPageLoads never got executed, so the flag
is not set since line 346 is commented:  //win[this.uniqueId] = 1; I fixed this by
adding a flag set action within isPollingForLoad.

Now my app can work under Selenium after this two modifications, however, I am not
sure whether this is the right way to do, and whether this will bring other issues
in the future. So Dave, can you help me on this? Thanks!

From my point view, maybe it's a better idea to have two seperate flags for modifyPopup
and isPollingForLoad. What do you think? 

Best Regards!

Adam

Reported by adwu73@hotmail.com on 2011-11-14 00:29:13

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

I am using IE 8 on Win 7, by the way.

Reported by adwu73@hotmail.com on 2011-11-14 00:30:41

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

To fix the second issue, the fix actuals lies in modifySeparateTestWindowToDetectPageLoads,
not isPolloingForLoad, my mistake. Like below:

 var oldMarker = this.isPollingForLoad(windowObject);
    if (oldMarker) {

        //Added By Adam Wu in 2.3.2
        windowObject[this.uniqueId] = oldMarker;

        LOG.debug("modifySeparateTestWindowToDetectPageLoads: already polling this
window: " + oldMarker);
        return;
    }

Reported by adwu73@hotmail.com on 2011-11-14 04:08:24

@lukeis
Copy link
Member Author

lukeis commented Mar 2, 2016

Reported by luke.semerau on 2015-09-17 18:13:03

  • Labels added: Restrict-AddIssueComment-Commit

@lukeis lukeis closed this as completed Mar 2, 2016
@SeleniumHQ SeleniumHQ locked and limited conversation to collaborators Mar 4, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant