Opened 3 years ago

Closed 3 years ago

#32913 closed Cleanup/optimization (fixed)

test_glob inside utils_tests.test_autoreload.WatchmanReloaderTests is flaky

Reported by: Abhyudai Owned by: Nick Pope
Component: Testing framework Version: dev
Severity: Normal Keywords:
Cc: Tom Forbes Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The test sometimes raises an exception. The traceback on my local setup:

ERROR: test_glob (utils_tests.test_autoreload.WatchmanReloaderTests)                                                                                                                                                                                                          
----------------------------------------------------------------------                                                                                                                                                                                                        
Traceback (most recent call last):                                                                                                                                                                                                                                              File "../django/django/utils/autoreload.py", line 529, in update_watches                                
    self._update_watches()                                                                                                                                                                                                                                                      File "../django/django/utils/autoreload.py", line 518, in _update_watches                               
    self._watch_root(root)                                                                                                                                                                                                                                                      File "../django/django/utils/autoreload.py", line 438, in _watch_root                                   
    result = self.client.query('watch-project', str(root.absolute()))                                                                                                                                                                                                           File "../django/venv/lib/python3.8/site-packages/pywatchman/__init__.py", line 1052, in query
    res = self.receive()                                                                                                                                                                                                                                                        File "../django/venv/lib/python3.8/site-packages/pywatchman/__init__.py", line 960, in receive
    result = self.recvConn.receive()                                                                                                                                                                                                                                          
  File "../django/venv/lib/python3.8/site-packages/pywatchman/__init__.py", line 695, in receive                                                                                                                                                     buf = [self.transport.readBytes(sniff_len)]                                                                                        
  File "../django/venv/lib/python3.8/site-packages/pywatchman/__init__.py", line 344, in readBytes                                                                                                                                                   raise SocketTimeout('timed out waiting for response')                                                                              
pywatchman.SocketTimeout: timed out waiting for response, while executing ('watch-project', '/tmp/django_p9emwuc2/tmpdx4bsuny')                                                                                                                                                                                            
The above exception was the direct cause of the following exception:                                                                                                                                                                                                                                                       
Traceback (most recent call last):                                                                                                                                                                                                                                              File "/usr/lib/python3.8/unittest/case.py", line 60, in testPartExecutor                                                             
    yield                                                                                                                                                                                                                                                                       File "/usr/lib/python3.8/unittest/case.py", line 676, in run                                                                         
    self._callTestMethod(testMethod)                                                                                                                                                                                                                                            File "/usr/lib/python3.8/unittest/case.py", line 633, in _callTestMethod                                                             
    method()                                                                                                                                                                                                                                                                  
  File "/usr/lib/python3.8/unittest/mock.py", line 1325, in patched                                                                                                                                                                                                               return func(*newargs, **newkeywargs)     
  File "../django/tests/utils_tests/test_autoreload.py", line 542, in test_glob                                                                                                                                                                  
    with self.tick_twice():                                                                                                                                                                                                                                                   
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__                                                                                                                                                                                                             
    return next(self.gen)                                                                                                                                                                                                                                                       File "../django/tests/utils_tests/test_autoreload.py", line 531, in tick_twice
    next(ticker)                                                                                                                                                                                                                                                              
  File "../django/django/utils/autoreload.py", line 556, in tick                                          
    self.update_watches()                                                                                                                                                                                                                                                     
  File "../django/django/utils/autoreload.py", line 532, in update_watches                                
    if self.check_server_status(ex):                                                                                                                                                                                                                                          
  File "../django/django/utils/autoreload.py", line 584, in check_server_status
    raise WatchmanUnavailable(str(inner_ex)) from inner_ex                                                                             
django.utils.autoreload.WatchmanUnavailable: timed out waiting for response, while executing ('watch-project', '/tmp/django_p9emwuc2/tmpdx4bsuny')

Till now, what I have observed is this only occurs when at least all the tests inside the test_utils.test_autoreloader.WatchmanReloaderTest class are run at once. I made a script that you may use to verify this:

#!/usr/bin/bash

iteration=0
while true; do
    iteration=$((iteration + 1))

    ./runtests.py utils_tests.test_autoreloader.WatchmanReloaderTests
    if [[ $? -ne 0 ]]; then
        echo -e "-----------\nit failed in the $iteration iteration--------------------\n"
        break
    fi
    echo -e "-------------\nrunning the $iteration iteration\n-----------------\n"
done

The flakiness is highly rare when executing only running the tests under WatchmanReloaderTests, in my case it sometimes took close to 2000 operations to fail. Although, when the whole test suite is run at once, the problems seems to occur more frequently(it occurred twice out of something like 10 times on my end)

Change History (7)

comment:1 by Mariusz Felisiak, 3 years ago

Cc: Tom Forbes added

comment:2 by Carlton Gibson, 3 years ago

Resolution: needsinfo
Status: newclosed

I'm going to have to say needsinfo for now.

Looks like it's an issue in pywatchman:

  File "../django/venv/lib/python3.8/site-packages/pywatchman/__init__.py", line 344, in readBytes                                                                                                                                                   raise SocketTimeout('timed out waiting for response')                                                                              
pywatchman.SocketTimeout: timed out waiting for response, while executing ('watch-project', '/tmp/django_p9emwuc2/tmpdx4bsuny')   

Is that timeout configurable maybe? 🤔

If we can identify a specific issue in Django, or a specific change we can make to work around the issue, we can reopen.

comment:3 by Nick Pope, 3 years ago

Has patch: set
Resolution: needsinfo
Status: closednew
Triage Stage: UnreviewedAccepted

Re-opening this as I've been experiencing this same issue.

The issue is that WatchmanReloaderTests.setUp() sets self.reloader.client_timeout = 0.1 to speed up the tests.

It seems that this is too low as at 0.1s the following fail consistently for me on every execution:

ERROR: test_glob (utils_tests.test_autoreload.WatchmanReloaderTests)
ERROR: test_glob_recursive (utils_tests.test_autoreload.WatchmanReloaderTests)
ERROR: test_multiple_globs (utils_tests.test_autoreload.WatchmanReloaderTests)
ERROR: test_multiple_recursive_globs (utils_tests.test_autoreload.WatchmanReloaderTests)
ERROR: test_nested_glob_recursive (utils_tests.test_autoreload.WatchmanReloaderTests)

Changing to 0.2s the following fail consistently:

ERROR: test_glob (utils_tests.test_autoreload.WatchmanReloaderTests)
ERROR: test_glob_recursive (utils_tests.test_autoreload.WatchmanReloaderTests)

Changing to 0.5s the following fail intermittently:

ERROR: test_glob (utils_tests.test_autoreload.WatchmanReloaderTests)

In the end I settled for 0.6s which seemed to work consistently. See PR.

Unfortunately, much like the tests for caching, I think we're going to have to suffer longer running tests.

comment:4 by Nick Pope, 3 years ago

Strangely, with 0.6s running just the tests for utils_tests.test_autoreload.WatchmanReloaderTests works consistently.

However, when running the entire test suite, I still get the following failures:

ERROR: test_glob (utils_tests.test_autoreload.WatchmanReloaderTests)
ERROR: test_glob_recursive (utils_tests.test_autoreload.WatchmanReloaderTests)

Bumping to 1 second, and even 1.5 seconds, seemed to result in the same tests failing intermittently.

I seem to have had a reasonable amount of success with 2 seconds, so I've updated the PR with that. (It's still better than 5 seconds which is the default.)

Perhaps something else is at play that makes it worse when there are more tests being executed, but I'm not sure what.

I'm wondering if we shouldn't allow the DJANGO_WATCHMAN_TIMEOUT environment variable to be applicable when running the test suite also? That'll allow users to easily overriding the value when running Django's test suite. I added a fixup for that to the PR which can be squashed or removed as required.

comment:5 by Nick Pope, 3 years ago

For further info, I bisected this to ed3c59097a01ed3f32f8a8bed95307fb5c181251.

That was when I started getting failures for running just the utils_tests.test_autoreload.WatchmanReloaderTests.

This reduced the timeout from the pywatchman default of 1.0 to 0.1 for the tests.

I still get timeouts on the commit before that when running the entire test suite, however, so 1 second isn't enough.

comment:6 by Mariusz Felisiak, 3 years ago

Owner: changed from nobody to Nick Pope
Status: newassigned
Triage Stage: AcceptedReady for checkin

comment:7 by GitHub <noreply@…>, 3 years ago

Resolution: fixed
Status: assignedclosed

In 6b513f01:

Fixed #32913 -- Made watchman reloader tests less flaky.

Note: See TracTickets for help on using tickets.
Back to Top