Opened 11 months ago

Closed 10 months ago

Last modified 10 months ago

#27741 closed Bug (fixed)

staticfiles_tests test_corrupt_intermediate_files fails intermittently when run in parallel

Reported by: Chris Lamb Owned by: nobody
Component: contrib.staticfiles Version: 1.11
Severity: Release blocker Keywords:
Cc: David Sanders Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

======================================================================
ERROR: test_corrupt_intermediate_files (staticfiles_tests.test_storage.TestCollectionCachedStorage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/tmp/buildd/python-django-1.11~alpha1/tests/staticfiles_tests/test_storage.py", line 296, in test_corrupt_intermediate_files
    self.hashed_file_path('cached/css/img/window.png')
  File "/tmp/buildd/python-django-1.11~alpha1/tests/staticfiles_tests/test_storage.py", line 24, in hashed_file_path
    fullpath = test.render_template(test.static_template_snippet(path))
  File "/tmp/buildd/python-django-1.11~alpha1/tests/staticfiles_tests/cases.py", line 38, in render_template
    return template.render(Context(**kwargs)).strip()
  File "/tmp/buildd/python-django-1.11~alpha1/django/template/base.py", line 207, in render
    return self._render(context)
  File "/tmp/buildd/python-django-1.11~alpha1/django/test/utils.py", line 107, in instrumented_test_render
    return self.nodelist.render(context)
  File "/tmp/buildd/python-django-1.11~alpha1/django/template/base.py", line 990, in render
    bit = node.render_annotated(context)
  File "/tmp/buildd/python-django-1.11~alpha1/django/template/base.py", line 957, in render_annotated
    return self.render(context)
  File "/tmp/buildd/python-django-1.11~alpha1/django/templatetags/static.py", line 105, in render
    url = self.url(context)
  File "/tmp/buildd/python-django-1.11~alpha1/django/templatetags/static.py", line 102, in url
    return self.handle_simple(path)
  File "/tmp/buildd/python-django-1.11~alpha1/django/templatetags/static.py", line 117, in handle_simple
    return staticfiles_storage.url(path)
  File "/tmp/buildd/python-django-1.11~alpha1/django/contrib/staticfiles/storage.py", line 162, in url
    return self._url(self.stored_name, name, force)
  File "/tmp/buildd/python-django-1.11~alpha1/django/contrib/staticfiles/storage.py", line 141, in _url
    hashed_name = hashed_name_func(*args)
  File "/tmp/buildd/python-django-1.11~alpha1/django/contrib/staticfiles/storage.py", line 365, in stored_name
    self.hashed_name(name, content=None, filename=intermediate_name)
  File "/tmp/buildd/python-django-1.11~alpha1/django/contrib/staticfiles/storage.py", line 101, in hashed_name
    raise ValueError("The file '%s' could not be found with %r." % (filename, self))
ValueError: The file 'cached/css/img/window.a836fe39729e.png' could not be found with <django.contrib.staticfiles.storage.CachedStaticFilesStorage object at 0x7f43b6114e90>.

----------------------------------------------------------------------
Ran 11906 tests in 171.474s

FAILED (errors=1, skipped=1152, expected failures=4)

This is from the 1.11 alpha1 tarball, rather than git itself.

Change History (28)

comment:1 Changed 11 months ago by Tim Graham

Cc: David Sanders added
Component: Uncategorizedcontrib.staticfiles
Severity: NormalRelease blocker
Summary: test_corrupt_intermediate_files (staticfiles_tests.test_storage.TestCollectionCachedStorage) fails with ValueError: The file 'cached/css/img/window.a836fe39729e.png' could not be foundstaticfiles_tests test_corrupt_intermediate_files fails in source distribution
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

comment:2 Changed 11 months ago by David Sanders

Going to need more information than that. What are the steps to reproduce?

comment:3 Changed 11 months ago by Tim Graham

Extracting the tarball and running the tests from there reproduced it for me.

comment:4 Changed 11 months ago by David Sanders

I can't reproduce with those instructions.

[root@fce6b18201fb tests]# ./runtests.py "staticfiles_tests.test_storage.TestCollectionCachedStorage"
Testing against Django installed in '/tmp/django/Django-1.11a1/django' with up to 2 processes
Creating test database for alias 'default'...
Creating test database for alias 'other'...
System check identified no issues (0 silenced).
...................
----------------------------------------------------------------------
Ran 19 tests in 0.802s

OK
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...
root@fce6b18201fb tests]# ./runtests.py "staticfiles_tests.test_storage.TestCollectionCachedStorage.test_corrupt_intermediate_files"
Testing against Django installed in '/tmp/django/Django-1.11a1/django' with up to 2 processes
Creating test database for alias 'default'...
Creating test database for alias 'other'...
System check identified no issues (0 silenced).
.
----------------------------------------------------------------------
Ran 1 test in 0.042s

OK
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...

That's running with Python2.7 on a CentOS 7.2 Docker image.

Tim, are you reproducing by running all tests or just the targeted tests cases like in my example output above?

Last edited 11 months ago by David Sanders (previous) (diff)

comment:5 in reply to:  2 ; Changed 11 months ago by Chris Lamb

Replying to David Sanders:

Going to need more information than that. What are the steps to reproduce?

Sorry about that, I should have been clearer when referring to the tarball. I'm running the entire testsuite.

comment:6 in reply to:  5 ; Changed 11 months ago by David Sanders

Replying to Chris Lamb:

Replying to David Sanders:

Going to need more information than that. What are the steps to reproduce?

Sorry about that, I should have been clearer when referring to the tarball. I'm running the entire testsuite.

Does the error still occur when you run only the single test or the group of tests like in my example output above?

Entire testsuite takes my Macbook 20+ minutes to run at least, so I'd like to narrow down the reproduce case if it's to be tracked down in a reasonable amount of time.

comment:7 in reply to:  6 ; Changed 11 months ago by Chris Lamb

Does the error still occur when you run only the single test or the group of tests like in my example output above?

I'm clearly failing at communicating today - my bad! It only appears when I run the entire testsuite. If I just run the single test (or its parent group), the test does not fail.

comment:8 in reply to:  7 Changed 11 months ago by David Sanders

Replying to Chris Lamb:

Does the error still occur when you run only the single test or the group of tests like in my example output above?

I'm clearly failing at communicating today - my bad! It only appears when I run the entire testsuite. If I just run the single test (or its parent group), the test does not fail.

Well, that's good info, since it seems to show there's something elsewhere in the testsuite breaking it.

However, that's a real bummer considering how long it takes me to run the entire testsuite, tracking it down is going to be a PITA. Perhaps I can selectively delete test modules that are unlikely to be the interference and that'll speed things up...

comment:9 Changed 11 months ago by Simon Charette

David, you might want to try the --bisect or --pair options of runtests.py to speed up the process.

comment:10 in reply to:  9 ; Changed 11 months ago by David Sanders

Replying to Simon Charette:

David, you might want to try the --bisect or --pair options of runtests.py to speed up the process.

Thanks for the tip Simon, both of those were quite speedy compared to running the full test suite.

Unfortunately neither revealed any problems. Ran them both 3 times to ensure there wasn't anything inconsistent:

./runtests.py --pair "staticfiles_tests.test_storage.TestCollectionCachedStorage.test_corrupt_intermediate_files"
./runtests.py --bisect "staticfiles_tests.test_storage.TestCollectionCachedStorage.test_corrupt_intermediate_files"

I did, however, have to remove one group of tests because they were failing for me and seem unrelated - is anyone able to do run ./runtests.py "staticfiles_tests.test_views" and have it succeed? For me that always fails complaining about AUTH_USER problems. If run with other tests that error doesn't occur, only when it is run alone.

Running the entire testsuite for me hangs and I've killed it after 40 minutes without it reproducing the original error.

Chris, could you try running ./runtests.py "staticfiles_tests.test_views" and responding with your results? If that run causes an error, can you try deleting the test_views.py file under staticfiles_tests and then run the entire testsuite again and see if the failure still reproduces?

comment:11 in reply to:  10 ; Changed 11 months ago by Chris Lamb

Replying to David Sanders:

Chris, could you try running ./runtests.py "staticfiles_tests.test_views" and responding with your results?

Fails with https://gist.github.com/lamby/b424c3310fade40245f046b712913f44/raw

If that run causes an error, can you try deleting the test_views.py file under staticfiles_tests and then run the entire testsuite again and see if the failure still reproduces?

Fails with https://gist.githubusercontent.com/lamby/647a790e935ec11c25221e69bc951263/raw

comment:12 in reply to:  11 Changed 11 months ago by David Sanders

Replying to Chris Lamb:

Replying to David Sanders:

Chris, could you try running ./runtests.py "staticfiles_tests.test_views" and responding with your results?

Fails with https://gist.github.com/lamby/b424c3310fade40245f046b712913f44/raw

If that run causes an error, can you try deleting the test_views.py file under staticfiles_tests and then run the entire testsuite again and see if the failure still reproduces?

Fails with https://gist.githubusercontent.com/lamby/647a790e935ec11c25221e69bc951263/raw

Thanks! So it looks like staticfiles_tests.test_views can't be run by itself for anyone, that probably should be fixed in general since each test should be able to be run in isolation and not fail.

For the second case it looks like your setup didn't appreciate deleting that test for some reason, not sure why. Need to find a way to exclude that test to see if it's the one causing problems.

comment:13 Changed 11 months ago by Chris Lamb

Any progress? Would love to see this alpha version in Debian experimental...

comment:14 in reply to:  13 Changed 11 months ago by David Sanders

Replying to Chris Lamb:

Any progress? Would love to see this alpha version in Debian experimental...

Sorry Chris, I haven't been working on it. I've got too much else on my plate and no easy way to reproduce this (for some reason running the full testsuite from the tarball hangs for me).

We do know that the test in isolation succeeds, so it's some interaction with other tests, however both --bisect and --pair don't yield any information.

Can any Django test gurus take a look to try to find what the test interaction might be?

comment:15 Changed 11 months ago by Chris Lamb

If it helps, if I limit parallelism with "taskset --cpu-list 0" the testsuite does not fail.

comment:16 Changed 11 months ago by Tim Graham

I have a PR to fix the test_views failure when run in isolation.

I can still reproduce a non-deterministic failure of test_corrupt_intermediate_files when running only the staticfiles tests (actually only staticfiles_tests.test_storage). I use a loop to repeat the tests until it fails:

for i in {1..500};
do
  python tests/runtests.py staticfiles_tests.test_storage
  retcode=$?
  if [ $retcode = 1 ]; then
    echo $i;
    echo 'failed';
    break;
  fi
done

comment:17 Changed 11 months ago by David Sanders

Thanks Tim, that script helps me quickly reproduce on my setup.

I've further narrowed it and confirmed what Chris said about parallelism being a factor. If you use --parallel 1 to disable parallelism it won't reproduce.

I've narrowed it down to the parallel interaction between two test cases. It makes sense, the latter is also dealing with window.png and changes the modification timestamp on it which changes the hash.

 for i in {1..500};
 do
   python tests/runtests.py staticfiles_tests.test_storage.TestCollectionCachedStorage staticfiles_tests.test_storage.TestCollectionHashedFilesCache
   retcode=$?
   if [ $retcode = 1 ]; then
     echo $i;
     echo 'failed';
     break;
   fi
 done

That's all the digging I've got time for tonight. I'll start to look into a fix tomorrow, unless someone wants to pounce on it before me now that it's narrowed down.

comment:18 Changed 11 months ago by Claude Paroz

To isolate the test, you can get inspiration from tests.i18n.utils.RunInTmpDirMixin, that is create a temporary directory to copy the static tree in, override the STATICFILES_DIRS setting to include that temp dir, then remove the dir in the test tearDown.

comment:19 Changed 11 months ago by GitHub <noreply@…>

In 11856ea4:

Refs #27741 -- Fixed staticfiles_tests.test_views when run in isolation.

comment:20 Changed 11 months ago by Tim Graham <timograham@…>

In 7dda48b:

[1.11.x] Refs #27741 -- Fixed staticfiles_tests.test_views when run in isolation.

Backport of 11856ea44e4dd1ff2358c559c0d04ac848832c23 from master

comment:21 Changed 11 months ago by Tim Graham

Summary: staticfiles_tests test_corrupt_intermediate_files fails in source distributionstaticfiles_tests test_corrupt_intermediate_files fails intermittently when run in parallel

I'm feeling a bit lazy about isolating the tests using the approach Claude suggested. How about this patch, at least for now:

  • tests/staticfiles_tests/test_storage.py

    diff --git a/tests/staticfiles_tests/test_storage.py b/tests/staticfiles_tests/test_storage.py
    index 7e7b31f..11d1cf3 100644
    a b class TestCollectionCachedStorage(TestHashedFiles, CollectionTestCase): 
    290290        # high enough.
    291291        configured_storage.max_post_process_passes = 1
    292292        # File without intermediates that can be rehashed without a problem.
    293         self.hashed_file_path('cached/css/img/window.png')
     293        self.hashed_file_path('cached/css/fonts/font.svg')
    294294        # File with too many intermediates to rehash with the low max
    295295        # post-process passes.
    296296        err_msg = "The name 'cached/styles.css' could not be hashed with %r." % (configured_storage._wrapped,)

comment:22 in reply to:  21 Changed 11 months ago by Chris Lamb

Replying to Tim Graham:

I'm feeling a bit lazy about isolating the tests using the approach Claude suggested. How about this patch, at least for now:

Can you elaborate why this would fix it? :) In isolation it seems a bit arbitrary.

comment:23 Changed 10 months ago by Tim Graham

From David's comment:17, "I've narrowed it down to the parallel interaction between two test cases. It makes sense, the latter is also dealing with window.png and changes the modification timestamp on it which changes the hash." If the test uses some other file that's not modified in another test, there's no issue as far as I've tested.

comment:24 in reply to:  23 Changed 10 months ago by David Sanders

Replying to Tim Graham:

From David's comment:17, "I've narrowed it down to the parallel interaction between two test cases. It makes sense, the latter is also dealing with window.png and changes the modification timestamp on it which changes the hash." If the test uses some other file that's not modified in another test, there's no issue as far as I've tested.

I've actually got a patch for it isolated, Tim. Just finished it up the other day, but I've been too swamped to get it cleaned up and in a PR. Sorry about that.

Should I submit the PR for the stable/1.11.x branch and then do a separate PR for master?

comment:25 Changed 10 months ago by Tim Graham

The PR can go to master and I'll backport from there.

comment:26 in reply to:  25 Changed 10 months ago by David Sanders

Last edited 10 months ago by Tim Graham (previous) (diff)

comment:27 Changed 10 months ago by Tim Graham <timograham@…>

Resolution: fixed
Status: newclosed

In 8e3f9d3e:

Fixed #27741 -- Isolated TestCollectionHashedFilesCache in a tmpdir.

comment:28 Changed 10 months ago by Tim Graham <timograham@…>

In fe86a78:

[1.11.x] Fixed #27741 -- Isolated TestCollectionHashedFilesCache in a tmpdir.

Backport of 8e3f9d3ee20fdadef6f2a28c71fb7e142b415059 from master

Note: See TracTickets for help on using tickets.
Back to Top