Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#34325 closed Cleanup/optimization (fixed)

Clarify PercentRank() description.

Reported by: dennisvang
Owned by: dennisvang
Component: Documentation
Version: 4.1
Severity: Normal
Keywords:
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by dennisvang)

The documentation for the PercentRank window function says:

Computes the *percentile rank* of the rows in the frame clause. This computation is equivalent to evaluating:

(rank - 1) / (total rows - 1)

(my emphasis)
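
For illustration, the quoted formula can be sketched in plain Python. The helper name `relative_ranks` is hypothetical. The sketch assumes SQL RANK()-style ranking (tied values share the lowest rank of their group) and returns 0.0 for a single row, which is how the SQLite docs describe percent_rank() in that case.

```python
def relative_ranks(values):
    """Return (rank - 1) / (total rows - 1) for each value, in input order.

    Assumes RANK()-style ranks: tied values share the smallest rank
    of their group.
    """
    n = len(values)
    if n == 1:
        # Assumption: mirror the documented SQLite behaviour for a
        # single-row partition instead of dividing by zero.
        return [0.0]
    result = []
    for v in values:
        # With RANK()-style ranking, rank - 1 is the number of values
        # strictly less than v.
        rank = 1 + sum(1 for other in values if other < v)
        result.append((rank - 1) / (n - 1))
    return result


print(relative_ranks([10, 20, 20, 40]))
```

The smallest value always maps to 0.0 and the largest to 1.0, so the result is a position within the ordered rows rather than a share of the data set.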

However, I'm not so sure "percentile rank" is the correct term.

If you look up the (statistical) term "percentile rank" online, you'll find various definitions, ranging from

(CF - 0.5 * F) / N

where CF—the cumulative frequency—is the count of all scores less than or equal to the score of interest, F is the frequency for the score of interest, and N is the number of scores in the distribution.

to something like

<number of values less than the score of interest> / <total number of values in the data set>

(equivalent to (CF - F) / N)
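
To make the two statistical definitions concrete, here is a minimal pure-Python sketch. The function names are hypothetical. As far as I can tell, the two results correspond to kind="mean" and kind="strict" of scipy.stats.percentileofscore, except that scipy scales them to the range 0-100.

```python
def percentile_rank_mean(scores, score):
    """(CF - 0.5 * F) / N"""
    n = len(scores)
    cf = sum(1 for s in scores if s <= score)  # cumulative frequency
    f = sum(1 for s in scores if s == score)   # frequency of the score itself
    return (cf - 0.5 * f) / n


def percentile_rank_strict(scores, score):
    """<number of values less than score> / N, i.e. (CF - F) / N"""
    n = len(scores)
    below = sum(1 for s in scores if s < score)
    return below / n


scores = [10, 20, 20, 40]
print(percentile_rank_mean(scores, 20))    # (3 - 0.5 * 2) / 4 = 0.5
print(percentile_rank_strict(scores, 20))  # 1 / 4 = 0.25
```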

Both definitions are also used in practice, for example by scipy.

The latter definition is similar to the one in the Django docs, but it differs subtly in the denominator: N rather than N - 1.
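
The effect of the denominator can be shown side by side. In this sketch (the function name `compare` is hypothetical, and RANK()-style ties are assumed) both columns share the numerator, the count of values strictly below the value of interest, and differ only in dividing by N or by N - 1.

```python
def compare(values):
    """Return (value, below / N, below / (N - 1)) for each distinct value.

    Assumes at least two values, so that N - 1 is non-zero.
    """
    n = len(values)
    rows = []
    for v in sorted(set(values)):
        # Shared numerator: CF - F, which equals rank - 1 for RANK()-style ranks.
        below = sum(1 for other in values if other < v)
        rows.append((v, below / n, below / (n - 1)))
    return rows


for value, strict_percentile, relative_rank in compare([10, 20, 20, 40]):
    print(value, strict_percentile, relative_rank)
```

With N as the denominator the largest value never reaches 1.0, whereas with N - 1 it always does.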

Note also that the documentation for the percent_rank function in the SQLite and PostgreSQL database backends does not mention "percentile rank" at all. Instead, they use the term "relative rank."
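
The backend behaviour can be checked directly against SQLite from the Python standard library. This is only a sketch: the table name `scores`, the column name `value`, and the wrapper function are made up for the example, and it assumes the bundled SQLite is version 3.25 or newer, since older versions lack window functions.

```python
import sqlite3


def sqlite_percent_rank(values):
    """Return (value, percent_rank) rows computed by SQLite itself."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE scores (value INTEGER)")
        conn.executemany(
            "INSERT INTO scores (value) VALUES (?)",
            [(v,) for v in values],
        )
        # percent_rank() is evaluated over the whole table, ordered by value.
        return conn.execute(
            "SELECT value, percent_rank() OVER (ORDER BY value) "
            "FROM scores ORDER BY value"
        ).fetchall()
    finally:
        conn.close()


for value, relative_rank in sqlite_percent_rank([10, 20, 20, 40]):
    print(value, relative_rank)
```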

To prevent confusion, wouldn't it be better to use the same terminology as the database backends?

Change History (9)

comment:1 by dennisvang, 2 years ago

Description: modified (diff)

comment:2 by dennisvang, 2 years ago

Description: modified (diff)

comment:3 by dennisvang, 2 years ago

Description: modified (diff)

comment:4 by dennisvang, 2 years ago

Description: modified (diff)

comment:5 by Mariusz Felisiak, 2 years ago

Summary: PercentRank confusion → Clarify PercentRank() description.
Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Cleanup/optimization

Agreed, "relative rank" is less confusing. Would you like to prepare a patch?

comment:6 by dennisvang, 2 years ago (in reply to comment:5)

Replying to Mariusz Felisiak:

Agreed, "relative rank" is less confusing. Would you like to prepare a patch?

Certainly. Please have a look at https://github.com/django/django/pull/16539

I also replaced "Percent Rank" in the corresponding *table* by "Relative Rank," but I'm not sure if that's necessary. An alternative would be to use "PercentRank," without the space, to match the name of the function.

comment:7 by Mariusz Felisiak, 2 years ago

Has patch: set
Owner: changed from nobody to dennisvang
Status: new → assigned
Triage Stage: Accepted → Ready for checkin

comment:8 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

Resolution: fixed
Status: assigned → closed

In 7bb741d7:

Fixed #34325 -- Corrected wording in PercentRank() docs.

This is consistent with the terminology used for the percent_rank()
function in SQLite docs and PostgreSQL docs.

comment:9 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 4a89aa25:

[4.2.x] Fixed #34325 -- Corrected wording in PercentRank() docs.

This is consistent with the terminology used for the percent_rank()
function in SQLite docs and PostgreSQL docs.

Backport of 7bb741d787ba360a9f0d490db92e22e0d28204ed from main
