Wrong number of groups #90

m-elholm · 2023-04-27T11:54:49Z

Describe the bug
I think there is an error in gunique and also in the nunique() function of gegen. In a sample of 35 million observations, they do not count the number of unique values correctly.
When I generate a variable x = _n, there should be 35 million unique values, but it only counts about 25 million.

// code snippet
gen x = _n 
gunique x

Version info

OS: [e.g. Windows 10]
Version: [i.e. output of gtools]


mcaceresb · 2023-04-27T13:02:43Z

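@m-elholm I think the more likely explanation is that you've run into the precision limits of 4-byte floats (see the generating IDs section here). By default, gen x = _n stores x as a float, which can hold every integer exactly only up to 16,777,216 (2^24); beyond that, neighboring values of _n get rounded to the same number. This snippet shows that gunique is working correctly and that x itself is the problem, since it has repeated values: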

. clear

. set obs 35000000
Number of observations (_N) was 0, now 35,000,000.

. gen x = _n

. gen long y = _n

. gen double z = _n

. gunique x
N = 35,000,000; 25,527,216 unbalanced groups of sizes 1 to 5

. gunique y
N = 35,000,000; 35,000,000 balanced groups of size 1

. gunique z
N = 35,000,000; 35,000,000 balanced groups of size 1


. format %21.0fc x y z

. l in `=_N-4'/l

          +--------------------------------------+
          |          x            y            z |
          |--------------------------------------|
34999996. | 34,999,996   34,999,996   34,999,996 |
34999997. | 34,999,996   34,999,997   34,999,997 |
34999998. | 35,000,000   34,999,998   34,999,998 |
34999999. | 35,000,000   34,999,999   34,999,999 |
35000000. | 35,000,000   35,000,000   35,000,000 |
          +--------------------------------------+
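The group count reported for x can be reproduced by hand, assuming Stata rounds each _n to the nearest representable float (the listing is consistent with this: 34,999,997 becomes 34,999,996 and 34,999,998 becomes 35,000,000). A float has a 24-bit significand, so representable integers are spaced 1 apart below 2^24, 2 apart between 2^24 and 2^25, and 4 apart between 2^25 and 2^26. Every representable integer in 1..35,000,000 is hit (it rounds to itself), and since 35,000,000 is a multiple of 4 nothing rounds past it, so the number of distinct values is just the number of representable integers in that range:

```math
\begin{aligned}
\#\{1,\dots,2^{24}-1\} &= 2^{24}-1 = 16{,}777{,}215 && \text{(spacing 1)}\\
\#\{2^{24},\,2^{24}+2,\dots,2^{25}-2\} &= 2^{23} = 8{,}388{,}608 && \text{(spacing 2)}\\
\#\{2^{25},\,2^{25}+4,\dots,35{,}000{,}000\} &= \tfrac{35{,}000{,}000-2^{25}}{4}+1 = 361{,}393 && \text{(spacing 4)}\\
\text{total} &= 25{,}527{,}216
\end{aligned}
```

This matches the 25,527,216 groups that gunique reports for x, and the spacing of 4 near the end of the data explains the runs of up to 4 or 5 identical values in the listing.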

One solution is to store such variables with type `c(obs_t)', which holds the smallest data type that can store _n (and which changes as the number of observations in your data changes).
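A minimal sketch of that fix, assuming gtools is installed and that your Stata version provides c(obs_t) as described above. With 35 million observations, c(obs_t) should expand to long, which stores every value of _n exactly:

```stata
clear
set obs 35000000

* Store the row index in the smallest type that can hold _N
* (c(obs_t) expands to a storage type name such as long or double)
gen `c(obs_t)' x = _n

* x now keeps every row index exactly, so no two observations share a value
assert x == _n

* Should now report 35,000,000 groups, as it does for y and z above
gunique x
```

Typing the variable from c(obs_t) rather than hard-coding long means the same line keeps working if the data grow past what long can hold.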

mcaceresb added invalid question and removed invalid labels Jun 29, 2024
