Describe the bug
I think there is an error in the gunique and also the gegen xx = nunique() functions. In a sample of 35 million observations, they do not count the number of unique values correctly.
When I generate a variable x = _n, there should be 35 million unique values, but it only counts 25 million:
// code snippet
gen x = _n
gunique x
Version info
OS: [e.g. Windows 10]
Version: [i.e. output of gtools]
@m-elholm I think the more likely explanation is that you've run into the limits of 4-byte floats (see the generating IDs section here). gen x = _n stores x as a float, Stata's default type, which cannot represent every integer up to 35 million exactly. This snippet shows that gunique is working correctly and that x is indeed the problem, since it has repeated values:
. clear
. set obs 35000000
Number of observations (_N) was 0, now 35,000,000.
. gen x = _n
. gen long y = _n
. gen double z = _n
. gunique x
N = 35,000,000; 25,527,216 unbalanced groups of sizes 1 to 5
. gunique y
N = 35,000,000; 35,000,000 balanced groups of size 1
. gunique z
N = 35,000,000; 35,000,000 balanced groups of size 1
. format %21.0fc x y z
. l in `=_N-4'/l
          +--------------------------------------+
          |          x            y            z |
          |--------------------------------------|
34999996. | 34,999,996   34,999,996   34,999,996 |
34999997. | 34,999,996   34,999,997   34,999,997 |
34999998. | 35,000,000   34,999,998   34,999,998 |
34999999. | 35,000,000   34,999,999   34,999,999 |
35000000. | 35,000,000   35,000,000   35,000,000 |
          +--------------------------------------+
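The rounding in x can also be checked without creating any data, using Stata's built-in float() function, which rounds its argument to float precision. This is only a minimal sketch for a do-file; the integers are illustrative, and the last one is taken from the listing above:

```stata
* Floats carry a 24-bit significand, so integers are exact only up to 2^24 = 16,777,216
display %21.0fc float(16777216)   // exactly representable
display %21.0fc float(16777217)   // not representable, so it is rounded to a neighboring float

* Above 2^25 = 33,554,432 adjacent floats are 4 apart, so several values of _n collapse into one
display %21.0fc float(34999997)   // 34,999,996, as in row 34999997 of x in the listing
display float(34999997) == float(34999996)   // 1: two observations share one value of x
```

This spacing is why gunique reports groups of more than one observation for x, while the long and double versions stay unique.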
One solution is to declare such variables with the type `c(obs_t)', which contains the smallest data type that can store _n (and changes as the number of observations in your data changes).
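As a rough sketch of that suggestion, the macro can be placed where the storage type normally goes in gen. The variable name id is made up for illustration, and the type mentioned in the comment is what I would expect at this sample size, not something checked here:

```stata
clear
set obs 35000000

* `c(obs_t)' expands to a storage type large enough for _N (expected to be long here)
gen `c(obs_t)' id = _n

* Confirm the type that was chosen and that no values repeat
describe id
gunique id
```

Because the macro is evaluated when the line runs, the same do-file keeps working if the number of observations later grows past what the current type can hold.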