Error balance_DS function #1

Open
PrestonLeung opened this issue Oct 4, 2024 · 1 comment


PrestonLeung commented Oct 4, 2024

Hi!

Dolphyn is awesome! I found the GitHub page via the publication!

I just wanted to ask which Python version you use to run Dolphyn.

I'm running Python 3.12 and I get an error because the random package (I think from Python 3.11 onward?) no longer accepts sets when selecting random samples (similar to what is described in benedekrozemberczki/littleballoffur#27).

I was running through the jupyter tutorial to learn how to use Dolphyn, but when I get to:
ge = D.findEpitopes(testrun = 2, protein_seq_file = protein_seq_file, epitile_size=15, epitope_probability_cutoff = 0.5)

I get errors saying that sets are not supported in random.sample().
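
A minimal sketch of the behaviour described above, using only the standard library. The helper name `sample_ids` and the IDs are made up for illustration and are not part of Dolphyn. On Python 3.11 and later, `random.sample()` raises a `TypeError` when given a set (on 3.9 and 3.10 it only emits a `DeprecationWarning`); converting the set to a sorted list first avoids the error.

```python
import random

def sample_ids(ids, k, seed=42):
    """Draw k IDs from any iterable of IDs, including a set.

    random.sample() requires a sequence on Python 3.11+, so the IDs are
    first converted to a sorted list; sorting also makes the draw
    reproducible for a given seed.
    """
    rng = random.Random(seed)      # local generator, leaves global state alone
    population = sorted(set(ids))  # set -> deduplicated, ordered list
    return rng.sample(population, k)

ids = {"P1_a", "P2_b", "P3_c", "P4_d"}  # made-up IDs for illustration

try:
    random.sample(ids, 2)  # TypeError on Python 3.11+
except TypeError as err:
    print("set rejected:", err)

picked = sample_ids(ids, 2)
print(picked)
```

Sorting rather than calling `list()` on the set is a design choice: the iteration order of a set of strings can differ between interpreter runs, so a plain `list(set(...))` may yield different samples even with the same seed.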

I fiddled a bit with the function:

import random

def balance_DS(X, y, random_state=42):
    wildtypeIDs = set([item[0] for item in X.index.str.split("_")])
    random.seed(random_state)
    size_smaller_group = y.value_counts().min()

    # random.sample() needs a sequence on Python 3.11+, so I made the sets into lists
    pos_IDs = list(set(y[y==1].index))
    neg_IDs = list(set(y[y==0].index))

    neg_IDs = random.sample(neg_IDs, size_smaller_group)
    pos_IDs = random.sample(pos_IDs, size_smaller_group)

    # same change here: index with a list instead of a set
    balancedIDs = list(set(pos_IDs + neg_IDs))

    y_bal = y.loc[balancedIDs,]
    X_bal = X.loc[balancedIDs,]
    return(X_bal, y_bal)

This seemed to fix my errors. Do you also see this?
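
For comparison, a compact sketch of the same balancing idea with reproducible sampling. This is not Dolphyn's implementation: the name `balance_ds_sketch` and the toy data are hypothetical, the labels are assumed to be 0/1, and the index labels are assumed to be unique. It uses sorted lists both for `random.sample()` and for `.loc`, since recent pandas releases (2.x, as far as I know) also reject a set as an indexer.

```python
import random
import pandas as pd

def balance_ds_sketch(X, y, random_state=42):
    """Downsample so both classes in y (0/1) have equal size."""
    rng = random.Random(random_state)
    n = int(y.value_counts().min())         # size of the smaller class
    pos_ids = sorted(set(y[y == 1].index))  # sorted lists, not sets
    neg_ids = sorted(set(y[y == 0].index))
    keep = rng.sample(pos_ids, n) + rng.sample(neg_ids, n)
    return X.loc[keep], y.loc[keep]         # list indexer works across pandas versions

# Toy data with made-up IDs: 2 positives, 4 negatives
idx = ["A_1", "A_2", "B_1", "B_2", "C_1", "C_2"]
X = pd.DataFrame({"feature": range(6)}, index=idx)
y = pd.Series([1, 1, 0, 0, 0, 0], index=idx)

X_bal, y_bal = balance_ds_sketch(X, y)
print(y_bal.value_counts())
```

A local `random.Random` instance is used here instead of `random.seed()` so that the global random state is left untouched; that is a stylistic choice, not something the original function requires.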

Cheers and thanks!


YihuiSun commented Nov 6, 2024

This is because of the pandas version. Older versions work fine; I'm using pandas 1.3.3.
