KeyError: "None of [Int64Index ... in wgdi -c #43

Open
amvarani opened this issue Nov 17, 2023 · 7 comments

Comments

@amvarani

Hi there,
I'm getting the error below when running the "wgdi -c" command.
I'm using Ptrichocarpa from Phytozome.

blockinfo = Ptrichocarpa_Ptrichocarpa.blockinfo.csv
lens1 = Ptrichocarpa.lens
lens2 = Ptrichocarpa.lens
tandem = false
tandem_length = 200
pvalue = 0.2
block_length = 5
tandem_ratio = 0.5
multiple = 1
homo = -1,1
savefile = Ptrichocarpa_Ptrichocarpa.blockinfo.new.csv
Traceback (most recent call last):
  File "/home/amvarani/.local/bin/wgdi", line 8, in <module>
    sys.exit(main())
  File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/run.py", line 163, in main
    module_to_run(arg, value)
  File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/run.py", line 122, in module_to_run
    run_subprogram(program, conf, name)
  File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/run.py", line 87, in run_subprogram
    r.run()
  File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/block_correspondence.py", line 47, in run
    arr = self.collinearity_region(cor, bkinfo, lens1)
  File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/block_correspondence.py", line 70, in collinearity_region
    df1[[int(k) for k in b1]] += 1
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/series.py", line 1007, in __getitem__
    return self._get_with(key)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/series.py", line 1042, in _get_with
    return self.loc[key]
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1073, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1301, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1239, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1432, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6113, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6173, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([8462, 8465, 8469, 8477, 8481, 8484, 8502, 8503, 8508, 8517, 8520,\n 8534, 8537, 8541, 8545, 8551, 8554, 8558, 8572, 8585, 8591],\n dtype='int64')] are in the [index]"

@SunPengChuan
Owner

The problem comes from how your GFF and lens files were processed: the position of at least one gene probably falls outside the range of its chromosome.
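A minimal sketch of why this can surface as a KeyError, assuming (not confirmed against the WGDI source here) that the counter in collinearity_region is a pandas Series indexed from 1 to the gene count given in the lens file. The value of `gene_count` is hypothetical and `out_of_range` is a helper introduced only for illustration; the three orders are the first ones listed in the error message.

```python
import pandas as pd


def out_of_range(orders, gene_count):
    """Return the gene orders that have no slot in a 1..gene_count counter."""
    return [o for o in orders if not 1 <= o <= gene_count]


gene_count = 8000                  # hypothetical lens value for one chromosome
block_orders = [8462, 8465, 8469]  # first few orders from the KeyError above

# Assumed shape of the per-chromosome counter: one slot per gene, labels 1..gene_count.
counter = pd.Series(0, index=range(1, gene_count + 1))

try:
    counter[block_orders] += 1     # same indexing pattern as in the traceback
    failed = False
except KeyError:
    # None of the requested labels exist, so pandas refuses the lookup.
    failed = True

print(failed, out_of_range(block_orders, gene_count))
```

Running `out_of_range` on the gene orders of each chromosome against the count in the lens file is one way to find which entries exceed the chromosome before calling wgdi -c.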

@SunPengChuan
Owner

You can upload your dataset and I can check it for you.

@amvarani
Author

@SunPengChuan
Owner

In a GFF file, the fourth column, which represents the start of each chromosome, always begins with the number 1.
Your GFF file also contains many alternatively spliced transcripts, which need to be removed.
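One way to apply both points could look like the sketch below. It rests on several assumptions that are not stated in the thread: a seven-column layout (chromosome, gene name, start, end, strand, order, original ID), isoforms that differ only by a trailing ".N" in the original ID, and that the numbering that must start at 1 is the per-chromosome gene order. The column names, the sample rows, and `keep_longest_and_renumber` are hypothetical.

```python
import pandas as pd

# Assumed column layout; adjust to match your own file.
COLS = ["chr", "gene", "start", "end", "strand", "order", "old_id"]


def keep_longest_and_renumber(gff):
    """Keep the longest isoform per locus, then number genes 1..n per chromosome."""
    gff = gff.copy()
    # Hypothetical naming rule: 'g1.2' is isoform 2 of locus 'g1'.
    gff["locus"] = gff["old_id"].str.replace(r"\.\d+$", "", regex=True)
    gff["span"] = gff["end"] - gff["start"]
    # Longest isoform first, so drop_duplicates keeps it.
    longest = gff.sort_values("span", ascending=False, kind="stable")
    longest = longest.drop_duplicates("locus")
    longest = longest.sort_values(["chr", "start"], kind="stable")
    longest = longest.reset_index(drop=True)
    # cumcount() starts at 0, so add 1 to begin each chromosome at 1.
    longest["order"] = longest.groupby("chr").cumcount() + 1
    return longest[COLS]


rows = [
    ("Chr01", "gA", 100, 500, "+", 1, "g1.1"),
    ("Chr01", "gB", 100, 900, "+", 2, "g1.2"),
    ("Chr01", "gC", 1000, 1500, "-", 3, "g2.1"),
    ("Chr02", "gD", 50, 400, "+", 7, "g3.1"),
]
cleaned = keep_longest_and_renumber(pd.DataFrame(rows, columns=COLS))
print(cleaned)
```

Keeping the longest isoform is only one possible rule; any rule that leaves a single entry per gene would serve the same purpose.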

@amvarani
Author

Hi @SunPengChuan
Thanks!
I've identified the issue in my GFF file.
For those interested in converting Phytozome GFF3 files, here is a simple AWK script that can accomplish this:

zcat $genome.gff3.gz | awk '{if ($3 == "gene" ) print $1,$4,$5,$7,$9}' | cut -f 1 -d";" | sort -V | sed 's#\.#_#g' | sed 's#v4_1##g' | awk '{split($5, a, "[=.]"); if (last != $1) {counter = 1; last = $1} else {counter++} print $1 " " a[2] " " $2 " " $3 " " $4 " " counter " ID" a[2]}'
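For readers who prefer to see the steps spelled out, here is a rough Python sketch of the same idea. It is not a line-for-line port of the one-liner: it keeps only gene features, reads the ID attribute, sorts by chromosome and start, and numbers genes from 1 on each chromosome. The function name, the output column order, and the sample lines (including the gene IDs) are assumptions made for illustration.

```python
def gff3_genes_to_rows(lines):
    """Convert GFF3 'gene' features to rows of
    [chr, new_id, start, end, strand, order, original_id]."""
    genes = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # skip mRNA, exon, CDS and malformed lines
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        genes.append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6], attrs.get("ID", ""))
        )

    genes.sort(key=lambda g: (g[0], g[1]))  # by chromosome, then start

    rows, counts = [], {}
    for chrom, start, end, strand, gene_id in genes:
        counts[chrom] = counts.get(chrom, 0) + 1  # order restarts at 1 per chromosome
        new_id = gene_id.replace(".", "_")        # same dot-to-underscore rewrite
        rows.append([chrom, new_id, start, end, strand, counts[chrom], gene_id])
    return rows


# Hypothetical sample lines in standard nine-column GFF3.
sample = [
    "##gff-version 3",
    "Chr01\tphytozome\tgene\t500\t900\t.\t+\t.\tID=Potri.001G000200.v4.1;Name=Potri.001G000200",
    "Chr01\tphytozome\tmRNA\t500\t900\t.\t+\t.\tID=Potri.001G000200.1.v4.1;Parent=Potri.001G000200.v4.1",
    "Chr01\tphytozome\tgene\t100\t300\t.\t-\t.\tID=Potri.001G000100.v4.1;Name=Potri.001G000100",
]
rows = gff3_genes_to_rows(sample)
for row in rows:
    print("\t".join(str(v) for v in row))
```

Because only gene features are kept, alternative transcripts (the mRNA lines) never reach the output.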

@kashiff007

Hi @SunPengChuan

I am getting a similar error while running BlockInfo (wgdi -bi):

(base) [nawazk@login509-02-l W6-48549-006]$ wgdi -bi bi_total.conf
blast  =  SG_A_vs_SB_B.blast
gff1  =  SG_A.gff
gff2  =  SG_B.gff
lens1  =  SG_A.lens
lens2  =  SG_B.lens
collinearity  =  SG_A_vs_SB_B.list
score  =  100
evalue  =  1e-5
repeat_number  =  20
position  =  order
ks  =  ks file
ks_col  =  ks_NG86
savefile  =  block information (*.csv)
/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/block_info.py:74: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
  index = [group.sort_values(by=11, ascending=False)[:repeat_number].index.tolist()
Traceback (most recent call last):
  File "/home/nawazk/.conda/envs/mamba/bin/wgdi", line 10, in <module>
    sys.exit(main())
  File "/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/run.py", line 163, in main
    module_to_run(arg, value)
  File "/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/run.py", line 122, in module_to_run
    run_subprogram(program, conf, name)
  File "/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/run.py", line 87, in run_subprogram
    r.run()
  File "/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/block_info.py", line 121, in run
    collinearity = self.auto_file(gff1, gff2)
  File "/home/nawazk/.conda/envs/mamba/lib/python3.10/site-packages/wgdi/block_info.py", line 164, in auto_file
    return collinearity
UnboundLocalError: local variable 'collinearity' referenced before assignment

Can you check the files attached?
SG_A_vs_SB_B.blast.txt
SG_A_vs_SB_B.collinearit_pair.txt
SG_A.gff.txt
SG_A.lens.txt
SG_B.gff.txt
SG_B.lens.txt

@SunPengChuan
Owner

collinearity = SG_A_vs_SB_B.list
This file must be the output of WGDI's -c subprogram; output from MCScanX or JCVI also works. It is not a list of gene pairs.

savefile = block information (*.csv)
The savefile value is still the template placeholder; it should be the name of the output file.
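A small sketch of how leftover template values might be caught before running wgdi -bi. It assumes the conf file is INI-style with a `[blockinfo]` section, and it uses a purely heuristic rule (a value containing a space or `*` is probably a placeholder); the function name and the rule are illustrative, not part of WGDI.

```python
import configparser


def placeholder_values(conf_text, section):
    """Return {key: value} for values that look like unedited template text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    suspicious = {}
    for key, value in parser[section].items():
        # Heuristic only: real file names here contain no spaces or wildcards.
        if " " in value or "*" in value:
            suspicious[key] = value
    return suspicious


# Values copied from the configuration echoed in the report above.
conf_text = """
[blockinfo]
blast = SG_A_vs_SB_B.blast
collinearity = SG_A_vs_SB_B.list
evalue = 1e-5
ks = ks file
savefile = block information (*.csv)
"""
flagged = placeholder_values(conf_text, "blockinfo")
print(flagged)
```

A check like this cannot tell whether the collinearity file has the right format; it only flags values that were never filled in.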
