genomic_position_from_gtf Attribute Error
Tester3454 opened this issue
Report
I'm trying to use the scanpy plugin to infer copy number variation (CNV) from single-cell transcriptomics data.
When I try to annotate genomic positions in Jupyter using infercnvpy, I get the error below. I don't know whether this is a bug in the code or a mistake on my side, but a variety of other functions work fine on my dataset. I also tried the example dataset, adata = cnv.datasets.oligodendroglioma(), and got the same error.
AttributeError                            Traceback (most recent call last)
Cell In[28], line 2
      1 gtf_file=r'G:\gencode.v38.annotation.gtf'
      2 cnv.io.genomic_position_from_gtf(gtf_file, adata=adata, gtf_gene_id='gene_name', inplace=True)

File ~\Anaconda3\envs\spapros\lib\site-packages\infercnvpy\io_genepos.py:41, in genomic_position_from_gtf(gtf_file, adata, gtf_gene_id, adata_gene_id, inplace)
     11 def genomic_position_from_gtf(
     12     gtf_file: Union[Path, str],
     13     adata: Union[AnnData, None] = None,
   (...)
     17     inplace: bool = True,
     18 ) -> Union[pd.DataFrame, None]:
     19     """Get genomic gene positions from a GTF file.
     20
     21     The GTF file needs to match the genome annotation used for your single cell dataset.
   (...)
     39         If True, add the annotations directly to adata, otherwise return a dataframe.
     40     """
     41     gtf = gtfparse.read_gtf(
     42         gtf_file, usecols=["seqname", "feature", "start", "end", "gene_id", "gene_name"], result_type="pandas"
     43     )
     44     gtf = (
     45         gtf.loc[
     46             gtf["feature"] == "gene",
   (...)
     50         .rename(columns={"seqname": "chromosome"})
     51     )
     53     gene_ids_adata = (adata.var_names if adata_gene_id is None else adata.var[adata_gene_id]).values

File ~\Anaconda3\envs\spapros\lib\site-packages\gtfparse\read_gtf.py:254, in read_gtf(filepath_or_buffer, expand_attribute_column, infer_biotype_column, column_converters, usecols, features, result_type)
    251     raise ValueError("GTF file does not exist: %s" % filepath_or_buffer)
    253 if expand_attribute_column:
    254     result_df = parse_gtf_and_expand_attributes(
    255         filepath_or_buffer,
    256         restrict_attribute_columns=usecols,
    257         features=features)
    258 else:
    259     result_df = parse_gtf(result_df, features=features)

File ~\Anaconda3\envs\spapros\lib\site-packages\gtfparse\read_gtf.py:189, in parse_gtf_and_expand_attributes(filepath_or_buffer, restrict_attribute_columns, features)
    166 def parse_gtf_and_expand_attributes(
    167         filepath_or_buffer,
    168         restrict_attribute_columns=None,
    169         features=None):
    170     """
    171     Parse lines into column->values dictionary and then expand
    172     the 'attribute' column into multiple columns. This expansion happens
   (...)
    187         Ignore entries which don't correspond to one of the supplied features
    188     """
    189     df = parse_gtf(
    190         filepath_or_buffer=filepath_or_buffer,
    191         features=features,
    192         split_attributes=True)
    193     if type(restrict_attribute_columns) is str:
    194         restrict_attribute_columns = {restrict_attribute_columns}

File ~\Anaconda3\envs\spapros\lib\site-packages\gtfparse\read_gtf.py:155, in parse_gtf(filepath_or_buffer, split_attributes, features, fix_quotes_columns)
    150 def parse_gtf(
    151         filepath_or_buffer,
    152         split_attributes=True,
    153         features=None,
    154         fix_quotes_columns=["attribute"]):
    155     df_lazy = parse_with_polars_lazy(
    156         filepath_or_buffer=filepath_or_buffer,
    157         split_attributes=split_attributes,
    158         features=features,
    159         fix_quotes_columns=fix_quotes_columns)
    160     return df_lazy.collect()

File ~\Anaconda3\envs\spapros\lib\site-packages\gtfparse\read_gtf.py:87, in parse_with_polars_lazy(filepath_or_buffer, split_attributes, features, fix_quotes_columns)
     80 def parse_with_polars_lazy(
     81         filepath_or_buffer,
     82         split_attributes=True,
   (...)
     85     # use a global string cache so that all strings get intern'd into
     86     # a single numbering system
     87     polars.toggle_string_cache(True)
     88     kwargs = dict(
     89         has_header=False,
     90         sep="\t",
   (...)
    103             "frame": polars.UInt32,
    104         })
    105     try:

AttributeError: module 'polars' has no attribute 'toggle_string_cache'
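For anyone hitting the same traceback, a quick way to confirm it in your own environment is to check whether the installed module exposes the attribute that gtfparse calls. This is only a minimal sketch: the helper name `module_has_attr` is made up for illustration, and only the module and attribute names are taken from the error message above.

```python
import importlib


def module_has_attr(module_name, attr_name):
    """Return True/False if the module imports, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is missing from the active environment.
        return None
    return hasattr(module, attr_name)


# Module and attribute names as they appear in the AttributeError above.
result = module_has_attr("polars", "toggle_string_cache")
print("polars.toggle_string_cache available:", result)
```

If this prints `False`, the polars version in the active environment does not provide the function that the installed gtfparse expects, which matches the error reported here.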
I've run the code below. I'm using the latest GENCODE GTF.
import infercnvpy as cnv
import scanpy as sc
import matplotlib.pyplot as plt
import pandas as pd
adata=sc.read(r'G:\data.h5ad')
sc.pp.log1p(adata)
gtf_file=r'G:\gencode.v44.annotation.gtf'
cnv.io.genomic_position_from_gtf(gtf_file, adata=adata, gtf_gene_id='gene_name', inplace=True)
I'm doing this so that I can get output along the following lines and then proceed to running inferCNV:
adata.var.loc[:, ["ensg", "chromosome", "start", "end"]].head()
Version information
-----
anndata 0.8.0
infercnvpy 0.4.3.dev10+g4ff5f8f
matplotlib 3.6.3
pandas 1.5.3
polars 0.19.3
scanpy 1.9.1
session_info 1.0.0
-----
PIL 9.0.0
asttokens NA
backcall 0.2.0
backports NA
cairo 1.24.0
colorama 0.4.6
comm 0.1.2
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.5.1
decorator 5.1.1
defusedxml 0.7.1
entrypoints 0.4
executing 0.8.3
gtfparse NA
h5py 3.8.0
igraph 0.9.11
importlib_metadata NA
ipykernel 6.19.2
ipywidgets 8.0.4
jedi 0.18.1
joblib 1.2.0
kiwisolver 1.4.4
leidenalg 0.8.10
llvmlite 0.39.1
mpl_toolkits NA
natsort 8.2.0
nt NA
ntsecuritycon NA
numba 0.56.4
numpy 1.23.5
packaging 22.0
parso 0.8.3
pickleshare 0.7.5
pkg_resources NA
platformdirs 2.5.2
prompt_toolkit 3.0.36
psutil 5.9.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.11.2
pyparsing 3.0.9
pyreadr 0.4.9
pythoncom NA
pytz 2022.7.1
pywintypes NA
ruamel NA
scipy 1.10.0
setuptools 65.6.3
six 1.16.0
sklearn 1.2.1
sphinxcontrib NA
stack_data 0.2.0
texttable 1.6.7
threadpoolctl 3.1.0
tornado 6.2
tqdm 4.64.1
traitlets 5.7.1
typing_extensions NA
wcwidth 0.2.5
win32api NA
win32com NA
win32security NA
yaml 5.4.1
zipp NA
zmq 23.2.0
-----
IPython 8.8.0
jupyter_client 7.4.8
jupyter_core 5.1.1
-----
Python 3.8.16 (default, Jan 17 2023, 22:25:28) [MSC v.1916 64 bit (AMD64)]
Windows-10-10.0.19044-SP0
-----
Session information updated at 2023-09-22 23:39
Hi,
this is a known incompatibility between the gtfparse package and some versions of polars. See #86 for details.
I pinned gtfparse<2 in the latest release, which should fix the issue.
Ok, thanks! For the record, it worked when I used the solution from #86:
- In the Anaconda PowerShell Prompt, with the environment you are using activated:
- pip uninstall polars
- pip install polars==0.16.13
- Restart the kernel in Jupyter
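After restarting the kernel, it may be worth confirming that the notebook really sees the versions you expect. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper names are hypothetical, and the two constraints checked are the ones mentioned in this thread (gtfparse<2, or polars==0.16.13).

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Return the leading integer of a version string, e.g. '1.3.0' -> 1."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group())


# Report what the running kernel actually has installed.
for name in ("gtfparse", "polars"):
    print(name, installed_version(name))

# Constraint from the maintainer's fix in this thread: gtfparse<2.
gtfparse_version = installed_version("gtfparse")
if gtfparse_version is not None:
    print("gtfparse<2:", major_version(gtfparse_version) < 2)

# Pin from the workaround in this thread: polars==0.16.13.
print("polars==0.16.13:", installed_version("polars") == "0.16.13")
```

Run this in the same notebook that raised the error. If the printed versions differ from what `pip` reported, the kernel is probably attached to a different environment than the one you modified.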