icbi-lab / infercnvpy

Infer copy number variation (CNV) from scRNA-seq data. Plays nicely with Scanpy.

Home Page: https://infercnvpy.readthedocs.io/en/latest/

genomic_position_from_gtf Attribute Error

Tester3454 opened this issue · comments

I'm trying to use the scanpy plugin to infer copy number variation (CNV) from single-cell transcriptomics data.

When I try to annotate genomic positions with infercnvpy in Jupyter, I get the error below. I don't know whether this is a bug in the code or a mistake on my part, but a variety of other functions work fine on my dataset. I also tried the example dataset, adata = cnv.datasets.oligodendroglioma(), and got the same error.

AttributeError                            Traceback (most recent call last)
Cell In[28], line 2
      1 gtf_file=r'G:\gencode.v38.annotation.gtf'
----> 2 cnv.io.genomic_position_from_gtf(gtf_file, adata=adata, gtf_gene_id='gene_name', inplace=True)

File ~\Anaconda3\envs\spapros\lib\site-packages\infercnvpy\io\_genepos.py:41, in genomic_position_from_gtf(gtf_file, adata, gtf_gene_id, adata_gene_id, inplace)
     11 def genomic_position_from_gtf(
     12     gtf_file: Union[Path, str],
     13     adata: Union[AnnData, None] = None,
   (...)
     17     inplace: bool = True,
     18 ) -> Union[pd.DataFrame, None]:
     19     """Get genomic gene positions from a GTF file.
     20
     21     The GTF file needs to match the genome annotation used for your single cell dataset.
   (...)
     39         If True, add the annotations directly to adata, otherwise return a dataframe.
     40     """
---> 41     gtf = gtfparse.read_gtf(
     42         gtf_file, usecols=["seqname", "feature", "start", "end", "gene_id", "gene_name"], result_type="pandas"
     43     )
     44     gtf = (
     45         gtf.loc[
     46             gtf["feature"] == "gene",
   (...)
     50         .rename(columns={"seqname": "chromosome"})
     51     )
     53     gene_ids_adata = (adata.var_names if adata_gene_id is None else adata.var[adata_gene_id]).values

File ~\Anaconda3\envs\spapros\lib\site-packages\gtfparse\read_gtf.py:254, in read_gtf(filepath_or_buffer, expand_attribute_column, infer_biotype_column, column_converters, usecols, features, result_type)
    251     raise ValueError("GTF file does not exist: %s" % filepath_or_buffer)
    253 if expand_attribute_column:
--> 254     result_df = parse_gtf_and_expand_attributes(
    255         filepath_or_buffer,
    256         restrict_attribute_columns=usecols,
    257         features=features)
    258 else:
    259     result_df = parse_gtf(result_df, features=features)

File ~\Anaconda3\envs\spapros\lib\site-packages\gtfparse\read_gtf.py:189, in parse_gtf_and_expand_attributes(filepath_or_buffer, restrict_attribute_columns, features)
    166 def parse_gtf_and_expand_attributes(
    167         filepath_or_buffer,
    168         restrict_attribute_columns=None,
    169         features=None):
    170     """
    171     Parse lines into column->values dictionary and then expand
    172     the 'attribute' column into multiple columns. This expansion happens
   (...)
    187         Ignore entries which don't correspond to one of the supplied features
    188     """
--> 189     df = parse_gtf(
    190         filepath_or_buffer=filepath_or_buffer,
    191         features=features,
    192         split_attributes=True)
    193     if type(restrict_attribute_columns) is str:
    194         restrict_attribute_columns = {restrict_attribute_columns}

File ~\Anaconda3\envs\spapros\lib\site-packages\gtfparse\read_gtf.py:155, in parse_gtf(filepath_or_buffer, split_attributes, features, fix_quotes_columns)
    150 def parse_gtf(
    151         filepath_or_buffer,
    152         split_attributes=True,
    153         features=None,
    154         fix_quotes_columns=["attribute"]):
--> 155     df_lazy = parse_with_polars_lazy(
    156         filepath_or_buffer=filepath_or_buffer,
    157         split_attributes=split_attributes,
    158         features=features,
    159         fix_quotes_columns=fix_quotes_columns)
    160     return df_lazy.collect()

File ~\Anaconda3\envs\spapros\lib\site-packages\gtfparse\read_gtf.py:87, in parse_with_polars_lazy(filepath_or_buffer, split_attributes, features, fix_quotes_columns)
     80 def parse_with_polars_lazy(
     81         filepath_or_buffer,
     82         split_attributes=True,
   (...)
     85     # use a global string cache so that all strings get intern'd into
     86     # a single numbering system
---> 87     polars.toggle_string_cache(True)
     88     kwargs = dict(
     89         has_header=False,
     90         sep="\t",
   (...)
    103             "frame": polars.UInt32,
    104         })
    105     try:

AttributeError: module 'polars' has no attribute 'toggle_string_cache'

I've run the code below. I'm using the latest GENCODE GTF.

import infercnvpy as cnv
import scanpy as sc
import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset and log-transform it
adata = sc.read(r'G:\data.h5ad')
sc.pp.log1p(adata)

# Annotate genes with genomic positions, matching on the GTF gene_name attribute
gtf_file = r'G:\gencode.v44.annotation.gtf'
cnv.io.genomic_position_from_gtf(gtf_file, adata=adata, gtf_gene_id='gene_name', inplace=True)

I'm doing this so I can get output along the following lines and then proceed to running inferCNV:

adata.var.loc[:, ["ensg", "chromosome", "start", "end"]].head()
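For readers unfamiliar with what the annotation step extracts, the traceback above shows it reading the seqname, feature, start, end, gene_id and gene_name fields, keeping rows where feature is "gene", and renaming seqname to chromosome. The following is a minimal standard-library sketch of that idea, not infercnvpy's or gtfparse's implementation. The sample GTF lines, the helper names `parse_attributes` and `gene_positions`, and the output dictionary layout are all made up for illustration, and the attribute parsing is simplified (it assumes no semicolons inside quoted values).

```python
import csv
import io

# Tiny made-up GTF excerpt (tab-separated, 9 columns); not real annotation data.
SAMPLE_GTF = (
    "##description: illustrative sample only\n"
    'chr1\tHAVANA\tgene\t100\t500\t.\t+\t.\tgene_id "ENSG_A"; gene_name "GENE_A";\n'
    'chr1\tHAVANA\ttranscript\t100\t500\t.\t+\t.\tgene_id "ENSG_A"; gene_name "GENE_A";\n'
    'chr2\tHAVANA\tgene\t900\t1500\t.\t-\t.\tgene_id "ENSG_B"; gene_name "GENE_B";\n'
)


def parse_attributes(attribute_field):
    """Turn 'key "value"; key2 "value2";' into a dict (simplified parsing)."""
    attrs = {}
    for item in attribute_field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def gene_positions(handle, gene_key="gene_name"):
    """Map each gene identifier to its chromosome, start, end and gene_id."""
    positions = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, comment lines and malformed rows.
        if not row or row[0].startswith("#") or len(row) < 9:
            continue
        seqname, _source, feature, start, end = row[:5]
        if feature != "gene":
            continue  # keep gene-level records only
        attrs = parse_attributes(row[8])
        if gene_key in attrs:
            positions[attrs[gene_key]] = {
                "chromosome": seqname,
                "start": int(start),
                "end": int(end),
                "gene_id": attrs.get("gene_id"),
            }
    return positions


positions = gene_positions(io.StringIO(SAMPLE_GTF))
for name, record in positions.items():
    print(name, record)
```

The `gene_key` argument plays the role that gtf_gene_id='gene_name' plays in the call above: it selects which GTF attribute is matched against the gene identifiers in the dataset.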

Version information

-----
anndata             0.8.0
infercnvpy          0.4.3.dev10+g4ff5f8f
matplotlib          3.6.3
pandas              1.5.3
polars              0.19.3
scanpy              1.9.1
session_info        1.0.0
-----
PIL                         9.0.0
asttokens                   NA
backcall                    0.2.0
backports                   NA
cairo                       1.24.0
colorama                    0.4.6
comm                        0.1.2
cycler                      0.10.0
cython_runtime              NA
dateutil                    2.8.2
debugpy                     1.5.1
decorator                   5.1.1
defusedxml                  0.7.1
entrypoints                 0.4
executing                   0.8.3
gtfparse                    NA
h5py                        3.8.0
igraph                      0.9.11
importlib_metadata          NA
ipykernel                   6.19.2
ipywidgets                  8.0.4
jedi                        0.18.1
joblib                      1.2.0
kiwisolver                  1.4.4
leidenalg                   0.8.10
llvmlite                    0.39.1
mpl_toolkits                NA
natsort                     8.2.0
nt                          NA
ntsecuritycon               NA
numba                       0.56.4
numpy                       1.23.5
packaging                   22.0
parso                       0.8.3
pickleshare                 0.7.5
pkg_resources               NA
platformdirs                2.5.2
prompt_toolkit              3.0.36
psutil                      5.9.0
pure_eval                   0.2.2
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.11.2
pyparsing                   3.0.9
pyreadr                     0.4.9
pythoncom                   NA
pytz                        2022.7.1
pywintypes                  NA
ruamel                      NA
scipy                       1.10.0
setuptools                  65.6.3
six                         1.16.0
sklearn                     1.2.1
sphinxcontrib               NA
stack_data                  0.2.0
texttable                   1.6.7
threadpoolctl               3.1.0
tornado                     6.2
tqdm                        4.64.1
traitlets                   5.7.1
typing_extensions           NA
wcwidth                     0.2.5
win32api                    NA
win32com                    NA
win32security               NA
yaml                        5.4.1
zipp                        NA
zmq                         23.2.0
-----
IPython             8.8.0
jupyter_client      7.4.8
jupyter_core        5.1.1
-----
Python 3.8.16 (default, Jan 17 2023, 22:25:28) [MSC v.1916 64 bit (AMD64)]
Windows-10-10.0.19044-SP0
-----
Session information updated at 2023-09-22 23:39

Hi,

this is a known incompatibility of the gtfparse package with some versions of polars. See #86 for details.

I pinned gtfparse<2 in the latest release, which should fix the issue.
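To see whether an existing environment already satisfies that pin, a check along these lines may help. It is a rough sketch using only the standard library: the helper names `major_version` and `gtfparse_satisfies_pin` are hypothetical, and the comparison looks only at the leading major version number rather than applying full version-specifier rules.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '1.3.0'."""
    head = version.split(".")[0]
    digits = "".join(ch for ch in head if ch.isdigit())
    return int(digits) if digits else None


def gtfparse_satisfies_pin(version):
    """True if the major version is below 2 (simplified 'gtfparse<2' check)."""
    major = major_version(version)
    return major is not None and major < 2


try:
    installed = metadata.version("gtfparse")
except metadata.PackageNotFoundError:
    print("gtfparse is not installed in this environment")
else:
    status = "satisfies" if gtfparse_satisfies_pin(installed) else "violates"
    print(f"gtfparse {installed} {status} the gtfparse<2 pin")
```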

Ok, thanks! For the record, it worked when I used the solution from #86:

  • Open the Anaconda PowerShell Prompt and activate the environment you are using
  • pip uninstall polars
  • pip install polars==0.16.13
  • Restart the kernel in Jupyter
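After the kernel restart, one way to confirm that the downgrade took effect is to check whether polars exposes the attribute named in the traceback. This is a small sketch rather than an official diagnostic; `has_attribute` is a hypothetical helper, and it returns None when the module is not installed so the check also runs in environments without polars.

```python
import importlib
import importlib.util


def has_attribute(module_name, attribute):
    """Return True/False if the module is importable, or None if it is not installed."""
    if importlib.util.find_spec(module_name) is None:
        return None
    module = importlib.import_module(module_name)
    return hasattr(module, attribute)


result = has_attribute("polars", "toggle_string_cache")
if result is None:
    print("polars is not installed in this environment")
elif result:
    print("polars.toggle_string_cache is available")
else:
    print("polars.toggle_string_cache is missing; code that calls it will raise AttributeError")
```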