rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

sqlite3.OperationalError: database or disk is full when indexing

chengthefang opened this issue

Dear all,

I came across a SQLite3 error when indexing the fragments. See below:

WARNING: Neither ujson nor cjson installed. Falling back to Python's slower built-in json decoder.
Building index ...
Failed to execute the following SQL: CREATE INDEX pair_rule_environment_id on pair (rule_environment_id);
Traceback (most recent call last):
  File "/mmpdb/mmpdb", line 11, in <module>
    commandline.main()
  File "/mmpdb/mmpdblib/commandline.py", line 1054, in main
    parsed_args.command(parsed_args.subparser, parsed_args)
  File "/mmpdb/mmpdblib/commandline.py", line 393, in index_command
    do_index.index_command(parser, args)
  File "/mmpdb/mmpdblib/do_index.py", line 205, in index_command
    pair_writer.end(reporter)
  File "mmpdb/mmpdblib/index_algorithm.py", line 1199, in end
    self.backend.end(reporter)
  File "/mmpdb/mmpdblib/index_writers.py", line 228, in end
    schema.create_index(self.conn)
  File "/mmpdb/mmpdblib/schema.py", line 133, in create_index
    _execute_sql(c, get_create_index_sql())
  File "/mmpdb/mmpdblib/schema.py", line 119, in _execute_sql
    c.execute(statement)

**sqlite3.OperationalError: database or disk is full**

But I checked my disk and confirmed there was plenty of space available (1 TB). Any comments or suggestions? Would it help if I switched to APSW instead of sqlite3?

Thanks,
Cheng

Hi Cheng,

Christian tracked this down at #6 (comment). The likely problem is that SQLite filled up your temp directory, not your main disk. Try setting SQLITE_TMPDIR.
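
For example (the path below is only a placeholder; any directory on a file system with plenty of free space will do), set it in the same shell that runs the indexing:

% export SQLITE_TMPDIR=/data/sqlite-tmp    # directory on a large file system
% mkdir -p "$SQLITE_TMPDIR"
% mmpdb index ...                          # run the indexing from this same shell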

Thanks Adalke. I will give it a try soon and update here on how things go.

Cheng

@adalke Hi Adalke. As Christian suggested in the old issue, I tried export SQLITE_TMPDIR="new temp directory", but I still got the same "sqlite3.OperationalError: database or disk is full" error as above. In the output log, all steps seemed to complete, as indicated by "Loaded fragment record 15894329; Constant fragment matches 12250751/12250758 (100.0%); Writing rule statistics for property LogP: 2071429099/2071447397 (100.0%)". Any other suggestions I could try?

Another thing I noticed is that the new temp directory was empty when the job failed. Is there a way to check which temp directory SQLite is currently using?

Thanks,
Cheng

I've been looking, but I don't see how to get that information from SQLite.

I found a comment at https://sqlite.org/forum/forumpost/1fe9cfb542 which says that SQLite will create/open the temporary file and then immediately unlink it. Unix-type file systems allow this: the file stays open and accessible to SQLite, and its data still occupies space on the file system, but the name is no longer present in the directory structure.

Once the file is closed (either when SQLite closes it or when the program exits, even by crashing unexpectedly), the file is removed from the file system automatically. This gives a form of automatic cleanup for temporary files.
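
As a minimal sketch of the same trick at the prompt (bash/zsh on Linux; the file name is arbitrary):

% exec 3>/tmp/unlink-demo      # create the file and keep it open on descriptor 3
% rm /tmp/unlink-demo          # unlink it; the name disappears from the directory
% ls /tmp/unlink-demo
ls: cannot access '/tmp/unlink-demo': No such file or directory
% ls -l /proc/$$/fd/3          # the open descriptor still points at the deleted file
% exec 3>&-                    # closing the descriptor is what finally frees the space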

You can tell if that's happening by checking the disk usage while mmpdb is running. For example, here's the disk usage of the file systems on my Mac:

% df -h
Filesystem       Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1s5s1  466Gi   14Gi   32Gi    32%  559993 4881892887    0%   /
devfs           343Ki  343Ki    0Bi   100%    1186          0  100%   /dev
/dev/disk1s4    466Gi  5.0Gi   32Gi    14%       6 4882452874    0%   /System/Volumes/VM
/dev/disk1s2    466Gi  269Mi   32Gi     1%     781 4882452099    0%   /System/Volumes/Preboot
/dev/disk1s6    466Gi  216Ki   32Gi     1%      17 4882452863    0%   /System/Volumes/Update
  ...

and specifically here's what my /tmp uses:

% df -h /tmp
Filesystem     Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1s1  466Gi  414Gi   32Gi    93% 4928564 4877524316    0%   /System/Volumes/Data

I used the -h option to get "human-readable" output.
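
If you want to watch it over time rather than run df by hand, a simple loop like this will do (the interval and directory are just examples):

% while sleep 60; do date; df -h "$SQLITE_TMPDIR"; done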

Hi Adalke, thank you so much for looking into this issue. I will check that while mmpdb is running. I will close this ticket.

Best,
Cheng

Since you closed the ticket, I assume it's working for you. Was your disk filling up? Could you comment here on what you learned so people can find it in the future?

Hi Adalke, I haven't tried it out yet. The indexing will take a few days to complete. Let me reopen it and update here once I know how it goes.
Thank you for the follow-up.

Hi all,

Just an update: changing the SQLite temp folder eventually worked, but I had to modify my ~/.bashrc file as follows. Note that I changed both TMPDIR and SQLITE_TMPDIR:

export TMPDIR=/home/mmpdb/temp
export SQLITE_TMPDIR=/home/mmpdb/temp
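
One thing to keep in mind (this is general shell behavior, nothing specific to mmpdb) is that ~/.bashrc is only read by new shells, so the settings have to be re-sourced or the session restarted before starting the indexing:

% source ~/.bashrc
% echo "$TMPDIR" "$SQLITE_TMPDIR"   # confirm both point at the new directory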

Also, following Adalke's suggestion, I checked the space usage of the temporary folder I specified while mmpdb was running. The available space did decrease over time, which suggests mmpdb was indeed writing into that temp folder.

Thanks,
Cheng