DQUEEN v 0.5 (Data QUality assEssmENt and managing tool)

The goal of the DQUEEN project is to design and develop an open-source tool to expose and evaluate OMOP-CDM data and metadata. DQUEEN also makes data quality information easier to understand. (This version only supports DQA of OMOP-CDM.)

Introduction

This package runs a series of data quality checks against an OMOP CDM instance (currently supports v5.3.1). DQUEEN evaluates 7 data quality concepts and provides 3 levels of Data Quality Assessment (DQA). The aim of each DQA level is described below.

DQA level 1.

Level 1 checks whether the data under assessment corresponds to its predefined definition, and whether there are duplicates in the primary key or in entire rows.
DQA Concept: OMOP-CDM conformance, Uniqueness check
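
The uniqueness part of level 1 can be illustrated with a minimal sketch in base R. This is not DQUEEN's own implementation: the toy `person` data frame and the helper names `checkDuplicateKey` and `checkDuplicateRow` are assumptions made for illustration only.

```r
# Hypothetical toy table standing in for a CDM table such as Person.
person <- data.frame(
  person_id         = c(1, 2, 2, 3, 3),
  gender_concept_id = c(8507, 8532, 8532, 8507, 8532),
  year_of_birth     = c(1980, 1975, 1975, 1990, 1991)
)

# Hypothetical helper: rows whose primary key value occurs more than once.
checkDuplicateKey <- function(df, keyColumn) {
  key <- df[[keyColumn]]
  df[key %in% key[duplicated(key)], , drop = FALSE]
}

# Hypothetical helper: rows that are exact copies of another row.
checkDuplicateRow <- function(df) {
  df[duplicated(df) | duplicated(df, fromLast = TRUE), , drop = FALSE]
}

dupKeys <- checkDuplicateKey(person, "person_id")
dupRows <- checkDuplicateRow(person)

print(dupKeys)
print(dupRows)
```

In this toy data, a duplicated key does not always mean a duplicated row: person_id 3 appears twice with different attributes, so it is caught only by the key check.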

DQA level 2.

Level 2 evaluates the proportion of missing values, the foreign key relationships in the data, and the suitability of the data values.
DQA Concept: Completeness, Conformance-relation, Conformance-Value
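
As a rough illustration of the Completeness and Conformance-relation concepts, the sketch below computes a missing-value ratio per column and looks for foreign key values with no matching parent row. It is a simplified example in base R, not DQUEEN's code: the toy tables and the helper names `missingRatio` and `orphanKeys` are hypothetical.

```r
# Hypothetical toy tables standing in for Visit_occurrence and Person.
visit_occurrence <- data.frame(
  visit_occurrence_id = 1:5,
  person_id           = c(1, 2, 2, 4, NA),
  visit_start_date    = as.Date(c("2010-01-05", NA, "2011-03-02",
                                  "2012-07-19", "2013-11-30"))
)
person <- data.frame(person_id = c(1, 2, 3))

# Completeness: proportion of missing values in each column.
missingRatio <- function(df) {
  vapply(df, function(column) mean(is.na(column)), numeric(1))
}

# Conformance-relation: non-missing foreign key values that have
# no matching row in the parent table.
orphanKeys <- function(child, parent, keyColumn) {
  values <- child[[keyColumn]]
  unique(values[!is.na(values) & !(values %in% parent[[keyColumn]])])
}

ratios  <- missingRatio(visit_occurrence)
orphans <- orphanKeys(visit_occurrence, person, "person_id")

print(ratios)
print(orphans)
```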

DQA level 3.

Level 3 evaluates the distribution of the data over time and checks for logical errors. It also performs statistical analysis to identify erroneous data.
DQA Concept: Accuracy, Plausibility-Atemporal, Plausibility-Temporal
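
The following sketch shows two checks in the spirit of level 3: flagging dates that fall outside the ETL window, and flagging numeric outliers. The source does not say which statistical method DQUEEN uses, so the interquartile range rule here is an assumption chosen for illustration, as are the toy `measurement` table and the helper names `outOfWindow` and `iqrOutlier`.

```r
# ETL window, using the same example dates as the DQA parameters.
etl_stdt <- as.Date("1995-01-01")
etl_endt <- as.Date("2015-12-31")

# Hypothetical toy table standing in for Measurement.
measurement <- data.frame(
  measurement_id   = 1:6,
  measurement_date = as.Date(c("1994-12-31", "2001-05-10", "2008-09-01",
                               "2012-02-14", "2015-12-31", "2016-01-01")),
  value_as_number  = c(5.1, 4.9, 5.3, 5.0, 50.0, 5.2)
)

# Temporal plausibility: non-missing dates outside the ETL window.
outOfWindow <- function(dates, start, end) {
  !is.na(dates) & (dates < start | dates > end)
}

# Statistical check (assumed method): values beyond k * IQR from the quartiles.
iqrOutlier <- function(x, k = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[[2]] - q[[1]])
  !is.na(x) & (x < q[[1]] - fence | x > q[[2]] + fence)
}

dateFlag  <- outOfWindow(measurement$measurement_date, etl_stdt, etl_endt)
valueFlag <- iqrOutlier(measurement$value_as_number)

print(measurement[dateFlag | valueFlag, ])
```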

More detailed information is available here.

Overview of the DQA process in DQUEEN

The DQUEEN system process is as follows:

The user enters the input parameters for the Data Quality Assessment.
DQUEEN checks the connection information and creates the result tables.
The data quality assessment is performed according to the DQA level entered by the user.
If the DQA level is 3, an initial table-level Data Quality Score is calculated from the DQ errors.
If the DQA level is 3, the Shiny data is created.
The Shiny app is run.
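
The process above can be sketched as a small planning function that lists which steps would run for a given level. This is only an illustration of the control flow, not DQUEEN's internals: `planDqaSteps` is a hypothetical helper, and the way the `createddl`, `cdmAnalysis`, `makeShinyData`, and `runShiny` flags gate each step is an assumption based on their names.

```r
# Hypothetical helper: returns the ordered steps for a given DQA level.
planDqaSteps <- function(level, createddl = TRUE, cdmAnalysis = TRUE,
                         makeShinyData = TRUE, runShiny = TRUE) {
  stopifnot(is.numeric(level), length(level) == 1, level %in% 1:3)

  steps <- "check connection information"
  if (createddl) {
    steps <- c(steps, "create result tables")
  }
  if (cdmAnalysis) {
    steps <- c(steps, paste("run DQA for level", level))
    # The table-level score is only described for level 3.
    if (level == 3) {
      steps <- c(steps, "calculate table Data Quality Score")
    }
  }
  # Shiny data is only described for level 3.
  if (level == 3 && makeShinyData) {
    steps <- c(steps, "make Shiny data")
  }
  if (runShiny) {
    steps <- c(steps, "run Shiny app")
  }
  steps
}

print(planDqaSteps(level = 1))
print(planDqaSteps(level = 3))
```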

Data Quality Assessment Target Tables

Person
Death
Provider
Care_site
Visit_occurrence
Condition_occurrence
Drug_exposure
Device_exposure
Procedure_occurrence
Measurement

How to use?

Please make your own fork of the DQUEEN repo.

Launch RStudio, open the terminal, and move to the directory where you want to place the project.
Clone the repository, for example: git clone https://github.com/ABMI/DQUEEN_OMOP_CDM_Version.git (this web address is an example)
When the clone finishes, move into the DQUEEN_OMOP_CDM_Version directory.
Open DQUEEN.Rproj and build the DQUEEN package (DQUEEN installs R packages): click the Build tab, then click Install and Restart.
Open CodeToRun.R, located at DQUEEN_OMOP_CDM_Version/extras/CodeToRun.R
Enter the DQA parameters:

library(SqlRender)
library(shiny)
library(shinyjs)
library(highcharter)
library(treemap)
library(DT)
library(xts)
library(dplyr)
library(dygraphs)
library(lubridate)
library(plotly)
library(ECharts2Shiny)
library(shinythemes)
library(visNetwork)
library(reshape2)
library(shinyBS)
library(knitr)
library(ggplot2)
library(ggiraph)
library(reshape)
library(ParallelLogger)

ConnectionDetails <- DatabaseConnector::createConnectionDetails(dbms = "sql server",
                                                              server = "",  #IP
                                                              schema = "master.dbo" ,
                                                              user = "",        #User id
                                                              password = "")

cdmSchema <- 'cdmSchema.dbo' # Target CDM schema name
metaSchema <- 'metaSchema.dbo' # If you have a meta schema, put its name here
resultSchema <- 'resultSchema.dbo' # DQUEEN result schema name
level <- 3 # DQA level 1 = 1, DQA level 2 = 2, DQA level 3 = 3
useRandomExtraction <- TRUE # TRUE to randomly sample from the CDM schema, FALSE otherwise
extractioncdmSchema <- 'extractioncdmSchema.dbo' # Schema name for the randomly sampled CDM
randParameter <- 10000 # Number of persons to sample at random
etl_stdt <- '1995-01-01' # Minimum start date of your CDM
etl_endt <- '2015-12-31' # Maximum start date of your CDM
createddl <- TRUE # Create the DDL of the DQUEEN result tables
cdmAnalysis <- TRUE # Run the DQA
makeShinyData <- TRUE # Create the Shiny data
useVisnetwork <- TRUE # If you have a visNetwork CSV file, you can view the ETL flow
runShiny <- TRUE # TRUE: run Shiny, FALSE: do not run Shiny
visnetworkCsvPath <- c(file.path(.libPaths()[1],'DQUEEN','csv','schemas','filename_of_meta.csv'),  # visNetwork file path (meta)
                       file.path(.libPaths()[1],'DQUEEN','csv','schemas','filename_of_CDM.csv')) # visNetwork file path (CDM)
outputFolder <- getwd()
verboseMode <- TRUE

DQUEEN::dqueen(ConnectionDetails,
             level,
             etl_stdt,
             etl_endt,
             cdmSchema,
             metaSchema,
             resultSchema,
             useRandomExtraction,
             extractioncdmSchema,
             randParameter,
             createddl,
             cdmAnalysis,
             makeShinyData,
             useVisnetwork,
             visnetworkCsvPath,
             runShiny)
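
Before starting a long run, it can help to sanity-check the DQA parameters. The sketch below is a hypothetical helper, `validateDqaParameters`, which is not part of DQUEEN; it only checks the constraints that the parameter comments imply (level is 1 to 3, dates parse as YYYY-MM-DD, and the sample size is positive when random extraction is enabled).

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
# Assumes each argument is a single value, as in CodeToRun.R.
validateDqaParameters <- function(level, etl_stdt, etl_endt,
                                  useRandomExtraction = FALSE,
                                  randParameter = NULL) {
  problems <- character(0)

  if (!(is.numeric(level) && length(level) == 1 && level %in% 1:3)) {
    problems <- c(problems, "level must be 1, 2, or 3")
  }

  # as.Date() with an explicit format returns NA for unparseable strings.
  start <- as.Date(etl_stdt, format = "%Y-%m-%d")
  end   <- as.Date(etl_endt, format = "%Y-%m-%d")
  if (is.na(start) || is.na(end)) {
    problems <- c(problems, "etl_stdt and etl_endt must use the YYYY-MM-DD format")
  } else if (start > end) {
    problems <- c(problems, "etl_stdt must not be later than etl_endt")
  }

  if (isTRUE(useRandomExtraction) &&
      !(is.numeric(randParameter) && length(randParameter) == 1 &&
        randParameter > 0)) {
    problems <- c(problems, "randParameter must be a positive number")
  }

  problems
}

okResult  <- validateDqaParameters(3, "1995-01-01", "2015-12-31", TRUE, 10000)
badResult <- validateDqaParameters(4, "2015-12-31", "1995-01-01")

print(okResult)
print(badResult)
```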

More detailed information is available here.

System Requirements

MSSQL
R (version 3.5.2 or higher)
Java SDK (developed with Java 8 Update 181)
R packages "devtools" and "DQUEEN"
Rtools, with the Rtools PATH set
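
A quick way to check some of these requirements from an R session is sketched below. `checkRequirements` is a hypothetical helper, not something DQUEEN provides. Looking for `java` and `make` on the PATH is only a rough proxy for a working Java SDK and Rtools setup, and the check does not cover the database.

```r
# Hypothetical helper: reports which requirements appear to be satisfied.
checkRequirements <- function(minRVersion = "3.5.2",
                              packages = c("devtools", "DQUEEN")) {
  # requireNamespace() returns FALSE (without error) if a package is missing.
  installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)

  list(
    rVersionOk = getRversion() >= minRVersion,
    # Sys.which() returns an empty string when the program is not on the PATH.
    javaOnPath = nzchar(unname(Sys.which("java"))),
    makeOnPath = nzchar(unname(Sys.which("make"))),
    packages   = installed
  )
}

requirements <- checkRequirements()
str(requirements)
```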

The specific R libraries are listed here.

Feature

Technology

DQUEEN is an R package.

Support

We use the GitHub issue tracker for all bugs/issues
Developer questions/comments/feedback: OHDSI Forum or Korean OHDSI Forum

License

DQUEEN is licensed under Apache License 2.0

Development status

v 0.5 ready for use

LanceByun / DQUEEN_OMOP_CDM_Version