Preface

A collection of papers and materials on database query optimizers. All resources were gathered from the Internet and are provided purely for study; commercial use is prohibited. If any item infringes your rights, please get in touch and it will be removed promptly.

CTE

Efficient exploitation of similar subexpressions for query processing

Exploiting Common Subexpressions for Cloud Query Processing

Optimization of Common Table Expressions in MPP Database Systems

ML

Join Query Optimization with Deep Reinforcement Learning Algorithms

TUM query optimization

Query Optimization 01

Query Optimization 02

Query Optimization 03

Query Optimization 04

Query Optimization 05

adaptive query processing

Adaptive Ordering of Pipelined Stream Filters

Adaptive Query Processing in the Looking Glass

Adaptive Query Processing

Adaptive selectivity estimation using query feedback

Adaptively reordering joins during query execution

An adaptive query execution system for data integration

Continuous Cloud-Scale Query Optimization and Processing

Eddies: Continuously Adaptive Query Processing

Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Efficient Query Processing for Data Integration

Looking Ahead Makes Query Plans Robust

Partial Results for Online Query Processing

Plan Stitch: Harnessing the Best of Many Plans

Re-optimizing Data-Parallel Computing

Robust Query Processing through Progressive Optimization

Run-Time Adaptation in River

SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning

Using State Modules for Adaptive Query Processing

cardinality estimation

A Bayesian Approach to Estimating the Selectivity of Conjunctive Predicates

A Streaming Parallel Decision Tree Algorithm

Accurate estimation of the number of tuples satisfying a condition

Adaptive Statistics in Oracle 12c

An Approach Based on Bayesian Networks for Query Selectivity Estimation

An Improved Data Stream Summary: The Count-Min Sketch and its Applications

Automated Statistics Collection in DB2 UDB

Best Practices for Gathering Optimizer Statistics with Oracle Database

CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies

Cardinality Estimation of Distributed Join Queries

Cardinality Estimation Done Right

Cardinality Estimation Using Sample Views with Quality Assurance

Cardinality Estimation: An Experimental Survey

Consistently Estimating the Selectivity of Conjuncts of Predicates

Constructing Join Histograms from Histograms with q-error Guarantees

Data Sketching 1

Data Sketching 2

Detecting attribute dependencies from query feedback

End-biased Samples for Join Cardinality Estimation

Estimating Aggregations over Joins

Estimating Frequencies and Finding Heavy Hitters

Estimating the selectivity of LIKE queries using pattern-based

Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates

Exploiting Self-Monitoring Sample Views for Cardinality Estimation

Histograms Reloaded: The Merits of Bucket Diversity

How Good Are Query Optimizers, Really?

Improved Histograms for Selectivity Estimation of Range Predicates

Integrating Query-Feedback Based Statistics into Informix Dynamic Server

Lightweight Graphical Models for Selectivity Estimation Without Independence Assumptions

New Estimation Algorithms for Streaming Data: Count-min Can Do More

On the Estimation of Join Result Sizes

Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator

Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities

Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors

Processing Complex Aggregate Queries over Data Streams

Quantifying Uncertainty in Multi-Dimensional Cardinality Estimations

Random Sampling and Size Estimation over Cyclic Joins

Random Sampling over Joins Revisited

Sampling-Based Cardinality Estimation Algorithms: A Survey and An Empirical Evaluation

Sampling-Based Query Re-Optimization

Size Estimation for Query Results Using Histograms

Sketch Techniques for Approximate Query Processing

Sketches for Size of Join Estimation

Statistical Profile Estimation in Database Systems

Statistics and New CE

Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches

The History of Histograms

Tighter Upper Bounds for Join Cardinality Estimates

Towards Optimal Cardinality Estimation of Unions and Intersections with Sketches

Towards a Robust Query Optimizer: A Principled and Practical Approach

Two-Level Sampling for Join Size Estimation

Two-Level Sampling

Understanding Optimizer Statistics with Oracle Database

Wander Join: Online Aggregation via Random Walks

cost model

An End-to-End Learning-based Cost Estimator

Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings

Multi-Objective Parametric Query Optimization

data profiling

Data Profiling 2017

Data Profiling Revisited

Data Profiling in SQL Server

Profiling Relational Data – A Survey

materialized views

Automated Generation of Materialized Views in Oracle

Automated Selection of Materialized Views and Indexes for SQL Databases

Computation Reuse in Analytics Job Service at Microsoft

Materialized Views

Optimizing Queries Using Materialized Views: A Practical, Scalable Solution

Selecting Subexpressions to Materialize at Datacenter Scale

View Matching for Outer-Join Views

A survey of view selection methods

optimizer

An Overview of Query Optimization in Relational Systems

Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources

Cost-Based Oracle Fundamentals (Chinese edition)

Cost-Based Oracle Fundamentals

Cost-based query transformation in Oracle

Distributed Heterogeneous Query Processing in Microsoft SQL Server

Inside The SQL Server Query Optimizer

Is Query Optimization a “Solved” Problem?

LEO – DB2’s LEarning Optimizer

Neo: A Learned Query Optimizer

Optimizer with Oracle Database

Oracle® Database SQL Tuning Guide 20c

Orca: A Modular Query Optimizer Architecture for Big Data

PostgreSQL技术内幕：查询优化深度探索 (PostgreSQL Internals: An In-Depth Exploration of Query Optimization)

Query Optimization in Microsoft SQL Server PDW

Query Optimizers: Time to Rethink the Contract

SQL Server Query Optimization (slides)

Spark CBO Design Spec

The Internals of GPORCA Optimizer

The MemSQL Query Optimizer

The Snowflake Elastic Data Warehouse

基于Oracle的SQL优化 (Oracle-Based SQL Optimization)

数据库查询优化器的艺术 (The Art of Database Query Optimizers)

property enforcement

A Combined Framework for Grouping and Order Optimization

Advanced Partitioning Techniques for Massively Distributed Computation

An Efficient Framework for Order Optimization

Automated Partitioning Design in Parallel Database Systems

Automatic Data Placement in MPP Databases

Efficient Discovery of Dependencies

Exploiting Functional Dependence in Query Optimization

Incorporating Partitioning and Parallel Plans into the SCOPE Optimizer

Optimizing Queries over Partitioned Tables in MPP Systems

search framework

Access Path Selection in a Relational Database Management System

Efficiency in the Columbia Database Query Optimizer

The Cascades Framework for Query Optimization

The EXODUS Optimizer Generator

The Volcano Optimizer Generator: Extensibility and Efficient Search

search space

A New Heuristic for Optimizing Large Queries

A New, Highly Efficient, and Easy To Implement Top-Down Join Enumeration Algorithm

Adaptive Optimization of Very Large Join Queries

Algorithms for Efficient Top-Down Join Enumeration

An Overview of Cost-based Optimization of Queries with Aggregates

Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products

Cost-Based Optimization for Magic: Algebra and Implementation

Counter Strike: Generic Top-Down Join Enumeration for Hypergraphs

Dynamic Programming Strikes Back

Eager Aggregation and Lazy Aggregation

Effective and Robust Pruning for Top-Down Join Enumeration Algorithms

Exploiting Upper and Lower Bounds in Top-Down Query Optimization

Hypergraphs in the Service of Very Large Scale Query Optimization

Improving Join Reorderability with Compensation Operators

Including Group-By in Query Optimization

Interchanging the Order of Grouping and Join

Join Order Selection — Good Enough is Easy

Measuring the Complexity of Join Enumeration in Query Optimization

On the Correct and Complete Enumeration of the Core Search Space

Optimal Top-Down Join Enumeration (extended version)

Optimizing Join Enumeration in Transformation-based Query Optimizers

Optimizing Large Star-Schema Queries with Snowflakes via Heuristic-Based Query Rewriting

Outerjoin Simplification and Reordering for Query Optimization

Parallelizing Extensible Query Optimizers

Parallelizing Query Optimization on Shared-Nothing Architectures

Parallelizing Query Optimization

Partial Join Order Optimization in the ParAccel Analytic Database

Performing group by before join

Predicate Migration: Optimizing Queries with Expensive Predicates

Projection Pushing Revisited

Query Graphs, Implementing Trees, and Freely Reorderable Outerjoins

Query Simplification: Graceful Degradation for Join-Order Optimization

The Complete Story of Joins (in HyPer)

The Complexity of Transformation-Based Join Enumeration

Top-Down Plan Generation: From Theory to Practice

subquery

Enhanced Subquery Optimizations in Oracle

Orthogonal Optimization of Subqueries and Aggregation

Parameterized Queries and Nesting Equivalencies

Unnesting Arbitrary Queries

test

A Framework for Testing Query Transformation Rules

Counting, Enumerating, and Sampling of Execution Plans in a Cost-Based Query Optimizer

OptMark: A Toolkit for Benchmarking Query Optimizers

Testing Cardinality Estimation Models in SQL Server

Testing SQL Server’s Query Optimizer: Challenges, Techniques and Experiences

Testing the Accuracy of Query Optimizers

tools

Automatic Capture of Minimal, Portable, and Executable Bug Repros Using AMPERe

Reversing Statistics for Scalable Test Databases Generation

Total Operator State Recall — Cost-effective Reuse of Results in Greenplum Database

execution engine

Adaptive Execution of Compiled Queries

Adaptive NUMA-aware data placement and task scheduling for analytical workloads in main-memory column-stores

Approximate Distributed Joins in Apache Spark

Architecture of a Database System

Balancing Vectorized Query Execution with Bandwidth-Optimized Storage

Breaking the memory wall in MonetDB

ClickHouse Query Execution Pipeline

Efficient Exploitation of Similar Subexpressions for Query Processing

Efficiently Compiling Efficient Query Plans for Modern Hardware

Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask

Fuxi: A Fault-Tolerant Resource Management and Job Scheduling System at Internet Scale

Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age

On the Parallels between Paxos and Raft, and how to Port Optimizations

Optimal Bloom Filters and Adaptive Merging for LSM-Trees

Paxos Made Simple

Paxos/Raft 分布式一致性算法原理剖析及其在实战中的应用 (Paxos/Raft: An Analysis of Distributed Consensus Algorithms and Their Practical Applications)

Predicate Pushdown in Parquet and Apache Spark

Push vs. Pull-Based Loop Fusion in Query Engines (slides)

Push vs. Pull-Based Loop Fusion in Query Engines

SCOPE: Parallel Databases Meet MapReduce

Static and Dynamic Big Data Partitioning on Apache Spark

The Case for Learned Index Structures

Yugong: Geo-Distributed Data and Job Placement at Scale

Towards Practical Vectorized Analytical Query Engines

A History and Evaluation of System R

HyPer: HYbrid OLTP&OLAP High PERformance Database System

Kudu: Storage for Fast Analytics on Fast Data

Vectorization vs. Compilation in Query Execution

uncategorized

Abadi et al. - 2003 - Aurora: A new model and architecture for data stream management

Abadi et al. - 2007 - Scalable semantic web data management using vertical partitioning

Aberger et al. - 2017 - EmptyHeaded: A relational engine for graph processing

Abouzeid et al. - 2009 - HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads

Aggarwal et al. - 2003 - A framework for clustering evolving data streams

Agrawal et al. - 2009 - Asynchronous view maintenance for VLSD databases

Ahmad, Nath - 2008 - COLR-Tree: Communication-efficient spatio-temporal indexing for a sensor data web portal

Ananthanarayanan et al. - 2013 - Photon: Fault-tolerant and scalable joining of continuous data streams

Andersen et al. - 2001 - Resilient overlay networks

Andersen et al. - 2009 - FAWN: A fast array of wimpy nodes

Antenucci, Anderson, Cafarella - 2016 - A declarative query processing system for nowcasting

Armbrust et al. - 2013 - Generalized scale independence through incremental precomputation

Armenatzoglou, Papadopoulos, Papadias - 2013 - A general framework for geo-social query processing

Azzam et al. - 2011 - Liver transplantation in patients with hepatocellular carcinoma: A single-center experience

Banerjee et al. - 2006 - OMNI: An efficient overlay multicast infrastructure for real-time applications

Battle, Chang, Stonebraker - 2016 - Dynamic prefetching of data tiles for interactive visualization

Benjelloun et al. - 2008 - Databases with uncertainty and lineage

Biswas, Morris - Unknown - ExOR: Opportunistic Multi-Hop Routing for Wireless Networks

Blanas et al. - 2010 - A comparison of join algorithms for log processing in MapReduce

Boehm et al. - 2016 - SystemML

Boral et al. - 1990 - Prototyping Bubba, A Highly Parallel Database System

Bruno - Unknown - Continuous Cloud-Scale Query Optimization and Processing

Bu, Carey - Unknown - Pregelix: Big(ger) Graph Analytics on A Dataflow Engine

Cafarella et al. - 2016 - WebTables: Exploring the Power of Tables on the Web

Candea, Polyzotis, Vingralek - 2009 - A scalable, predictable join operator for highly concurrent data warehouses

Cao et al. - 2011 - Fast checkpoint recovery algorithms for frequently consistent applications

Chaiken et al. - 2008 - SCOPE: Easy and efficient parallel processing of massive data sets

Chan, Dehne, Taylor - 2005 - CGMgraph/CGMlib: Implementing and testing CGM graph algorithms on PC clusters and shared memory machines

Chandra, Griesemer, Redstone - 2007 - Paxos Made Live: An Engineering Perspective (2006 Invited Talk)

Chandrasekaran, Dadush, Vempala - 2010 - Thin partitions: Isoperimetric inequalities and a sampling algorithm for star shaped bodies

Chang et al. - 2006 - Bigtable: A distributed storage system for structured data

Chen, Gibbons, Nath - 2010 - PR-join: A non-blocking join achieving higher early result rate with statistical guarantees

Cheng et al. - 2012 - K-Reach: Who is in your small world

Chirigati et al. - 2016 - Knowledge exploration using tables on the web

Clement et al. - 2009 - UpRight cluster services

Condie et al. - 2019 - MapReduce online

Cong, Jensen - 2016 - Querying Geo-Textual Data

Considine et al. - 2004 - Approximate aggregation techniques for sensor databases

Cooper et al. - 2010 - Benchmarking cloud serving systems with YCSB

Cudre-Mauroux, Wu, Madden - 2010 - TrajStore: An adaptive storage system for very large trajectory data sets

Diao, Rizvi, Franklin - 2004 - Towards an Internet-Scale XML Dissemination Service

Dabek et al. - 2004 - Vivaldi: A decentralized network coordinate system

Das, Dutta - 2004 - Data acquisition in multiple-sink sensor networks

Datta, Stoica, Franklin - 2007 - LagOver: Latency Gradated Overlays

DeCandia et al. - 2007 - Dynamo: Amazon's highly available key-value store

Dean, Ghemawat - 2008 - MapReduce: Simplified data processing on large clusters

Dobrescu et al. - Unknown - RouteBricks (SOSP 2009)

Drosou, Pitoura - 2015 - Multiple radii disc diversity: Result diversification based on dissimilarity and coverage

Fan et al. - 2012 - Query preserving graph compression

Fan, Wang, Wu - 2012 - Performance guarantees for distributed reachability queries

Feamster, Balakrishnan - 2005 - Detecting BGP configuration faults with static analysis

Ford et al. - 2019 - Availability in globally distributed storage systems

Franklin et al. - 2011 - CrowdDB: Answering queries with crowdsourcing

Frey, Alonso - 2009 - Minimizing the hidden cost of RDMA

Gates et al. - 2009 - Building a high-level dataflow system on top of Map-Reduce: The Pig experience

Ghemawat, Gobioff, Leung - 2003 - The Google file system

Gonzalez, Bickson, Guestrin - Unknown - Osdi2012-Gonzalez-Low-Gu-Bickson-Guestrin

Grund, Cudre-Mauroux, Madden - 2011 - A demonstration of HYRISE: A main memory hybrid storage engine

Guan, Yan, Kaplan - 2012 - Measuring two-event structural correlations on graphs

Gulisano et al. - 2010 - StreamCloud: A large scale data streaming system

Gummadi et al. - 2004 - Improving the reliability of internet paths with one-hop source routing

Halevy et al. - 2016 - Goods: Organizing Google's datasets

Harding et al. - 2016 - An evaluation of distributed concurrency control

He, Singh - 2008 - Graphs-at-a-time: Query language and access methods for graph databases

Heller et al. - 2019 - ElasticTree: Saving energy in data center networks

Herodotou, Borisov, Babu - 2011 - Query optimization techniques for partitioned tables

Hu, Tao, Chung - 2013 - Massive graph triangulation

Huang, Abadi - 2016 - LEOPARD: Lightweight edge-oriented partitioning and replication for dynamic graphs

Huang, Abadi, Ren - 2011 - Scalable SPARQL querying of large RDF graphs

Huebsch et al. - 2003 - Querying the internet with PIER

Hwang, Çetintemel, Zdonik - 2008 - Fast and highly-available stream processing over wide area networks

Ipeirotis et al. - 2007 - Modeling and managing changes in text databases

Isard et al. - 2008 - Quincy

Ivanova et al. - 2010 - An architecture for recycling intermediates in a column-store

Jeffery, Garofalakis, Franklin - 2006 - Adaptive cleaning for RFID data streams

Jin et al. - 2012 - SCARAB: Scaling reachability computation on large graphs

Johnson et al. - 2010 - Aether: A scalable approach to logging

Jones, Abadi, Madden - 2010 - Low overhead concurrency control for partitioned main memory databases

Jung et al. - 2010 - Mistral: Dynamically managing power, performance, and adaptation cost in cloud infrastructures

Karypis, Kumar - 1998 - A fast and high quality multilevel scheme for partitioning irregular graphs

Kemper, Neumann - 2011 - HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots

Khan et al. - 2013 - NeMa: Fast graph search with label similarity

Khurana, Deshpande - 2013 - Efficient snapshot retrieval over historical graph data

Kostić et al. - 2003 - Bullet

Koutris et al. - 2013 - Toward practical query pricing with QueryMarket

Kriegel et al. - 1990 - The R*-tree: An efficient and robust access method for points and rectangles

Kwon et al. - 2012 - SkewTune in action: Mitigating skew in MapReduce applications

Kyrola, Blelloch, Guestrin - 2012 - GraphChi: Large-scale graph computation on just a PC

Lee - 2013 - A view of cloud computing

Lee et al. - 2009 - MCC-DB: Minimizing cache conflicts in multi-core processors for databases

Lee, Zheng - 2005 - DSI: A fully distributed spatial index for location-based wireless broadcast services

Levis et al. - 2004 - Trickle: A self-regulating algorithm for code propagation and maintenance in wireless sensor networks

Li et al. - 2015 - Influential Community Search in Large Networks

Li, Patel - 2013 - BitWeaving: Fast scans for main memory data processing

Luo et al. - 2013 - Finding time period-based most frequent path in big trajectory data

Ma et al. - 2016 - GSQL: Fast query processing via graph exploration

Machanavajjhala et al. - 2008 - Scalable Ranked Publish/Subscribe

Mackert, Lohman - 1986 - R* Optimizer Validation and Performance Evaluation for Local Queries

Madden et al. - 2002 - TAG: A tiny aggregation service for ad-hoc sensor networks

Manoharan et al. - 2016 - Shasta: Interactive reporting at scale

Marcus et al. - 2012 - Counting with the crowd

McConnell, Ping, Hwang - 2010 - iFlow: An approach for fast and reliable Internet-scale stream processing utilizing detouring and replication

Neumann - 2011 - Efficiently compiling efficient query plans for modern hardware

Olteanu, Huang - 2008 - Using OBDDs for efficient query evaluation on probabilistic databases

Pham, Shahabi, Liu - 2013 - EBM: An entropy-based model to infer social strength from spatiotemporal data

Rajsbaum - 2003 - ACM SIGACT news distributed computing column 13

Reiter - 2013 - Zephyr

Science, Yang - 2006 - On the Database Network Interface in Large-Scale Publish/Subscribe Systems

Soliman et al. - 2014 - Orca: A modular query optimizer architecture for big data

Unknown - 1983 - David Dewitt and Jim Gray

Unknown - Unknown - 5da318c7038f90b5433dc43b5a8b70c86461528e

Waas, Galindo-Legaria - 2000 - Counting, enumerating, and sampling of execution plans in a cost-based query optimizer

Weissman - 2013 - Comet

Wu - 2003 - An Extended Dynamic Source Routing Scheme in Ad Hoc Wireless Networks