:: Publikationen Details
| Jahr: | 2010 : 2009 : 2008 : 2007 : 2006 : 2005 : 2004 : 2003 : 2002 : 2001 : 2000 : 1999 : 1998 : 1997 : 1996 : 1995 : 1994 : 1993 : 1992 : 1991 : 1990 : 1988 : |
:: 2010
TransScale: Scalability Transformations for Declarative Applications
Year: 2010 Journal: Proceedings of the 26th International Conference on Data Engineering (ICDE), Long Beach, California, USA. (Demonstration) BibTeX: @inproceedings{Boehm:2010:TST, author = {Alexander B{\"o}hm and Erich Marth and Carl-Christian Kanne}, title = {TransScale: Scalability Transformations for Declarative Applications}, booktitle = {Proceedings of the 26th International Conference on Data Engineering (ICDE), Long Beach, California, USA.}, note = {Demonstration}, year = {2010} }Abstract: Recent developments in cloud computing have turned data processing and storage capacities into commodities. The scalability of an application is no longer limited by the amount of available hardware, but solely depends on whether it can be effectively distributed to multiple hosts. A conventional approach is to use a distributed run-time system that can execute applications in a transparent manner on multiple machines. Unfortunately, this approach does not scale well when dealing with shared, updated data. Instead, real-world applications use application-specific, hand-written messaging code to coordinate the hosts. Hence, to distribute an application across multiple hosts, developers have to manually restructure the application logic to create balanced application fragments and data partitions. This process of making an application scalable is tedious. It involves tough decisions on when to give up full consistency for the sake of scalability, and design patterns or best practices only slowly emerge. The goal of the Demaq/TransScale system is to automate the distribution of complex application processes to large numbers of hosts. In this demonstration, we show how to automate much of the manual work involved in making applications scalable, improving developer productivity and reducing time-to- market for such applications. Our approach is based on the novel programming model of the Demaq system which allows to elegantly specify messaging applications. It uses a simple, yet powerful data model and declarative rules to describe the application logic. As a result, reasoning about the data flow and data access patterns of an application as well as automatic program transformations are highly simplified.
:: 2009
Messaging Rules as a Programming Model for Enterprise Application Integration
Year: 2009 Journal: Proceedings of the 3rd East European Workshop on Rule-Based Applications, Cottbus, Germany. BibTeX: @inproceedings{Boehm:2009:MRP, author = {Alexander B{\"o}hm and Carl-Christian Kanne}, title = {Messaging Rules as a Programming Model for Enterprise Application Integration}, booktitle = {Proceedings of the 3rd East European Workshop on Rule-Based Applications, Cottbus, Germany.}, year = {2009} }Abstract: Rule-based systems and languages are successful in many application areas such as business rules or active database systems. The goal of the Demaq project is to investigate the feasibility and benefits of using a declarative, rule-based programming language to simplify the development of complex, distributed applications. For this purpose, we propose a novel programming paradigm based on messaging, queues and declarative rules. We focus on evaluating whether the proposed, rule-based approach can be used to implement complex application patterns. We use Enterprise Application Integration (EAI) as an example application domain, as EAI applications involve multiple, heterogeneous systems with complex interaction patterns. We discuss whether and how these application patterns can be implemented using our rule language.
Messaging Rules as a Programming Model for Enterprise Application Integration
Year: 2009 Journal: Technical Report Download: PDF BibTeX: @techreport{Boehm:2009:MRPtr, author = {Alexander B{\"o}hm and Carl-Christian Kanne}, title = {Messaging Rules as a Programming Model for Enterprise Application Integration}, institution = {University of Mannheim}, year = {2009} }Abstract: Rule-based systems and languages are successful in many application areas such as business rules or active database systems. The goal of the Demaq project is to investigate the feasibility and benefits of using a declarative, rule-based programming language to simplify the development of complex, distributed applications. For this purpose, we propose a novel programming paradigm based on messaging, queues and declarative rules. We focus on evaluating whether the proposed, rule-based approach can be used to implement complex application patterns. We use Enterprise Application Integration (EAI) as an example application domain, as EAI applications involve multiple, heterogeneous systems with complex interaction patterns. We discuss whether and how these application patterns can be implemented using our rule language.
Processes Are Data: A Programming Model for Distributed Applications
Year: 2009 Journal: Proceedings of the Tenth International Conference on Web Information Systems Engineering (WISE), Poznan, Poland. BibTeX: @inproceedings{Boehm:2009:PAD, author = {Alexander B{\"o}hm and Carl-Christian Kanne}, title = {Processes Are Data: A Programming Model for Distributed Applications}, booktitle = {Proceedings of the Tenth International Conference on Web Information Systems Engineering (WISE), Poznan, Poland.}, year = {2009} }Abstract: Many modern distributed applications employ protocols based on XML messages. Typical architectures for these applications follow an approach where messages are organized in queues, state is stored in DBMS, and application code is written in imperative languages. As a result, much developer productivity and system performance is wasted on handling conversions between the various data models (XML messages,objects, relations), and reliably managing persistent state for application instances. This overhead turns application servers into data management servers instead of process servers. We show how this model can be greatly improved by changing two aspects. Firstly, by using a declarative rule language to describe the processing logic. Secondly, by providing a single, unified data model based on XML messages that covers all kinds of data encountered, including process state. We discuss the resulting design choices for compile-time and run-time systems, and show and experimentally evaluate the performance improvements made possible.
Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors
Year: 2009 Journal: Proceedings of the 35th International Conference on Very Large Data Bases (VLDB), Lyon, France. Download: PDF BibTeX: @inproceedings{Moerkotte:2009:PBP, author = {Guido Moerkotte and Thomas Neumann and Gabriele Steidl}, title = {Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors}, booktitle = {Proceedings of the 35th International Conference on Very Large Data Bases (VLDB), Lyon, France}, year = {2009} }Abstract: Query optimizers rely on accurate estimations of the sizes of intermediate results. Wrong size estimations can lead to overly expensive execution plans. We first define the q-error to measure deviations of size estimates from actual sizes. The q-error enables the derivation of two important results: (1) We provide bounds such that if the q-error is smaller than this bound, the query optimizer constructs an optimal plan. (2) If the q-error is bounded by a number q, we show that the cost of the produced plan is at most a factor of q^4 worse than the optimal plan. Motivated by these findings, we next show how to find the best approximation under the q-error. These techniques can then be used to build synopsis for size estimates. Finally, we give some experimental results where we apply the developed techniques.
Scalability Transformations on Declarative Applications
Year: 2009 Journal: Technical Report Download: PDF BibTeX: @techreport{Boehm:2009:STD, author = {Alexander B{\"o}hm and Carl-Christian Kanne}, title = {Scalability Transformations on Declarative Applications}, institution = {University of Mannheim}, year = {2009} }Abstract: Many current distributed applications are based on the exchange of XML messages. Scaling such applications to the high processing volume demanded by Internet-scale deployment typically requires costly redesign and coding. In this paper, we investigate how a declarative specification of such applications can simplify the task of deploying them on a large number of host machines. In our model, applications are represented as a graph of message queues connected by message flow rules. The state of application instances is encoded in the message history of the queues and accessed using XQuery expressions. We show how to split such an application into distributable fragments using graph partitioning and discuss different algorithms for placing the fragments on hosts. Typically, an initial application specification contains data dependencies that place an upper limit on the number of fragments, and hence the number of usable machines. We describe transformations that increase the number of possible fragments by converting data dependencies into message flow. An evaluation using the TPC-App benchmark and a runtime system prototype confirms the feasibility and performance benefits of this approach.
Processes Are Data: A Programming Model for Distributed Applications
Year: 2009 Journal: Technical Report Download: PDF BibTeX: @techreport{Boehm:2009:PAD, author = {Alexander B{\"o}hm and Carl-Christian Kanne}, title = {Processes Are Data: A Programming Model for Distributed Applications}, institution = {University of Mannheim}, year = {2009} }Abstract: Applications in distributed environments must scale to an increasing number of concurrently active application instances. Today's application servers spend a significant amount of resources on reliably managing state for these instances, turning them into data management servers instead of process servers. The goal of the Demaq project is to overcome the limitations of these systems using a novel programming model for applications based on asynchronous messaging (e.g. Web Services). A crucial aspect of our approach is the representation of state. Messages do not only represent requests and replies sent to and from an application, but retained messages are also used to model the application instance state. This contrasts with most of today’s application servers where two separate data models, languages and stores are used for requests and state. In Demaq, a single, highly efficient, reliable message store is used both for requests and instance state, and a single declarative language specifies message flow and state management. This extends data independence to the whole application stack, thereby improving both developer productivity and - as experimental results confirm - application scalability and performance.
:: 2008
Dynamic Programming Strikes Back
Year: 2008 Journal: Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data / Principles of Database Systems, Vancouver, BC, Canada. BibTeX: @inproceedings{Moerkotte:2008:DPS, author = {Guido Moerkotte and Thomas Neumann}, title = {Dynamic Programming Strikes Back}, booktitle = {Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data / Principles of Database Systems, Vancouver, BC, Canada.}, year = {2008} }Abstract: Two highly efficient algorithms are known for optimally ordering joins while avoiding cross products: DPccp, which is based on dynamic programming, and Top-Down Partition Search, based on memoization. Both have two severe limitations: They handle only (1) simple (binary) join predicates and (2) inner joins. However, real queries may contain complex join predicates, involving more than two relations, and outer joins as well as other non-inner joins. Taking the most efficient known join-ordering algorithm, DPccp, as a starting point, we first develop a new algorithm, DPhyp, which is capable to handle complex join predicates efficiently. We do so by modeling the query graph as a (variant of a) hypergraph and then reason about its connected subgraphs. Then, we present a technique to exploit this capability to efficiently handle the widest class of non-inner joins dealt with so far. Our experimental results show that this reformulation of non-inner joins as complex predicates can improve optimization time by orders of magnitude, compared to known algorithms dealing with complex join predicates and non-inner joins. Once again, this gives dynamic programming a distinct advantage over current memoization techniques.
The Demaq System: Declarative Development of Distributed Applications
Year: 2008 Journal: Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data / Principles of Database Systems, Vancouver, BC, Canada. (Demonstration) BibTeX: @inproceedings{Boehm:2008:TDS, author = {Alexander B{\"o}hm and Erich Marth and Carl-Christian Kanne}, title = {The Demaq System: Declarative Development of Distributed Applications}, booktitle = {Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data / Principles of Database Systems, Vancouver, BC, Canada.}, note = {Demonstration}, year = {2008} }Abstract: The goal of the Demaq project is to investigate a novel way of thinking about distributed applications that are based on the asynchronous exchange of XML messages. Unlike today's solutions that rely on imperative programming languages and multi-tiered application servers, Demaq uses a declarative language for implementing the application logic as a set of rules. A rule compiler transforms the application specifications into execution plans against the message history, which are evaluated using our optimized runtime engine. This allows us to leverage existing knowledge about declarative query processing for optimizing distributed applications.
Faster Join Enumeration for Complex Queries
Year: 2008 Journal: Proceedings of the 24th International Conference on Data Engineering, Cancún, México. (Poster) BibTeX: @inproceedings{Moerkotte:2008:FJE, author = {Guido Moerkotte and Thomas Neumann}, title = {Faster Join Enumeration for Complex Queries}, booktitle = {Proceedings of the 24th International Conference on Data Engineering, Cancún, México}, note = {Poster}, year = {2008} }Abstract: Most existing join ordering algorithms concentrate on join queries with simple join predicates and inner joins only, where simple predicates are those that involve exactly two relations. However, real queries may contain complex join predicates, i.e. predicates involving more than two relations. We show how to handle complex join predicates efficiently, by modeling the query graph as a hypergraph and reasoning about its connected subgraphs.
:: 2007
Efficient XQuery Evaluation of Grouping Conditions with Duplicate Removals
Year: 2007 Journal: Proceedings of the Fifth International XML Database Symposium (XSym), Vienna, Austria Download: PDF BibTeX: @inproceedings{may:2007:exe, author = {Norman May and Guido Moerkotte}, title = {Efficient XQuery Evaluation of Grouping Conditions with Duplicate Removals}, booktitle = {Proc. of Fifth International XML Database Symposium, Vienna, Austria}, }Abstract: Currently, grouping in XQuery must be expressed implicitly with nested FLWOR expressions. With XQuery 1.1, an explicit group by clause will be part of this query language. As users integrate this new construct into their applications, it becomes important to have efficient evaluation techniques available to process even complex grouping conditions. Among them, the removal of distinct values or distinct nodes in the partitions defined by the group by clause is not well-supported yet. The evaluation technique proposed in this paper is able to handle duplicate removal in the partitions efficiently. Experiments show the superiority of our solution compared to state-of-the-art query processing.
Let a Single FLWOR Bloom (to improve XQuery plan generation)
Year: 2007 Journal: Proceedings of the Fifth International XML Database Symposium (XSym), Vienna, Austria Download: PDF BibTeX: @inproceedings{brantner:2007:DXE, author = {Matthias Brantner and Carl-Christian Kanne and Guido Moerkotte}, title = {Let a Single {FLWOR} Bloom (to improve XQuery plan generation)}, booktitle = {Proc. of Fifth International XML Database Symposium, Vienna, Austria}, }Abstract: To globally optimize execution plans for XQuery expressions, a plan generator must generate and compare plan alternatives. In proven compiler architectures, the unit of plan generation is the query block. Fewer query blocks mean a larger search space for the plan generator and lead to a generally higher quality of the execution plans. The goal of this paper is to provide a toolkit for developers of XQuery evaluators to transform XQuery expressions into expressions with as few query blocks as possible. Our toolkit takes the form of rewrite rules merging the inner and outer FLWOR expressions into single FLWORs. We focus on previously unpublished rewrite rules, and on inner FLWORs occurring in the for, let, and return clauses in the outer FLWOR.
Let a Single FLWOR Bloom
Year: 2007 Journal: Technical Report Download: PDF BibTeX: @TechReport{brantner:2007:DXE, author = {Matthias Brantner and Carl-Christian Kanne and Guido Moerkotte}, title = {Let a Single {FLWOR} Bloom}, note = {http://madoc.bib.uni-mannheim.de/madoc/volltexte/2007/1428/pdf/TR_2007_007.pdf} institution = {University of Mannheim}, year = {2007} }Abstract: To globally optimize execution plans for XQuery expressions, a plan generator must generate and compare plan alternatives. In proven compiler architectures, the unit of plan generation is the query block. Fewer query blocks mean a larger search space for the plan generator and lead to a generally higher quality of the execution plans. The goal of this paper is to provide a toolkit for developers of XQuery evaluators to transform XQuery expressions into expressions with as few query blocks as possible. Our toolkit takes the form of rewrite rules merging the inner and outer FLWOR expressions into single FLWORs. We focus on previously unpublished rewrite rules, and on inner FLWORs occurring in the for, let, and return clauses in the outer FLWOR.
Declarative Development of Distributed Applications
Year: 2007 Journal: Proceedings SIGMOD2007 Ph.D. Workshop on Innovative Database Research 2007(IDAR2007) (Poster Paper) BibTeX: @inproceedings{Boehm:2007:DDD, author = {Alexander Böhm}, title = {Declarative Development of Distributed Applications}, booktitle = {Proceedings SIGMOD2007 Ph.D. Workshop on Innovative Database Research 2007(IDAR2007)}, note = {Poster Paper}, year = {2007} }Abstract: We present a novel approach for the convenient implementation and efficient execution of distributed XML message processing applications. Various application classes such as Web Services are based on asynchronously exchanging XML data. However, today's programming languages and execution systems fail to support their particular demands such as integrated XML type support and efficient, asynchronous messaging. As a result, the development of distributed applications is unnecessarily complex and their runtime performance deteriorates. To overcome these deficits, we propose a fully declarative XML processing language that allows for the convenient specification of an individual node participating in a distributed application. It is complemented by a corresponding runtime system for reliable and efficient application execution.
Declarative Development of Distributed Applications
Year: 2007 Journal: Technical Report Download: PDF BibTeX: @techreport{Boehm:2007:DDDTR, author = {Alexander Böhm}, title = {Declarative Development of Distributed Applications}, institution = {University of Mannheim}, year = {2007} }Abstract: We present a novel approach for the convenient implementation and efficient execution of distributed XML message processing applications. Various application classes such as Web Services are based on asynchronously exchanging XML data. However, today's programming languages and execution systems fail to support their particular demands such as integrated XML type support and efficient, asynchronous messaging. As a result, the development of distributed applications is unnecessarily complex and their runtime performance deteriorates. To overcome these deficits, we propose a fully declarative XML processing language that allows for the convenient specification of an individual node participating in a distributed application. It is complemented by a corresponding runtime system for reliable and efficient application execution.
Impedantix: An API for Native XML Data Stores
Year: 2007 Journal: Technical Report Download: PDF BibTeX: @TechReport{Boehm:2007:IAX, author = {Alexander Böhm and Matthias Brantner and Sven Helmer and Alexander Hollmann and Carl-Christian Kanne and Norman May and Guido Moerkotte}, title = {{I}mpedantix: {A}n {API} for {N}ative {XML} {D}ata {S}tores}, institution = {University of Mannheim}, year = {2007} }Abstract: Database systems have to provide powerful Application Programming Interfaces (APIs) to facilitate the convenient development of data-intensive applications. While de-facto standards such as ODBC and JDBC have become widely accepted and adopted in the relational world, they provide only limited support for the specific requirements of XML data processing applications. To overcome these deficiencies, we (1) identify the specific requirements and (2) propose Impedantix, an API for interfacing with native XML data stores
Demaq: A Foundation for Declarative XML Message Processing
Year: 2007 Journal: Proc. 3. Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA Download: Paper BibTeX: @InProceedings{Boehm:2007:DFD, author = {A. B\"ohm and C-C. Kanne and G. Moerkotte}, title = {Demaq: A Foundation for Declarative XML Message Processing}, year = {2007}, booktitle = {Proc. 3. Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA}, }Abstract: This paper gives an overview of Demaq, an XML message processing system operating on the foundation of transactional XML message queues. We focus on the syntax and semantics of its fully declarative, rule-based application language and demonstrate our message-based programming para\-digm in the context of a case study. Further, we discuss optimization opportunities for executing Demaq programs.
Unnesting Scalar SQL Queries in the Presence of Disjunction
Year: 2007 Journal: Proc. 23. ICDE Conference, Istanbul, Turkey Download: Paper BibTeX: @InProceedings{Brantner:2007:USQ, author = {M. Brantner and N. May and G. Moerkotte}, title = {Unnesting Scalar {SQL} Queries in the Presence of Disjunction}, year = {2007}, booktitle = {Proc. 23. ICDE Conference, Istanbul, Turkey}, }Abstract: Optimizing nested queries is an intricate problem. It becomes even harder if in a nested query the linking predicate or the correlation predicate occurs disjunctively. We present the first unnesting strategy that can effectively deal with such queries. The starting point of our approach is to translate SQL into the relational algebra extended by bypass operators. Then we present for the first time unnesting equivalences which are valid for algebraic expressions containing bypass operators. Applying these to the translated queries results in our effective unnesting strategy for nested SQL queries with disjunction. With an extensive experimental study (including three commercial DBMSs), we demonstrate the possible performance gains of our approach.
:: 2006
On the Optimal Ordering of Maps, Selections, and Joins Under Factorization
Year: 2006 Journal: Proceedings of 23rd British National Conference on Databases (BNCOD) Download: PDF, Springer BibTeX: @inproceedings{DBLP:conf/bncod/NeumannHM06, author = {Thomas Neumann and Sven Helmer and Guido Moerkotte}, title = {On the Optimal Ordering of Maps, Selections, and Joins Under Factorization.}, booktitle = {Proceedings of 23rd British National Conference on Databases, Belfast, Northern Ireland, UK}, year = {2006}, pages = {115-126} }Abstract: We examine the problem of producing the optimal evaluation order for queries containing joins, selections, and maps. Specifically, we look at the case where common subexpressions involving expensive UDF calls can be factored out. First, we show that ignoring factorization during optimization can lead to plans that are far off the best possible plan: the difference in cost between the best plan considering factorization and the best plan not considering factorization can easily reach several orders of magnitude. Then, we introduce optimization strategies that produce optimal left-deep and bushy plans when factorization is taken into account. Experiments (1) confirm that factorization is a critical issue when it comes to generating optimal plans and (2) we show that to consider factorization does not make plan generation significantly more expensive.
Kappa-Join: Efficient Execution of Existential Quantification in XML Query Languages
Year: 2006 Journal: Proc. Workshop Fourth International XML Database Symposium, Seoul, Korea Download: Paper, Springer BibTeX: @InProceedings{Brantner:2006:KEE, author = {M. Brantner and S. Helmer and C-C. Kanne and G. Moerkotte}, title = {{Kappa-Join}: Efficient Execution of Existential Quantification in {XML} Query Languages}, booktitle = {Proc. Workshop Fourth International XML Database Symposium, Seoul, Korea}, pages = {1--15} year = {2006} }Abstract: XML query languages feature powerful primitives for formulating queries, involving comparison expressions which are existentially quantified. If such comparisons involve several scopes, they are correlated and, thus, become difficult to evaluate efficiently. In this paper, we develop a new ternary operator, called Kappa-Join, for efficiently evaluating queries with existential quantification. In XML queries, a correlation predicate can occur conjunctively and disjunctively. Our decorrelation approach not only improves performance in the conjunctive case, but also allows decorrelation of the disjunctive case. The latter is not possible with any known technique. In an experimental evaluation, we compare the query execution times of the Kappa-Join with existing XPath evaluation techniques to demonstrate the effectiveness of our new operator.
Index vs. Navigation in XPath Evaluation
Year: 2006 Journal: Proc. Workshop Fourth International XML Database Symposium, Seoul, Korea Download: Paper, Springer BibTeX: @InProceedings{May:2006:INX, author = {N. May and M. Brantner and A. B\"ohm and C-C. Kanne and G. Moerkotte}, title = {Index vs. Navigation in XPath Evaluation}, booktitle = {Proc. Workshop Fourth International XML Database Symposium, Seoul, Korea}, pages = {16--30} year = {2006} }A Linear-Time Algorithm for Optimal Tree Sibling Partitioning and Approximation Algorithms in Natix
Year: 2006 Journal: Proc. 32nd VLDB Conference, Seoul, Korea BibTeX: @InProceedings{Kanne:2006:TFX, author = {C.-C. Kanne and G. Moerkotte}, title = {A Linear-Time Algorithm for Optimal Tree Sibling Partitioning and Approximation Algorithms in Natix}, year = {2006}, booktitle = "Proc. 32nd VLDB Conference, Seoul, Korea" }Abstract: Document insertion into a native XML Data Store (XDS) requires to partition the document tree into a number of storage units with limited capacity, such as records on disk pages. As intra partition navigation is much faster than navigation between partitions, minimizing the number of partitions has a beneficial effect on query performance. We present a linear time algorithm to optimally partition an ordered, labeled, weighted tree such that each partition does not exceed a fixed weight limit. Whereas traditionally tree partitioning algorithms only allow child nodes to share a partition with their parent node (i.e. a partition corresponds to a subtree), our algorithm also considers partitions containing several subtrees as long as their roots are adjacent siblings. We call this sibling partitioning. Based on our study of the optimal algorithm, we further introduce two novel, near-optimal heuristics. They are easier to implement, do not need to hold the whole document instance in memory, and require much less runtime than the optimal algorithm. Finally, we provide an experimental study comparing our novel and existing algorithms. One important finding is that compared to partitioning that exclusively considers parent-child partitions, including sibling partitioning as well can decrease the total number of partitions by more than 90%, and improve query performance by more than a factor of two.
A Linear-Time Algorithm for Optimal Tree Sibling Partitioning and Approximation Algorithms in Natix
Year: 2006 Journal: Technical report Download: local, Library Server BibTeX: @TechReport{Kanne:2006:LTA_TR, author = "Carl-Christian Kanne and Guido Moerkotte", title = {A Linear Time Algorithm for Optimal Tree Sibling Partitioning and its Application to {XML} Data Stores}, institution = "University of Mannheim, Department for Mathematics and Computer Science", year = "2006", type = "Technical Report", number = "TR-2006-006" }Abstract: Document insertion into a native XML Data Store (XDS) requires to partition the document tree into a number of storage units with limited capacity, such as records on disk pages. As intra partition navigation is much faster than navigation between partitions, minimizing the number of partitions has a beneficial effect on query performance. We present a linear time algorithm to optimally partition an ordered, labeled, weighted tree such that each partition does not exceed a fixed weight limit. Whereas traditionally tree partitioning algorithms only allow child nodes to share a partition with their parent node (i.e. a partition corresponds to a subtree), our algorithm also considers partitions containing several subtrees as long as their roots are adjacent siblings. We call this sibling partitioning. Based on our study of the optimal algorithm, we further introduce two novel, near-optimal heuristics. They are easier to implement, do not need to hold the whole document instance in memory, and require much less runtime than the optimal algorithm. Finally, we provide an experimental study comparing our novel and existing algorithms. One important finding is that compared to partitioning that exclusively considers parent-child partitions, including sibling partitioning as well can decrease the total number of partitions by more than 90%, and improve query performance by more than a factor of two.
Template Folding for XPath
Year: 2006 Journal: Third International Workshop on XQuery Implementation, Experience and Perspectives (XIME-P 2006) Download: paper BibTeX: @InProceedings{Kanne:2006:TFX, author = {C.-C. Kanne and G. Moerkotte}, title = {Template Folding for XPath}, year = {2006}, booktitle = "Third International Workshop on XQuery Implementation, Experience and Perspectives (XIME-P 2006)" }Abstract: We discuss query evaluation for XML-based server systems where the same query is evaluated on every incoming XML message. In a typical scenario, many of the incoming messages will be highly similar to each other. Current XML query evaluators reevaluate the query from scratch on every message. We call substructures that occur in many input documents template fragments, and introduce a novel template folding method that allows to move the work of evaluating the query on recurring document substructures from the query execution engine into the query compiler. Similar to constant folding, our method avoids run-time evaluation of intermediate results whose value only depends on information that is already available at compile time. For XPath location paths, we propose a representation for such invariant intermediate results, and show how it can be incorporated into query execution plans. Such augmented execution plans improve query performance when evaluating the same query on subsequent input documents.
Implementierung von skalierbaren, hochperformanten Web Services durch deklarative Nachrichtenverarbeitung
Year: 2006 Journal: Proc. doIT Software-Forschungstag 2006, Mannheim BibTeX: @InProceedings{Boehm:2006:ISH, author = {Alexander B\"ohm and Carl-Christian Kanne and Guido Moerkotte}, title = {Implementierung von skalierbaren, hochperformanten {W}eb {S}ervices durch deklarative {N}achrichtenverarbeitung}, booktitle = {Proc. doIT Software-Forschungstag 2006, Mannheim}, year = {2006}, note = {In German} }Abstract: Der zunehmende Einsatz von Web Services für erfolgsentscheidende Unternehmensanwendungen erfordert schnelle und flexible Anwendungsentwicklung sowie hochperformante und zuverlässige Ausführungsumgebungen. Existierende Applikationsserver erfüllen diese Anforderungen nur partiell. Wir stellen eine deklarative Regelsprache zur Definition von Geschäftsprozessen und ein zugehöriges Systemdesign auf der Basis von Nachrichtenwarteschlangen und einem nativen XML-Datenbanksystem vor.
Implementierung von skalierbaren, hochperformanten Web Services durch deklarative Nachrichtenverarbeitung
Year: 2006 Journal: Technical Report Download: local, Library Server BibTeX: @TechReport{Boehm:2006:ISH_TR, author = {Alexander B\"ohm and Carl-Christian Kanne and Guido Moerkotte}, title = {Implementierung von skalierbaren, hochperformanten {W}eb {S}ervices durch deklarative {N}achrichtenverarbeitung}, institution = {University of Mannheim, Department for Mathematics and Computer Science}, year = {2006}, type = {Technical Report}, number = {TR-2006-014}, note = {http://db.informatik.uni-mannheim.de/publications/TR-2006-014.pdf} }Abstract: Der zunehmende Einsatz von Web Services für erfolgsentscheidende Unternehmensanwendungen erfordert schnelle und flexible Anwendungsentwicklung sowie hochperformante und zuverlässige Ausführungsumgebungen. Existierende Applikationsserver erfüllen diese Anforderungen nur partiell. Wir stellen eine deklarative Regelsprache zur Definition von Geschäftsprozessen und ein zugehöriges Systemdesign auf der Basis von Nachrichtenwarteschlangen und einem nativen XML-Datenbanksystem vor.
Unnesting SQL Queries in the Presence of Disjunction
Year: 2006 Journal: Technical report Download: technical report BibTeX: @TechReport{Brantner:2006:DSQ, author = {M. Brantner and N. May and G. Moerkotte}, title = {Unnesting {SQL} Queries in the Presence of Disjunction}, institution = {University of Mannheim}, year = {2006}, month = {March}, note = {http://db.informatik.uni-mannheim.de/publications/TR-06-013.pdf} }Abstract: Optimizing nested queries is an intricate problem. It becomes even harder if in a nested query the linking predicate or the correlation predicate occurs disjunctively. We present the first unnesting strategy that can effectively deal with these queries. The starting point of our approach is to translate SQL into the relational algebra extended by bypass operators. Then we present for the first time unnesting equivalences which are valid for algebraic expressions containing bypass operators. Applying these to the translated queries results in our effective unnesting strategy for nested SQL queries with disjunction. With an extensive experimental study (including three commercial DBMSs), we demonstrate the possible performance gains and limitations of our approach.
A Declarative Control Language for Dependable XML Message Queues
Year: 2006 Journal: Proc. Workshop Dependability in Large-scale Service-oriented Systems (at ARES Conference), Vienna, Austria Download: Paper BibTeX: @InProceedings{Boehm:2006:DCL, author = {Alexander Böhm and Carl-Christian Kanne and Guido Moerkotte}, title = {A Declarative Control Language for Dependable XML Message Queues}, booktitle = {Proc. Workshop Dependability in Large-scale Service-oriented Systems (at ARES Conference), Vienna, Austria}, year = {2006} }Abstract: We present a novel approach for the implementation of efficient and dependable web service engines (WSEs). A WSE instance represents a single node in a distributed network of participants that communicate using XML messages. We introduce a fully declarative language custom-tailored to XML message processing that allows to specify business processes in a concise manner. To support the efficient and reliable evaluation of our language, we show how to augment a native, transactional XML data store with efficient and reliable XML message queues.
Strategies for Query Unnesting in XML Databases
Year: 2006 Journal: ACM Transactions on Database Systems Download: Paper, ACM TODS BibTeX: @Article{May:2006:SQU, author = {N. May and S. Helmer and G. Moerkotte}, title = {Strategies for Query Unnesting in XML Databases}, journal = {ACM Transactions on Database Systems} volume = {31}, number = {3}, pages = {968--1013}, year = {2006}, }Kappa-Join: Efficient Execution of Existential Quantification in XML Query Languages
Year: 2006 Journal: Technical Report Download: Technical Report BibTeX: @TechReport{Brantner:2006:KPJTR, author = {M. Brantner and S. Helmer and C-C. Kanne and G. Moerkotte}, title = {Kappa-Join: Efficient Execution of Existential Quantification in XML Query Languages}, institution = {University of Mannheim}, note = {http://madoc.bib.uni-mannheim.de/madoc/volltexte/2006/1227/}, year = {2006} }DP-Counter Analytics
Year: 2006 Journal: Technical report Download: Technical Report BibTeX: @TechReport{Moerkotte:2006:DCA, author = {G. Moerkotte}, title = {DP-Counter Analytics}, institution = {University of Mannheim}, year = {2006} }Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products / DP-Counter Analytics
Year: 2006 Journal: Proc. 32nd VLDB Conference, Seoul, Korea Download: Paper BibTeX: @InProceedings{Neumann:2006:ATE, author = {T. Neumann and G. Moerkotte}, title = {Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products}, year = {2006}, booktitle = "Proc. 32nd VLDB Conference, Seoul, Korea" }Ontologies for Location-based Services
Year: 2006 Journal: In Handbook of Research in Mobile Business: Technical, Methodological and Social Perspectives (ed.: B. Unhelkar), Idea Group, Inc., Hershey, PA, USA Natix Visual Interfaces
Year: 2006 Journal: Proc. 10. International Conference on Extending Database Technology (Demo Paper) Download: paper, Conference Proceedings BibTeX: @InProceedings{Boehm:2006:NVI, author = {Alexander Böhm and Matthias Brantner and Carl-Christian Kanne and Norman May and Guido Moerkotte}, title = {Natix Visual Interfaces}, booktitle = {Proc. 10. International Conference on Extending Database Technology}, year = {2006}, pages = {1125 - 1129} }Abstract: We present the architecture of the new version of Natix. Among the new features of this native XML DBMS are an optimizing XPath query compiler and a powerful API. In our demonstration we explain this API and present XPath evaluation in Natix using its visual explain facilities.
The importance of sibling clustering for efficient bulkload of XML document trees
Year: 2006 Journal: IBM Systems Journal, Number 2, Volume 45 Download: from IBM Systems Journal Website BibTeX: @InProceedings{Kanne:2006:ISC, author = {Carl-Christian Kanne and Guido Moerkotte}, title = {The importance of sibling clustering for efficient bulkload of XML document trees}, booktitle = {IBM Systems Journal}, volume = {45}, number = {2}, year = {2006} }Abstract: In an XML Data Store (XDS), importing documents from external sources is a very frequent operation. Because a document import consists of a large number of individual node inserts, it is essentially a small bulkload operation, and thus efficient bulkload support is crucial for the performance of the XDS. The bulkload operation is in essence a mapping of an XML parser's output into the storage structures of the XDS. This involves two major subtasks: (1) partitioning the document's logical tree structure into subtrees that can be stored on a page in a way that is both space-efficient and suitable for later processing and (2) mapping the subtrees to the internal representation of the XDS for paging. In enterprise-scale environments with very large documents and many parallel bulkload operations, the first task is particularly challenging, as not only disk space consumption, but also CPU and main-memory usage are important factors. In this paper, we discuss the requirements for an XDS bulkload component and examine existing algorithms for tree partitioning and their applicability to the bulkload operation. We derive a new tree-partitioning algorithm for use in the bulkload operation and present the design of the bulkload component for the XDS Natix. Finally, we evaluate the performance of the bulkload component and compare our results with previous work.
Algebraic Optimization of Nested XPath Expressions
Year: 2006 Journal: Proc. 22. ICDE Conference, Atlanta, USA (Poster Paper) Download: paper BibTeX: @InProceedings{Brantner:2006:AON, author = "Matthias Brantner and Carl-Christian Kanne and Sven Helmer and Guido Moerkotte", title = "Algebraic Optimization of Nested XPath Expressions", booktitle = "Proc. 22. ICDE Conference, Atlanta", year = "2006", pages = "128", }Abstract: The XPath language incorporates powerful primitives for formulating queries containing nested subexpressions which are existentially or universally quantified. However, even the best published approaches for evaluating XPath have unsatisfactory performance when applied to nested queries. We examine optimization techniques that unnest complex XPath queries. For this purpose, we classify XPath expressions particularly with regard to properties that are relevant for unnesting. We present algebraic equivalences that transform nested expressions into unnested expressions. In our experiments we compare the evaluation times with existing XPath evaluators and the naive evaluation.
:: 2005
Formally Specifying the Syntax and Semantics of a Visual Query Language for the Domain of High Energy Physics Data Analysis
Year: 2005 Journal: IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) BibTeX: @INPROCEEDINGS{AmHeMo05, AUTHOR = {V. Amaral and S. Helmer and G. Moerkotte}, TITLE = {Formally Specifying the Syntax and Semantics of a Visual Query Language for the Domain of High Energy Physics Data Analysis}, BOOKTITLE = {IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)}, PAGES = {251-258}, YEAR = {2005} }A Model-Based Monitoring and Diagnosis System for a Space-Based Astrometry Mission
Year: 2005 Journal: Proc. 16th International Conference on Database and Expert Systems Applications, Copenhagen, Denmark BibTeX: @InProceedings{Pavlov:2005:MBM, author = "Aleksei Pavlov and Sven Helmer and Guido Moerkotte", title = "A Model-Based Monitoring and Diagnosis System for a Space-Based Astrometry Mission", booktitle = "Proc. 16th International Conference on Database and Expert Systems Applications, Copenhagen, Denmark", pages = "920-929", year = "2005", }Algebraic Translation and Optimization of (Nested) XPath Expressions
Year: 2005 Journal: Proc. Dutch Belgian Database Day, Amsterdam, Netherlands BibTeX: @InProceedings{Brantner:2005:ATO, author = "Matthias Brantner and Carl-Christian Kanne and Sven Helmer and Guido Moerkotte", title = "Algebraic Translation and Optimization of (Nested) XPath Expressions", booktitle = "Proc. Dutch Belgian Database Day", year = "2005", }Main Memory Implementations for Binary Grouping
Year: 2005 Journal: 3rd Int. XML Database Symposium (XSym 2005), Trondheim, Norway Download: paper BibTeX: @InProceedings{May:2005:MMI, author = "Norman May and Guido Moerkotte", title = "Main Memory Implementations for Binary Grouping", booktitle = "Proc. 3rd Int. XML Database Symposium (XSym 2005), Trondheim, Norway", year = "2005", pages = "162--176", }Abstract: An increasing number of applications depend on efficient storage and analysis features for XML data. Hence, query optimization and efficient evaluation techniques for the emerging XQuery standard become more and more important. Many XQuery queries require nested expressions. Unnesting them often introduces binary grouping. We introduce several algorithms implementing binary grouping and analyze their time and space complexity. Experiments demonstrate their performance.
Main Memory Implementations for Binary Grouping
Year: 2005 Journal: Technical Report Download: technical report, Library Server BibTeX: @TechReport{May:2005:MMITR, author = "Norman May and Guido Moerkotte", title = "Main Memory Implementations for Binary Grouping", institution = "University of Mannheim, Department for Mathematics and Computer Science", year = "2005", type = "Technical Report", number = "TR-2005-007" }Abstract: An increasing number of applications depend on efficient storage and analysis features for XML data. Hence, query optimization and efficient evaluation techniques for the emerging XQuery standard become more and more important. Many XQuery queries require nested expressions. Unnesting them often introduces binary grouping. We introduce several algorithms implementing binary grouping and analyze their time and space complexity. Experiments demonstrate their performance.
Cost-Sensitive Reordering of Navigational Primitives
Year: 2005 Journal: Proc. 24th ACM SIGMOD -SIGACT-SIGART Symposium on Principles of Database Systems, Baltimore, Maryland Download: paper BibTeX: @InProceedings{Kanne:2005:CSR, author = "Carl-Christian Kanne and Matthias Brantner and Guido Moerkotte", title = "Cost-Sensitive Reordering of Navigational Primitives", booktitle = "Proc. 24th ACM SIGMOD, Baltimore, Maryland", year = "2005", pages = "742--753", }Abstract: We present a method to evaluate path queries based on the novel concept of partial path instances. Our method (1) maximizes performance by means of sequential scans or asynchronous I/O, (2) does not require a special storage format, (3) relies on simple navigational primitives on trees, and (4) can be complemented by existing logical and physical optimizations such as duplicate elimination, duplicate prevention and path rewriting. We use a physical algebra which separates those navigation operations that require I/O from those that do not. All I/O operations necessary for the evaluation of a path are isolated in a single operator, which may employ efficient I/O scheduling strategies such as sequential scans or asynchronous I/O. Performance results for queries from the XMark benchmark show that reordering the navigation operations can increase performance up to a factor of four.
Full-fledged Algebraic XPath Processing in Natix
Year: 2005 Journal: Proc. 21. ICDE Conference, Tokyo, Japan Download: paper, prototype implementation BibTeX: @InProceedings{Brantner:2005:FAX, author = "Matthias Brantner and Carl-Christian Kanne and Sven Helmer and Guido Moerkotte", title = "Full-fledged Algebraic XPath Processing in Natix", booktitle = "Proc. 21. ICDE Conference, Tokyo, Japan", pages = "705--716", year = "2005", }Abstract: We present the first complete translation of XPath into an algebra, paving the way for a comprehensive, state-of-the-art XPath (and later on, XQuery) compiler based on algebraic optimization techniques. Our translation includes all XPath features such as nested expressions, position-based predicates and node-set functions. The translated algebraic expressions can be executed using the proven, scalable, iterator-based approach, as we demonstrate in form of a corresponding physical algebra in our native XML DBMS Natix. A first glance at performance results shows that even without further optimization of the expressions, we provide a competitive evaluation technique for XPath queries.
On the Optimal Ordering of Maps and Selections under Factorization
Year: 2005 Journal: Proc. ICDE Conference, Tokyo, Japan Download: paper BibTeX: @InProceedings{Neumann:2004:OOM, author = "Thomas Neumann, Sven Helmer and Guido Moerkotte", title = "On the Optimal Ordering of Maps and Selections under Factorization", booktitle = "Proc. ICDE Conference, Tokyo, Japan", pages = "490--501", year = "2005", }Abstract: The query optimizer of a database system is confronted with two aspects when handling user-defined functions (UDFs) in query predicates: the vast differences in evaluation costs between UDFs (and other functions) and multiple calls of the same (expensive) UDF. The former is dealt with by ordering the evaluation of the predicates optimally, the latter by identifying common subexpressions and thereby avoiding costly recomputation. Current approaches order $n$ predicates optimally (neglecting factorization) in $O(n \log n)$. Their result may deviate significantly from the optimal solution under factorization. We formalize the problem of finding optimal orderings under factorization and prove that it is NP-hard. Furthermore, we show how to improve on the run time of the brute-force algorithm (which computes all possible orderings) by presenting different enhanced algorithms. Although in the worst case these algorithms obviously still behave exponentially, our experiments demonstrate that for real-life examples their performance is much better.
:: 2004
Index Structures for Fuzzy Object-Oriented Database Systems.
Year: 2004 Journal: In Advances in Fuzzy Object-Oriented Databases (edited by Zongmin Ma), Idea Group Publishing, Hershey, 2004: 206-240 An Efficient Framework for Order Optimization
Year: 2004 Journal: Proc. ICDE Conference, Boston, USA Download: paper, technical report BibTeX: @InProceedings{Neumann:2004:EFO, author = "Thomas Neumann and Guido Moerkotte", title = "An Efficient Framework for Order Optimization", booktitle = "Proc. ICDE Conference, Boston, USA", pages = "461--472", year = "2004", }Abstract: Since the introduction of cost-based query optimization, the performance-critical role of interesting orders has been recognized. Some algebraic operators change interesting orders (e.g. sort and select), while others exploit interesting orders (e.g. merge join). The two operations performed by any query optimizer during plan generation are 1) computing the resulting order given an input order and an algebraic operator and 2) determining the compatibility between a given input order and the required order a given algebraic operator can beneficially exploit. Since these two operations are called millions of times during plan generation, they are highly performance-critical. The third crucial parameter is the space requirement for annotating every plan node with its output order.Lately, a powerful framework for reasoning about orders has been developed, which is based on functional dependencies. Within this framework, the current state-of-the-art algorithms for implementing the above operations both have a lower bound time requirement of Omega(n), where n is the number of functional dependencies involved. Further, the lower bound for the space requirement for every plan node is Omega(n).We improve these bounds by new algorithms with upper time bounds O(1). That is, our algorithms for both operations work in constant time during plan generation, after a one-time preparation step. Further, the upper bound for the space requirement for plan nodes is O(1) for our approach. Besides, our algorithm reduces the search space by detecting and ignoring irrelevant orderings. Experimental results with a full fledged query optimizer show that our approach significantly reduces the total time needed for plan generation. As a corollary of our experiments, it follows that the time spent for order processing is a non-neglectable part of plan generation.
Nested Queries and Quantifiers in an Ordered Context
Year: 2004 Journal: Proc. ICDE Conference, Boston, USA Download: paper BibTeX: @InProceedings{May:2004:NQQ, author = "Norman May and Sven Helmer and Guido Moerkotte", title = "Nested Queries and Quantifiers in an Ordered Context", booktitle = "Proc. ICDE Conference, Boston, USA", pages = "239--250", year = "2004", }Abstract: We present algebraic equivalences that allow to unnest nested algebraic expressions for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries given in the XQuery language. Measurements illustrate the performance gains possible by unnesting.
Evaluating Lock-based Protocols for Cooperation on XML Documents
Year: 2004 Journal: SIGMOD Record Download: sigrec04_sym.pdf BibTeX: @Article{Helmer:2004:ELP, author = "Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "Evaluating Lock-based Protocols for Cooperation on XML Documents", journal = "SIGMOD Record", volume = "33", number = "1", pages = "58--63", year = "2004", month = mar, }Abstract: We discuss four different core protocols for synchronizing access to and modi cations of XML document collections. These core protocols synchronize structure traversals and modi cations. They are meant to be integrated into a native XML base management System (XBMS) and are based on two phase locking. We also demonstrate the different degrees of cooperation that are possible with these protocols by various experimental results. Furthermore, we also discuss extensions of these core protocols to full- edged protocols. Further, we show how to achieve a higher degree of concurrency by exploiting the semantics expressed in Document Type De nitions (DTDs).
XQuery Processing in Natix with an Emphasis on Join Ordering
Year: 2004 Journal: First International Workshop on XQuery Implementation, Experience and Perspectives (XIME-P 2004) Download: paper BibTeX: @InProceedings{May:2004:XQN, author = "Norman May and Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "XQuery Processing in Natix with an Emphasis on Join Ordering", booktitle = "First International Workshop on XQuery Implementation, Experience and Perspectives (XIME-P 2004)", year = "2004", month = jun, pages = "49--54", }Abstract: We give an overview on how XQuery processing works in our native XML database system Natix. After a brief description of the query compiler we focus on the aspect of join ordering when generating query execution plans. Here we show that better plans can be found when extending the search space of the plan generator.
Engineering a New Abstraction Layer to Optimize the HEP Analysis Process
Year: 2004 Journal: IEEE, Transaction in Nuclear Sciences Journal BibTeX: @Article{Amaral:2004:ENA, author = "V. Amaral and S. Helmer and G. Moerkotte", title = "Engineering a New Abstraction Layer to Optimize the HEP Analysis Process", journal = "IEEE, Transaction in Nuclear Sciences Journal", volume = "51", number = "4", pages = "1441--1448", year = "2004", month = aug, }Abstract: We are observing a growing complexity in the interfaces of the analysis tools. As a consequence, end users have to master programming, understand complex frameworks and data storage details before achieving the physics goals. This reduces significantly the efficiency of the analysis process. In order to tackle this problem in analysis we propose to introduce a new abstraction layer between the end users and the current interfaces. We go further in our approach by introducing Pheasant QL, the first proposal for a declarative domain specific visual query language into this domain. Through our solution we can express complex decay queries by means of visual operators with reduced programming efforts, abstracting the storage and optimization details, and reducing the need to deal with the normal analysis framework and physical storage intricacies. In our communication we will describe the methodology we are following to design and implement the new abstraction layer. We will also describe our language in an informal manner in terms of syntax, semantics, and example queries.
PHEASANT: A PHysicist's EAsy ANalysis Tool
Year: 2004 Journal: Springer Verlag, Lecture Notes in Artificial Intelligence Download: SpringerLink BibTeX: @InProceedings{Amaral:2004:PHE , author = "V. Amaral and S. Helmer and G. Moerkotte", title = "PHEASANT: A PHysicist's EAsy ANalysis Tool", booktitle = "Springer Verlag, Lecture Notes in Artificial Inteligence", year = "2004", pages = "229--242" }Abstract: We present Pheasant, a framework for the analysis process in High Energy Physics (HEP). It uses PheasantQL, the first domain specific visual query language for HEP analysis. This allows us to express complex decay queries easily with no programming efforts, abstracting the storage and optimization details. The currently existing tools overburden users with many complex details and intricacies. A small scale assessment conducted with a prototype demonstrates the effectiveness of our technique.
Timestamp-based Protocols for Synchronizing Access on XML Documents
Year: 2004 Journal: Proc. DEXA Conference, Zaragoza, Spain BibTeX: @InProceedings{Helmer:2004:TBP, author = "Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "Timestamp-based Protocols for Synchronizing Access on XML Documents", booktitle = "Proc. DEXA Conference, Zaragoza, Spain", pages = "230--234", year = "2004", }A Combined Framework for Grouping and Order Optimization
Year: 2004 Journal: Proc. VLDB Conference, Toronto, Canada Download: paper BibTeX: @InProceedings{Neumann:2004:CFG, author = "Thomas Neumann and Guido Moerkotte", title = "A Combined Framework for Grouping and Order Optimization", booktitle = "Proc. VLDB Conference, Toronto, Canada", pages = "960--971", year = "2004", }Abstract: Since the introduction of cost-based query optimization by Selinger et al. in their seminal paper, the performance-critical role of interesting orders has been recognized. Some algebraic operators change interesting orders (e.g. sort and select), while others exploit them (e.g. merge join). Likewise, Wang and Cherniack (VLDB 2003) showed that existing groupings should be exploited to avoid redundant grouping operations. Ideally, the reasoning about interesting orderings and groupings should be integrated into one framework. So far, no complete, correct, and efficient algorithm for ordering and grouping inference has been proposed. We fill this gap by proposing a general two-phase approach that efficiently integrates the reasoning about orderings and groupings. Our experimental results show that with a modest increase of the time and space requirements of the preprocessing phase both orderings and groupings can be handled at the same time. More importantly, there is no additional cost for the second phase during which the plan generator changes and exploits orderings and groupings by adding operators to subplans.
:: 2003
A Study of Four Index Structures for Set-Valued Attributes of Low Cardinality
Year: 2003 Journal: VLDB Journal Download: technical report BibTeX: @Article{Helmer:2003:SFI, author = "Sven Helmer and Guido Moerkotte", title = "A Study of Four Index Structures for Set-Valued Attributes of Low Cardinality", journal = "VLDB Journal", volume = "12", number = "3", pages = "244--261", year = "2003", }Abstract: We review and study the performance of four different index structures for indexing set-valued attributes designed to speed up set equality, subset and superset queries. All index structures are based on traditional techniques, namely signatures and inverted files. More specifically, we consider sequential signature files, signature trees, extendible signature hashing, and a B-tree based implementation of inverted lists. The latter is refined by a compression scheme in order to keep space requirements within acceptable bounds. The performance study is based on real implementations subjected to a benchmark accounting for different set sizes, domain sizes, and data distributions (uniform and skewed).
Estimating the Output Cardinality of Partial Preaggregation with a Measure of Clusteredness
Year: 2003 Journal: Proc. 29th Int. Conf. on Very Large Data Bases (VLDB), Berlin, Germany Download: vldb03.pdf BibTeX: @InProceedings{Helmer:2003:EOC, author = "Sven Helmer and Thomas Neumann and Guido Moerkotte", title = "Estimating the Output Cardinality of Partial Preaggregation with a Measure of Clusteredness", booktitle = "Proc. 29th Int. Conf. on Very Large Data Bases (VLDB), Berlin, Germany", pages = "656--667", year = "2003", }Abstract: We introduce a new parameter, the clusteredness of data, and show how it can be used for estimating the output cardinality of a partial preaggregation operator. This provides the query optimizer with an important piece of information for deciding whether the application of partial preaggregation is bene cial. Experimental results are very promising, due to the high accuracy of the cardinality estimation based on our measure of clusteredness.
Three Cases for Query Decorrelation in XQuery
Year: 2003 Journal: 1st Int. XML Database Symposium (XSYM), Berlin, Germany Download: unnesting_xmlsym03.pdf BibTeX: @InProceedings{May:2003:TCQ, author = "Norman May and Sven Helmer and Guido Moerkotte", title = "Three Cases for Query Decorrelation in XQuery", booktitle = "1st Int. XML Database Symposium (XSYM), Berlin, Germany", pages = "70--84", year = "2003", }Abstract: We present algebraic equivalences that allow to unnest nested algebraic expressions for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries given in the XQuery language. Measurements illustrate the performance gains possible our approach.
Lock-based Protocols for Cooperation on XML Documents
Year: 2003 Journal: Proc. 14th Int. Workshop on Database and Expert Systems Applications (at DEXA Conf.), Prague Download: technical report BibTeX: @Article{Helmer:2003:LPC, author = "Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "Lock-based Protocols for Cooperation on XML Documents", journal = "Proc. 14th Int. Workshop on Database and Expert Systems Applications (at DEXA Conf.), Prague", pages = "230-234", year = "2003", }Abstract: The eXtensible Markup Language (XML) is well accepted in several different Web application areas. As soon as many users and applications work concurrently on the same collection of XML documents - e.g. on an XML database via a Web interface - isolating accesses and modifications of different transactions becomes an important issue. We discuss four different core protocols for synchronizing access to and modifications of XML document collections. These core protocols synchronize structure traversals and modifications. They are meant to be integrated into a native XML base management System (XBMS) and are based on two phase locking. We also demonstrate the different degrees of cooperation that are possible with these protocols by various experimental results. Furthermore, we also discuss extensions of these core protocols to full-fledged protocols. Further, we show how to achieve a higher degree of concurrency by exploiting the semantics expressed in Document Type Definitions (DTDs).
Quantifiers in XQuery
Year: 2003 Journal: Proc. 4th Int. Conf. on Web Information Systems Engineering (WISE), Rome, Italy Download: unnesting_wise03.pdf BibTeX: @Article{May:2003:QIX, author = "Norman May and Sven Helmer and Guido Moerkotte", title = "Quantifiers in XQuery", journal = "Proc. 4th Int. Conf. on Web Information Systems Engineering (WISE), Rome, Italy", pages = "313--316", year = "2003", }Abstract: We present algebraic equivalences that allow to unnest nested algebraic expressions containing quantifiers for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries formulated in XQuery. Measurements illustrate the performance gains possible by unnesting.
Nested Queries and Quantifiers in an Ordered Context
Year: 2003 Journal: Technical Report Download: technical report, Library Server BibTeX: @TechReport{May:2004:NQQTR, author = "Norman May and Sven Helmer and Guido Moerkotte", title = "Nested Queries and Quantifiers in an Ordered Context", institution = "University of Mannheim, Department for Mathematics and Computer Science", year = "2003", type = "Technical Report", number = "TR-2003-002" }Abstract: We present algebraic equivalences that allow to unnest nested algebraic expressions for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries given in the XQuery language. Measurements illustrate the performance gains possible by unnesting.
A Robust Scheme for Multilevel Extendible Hashing
Year: 2003 Journal: Proc. 18th International Symposium on Computer and Information Sciences (ISCIS), Antalya, Turkey Download: technical report BibTeX: @Article{Helmer:2003:RSM, author = "Sven Helmer and Thomas Neumann and Guido Moerkotte", title = "A Robust Scheme for Multilevel Extendible Hashing", journal = "Proc. 18th International Symposium on Computer and Information Sciences (ISCIS), Antalya, Turkey", pages = "218--225", year = "2003", }Abstract: Dynamic hashing, while surpassing other access methods for uniformly distributed data, usually performs badly for non-uniformly distributed data. We propose a robust scheme for multi-level extendible hashing allowing efficient processing of skewed data as well as uniformly distributed data. In order to test our access method we implemented it and compared it to several existing hashing schemes. The results of the experimental evaluation demonstrate the superiority of our approach in both index size and performance.
Evaluating Different Approaches for Indexing Fuzzy Sets
Year: 2003 Journal: Fuzzy Sets and Systems Download: link to electronic edition BibTeX: @Article{Helmer:2003:EDA, author = "Sven Helmer", title = "Evaluating Different Approaches for Indexing Fuzzy Sets", journal = "Fuzzy Sets and Systems", year = 2003, volume = 140, number = 1, pages = "167--182", month = "November" }Abstract: Providing efficient query processing in database systems is one step towards gaining acceptance of such systems by end users. We propose several techniques for indexing fuzzy sets in databases to improve the query evaluation performance. Three of the presented access methods are based on superimposed coding, while the fourth relies on inverted files. The efficiency of these techniques was evaluated experimentally. We present results from these experiments, which clearly show the superiority of the inverted files.
Book Review of: Recent issues on fuzzy databases
Year: 2003 Journal: Fuzzy Sets and Systems Download: link to electronic edition BibTeX: @Article{Helmer:2003:BRR, author = "Sven Helmer", title = "Book Review: Recent issues on fuzzy databases", journal = "Fuzzy Sets and Systems", year = 2003, volume = 140, number = 1, pages = "229--230", month = "November" }A Domain Specific Visual Query Language for the High Energy Physics Environment
Year: 2003 Journal: 3rd Workshop on Domain\-Specific Modeling, An OOPSLA 2003 Workshop, Anaheim, CA, USA Download: dsvql_oopsla2003.pdf BibTeX: @InProceedings{Amaral:2003:DSV, author = {V. Amaral and S. Helmer and G. Moerkotte}, editor = {Juha-Pekka Tolvanen and Jeff Gray and Matti Rossi}, title = "A Domain Specific Visual Query Language for the High Energy Physics Environment", booktitle = {3rd Workshop on Domain\-Specific Modeling, An OOPSLA 2003 Workshop, Anaheim, CA, USA}, publisher = {Jyv{\"a}skyl{\"a} University Printing House, Finland}, month = {October}, year = {2003}, isbn = {951-39-1582-4}, pages = {9-16} }Abstract: This paper presents Pheasant, a framework for analyzing data collected during high energy physics experiments. Our approach features a visual query language especially adapted to the needs of the physicists analyzing the data. We also provide interfaces to the currently applied tools, so that users can still do the same work as before, but on a higher level of abstraction. This helps users to concentrate on their actual job by hiding the intricacies. Although Pheasant is domain-specific, we introduce enough extensibility to enable it to meet future demands.
Indexstrukturen für XML
Year: 2003 Download: http://www.dpunkt.de/buch/3-89864-189-9.html BibTeX: @InBook{Helmer:2003:IXM, author = "S. Helmer and G. Moerkotte", title = "Web & Datenbanken", chapter = "Indexstrukturen für XML", pages = "217--248", publisher = "dpunkt-Verlag", year = "2003", }Abstract: table of contents
Constructing Optimal Bushy Trees Possibly Containing Cross Products for Order-Preserving Joins is in P
Year: 2003 Download: MA-03-12.pdf BibTeX: @TechReport{Moerkotte:2003:COB, author = "G. Moerkotte", title = "Constructing Optimal Bushy Trees Possibly Containing Cross Products for Order-Preserving Joins is in P", year = "2003", }Abstract: One of the main features of XQuery compared to traditional query languages like SQL, is that it preserves the input order - unless specified otherwise. As a consequence, order-preserving algebraic operators are needed to capture the semantics of XQuery correctly. One important algebraic operator is the order-preserving join. The order-preserving join is associative but, in contrast to the traditional join operator, not commutative. Since join ordering (i.e. finding the optimal execution plan for a given set of join operators) has been an important topic of query optimization for SQL, it is expected that it will also play a major role in optimizing XQuery. The search space for ordering traditional joins is exponential in size. Although the lack of commutativity reduces the search space for ordering order-preserving joins, we show that it is still exponential. This raises the question whether the join ordering problem is also NP-hard, as in the traditional setting. We answer this question by introducing the first polynomial algorithm that produces optimal bushy trees possibly containing cross products.
XML-Datenbanken und ihre Anwendung
Year: 2003 Journal: it - Information Technology Download: it0303_137.pdf BibTeX: @Article{Helmer:2003:XDA, author = {Sven Helmer and Carl-Christian Kanne and Guido Moerkotte}, title = {XML-Datenbanksysteme und ihre Anwendung.}, journal = {it - Information Technology}, volume = {45}, number = {3}, year = {2003}, pages = {137-142}, }Abstract: Due to its fast proliferation, the amount of XML data is steadily increasing and XML-based applications become more and more complex. This requires effective and efficient methods to manage large amounts of persistent XML. We identify the most common requirements for persistence solutions and discuss alternative approaches to store XML. Our conclusion will be that only database management systems specifically designed for XML fulfill the requirements. Last, we give an overview of one such system called Natix.
A Visual Query Language for HEP Analysis
Year: 2003 Journal: IEEE Nuclear Sciences Symposium NSS, Portland, Oregon, USA Download: nss1-2003.pdf BibTeX: @Article{Amaral:2003:VQL, author = {V. Amaral and S. Helmer and G. Moerkotte}, title = "A Visual Query Language for HEP Analysis", journal = {IEEE Nuclear Sciences Symposium NSS, Portland, Oregon, USA}, year = {2003}, }Abstract: Pheasant is the first proposal for a declarative domain specific visual query language for HEP data analysis. It has been designed based on our experience dealing with Hera-B event data and query patterns. Its main goal is to allow the physicist to describe the decay selection queries by means of visual operators, to be run against the experiments' existing storage bases and analysis frameworks. Our visual language aims to be a simple-to-use tool with which a user can express complex decay queries in an easy way with reduced programming efforts. Indeed, the user does not have to deal with intricacies like physical storage details. In our communication we will describe how we determined the visual primitives by looking at the query patterns. We will also describe our language in an informal manner in terms of syntax, semantics, and example queries.
Designing and Implementing a New Abstraction Layer to Optimize the HEP Analysis Process
Year: 2003 Journal: IEEE Nuclear Sciences Symposium NSS, Portland, Oregon, USA Download: nss2-2003.pdf BibTeX: @Article{Amaral:2003:DIN, author = {V. Amaral and S. Helmer and G. Moerkotte}, title = "Designing and Implementing a New Abstraction Layer to Optimize the HEP Analysis Process", journal = {IEEE Nuclear Sciences Symposium NSS, Portland, Oregon, USA}, year = {2003}, }Abstract: Presently, in the new HEP experiments we are observing a growing complexity in the interfaces of the analysis tools. As a consequence, end users have to master programming, understand complex frameworks and data storage details before achieving the physics goals. This reduces significantly the efficiency of the analysis process. In order to tackle this problem we propose introducing a new abstraction layer between the end users and the current interfaces. We go further in our approach by proposing to perform the mass event selection by the usage of a visual language that expresses the physics analysis queries in a declarative way. Through our solution we can express complex decay queries easily with no programming efforts, abstracting the storage and optimization details, and reducing the need to deal with the normal framework intricacies. We describe the methodology used designing and implementing the new abstraction layers.
:: 2002
Anatomy of a native XML base management system
Year: 2002 Journal: VLDB Journal Download: natixvldbj02.ps, technical report BibTeX: @Article{Fiebig:2002:ANX, author = "Thorsten Fiebig and Sven Helmer and Carl-Christian Kanne and Guido Moerkotte and Julia Neumann and Robert Schiele and Till Westmann", title = "Anatomy of a native XML base management system", journal = "VLDB Journal", volume = "11", number = "4", pages = "292--314", year = "2002", }Abstract: Several alternatives to manage large XML document collections exist, ranging from file systems over relational or other database systems to specifically tailored XML base management systems. In this paper we give a tour of Natix, a database management system designed from scratch for storing and processing XML data. Contrary to the common belief that management of XML data is just another application for traditional databases like relational systems, we illustrate how almost every component in a database system is affected in terms of adequacy and performance. We show how to design and optimize areas such as storage, transaction management - comprising recovery and multi-user synchronization - as well as query processing for XML.
Incorporating XSL Processing into Database Engines
Year: 2002 Journal: Proc. 28th Int. Conf. on Very Large Data Bases (VLDB), Hong Kong, China Download: vldb02xslt.pdf BibTeX: @Article{Moerkotte:2002:IXP, author = "Guido Moerkotte", title = "Incorporating XSL Processing into Database Engines", journal = "Proc. 28th Int. Conf. on Very Large Data Bases (VLDB), Hong Kong, China", pages = "107--118", year = "2002", }Abstract: The two observations that 1) many XML documents are stored in a database or generated from data stored in a database and 2) processing these documents with XSL stylesheet processors is an important, often recurring task justify a closer look at the current situation. Typically, the XML document is retrieved or constructed from the database, exported, parsed, and then processed by a special XSL processor. This cumbersome process clearly sets the goal to incorporate XSL stylesheet processing into the database engine. We describe one way to reach this goal by translating XSL stylesheets into algebraic expressions. Further, we present algorithms to optimize the template rule selection process and the algebraic expression resulting from the translation. Along the way, we present several undecidability results hinting at the complexity of the problem on hand
Optimized Translation of XPath into Algebraic Expressions Parameterized by Programs Containing Navigational Primitives
Year: 2002 Journal: Proc. 3rd Int. Conf. on Web Information Systems Engineering (WISE), Singapore Download: wise2002.ps, technical report BibTeX: @Article{Helmer:2002:OTX, author = "Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "Optimized Translation of XPath into Algebraic Expressions Parameterized by Programs Containing Navigational Primitives", journal = "Proc. 3rd Int. Conf. on Web Information Systems Engineering (WISE), Singapore", pages = "215--224", year = "2002", }Abstract: We propose a new approach for the efficient evaluation of XPath expressions. This is important, since XPath is not only used as a simple, stand-alone query language, but is also an essential ingredient of XQuery and XSLT. The main idea of our approach is to translate XPath into algebraic expressions parameterized with programs. These programs are mainly built from navigational primitives like accessing the first child or the next sibling. The goals of the approach are 1. to enable pipelined evaluation, 2. to avoid producing duplicate (intermediate) result nodes, 3. to visit as few document nodes as possible, and 4. to avoid visiting nodes more than once. This improves the existing approaches, because our method is highly efficient.
Compiling Away Set Containment and Intersection Joins
Year: 2002 Download: TR-02-004.pdf BibTeX: @TechReport{Helmer:2002:CAS, author = "Sven Helmer and Guido Moerkotte", title = "Compiling Away Set Containment and Intersection Joins", institution = "University of Mannheim", year = "2002" }Abstract: We investigate the effect of query rewriting on joins involving set-valued attributes in object-relational database management systems. We show that by unnesting set-valued attributes (that are stored in an internal nested representation) prior to the actual set containment or intersection join we can improve the performance of query evaluation by an order of magnitude. By giving example query evaluation plans we show the increased possibilities for the query optimizer.
Natix: A Technology Overview
Year: 2002 Journal: Proc. Web, Web-Services, and Database Systems, NODe 2002 Web and Database-Related Workshops Download: natixnod2002.ps BibTeX: @Article{Fiebig:2002:NTO, author = "Thorsten Fiebig and Sven Helmer and Carl-Christian Kanne and Guido Moerkotte and Julia Neumann and Robert Schiele and Till Westmann", title = "Natix: A Technology Overview", journal = "Proc. Web, Web-Services, and Database Systems, NODe 2002 Web and Database-Related Workshops" pages = "12--33", year = "2002" }Abstract: Several alternatives to manage large XML document collections exist, ranging from file systems over relational or other database systems to specifically tailored XML base management systems. In this paper we review Natix, a database management system designed from scratch for storing and processing XML data. Contrary to the common belief that management of XML data is just another application for traditional databases like relational systems, we indicate how almost every component in a database system is affected in terms of adequacy and performance. We show what kind of problems have to be tackled when designing and optimizing areas such as storage, transaction management comprising recovery and multi-user synchronization as well as query processing for XML.
Early Grouping Gets the Skew
Year: 2002 Download: TR-02-009.pdf BibTeX: @TechReport{Helmer:2002:EGG, author = "Sven Helmer and Thomas Neumann and Guido Moerkotte", title = "Early Grouping Gets the Skew", institution = "University of Mannheim", year = "2002" }Abstract: We propose a new algorithm for external grouping with large results. Our approach handles skewed data gracefully and lowers the amount of random IO on disk considerably. Contrary to existing grouping algorithms, our new algorithm does not require the optimizer to employ complicated or error-prone procedures adjusting the parameters prior to query plan execution. We implemented several variants of our algorithm as well as the most commonly used algorithms for grouping and carried out extensive experiments on both synthetic and real data. The results of these experiments reveal the dominance of our approach. In case of heavily skewed data we outperform the other algorithms by a factor of two.
:: 2001
Algebraic XML Construction in Natix
Year: 2001 Journal: Proc. 2nd Int. Conf. on Web Information Systems Engineering (WISE), Kyoto, Japan Download: wise01.ps BibTeX: @Article{Fiebig:2001:AXC, author = "Thorsten Fiebig and Guido Moerkotte", title = "Algebraic XML Construction in Natix", journal = "Proc. 2nd Int. Conf. on Web Information Systems Engineering (WISE), Kyoto, Japan", pages = "212--221", year = "2001", }Algebraic XML Construction and its Optimization in Natix
Year: 2001 Journal: WWW Journal 4(3) Download: www02.ps BibTeX: @Article{Fiebig:2001:AXO, author = "Thorsten Fiebig and Guido Moerkotte", title = "Algebraic XML Construction and its Optimization in Natix", journal = "WWW Journal 4(3)", pages = "167-187", year = "2001", }Abstract: While using an algebra that acts on sets of variable bindings for evaluating XML queries, the problem of constructing XML from these bindings arises. One approach is to define a powerful operator that is able to perform a complex construction of a representation of the XML result document. The drawback is that such an operator in its generality is hard to implement and disables algebraic optimization since it has to be executed last in the plan. Therefore we suggest to construct XML documents by special query execution plans called construction plans built from simple, easy to implement and efficient operators. The paper proposes four simple algebraic operators needed for XML document construction. Further, we introduce an optimizing translation algorithm of construction clauses into algebraic expressions and briefly point out algebraic optimizations enabled by our approach.
Isolation in XML Bases
Year: 2001 Download: TR-01-015.pdf BibTeX: @TechReport{Helmer:2001:IXB, author = "Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "Isolation in XML Bases", institution = "University of Mannheim", year = "2001" }Abstract: The eXtensible Markup Language (XML) is well accepted in many different application areas. As a consequence, there is an increasing need for persistently storing XML documents. As soon as many users and applications work concurrently on the same collection of XML documents - i.e. an XML base - isolating accesses and modifications of different transactions becomes an important issue. We discuss six different core protocols for synchronizing access to and modifications of XML document collections. These core protocols synchronize structure traversals and modifications. They are meant to be integrated into a native XML base management System (XBMS). Four of the six core protocols are based on two phase locking, one uses time stamps, and the last one uses a novel dynamic commit-ordering approach. The latter two protocols achieve a higher degree of concurrency by a novel implicit representation of multiple versions. We also discuss extensions of these core protocols to full-fledged protocols. Further, we show how the two phase locking based protocols can achieve a higher degree of concurrency by exploiting the semantics expressed in Document Type Definitions (DTDs).
Indexing Fuzzy Data
Year: 2001 Journal: Proc. Joint 9th IFSA World Congress and 20th NAFIPS Int. Conf., Vancouver Download: paper BibTeX: @Article{Helmer:2001:IFD, author = "Sven Helmer", title = "Indexing Fuzzy Data", journal = "Proc. Joint 9th IFSA World Congress and 20th NAFIPS Int. Conf., Vancouver", pages = "2120--2125", year = "2001", }Abstract: Providing efficient query processing in database sys- tems is one step in gaining acceptance of such systems by end users. We propose several techniques for indexing fuzzy sets in databases to improve the query evalu- ation performance. Three of the presented access methods are based on superimposed coding, while the fourth relies on inverted files. The efficiency of these techniques was evaluated experimentally. We present results from these experiments, which clearly show the superiority of the inverted files.
Natix - ein natives XML-DBMS
Year: 2001 Journal: Datenbank Spektrum Download: dbspektrum01.ps BibTeX: @Article{Fiebig:2001:NNX, author = "Thorsten Fiebig and Carl-Christian Kanne and Guido Moerkotte", title = "Natix - ein natives XML-DBMS", journal = "Datenbank Spektrum" number = "1" pages = "5-13" year = "2001", }Abstract: Die rapide Zunahme an XML-Dokumenten verlangt nach einer systematischen Verwaltung von XML-Dokument-Kollektionen. Das in diesem Aufsatz beschriebene Datenbankmanagementsystem Natix ist speziell für die effiziente Verwaltung von XML-Dokumenten entwickelt worden. Wir beschreiben die Architektur von Natix und ausgewählte Module für die Speicherung und Anfrageauswertung.
Studies for Optimisation of Data Analysis Queries for HEP Using HERA-B Commissioning Data
Year: 2001 Journal: Int. Conf. on Computing in High Energy and Nuclear Physics (CHEP'01), Beijin Download: paper BibTeX: @Article{Amaral:2001:SOD, author = "Vasco Amaral and Guido Moerkotte and Antonio Amorim and Sven Helmer", title = "Indexing Fuzzy Data", journal = "Int. Conf. on Computing in High Energy and Nuclear Physics (CHEP'01), Beijing" pages = "154-155" year = "2001", }Abstract: In this paper we present an overview of the ongoing studies to build up a framework that supports the analysis after the reprocessing phase. This framework aims to develop a standard data query language for the HEP community. The related studies have been considering the relational database model as possible approach opposed to the object model. Several optimizing and tunning techniques are being used in technologies like DB2, Oracle and Root, that are simultaneously being evaluated. The experience obtained can be seen as a valuable testbed for the future LHC and simultaneously as interesting input for the development of the GRID.
:: 2000
Efficient Storage of XML Data
Year: 2000 Journal: ICDE, San Diego, USA Download: xmlvldb99.ps BibTeX: @Article{Kanne:2000:ESX, author = "Carl-Christian Kanne and Guido Moerkotte", title = "Efficient Storage of XML Data", journal = "ICDE, San Diego, USA", pages = "198--", year = "2000", }Abstract: We introduce Natix, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents. In contrast to traditional large object (LOB) managers, we do not split at arbitrary byte positions but take the semantics of the underlying tree structure of XML documents into account. Our parameterizable split algorithm dynamically maintains physical records of size smaller than a page which contain sets of connected tree nodes. This not only improves efficiency by clustering subtrees but also facilitates their compact representation. Existing approaches to store XML documents either use flat files or map every single tree node onto a separate physical record. The increased flexibility of our approach results in higher efficiency. Performance measurements validate this claim.
Evaluating Queries on Structure with eXtended Access Support Relations
Year: 2000 Download: webdb00.ps Abstract: There are three common design decisions taken by today's search engines. First, they do not replicate the data found on the Web. Second, they rely on full-text indexes instead. Third, they do not support the querying of document structure. The main reason for the latter is that HTML's ability to express semantics with syntactic structure is very limited. This is di erent for XML since it allows for self-describing data. Due to its flexibility by inventing arbitrary new element and attribute names, XML allows to encode semantics within syntax. The consequence is that search engines for XML should support the querying of structure. In our current work on search engines for XML data on the Web, we want to keep the rst two design decisions of traditional search engines but modify the last one according to the new requirements implied by the necessity to query structure. Since our search engine accepts queries with structural information, a full-text index does not su ce any longer. What is needed is a scalable i ndex structure that allows to answer queries over the structure of XML documents. One possible index structure called eXtended Access Support Relation (XASR) is introduced. Further, we report on a search engine for XML data called Mumpits. Due to its prototypical character, we intentionally kept the design and implementation of Mumpits very simple. Its design is centered around a single XASR and its implementation heavily builds on a commercial relational database management system.
The Implementation and Performance of Compressed Databases
Year: 2000 Download: sigrec00.pdf, technical report Abstract: In this paper, we show how compression can be integrated into a relational database system. Specifically, we describe how the storage manager, the query execution engine, and the query optimizer of a database system can be extended to deal with compressed data. Our main result is that compression can significantly improve the response time of queries if very light-weight compression techniques are used. We will present such light-weight compression techniques and give the results of running the TPC-D benchmark on a so compressed database and a non-compressed database using the AODB database system, an experimental database system that was developed at the Universities of Mannheim and Passau. Our benchmark results demonstrate that compression indeed offers high performance gains (up to 50%) for IOintensive queries and moderate gains for CPU-intensive queries. Compression can, however, also increase the running time of certain update operations. In all, we recommend to extend today s database systems with light weight compression techniques and to make extensive use of this feature.
Optimization and Evaluation of Disjunctive Queries
Year: 2000 Download: DisjQueries_IEEE2000.pdf, technical report Abstract: It is striking that the optimization of disjunctive queries i.e., those which contain at least one or-connective in the query predicate has been vastly neglected in the literature, as well as in commercial systems. In this paper, we propose a novel technique, called bypass processing, for evaluating such disjunctive queries. The bypass processing technique is based on new selection and join operators that produce two output streams: the true-stream with tuples satisfying the selection (join) predicate and the false-stream with tuples not satisfying the corresponding predicate. Splitting the tuple streams in this way enables us to "bypass" costly predicates whenever the "fate" of the corresponding tuple (stream) can be determined without evaluating this predicate. In the paper, we show how to systematically generate bypass evaluation plans utilizing a bottom-up building block approach. We show that our evaluation technique allows to incorporate the standard SQL semantics of null values. For this, we devise two different approaches: One is based on explicitly incorporating three-valued logic into the evaluation plans; the other one relies on two-valued logic by "moving" all negations to atomic conditions of the selection predicate. We describe how to extend an iterator-based query engine to support bypass evaluation with little extra overhead. This query engine was used to quantitatively evaluate the bypass evaluation plans against the traditional evaluation techniques utilizing a CNF- or DNF-based query predicate.
YAXQL: A Powerful and Web-Aware Query Language Supporting Query Reuse and Data Integration
Year: 2000 Download: yaxql.ps Abstract: Since XML seems to be the next wave on the web, several query languages for XML have been proposed. Unfortunately, none of these proposals comes even close to fulfill the requirements for such a query language. We review the requirements for a query language for XML and propose a new query language, YAXQL, which fulfills these requirements.
:: 1999
Efficient Maintenance of Materialized Mediated Views
Year: 1999 Download: sigmod95.ps Abstract: Integrating data and knowledge from multiple heterogeneous sources - like databases, knowledge bases or specific software packages - is often required for answering certain queries. Recently, a powerful framework for defining mediated view spanning multiple knowledge bases by a set of constrained rules (cf. the work of Kanellakis et al.) was proposed. Within this paper, we investigate the materialization of these views by unfolding the view definition and the efficient maintenance of the resulting materialized mediated view in case of updates. Thereby, we consider two kinds of updates: updates to the view and updates to the underlying sources. For each of these two cases several efficient algorithms maintaining materialized mediated views are given. We improve on previous algorithms like the DRed algorithm and introduce a new fixpoint operator Wp which - opposed to the standard fixpoint operator Tp - allows to correctly capture the update's semantics without any recomputation of the materialized view.
Index structures for efficiently accessing fuzzy data including cost models and measurements
Year: 1999 Journal: Fuzzy Sets and Systems Download: link to electronic edition BibTeX: @Article{Helmer:1999:ISE, author = "Birgit Boss and Sven Helmer", title = "Index structures for efficiently accessing fuzzy data including cost models and measurements", journal = "Fuzzy Sets and Systems", year = 1999, volume = 108, number = 1, pages = "11--37", month = "November" }Abstract: Recently, the call for database management systems (DBMS) with uncertainty capabilities is getting stronger. Many of the existing approaches to modeling uncertainty in database management systems are based on the theory of fuzzy sets. High performance is a necessary precondition for the acceptance of such systems by end users. However, performance issues have been quite neglected in research on fuzzy DBMS so far. In this article they are addressed explicitly. We propose new index structures for fuzzy DBMS based on the well known technique of superimposed coding together with detailed cost models. The correctness of the cost models as well as the efficiency of the index structures proposed is validated by a number of measurements on experimental fuzzy databases.
Interactive Biochemical Pathways
Year: 1999 Variations on Grouping and Aggregation
Year: 1999 Download: MA-99-11.ps Abstract: Grouping and aggregation are constantly gaining importance in the evaluation of queries. Hence, there is a need to improve the execution times of queries containing these operations. To achieve this goal we propose to extend the relational algebra by new operators that address different query patterns and that allow for more eAEcient implementation than those currently used. These new operators have to be specific enough to allow improved performance and general enough to be of general use. We present three patterns and the corresponding operators and show how these operators can be used to speed up query evaluation by a factor of two.
Variationsmöglichkeiten bei der Transformation von UML-Basiskonzepten in Dokumenttypdefinitionen
Year: 1999 Download: MA-99-10.ps Abstract: Wir betrachten verschiedene Transformationsmöglichkeiten eines in UML gegebenen konzeptuellen Schemas in eine XML-Dokumenttypdefinition. Dabei konzentrieren wir uns auf die Basiskonstrukte, die in UML zur Verfügung stehen. Die verschiedenen Transformationsmöglichkeiten werden in Hinsicht auf den Erhalt der Semantik des konzeptuellen Schemas bewertet.
:: 1998
Diag-Join: An Opportunistic Join Algorithm for 1:N Relationships
Year: 1998 Download: vldb98.diagjoin.ps, technical report Abstract: Time of creation is one of the predominant (often implicit) clustering strategies found not only in Data Warehouse systems: line items are created together with their corresponding order, objects are created together with their subparts and so on. The newly created data is then appended to the existing data. We present a new join algorithm, called Diag-Join, which exploits time-of-creation clustering. The performance evaluation reveals its superiority over standard join algorithms like nested-loop join and GRACE hash join. We also present an analytical cost model for Diag-Join.
Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing
Year: 1998 Download: vldb98.sma.ps Abstract: Small Materialized Aggregates (SMAs for short) are considered a highly flexible and versatile alternative for materialized data cubes. The basic idea is to compute many aggregate values for small to medium-sized buckets of tuples. These aggregates are then used to speed up query processing. We present the general idea and present an application of SMAs to the TPC-D benchmark. We show that exploiting SMAs for TPC-D Query~1 results in a speed up of two orders of magnitude. Then, we investigate the problem of query processing in the presence of SMAs. Last, we briefly discuss some further tuning possibilities for SMAs.
Optimizing generalized path expressions using full text indexes
Year: 1998 Download: nisj.ps Abstract: Query languages for object bases became enriched by generalized path expressions that allow for attribute and path variables. Optimizing queries containing generalized path expressions attracted some interest. However, many interesting queries require still a full scan over the whole object base. This unbearable situation can be remedied best by utilizing index structures. However, traditional database indexes fail to support generalized path expressions. We propose to use a flexible mix of database and full text indexes in order to evaluate efficiently queries containing generalized path expressions. We introduce an algebraic framework for the optimization of such queries using full text indexes, and we report on a first prototype implementation including some performance results.
Constructing Optimal Bushy Processing Trees for Join Queries is NP-hard (Technical Report of The University of Mannheim)
Year: 1998 Download: MA-96-11.ps Abstract: We show that constructing optimal bushy processing trees for join queries is NP-hard. More specifically, we show that even the construction of optimal bushy trees for computing the cross product for a set of relations is NP-hard.
Efficient Dynamic Programming Algorithms for Ordering Expensive Joins and Selections
Year: 1998 Download: edbt98.dynprog.ps Abstract: The generally accepted optimization heuristics of pushing selections down does not yield optimal plans in the presence of expensive predicates. Therefore, several researchers have proposed algorithms to compute optimal processing trees for queries with expensive predicates. All these approaches are incorrect--with one exception [3]. Our contribution is as follows. We present a formally derived and correct dynamic programming algorithm to compute optimal bushy processing trees for queries with expensive predicates. This algorithm is then enhanced to be able to (1) handle several join algorithms including sort merge with a correct handling of interesting sort orders, to (2) perform predicate splitting, to (3) exploit structural information about the query graph to cut down the search space. Further, we present efficient implementations of the algorithms. More specifically we introduce unique solutions for efficiently computing the cost of the intermediate plans and for saving memory spaceby utilizing bitvector contraction. Our implementations impose no restrictions on the type of query graphs, the shape of processing trees or the class of cost functions. We establish the correctness of our algorithms and derive tight asymptotic bounds on the worst case time and space complexities. We also report on a series of benchmarks showing that queries of sizes which are likely to occur in practice can be optimized over the unconstrained search space in less than a second.
The Evaluation of Content-Based Web Queries
Year: 1998
:: 1997
Evaluation of Main Memory. Join Algorithms for Joins with Set Comparison Predicates
Year: 1997 Download: vldb97.setjoin.ps, technical report Abstract: Current data models like the NF2 model and object-oriented models support set-valued attributes. Hence, it becomes possible to have join predicates based on set comparison. This paper introduces and evaluates several main memory algorithms to evaluate efficiently this kind of join. More specifically, we concentrate on the set equality and the subset predicates.
Index Structures for Databases Containing Data Items with Set-valued Attributes
Year: 1997 Download: MA-97-02.ps Abstract: We introduce two new hash-based index structures to index set-valued attributes. Both are able to support subset and superset queries. Analytical cost models for the new index structures as well as for the two existing index structures, sequential signature file and Russian Doll Tree, are presented and experimentally validated. Using the validated cost model, we express the performance of all four index structures in terms of the performance of the sequential signature file. This allows a direct analytical comparison of their performance. Last, we report on our benchmark results comparing the real performance of all four index structures. We especially investigate their performance for skewed data.
On the Complexity of Generating Optimal Plans with Cross Products
Year: 1997 Download: pods97.ps Abstract: In modern advanced database systems the optimizer is often faced with the problem of finding optimal evaluation strategies for queries involving a large number of joins. Examples are queries generated by deductive database systems and path expressions in object-oriented database systems. The best plan can be found in the very large search space of bushy trees where plans are allowed to contain cross products. A general question arises: For which (sub-) problems can we expect to find polynomial algorithms generating the best plan? We attack this question from both ends of the spectrum. First, we show that we cannot expect to find any polynomial algorithm for any subproblem as long as optimal bushy trees are to be generated. More specifically, we show that the problem is NP-hard independent of the query graph. Second, for the restricted cIass of chain queries, we present two efficient algorithms for the problem of generating left-deep trees possibly containing cross products.
Optimizing Queries with Universal Quantification in Object-Oriented and Object-Relational Databases
Year: 1997 Download: vldb97.universal.ps Abstract: We investigate the optimization and evaluation of queries with universal quantification in the context of the object-oriented and object-relational data models. The queries are classified into 16 categories depending on the variables referenced in the so-called range and quantifier predicates. For the three most important classes we enumerate the known query evaluation plans and devise some new ones. These alternative plans are primarily based on anti-semijoin, division, generalized grouping with count aggregation, and set difference. In order to evaluate the quality of the many different evaluation plans a thorough performance analysis on some sample database configurations was carried out. The quantitative analysis reveals that--if applicable--the anti-semijoin-based plans are superior to all the other alternatives, even if we employ the most sophisticated division algorithms. Furthermore, exploiting object-oriented features, anti-semijoin plans can be derived even when this is not possible in the relationa l context.
Heuristic and Randomized Optimization for the Join Ordering Problem
Year: 1997 Download: joinorderingalgos.ps Abstract: Recent developments in database technology, such as deductive database systems, have given rise to the demand for new, cost-effective optimization techniques for join expressions. In this paper many different algorithms that compute approximative solutions for optimizing join orders are studied since traditional dynamic programming techniques are not appropriate for complex problems. First, two possible solution spaces, the space of left-deep and bushy processing trees, respectively, are evaluated from a statistical point of view. The result is that the common limitation to left-deep processing trees is only advisable for certain join graph types. Basically, optimizers from three classes are analyzed: heuristic, randomized and genetic algorithms. Each one is extensively scrutinized with respect to its working principle and its fitness for the desired application. It turns out that randomized and genetic algorithms are well suited for optimizing join expressions. They generate solutions of high quality within a reasonable running time The benefits of heuristic optimizers, namely the short running time, are often outweighed by merely moderate optimization performance.
Querying Documents in Object Databases
Year: 1997 Download: dilib97.ps Abstract: We consider the problem of storing and accessing documents (SGML and HTML, in particular) using database technology. To specify the database image of documents, we use structuring schemas that consist in grammars annotated with data base programs. To query documents, we introduce an extension of OQL, the ODMG standard query language for object databases. Our extension (named OQL-doc) allows to query documents without a precise knowledge of their structure using in particular generalized path expressions and pattern matching. This allows us to introduce in a declarative language (in the style of SQL or OQL), navigational and information retrieval styles of accessing data. Query processing in the context of documents and path expressions leads to challenging implementation issues. We extend an object algebra with new operators to deal with generalized path expressions. We then consider two essential complementary optimization techniques: we show that almost standard database optimization techniques can be used to answer queries without having to load the entire document into the database. we also consider the interaction of full-text indexes (e.g., inverted files) with standard database collection indexes (e.g., B-trees) that provide important speed-up. The paper is an overview of a research project at INRIA-Rocquencourt.
RAW: A Relational Algebra for the Web
Year: 1997 Download: raw.ps Abstract: The main idea underlying the paper is to extend the relational algebra such that it becomes possible to process queries against the World-Wide Web. These extensions are minor in that we tried to keep them at the domain level. Additionally to the known domains (int, bool,float, string), we introduce three new domains to deal with URLs, html-documents or fragments thereof, and path expressions. Over these domains we define several functions that are accessible from the algebra within the subscripts of the relational operators. The approach allows us to reuse the operators of the relational algebra without major modifications. Indead, the only extensions necessary is the introduction of a map operator. Further, two modifications to the scan and the indexscan are necessary. Finally, the indexscan which has the functionality of a typical meta-search engine is capable of computing a unified rank based on the tuple order provided by the underlying search engines.
:: 1996
Evaluating Queries with Generalized Path Expressions
Year: 1996 Download: sigmod96.ps Abstract: In the past few years, query languages featuring generalized path expressions have been proposed. These languages allow the interrogation of both data and structure. They are powerful and essential for a number of applications. However, until now, their evaluation relied on a rather naive and inefficient algorithm. In this paper, we extend an object algebra with two new operators and present some interesting rewriting techniques for queries featuring generalized path expressions. We also show how a query optimizer can integrate the new techniques.
Indexing a Fuzzy Database Using the Technique of Superimposed Coding - Cost Models and Measurements (Technical Report of The University of Mannheim)
Year: 1996 Download: MA-96-02.ps Abstract: Recently, new applications have emerged that require database management systems with uncertainty capabilities. Many of the existing approaches to modeling uncertainty in database management systems are based on the theory of fuzzy sets. High performance is a necessary precondition for the acceptance of such systems by end users. However, performance issues have been quite neglected in research on fuzzy database management systems so far. In this article they are addressed explicitly. We propose new index structures for fuzzy database management systems based on the well known technique of superimposed coding together with detailed cost models. The correctness of the cost models as well as the efficiency of the index structures proposed is validated by a number of measurements on experimental fuzzy databases.
On the Costs of Monitoring and Reorganization of Object Bases for Clustering (new version)
Year: 1996 Download: sigrec96.ps Abstract: Clustering is one of the most effective performance enhancing techniques and many proposals for clustering object bases together with their evaluation proof their effectiveness. All proposals rely on some information which is typically gathered during a monitoring phase. Subsequently, a reorganization schema is applied to achieve good clustering. Nevertheless, there exists little work on the monitoring and the reorganization phase. Further, methods and costs for monitoring and reorganization have neither been proposed nor evaluated. The goal is to fill this gap. More specifically, we report on a clustering tool for object bases supporting all phases from monitoring over database reorganization to the evaluation of the effectiveness of the applied clustering strategy. Performance figures for all phases are given.
Optimal Ordering of Selections and Joins in Acyclic Queries with Expensive Predicates
Year: 1996 Download: RWTH-96-03.ps Abstract: The generally accepted optimization heuristics of pushing selections down does not yield optimal plans in the presence of expensive predicates. Therefore, several researchers have proposed algorithms for the optimal ordering of expensive joins and selections in a query evaluation plan. All of these algorithms have an exponential run time. For a special case, we propose a polynomial algorithm which - in one integrated step - computes the optimal join order and places expensive predicates optimally within the join tree. the special case is characterized by the following statements: only left-deep trees are considered, no cross-products are considered, the cost function has to exhibit the ASI property, and cheap selections are pushed before-hand.
The PARK Semantics for Active Rules
Year: 1996 Download: edbt96.PARK.ps Abstract: Active databases are an important topic of current database research. However, the semantics of the underlying mechanism, event-condition-action rules (or ECA rules for short), is unclear so far. In order to define a clear semantics for sets of active rules, we first derive the requirements such a semantics must fulfill. Since currently no semantics fulfills these requirements, we continue with the definition of the PARK semantics adhering to all requirements. The PARK semantics is a smooth integration of inflationary fixpoint semantics with conflict resolution. Through this approach, the PARK semantics is powerful enough to deal with recursive active rules. Furthermore, the actual conflict resolution strategy is a parameter of the PARK semantics. This guarantees a wide range of applicability.
:: 1995
Konstruktion von Anfrageoptimierern für Objektbanken, (Habilitation)
Year: 1995 Efficient Evaluation of Aggregates on Bulk Types
Year: 1995 Download: RWTH-95-05.ps Abstract: A new method for efficiently evaluating queries with aggregate functions is presented. More specifically, we introduce a class of aggregate queries where traditional query evaluation strategies in general require O(n2) time and space in the size of the (at most two) input relations. For this class of aggregate queries our approach needs at most O(n log n) time and linear space. Further, our approach deals not only with relations but with general bulk types like sets, bags, and lists.
On the Complexity of Generating Optimal Left-Deep Processing Trees with Cross Products
Year: 1995 Download: icdt95.ps Abstract: Producing optimal left-deep trees is known to be NP-complete for general join graphs and a quite complex cost function counting disk accesses for a special block-wise nested-loop join. Independent of any cost function is the dynamic programming approach to join ordering. The number of alternatives this approach generates is known as well. Further, it is known that for some cost functions - those fulfilling the ASI property the problem can be solved in polynomial time for acyclic query graph, i.e., tree queries. Unfortunatly, some cost functions like sort merge could not be treated so far. We do so by a slight detour showing that this cost function (and others too) are optimized if and only if the sum of the intermediate result sizes is minimized. This validates the database folklore that minimizing intermediate result sizes is a good heuristic. Then we show that summarizing the intermediate result sizes has the ASI property. It further motivates us to restrict the subsequent investigations to this cost function called Cout for which we show that the problem remains NP-complete in the general case. Then, we concentrate on the main topic of the paper: the complexity of producing left-deep processing trees possibly containing cross products. Considering cross products is known to possibly result in cheaper plans. More specifically, we show that the problem (LD-X-Star) of generating optimal left-deep processing trees possibly containing cross products is NP-complete for star queries. Further, we give an algorithm for treating star queries which is more efficient than dynamic programming. the NP-completeness of LD-X-Star immediatly implies the NP-completness for the more general tree queries.
Efficient Maintenance of Materialized Mediated Views
Year: 1995 Download: guptabookchapter.ps Abstract: Integrating data and knowledge from multiple heterogeneous sources - like databases, knowledge bases or specific software packages - is often required for answering certain queries. Recently, a powerful framework for defining mediated views spanning multiple knowledge bases by a set of constrained rules was proposed. We investigate the materialization of these views by unfolding the view definition and the efficient maintenance of the resulting materialized mediated view in case of updates. Thereby, we consider two kinds of updates: updates to the view and updates to the underlying sources. For each of these two cases several efficient algorithms maintaining materialized mediated views are given. We improve on previous algorithms like the Dred algorithm and introduce a new fixpoint operator Wp which - opposed to the standard fixpoint operator Tp - allows us to correctly capture the update's semantics without any recomputation of the materialized view.
Bypassing Joins in Disjunctive Queries
Year: 1995 Download: vldb95.bypass.ps, technical report Abstract: The bypass technique, which was formerly restricted to selections only [KMPS94], is extended to join operations. Analogous to the selection case, the join operator may generate two output streams - the join result and its complement - whose subsequent operator sequence is optimizing individually. By extending the bypass technique to joins, several problems have to be solved. An algorithm for exhaustive generation of the search space for bypass plans has to be developed. The search space for bypass plans is quite large. Hence, partial exploration strategies still resulting in sufficiently efficient plans have to be developed. Since the complement of a join can be very large, those cases where the complement can be restricted to the complement of the semijoin have to be detected. We attack all three problems. Especially, we present an algorithm generating the optimal bypass plan and one algorithm producing near optimal plans exploiting the search space only partially. As soon as disjunctions occur, bypassing results in savings. Since the join operator is often more expensive than the selection, the savings for bypassing joins are even higher than those for selections only. We give a quantitative assessment of these savings on the basis of some example queries. Further, we evaluate the performance of the two bypass plan generating algorithms.
Aspects of Object Bases in Scientific Computing
Year: 1995 Download: oocns95.ps Abstract: The paper discusses various requirements of scientific applications.
An experimental study on the complexity of left-deep join ordering problems for cyclic queries
Year: 1995 Download: RWTH-95-04.ps Abstract: Not only in deductive databases, logic programming, and constraint satisfaction problems but also in object bases where each single dot in a path expression corresponds to a join, the optimizer is faced with the problem of ordering large numbers of joins. This might explain the renewed interest in the join ordering problem. Although many join ordering techniques have been invented and benchmarked over the last years, little is known on the actual effectiveness of the developed methods and the cases where they are bound to fail. The problem attacked is the discovery of parameters and their qualitative influence on the complexity of single problem instances and on the effectiveness of join ordering techniques including search procedures, heuristics, and probabilistic algorithms. Thus an extensive analysis of the search space is carried out, with particular emphasis on the existence of phase transitions in this space and on the influence the parameters have on these transitions. Having these parameters on hand serves two important tasks: For a given heuristic, parameter combinations can be identified where it performs nearly optimal and others where it performs badly. Hence, on the one hand, we can be confident about the results of a heuristic for well determined cases and, on the other hand, can avoid the application of a heuristic where it is bound to fail; when benchmarking join ordering heuristics, one had - up to now - to choose a benchmark arbitrarily. Given the parameters which influence the complexity of a join ordering problem it becomes possible to consciously design challenging benchmarks.
Query Optimization Techniques Exploiting Class Hierarchies
Year: 1995 Download: RWTH-95-07.ps Abstract: Since the introduction of object base management systems (OBMS), many query optimization techniques tailored for object query languages have been proposed. They adapt known optimization techniques to the OBMS context, exploit special object-oriented features, or give solutions to problems specific to querying objects. Nonetheless, one of the most prominent features of object models - namely class hierarchies - have so far not been exploited for query optimization. The current paper proposes new optimization techniques for queries referring to classes integrated into a class hierarchy. The techniques are generic in the sense that we do not give a set of algebraic equivalencies the optimizer has to apply, but instead try to provide the reader with a general understanding of how to exploit class hierarchies for query optimization purposes. We give general descriptions of the techniques as well as illustrating examples. Besides yielding considerable savings in terms of execution time, the presented optimization techniques have the additional advantages of being easily implementable and resulting only in a neglectable increase in optimization time.
:: 1994
Classification and Optimization of nested Queries in Object Bases
Year: 1994 Download: Paper BibTeX: @Article {Cluet:1994:CON, author = " Sophie Cluet and Guido Moerkotte", title = "Classification and Optimization of Nested Queries in Object Bases", journal = "BDA 94", year = "1994" }Abstract: Many declarative query languages for object-oriented (oo) databases allow nested subqueries. This paper contains a complete classification of oo nested queries and appropriate unnesting optimization strategies based on algebraic rewriting. We adapt some known relational techniques and introduce new ones that use and are concerned with features specific to object-oriented queries. In particular, we introduce two new and powerful grouping operators which will form the basis for our unnesting techniques.
Object-Oriented Database Management: Applications in Engineering and Computer Science.
Year: 1994 Optimizing Disjunctive Queries with Expensive Predicates
Year: 1994 Download: sigmod94.ps Abstract: In this work we propose and assess a novel technique, called it bypass processing, for optimizing the evaluation of disjunctive queries in object-oriented database systems. In the object-oriented data model, selection predicates often contain terms whose evaluation cost varies tremendously; e.g., the evaluation cost of a user-defined function may be orders of magnitude more expensive than an attribute access (and comparison). The idea of bypass processing consists of avoiding the evaluation of such expensive terms whenever the outcome of the entire selection predicate can already be deduced by testing other, less expensive terms. In order to validate the viability of bypass evaluation, we extend a previously developed optimizer architecture and incorporate three alternative optimization algorithms for generating bypass processing plans and - for comparison purpose - one algorithm for generating conventional query evaluation plans. The optimizer is assessed with respect to running time and quality of the obtained query evaluation plans on a (generated) set of queries.
Indexing Multiple Sets
Year: 1994 Download: vldb94.cgtree.ps Abstract: We examine the performance of B+-tree based index structures for multiple sets as developed in the context of object bases. Index structures for multiple sets can be classified into those that group entries according to their key value and those that group entries according to their set membership. The former are particularly suited for exact match queries on all indexed sets, the latter especially support range queries on a small number of all indexed sets. The goal is to thoroughly evaluate the performance of both grouping strategies. There exist two good reasons for adding a new index structure to the evaluation: The performance potentials of set grouping index structures are not yet fully exploited. Up to now, the database administrator has the choice between the key grouping and the set grouping index structures. If the application profile consists of both, exact match queries and range queries, this is not really a choice. Hence, a more flexible index structure is needed which can be tuned to a given mix containing both, exact match and range queries. These two reasons led us to the development of the CG-tree. The focus of the paper is on introducing the CG-tree and on a thorough analysis of the performance of the CH-index [Kim], H-tree and CG-tree under various conditions.
Autonomous Objects: A Natural Model for Complex Applications (Journal version)
Year: 1994 Download: JIIS.ps Abstract: Object-oriented database systems are emerging as the "next generation" DBMSs for advanced applications, e.g., VLSI design, mechanical CAD/CAM, software engineering, etc. However, a close analysis indicates that the requirements imposed by these application domains cannot be met by an object-oriented model that relies purely on passive objects. In this work we go beyond the conventional single-thread-of-control paradigm of object models and assume a distributed object world that is inhabited by active objects which can autonomously initiate responses to environmental changes. In thepresent work we develop a model of autonomous objects that can cooperate with each other by synchronous or asynchronous message passing. It is shown how events can be incorporated into this model that can be used to relay messages (signals) to an arbitrary number of potentially "interested" objects. We show that this very sparse extension of an object-oriented model gives rise to manyfold high-level features which can be controlled by events. The object-oriented paradigm allows one to isolate the rules according to which events are being raised. This leads to a potentially rather efficient execution model compared to existing relational concepts which are typically globally defined event trigger mechanisms. Furthermore, we spend one section on synchronization aspects of autonomously operating objects; in particular, we show how serializability can be achieved in such a decentralized model.
Function Materialization in Object Bases: Design, Realization and Evaluation (journal version)
Year: 1994 Download: TKDE92.ps, technical report Abstract: View materialization is a well-known optimization technique of relational database systems. In this work we present a similar, yet more powerful optimization concept for object-oriented data models: function materialization. Exploiting the object-oriented paradigm - namely classification, object identity, and encapsulation - facilitates a rather easy incorporation of function materialization into (existing) object-oriented systems. Only those types (classes) whose instances are involved in some materialization are appropriately modified and recompiled - thus leaving the remainder of the object system invariant. Furthermore, the exploitation of encapsulation (information hiding) and object identity provides for additional performance tuning measures which drastically decrease the rematerialization overhead incurred by updates in the object base. First, it allows to cleanly separate the object instances that are irrelevant for the materialized functions from those that are involved in the materialization of some function result and, thus, to penalize only those involved objects upon update. Second, the principle of information hiding facilitates fine-grained control over the invalidation of precomputed results. Based on specifications given by the data type implementor the system can exploit operational semantics to better distinguish between update operations that invalidate a materialized result and those that require no rematerialization. The paper concludes with a quantitative analysis of function materialization based on two sample performance benchmarks obtained from our experimental object base system GOM. Keywords: access support, query optimization, object-oriented data model, view materialization, function materialization
Eine Basis für effiziente Konsistenzprüfung
Year: 1994 Download: KHE-94-7.ps Abstract: Im folgenden Beitrag wird eine Übersetzungstechnik für effiziente Konsistenztests in Datenbanken vorgestellt. Während die meisten Arbeiten auf dem Gebiet der Konsistenzprüfung eine Interpretation der vereinfachten Konsistenzbedingungen zur Laufzeit durchführen, wird hier eine Übersetzungstechnik verwendet, welche deklarativ beschriebene Konsistenzbedingungen in eine erweiterte relationale Algebra übersetzt. Da für die Übersetzung der Konsistenzbedingungen in eine algebraische Darstellung zahlreiche Möglichkeiten existieren, ist eine Optimierung, beispielsweise durch Ausnutzung statistischen Wissens, möglich. Ein weiterer Vorteil des Ansatzes ergibt sich aus den Erweiterungen der Algebra. Typische algebraische Operatoren haben zwei Eingabemengen und nur eine Ausgabemenge. Werden dagegen mehr als eine Ausgabemenge und mehr als zwei Eingabemengen zugelassen, verbessern sich die Kommunikations- und Ausdrucksfähigkeiten der Operatoren. Die Verwendung dieser als Bypass-Technik bezeichneten Methode ermöglicht einerseits schnelle Konsistenztests zur Laufzeit und bildet andererseits die Basis für eine weitere Effizienzsteigerung durch bessere grobkörnige Parallelisierung der resultierenden algebraischen Ausdrücke. Anhand eines Kostenmodells und mittels Messungen auf einer Hauptspeicherdatenbank werden die Vorteile der Bypass-Technik gezeigt.
Aktive und mobile Objekte als Modellierungskonzept für dezentrale Ingenieuranwendungen
Year: 1994 Download: stak-94.ps Abstract: Dieses Papier zeigt, wie schon durch sehr einfache Erweiterungen einer herkömmlichen objektorientierten Datenbankprogrammiersprache um Basiskonzepte zur Modellierung aktiver und mobiler Objekte technische Anwendungen, in denen dezentrale und (geografisch) verteilte Informationsverarbeitung eine bedeutende Rolle spielt, wirkungsvoll unterstützt werden können. Die Ideen werden an einem konkreten Beispiel aus dem Fertigungsbereich demonstriert.
Ausgewählte Kapitel der Implementierung objektorientierter Datenbanksysteme (interner Bericht, Uni Karlsruhe)
Year: 1994 Download: KHE-94-18.ps
:: 1993
Exploiting Consistency Maintenance for Planning
Year: 1993 Nested Queries in Object Bases
Year: 1993 Download: Paper, Technical Report Abstract: Many declarative query languages for object-oriented (oo) databases allow nested subqueries. This paper contains the first algebra which is capable of handling arbitrary nested queries and the first complete classification of oo nested queries and the according unnesting strategies. For unnesting, a two-phase approach is used. The first phase - called dependency-based optimization - transforms queries at the query language level in order to treat common subexpressions and independent subqueries more efficiently. The transformed queries are translated to nested algebraic expressions. These entail nested loop evaluation which may be very inefficient. Hence, the second phase unnests nested algebraic expressions to allow for more efficient evaluation. The paper also discusses the differences between unnesting in the relational and unnesting in the oo context. Since the first phase is rather simple, the paper concentrates on the second phase.
Optimizing Join Expressions (Extended Abstract)
Year: 1993 Partition-Based Clustering in Object Bases: From Theory to Practice
Year: 1993 Download: fodo93.ps Abstract: We classify clustering algorithms into sequence-based techniques - which transform the object net into a linear sequence - and partition-based clustering algorithms. Tsangaris and Naughton [TN91, TN92] have shown that the partition-based techniques are superior. However, their work is based on a single partitioning algorithm, the Kernighan and Lin heuristics, which is not applicable to realistically large object bases because of its high running-time complexity. The contribution of this paper is two-fold: we devise a new class of greedy object graph partitioning algorithms (GGP) whose running-time complexity is moderate while still yielding good quality results. Our extensive quantitative analysis of all well-known pratitioning algorithms indicates that no one algorithm performs superior for all object net characteristics. Therefore, we propose an adaptable clustering strategy according to a multi-dimensional grid: the dimensions correspond to particular characteristics of the object base - given by, e. g. number and size of objects, degree of object sharing - and the grid entries indicate the most suitable clustering algorithm for the particular configuration.
Towards More Flexible Schema Management in Object-Bases
Year: 1993 Download: moerkotte92toward.pdf Autonomous Objects: A Natural Model for Complex Applications (Conference version)
Year: 1993 Download: ngits93.ps A Blackboard Architecture for Query Optimization in Object Bases
Year: 1993 Download: vldb93.blackboard.ps Abstract: Adopting the blackboard architecture from the area of Artificial Intelligence, a novel kind of optimizer enabling two desirable ideas will be proposed. Firstly, using such a well-structured approach backpropagation of the optimized queries allows an evolutionary improvement of (crucial) parts of the optimizer. Secondly, the A* search strategy can be applied to harmonize two contrary properties: lternatives are generated whenever necessary, and straight-forward optimizing is performed whenever possible, however. The generic framework for realizing a blackboard optimizer is proposed first. Then, in order to demonstrate the viability of the new approach, a simple example optimizer is presented. It can be viewed as an incarnation of the generic framework.
Database Design with User-Definable Modelling Concepts
Year: 1993 Basiskonzepte objektorientierter Datenbanksysteme
Year: 1993 Generating Consistent Test Data for a Variable Set of General Consistency Constraints
Year: 1993 Download: vldbj92.pdf Abstract: To address the problem of generating test data for a set of general consistency constraints, we propose a new two-step approach: First the interdependencies between consistency constraints are explored and a generator formula is derived on their basis. During its creation, the user may exert control. In essence, the generator formula contains information to restrict the search for consistent test databases. In the second step, the test database is generated. Here, two different approaches are proposed. The first adapts an already published approach to generating finite models by enhancing it with requirements imposed by test data generation. The second, a new approach, operationalizes the generator formula by translating it into a sequence of operators, and then executes it to construct the test database. For this purpose, we introduce two powerful operators: the generation operator and the test-and-repair operator. This approach also allows for enhancing the generation operators with heuristics for generatin g facts in a goal-directed fashion. It avoids the generation of test data that may contradict the consistency constraints, and limits the search space for the test data. This article concludes with a careful evaluation and comparison of the performance of the two approaches and their variants by describing a number of benchmarks and their results.
Controlled Redundancy in Object Bases: Optimization Methods for Organizational Data Modelling
Year: 1993 Download: hicss93.ps Abstract: We review the definition and then analyze two optimization techniques that were developed for object-oriented databases. Both of these approaches are based on incorporating redundancy into the object base. However, the redundancy is entirely controlled by the system, rather than burdening the user with this task. One technique, called Access Support relations, is based on redundantly maintaining frequently traversed object paths. The other, Function Materialization, is founded on precomputing particularly performance critical functions. In both cases, the redundantly maintained information can be viewed as an index to speed up object base queries. We discuss the exploitation of these optimization approaches on two sample, yet representative applications from the area of organizational data modeling.
Das GOM-Handbuch: Objektorientierte Programmierung und Datenmodellierung
Year: 1993 Download: GOM-Manual.ps Optimizing Join Orders
Year: 1993 Download: MIP9307.ps Abstract: Recent developments in database technology, such as deductive database systems, have given rise to the demand for new, cost-effective optimization techniques for join expressions. In this paper many different algorithms that compute approximative solutions for optimizing join orders are studied since traditional dynamic programming techniques are not appropriate for complex problems. First, two possible solution spaces, the space of left-deep and bushy processing trees, respectively, are evaluated from a statistical point of view. The result is that the common limitation to left-deep processing trees is only advisable for certain join graph types. Basically, optimizers from three classes are analyzed: heuristic, randomized and genetic algorithms. Each one is extensively scrutinized with respect to its working principle and its fitness for the desired application. It turns out that randomized and genetic algorithms are well suited for optimizing join expressions. They generate solutions of high quality within a reasonable running time The benefits of heuristic optimizers, namely the short running time, are often outweighed by merely moderate optimization performance.
Optimizing Disjunctive Queries in Object Bases
Year: 1993 Download: MIP9308.ps Abstract: In this work we propose and assess a novel technique, called bypass processing, for optimizing the evaluation of disjunctive queries in object-oriented database systems. In the object- oriented data model, selection predicates often contain terms whose evaluation cost varies tremendously; e. g., the evaluation cost of a user-defined function may be orders of magnitude more expensive than an attribute access (and comparison). The idea of bypass processing consists of avoiding the evaluation of such expensive terms whenever the outcome of the entire selection predicate can already be deduced by testing other, less expensive terms. In order to validate the viability of bypass evaluation, we extend a previously developed optimizer architecture and incorporate three alternative optimization algorithms for generating bypass processing plans and - for comparison purpose - one algorithm for generating conventional query evaluation plans. The optimizer is assessed with respect to running time and quality of the obtained query evaluation plans on a (generated) set of queries.
Physical Object Management
Year: 1993 Abstract: In this chapter we will discuss a few physical object management techniques that are rooted in the relational context but were thoroughly adapted to the needs and requirements of the objet-oriented model(s). After introducing a sample object-oriented database schema (and an extension thereof) we discuss Access Support Relations, an indexing technique for path expressions. Then, we overview techniques for indexing within type hierarchies and describe the materialization of functions in an object-oriented context. We also provide an overview of techniques for deriving good object clustering and finally sketch a few proposals for storage models for objects.
:: 1992
Consistency Driven Planning
Year: 1992 Download: daisd92.ps Abstract: This paper describes a novel approach to linear planning. The presented algorithm is based on consistency maintaining procedure developed for deductive databases. This allows to handle any range-restricted formula as a goal, and to avoid the well-known problems of linear planning. The kernel of the planner consists of a deductive database system enhanced by a repair mechanism which allows to generate possible changes an inconsistent database must undergo in order to regain consistency. The generated repairs serve two purposes: as they describe possible worlds in which the goal holds, they can be used to reason about these worlds and eliminate those which have undesired properties. Moreover, the repairs can be used for driving the actual planning procedure such that the generated plans lead to the selected possible world.
Multiple Subsitutability Without Affecting the Taxonomy
Year: 1992 Download: Fashion.ps Abstract: Two areas where common object-oriented modeling power lacks the necessary expressiveness are identified. An analysis of the situation shows that there exists a single reason why the real world situations cannot be modeled adequately. What really is missing is a means to express that objects of a certain type are able to behave in a fashion objects of another type would do. To remedy this situation we introduce a single new concept, define its semantics, give a thorough analysis of its applicability in light of strong typing, and illustrate its symbiosis with inheritance and genericity. The concept is illuminated by means of several examples. Among those the realization of views and versions are the most interesting.
Optimizing Boolean Expressions in Object Bases
Year: 1992 Download: vldb92.bxp.ps Abstract: In this paper we address the problem of optimizing the evaluation of Boolean expressions in the context of object-oriented data modeling. We develop a new heuristic for optimizing the evaluation sequence of Boolean expressions based on selectivity and cost estimates of the terms constituting the Boolean expression. The quality and efficiency of the heuristic is evaluated based on a quantitative analysis which compares our heuristic with the optimal, but infeasible algorithm and other known methods. The heuristic is based on the selectivity and evaluation-cost estimates of the terms of which the Boolean expression is composed. Deriving these inputs of the heuristics, i.e., the selectivity and cost estimates, is then addressed. We use an adaptation of well-known sampling for estimating the selectivity of terms. The cost estimation is much more complex than in the relational context due to the possibility of invoking functions within a Boolean expression. We develop the decapsulation method that derives cost estimates by analyzing the implementation of (encapsulated) functions.
Access Support Relations: An Indexing Method for Object Bases
Year: 1992 Download: infsys.ASR.ps Abstract: In this work access support relations are introduced as a means for optimizing query processing in object-oriented database systems. The general idea is to maintain separate structures (disassociated from the object representation) to redundantly store those object references that are frequently traversed in database queries. The proposed access support relation technique is no longer restricted to relate an object (tuple) to an atomic value (attribute value) as in conventional indexing. Rather, access support relations relate objects with each other and can span over reference chains which may contain collection-valued components in order to support queries involving path expressions. We present several alternative extensions and decompositions of access support relations for a given path expression, the best of which has to be determined according to the application-specific database usage profile. An analytical performance analysis of access support relations is developed. This analytical cost model is, in particular, used to determine the best access support relation extension and decomposition with respect to specific database configuration and usage characteristics.
On the Notion of Concept
Year: 1992 Download: er91.concept.ps Abstract: The notion of concept is central to the notion of data model in databases. Its purpose is to provide construction mechanisms for organizing a universe of data into manageable segments with a well-defined structure. However, so far the notion lacks rigor with a concomitant danger of inconsistent, overlapping or redundant use of concepts. The paper takes the approach that rigor is achieved by expressing the intuitive semantics within a formal logic. In doing so, the notion of concept may also provide a mechanism for structuring sets of facts, rules, and constraints in general deductive databases. In the paper a formal semantics for concept definitions is given by showing that each concept definition can be interpreted as a mapping from one database state to another. A number of examples show the usefulness of the concept definition. Especially, the basics of object-orientation are modeled as concepts. The paper also examines some difficult issues arising in connection with the concepts in conventional data models. First practical experiences are related, and topics for further research are pointed out.
Structuring the Distributed Object World of CIM
Year: 1992 Download: INCOME.ps Abstract: This paper's thesis is that today's object-oriented data models do not offer enough expressiveness to structure the distributed world of CIM in an appropriate manner. Two relationships useful for modeling CIM applications which common object-oriented models lack are identified. The first expresses that some objects describe different prospectives - or facets - of one entity in the real world. The other represents the distribution of activities within a factory. The usefulness of both relationships is argued and a comparison with other relationships which are mentioned in the literature is given.
Clustering in Object Bases
Year: 1992 Download: KHE-92-6.ps Goal Completion in Consistency Driven Planning
Year: 1992 Download: goalCompletion.ps Abstract: Lately, the notion of goal completion was introduced. Unfortunately, no proposal concerning its implementation and integration into a general purpose planner is given. Further, only informal arguments on the benefits of goal completion are presented. Within this paper, we present some formal results on the completeness of planning strategies when goal completion is applied. Further, anew reduction method for the search space of planning strategies presented, which, in conjunction with goal completion preserves their completeness. Some benchmark results complete the paper.
Die Datenbankprogrammiersprache GOMpl
Year: 1992 Download: KHE-92-25.ps Abstract: Das objektorientierte Datenbanksystem GOM wurde am Institut für Programmstrukturen und Datenorganisation (IPD) im Rahmen des Sonderforschungsbereichs 346 "Rechnerintegrierte Konstruktion und Fertigung von Bauteilen" entwickelt. Dieses Dokument enthält eine Beschreibung des Datenmodells von GOM, der persistenten Programmiersprache GOMpl. Hierbei gehen wir in Form einer Einführung auf die grundlegenden Konzepte von GOM ein. Insbesondere werden die für GOMpl zentralen Konzepte Typ, Objekt, Persistenz, Polymorphic und Generizität beschrieben. Weiterhin gehen wir auf Nebenläufigkeit von GOM-Anwendungen sowie auf die Schnittstelle zwischen GOMpl und der Programmiersprache C ein. Im Ausblick skizzieren wir mögliche Erweiterungen von GOMpl, die als Konzepte bereits vorliegen, aber noch nicht in die Sprache aufgenommen wurden. Eine ausführlichere Beschreibung von GOMpl findet sich in dem GOM-Handbuch [KKM+90].
:: 1991
Function Materialization in Object Bases
Year: 1991 Download: sigmod91.ps Abstract: We describe function materialization as an optimization concept in object-oriented databases. Exploiting the object-oriented paradigm - namely classification, object identity, and encapsulation facilitates a rather easy incorporation of function materialization into (existing) object-oriented systems. Furthermore, the exploitation of encapsulation (information hiding) and object identity provides for additional performance tuning measures which drastically decrease the rematerialization overhead incurred by updates in the object base. The paper concludes with a quantitative analysis of function materialization based on a sample performance benchmark obtained from our experimental object base system GOM.
Reactive Consistency Control in Deductive Databases
Year: 1991 Download: tods91.pdf Abstract: Classical treatment of violations of consistency constraints is to back out a database operation or transaction. In applications with large numbers of fairly complex consistency constraints this clearly is an unsatisfactory solution. Instead, if a violation is detected the user should be given a diagnosis of the constraints that failed, a line of reasoning on the cause that could have led to the violation, and suggestions for a therapy. These demands suggest that methods are borrowed from knowledge-based and expert systems to meet them. The paper introduces the basic issues that arise in connection with such a reactivity of a database system, and illustrates a line of approach with the aid of a simple example.
On the Compilation of Consistency Constraints
Year: 1991 A Framework for Strong Typing and Type Inference in (Persistent) Object Models
Year: 1991 Download: dexa91.typisierung.ps Abstract: In this paper a coherent framework of subtyping is developed that achieves strong typing in (persistent) object models without compromising flexibility and expressiveness. The typing framework is based on the concept of substitutability - meaning that from the type consistency perspective objects exhibiting a particular functional characteristic are substitutable for objects of another kind without compromising type safety. We deduce substitutability from the functional specification of object types, called the type signature. Utilizing our framework we show that inheritance-based subtyping on tuple-structured types cannot generally be extended to set-structured or any other collection types, e.g., lists, without sacrificing static verifiability of type consistency.
Query Optimization in Object Bases: Exploiting Relational Techniques
Year: 1991 Download: Dagstuhl.ps Abstract: There exists a large body of knowledge - gathered over a period of almost two decades - in relational query optimization. In this paper we investigate to what extent the relational optimization techniques can be applied to object-oriented query processing. As a "testbed" we chose our object-oriented database GOM which incorporates two very general indexing structures: access support relations which materialize frequently traversed path expressions and generalized materialization relations which maintain precomputed function results. These two indexing structures provide a good "yardstick" for analyzing a query optimizer since they subsume many other access support schemes proposed in the literature. In the analysis of rule-based query optimization we contrast two different internal query representation languages that were developed for rule-based optimization in GOM: 1. the procedurally oriented term language and 2. an object algebra representation that is based on relational algebra with a few extensions. The two representation languages are illustrated by way of optimizing an example query, step by step. Our initial approach to algebraic optimization reveals that the starting point for the optimization process is crucial. Therefore, we propose the so-called most costly normal form (MCNF) as the initial algebraic representation of object queries. Re-optimizing the example query illustrates that this starting point leads to a more structured approach 1. to query rewriting and 2. to generating alternatives during the optimization process.
Modelling Distributed CIM Applications Utilizing Autonomous Objects
Year: 1991 Download: er90.ubiquity.ps Abstract: Object-oriented database systems are being proposed as the next-generation database technology - especially for so-called non-standard application domains. However, we have argued in an earlier paper that the conventional object paradigm falls short of providing adequate support for such complex application domains as Computer-Integrated Manufacturing (CIM). Today's object-oriented data models still adhere to the conventional procedure invocation execution model. This leads to entirely passive objects whose behavioral pattern is dictated by outside requests under the single-thread-of-control paradigm. Very often, the result is a rather "unnatural" model of complex applications, e.g., CASE and CIM, because the multitude of tasks that are executed in parallel in such distributed application environments has to be forced into a single control flow. We propose to bridge the gap between reality and the informational model by a model of autonomous objects that communicate with each other by "true" message passing. In this paper we present language concepts that enhance the (conventional) object paradigm towards object autonomy and attempt a (first) validation based on modeling a sample factory.
GOM: A Strongly Typed Persistant Model with Polymorphism
Year: 1991 Abstract: In this paper the persistent object model GOM is described. GOM is an object-oriented data model that provides the most essential object features in a "lean" and coherent syntactical framework. These features include: object identity, object instantiation, subtyping and inheritance, operation refinement, dynamic (late) binding. One of the main goals in the design of GOM was type safety. In order to achieve this we developed a strongly typed language that enables the verification of type safety at compile time. It is shown in this paper how commonly encountered "traps" for strong typing are avoided in GOM by specifying a very clean subtyping semantics on the basis of substitutability and type signatures. The typing rules that enforce strong typing at compile time somewhat restrain the flexibility of the object model because subtyping and inheritance has to be restricted - in particular - for collection-valued types. The solution to regain the expressiveness that was traded off for safety is by combining inheritance and explicitly controlled operation polymorphism. To make operations polymorphic signatures may contain type variables which are substituted by named types at compile time.
:: 1990
Correcting Anomalies of Standard Inheritance - A Constraint-Based Approach
Year: 1990 Download: dexa90.inheritance.ps Abstract: When using standard inheritance of attributes from superclasses one problem often arises: the subtype relation is conflicting with the subset relation. Inspired by geometrical modeling we will propose a solution to this problem on the basis of constraint inheritance which is - internally - modeled as a term rewriting system. We treat multiple inheritance and preserve strong typing. Further we discuss homogeneous and heterogeneous inheritance. In the case of homogeneous inheritance we treat both, fixed and overwritable defaults. Class constants can be seen as a special case of fixed defaults. For a set of type definitions, i.e., the user defined schema with type associated constraints, we derive sets of rewrite rules which are then used to model the different kinds of inheritance just mentioned. These rules serve two purposes: first, they are used to rewrite the user defined schema to derive an internal schema and, second, to rewrite the operators. The last step enhances efficiency because the type associated constraints can be utilized for operator optimization.
Autonomy over Ubiquity: Coping with the Complexity of a Distributed World
Year: 1990 Download: ER90.ps Abstract: Today's distributed database systems adhere to the principle of ubiquity such that all the information is everywhere available in an equitable fashion. This transparent data distribution leads to at least two severe problems: The complexity becomes unmanageable if one tries to impose a centralized control upon the computer model of the "real" world, that is naturally distributed. If at all possible, the centralized control of ubiquity would be prohibitively expensive in terms of performance. We argue that the paradigm of ubiquity should be replaced by object autonomy in a distributed, persistent environment. The objects have built-in control over many important aspects that were previously controlled centrally: the life cycle, the cooperation control, and the distribution control. We describe the functionality of such an autonomous object model in metaphorical terms, concentrating on internal object aspects, inter-object communication and distributed infrastructure. We then validate - at a conceptual level - our approach by redesigning the informational model of a real CIM application that is operational in an experimental factory (Produktionstechnisches Labor) at the University of Karlsruhe.
Future Database Technology: Driving Forces and Directions
Year: 1990 Download: TRENDS.ps Access Support in Object Bases
Year: 1990 Download: sigmod90.ps, technical report Abstract: In this work access support relations are introduced as a means for optimizing query processing in object-oriented database systems. The general idea is to maintain separate structures (disassociated from the object representation) to store object references that are frequently traversed in database queries. The proposed access support relation technique is no longer restricted to relate an object (tuple) to an atomic value (attribute value) as in conventional indexing. Rather, access support relations relate objects with each other and can span over reference chains which may contain collection-valued components in order to support queries involving path expressions. We present several alternative extensions of access support relations for a given path expression, the best of which has to be determined according to the application-specific database usage profile. An analytical performance analysis of access support relations is developed. This analytical cost model is, in particular, used to determine the best access relation extension and decomposition with respect to specific database configuration and usage characteristics.
Advanced Query Processing in Object Bases Using Access Support Relations (VLDB 90 Conference, Brisbane, Australia, Aug. 1990)
Year: 1990 Download: vldb90.asrqueryopt.pdf Abstract: Even though the large body of knowledge of relational query optimization techniques can be utilized as a starting point for object-oriented query optimization the full exploitation of the object-oriented paradigm requires new, customized optimization techniques - not merely the assimilation of relational methods. This paper describes such an optimization strategy used in the GOM (Generic Object Model) project which combines established relational methods with new techniques designed for object models. The optimization method unites two concepts: access support relations and rule-based query optimization. Access support relations constitute an index structure that is tailored for accessing objects along reference chains leading from one object to another via single-valued or set-valued attributes. The idea is to redundantly maintain frequently traversed reference chains separate from the object representation. The rule-based query optimizer generates for a declaratively stated query an evaluation plan that utilizes as much as possible the existing access support relations. This makes the exploitation of access support relations entirely transparent to the database user. The rule-based query optimizer is particularly amenable to incorporating search heuristics in order to prune the search space for an optimal (or near-optimal) query evaluation plan.
:: 1988
Efficient Consistency Control in Deductive Databases
Year: 1988 Download: icdt88.ps Abstract: In this paper a theoretical framework for efficiently checking the consistency of deductive databases is provided and proven to be correct. Our method is based on focussing on the relevant parts of the database by reasoning forwards from the updates of a transaction, and using this knowledge about real or just possible implicit updates for simplifying the consistency constraints in question. Opposite to the algorithms by Kowalski/Sadri and Lloyd/Topor, we are neither committed to determine the exact set of implicit updates nor to determine a fairly large superset of it by only considering the head literals of deductive rule clauses. Rather, our algorithm unifies these two approaches by allowing to choose any of the above or even intermediate strategies for any step of reasoning forwards. This flexibility renders possible the integration of statistical data and knowledge about access paths into the checking process. Second, deductive rules are organized into a graph to avoid searching for applicable rules in the proof procedure. This graph resembles a connection graph, however, a new method of interpreting it avoids the introduction of new clauses and links.