Principles of Data Integration is the first comprehensive textbook on data integration, covering theoretical principles and implementation issues as well as current challenges raised by the semantic web and cloud computing. The book offers a range of data integration solutions, enabling you to focus on what is most relevant to the problem at hand. Readers will also learn how to build their own algorithms and implement their own data integration applications.
Written by three of the most respected experts in the field, this book provides an extensive introduction to the theory and concepts underlying today's data integration techniques, with detailed instruction for their application and concrete examples throughout to explain the concepts.
This text is an ideal resource for database practitioners in industry, including data warehouse engineers, database system designers, data architects/enterprise architects, database researchers, statisticians, and data analysts; students in data analytics and knowledge discovery; and other data professionals working at the R&D and implementation levels.
Key Features
- Offers a range of data integration solutions enabling you to focus on what is most relevant to the problem at hand
- Enables you to build your own algorithms and implement your own data integration applications
Dedication
Preface
1. Introduction
1.1 What Is Data Integration?
1.2 Why Is It Hard?
1.3 Data Integration Architectures
1.4 Outline of the Book
Bibliographic Notes
Part I: Foundational Data Integration Techniques
2. Manipulating Query Expressions
2.1 Review of Database Concepts
2.2 Query Unfolding
2.3 Query Containment and Equivalence
2.4 Answering Queries Using Views
Bibliographic Notes
3. Describing Data Sources
3.1 Overview and Desiderata
3.2 Schema Mapping Languages
3.3 Access-Pattern Limitations
3.4 Integrity Constraints on the Mediated Schema
3.5 Answer Completeness
3.6 Data-Level Heterogeneity
Bibliographic Notes
4. String Matching
4.1 Problem Description
4.2 Similarity Measures
4.3 Scaling Up String Matching
Bibliographic Notes
5. Schema Matching and Mapping
5.1 Problem Definition
5.2 Challenges of Schema Matching and Mapping
5.3 Overview of Matching and Mapping Systems
5.4 Matchers
5.5 Combining Match Predictions
5.6 Enforcing Domain Integrity Constraints
5.7 Match Selector
5.8 Reusing Previous Matches
5.9 Many-to-Many Matches
5.10 From Matches to Mappings
Bibliographic Notes
6. General Schema Manipulation Operators
6.1 Model Management Operators
6.2 Merge
6.3 ModelGen
6.4 Invert
6.5 Toward Model Management Systems
Bibliographic Notes
7. Data Matching
7.1 Problem Definition
7.2 Rule-Based Matching
7.3 Learning-Based Matching
7.4 Matching by Clustering
7.5 Probabilistic Approaches to Data Matching
7.6 Collective Matching
7.7 Scaling Up Data Matching
Bibliographic Notes
8. Query Processing
8.1 Background: DBMS Query Processing
8.2 Background: Distributed Query Processing
8.3 Query Processing for Data Integration
8.4 Generating Initial Query Plans
8.5 Query Execution for Internet Data
8.6 Overview of Adaptive Query Processing
8.7 Event-Driven Adaptivity
8.8 Performance-Driven Adaptivity
Bibliographic Notes
9. Wrappers
9.1 Introduction
9.2 Manual Wrapper Construction
9.3 Learning-Based Wrapper Construction
9.4 Wrapper Learning without Schema
9.5 Interactive Wrapper Construction
Bibliographic Notes
10. Data Warehousing and Caching
10.1 Data Warehousing
10.2 Data Exchange: Declarative Warehousing
10.3 Caching and Partial Materialization
10.4 Direct Analysis of Local, External Data
Bibliographic Notes
Part II: Integration with Extended Data Representations
11. XML
11.1 Data Model
11.2 XML Structural and Schema Definitions
11.3 Query Language
11.4 Query Processing for XML
11.5 Schema Mapping for XML
Bibliographic Notes
12. Ontologies and Knowledge Representation
12.1 Example: Using KR in Data Integration
12.2 Description Logics
12.3 The Semantic Web
Bibliographic Notes
13. Incorporating Uncertainty into Data Integration
13.1 Representing Uncertainty
13.2 Modeling Uncertain Schema Mappings
13.3 Uncertainty and Data Provenance
Bibliographic Notes
14. Data Provenance
14.1 The Two Views of Provenance
14.2 Applications of Data Provenance
14.3 Provenance Semirings
14.4 Storing Provenance
Bibliographic Notes
Part III: Novel Integration Architectures
15. Data Integration on the Web
15.1 What Can We Do with Web Data?
15.2 The Deep Web
15.3 Topical Portals
15.4 Lightweight Combination of Web Data
15.5 Pay-as-You-Go Data Management
Bibliographic Notes
16. Keyword Search
16.1 Keyword Search over Structured Data
16.2 Computing Ranked Results
16.3 Keyword Search for Data Integration
Bibliographic Notes
17. Peer-to-Peer Integration
17.1 Peers and Mappings
17.2 Semantics of Mappings
17.3 Complexity of Query Answering in PDMS
17.4 Query Reformulation Algorithm
17.5 Composing Mappings
17.6 Peer Data Management with Looser Mappings
Bibliographic Notes
18. Integration in Support of Collaboration
18.1 What Makes Collaboration Different
18.2 Processing Corrections and Feedback
18.3 Collaborative Annotation and Presentation
18.4 Dynamic Data: Collaborative Data Sharing
Bibliographic Notes
19. The Future of Data Integration
19.1 Uncertainty, Provenance, and Cleaning
19.2 Crowdsourcing and “Human Computing”
19.3 Building Large-Scale Structured Web Databases
19.4 Lightweight Integration
19.5 Visualizing Integrated Data
19.6 Integrating Social Media
19.7 Cluster- and Cloud-Based Parallel Processing and Caching
Bibliography
Index
Han & Kamber, Data Mining: Concepts and Techniques, 2e (MK 2006). (9781558609013) $74.95
Allemang, Semantic Web for the Working Ontologist (MK 2008). (97801238735560) $69.95/51.95EURO/42.99GBP
Witten/Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2e (MK 2005). (9780120884070) $69.95/51.95EURO/42.99GBP