New Springer Book, soon available from the Springer shop
C. Batini, M. Scannapieco
Data and Information Quality: Dimensions, Principles and Techniques
This new book:
- Presents an extensive description of the techniques that constitute the core of data and information quality research
- Combines concrete practical solutions, such as methodologies, benchmarks, and case studies, with sound theoretical formalisms
- Also includes the necessary foundations from probability theory, statistical data analysis, and machine learning
Special issue of ACM JDIQ on Web Data Quality
- Christian Bizer, University of Mannheim, Germany, email@example.com
- Luna Dong, Google, USA, firstname.lastname@example.org
- Ihab Ilyas, University of Waterloo, Canada, email@example.com
- Maria-Esther Vidal, Universidad Simon Bolivar, Venezuela, firstname.lastname@example.org
The volume and variety of data available on the web have risen sharply. In addition to traditional data sources and formats such as CSV files, HTML tables and deep web query interfaces, new techniques such as Microdata, RDFa, Microformats and Linked Data have found wide adoption. In parallel, techniques for extracting structured data from web text and semi-structured web content have matured, resulting in the creation of large-scale knowledge bases such as NELL, YAGO, DBpedia, and the Knowledge Vault.
Regardless of the specific data source, format, or information extraction methodology, data quality challenges persist in the context of the web. Applications are confronted with heterogeneous data from a large number of independent data sources, while metadata is sparse and of mixed quality. To use the data, applications must first deal with the widely varying quality of the available data and metadata.
The goal of this special issue of JDIQ is to present innovative research in the areas of Web Data Quality Assessment and Web Data Cleansing. Specific topics within the scope of the call include, but are not limited to, the following:
Web Data Quality Assessment:
- Metrics and methods for assessing the quality of web data, including Linked Data, Microdata, RDFa, Microformats and tabular data.
- Methods for uncovering distorted and biased data, and for detecting data spam.
- Methods for quality-based web data source selection.
- Methods for copy detection.
- Methods for assessing the quality of instance- and schema-level links in Linked Data.
- Ontologies and controlled vocabularies for describing the quality of web data sources and metadata.
- Best practices for metadata provision.
- Costs and benefits of web data quality assessment and benchmarks.
Web Data Cleansing:
- Methods for cleansing web data, Linked Data, Microdata, RDFa, Microformats and tabular data.
- Conflict resolution using semantic knowledge and truth discovery.
- Human-in-the-loop and crowdsourcing for data cleansing.
- Data quality for automated knowledge base construction.
- Empirical evaluation of scalability and performance of data cleansing methods and benchmarks.
Applications and use cases in the life sciences, healthcare, media, social media, government and sensor data.
- November 1, 2015
- January 15, 2016
- February 15, 2016
- March 30, 2016
New options for ACM authors to manage rights and permissions for their work
ACM introduces a new publishing license agreement, an updated copyright transfer agreement, and a new author-pays option which allows for perpetual open access through the ACM Digital Library. For more information, visit the ACM Author Rights webpage.