We are pleased to announce
that three workshops and
one tutorial will take
place at the HP campus,
Fort Collins, Colorado, in
Building 3, on Tuesday, 16
September, 2014. This
year, each
workshop/tutorial will take
up the full day. Please
enter through the Building 3 lobby
(the best entrance to the
facilities is from Harmony
Road) and proceed to your
event.

Many billions of
documents are stored in the
Portable Document Format
(PDF). These documents
contain a wealth of
information; however, that
information is often
perceived as inaccessible.
Often this is down
to the tools used to create
and process them rather than
to PDF itself. Initial versions
of PDF were primarily aimed
at perfect,
device-independent print and
display reproduction.
However, later versions have
included many additions aimed
at adding
“structure”
to PDF documents as well as
improved support for print
and display. These non-print
capabilities include the
obvious items such as
bookmarks, article threads,
hyperlinks, commenting,
logical structure, metadata,
file attachments, digital
signatures and more. While
many are aware of the
print-based capabilities of
PDF, fewer are aware of these
non-print capabilities. In
fact, there is significant
misinformation related to PDF
even in the scientific
communities. It is these
higher-level features of PDF
that make it such a versatile
container format for modern
documents by allowing it to
combine structural markup
with reliable, high-quality
presentation.

The workshops are:

DChanges 2014 - Document Changes: Modeling, Detection, Storage and Visualization
Organizers: Gioele Barabucci, Uwe M. Borghoff, Angelo Di Iorio, Sonja Maier, Ethan Munson

The goal of DChanges is to share ideas, common issues and principles about models and algorithms for change tracking and detection, versioning and collaborative editing. We want to look at these topics from different perspectives and want to identify the most common issues and the peculiarities of each domain and each approach.

This edition in particular will be focused on interpretation, visualisation, processing and exploitation of changes. One of last edition's outcomes was that we identified the need for novel interfaces to better understand and exploit detected changes. Several issues were pointed out as still unsolved: interfaces do not scale when dealing with many changes, changes at different levels of abstraction are often not sufficiently taken into account, detection and visualisation are often inter-mixed, logs are often detailed but underexploited, and versioning techniques are not very well suited for non-technical people. Contributions on related topics (diff and merging algorithms, change tracking, applications to other domains) and from related areas (e.g., software engineering, collaboration, or ontology management) are also welcome. Further information is available on the workshop website.

SemADoc: Semantic Analysis of Documents
Organizers: Carlotta Domeniconi, Evangelos Milios

A large number of document management problems would benefit from having the semantics of documents explicitly represented. However, manually assigning semantic descriptions to documents is labour intensive and error prone. At the same time, the manual generation of domain specific taxonomies is not only labour intensive, but it also needs to be repeated often as the domains themselves and their key concepts shift with time.
In this workshop we will focus on document content analysis and semantic enrichment to generate a layer of semantic description of documents that is useful for document management tasks, such as semantic information retrieval, conceptual organization and clustering of document collections for sense making, semantic expert profiling, and document recommender systems. The workshop is timely and relevant to the Document Engineering community, as its focus is on semantically enriching documents and document collections, to make them more accessible to their readers. The task is nontrivial due to the volume of text data and the rate at which text data is accumulated by companies, government, and individuals. Further information is available on the workshop website.

DH-CASE II: Collaborative Annotations in Shared Environments: metadata, tools and techniques in the Digital Humanities
Organizers: Patrick Schmitz, Laurie Pearce, Quinn Dombrowski

Digital Humanities is rapidly becoming a central part of humanities research, drawing upon tools and approaches from Computer Science, Information Organization, and Document Engineering to address the challenges of analyzing and annotating the growing number and range of corpora that support humanist scholarship.
From cuneiform tablets,
ancient scrolls, and
papyri, to contemporary
letters, books, and
manuscripts, corpora of
interest to humanities
scholars span the
world’s
cultures and historic
range. More and more
documents are being
transliterated,
digitized, and made
available for study with
digital tools.
Scholarship ranges from
translation to
interpretation, from
syntactic analysis to
multi-corpus synthesis of
patterns and ideas.
Underlying much of
humanities scholarship is
the activity of
annotation. Annotation of
the
“aboutness”
of documents and entities
ranges from linguistic
markup, to structural and
semantic relations, to
subjective commentary;
annotation of
“activity”
around documents and
entities includes
scholarly workflows,
analytic processes, and
patterns of influence
among a community of
scholars. Sharable
annotations and
collaborative
environments support
scholarly discourse,
facilitating traditional
practices and enabling
new ones.

The focus of this
workshop is on the tools
and environments that
support annotation,
broadly defined,
including modeling,
authoring, analysis,
publication and sharing.
We will explore shared
challenges and differing
approaches, seeking to
identify emerging best
practices, as well as
those approaches that may
have potential for wider
application or
influence.
Further information is
available on the workshop
website.

DocEng 2014: PDF Tutorial
Organizers: Matthew Hardy and Steven Bagley

The focus of this tutorial is to give attendees practical knowledge of how to create and handle PDFs that take advantage of the non-print features of PDF to provide rich access to the information within, using a variety of commercial and open-source tools. We will get under the hood of PDF and analyze the poor practices that cause PDFs to be inaccessible, see how to access the text and graphics within a PDF, and examine the features of PDF that can be used to make the information much more accessible. We will also discuss some of the new ISO standards that provide profiles for producing accessible PDFs. No prior experience of PDF is expected.