Main Data
Author: Alexander Mehler, Serge Sharoff, Marina Santini
Editor: Alexander Mehler, Serge Sharoff, Marina Santini
Title: Genres on the Web: Computational Models and Empirical Studies
Publisher: Springer-Verlag
ISBN/ISSN: 9789048191789
Edition: 1
Price: CHF 166.60
Publication date: 01/01/2010
Category: Computer science, IT book
Language: English
Technical Data
Pages: 362
Copy protection: DRM
Devices: PC/Mac/eReader/Tablet
Formats: PDF

The volume 'Genres on the Web' is designed for a wide audience, from expert to novice. It is essential reading for scholars, researchers and students who want to become acquainted with the latest theoretical, empirical and computational advances in the expanding field of web genre research. The study of web genres is a new, interdisciplinary area of research that spans corpus linguistics, computational linguistics, NLP and text technology as well as web mining, webometrics, social network analysis and information studies. This book gives readers a thorough grounding in the latest research on web genres and emerging document types.

The book covers a wide range of subjects in web genre research, such as:
• The identification of the sources of web genres
• Automatic web genre identification
• The presentation of structure-oriented models
• Empirical case studies

One of the driving forces behind genre research is the idea of a genre-sensitive information system that uses genre cues to complement current keyword-based search and retrieval applications.

Table of contents
Personal Note 9
Part I Introduction 14
1 Riding the Rough Waves of Genre on the Web 15
1.1 Why Is Genre Important? 15
1.1.1 Zooming In: Information on the Web 16
1.2 Trying to Grasp the Ungraspable? 18
1.2.1 In Quest of a Definition of Web Genre for Empirical Studies and Computational Applications 20
1.3 Empirical and Computational Approaches to Genre: Open Issues 21
1.3.1 Web Documents 21
1.3.2 Corpora, Genres and the Web 26
1.3.3 Empirical and Computational Models of Web Genres 30
1.4 Conclusions 34
1.5 Outline of the Volume 35
Part II Identifying the Sources of Web Genres 43
2 Conventions and Mutual Expectations 44
2.1 Genres Are Not Rule-Bound 44
2.2 So, Let's Ask the Readers 46
2.3 An Editorial, Third Party, View of Genres on the Web 51
2.4 Data Source: Observation of User Actions 53
2.5 Conclusions 56
3 Identification of Web Genres by User Warrant 58
3.1 Introduction 58
3.2 Criteria for the Identification of Web Genre 60
3.3 Operationalizing Traditional Genre Theory for the World Wide Web 61
3.3.1 A Genre's User Group 61
3.3.2 Genre: Function, Form and Substance 63
3.3.3 Genres on the Web: Further Implications for Research 66
3.4 Developing a Web Genre Palette 66
3.4.1 Collecting Genre Terminology in the Users' Own Words 67
3.4.2 Users Choose the Best of the Collected Genre Terminology 69
3.4.3 User Validation of the Genre Palette 72
3.4.4 A Fourth Study: Determining the Genres' Usefulness for Web Search 75
3.5 Conclusion 76
4 Problems in the Use-Centered Development of a Taxonomy of Web Genres 79
4.1 Introduction 79
4.1.1 What Is the Purpose of a Genre Taxonomy? 80
4.2 Why Is It Hard to Develop a Web Genre Taxonomy? 81
4.2.1 Difficulties in Defining Genres 81
4.2.2 Difficulties in Developing the Scope and Expressiveness of the Taxonomy 83
4.3 A Use-Centered Development of a Taxonomy of Web Genres 85
4.3.1 Research Design: Naturalistic Field Study 85
4.3.2 Research Informants 85
4.3.3 Data Elicitation 86
4.3.4 Data Analysis 87
4.4 Results 88
4.5 Discussion 89
4.6 Conclusions 92
Part III Automatic Web Genre Identification 95
5 Cross-Testing a Genre Classification Model for the Web 96
5.1 Introduction 96
5.2 Approximating Genre Population on the Web 99
5.2.1 Noise 100
5.2.2 Description of the Corpora Used for Cross-Testing 101
5.3 The Web as Communication 105
5.3.1 Genre Palette 105
5.3.2 Linguistically- and Functionally-Motivated Features 107
5.4 The Genre Model 107
5.4.1 Methodology 110
5.4.2 Flow and Hypotheses 111
5.5 Results 113
5.5.1 Cross-Testing Performance on Single Labels: BBC and 7-Webgenre Collections 114
5.5.2 Performances of Other Single-Label Models on the 7-Webgenre Collection 117
5.5.3 Cross-Testing Performance on Single Labels: Mapped Web Genres 120
5.5.4 Cross-Testing Performance on Single Labels: HCG and MCG in Isolation 122
5.5.5 The SPIRIT Sample: An Attempt to Assess Multilabelling 122
5.6 Discussion 126
5.7 Conclusion and Future Work 127
6 Formulating Representative Features with Respect to Genre Classification 138
6.1 Introduction 138
6.2 Defining Genre Classification 141
6.2.1 Document Representation in Conventional Text Classification 141
6.2.2 Harmonic Descriptor Representation (HDR) of Documents 141
6.2.3 Defining Genre 145
6.3 Classifiers 146
6.4 Dataset 147
6.5 Features 149
6.6 Results 151
6.6.1 Overall Accuracy 151
6.6.2 Precision and Recall 152
6.7 Conclusions 154
7 In the Garden and in the Jungle 157
7.1 Introduction 157
7.2 Text Typology for the Web 159
7.3 An Experiment in Automatic Classification of the Web 163
7.4 Analysis of Results 167
7.4.1 Qualitative Assessment of Texts in Each Category 167
7.4.2 Assessing the Composition of ukWac 169
7.5 Conclusions and Future Research 170
8 Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues 175
8.1 Introduction 175
8.1.1 Contributions 176
8.2 Use Cases: Genre Analysis in the Retrieval Practice 176
8.2.1 Genre-Enabled Web Search 177
8.2.2 Information Extraction Based on Genre Information 177
8.2.3 Organizing Collections in Both Topic and Genre Dimensions 179