Extracting content from text continues to be an important research problem for information processing and management. Approaches to capture the semantics of text-based document collections may be based on Bayesian models, probability theory, vector space models, statistical models, or even graph theory.

As the volume of digitized textual media continues to grow, so does the need for designing robust, scalable indexing and search strategies and software to meet a variety of user needs. Knowledge extraction or creation from text requires systematic yet reliable processing that can be codified and adapted for changing needs and environments.

This book will draw upon experts in both academia and industry to recommend practical approaches to the purification, indexing, and mining of textual information. It will address document identification, clustering and categorizing documents, cleaning text, and visualizing semantic models of text.

I Clustering and Classification
1 Cluster-Preserving Dimension Reduction Methods for Efficient Classification of Text Data
2 Automatic Discovery of Similar Words
3 Simultaneous Clustering and Dynamic Keyword Weighting for Text Documents
4 Feature Selection and Document Clustering

II Information Extraction and Retrieval
5 Vector Space Models for Search and Cluster Mining
6 HotMiner: Discovering Hot Topics from Dirty Text
7 Combining Families of Information Retrieval Algorithms Using Metalearning

III Trend Detection
8 Trend and Behavior Detection from Web Queries
9 A Survey of Emerging Trend Detection in Textual Data Mining

As the volume of digitized textual information continues to grow, so does the critical need for designing robust and scalable indexing and search strategies and software to meet a variety of user needs. Knowledge extraction or creation from text requires systematic yet reliable processing that can be codified and adapted for changing needs and environments.

Survey of Text Mining is a comprehensive edited survey organized into three parts: Clustering and Classification; Information Extraction and Retrieval; and Trend Detection. Many of the chapters stress the practical application of software and algorithms for current and future needs in text mining. Authors from industry provide their perspectives on current approaches for large-scale text mining and obstacles that will guide R&D activity in this area for the next decade.

Topics and features:

* Highlights issues such as scalability, robustness, and software tools

* Brings together recent research and techniques from academia and industry

* Examines algorithmic advances in discriminant analysis, spectral clustering, trend detection, and synonym extraction

* Includes case studies in mining Web and customer-support logs for hot-topic extraction and query characterizations

* Provides an extensive bibliography of all references, including websites

This useful survey volume taps the expertise of academicians and industry professionals to recommend practical approaches to purifying, indexing, and mining textual information. Researchers, practitioners, and professionals involved in information retrieval, computational statistics, and data mining, who need the latest text-mining methods and algorithms, will find the book an indispensable resource.

Springer Book Archives
Includes supplementary material: sn.pub/extras
GPSR Compliance: The European Union's (EU) General Product Safety Regulation (GPSR) is a set of rules that requires consumer products to be safe and sets out our obligations to ensure this. If you have any concerns about our products, you can contact us at ProductSafety@springernature.com. If the publisher is established outside the EU, the EU authorized representative is: Springer Nature Customer Service Center GmbH, Europaplatz 3, 69115 Heidelberg, Germany, ProductSafety@springernature.com

Product details

ISBN: 9781441930576
Published: 2011-10-09
Publisher: Springer-Verlag New York Inc.
Height: 235 mm
Width: 155 mm
Audience level: Research, P, 06
Language: English
Format: Paperback

Editor