Learning To Crawl Web Forums

Delivery time: Available within 14 days

35,90 

ISBN: 6135812343
ISBN 13: 9786135812343
Author: Punjabi, Vipul
Publisher: LAP LAMBERT Academic Publishing
Length: 60 pp.
Publication date: 04.02.2018
Edition: 1/2018
Format: 0.5 x 22 x 15
Weight: 107 g
Product form: Paperback
Binding: Paperback
Item number: 3708748

Description

We present Forum Crawler Under Supervision (FoCUS), a supervised web-scale forum crawler. The goal of FoCUS is to crawl relevant forum content from the web with minimal overhead. Forum threads contain the information content that forum crawlers target. Although forums have different layouts or styles and are powered by different forum software packages, they always have similar implicit navigation paths, connected by specific URL types, that lead users from entry pages to thread pages. Based on this observation, we reduce the web forum crawling problem to a URL-type recognition problem. We show how to learn accurate and effective regular expression patterns of implicit navigation paths from automatically created training sets, using aggregated results from weak page type classifiers. Robust page type classifiers can be trained from as few as five annotated forums and applied to a large set of unseen forums.
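To make the idea of URL-type recognition concrete, here is a minimal sketch in Python. It is not the book's implementation: the URL type names (`index`, `thread`, `page_flipping`), the patterns, and the example paths are all hypothetical, and the patterns are written by hand here, whereas the description says they are learned from automatically created training sets.

```python
import re

# Hypothetical, hand-written patterns for a single imaginary forum.
# In the approach described above, such patterns would be learned
# automatically rather than written by hand.
URL_PATTERNS = {
    "index": re.compile(r"/forum/\d+/?"),
    "thread": re.compile(r"/thread/\d+/?"),
    "page_flipping": re.compile(r"/(?:forum|thread)/\d+/page/\d+/?"),
}


def classify_url(path):
    """Return the URL type whose pattern matches the whole path, or None."""
    for url_type, pattern in URL_PATTERNS.items():
        if pattern.fullmatch(path):
            return url_type
    return None


def select_links(paths):
    """Keep only the links that match a known URL type, paired with that type."""
    selected = []
    for path in paths:
        url_type = classify_url(path)
        if url_type is not None:
            selected.append((path, url_type))
    return selected
```

A crawler built this way would call `select_links` on the links extracted from each page and follow only the ones it returns, so login pages, user profiles, and similar links are skipped without being downloaded.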

About the author

Mr. Vipul D. Punjabi, M-Tech IT, BE Computer, is an Assistant Professor at R.C. Patel Institute of Technology, Shirpur.

Manufacturer information:


OmniScriptum SRL
Str. Armeneasca 28/1, office 1
2012 Chisinau
MD

Email: info@omniscriptum.com
