A study of Document object Model and removing noise from web pages

Delivery time: Available within 14 days

39,90 

ISBN: 6139843669
ISBN 13: 9786139843664
Author: Sultan, Bisma
Publisher: LAP LAMBERT Academic Publishing
Length: 60 pp.
Publication date: 25 July 2019
Edition: 1/2019
Format: 0.5 x 22 x 15
Weight: 107 g
Product form: Paperback
Binding: Paperback
Item number: 7842695

Description

Violating the official rules of HTML results in defects or errors in a web page. Furthermore, when a Word document is converted into a web page, the generated code contains unnecessary HTML tags as well as proprietary tags. Such undesirable, redundant, and irrelevant tags are considered noise. These noisy elements clutter the page and make its contents difficult to read. Noise adversely affects web data mining; eliminating it reduces storage and indexing requirements and improves the performance of web page clustering, classification, content mining, and summarization. In the proposed work, web page noise is identified using four popular web browsers (Google Chrome, Internet Explorer 7, Mozilla Firefox, and Opera) and three web authoring tools (MS Word, Dreamweaver 8, and Microsoft Expression Web 4). Once identified, the noise is classified into categories based on the source of the Word document. The experiment was conducted by running 40 web pages on the four browsers, and the results show that web page noise depends to a large extent on the source.
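
As a rough illustration of the kind of noise the description refers to, the sketch below removes some markup that Word typically leaves behind when a document is saved as a web page. It is not taken from the book. The heuristics are assumptions chosen for illustration: tags with a namespace prefix such as `o:` or `w:`, `class` values starting with `Mso`, and `style` declarations starting with `mso-` are treated as noise. The names `WordNoiseStripper` and `strip_word_noise` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: tags with these namespace prefixes (e.g. <o:p>) are Word noise.
NOISY_PREFIXES = ("o:", "w:", "v:", "m:", "st1:")


class WordNoiseStripper(HTMLParser):
    """Rebuilds an HTML fragment while skipping Word-specific markup.

    Comments and declarations are dropped as well, because the default
    HTMLParser handlers for them do nothing.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _clean_attrs(self, attrs):
        kept = []
        for name, value in attrs:
            # Assumption: class names such as "MsoNormal" carry no meaning.
            if name == "class" and value and value.startswith("Mso"):
                continue
            if name == "style" and value:
                decls = [d.strip() for d in value.split(";") if d.strip()]
                decls = [d for d in decls if not d.lower().startswith("mso-")]
                if not decls:
                    continue
                value = "; ".join(decls)
            kept.append((name, value))
        return kept

    def _open_tag(self, tag, attrs, suffix):
        if tag.startswith(NOISY_PREFIXES):
            return
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in self._clean_attrs(attrs)
        )
        self.parts.append(f"<{tag}{rendered}{suffix}")

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        # Only the noisy tags are dropped; any text inside them is kept.
        if not tag.startswith(NOISY_PREFIXES):
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_word_noise(html_text):
    """Return html_text with the Word-specific markup described above removed."""
    parser = WordNoiseStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


sample = (
    '<p class="MsoNormal" style="mso-margin-top-alt:auto; color:red">'
    "Hello<o:p></o:p></p>"
)
print(strip_word_noise(sample))  # <p style="color:red">Hello</p>
```

The sketch works on fragments and relies only on the standard library; a study such as the one described would need a far more complete catalogue of noise per browser and authoring tool.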

About the author

The author is pursuing a PhD in Computer Science at the University of Kashmir, Srinagar. She holds an M.Tech in CSE from the University of Jammu and a B.Tech in CSE from the University of Kashmir. Her areas of expertise are web technologies and deep learning.

Manufacturer information:


OmniScriptum SRL
Str. Armeneasca 28/1, office 1
2012 Chisinau
MD

Email: info@omniscriptum.com
