Vol. 2(4), Jan. 2016, pp. 234-238
JKBEI DOI: 649123/11034
Article History: Received 15 Sep. 2015; Accepted 17 Dec. 2015; Available Online 09 Jan. 2016

Web Page Segmentation and Pagination for Enhancing Readability
Ahmad Pouramini
Department of Computer Engineering,
Sirjan University of Technology, Sirjan, Iran
*Corresponding Author's E-mail: pouramini@sirjantech.ac.ir

Abstract
Web page readability can be defined as the combination of reading comprehension, reading speed and user satisfaction. To improve the readability of a web page, content extraction and transformation techniques are used to present the main content to the reader in a more readable fashion. In this paper, we present the design and architecture of a readability enhancement system. We aim at enhancing both reading speed and comprehension. To achieve these goals, we extract the main content and segment it into smaller coherent semantic units. These units are further augmented with text signals such as section headings, captions and page numbers in order to convey the text organization and the page visual structure, thus enhancing content comprehension. Our proposed system particularly suits constrained display devices such as mobile phones and PDAs.

Keywords: Reading comprehension; Readability Enhancement; Web page Customization.

1. Introduction
The World Wide Web has grown rapidly in recent years. With the large amount of information on the Internet, web pages have become the main source of information. However, reading web pages on a computer screen or a mobile phone has some difficulties. Besides the main content, a web page may contain distracting parts such as ads, animations and logos that can degrade the readability of the main content. In addition, color contrast, font style, letter spacing, layout, line height and length of the content are among the other factors that affect web page readability [1]. The problem can be more serious for specific individuals such as older adults, visually impaired users and non-native readers (those reading a page in a non-native language). These people need more concentration to comprehend the text, especially if the text is a news or scientific article [2].

In this paper, we propose a system for enhancing web page readability. We define readability as the combination of reading comprehension, reading speed and user satisfaction. The main stages of our method are extracting the main content, segmenting it into coherent semantic units and presenting each unit on a separate page to the reader. In our definition, a semantic unit is a discrete chunk of information that serves a specific meaningful purpose within the overall structure of a topic, such as an itemized list, paragraph, image or table. In addition, text signals such as section headings are added to each page in order to help the user keep track of the text organization.

2. Web Page Readability
Typically, readability tools use content extraction methods to eliminate distracting parts, and content reformatting and transformation to enhance the reading speed and comprehension of the content. Eliminating distracting parts can also provide easier access to the web on constrained devices like mobile phones [3], [4]. In the following sections, we review the literature on enhancing page readability and usability.
Journal of Knowledge-Based Engineering and Innovation (JKBEI), Universal Scientific Organization, http://www.aeuso.org/jkbei, ISSN: 2413-6794 (Online)
2.1. Content Transformation and Reformatting
Richards and Hanson [5] proposed guidelines for adapting and transforming web content for specific populations, including disabled people, older adults and visually impaired users. Changes such as font style, font enlargement, increased inter-letter spacing and enhanced color contrast can increase legibility for these populations. Other studies focus on readability enhancement at the user interface level. In an effort to enhance online reading, Walker et al. used a visual-syntactic text formatting (VSTF) method in which sentences are analyzed and reformatted into cascading patterns that cue syntactic structure and assist visual processing [6]. In a similar work, Yu and Miller [7] offered a visual reformatting method that aims to improve readability for non-native readers of English documents.

2.2. Scrolling vs. Paging
There are several studies on the effects of scrolling and paging on text reading and comprehension. Many of these studies indicate that text comprehension is better with paging, especially for narrative, complex and long texts. Piolat et al. found that while there was no difference in reading speed, paging resulted in better comprehension and recall of information [8]. Imai and Omodani reported that both reading time and comprehension level were better with paging than with scrolling [9]. Sanchez and Wiley also reported that for complex content, scrolling reduced reading comprehension, especially for readers with low working memory capacity [2]. Similarly, Fukaya et al. found that on small touch devices the comprehension level for narrative texts was slightly better with paging, and that for procedural texts both scrolling and paging are suitable [10]. According to Wästlund et al., reading a text document with a page layout can reduce mental load and enhance speed and comprehension [11].

2.3. Visual Structure and Text Signals
Other studies investigated the effect of visual structure on reading comprehension [12], [13]. The underlying assumption concerns the effect of text signaling on the cognitive processing of text. Text signals are used by authors to clarify text organization and emphasize important content [13]. They include a variety of writing devices, such as typographical cues, preview statements and overviews, titles and headings, that communicate the text organization [13]. Hyönä et al. found that the presence of headings in a text aids memory for the text. Moreover, headings facilitate the search for specific information relevant to them. A heading that communicates organizational information may trigger processing of relations between two subsections that otherwise may not occur [14]. Other research also showed that the presence of headings aids summarization [15].

Lemarié et al. investigated the effects of the text visual structure on text comprehension in segmented presentation [12]. They found that if readers are not provided with any information about the text visual structure (pagination), or if they are provided with unusable information, they rely heavily on the segmentation unit to give a structure to the text. As a result, if the segmentation unit does not match the text structure, it leads to a misinterpretation of the relationship between text segments.

3. System Architecture
Based on these studies, we propose our system for web page readability enhancement. The overall architecture of the system is shown in Figure 1. The input of the system is the HTML document (DOM tree) and the output is the text segments extracted from the document. In between, there are two stages, namely content extraction and content segmentation. The content extraction stage identifies the main content of the web page. Its output is one or more nodes classified as the main content. These nodes are input to the segmentation stage, which decomposes them into semantic units. The resulting segments are further processed to be presented in a more readable and comprehensible format. The following sections explain these stages.

Figure 1. Main stages of the proposed system.
3.1. Main Content Extraction
There are many approaches to main content extraction from web pages. Most unsupervised methods use heuristic rules to determine the main content automatically. The features used in content classification range from visual features to text features such as sentence or link density. The approaches vary mainly in how general or specific they intend to be and in their target application. To select a suitable approach for the proposed system, we made the following assumptions:
• Our main goal is to enhance the readability of reading materials such as news articles, blogs, encyclopedia articles and so on; therefore, we can make more assumptions about the structure of the web page.
• We work on the page as rendered by a web browser; therefore, we have access to the dynamic and visual properties of the page elements.
• The speed and accuracy of the extraction algorithm are more important than its generality.
Based on these assumptions and requirements, a suitable method could be a densitometric method, which has proved efficient for content-rich documents such as news and encyclopedia articles [16]. To improve the efficiency of such a method, we can employ vision-based features such as the location of a block in the page (e.g. the main block often appears in the central part of the page), because the system has access to the rendered page elements.

We selected the method introduced by Kohlschütter et al. for boilerplate detection, which uses shallow text features such as link density and text density ratios. They assumed that textual content on web pages can be grouped into two main classes: long text (most likely the main content) and short text (most likely navigational boilerplate text). Using this simple classification model, they achieved competitive accuracy.

3.2. Content Segmentation into Semantic Units
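The blocks that reach this stage are those labeled as main content by the extraction stage. As a rough illustration of that labeling, the sketch below classifies text blocks by length and link density, in the spirit of the shallow text features mentioned above. It is not the authors' implementation: the `Block` type, the function names and both thresholds (`min_words`, `max_link_density`) are assumptions made for this example only.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    """A text block plus the number of its words that sit inside links."""
    text: str
    linked_words: int = 0


def word_count(text: str) -> int:
    """Count word-like tokens in the text."""
    return len(re.findall(r"\w+", text))


def link_density(block: Block) -> float:
    """Fraction of the block's words that are anchor text (0.0 if empty)."""
    words = word_count(block.text)
    return block.linked_words / words if words else 0.0


def is_main_content(block: Block, min_words: int = 15,
                    max_link_density: float = 0.33) -> bool:
    """Treat long, link-poor blocks as main content.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return (word_count(block.text) >= min_words
            and link_density(block) <= max_link_density)


blocks = [
    Block("Home News Sports Contact", linked_words=4),
    Block("Machine learning is a field of study that gives computers the "
          "ability to learn from data without being explicitly programmed "
          "for each task."),
]
labels = [is_main_content(b) for b in blocks]
```

Here the short navigation bar is rejected and the long paragraph is kept. A real system would tune the thresholds on labeled pages and, as noted above, could add vision-based features such as block position.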
In this stage, the extracted nodes that contain the main content are decomposed into semantic units. By a semantic unit, we mean a piece of content that conveys coherent information. We use a recursive algorithm which takes as input the sub-tree associated with each extracted node of the DOM tree. It traverses this sub-tree's nodes in a depth-first manner. When a node matching the features of a semantic unit is visited, the algorithm adds it to the list of extracted segments. The main feature we use to distinguish such a unit is the HTML tag: paragraphs (P), lists (UL, OL), headings (H1, H2, ...), images (IMG) and tables (TABLE). These tags mark the main structural units of a reading material. In addition, HTML line breaks (the BR tag) and horizontal lines (the HR tag) are used to break down text that is not enclosed by a block-level element. For tables and images, the caption must be extracted as a separate text unit and then merged with the corresponding unit. A caption is usually a short text enclosed by a block-level HTML element, before or after a table or an image. It may contain the "Table" or "Figure" keyword. These features can be used to identify captions.

Figure 2. A page augmented with text signals.
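The depth-first segmentation described in this section can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' code: it parses a small well-formed snippet with Python's `xml.etree.ElementTree` instead of a browser DOM, the names `UNIT_TAGS` and `segment` are hypothetical, and the BR/HR splitting and caption merging steps are left out.

```python
import xml.etree.ElementTree as ET

# Tags treated as semantic units, following the list given in the text.
UNIT_TAGS = {"p", "ul", "ol", "img", "table",
             "h1", "h2", "h3", "h4", "h5", "h6"}


def segment(node):
    """Return (tag, text) pairs for the semantic units under `node`.

    The sub-tree is visited depth first. A node whose tag marks a semantic
    unit is taken whole, so its children are not visited separately.
    """
    tag = node.tag.lower()
    if tag in UNIT_TAGS:
        # Join all text fragments and normalize whitespace.
        text = " ".join(" ".join(node.itertext()).split())
        return [(tag, text)]
    units = []
    for child in node:
        units.extend(segment(child))
    return units


html = """
<div>
  <h1>Machine Learning</h1>
  <div>
    <p>Machine learning studies algorithms that improve with data.</p>
    <ul><li>Supervised</li><li>Unsupervised</li></ul>
  </div>
  <h2>History</h2>
  <p>Early work dates back several decades.</p>
</div>
"""
units = segment(ET.fromstring(html.strip()))
for tag, text in units:
    print(tag, "|", text)
```

The wrapper DIV elements are not units themselves, so the traversal descends through them and returns the heading, paragraph and list nodes in document order.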
3.3. Transformation and Presentation of the Units
The stages above generate a list of segments that must be presented to the reader. Before that, in order to enhance the comprehension of each segment, specific text signals are added to it (see Figure 2). The main idea is to repeat the section's heading in every segment belonging to that section. Note that the headings are extracted as separate units in the previous stage. To add them to the related segments, the list of extracted units is iterated. When a heading element is visited, it is stored in a variable (H1, H2, H3, ...) and is used as the title of the units that follow it. We use multiple variables corresponding to the different heading levels and repeat all or some of them in a hierarchical manner in the related units. For example, the title "Machine Learning > History and relations to other fields > Relation to statistics" can be used for a text unit about the relation between machine learning and statistics. In Figure 2, just the last two headings (H2 and H3) are used because the user usually recalls the main topic of the text ("Machine Learning"). As mentioned before, this organization of headings helps the user scan the text and find specific information faster. It also helps them grasp the relationship of the current topic to preceding topics. In addition to headings, the page number and a progress bar showing the position of the current page among all the pages are added to assist the user with scanning the document.

3.4. Implementation
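As a rough illustration of the heading-repetition step described in the previous section, the sketch below walks a list of extracted units, remembers the most recent heading at each level and attaches a heading trail, a page number and a progress value to every other unit. It is written in Python for brevity and is not the JavaScript used by the system. The function name `add_text_signals`, the `max_levels` parameter (assumed to be at least 1) and the dictionary layout of a page are all assumptions of this example.

```python
def add_text_signals(units, max_levels=2):
    """Attach a heading trail, page number and progress to each unit.

    `units` is a list of (tag, text) pairs in document order. Heading units
    (h1..h6) are not turned into pages; they become titles of later units.
    `max_levels` keeps only the deepest headings of the trail.
    """
    current = {}  # heading level -> most recent heading text at that level
    pages = []
    for tag, text in units:
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            level = int(tag[1])
            current[level] = text
            # A new heading invalidates every deeper heading seen so far.
            for deeper in [k for k in current if k > level]:
                del current[deeper]
            continue
        trail = [current[k] for k in sorted(current)]
        pages.append({"title": " > ".join(trail[-max_levels:]), "body": text})
    total = len(pages)
    for number, page in enumerate(pages, start=1):
        page["page"] = number
        page["progress"] = number / total  # fraction shown by a progress bar
    return pages


units = [
    ("h1", "Machine Learning"),
    ("p", "Overview text."),
    ("h2", "History and relations to other fields"),
    ("h3", "Relation to statistics"),
    ("p", "Statistics text."),
    ("h2", "Theory"),
    ("p", "Theory text."),
]
pages = add_text_signals(units)
for page in pages:
    print(page["page"], page["title"])
```

With the default `max_levels=2`, the second page carries only the H2 and H3 headings, which mirrors the choice described for Figure 2. Clearing deeper levels when a new heading arrives prevents a stale H3 from leaking into the next H2 section.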
The system was implemented as an application with an embedded web browser, which provides access to the DOM tree and the rendered properties of the page elements. These properties are required in the extraction and segmentation phases. Moreover, by using the rendered elements on each page, the original visual appearance of the text is preserved. To present the extracted units on separate pages, JavaScript and CSS3 were used to build a slideshow. Finally, readability guidelines such as font size, text and background contrast and spacing are used to provide a better presentation, especially on constrained display devices.

Conclusion
In this paper, we have proposed the design of a web page readability tool that aims at enhancing both reading speed and comprehension. We first reviewed the related studies on reading and comprehension and, based on them, proposed the architecture and the main stages of the system. The system involves main content extraction, decomposition of the extracted content into semantic units, and pagination. The units are augmented with text signals such as section headings to convey the text organization. Moreover, presenting each unit on a separate page with enough spacing and margins and a proper font size improves reading speed and user satisfaction. We pointed out that this system is especially useful for constrained devices like PDAs and mobile phones.

References
[1] Q. Pakistan, "Web readability factors affecting users of all ages," Australian Journal of Basic and Applied Sciences, vol. 5, no. 11, pp. 972-977, 2011.
[2] C. A. Sanchez and J. Wiley, "To scroll or not to scroll: Scrolling, working memory capacity, and comprehending complex texts," Human Factors: The Journal of the Human Factors and Ergonomics Society, vol. 51, no. 5, pp. 730-738, 2009.
[3] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma, "Extracting content structure for web pages based on visual representation," in Web technologies and applications, Springer, 2003, pp. 406-417.
[4] S. Gupta, G. Kaiser, D. Neistadt, and P. Grimm, "DOM-based content extraction of HTML documents," in Proceedings of the 12th international conference on World Wide Web, 2003, pp. 207-214.
[5] J. T. Richards and V. L. Hanson, "Web accessibility: A broader view," in Proceedings of the 13th international conference on World Wide Web, 2004, pp. 72-79.
[6] S. Walker, P. Schloss, C. R. Fletcher, C. A. Vogel, and R. C. Walker, "Visual-syntactic text formatting: A new method to enhance online reading," Reading Online, vol. 8, no. 6, pp. 1096-1232, 2005.
[7] C.-H. Yu and R. C. Miller, "Enhancing web page readability for non-native readers," in Proceedings of the SIGCHI conference on human factors in computing systems, 2010, pp. 2523-2532.
[8] A. Piolat, J.-Y. Roussey, and O. Thunin, "Effects of screen presentation on text reading and revising," International Journal of Human-Computer Studies, vol. 47, no. 4, pp. 565-589, 1997.
[9] J. Imai and M. Omodani, "Reason why comprehension level tends to decrease at reading tasks on displays-challenge to the realization of readable electronic papers," Nihon Gazo Gakkaishi/Journal of the Imaging Society of Japan, vol. 46, no. 2, 2007.