Vol. 2(4), Jan. 2016, pp. 234-238

JKBEI DOI: 649123/11034

Article History:
Received Date: 15 Sep. 2015
Accepted Date: 17 Dec. 2015
Available Online: 09 Jan. 2016

Web Page Segmentation and Pagination for Enhancing Readability

Ahmad Pouramini

Department of Computer Engineering,

Sirjan University of Technology, Sirjan, Iran

*Corresponding Author's E-mail: pouramini@sirjantech.ac.ir

Abstract

Web page readability can be defined as the combination of reading comprehension, reading speed and user satisfaction. To improve the readability of a web page, content extraction and transformation techniques are used to present the main content to the reader in a more readable fashion. In this paper, we present the design and architecture of a readability enhancement system that aims to enhance both reading speed and comprehension. To achieve these goals, we extract the main content and segment it into smaller, coherent semantic units. These units are further augmented with text signals such as section headings, captions and page numbers in order to convey the text organization and the visual structure of the page, thereby enhancing content comprehension. The proposed system is particularly suited to constrained display devices such as mobile phones and PDAs.

Keywords: Reading comprehension; Readability Enhancement; Web page Customization.

1. Introduction

The World Wide Web has grown tremendously in recent years. With the large amount of information on the Internet, web pages have become the main source of information. However, reading web pages on a computer screen or a mobile phone poses some difficulties. Besides the main content, a web page may contain distracting parts, such as ads, animations and logos, that can degrade the readability of the main content. In addition, color contrast, font style, letter spacing, layout, line height and the length of the content are among the other factors that affect web page readability [1].

The problem can be more serious for specific groups such as older adults, visually impaired users and non-native readers (those reading a page in a non-native language). These readers need more concentration to comprehend the text, especially if it is a news or scientific article [2].

In this paper, we propose a system for enhancing web page readability. We define readability as the combination of reading comprehension, reading speed and user satisfaction. The main stages of our method are extracting the main content, segmenting it into coherent semantic units, and presenting each unit to the reader on a separate page. In our definition, a semantic unit is a discrete chunk of information that serves a specific, meaningful purpose within the overall structure of a topic, such as an itemized list, a paragraph, an image or a table. In addition, text signals such as section headings are added to each page to help the user keep track of the text organization.

2. Web Page Readability

Readability tools typically use content extraction methods to eliminate distracting parts, and content reformatting and transformation to enhance the reading speed and comprehension of the content. Eliminating distracting parts can also provide easier access to the web on constrained devices such as mobile phones [3], [4]. In the following subsections, we review the literature on enhancing page readability and usability.

Ahmad Pouramini et al. / Vol. 2(4) Jan. 2016, pp. 234-238 JKBEI DOI: 649123/11034


Journal of Knowledge-Based Engineering and Innovation (JKBEI)

Universal Scientific Organization, http://www.aeuso.org/jkbei

ISSN: 2413-6794 (Online)

2.1. Content Transformation and Reformatting

Richards et al. [5] proposed guidelines for adapting and transforming web content for specific populations, including disabled people, older adults and visually impaired users. Changes such as a different font style, font enlargement, increased inter-letter spacing and enhanced color contrast can increase legibility for these populations. Other studies focus on readability enhancement at the user-interface level. In an effort to enhance online reading, Walker et al. used a visual-syntactic text formatting (VSTF) method in which sentences are analyzed and reformatted into cascading patterns that cue syntactic structure and assist visual processing [6]. In similar work, Yu and Miller [7] proposed a visual reformatting method that aims to improve readability for non-native readers of English documents.

2.2. Scrolling vs. Paging

There are several studies on the effects of scrolling and paging on text reading and comprehension. Many of these studies suggest that text comprehension is better with paging, especially for narrative, complex and long texts. Piolat et al. found that while there was no difference in reading speed, paging resulted in better comprehension and recall of information [8]. Imai and Omodani reported that both reading time and comprehension level were better with paging than with scrolling [9]. Sanchez and Wiley also reported that, for complex content, scrolling reduced reading comprehension, especially for readers with low working memory capacity [2]. Similarly, Fukaya et al. found that on small touch devices, comprehension of narrative texts was slightly better with paging, whereas for procedural texts both scrolling and paging were suitable [10]. According to Wästlund et al., reading a text document with a page layout can reduce mental load and improve reading speed and comprehension [11].

2.3. Visual Structure and Text Signals

Other studies have investigated the effect of visual structure on reading comprehension [12], [13]. They rest on the assumption that text signaling affects the cognitive processing of text. Authors use text signals to clarify text organization and to emphasize important content [13]. Text signals include a variety of writing devices, such as typographical cues, preview statements, overviews, titles and headings, that communicate the text organization [13]. Hyönä et al. found that the presence of headings in a text aids memory for the text; headings also facilitate the search for specific information relevant to them. A heading that communicates organizational information may trigger processing of the relations between two subsections that might otherwise not occur [14]. Other research has also shown that headings aid summarization [15]. Lemarié et al. investigated the effects of the text's visual structure on comprehension in segmented presentation [12]. They found that if readers are given no information about the visual structure of the text (pagination), or are given unusable information, they rely heavily on the segmentation unit to give the text a structure. As a result, if the segmentation unit does not match the text structure, this leads to a misinterpretation of the relationships between text segments.

3. System Architecture

Based on these studies, we propose a system for web page readability enhancement. The overall architecture of the system is shown in Figure 1. The input of the system is the HTML document (DOM tree), and the output is the list of text segments extracted from the document. In between are two stages, namely content extraction and content segmentation. Content extraction identifies the main content of the web page; its output is one or more nodes classified as the main content. These nodes are the input to the segmentation stage, which decomposes them into
semantic units. The resulting segments are then processed so that they can be presented in a more readable and comprehensible format. The following sections explain these stages.

Figure 1. Main stages of the proposed system.

3.1. Main Content Extraction

There are many approaches to extracting the main content of web pages. Most unsupervised methods use heuristic rules to determine the main content automatically. The features used in content classification range from visual features to text features such as sentence or link density. The methods mainly differ in how general or specific they aim to be and in their target application. To select a suitable approach for the proposed system, we made the following assumptions:

• Our main goal is to enhance the readability of reading materials such as news articles, blogs and encyclopedia articles; therefore, we can make more assumptions about the structure of the web page.
• We work on the page as rendered by a web browser; therefore, we have access to the dynamic and visual properties of the page elements.
• The speed and accuracy of the extraction algorithm are more important than its generality.

Based on these assumptions and requirements, a suitable choice is a densitometric method, which has proved efficient for content-rich documents such as news and encyclopedia articles [16]. To improve the efficiency of such a method, we can employ vision-based features such as the location of a block in the page (e.g., the main block often appears in the central part of the page), because the system has access to the rendered page elements. We selected the method introduced by Kohlschütter et al. for boilerplate detection using shallow text features such as link density and text density ratios. They assumed that the textual content of web pages can be grouped into two main classes: long text (most likely the main content) and short text (most likely navigational boilerplate text). Using this simple classification model, they achieved competitive accuracy.
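The long-text/short-text idea above can be illustrated with a minimal Python sketch. It is not Kohlschütter et al.'s published classifier: the `Block` type, approximating text density as words per line after wrapping at 80 characters, and the two thresholds `MAX_LINK_DENSITY` and `MIN_TEXT_DENSITY` are hypothetical choices made here for illustration. A block counts as main content when little of its text is anchor text and its text is dense; otherwise it is treated as boilerplate.

```python
import textwrap
from dataclasses import dataclass

# Hypothetical thresholds chosen for illustration, not taken from the paper.
MAX_LINK_DENSITY = 0.33
MIN_TEXT_DENSITY = 7.0


@dataclass
class Block:
    """A text block of the rendered page (hypothetical representation)."""
    text: str              # all visible text in the block
    anchor_text: str = ""  # the part of `text` that lies inside <a> elements


def link_density(block: Block) -> float:
    """Fraction of the block's words that belong to hyperlinks."""
    words = len(block.text.split())
    if words == 0:
        return 1.0  # empty blocks are treated as boilerplate
    return len(block.anchor_text.split()) / words


def text_density(block: Block, width: int = 80) -> float:
    """Average number of words per line after wrapping the text at `width` characters."""
    lines = textwrap.wrap(block.text, width)
    if not lines:
        return 0.0
    return len(block.text.split()) / len(lines)


def is_main_content(block: Block) -> bool:
    """Classify a block as long text (main content) or short text (boilerplate)."""
    return (link_density(block) <= MAX_LINK_DENSITY
            and text_density(block) >= MIN_TEXT_DENSITY)


if __name__ == "__main__":
    nav = Block("Home News Sports Contact", anchor_text="Home News Sports Contact")
    article = Block("Paging resulted in better comprehension than scrolling "
                    "in several studies of screen reading. " * 3)
    print(is_main_content(nav), is_main_content(article))  # False True
```

Because the system works on the rendered page, a fuller version could add the vision-based feature mentioned above, for example requiring that a block lie near the horizontal center of the page, as one more condition in `is_main_content`.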

3.2. Content Segmentation into Semantic Units

In this stage, the extracted nodes that contain the main content are decomposed into semantic units. By a semantic unit, we mean a piece of content that conveys coherent information. We use a recursive algorithm that takes as input the sub-tree associated with each extracted node of the DOM
tree. It traverses the nodes of this sub-tree in depth-first order. When a node matching the features of a semantic unit is visited, the algorithm adds it to the list of extracted segments. The main feature we use to distinguish such a unit is the HTML tag: paragraphs (P), lists (UL, OL), headings (H1, H2, ...), images (IMG) and tables (TABLE). These tags mark the main structural units of a reading material. In addition, HTML line breaks (the BR tag) and horizontal rules (the HR tag) are used to break down text that is not enclosed by a block-level element. For tables and images, the caption must be extracted as a separate text unit and then merged into the corresponding unit. A caption is usually a short text enclosed by a block-level HTML element, placed before or after a table or an image, and it may contain the keyword "Table" or "Figure". These features can be used to identify captions.
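The segmentation step can be sketched in Python under several assumptions that the text does not fix. `Node` is a hypothetical stand-in for the browser's DOM: loose text appears as `#text` children, while P, heading and similar elements carry their text in `text`. The traversal stops at the first matching element instead of descending into it, the set of inline tags is deliberately incomplete, and `MAX_CAPTION_WORDS` and the keyword test in `looks_like_caption` are illustrative choices. `segment` performs the depth-first traversal and the BR/HR splitting; `merge_captions` attaches a short "Table"/"Figure" text found immediately before or after a table or image.

```python
from dataclasses import dataclass, field

# Tags treated as semantic units (P, UL/OL, H1-H6, IMG, TABLE).
SEMANTIC_TAGS = {"p", "ul", "ol", "img", "table"} | {f"h{i}" for i in range(1, 7)}
BREAK_TAGS = {"br", "hr"}
# Assumption: inline elements do not start a new unit; this set is not exhaustive.
INLINE_TAGS = {"a", "b", "i", "em", "strong", "span"}
MAX_CAPTION_WORDS = 25                        # hypothetical threshold
CAPTION_PREFIXES = ("table", "figure", "fig.")


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node of the extracted sub-tree."""
    tag: str                                  # lower-case tag name, or "#text"
    text: str = ""                            # text of text nodes and simple elements
    children: list["Node"] = field(default_factory=list)
    caption: str = ""                         # set on tables/images by merge_captions


def segment(root: Node) -> list[Node]:
    """Depth-first traversal that collects semantic units below `root`."""
    units: list[Node] = []
    loose: list[str] = []                     # text not enclosed by a block-level element

    def flush() -> None:
        text = " ".join(part.strip() for part in loose if part.strip())
        if text:
            units.append(Node("text", text))
        loose.clear()

    def visit(node: Node) -> None:
        if node.tag in SEMANTIC_TAGS:
            flush()
            units.append(node)                # keep the whole element; do not descend
        elif node.tag in BREAK_TAGS:
            flush()                           # BR/HR end the current run of loose text
        elif node.tag == "#text":
            loose.append(node.text)
        elif node.tag in INLINE_TAGS:
            for child in node.children:
                visit(child)
        else:                                 # other block containers (div, section, ...)
            flush()
            for child in node.children:
                visit(child)
            flush()

    visit(root)
    flush()
    return units


def looks_like_caption(unit: Node) -> bool:
    """Short text unit that starts with a 'Table' or 'Figure' keyword."""
    words = unit.text.split()
    return (unit.tag in {"p", "text"}
            and 0 < len(words) <= MAX_CAPTION_WORDS
            and unit.text.strip().lower().startswith(CAPTION_PREFIXES))


def merge_captions(units: list[Node]) -> list[Node]:
    """Attach a caption found just before or after a table/image to that unit."""
    merged: list[Node] = []
    i = 0
    while i < len(units):
        unit = units[i]
        if unit.tag in {"table", "img"}:
            if merged and looks_like_caption(merged[-1]):
                unit.caption = merged.pop().text
            elif i + 1 < len(units) and looks_like_caption(units[i + 1]):
                unit.caption = units[i + 1].text
                i += 1                        # the caption is now part of the unit
        merged.append(unit)
        i += 1
    return merged
```

For a container holding a heading, a paragraph, two loose text lines separated by a BR, and a table followed by the paragraph "Table 1. Results.", `merge_captions(segment(root))` yields the tags `h2, p, text, text, table`, with the table's `caption` set to "Table 1. Results.".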

Figure 2. A page augmented with text signals.

3.3. Transformation and Presentation of the Units

Following the stages above, a list of segments is generated, which must then be presented to the reader. Before that, in order to enhance the comprehension of each segment, specific text signals are added to it (see Figure 2). The main idea is to repeat the section's heading in every segment belonging to that section. Note that the headings are extracted as separate units in the previous stage. To add them to the related segments, the list of extracted units is iterated. When a heading element is visited, it is stored in a variable (H1, H2, H3, ...) and used as the title of the units that follow it. We use multiple variables corresponding to the different heading levels (H1, H2, H3, ...) and repeat all or some of them hierarchically in the related units. For example, the title "Machine Learning > History and relations to other fields > Relation to statistics" can be used for a text unit about the relation between machine learning and statistics. In Figure 2, only the last two headings (H2 and H3) are used, because the user usually recalls the main topic of the text ("Machine Learning"). As mentioned before, this organization of headings helps the user scan the text and find specific information faster. It also helps the user grasp the relationship between the current topic and preceding topics. In addition to headings, the page number and a progress bar showing the position of the current page among all pages are added to help the user scan the document.
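The heading propagation described above can be sketched as follows. This is an illustration, not the author's code: `Unit`, `Page`, `paginate` and `progress_bar` are hypothetical names, and clearing the deeper heading levels whenever a higher-level heading appears is an assumption the text does not state. `levels_shown` controls how many of the innermost headings are repeated, e.g. 2 for the H2 and H3 titles in Figure 2.

```python
from dataclasses import dataclass

HEADING_TAGS = {f"h{i}" for i in range(1, 7)}


@dataclass
class Unit:
    tag: str   # "h1".."h6" for headings, otherwise "p", "ul", "table", ...
    text: str


@dataclass
class Page:
    title: str    # repeated section headings joined by `sep`
    body: Unit
    number: int   # 1-based page number
    total: int


def paginate(units: list[Unit], levels_shown: int = 2, sep: str = " > ") -> list[Page]:
    """Put each non-heading unit on its own page, titled with its enclosing headings."""
    current: dict[int, str] = {}  # heading level -> most recent heading text
    titled: list[tuple[str, Unit]] = []
    for unit in units:
        if unit.tag in HEADING_TAGS:
            level = int(unit.tag[1])
            current[level] = unit.text
            # Assumption: a new heading closes every deeper subsection.
            for deeper in [lvl for lvl in current if lvl > level]:
                del current[deeper]
        else:
            trail = [current[lvl] for lvl in sorted(current)]
            shown = trail[max(len(trail) - levels_shown, 0):]
            titled.append((sep.join(shown), unit))
    total = len(titled)
    return [Page(title, body, number, total)
            for number, (title, body) in enumerate(titled, start=1)]


def progress_bar(page: Page, width: int = 20) -> str:
    """Text progress bar showing the position of `page` among all pages."""
    filled = round(width * page.number / page.total)
    return "#" * filled + "-" * (width - filled) + f" {page.number}/{page.total}"


if __name__ == "__main__":
    units = [
        Unit("h1", "Machine Learning"),
        Unit("h2", "History and relations to other fields"),
        Unit("h3", "Relation to statistics"),
        Unit("p", "Machine learning and statistics are closely related fields."),
        Unit("h2", "Theory"),
        Unit("p", "A learner aims to generalize from its experience."),
    ]
    pages = paginate(units)
    print(pages[0].title)  # History and relations to other fields > Relation to statistics
    print(pages[1].title)  # Machine Learning > Theory
    print(progress_bar(pages[0], width=10))  # #####----- 1/2
```

With `levels_shown=3`, the first page gets the full title "Machine Learning > History and relations to other fields > Relation to statistics" from the example above.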

3.4. Implementation

The system was implemented as an application with an embedded web browser, which provides access to the DOM tree and the rendered properties of the page elements. These properties are required in the extraction and segmentation phases. Moreover, by using the rendered elements on each page, the original visual appearance of the text is preserved. To present the extracted units on separate pages, a slideshow was built with JavaScript and CSS3. Finally, readability
guidelines concerning font size, text and background contrast, and spacing are applied to provide a better presentation, especially on constrained display devices.
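The text does not specify which guideline is used for text and background contrast. As one concrete, widely used example (our choice, not necessarily the author's), the sketch below computes the WCAG 2.x contrast ratio between two 8-bit sRGB colors and checks it against the level AA minimums of 4.5:1 for normal text and 3:1 for large text. The function names are hypothetical.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, following the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """(L1 + 0.05) / (L2 + 0.05), where L1 is the lighter of the two luminances."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def meets_wcag_aa(fg: tuple[int, int, int], bg: tuple[int, int, int],
                  large_text: bool = False) -> bool:
    """Level AA check: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))  # 21.0
    print(meets_wcag_aa((119, 119, 119), (255, 255, 255)))        # False (about 4.48:1)
```

A presentation layer could run such a check on each page's computed text and background colors and fall back to a higher-contrast style when it fails.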

Conclusion

In this paper, we have proposed the design of a web page readability tool that aims to enhance both reading speed and comprehension. We first reviewed related studies on reading and comprehension and, based on them, proposed the architecture and the main stages of the system. The system mainly involves main content extraction, decomposition of the extracted content into semantic units, and pagination. The units are manipulated to convey the text organization by adding text signals such as section headings. Moreover, presenting each unit on a separate page with enough spacing and margins and a proper font size improves reading speed and user satisfaction. We pointed out that this system is especially useful for constrained devices such as PDAs and mobile phones.

References

[1] Q. Pakistan, "Web readability factors affecting users of all ages," Australian Journal of Basic and Applied Sciences, vol. 5, no. 11, pp. 972-977, 2011.

[2] C. A. Sanchez and J. Wiley, "To scroll or not to scroll: Scrolling, working memory capacity, and comprehending complex texts," Human Factors: The Journal of the Human Factors and Ergonomics Society, vol. 51, no. 5, pp. 730-738, 2009.

[3] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma, "Extracting content structure for web pages based on visual representation," in Web technologies and applications, Springer, 2003, pp. 406-417.

[4] S. Gupta, G. Kaiser, D. Neistadt, and P. Grimm, "DOM-based content extraction of HTML documents," in Proceedings of the 12th international conference on World Wide Web, 2003, pp. 207-214.

[5] J. T. Richards and V. L. Hanson, "Web accessibility: A broader view," in Proceedings of the 13th international conference on World Wide Web, 2004, pp. 72-79.

[6] S. Walker, P. Schloss, C. R. Fletcher, C. A. Vogel, and R. C. Walker, "Visual-syntactic text formatting: A new method to enhance online reading," Reading Online, vol. 8, no. 6, pp. 1096-1232, 2005.

[7] C.-H. Yu and R. C. Miller, "Enhancing web page readability for non-native readers," in Proceedings of the SIGCHI conference on human factors in computing systems, 2010, pp. 2523-2532.

[8] A. Piolat, J.-Y. Roussey, and O. Thunin, "Effects of screen presentation on text reading and revising," International Journal of Human-Computer Studies, vol. 47, no. 4, pp. 565-589, 1997.

[9] J. Imai and M. Omodani, "Reason why comprehension level tends to decrease at reading tasks on displays-challenge to the realization of readable electronic papers," Nihon Gazo Gakkaishi/Journal of the Imaging Society of Japan, vol. 46, no. 2, 2007.
