Advanced R / Bioconductor Programming
Marc Carlson, Valerie Obenchain, Herve Pages, Paul Shannon, Dan Tenenbaum, Martin Morgan (mtmorgan@fhcrc.org)
15-16 October 2012

Contents
1 Introduction
2 Packages
  2.1 Anatomy of a package
    2.1.1 Essentials: a minimal package
    2.1.2 A More Complete Package
  2.2 Version Control - Introduction
  2.3 Making the package more useful
  2.4 Creating good packages and why it matters
    2.4.1 Unit tests
    2.4.2 Interoperability
    2.4.3 From package to Bioconductor package
  2.5 An Extended Example: MotifDb
    2.5.1 Introduction
    2.5.2 Highlights
    2.5.3 Package structure
    2.5.4 Class Design
    2.5.5 Classes and methods
    2.5.6 The query method
    2.5.7 zzz.R
    2.5.8 Unit Tests
3 S4 classes and methods
  3.1 Introduction
    3.1.1 A different OO paradigm
    3.1.2 S4 in Bioconductor
    3.1.3 From an end-user point of view
    3.1.4 Chapter overview
  3.2 Implementing the SNPLocations class
    3.2.1 Choosing a good design
    3.2.2 Class definition
    3.2.3 Constructor
    3.2.4 Implementing length() and other accessors
    3.2.5 The show method
    3.2.6 The validity method
    3.2.7 Coercion methods
  3.3 Integrating the SNPLocations class to our package
    3.3.1 Add the SNPLocations-class.R file to the package
    3.3.2 Import the required packages and modify the NAMESPACE file
    3.3.3 Add a man page for the SNPLocations class
    3.3.4 Check the package
  3.4 Extending an existing class
    3.4.1 Constructor
    3.4.2 length(), accessors, and show method
    3.4.3 The validity method
    3.4.4 Coercion methods
  3.5 Other important S4 features
  3.6 Resources
4 Reference classes
  4.1 Introduction
  4.2 Implementing reference classes
    4.2.1 Fields
    4.2.2 Inheritance
    4.2.3 Best practices?
    4.2.4 Cautions?
  4.3 Exercises
5 Accessing Data: Data Base and Web Resources
  5.1 Introduction
  5.2 Creating other kinds of Annotation packages
  5.3 Retrieving data from a web resource
    5.3.1 Parsing XML
  5.4 Setting up a package to expose a web service
  5.5 Creating package accessors for a web service
    5.5.1 Example: creating keytypes and cols methods
    5.5.2 Example 2: creating a select method
  5.6 Retrieving data from a database resource
    5.6.1 Getting a connection
    5.6.2 Getting data out
    5.6.3 Some basic SQL
    5.6.4 Exploring the SQLite database from R
  5.7 Setting up a package to expose a SQLite database object
  5.8 Creating package accessors for databases
    5.8.1 Examples: creating a cols and keytypes method
    5.8.2 Example: creating a keys method
  5.9 Creating a database resource from available data
    5.9.1 Making a new connection
    5.9.2 Importing data
    5.9.3 Attaching other database resources
6 Performance: time and space
  6.1 Measuring performance
  6.2 Debugging
    6.2.1 R Warnings and Errors
  6.3 Writing efficient scripts
    6.3.1 Easy solutions
    6.3.2 Moderate solutions
7 Using C Code
  7.1 Calling C from R
    7.1.1 Example and R Implementation
    7.1.2 The `.C' Interface
    7.1.3 The `.Call' Interface
    7.1.4 Rcpp and inline
  7.2 Using C code in Packages
  7.3 Debugging
  7.4 Embedding R
    7.4.1 Setup
    7.4.2 Code
    7.4.3 Compile and Run
    7.4.4 Some Detail
  7.5 Resources
8 Parallel Evaluation
  8.1 R parallelism
  8.2 Clusters and clouds
  8.3 C parallelism
9 An Extended Example
  9.1 Package tour
    9.1.1 Bioconductor packages
    9.1.2 Common workflows
  9.2 Highlights
    9.2.1 Package structure
    9.2.2 Classes and methods
    9.2.3 Data resources
    9.2.4 C code
    9.2.5 ...
References
Chapter 1
Introduction

The Advanced R / Bioconductor Programming workshop provides experienced R and Bioconductor users and package developers with an opportunity to develop advanced skills for creating performant, re-usable software. This course is relevant to R software development in general, but includes insights particularly relevant to development of bioinformatics. The material is structured around R packages and their implementation, including programming best practices, formal classes and methods, accessing data resources, strategies for measuring performance and managing large data, interfacing C code, and parallel evaluation. The course concludes with an extended tour of key Bioconductor packages for representation and manipulation of genomic data. Participants engage in lectures and hands-on exercises. Participants require a laptop with internet access and a current browser.

Dalgaard [4] provides an introduction to statistical analysis with R. Kabacoff [6] provides a broad survey of R. Matloff [7] introduces R programming concepts. Chambers [3] provides more advanced insights into R. Gentleman [5] emphasizes use of R for bioinformatic programming tasks. The R web site enumerates additional publications from the user community. The RStudio environment provides a nice, cross-platform environment for working in R.

Table 1.1: Tentative schedule.
Day 1
  Morning    Orientation; R and Bioconductor Packages (package structure, name spaces, unit tests, documentation, version control).
  Afternoon  Formal Classes and Methods (S4 and reference classes). Accessing Data Base (sqlite) and Web Resources.
Day 2
  Morning    Assessing Performance and Data Size. Calling C Code (.C and .Call interfaces).
  Afternoon  Parallel Evaluation. Extended Example: IRanges, GenomicRanges, Biostrings and friends.

Chapter 2
Packages
2.1 Anatomy of a package
2.1.1 Essentials: a minimal package
We start with a short ad hoc R function, one which proved useful in exploratory data analysis. If properly generalized, it may be useful to others, so we decide to make it into a package. The script loads a compendium of yeast expression data, and identifies which of 500 genes had highly correlated expression over 200 experimental conditions:

> correlationFinder <- function()
+ {
+     dataFile <- "sub_combined_complete_dataset_526G_198E.txt"
+     cor.threshold <- 0.85
+     tbl <- read.table(dataFile, sep='\t', header=TRUE, quote='',
+                       comment.char='', fill=TRUE, stringsAsFactors=FALSE)
+     rownames(tbl) <- tbl$X
+     # drop any columns that do not hold numeric expression values
+     exclude.these.columns <- !sapply(tbl, is, 'numeric')
+     if (any(exclude.these.columns))
+         tbl <- tbl[, !exclude.these.columns]
+     # gene-by-gene correlation; keep only the upper triangle
+     mtx.cor <- cor(t(as.matrix(tbl)), use='pairwise.complete.obs')
+     mtx.cor <- upper.tri(mtx.cor) * mtx.cor
+     max <- nrow(mtx.cor)
+     ret <- list()
+     # for each gene, record the genes correlated above the threshold
+     for (r in seq_len(max)) {
+         zz <- mtx.cor [r,] > cor.threshold
+         if (any(zz)) {
+             ret[[ rownames(mtx.cor)[r] ]] <- rownames(mtx.cor)[zz]
+         } # if any
+     } # for r
+     ret
+ }

You may wish to get a copy of this function into RStudio. If so, follow these steps:
• From the Project menu, choose "New Project"
• If prompted, you may save (or not) your current workspace
• Click "Version Control"
• Click "Git"
• In the "Repository URL" box, paste https://github.com/dtenenba/AdvancedR_stage1
• Press the Tab key. The "Project Directory Name" box is automatically filled in.
• Click "Create Project"

R provides a function which helps us to create a fully-documented and easily shared package of code and data. It creates a directory structure, and populates it with an almost-working set of files. We will examine this directory structure, look at and make small modifications to these automatically generated files, build the package, and then run R CMD check on it, a vital step when creating a package for distribution.

> package.skeleton('YeastmRNACor', code_files='yeastCorrelatedExpression.R')

These files and directories are created:
YeastmRNACor/Read-and-delete-me
YeastmRNACor/DESCRIPTION
YeastmRNACor/NAMESPACE
YeastmRNACor/man/correlationFinder.Rd
YeastmRNACor/man/YeastmRNACor-package.Rd
We will look at each of these, and additional files in Figure 2.1, in turn.

YeastmRNACor/Read-and-delete-me
1. Edit the help file skeletons in man, possibly combining help files for multiple functions.
2. Edit the exports in NAMESPACE, and add necessary imports.
3. Put any C/C++/Fortran code in src.
4. If you have compiled code, add a useDynLib() directive to NAMESPACE.
5. Run R CMD build to build the package tarball.
6. Run R CMD check to check the package tarball.
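
To experiment with package.skeleton() without touching a real project, the call can be run in a temporary directory. The following is a minimal sketch only: the package name ToyPackage, the function toyFunction, and the scratch paths are placeholders invented for illustration, not part of the workshop material.

```r
# Scratch directory and toy source file (both placeholders for illustration)
workDir <- file.path(tempdir(), "skeleton-demo")
dir.create(workDir, showWarnings = FALSE)
codeFile <- file.path(workDir, "toyFunction.R")
writeLines("toyFunction <- function(x) x + 1", codeFile)

# Create the package skeleton from the source file;
# force = TRUE allows the sketch to be re-run in the same session
package.skeleton("ToyPackage", code_files = codeFile, path = workDir,
                 force = TRUE)

# List everything that was generated, relative to the package root
created <- list.files(file.path(workDir, "ToyPackage"), recursive = TRUE)
print(created)
```

The listing should show the same kinds of files as above (DESCRIPTION, NAMESPACE, Read-and-delete-me, and entries under R and man), which makes it easy to inspect the generated templates before editing them.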
YeastmRNACor/DESCRIPTION
Package: YeastmRNACor
Type: Package
Title: Yeast Correlation Finder
Version: 0.99.0
Date: 2012-10-12
Author: Paul Shannon
Maintainer: Paul Shannon
Description: Find S.cerevisiae genes with correlated expression
License: Artistic-2.0
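
Because DESCRIPTION is a plain-text file of "Field: value" lines, its contents can be inspected from R with read.dcf(). A small sketch, assuming a copy of some of the fields above written to a temporary file (the temporary path is an illustration, not where the package lives):

```r
# Write a few of the fields shown above to a temporary DESCRIPTION file
descFile <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
    "Package: YeastmRNACor",
    "Type: Package",
    "Title: Yeast Correlation Finder",
    "Version: 0.99.0",
    "Date: 2012-10-12",
    "License: Artistic-2.0"),
    descFile)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(descFile)
pkgName <- desc[1, "Package"]
pkgVersion <- package_version(desc[1, "Version"])
print(desc)
```

Parsing the version with package_version() gives an object that can be compared against other versions, which is one way to check that a version string is well-formed.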
Figure 2.1: Package directory structure

YeastmRNACor/NAMESPACE
exportPattern("^[[:alpha:]]+")

YeastmRNACor/man/YeastmRNACor-package.Rd
\name{YeastmRNACor-package}
\alias{YeastmRNACor-package}
\alias{YeastmRNACor}
\docType{package}
\title{Yeast Correlation Finder}
\description{
Find S.cerevisiae genes with correlated expression
}
\details{
\tabular{ll}{
Package: \tab YeastmRNACor\cr
Type: \tab Package\cr
Version: \tab 0.99.0\cr
Date: \tab 2012-10-12\cr
License: \tab Artistic-2.0\cr
}
}
\author{
Paul Shannon
Maintainer: Paul Shannon
}
\references{
Allocco et al, 2004, "Quantifying the relationship between co-expression, co-regulation and gene function":
}
\keyword{manip}

R documentation provides a full list of the official keywords.

YeastmRNACor/man/correlationFinder.Rd
\name{correlationFinder}
\alias{correlationFinder}
\title{correlationFinder}
\description{
Finds yeast genes with correlated expression.
}
\usage{
correlationFinder()
}
\details{
Calculates the upper triangular correlation matrix from mRNA expression data; identifies genes whose expression is highly correlated.
}
\value{
A named list, in which the names are genes, and the values are the genes highly correlated to each of them.
}
\author{Paul Shannon}
\examples{
\dontrun{
correlated.list <- correlationFinder()
}
}
\keyword{ array }
\keyword{ manip }
\keyword{ math }

YeastmRNACor/R/yeastCorrelatedExpression.R
This file contains the original source code for our function.

2.1.2 A More Complete Package
package.skeleton created only two sub-directories, and just five files (see Figure 2.1). A few more directories and files are needed to create a fully-compliant Bioconductor package, and a few more beyond that are sometimes needed as well. We will list and explain all of them here. The MotifDb package, to be examined later, will illustrate most of them.

data: If your package provides data which the user will load and use directly, then the standard approach is to place a serialized (xxx.Rdata) file in the data directory. This file must then be documented as well, with a similarly named (xxx.Rd) man file. In other packages, data is provided only for package testing purposes, or the data is available to the user only through an interface, and in these cases the data files reside in inst/extdata, as we will discuss.

src: If you have compiled code, typically C, C++, or Fortran, then the source files are placed here.

vignettes: Vignettes are an essential tool, very helpful for introducing your package to users, and required by Bioconductor. They have an .Rnw suffix, and consist of commentary intermixed with executable code.

tests: This is the traditional directory in which to place test code for your package. R CMD check automatically looks here. With the advent and popularity of the unitTest protocol, this directory contains just one file containing one line, which provides a hook to run the unitTests, described below.

inst: By convention, the R package installer will place the contents of the inst/ directory at the top level of the installed package.

inst/extdata: As mentioned above, this directory contains data files which are used for unitTests and examples, or provided to the user after some processing. Files may be in a variety of formats, including tab-delimited text or yaml files, or serialized into Rdata. Data provided directly to the user of the package goes in the data directory.

inst/unitTests: One or more unitTest files (discussed more fully below) can be placed here.

inst/doc: Historically, vignette files were placed here. The vignettes directory is now preferred, but this directory is still supported.

inst/scripts: Typically contains scripts used to create the package, for example, for parsing and transforming data which then ends up in the data directory, or in inst/extdata.

2.2 Version Control - Introduction
Version control is essential for:
• Saving your work
• Tracking the changes of a project
• Reverting to older versions
• Collaborating with others

Bioconductor uses Subversion, and Bioconductor package developers should learn the rudiments of that system. We are also intrigued by GitHub, which provides an interesting model of distributed code development. GitHub is built on the Git version control system.

We'll introduce GitHub in the context of the package we've just started working on. Our original script is in this repository: https://github.com/dtenenba/AdvancedR_stage1. For now, just visit that URL with a web browser and look around. Notice that our original script is there, along with a data file. The minimal package is in a different repository, https://github.com/dtenenba/AdvancedR_stage2. We can clone, or check out, this repository from within RStudio Server:
• From the Project menu, choose "New Project"
• If prompted, you may save (or not) your current workspace
• Click "Version Control"
• Click "Git"
• In the "Repository URL" box, paste https://github.com/dtenenba/AdvancedR_stage2
• Press the Tab key. The "Project Directory Name" box is automatically filled in.
• Click "Create Project"

The GitHub project is "cloned" into a directory called AdvancedR_stage2. Your current working directory is changed to this directory, both in the R console and in the File pane in the lower-right hand corner. Note that a Git pane appears in the pane at upper right. Those without RStudio can check out the repository at a command shell:

git clone https://github.com/dtenenba/AdvancedR_stage2

Note: Our use of version control in this course is a bit odd; we have several different repositories representing a package at different stages of its evolution. In real life, there would probably just be a single repository (though individual developers could create their own forks of it), and one could check out earlier iterations of the package.

2.3 Making the package more useful
Our package is great but it's of limited usefulness so far. It tries to open a file we may not have, and won't run on any other file we may have. And we can't change the correlation threshold. Let's fix that.

We'll make several changes:
• Put the data file in inst/extdata.
• Add a dataFile parameter to correlationFinder() with no default.
• Add a cor.threshold parameter to correlationFinder() with a default of 0.85.
• Update the man page to reflect these changes. Change the example so it works with the data file that's part of the package (hint: ?system.file), and remove the dontrun tag so that the example is actually run.
• Extra credit: Write a rudimentary vignette.
• Make sure the package passes R CMD check without warnings or errors. (Hint: use Tools/Shell to open a rudimentary command shell in RStudio Server.)
• Install the package and view the man pages and vignette. Use example() to run the example in the man page.

Resources for this exercise:
• The Writing R Extensions Manual
• Source of Bioconductor Packages (login with username and password 'readonly').

The package, with these changes incorporated, can be found at https://github.com/dtenenba/AdvancedR_stage3. Notice that it has a vignette. If a package has more than a couple of functions, a vignette is a must (and in fact is a requirement for Bioconductor packages). A package that does not have a vignette will have an automatically-generated reference manual, which is a compendium of all the man pages in the package, but that doesn't tell you which function to run first, or how to use the package for a given workflow. That's why vignettes are so critical, because as the name implies, they provide a narrative telling you how to use the package. The vignette in this package isn't very comprehensive, but it hints at some future directions in which the package could be taken.

2.4 Creating good packages and why it matters
2.4.1 Unit tests
We will follow the Bioconductor Unit Testing Guidelines page: http://www.bioconductor.org/developers/unitTesting-guidelines

2.4.2 Interoperability
When creating a new package it is useful to familiarize yourself with pre-existing classes and methods. Reusing the current infrastructure allows a new package to integrate smoothly with existing workflows.
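
As a small illustration of why reuse helps, a new S4 class can extend an existing class so that functions already written for the parent keep working. This is a hedged sketch only: the class name ExpressionValues and its units slot are hypothetical, and the base class "numeric" stands in for whichever existing class a real package would build on.

```r
library(methods)

# Hypothetical class for illustration: expression values with a units label.
# It extends the existing "numeric" class instead of defining a new container.
setClass("ExpressionValues",
         representation(units = "character"),
         contains = "numeric")

x <- new("ExpressionValues", c(1.5, 2.5, 4.0), units = "log2 intensity")

# Functions written for numeric vectors work unchanged on the new class
n <- length(x)
total <- sum(x)
isNumeric <- is(x, "numeric")

# The extra slot is still available to code that knows about the new class
unitLabel <- x@units
```

Code that only expects a numeric vector never needs to know about the units slot, which is the sense in which the new class fits into existing workflows.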