Introduction to Databases
11/06/2018 Database = Data + Base the actual storage of all the information that are ... The very central concepts of relational databases are Tables ...
Part I: Introduction to Databases Introduction to Database Concepts
be well defined so that changes in some parts do not seriously influence others. 8. Introduction to Databases. Data Models. Q. A collection of tools for
Database Concepts
In this chapter. » Introduction. » File System. » Database Management. System. » Relational Data Model. » Keys in a Relational. Database. Database. Concepts.
database-concepts.pdf
Relational database concepts in general. •. Concepts and terminology in Introduction to Oracle Database. •. The operating system environment under which you
INTRODUCTION TO DBMS
Module1: (12 Hrs). Introduction to database Systems Basic concepts &Definitions
Introduction to Database Concepts
Introduction to Database Concepts. A Table with a View When we think of databases we often think ... A relational database describes the.
DATABASE MANAGEMENT SYSTEM (DBMS)
Data base System Concepts Silberschatz
Introduction to Database Systems Fundamental Concepts
the users of the database. • The schema of a db is held in the data dictionary. Data Model. • A set of concepts that can be used to describe the.
Wayne County Community College District
CIS 120 Introduction to Database Concepts. CREDIT HOURS: 3.00. CONTACT HOURS: 45.00. COURSE DESCRIPTION: This course is designed to introduce the student to
Oracle Database Concepts
Oracle Database Concepts 11g Release 2 (11.2). E40540-04 Concepts and terminology in Chapter 1
[PDF] Introduction to Database Concepts - Everything Computer Science
Chapter 1 Introduction to Database Concepts 1 1 Databases and Database Systems A database management system (DBMS) is an aggregate of data hardware
[PDF] Introduction to Databases
11 jui 2018 · Take a look at the following file directories: The very central concepts of relational databases are Tables (Relations) Relationships
[PDF] Database Concepts - NCERT
7 1 INTRODUCTION After learning about importance of data in the previous chapter we need to explore the methods to store and manage data electronically
[PDF] Introduction to Database Concepts
Tables and Entities • A relational database describes the relationships among different kinds of data – Captures ideas like those defined in the Affinity
[PDF] database-conceptspdf - Oracle Help Center
This manual provides an architectural and conceptual overview of the Oracle database Concepts and terminology in Introduction to Oracle Database
[PDF] databasepdf
1 Part I: Introduction to Databases Kostis Sagonas 2 Introduction to Databases Introduction to Database Concepts Q Purpose of Database Systems
[PDF] Introduction to Database Systems Fundamental Concepts
1 Introduction to Database Systems Fundamental Concepts Werner Nutt (in relational data model) Instance DTDs specify the format of documents
[PDF] An Introduction to Database Systems 8e By C J Datepdf
Part V (eight chapters) shows how relational concepts are relevant to a variety of further aspects of database technology-security distributed databases
(PDF) Database System: Concepts and Design - ResearchGate
PDF An organization must have accurate and reliable data for effective decision 1 Introduction to Database 1 1 Meaning and Definition of Database
[PDF] DATABASE MANAGEMENT SYSTEM (DBMS) - mrcetacin
records to the appropriate files Before database management systems (DBMSs) were introduced organizations usually stored information in such systems
Introduction to Databases
Presented by
Yun Shen (yshen16@bu.edu)
Research Computing
Introduction
•What is Database •Key Concepts •Typical Applications and Demo •Lastest TrendsResearch Computing
What is Database
•Three levels to view: Database = Data + Base, the actual storage of all the information that are interested The software tool package that helps gatekeeper and manage data storage, access and maintenances. It can be either in personal usage scope (MS Access, SQLite) or enterprise level scope (Oracle, MySQL, MS SQL, etc). All the possible applications built upon the data stored in databases (web site,BI application, ERP etc).
Research Computing
Examples at each level
•Level 1: data collection text files in certain format: such as many bioinformatic databases the actual data files of databases that stored through certain DBMS, i.e.MySQL, SQL server, Oracle, Postgresql, etc.
•Level 2: Database Management (DBMS) SQL Server, Oracle, MySQL, SQLite, MS Access, etc. •Level 3: Database Application Web/Mobile/Desktop standalone application - e-commerce, online banking, online registration, etc.Research Computing
Examples at each level
•Level 1: data collection text files in certain format: such as many bioinformatic databases the actual data files of databases that stored through certain DBMS, i.e.MySQL, SQL server, Oracle, Postgresql, etc.
•Level 2: Database Management System (DBMS) SQL Server, Oracle, MySQL, SQLite, MS Access, etc. •Level 3: Database Application Web/Mobile/Desktop standalone application - e-commerce, online banking, online registration, Wikipedia, etc.Research Computing
Database Types
•Flat Model •Navigational databases •Relational Model •Object model •Document model •Entity-attribute-value model •Star schemaResearch Computing
Typical Database Application Architecture
Research Computing
appDBMSDB
Data File Structure
Research Computing
•Demo : MySQL -- C:\ProgramData\MySQL\MySQL Server 8.0\Data\Access -- C:\ARCS_dbtutorial\db\access\
Postgresql -- /project/scv/examples/db/tutorial/data/postgresql/testdb/Data File Structure - MySQL
Research Computing
Data File Structure - Access
Research Computing
Data File Structure - PostgreSQL
Research Computing
ATTENTION
Research Computing
NO database files can be accessed directly,
but only through the database engine, called "DBMS"Typical Database Application Architecture
Research Computing
appDBMSDB
Three Common Acronyms
•SQL - Structured Query Language •CRUD - Create, Read, Update, Delete •ACID - Atomicity, Concurrency, Integrity and Durability (transaction)Disadvantage of conventional flat file
different forms, for example, state name, phone number, etc. This made it hard to modify data and keep clean track of change; even loss of data data columns, thus hard to understand and manage once the data structure gets complexResearch Computing
First Acronym - ACID
Atomicity - transactions are either all or none (commit/rollback)Consistency - only valid data is saved
Isolation - transactions would not affect each otherDurability - written data will not be lost
Good example : bank transaction
Most of challenges of ACID compliance come from multiple users/concurrent using of databaseResearch Computing
How Databases solves the problem?
•Self-describing data collection of related records (meta data, data about data) , detail explanation as below: - Self-describing means: structureof data, that can be considered 'Meta data'; It includes many related info : table column definition, index and key info, constraints, etc related meta dataas well, it makes personalization and customization of the application according to user profile much easier to handle. The typical example could be the user preference for those common social media sites or e-commerce sites, etc.Research Computing
Database Content
Typical Database
Research Computing
•User data •Medadata •Application metadata •Index and other overhead •User Data: tables to store user data •Meta data: keep the structure (schema) of the data, including table name, column name and type and contraints over the column(s) •Application meta data: application specific meta data regarding to user settings or functions of the application •Index and other overhead data: used for improving performance and maintenance, such as logs, track, security, etc.Terminology and Concept - Tables (Relations)
The very central concepts of relational databases areTables(Relations),Relationships.Table (formally called 'relation') - is the building block of relational database. Itstores data in 2D, with its row reflects one instance of record (tuple), and eachof its column reflects one aspect of the attributes of all instances, column mayalso be called 'field'.
For example, A 'student' table may contains (student id, first name, last name,grade, school name, home address, ...), and each row may represent onestudent's information, and each column of the table represents one piece ofinformation of all students. And this is called a 'relation'.
Research Computing
Primary Key and Foreign Key
•Primary key:Unique Identifiermade of one or more columns to uniquely identify rows in a table. If the primary key contains more than one column, it can be called'composite key'as well. •Foreign Key:is the primary key of another table, which is referenced in the current table. It's the key to establish the relationship between the two tables, and through DBMS, to impose referential integrity.Research Computing
Surrogate Key
•Surrogate key is a unique column •added to a relation to use as the primary key when lack of natural column serves as primary key, or when composite key needs to be replaced for various reasons. •Surrogate key is usually in form of auto increment numeric value, and of no meaning to the user, and thus cd often hidden in the table, or form or other entity for the internal use. •Surrogate keys are often used in the place of composite key to add more flexibility to the table.Research Computing
Terminology and Concept - E-R model
E-R Model: Entity-Relationship data model is the common technique used in database design. It captures the relationships between database tables and represent them in a graphical way. The relationships between two entities can be 1:1, 1:N, or M:N. And it is usually established through 'foreign key' constraint.Examples:
1:1 Employee - Locker
1:N Customer - Order, Order - Order Detail
M:N Student - Course
Research Computing
Sample E-R
diagram #1Research Computing
Sample E-R
diagram #2Research Computing
Sample E-R
diagram #3Research Computing
One more concept - Normalization
Wikipedia definition:
•Database normalization, or simplynormalization, is the process of restructuring arelational databasein accordance with a series of so- callednormal formsin order to reducedata redundancyand improvedata integrity. It was first proposed byEdgar F. Coddas an integral part of hisrelational model. •Normalization entails organizing thecolumns(attributes) andtables(relations) of a database to ensure that theirdependenciesare properly enforced by database integrity constraints. It is accomplished by applying some formal rules either by a process ofsynthesis(creating a new database design) ordecomposition(improving an existing database design).Research Computing
Unnormalized Form (UNF) -
same attributes can be contained in one rowExample: Students take courses
Research Computing
Id NameCourses
1. JackMathematics
Chemistry
2. TimChemistry
3. AnaPhysics
Chemistry
IdNameCourse1Course2
1. JackMathematics Chemistry
2. TimChemistry
3. AnaPhysicsChemistry
orFirst Normal Form (1NF) -
each attributes can only have one single value in one row; no duplicated row; no row/column orderExample: Students take courses
Research Computing
Id NameCourse
1. JackMathematics
4. JackChemistry
2. TimChemistry
3. AnaPhysics
5. AnaChemistry
Second Normal Form (2NF) -
1NF + no partial dependency (non-key attributes may not depend on any of candidate keys)
Example: Students take courses
Research Computing
IdNameCourseCourseIDDepartment
1.JackMathematicsM-1Math
4.JackChemistryC-1Chemistry
2.TimChemistryC-1Chemistry
3.AnaPhysicsP-1Physics
5.AnaChemistryC-1Chemistry
Second Normal Form (2NF) - continue
1NF + no partial dependency (non-key attributes may depend on any of candidate keys)
(Course) = Func(CourseID) (Department) = Func(CourseID)Research Computing
IdNameCourseID
1.JackM-1
4.JackC-1
2.TimC-1
3.AnaP-1
5.AnaC-1
Course
IDCourseDepartment
M-1Mathematics Math
C-1ChemistryChemistry
P-1PhysicsPhysics
Third Normal Form (3NF) -
2NF + no transitive dependency (non-key attributes may not depend on each other)
Example: Students take courses
Research Computing
IdName CourseCourseID DepartmentBuilding_no building1.Jack Mathematics M-1Math31D
4.Jack ChemistryC-1Chemistry22A
2.Tim ChemistryC-1Chemistry22A
3.Ana PhysicsP-1Physics18S
5.Ana ChemistryC-1Chemistry22A
Third Normal Form (3NF) - continue
very similar to 2NF, but more strict; no functional dependency between non-key attributes at all.Research Computing
IdName CourseID Building_no
1.Jack M-131
4.Jack C-122
2.Tim C-122
3.Ana P-118
5.Ana C-122
Course
IDCourseDepartment
M-1Mathematics Math
C-1ChemistryChemistry
P-1PhysicsPhysics
Building_no Building
18S 22A31D
Boyce-Codd Normal Form (BCNF) -
3NF + non dependency between all candidate keys
Example: 3NF, but not BCNF - Today's Court BookingsResearch Computing
CourtStart TimeEnd TimeRate Type
109:3010:30SAVER
111:0012:00SAVER
114:0015:30STANDARD
210:0011:30PREMIUM-B
211:3013:30PREMIUM-B
215:0016:30PREMIUM-A
Boyce-Codd Normal Form (BCNF) -
3NF + non dependency between all candidate keys
Example: convert to BCNF - Today's Court Bookings
Research Computing
Member
FlagCourtStart Time End Time
Yes109:3010:30
Yes111:0012:00
No114:0015:30
No210:0011:30
No211:3013:30
Yes215:0016:30
Today's Bookings
Rate TypeCourtMember
FlagSAVER1Yes
STANDARD 1No
PREMIUM-A 2Yes
PREMIUM-B 2No
Rate Type
One More Example - database anormalies
Research Computing
My_Anomly_ex_bank_orig
100001 Lee, Yuan11 Main Street23994.584.50
100002 Lee, Yuan11 Main Street100.740.00
100002 Tian, Fu12 Main Street100.740.00
100003 Wang, Michele 12 Main Street2500.80.00
100004 Dong, Yuan14 Main Street32003.984.50
100052 Yuan, Ben16 Main Street37.380.00
One More Example
Research Computing
My_Anomly_ex_bank_orig
100001 Lee, Yuan11 Main Street23994.584.50
100002 Lee, Yuan11 Main Street100.740.00
100002 Tian, Fu12 Main Street100.740.00
100003 Wang, Michele 12 Main Street2500.80.00
100004 Dong, Yuan14 Main Street32003.984.50
100052 Yuan, Ben16 Main Street37.380.00
One More Example
Research Computing
My_Anomly_ex_bank_orig
100001 Lee, Yuan11 Main Street23994.584.50
100002 Lee, Yuan11 Main Street100.740.00
100002 Tian, Fu12 Main Street100.740.00
100003 Wang, Michele 12 Main Street2500.80.00
100004 Dong, Yuan14 Main Street32003.984.50
100052 Yuan, Ben16 Main Street37.380.00
One More Example
Research Computing
My_Anomly_ex_bank_orig
100001 Lee, Yuan11 Main Street23994.584.50
100002 Lee, Yuan11 Main Street100.740.00
100002 Tian, Fu12 Main Street100.740.00
100003 Wang, Michele 12 Main Street2500.80.00
100004 Dong, Yuan14 Main Street32003.984.50
100052 Yuan, Ben16 Main Street37.380.00
One More Example
Research Computing
My_Anomly_ex_bank_orig
100001 Lee, Yuan11 Main Street23994.584.50
100002 Lee, Yuan11 Main Street100.740.00
100002 Tian, Fu12 Main Street100.740.00
100003 Wang, Michele 12 Main Street2500.80.00
100004 Dong, Yuan14 Main Street32003.984.50
100052 Yuan, Ben16 Main Street37.380.00
Gu, Zhen (???)
Advantages of Normalization
BCNF+ normalization can eliminate all anomalies :
keep consistent (because of the key constraints), in DB terminology - get away with all update anormaly. can be modeled with clear logical relationships grows with least redundancy and much durability.Research Computing
Advantages of Normalization
•No(less) data redundancy - means easy management, less storage, etc. •No headache caused by data operation anomalies. Good for data integrity and consistency.Research Computing
Disadvantages of Normalization
•Take effort •May increase complexity in data structure •Data retrieving efficiency may be discounted due to the need of join of multiple tables; So may not be proper in read-intensive data applications •Sometimes the constraints may be too strict to be flexible to make some customized change needed.Research Computing
Disadvantages of Normalization
•Hard to deal with complex data structures such as class, objects, rows in a field. •Query for comprehensive information can be costly.[6] •Due to fixed predesigned structure, it is not flexible in terms of restructure of dataResearch Computing
Modern applications
•Today companies likeGoogle,AmazonandFacebookdeal with loads of data and storing that data in an efficient manner is always a big task. They useNoSQLdatabase which is based on the principles of unnormalized relational model to deal with storage issue.Some of the examples ofNoSQLdatabases areMongoDB,Apache CassandraandRedis. These databases are morescalableand easy to query with as they do not involve expensive operations likeJOIN.Research Computing
Denormalization
•Normalization and denormalization both have advantages and disavantages. The best practice is always a trade off between the two. •Denormalization will increase the risk of loss of data integrity and the size of storage, but may gain the simplicity and intuitivity of presenting data.Research Computing
Denormalization - Example
•Customer (CustomerID, Name, Address, Zip, City, State)Research Computing
CustomerID [PK]
NameAddress
Zip [FK]
Zip [PK]
City StateCustomerID [PK]
NameAddress
Zip City StateDenormalization - Example
•This is the normalized table designResearch Computing
ZipCityState
02132BostonMA
02451WalthamMA
CustomerID NameAddressZip
101John111 Main St02132
102Adam17 Willow St02451
103Grace333 Burke St02132
Denormalization - Example
•This is the denormalized table designResearch Computing
CustomIDNameAddressZipCityState
101John111 Main St02132Boston MA
102Adam17 Willow St02451Waltham MA
103Grace333 Burke St02132Boston MA
Database Operation/Administration
•CRUD (Create/Read/Update/Delete) - four basic operations •All through SQL (Structured Query Language) Definition(Create/Modify tables/columns) , Access Control (permission)Research Computing
How to Learn SQL
•Same approach as many other languages, get a language reference first m#SQLRF004(complete, but more complicated, and may add its own flavor) •Start from basics (DDL, DML, DQL) •Extend to more complicate elements (Subquery/Join/Function/StoreProcedure/Index/Programming)
Research Computing
Demos and Exercises
•Microsoft Access Templates •E-R schema in Microsoft Access •Anomaly •Data import/export via Microsoft Access (may do some real life demo using actual data file from audience) •Data import/export via MySQLResearch Computing
Other Stuffs not covered by this tutorial
•Database Administration: concurrency, security, backup, recovery, and many more •Database Performance tuning: indexing, server configurations •Database programming •Database Technology Trend: BigData challenge, Data Warehouse, BI system, NoSQL, Cloud, Hadoop/Spark, etc.Research Computing
BI Systems
•Reporting System •Data Mining (Has big overlap with today's ML/AI trend) •Data Warehouse/Data Mart •ETL (Extract/Transform/Load)Research Computing
Big Data
•4Vs:Research Computing
Big Data Technologies
•Predictiveanalytics: to discover, evaluate,optimize, and deploy predictive models by analyzingbig datasources to improve businessperformanceor mitigate risk.
•NoSQL databases: key-value, document,and graph databases.•Search and knowledgediscovery: tools and technologiesto support self-service extraction of information and new insights from largerepositories of unstructuredand structured datathat resides in multiple sources such as file systems, databases,streams, APIs, andother platforms and applications.
•Stream analytics: softwarethat can filter, aggregate,enrich, and analyzea high throughputof data from multiple disparate live datasources and in any dataformat.
•In-memory data fabric: provides low-latencyaccessand processing of large quantitiesof databy distributing dataacross the dynamicrandom access memory (DRAM), Flash, or SSD of a distributed computersystem.
•Distributedfile stores: a computernetwork where datais stored on more than one node, often in a replicated fashion, for redundancyand performance.
•Data virtualization:a technologythat delivers information from various data sources, including big datasources such as Hadoopanddistributed datastores in real-time and near-realtime.
•Data integration: tools for dataorchestration across solutions such as AmazonElastic MapReduce(EMR), Apache Hive, Apache Pig,ApacheSpark, MapReduce,Couchbase,Hadoop,and MongoDB.
•Data preparation: softwarethat eases the burden of sourcing, shaping,cleansing, and sharing diverse and messy datasets toacceleratedata'susefulness for analytics.
•Data quality: products that conductdata cleansing and enrichment on large, high-velocitydatasets, using parallel operationsondistributed datastores and databases.
Research Computing
Summary of Training
•List important points from each lesson. •Provide resources for more information on subject.Research Computing
quotesdbs_dbs20.pdfusesText_26[PDF] introduction to digital filters pdf
[PDF] introduction to econometrics (3rd edition solutions chapter 2)
[PDF] introduction to econometrics (3rd edition solutions chapter 5)
[PDF] introduction to econometrics 3rd edition solutions chapter 3
[PDF] introduction to econometrics 3rd edition solutions chapter 4
[PDF] introduction to emu8086
[PDF] introduction to financial management questions and answers pdf
[PDF] introduction to financial statements pdf
[PDF] introduction to food and beverage service
[PDF] introduction to french pronunciation pdf
[PDF] introduction to functions pdf
[PDF] introduction to geographic information systems pdf
[PDF] introduction to geospatial science pdf
[PDF] introduction to gis and remote sensing pdf