Introduction to Databases
Presented by
Yun Shen (yshen16@bu.edu)
Research Computing
Introduction
•What is a Database
•Key Concepts
•Typical Applications and Demo
•Latest Trends
What is Database
•Three levels to view: Database = Data + Base
 - The actual storage of all the information that we are interested in.
 - The software tool package that gatekeeps and manages data storage, access and maintenance. It can be either personal in usage scope (MS Access, SQLite) or enterprise-level in scope (Oracle, MySQL, MS SQL, etc.).
 - All the possible applications built upon the data stored in databases (web sites, BI applications, ERP, etc.).
Examples at each level
•Level 1: Data - collections of text files in certain formats (such as many bioinformatics databases), or the actual data files of databases stored through a certain DBMS, i.e. MySQL, SQL Server, Oracle, PostgreSQL, etc.
•Level 2: Database Management System (DBMS) - SQL Server, Oracle, MySQL, SQLite, MS Access, etc.
•Level 3: Database Application - web/mobile/desktop standalone applications: e-commerce, online banking, online registration, Wikipedia, etc.
Database Types
•Flat model
•Navigational databases
•Relational model
•Object model
•Document model
•Entity-attribute-value model
•Star schema
Typical Database Application Architecture
[Diagram: app → DBMS → DB]
Data File Structure
•Demo:
  MySQL -- C:\ProgramData\MySQL\MySQL Server 8.0\Data\
  Access -- C:\ARCS_dbtutorial\db\access\
  PostgreSQL -- /project/scv/examples/db/tutorial/data/postgresql/testdb/
Data File Structure - MySQL
Data File Structure - Access
Data File Structure - PostgreSQL
ATTENTION
NO database files can be accessed directly,
but only through the database engine, called the "DBMS".
Three Common Acronyms
•SQL - Structured Query Language
•CRUD - Create, Read, Update, Delete
•ACID - Atomicity, Consistency, Isolation, Durability (transaction properties)
Disadvantages of conventional flat files
The same data may be stored in different forms (for example, state names, phone numbers, etc.), which makes it hard to modify data and keep a clean track of changes, and can even cause loss of data. Duplicated data columns make flat files hard to understand and manage once the data structure gets complex.
First Acronym - ACID
Atomicity - transactions are either all or none (commit/rollback)
Consistency - only valid data is saved
Isolation - transactions do not affect each other
Durability - written data will not be lost
Good example : bank transaction
Most challenges of ACID compliance come from multiple users accessing the database concurrently.
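The atomicity and consistency properties can be sketched with Python's built-in sqlite3 module; the account table, balances, and transfer helper below are illustrative assumptions, not taken from the slides.

```python
# A minimal sketch of atomicity: both UPDATEs commit, or neither does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY,"
             " balance REAL CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES (1, 100.0), (2, 50.0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """All-or-none money transfer between two illustrative accounts."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except sqlite3.IntegrityError:
        # the CHECK constraint rejected a negative balance -> whole transaction rolls back
        return False

transfer(conn, 1, 2, 30.0)    # succeeds: balances become 70 and 80
transfer(conn, 1, 2, 500.0)   # fails atomically: neither row changes
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# → [(1, 70.0), (2, 80.0)]
```

This mirrors the bank example above: the debit and credit either both happen or both roll back, and only valid (non-negative) balances are ever saved.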
How do databases solve these problems?
•A database is a self-describing collection of related records (metadata: data about data):
 - Self-describing means the structure of the data is stored along with the data and can be considered 'metadata'. It includes much related info: table column definitions, index and key info, constraints, etc.
 - Applications can store their own related metadata as well, which makes personalization and customization of the application according to user profiles much easier to handle. Typical examples are user preferences in common social media or e-commerce sites.
Database Content
Typical Database
•User data: tables that store the users' data
•Metadata: keeps the structure (schema) of the data, including table names, column names and types, and constraints over the column(s)
•Application metadata: application-specific metadata regarding user settings or functions of the application
•Index and other overhead data: used for improving performance and maintenance, such as logs, tracking, security, etc.
Terminology and Concept - Tables (Relations)
The very central concepts of relational databases are Tables (Relations) and Relationships. A table (formally called a 'relation') is the building block of a relational database. It stores data in 2D: each row reflects one instance of a record (a tuple), and each column reflects one attribute shared by all instances; a column may also be called a 'field'.
For example, a 'student' table may contain (student id, first name, last name, grade, school name, home address, ...). Each row represents one student's information, and each column represents one piece of information about all students. This is what is called a 'relation'.
Primary Key and Foreign Key
•Primary key: a unique identifier made of one or more columns to uniquely identify rows in a table. If the primary key contains more than one column, it can also be called a 'composite key'.
•Foreign key: the primary key of another table, referenced in the current table. It is the key to establishing the relationship between the two tables and, through the DBMS, to imposing referential integrity.
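A primary/foreign key pair can be sketched with sqlite3; the 'school' and 'student' tables and their columns are illustrative assumptions, loosely based on the student example earlier.

```python
# A minimal sketch of primary and foreign keys with referential integrity.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE school (
    school_id INTEGER PRIMARY KEY,   -- primary key: unique row identifier
    name      TEXT NOT NULL);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    school_id  INTEGER REFERENCES school(school_id));  -- foreign key
INSERT INTO school  VALUES (1, 'Boston University');
INSERT INTO student VALUES (10, 'Jack', 'Smith', 1);
""")

try:
    # school 99 does not exist, so the DBMS rejects the row
    conn.execute("INSERT INTO student VALUES (11, 'Ana', 'Lopez', 99)")
except sqlite3.IntegrityError:
    print("referential integrity enforced by the DBMS")
```

Note the design point from the slide: the integrity rule lives in the DBMS, not in the application, so every application using this database gets the same guarantee.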
Surrogate Key
•A surrogate key is a unique column added to a relation to use as the primary key, when no natural column serves as the primary key, or when a composite key needs to be replaced for various reasons.
•A surrogate key usually takes the form of an auto-incrementing numeric value that has no meaning to the user, and is thus often hidden in the table, form, or other entity for internal use.
•Surrogate keys are often used in place of a composite key to add more flexibility to the table.
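An auto-incrementing surrogate key can be sketched in sqlite3, where an INTEGER PRIMARY KEY is assigned automatically when omitted; the 'course' table here is an illustrative assumption.

```python
# A minimal sketch of a surrogate key: the DBMS assigns the value,
# and it carries no business meaning for the user.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE course (
    course_id  INTEGER PRIMARY KEY,   -- surrogate key, assigned by the DBMS
    title      TEXT NOT NULL,
    department TEXT NOT NULL)""")

cur = conn.execute("INSERT INTO course (title, department)"
                   " VALUES ('Mathematics', 'Math')")
print(cur.lastrowid)  # → 1: first auto-assigned surrogate value
cur = conn.execute("INSERT INTO course (title, department)"
                   " VALUES ('Chemistry', 'Chemistry')")
print(cur.lastrowid)  # → 2: auto-incremented
```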
Terminology and Concept - E-R model
E-R Model: the Entity-Relationship data model is a common technique used in database design. It captures the relationships between database tables and represents them in a graphical way. The relationship between two entities can be 1:1, 1:N, or M:N, and is usually established through a 'foreign key' constraint.
Examples:
1:1 Employee - Locker
1:N Customer - Order, Order - Order Detail
M:N Student - Course
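A 1:N relationship maps directly onto a single foreign key, but an M:N relationship such as Student - Course needs a junction table holding a pair of foreign keys. A sketch with sqlite3, reusing the Jack/Tim/Ana course data from the normalization slides; table names are illustrative.

```python
# A minimal sketch of an M:N relationship via a junction table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id  TEXT    PRIMARY KEY, title TEXT);
-- junction table: one row per (student, course) enrollment
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  TEXT    REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id));   -- composite key
INSERT INTO student VALUES (1, 'Jack'), (2, 'Tim'), (3, 'Ana');
INSERT INTO course  VALUES ('M-1', 'Mathematics'), ('C-1', 'Chemistry');
INSERT INTO enrollment VALUES (1, 'M-1'), (1, 'C-1'), (2, 'C-1'), (3, 'C-1');
""")

# reassemble who takes what by joining through the junction table
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title""").fetchall()
print(rows)
```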
Sample E-R diagram #1
Sample E-R diagram #2
Sample E-R diagram #3
One more concept - Normalization
Wikipedia definition:
•Database normalization, or simply normalization, is the process of restructuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as an integral part of his relational model.
•Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. It is accomplished by applying some formal rules, either by a process of synthesis (creating a new database design) or decomposition (improving an existing database design).
Unnormalized Form (UNF) -
a single row may hold multiple values of the same attribute
Example: Students take courses

Id | Name | Courses
1  | Jack | Mathematics, Chemistry
2  | Tim  | Chemistry
3  | Ana  | Physics, Chemistry

or

Id | Name | Course1     | Course2
1  | Jack | Mathematics | Chemistry
2  | Tim  | Chemistry   |
3  | Ana  | Physics     | Chemistry

First Normal Form (1NF) -
each attribute can hold only one single value in one row; no duplicated rows; no row/column order
Example: Students take courses

Id | Name | Course
1  | Jack | Mathematics
4  | Jack | Chemistry
2  | Tim  | Chemistry
3  | Ana  | Physics
5  | Ana  | Chemistry
Second Normal Form (2NF) -
1NF + no partial dependency (no non-key attribute may depend on only part of a candidate key)
Example: Students take courses
Id | Name | Course      | CourseID | Department
1  | Jack | Mathematics | M-1      | Math
4  | Jack | Chemistry   | C-1      | Chemistry
2  | Tim  | Chemistry   | C-1      | Chemistry
3  | Ana  | Physics     | P-1      | Physics
5  | Ana  | Chemistry   | C-1      | Chemistry
Second Normal Form (2NF) - continue
1NF + no partial dependency (no non-key attribute may depend on only part of a candidate key)
Course = Func(CourseID)
Department = Func(CourseID)

Id | Name | CourseID
1  | Jack | M-1
4  | Jack | C-1
2  | Tim  | C-1
3  | Ana  | P-1
5  | Ana  | C-1
Course
ID  | Course      | Department
M-1 | Mathematics | Math
C-1 | Chemistry   | Chemistry
P-1 | Physics     | Physics
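The 2NF decomposition above can be sketched in sqlite3: Course and Department depend only on CourseID, so those facts move into their own table, stored once per course instead of once per enrollment. Table names are illustrative; the data is from the slides.

```python
# A minimal sketch of decomposing the 1NF table into 2NF.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrollment_1nf (Id INTEGER, Name TEXT, Course TEXT,
                             CourseID TEXT, Department TEXT);
INSERT INTO enrollment_1nf VALUES
 (1, 'Jack', 'Mathematics', 'M-1', 'Math'),
 (4, 'Jack', 'Chemistry',   'C-1', 'Chemistry'),
 (2, 'Tim',  'Chemistry',   'C-1', 'Chemistry'),
 (3, 'Ana',  'Physics',     'P-1', 'Physics'),
 (5, 'Ana',  'Chemistry',   'C-1', 'Chemistry');
-- facts that depend only on CourseID go into their own table ...
CREATE TABLE course AS
  SELECT DISTINCT CourseID, Course, Department FROM enrollment_1nf;
-- ... and the enrollment table keeps just the key and Name
CREATE TABLE enrollment_2nf AS
  SELECT Id, Name, CourseID FROM enrollment_1nf;
""")
print(conn.execute("SELECT COUNT(*) FROM course").fetchone()[0])          # → 3
print(conn.execute("SELECT COUNT(*) FROM enrollment_2nf").fetchone()[0])  # → 5
```

Each course's department is now recorded exactly once, so renaming a department is a single-row update rather than an update anomaly waiting to happen.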
Third Normal Form (3NF) -
2NF + no transitive dependency (non-key attributes may not depend on each other)
Example: Students take courses
Id | Name | Course      | CourseID | Department | Building_no | Building
1  | Jack | Mathematics | M-1      | Math       | 31          | D
4  | Jack | Chemistry   | C-1      | Chemistry  | 22          | A
2  | Tim  | Chemistry   | C-1      | Chemistry  | 22          | A
3  | Ana  | Physics     | P-1      | Physics    | 18          | S
5  | Ana  | Chemistry   | C-1      | Chemistry  | 22          | A
Third Normal Form (3NF) - continue
Very similar to 2NF, but stricter: no functional dependency between non-key attributes at all.
Id | Name | CourseID | Building_no
1  | Jack | M-1      | 31
4  | Jack | C-1      | 22
2  | Tim  | C-1      | 22
3  | Ana  | P-1      | 18
5  | Ana  | C-1      | 22
Course
ID  | Course      | Department
M-1 | Mathematics | Math
C-1 | Chemistry   | Chemistry
P-1 | Physics     | Physics

Building_no | Building
18          | S
22          | A
31          | D
Boyce-Codd Normal Form (BCNF) -
3NF + every determinant of a functional dependency must be a candidate key
Example: 3NF, but not BCNF - Today's Court Bookings
Court | Start Time | End Time | Rate Type
1     | 09:30      | 10:30    | SAVER
1     | 11:00      | 12:00    | SAVER
1     | 14:00      | 15:30    | STANDARD
2     | 10:00      | 11:30    | PREMIUM-B
2     | 11:30      | 13:30    | PREMIUM-B
2     | 15:00      | 16:30    | PREMIUM-A
Boyce-Codd Normal Form (BCNF) -
3NF + every determinant of a functional dependency must be a candidate key
Example: convert to BCNF - Today's Court Bookings
Today's Bookings
Member Flag | Court | Start Time | End Time
Yes         | 1     | 09:30      | 10:30
Yes         | 1     | 11:00      | 12:00
No          | 1     | 14:00      | 15:30
No          | 2     | 10:00      | 11:30
No          | 2     | 11:30      | 13:30
Yes         | 2     | 15:00      | 16:30

Rate Type
Rate Type | Court | Member Flag
SAVER     | 1     | Yes
STANDARD  | 1     | No
PREMIUM-A | 2     | Yes
PREMIUM-B | 2     | No
One More Example - database anomalies

My_Anomly_ex_bank_orig
100001 | Lee, Yuan     | 11 Main Street | 23994.58 | 4.50
100002 | Lee, Yuan     | 11 Main Street |   100.74 | 0.00
100002 | Tian, Fu      | 12 Main Street |   100.74 | 0.00
100003 | Wang, Michele | 12 Main Street |  2500.80 | 0.00
100004 | Dong, Yuan    | 14 Main Street | 32003.98 | 4.50
100052 | Yuan, Ben     | 16 Main Street |    37.38 | 0.00
Gu, Zhen (???)
Advantages of Normalization
BCNF+ normalization can eliminate all anomalies:
•Data stays consistent (because of the key constraints); in DB terminology, it gets away from all update anomalies.
•Data can be modeled with clear logical relationships.
•Data grows with the least redundancy and much durability.
Advantages of Normalization
•No (or less) data redundancy - easier management, less storage, etc.
•No headaches caused by data operation anomalies; good for data integrity and consistency.
Disadvantages of Normalization
•Takes effort
•May increase the complexity of the data structure
•Data retrieval efficiency may suffer due to the need to join multiple tables, so normalization may not be proper for read-intensive data applications
•Sometimes the constraints may be too strict to flexibly accommodate some customized changes
Disadvantages of Normalization
•Hard to deal with complex data structures such as classes, objects, or rows within a field
•Queries for comprehensive information can be costly
•Due to the fixed, predesigned structure, it is not flexible in terms of restructuring data
Modern applications
•Today companies like Google, Amazon and Facebook deal with huge loads of data, and storing that data in an efficient manner is always a big task. They use NoSQL databases, which are based on the principles of an unnormalized relational model, to deal with the storage issue. Some examples of NoSQL databases are MongoDB, Apache Cassandra and Redis. These databases are more scalable and easier to query, as they do not involve expensive operations like JOIN.
Denormalization
•Normalization and denormalization both have advantages and disadvantages. The best practice is always a trade-off between the two.
•Denormalization increases the risk of losing data integrity and the size of storage, but may gain simplicity and intuitiveness in presenting data.
Denormalization - Example
•Customer (CustomerID, Name, Address, Zip, City, State)

Normalized design:
  Customer: CustomerID [PK], Name, Address, Zip [FK]
  Zip:      Zip [PK], City, State
Denormalized design:
  Customer: CustomerID [PK], Name, Address, Zip, City, State

Denormalization - Example
•This is the normalized table design
Zip   | City    | State
02132 | Boston  | MA
02451 | Waltham | MA

CustomerID | Name  | Address      | Zip
101        | John  | 111 Main St  | 02132
102        | Adam  | 17 Willow St | 02451
103        | Grace | 333 Burke St | 02132
Denormalization - Example
•This is the denormalized table design
CustomerID | Name  | Address      | Zip   | City    | State
101        | John  | 111 Main St  | 02132 | Boston  | MA
102        | Adam  | 17 Willow St | 02451 | Waltham | MA
103        | Grace | 333 Burke St | 02132 | Boston  | MA
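The denormalized table above is exactly what a JOIN over the normalized design produces on the fly. A sketch with sqlite3, using the Zip/Customer data from the example; the column types are assumptions.

```python
# A minimal sketch: the normalized design plus a JOIN reproduces
# the denormalized view, without storing City/State redundantly.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zip (Zip TEXT PRIMARY KEY, City TEXT, State TEXT);
CREATE TABLE customer (CustomerID INTEGER PRIMARY KEY, Name TEXT,
                       Address TEXT, Zip TEXT REFERENCES zip(Zip));
INSERT INTO zip VALUES ('02132', 'Boston', 'MA'), ('02451', 'Waltham', 'MA');
INSERT INTO customer VALUES
 (101, 'John',  '111 Main St',  '02132'),
 (102, 'Adam',  '17 Willow St', '02451'),
 (103, 'Grace', '333 Burke St', '02132');
""")
rows = conn.execute("""
    SELECT c.CustomerID, c.Name, c.Address, c.Zip, z.City, z.State
    FROM customer c JOIN zip z ON z.Zip = c.Zip
    ORDER BY c.CustomerID""").fetchall()
for row in rows:
    print(row)
```

The trade-off from the slide is visible here: the normalized design stores Boston/MA once, at the cost of a JOIN on every read; the denormalized design pays with redundancy instead.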
Database Operation/Administration
•CRUD (Create/Read/Update/Delete) - the four basic operations
•All through SQL (Structured Query Language)
•SQL also covers Definition (create/modify tables and columns) and Access Control (permissions)
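The four CRUD operations map onto the SQL statements INSERT, SELECT, UPDATE, and DELETE; a minimal sketch via sqlite3, reusing the illustrative 'student' table from earlier slides.

```python
# A minimal sketch of CRUD in SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, grade TEXT)")

# Create
conn.execute("INSERT INTO student (name, grade) VALUES ('Jack', 'B')")
# Read
print(conn.execute("SELECT name, grade FROM student").fetchall())  # → [('Jack', 'B')]
# Update
conn.execute("UPDATE student SET grade = 'A' WHERE name = 'Jack'")
# Delete
conn.execute("DELETE FROM student WHERE name = 'Jack'")
print(conn.execute("SELECT COUNT(*) FROM student").fetchone()[0])  # → 0
```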
How to Learn SQL
•Same approach as many other languages: get a language reference first (complete, but more complicated, and each vendor may add its own flavor)
•Start from the basics (DDL, DML, DQL)
•Extend to more complicated elements (Subquery/Join/Function/Stored Procedure/Index/Programming)
Demos and Exercises
•Microsoft Access templates
•E-R schema in Microsoft Access
•Anomaly
•Data import/export via Microsoft Access (may do some real-life demo using actual data files from the audience)
•Data import/export via MySQL
Other Stuff not covered by this tutorial
•Database Administration: concurrency, security, backup, recovery, and much more
•Database Performance Tuning: indexing, server configuration
•Database Programming
•Database Technology Trends: Big Data challenges, Data Warehouses, BI systems, NoSQL, Cloud, Hadoop/Spark, etc.
BI Systems
•Reporting systems
•Data mining (big overlap with today's ML/AI trend)
•Data warehouse / data mart
•ETL (Extract/Transform/Load)
Big Data
•4Vs: Volume, Velocity, Variety, Veracity
Big Data Technologies
•Predictive analytics: to discover, evaluate, optimize, and deploy predictive models by analyzing big data sources to improve business performance or mitigate risk.
•NoSQL databases: key-value, document, and graph databases.
•Search and knowledge discovery: tools and technologies to support self-service extraction of information and new insights from large repositories of unstructured and structured data that reside in multiple sources such as file systems, databases, streams, APIs, and other platforms and applications.
•Stream analytics: software that can filter, aggregate, enrich, and analyze a high throughput of data from multiple disparate live data sources and in any data format.
•In-memory data fabric: provides low-latency access and processing of large quantities of data by distributing data across the dynamic random access memory (DRAM), Flash, or SSD of a distributed computer system.
•Distributed file stores: a computer network where data is stored on more than one node, often in a replicated fashion, for redundancy and performance.
•Data virtualization: a technology that delivers information from various data sources, including big data sources such as Hadoop and distributed data stores, in real time and near-real time.
•Data integration: tools for data orchestration across solutions such as Amazon Elastic MapReduce (EMR), Apache Hive, Apache Pig, Apache Spark, MapReduce, Couchbase, Hadoop, and MongoDB.
•Data preparation: software that eases the burden of sourcing, shaping, cleansing, and sharing diverse and messy data sets to accelerate data's usefulness for analytics.
•Data quality: products that conduct data cleansing and enrichment on large, high-velocity data sets, using parallel operations on distributed data stores and databases.