Does Github have a database?
Yet at its core, GitHub.com remained built around one main database cluster (called mysql1) that housed a large portion of the data used by core GitHub features, like user profiles, repositories, issues, and pull requests.
With GitHub’s growth, this inevitably led to challenges.
,
How does GitHub work?
All developers have their Git client configured with their correct email address and linked to their GitHub user.
The project has one of the team nominated as a ‘code owner’ for making sure that code reviews are done in time and to administer the repository.
,
More Partitioning
In addition to vertical partitioning to move database tables, we also use horizontal partitioning (aka sharding).
This allows us to split database tables across multiple clusters, enabling more sustainable growth.
We’ll detail the tooling, linters, and Rails improvements related to this in a future blog post.
,
Moving Data Without Downtime
A schema domain that is virtually isolated is ready to be physically moved to another database cluster.
To move tables on the fly, we use two different approaches: Vitess, and a custom write-cutover process.
,
Results
The main database cluster mysql1, mentioned in the introduction, housed a large portion of the data used by many of GitHub’s most important features, like users, repositories, issues, and pull requests.
Since 2019 we achieved the ability to scale this relational database with the following results:.
1) In 2019, mysql1answered 950,000 queries/s on av.
,
Virtual Partitions
The first concept we introduced was virtual partitions of database schemas.
Before database tables can be moved physically, we have to make sure they are separated virtuallyin the application layer, and this has to happen without impacting teams working on new or existing features.
To do that, we group database tables that belong together into sche.
,
What is GitHub task-based development?
This article proposes a process that splits development work into task-based GitHub branches, incorporates daily database builds and integration testing, and uses Redgate tools to automate tasks such as:
- provisioning
- database scripting
- testing
,
What is the main database cluster GitHub?
The main database cluster mysql1, mentioned in the introduction, housed a large portion of the data used by many of GitHub’s most important features, like users, repositories, issues, and pull requests.
Since 2019 we achieved the ability to scale this relational database with the following results:.