Databases
Introduction
Databases are like a smart repository of data, where you can keep data in a structured, organized, and secure manner. While you might argue something like, "Why can't I just keep them in a single txt file?", remember things starts to get messy when you have a lot of data (millions of entries and more) and you still want to be fast (finding one entry in any modern database will be much faster than from a txt file). In a way, databases allow for efficient retrieval, manipulation, and analysis of data. They also offer powerful querying and analysis capabilities, enabling data scientists and analysts to extract insights and make data-driven decisions. This makes them a powerful tool for data science projects.
This article will briefly focus on the different types of databases, their features and some important generic concepts. You can also find separate articles for individual databases in this section. So without further delay, let's get started!
Types of Databases
Relational Databases
Relational databases are the most common type of database. They are structured using tables, with each table containing rows and columns wherein each row represents a record, and each column contains a field. This simple structure makes it easier to query the database and find the information that is needed. Additionally, relational databases can be linked together using keys, allowing data to be shared between different tables. This makes it possible to link data from different sources and use it to generate meaningful insights. Hence the name “relational” database.
Comparison of Relational Databases
Feature / Database | Oracle | MySQL | Microsoft SQL Server | PostgreSQL | MariaDB | SQLite | IBM Db2 | SAP HANA |
---|---|---|---|---|---|---|---|---|
License | Commercial | Open Source | Commercial | Open Source | Open Source | Open Source | Commercial | Commercial |
ACID Compliance | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Scalability | Vertical/Horizontal | Vertical | Vertical/Horizontal | Vertical/Horizontal | Vertical/Horizontal | Limited | Vertical/Horizontal | Horizontal |
Partitioning | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
Replication | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
JSON Support | Yes | Limited | Yes | Yes | Yes | No | Yes | Yes |
Geospatial Support | Yes | Limited | No | Yes (PostGIS) | Yes | No | Yes | Yes |
Stored Procedures | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
User Base | Large Enterprises | Web Applications | Enterprises | Diverse Applications | Web Applications | Embedded Applications | Enterprises | Enterprises |
Popularity Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Warning
Some details in the above table (like Popularity Rank) are as per the information aggregated from multiple sources at the time the article was written (August 2024). This could change over time.
This table summarizes the key features and characteristics of some of the top relational databases, highlighting their strengths and use cases. Oracle leads in enterprise environments, while MySQL and PostgreSQL are popular for web applications and diverse workloads. Microsoft SQL Server is favored in corporate settings, and MariaDB is recognized for its compatibility with MySQL. SQLite is widely used for lightweight applications, and IBM Db2 and SAP HANA cater to specific enterprise needs.
Concepts
ACID Compliance
ACID compliance refers to a set of principles that ensure reliable processing of database transactions. The acronym ACID stands for Atomicity, Consistency, Isolation, and Durability. Each of these properties plays a crucial role in maintaining the integrity and reliability of data within database systems. [1]
- Atomicity: Atomicity guarantees that a transaction is treated as a single, indivisible unit. This means that either all operations within the transaction are completed successfully, or none are applied at all. If any part of the transaction fails, the entire transaction is rolled back, ensuring that the database remains unchanged in the event of an error or failure.
- Consistency: Consistency ensures that a transaction brings the database from one valid state to another, by adhering to all predefined rules and constraints (PRIMARY KEY, FOREIGN KEY, NOT NULL, CHECK, etc). This property prevents illegal transactions that could corrupt the database, ensuring that all data remains valid and meaningful throughout the transaction process.
- Isolation: Isolation ensures that concurrent transactions do not interfere with each other. Each transaction is executed in such a way that it appears to be the only transaction being processed at that time, even if multiple transactions are occurring simultaneously. This prevents issues such as dirty reads, where one transaction reads data modified by another uncommitted transaction.
- Durability: Durability guarantees that once a transaction has been committed, it will remain so, even in the event of a system failure. This means that the changes made by the transaction are permanently recorded in the database, ensuring data persistence and reliability.
ACID compliance is critical for applications that require high levels of data integrity, such as financial systems, healthcare databases, and any other systems where data accuracy is paramount. It helps prevent data loss, ensures consistent data states, and facilitates reliable transaction processing, which is essential for maintaining trust and accuracy in data-driven applications.