Columnar Databases

Columnar databases store data tables by column rather than by row.

By storing data in columns rather than rows, the database can more precisely access the data it needs to answer a query rather than scanning and discarding unwanted data in rows.

Data Store Mechanism

In a columnar database, the values of a data column are serialized together.

First consider a simple example of a relational database table:

In a columnar database, the data would be stored like this:

Notice that:

  • the data is grouped by column

  • each column field is in the format data:row_id

  • the primary key is the data

Performance

The table below shows examples of where columnar and relational databases work best:

Python Example

"""
columnar_database_using_bigquery.py
performs a database query using google bigquery
"""

# Import needed library.
from google.cloud import bigquery

# Set a database client.
client = bigquery.Client()

# Define an SQL query.
query = client.query("""
    SELECT
      CONCAT(
        'https://stackoverflow.com/questions/',
        CAST(id as STRING)) as url,
      view_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    WHERE tags like '%google-bigquery%'
    ORDER BY view_count DESC
    LIMIT 10""")

# Execute the query.
results = query.result()

Commercial Systems

Major systems include:

For a more complete list, click here.

References