Document Databases

Document (aka Document-Oriented) databases store semi-structured data, which has these characteristics:

  • flexible data format schema: different records in the same collection can have different data formats

  • flexible data format modification: the format of records can be changed without having to modify a controlling schema

  • queries without SQL: queries are expressed as data objects, such as JSON documents, rather than as SQL statements
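
The characteristics above can be illustrated without a database server. The following is a minimal sketch in plain Python, assuming a list of dicts as a stand-in for a collection of JSON documents; the helper name find and the sample records are hypothetical and are not part of any database library.

```python
# Illustrative in-memory "collection": a list of dicts standing in for JSON documents.
# Note that the two records do not share the same set of fields.
customers = [
    {"customer": "1", "first_name": "john", "last_name": "jones"},
    {"customer": "2", "first_name": "ann", "address": {"street": "126 Main"}},
]


def find(collection, query):
    """Return the documents whose fields equal every key/value pair in query."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in query.items())
    ]


# Flexible format modification: add a field to one record; no schema is edited.
customers[0]["email"] = "john@example.com"

# Query without SQL: the query itself is a data object.
matches = find(customers, {"first_name": "ann"})
```

Here matches holds only the record for customer "2", and an empty query object selects every record in the collection.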

Processes

Generalized Data Pipeline processes can be expressed in terms of the following parameters:

Database Related Parameters

  • h - host: IP address or URL for the database

  • p - port: server listening port

  • n - name: database name, such as ‘sales’

  • c - collection: collection name, such as ‘customers’ or ‘products’

  • o - operation: action to be performed, such as create, read, update, or delete

Data Related Parameters

  • q - query parameters: identify existing database records for retrieval, update, or deletion

  • d - data parameters: provide input data for addition or update

  • r - return values: provide key/value pairs of return values from operations such as retrieval

  • f - field name: such as ‘street’

  • v - value: such as ‘126 Main’

  • i - number of query parameters: enough key/value pairs to find the desired database documents

  • j - number of data parameters: enough key/value pairs to describe the data to be added or updated

  • k - number of return parameters: returns depend on the operation performed
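
One way to read the parameter lists above is as the arguments and return value of a single function. The following is a minimal sketch under that assumption, using a nested in-memory dict in place of a real server, so h and p are omitted. The function names run_operation and matches, the store layout, and the keys of the returned dict are all hypothetical.

```python
def matches(document, q):
    """True when the document equals the query on each of its i key/value pairs."""
    return all(document.get(f) == v for f, v in q.items())


def run_operation(store, n, c, o, q=None, d=None):
    """Apply operation o to collection c of database n and return r as a dict."""
    q = q or {}
    d = d or {}
    # store maps database name -> collection name -> list of documents.
    documents = store.setdefault(n, {}).setdefault(c, [])
    if o == "create":
        documents.append(dict(d))
        return {"inserted": 1}
    selected = [doc for doc in documents if matches(doc, q)]
    if o == "read":
        return {"documents": selected}
    if o == "update":
        for doc in selected:
            doc.update(d)  # overwrite or add the j data fields
        return {"matched": len(selected)}
    if o == "delete":
        documents[:] = [doc for doc in documents if not matches(doc, q)]
        return {"deleted": len(selected)}
    raise ValueError(f"unknown operation: {o}")


store = {}
run_operation(store, "sales", "customers", "create",
              d={"customer": "1", "street": "126 Main"})
run_operation(store, "sales", "customers", "update",
              q={"customer": "1"}, d={"street": "9 Oak"})
r = run_operation(store, "sales", "customers", "read", q={"customer": "1"})
```

In this sketch the number of return parameters k depends on the operation: a read returns the matching documents, while the other operations return a single count.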

Python Example using PyMongo

PyMongo is a MongoDB library for Python.

"""
mongodb_using_pymongo.py
performs a database update
"""

from pymongo import MongoClient

# Define parameters
host = "localhost"
port = 27017
name = "ml_database"
collection = "customers"
query = {"customer": "1"}
data = {"first_name": "john", "last_name": "jones"}

# Open the database.
connection = MongoClient(host=host, port=port)
database = connection[name]
database_collection = database[collection]

# Update a document.
# update_one requires an update operator; "$set" assigns the given fields.
result = database_collection.update_one(query, {"$set": data})

# Close the database.
connection.close()