Data Pipeline

Data pipelines capture data from inputs, retain it for a period of time, and deliver it to receivers.

The implementation of a data pipeline can take a number of forms, from simple message queues to distributed streaming, logging, and brokering platforms.

Processes

Generalized data pipeline processes can be expressed as: loading, queuing, indexing, cataloging, and retrieval. A minimal sketch of these stages follows.
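
The code below is an illustrative, self-contained sketch of those stages, not any particular product's API. It assumes an in-memory queue as the retention layer, and the names (load, retrieve, receiver, delivered_index) are made up for the example.

  import queue
  import threading

  # Queuing: an in-memory queue stands in for the retention layer.
  pipeline = queue.Queue()
  # Indexing: positions of records that have been delivered so far.
  delivered_index = []

  def load(records):
      """Loading / ingestion: capture inputs and place them on the queue."""
      for record in records:
          pipeline.put(record)

  def retrieve(receiver, expected):
      """Retrieval / consumption: read records off the queue and deliver them."""
      for position in range(expected):
          record = pipeline.get()
          delivered_index.append(position)   # track what has been read
          receiver(record)

  inputs = [{"id": i, "payload": f"event-{i}"} for i in range(3)]
  producer = threading.Thread(target=load, args=(inputs,))
  consumer = threading.Thread(target=retrieve, args=(print, len(inputs)))
  producer.start(); consumer.start()
  producer.join(); consumer.join()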

Terminology

Depending on the specific implementation of a data pipeline, the terminology used can vary:

Data

  • Data

  • Records

  • Streams

  • Messages
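
Whichever term an implementation uses, the unit flowing through the pipeline is typically a keyed, timestamped payload. A minimal sketch of such a record, with illustrative field names:

  from dataclasses import dataclass, field
  import time

  @dataclass
  class Record:
      """One unit of data in the pipeline; field names are illustrative."""
      key: str                                              # used for partitioning / lookup
      value: bytes                                          # the payload itself
      timestamp: float = field(default_factory=time.time)   # capture time

  msg = Record(key="sensor-42", value=b'{"temp": 21.5}')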

Indexes

  • Indexes

  • Topics

  • Consumers

Loading Functions

  • Loading

  • Ingestion

  • Importing

  • Registration

  • Subscribing

  • Publishing

  • Connectors

  • Producing/Producers
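
These loading terms all name the producer side of the pipeline: capturing an input and publishing it into a named topic. A minimal sketch, assuming an illustrative in-memory topic store:

  from collections import defaultdict

  topics = defaultdict(list)   # illustrative in-memory "broker" storage

  def publish(topic, record):
      """Loading / ingestion / producing: append a record to a named topic."""
      topics[topic].append(record)
      return len(topics[topic]) - 1   # position (offset) of the new record

  offset = publish("clicks", {"user": "u1", "page": "/home"})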

Queuing Functions

  • Queuing

  • Streaming

  • Logging

  • Storing

  • Messaging

  • Brokering/Brokers

  • Threading/Threads

  • Clustering/Clusters
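
The queuing terms name the component that retains records between loading and retrieval, usually for a bounded period. A minimal sketch of time-based retention, with an illustrative one-hour window:

  import time
  from collections import deque

  RETENTION_SECONDS = 60 * 60      # illustrative retention period

  log = deque()                    # (timestamp, record) pairs, oldest first

  def append(record):
      """Store a record together with its arrival time."""
      log.append((time.time(), record))

  def expire():
      """Drop records older than the retention window."""
      cutoff = time.time() - RETENTION_SECONDS
      while log and log[0][0] < cutoff:
          log.popleft()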

Indexing Functions

  • Indexing

  • Tracking

  • Topics
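
Indexing and tracking amount to remembering how far along each topic readers have progressed, so retrieval can resume from the right position. A minimal sketch with illustrative names:

  offsets = {}   # topic name -> next position to read

  def track(topic, position):
      """Record the highest position read so far for a topic."""
      offsets[topic] = max(offsets.get(topic, 0), position)

  def next_position(topic):
      """Where the next read for this topic should start."""
      return offsets.get(topic, 0)

  track("clicks", 3)
  start = next_position("clicks")   # resumes at position 3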

Catalog Functions

  • Cataloging

  • Categorizing
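
Cataloging and categorizing cover the metadata layer: which topics exist, how they are grouped, and optionally what shape their records take. A small illustrative sketch:

  catalog = {}   # topic name -> metadata

  def register(topic, category, schema=None):
      """Catalog a topic under a category, optionally with a schema."""
      catalog[topic] = {"category": category, "schema": schema}

  def topics_in(category):
      """List the topics filed under a category."""
      return [t for t, meta in catalog.items() if meta["category"] == category]

  register("clicks", category="web", schema={"user": str, "page": str})
  web_topics = topics_in("web")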

Retrieval Functions

  • Connecting/Connectors

  • Listening/Listeners

  • Subscribing/Subscribers

  • Exporting

  • Reading

  • Consuming/Consumers

  • Distributing

  • Producing
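
The retrieval terms describe the consumer side: connecting, subscribing to a topic, and reading records onward from a tracked position. A minimal self-contained sketch with illustrative names:

  topics = {"clicks": [{"user": "u1"}, {"user": "u2"}, {"user": "u3"}]}
  offsets = {}   # consumer name -> (topic -> next position to read)

  def subscribe(consumer, topic):
      """Start a consumer at the beginning of a topic."""
      offsets.setdefault(consumer, {})[topic] = 0

  def consume(consumer, topic, max_records=10):
      """Read up to max_records from the consumer's current position."""
      start = offsets[consumer][topic]
      records = topics.get(topic, [])[start:start + max_records]
      offsets[consumer][topic] = start + len(records)   # advance the index
      return records

  subscribe("reporting", "clicks")
  batch = consume("reporting", "clicks", max_records=2)   # first two records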

Key Performance Factors

Key performance factors to consider and monitor include:

  • throughput

  • real-time response times

  • batch response times

  • queued-data retention period (how long records remain retrievable from the queue)
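
These factors can be sampled directly around the processing or retrieval step. A minimal measurement sketch, with an illustrative stand-in workload:

  import time

  def measure(process, records):
      """Time a processing function over a batch and report simple metrics."""
      started = time.perf_counter()
      latencies = []
      for record in records:
          t0 = time.perf_counter()
          process(record)
          latencies.append(time.perf_counter() - t0)
      elapsed = time.perf_counter() - started
      return {
          "throughput_per_s": len(records) / elapsed if elapsed else float("inf"),
          "max_latency_s": max(latencies) if latencies else 0.0,   # real-time view
          "batch_time_s": elapsed,                                 # batch view
      }

  stats = measure(lambda r: time.sleep(0.001), range(100))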
