The Science of Machine Learning & AI

Mathematics - Data Science - Computer Science

Copyright © 2016-2025 Don Cowan All Rights Reserved

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

From the research paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Transformers Expanding Scope

October 10, 2020 in Artificial Neural Network, Transformers, Attention, Normalization, Linear Vector Projection

Over the past couple of years, Transformer Neural Networks (TNNs) have begun to supplant Machine Learning model designs such as Recurrent Neural Networks for processing sequential data, for example in language processing.

Recently, the use of TNNs has expanded to image recognition, as depicted in the diagram above, where:

  • Attention - Attention mechanisms let a Machine Learning model relate tokens, which in this case are image patches, to each other regardless of how far apart they are in the sequence

  • Embedding - similar to Word Embedding, the process of mapping values to numeric vectors

  • Linear Projection - a Linear Vector Projection; in the diagram, a trainable linear map that takes each flattened image patch to a fixed-size embedding vector

  • MLP - Multilayer Perceptron, a feedforward Artificial Neural Network (ANN)

  • Multi-Head Attention - several Attention mechanisms run in parallel, each relating tokens to each other regardless of how far apart they are in the sequence

  • Norm - Normalization of data (Layer Normalization in the paper)

  • Transformer - Transformer Neural Networks are non-recurrent models used for processing sequential data
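
The components listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the paper's implementation: it uses a toy 32 x 32 grayscale image, a hypothetical embedding size of 8, random matrices in place of learned weights, and a single attention head rather than multi-head attention. It also leaves out the position embeddings, class token, normalization, and MLP layers. All function and variable names are assumptions made for the example.

```python
import math
import random


def split_into_patches(image, patch_size):
    """Cut an H x W image (list of rows) into flattened square patches."""
    patches = []
    for top in range(0, len(image), patch_size):
        for left in range(0, len(image[0]), patch_size):
            patches.append([
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ])
    return patches


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (both lists of rows)."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def softmax(values):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over a list of token vectors."""
    queries, keys, values = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(keys[0]))
    # Every token scores every other token, however far apart they are.
    scores = [[sum(q * k for q, k in zip(q_vec, k_vec)) / scale for k_vec in keys]
              for q_vec in queries]
    weights = [softmax(row) for row in scores]
    return matmul(weights, values), weights


def random_matrix(rows, cols, rng):
    """Stand-in for weights that a real model would learn during training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
image = [[rng.random() for _ in range(32)] for _ in range(32)]  # toy grayscale image
patches = split_into_patches(image, 16)  # 4 patches of 16 * 16 = 256 values
embed_dim = 8  # hypothetical embedding size
tokens = matmul(patches, random_matrix(256, embed_dim, rng))  # linear projection
w_q, w_k, w_v = (random_matrix(embed_dim, embed_dim, rng) for _ in range(3))
attended, weights = self_attention(tokens, w_q, w_k, w_v)

print(len(patches), "patches ->", len(tokens), "tokens of size", len(tokens[0]))
print("attention weights for the first patch:", [round(w, 3) for w in weights[0]])
```

Each row of `weights` shows how strongly one patch attends to every patch in the image, including itself, which is how the model relates patches regardless of their position.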

Advancements such as TNNs for image recognition continue to improve the effectiveness and efficiency of Machine Learning models.

Tags: Artificial Neural Networks