In previous articles, we explored a high-level overview of supervised and unsupervised neural networks. Today, we'll peel back another layer of this digital cortex and explore how these networks evolve from their 'naive' initial state to a 'trained' state where they can make decisions, identify patterns, or generate new creations. We previously explained concepts using coins as an analogy. We'll continue a little further with that theme.
Supervised Learning with Coins: The Labeled Collection
In the world of machine learning, 'training' a neural network is akin to coaching a new employee through the intricacies of sorting coins, a task that can be approached in two distinct ways: supervised and unsupervised learning. To grasp these concepts, it's crucial to understand 'labels.' Imagine each coin has a tag specifying its year of minting; that tag is a 'label.' It gives clear information about the coin, which can be used to guide the sorting process.
Now, let's explore how our coin analogy applies to these two foundational types of neural network training:
| Aspect | Supervised Learning | Unsupervised Learning |
| --- | --- | --- |
| Data Type | Labeled data | Unlabeled data |
| Learning | The algorithm learns from the provided labels. | The algorithm infers patterns from the data. |
| Feedback | Direct feedback (corrective) | Indirect feedback (no explicit correction) |
| Success Metric | Accuracy of sorting based on labels. | Quality of the discovered groupings or patterns. |
In supervised learning, the 'training' involves a set of data that's already labeled—like our coins with year tags. This method is akin to giving the employee a reference guide to sort coins into pre-defined categories. They receive direct feedback—such as being corrected when a coin is placed in the wrong year—and their performance is measured by how accurately they can apply these learnings to new coins.
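To make this concrete, here is a minimal supervised-learning sketch in Python. The coin measurements, the years, and the choice of scikit-learn are illustrative assumptions, not a prescribed setup:

```python
# A minimal supervised-learning sketch: classify coins by minting year
# from two made-up physical features. Requires scikit-learn.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical measurements: [mass_g, diameter_mm] for each coin.
features = [
    [2.5, 19.0], [2.6, 19.1], [2.5, 18.9],   # coins minted in 1995
    [3.1, 21.2], [3.0, 21.0], [3.2, 21.3],   # coins minted in 1996
]
labels = [1995, 1995, 1995, 1996, 1996, 1996]  # the 'year tags'

# Hold out some coins to check how well the rules generalize.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.33, random_state=42, stratify=labels
)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("Accuracy on unseen coins:", model.score(X_test, y_test))
```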
Conversely, unsupervised learning deals with unlabeled data, akin to a pile of coins without year tags. The employee, or algorithm in this case, learns to discern and create categories based on patterns they observe, such as color, size, and weight, sometimes employing clustering algorithms like K-means or using dimensionality reduction techniques like PCA to uncover these patterns. They adjust their sorting criteria based on the inherent structure of the coins, without knowing if there's a 'right' or 'wrong' way to sort them. The success isn't about matching a label but about how effectively the algorithm can group coins and apply this to new sets.
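For contrast, here is a minimal unsupervised sketch: the same kind of measurements, but with no year tags attached. K-means invents its own groupings (again, the data and library choice are assumptions made for illustration):

```python
# An unsupervised sketch: K-means groups unlabeled coins purely by
# similarity of their (made-up) physical features. Requires scikit-learn.
from sklearn.cluster import KMeans

# The same hypothetical measurements, but with no year labels attached.
coins = [
    [2.5, 19.0], [2.6, 19.1], [2.5, 18.9],
    [3.1, 21.2], [3.0, 21.0], [3.2, 21.3],
]

# Ask for two groups; K-means decides what those groups mean.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(coins)
print("Cluster assignments:", kmeans.labels_)   # e.g., [0 0 0 1 1 1]
print("Cluster centers:", kmeans.cluster_centers_)
```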
Understanding these two approaches helps businesses decide how to implement machine-learning solutions. Whether sorting coins or sorting through complex data, the choice between supervised and unsupervised learning hinges on the nature of the data at hand and the specific goals of the task.
Balancing Knowledge and Flexibility in Training
When you're teaching an employee to sort coins by year, the goal is for them to grasp the broad features that characterize coins from different eras, rather than memorizing the specific details of each coin in front of them. It's akin to guiding a student to understand the principles behind the lessons instead of memorizing the textbook. Just as an overzealous student might memorize facts without grasping the underlying concepts, struggling to adapt this knowledge to new problems, an employee might focus too narrowly on the coins they've already handled. To prevent this sort of "overfitting" in training, we introduce a variety of coins from different piles or sets throughout the process, while also ensuring that the model is not so simplistic that it misses the underlying trend in the data, which would lead to "underfitting."
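The effect is easy to demonstrate. The sketch below uses synthetic data and decision trees of varying depth as stand-ins: an over-complex model aces the training set but slips on held-out data, while an overly simple one does poorly on both:

```python
# A sketch of over- and underfitting: the same synthetic data fit by
# trees of different depth. A very deep tree memorizes (high train,
# lower test accuracy); a depth-1 stump is too simple for the pattern.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 1, 3):  # very deep, very shallow, in between
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"test={tree.score(X_te, y_te):.2f}")
```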
In the image below, three neural network models illustrate the concepts of overfitting, ideal model complexity, and underfitting in machine learning. On the left, the model labeled "Overfitting" has an excessive number of connections and layers, representing a highly complex network that may perform exceptionally on training data but poorly on unseen data due to capturing noise rather than the underlying pattern. On the right, the "Underfitting" model has sparse connections and layers, suggesting a model that is too simple to capture the complexity of the data, resulting in poor performance on both training and new data. The middle model, marked "Ideal," shows a balanced structure with an optimal number of layers and connections, indicating a well-tuned model that generalizes well to new data. This visual metaphor helps convey the importance of model complexity in machine learning and the trade-off between a model's ability to learn from data and its capacity to generalize from that learning.
Cross-Validation Explained Through Coin Sorting
In cross-validation, we don't just show the neural network one set of data during training. Instead, we divide our data into several parts, or 'folds.' The neural network trains on some of these folds and then validates what it has learned on a different fold, a bit like a pop quiz. We rotate which fold is used for validation so that every part of the data serves as the test set at some point; this is k-fold cross-validation in practice.
Returning to our coin analogy, it would be like having several bags of coins and asking the employee to sort one bag at a time while using coins from the other bags to test their sorting rules. This way, you make sure they can't just memorize the coins in front of them; they need to learn the general sorting principles that can apply to any bag of coins they might encounter.
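Here is what that rotation looks like in code, a minimal sketch assuming scikit-learn and synthetic data in place of real coins:

```python
# A k-fold cross-validation sketch: five 'bags of coins', each used
# once as the quiz while the model trains on the other four.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print("Per-fold accuracy:", scores.round(2))
print("Mean accuracy:", scores.mean().round(2))
```

If the per-fold scores vary wildly, that itself is a warning sign: the model may be latching onto quirks of particular subsets rather than general patterns.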
The Role of Cross-Validation in Avoiding Overfitting
By using cross-validation, you help ensure the employee doesn't overfit to one specific set of coins. They develop a robust understanding of how to sort any coin by year, not just the ones they've seen. In machine learning, cross-validation helps us catch overfitting early by showing us how well the model performs on different subsets of the data—giving us confidence that it's truly learning the patterns rather than memorizing the noise.
This balanced training, with the help of cross-validation, is crucial for creating a model—or training an employee—that performs well in the real world, ready for the variety and unpredictability of new coins or data it will encounter.
How AI Learns to Improve: The Importance of Feedback in Machine Learning
The loss function in a neural network is akin to a performance review that guides the employee's improvement. It's a measure of how well the network is doing its job. If our employee incorrectly sorts a 1995 coin into the 1996 pile, the loss function is the supervisor pointing out the mistake and suggesting a closer look at the coin's features.
In technical terms, the loss function calculates the difference between the neural network's predictions and the actual target values. It's the cornerstone of learning, as it provides a quantitative basis for the network to adjust its weights, which means improving its sorting strategy. A good loss function will guide the network towards making fewer mistakes over time, just as constructive feedback helps our employee become more adept at sorting coins.
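As a concrete illustration, here is mean squared error, one common loss function, computed by hand on a toy version of the coin-sorting task (the years and predictions are invented):

```python
# A loss-function sketch in plain NumPy: mean squared error between the
# network's predicted years and the true years. Bigger mistakes cost more.
import numpy as np

true_years = np.array([1995, 1996, 1995, 1997])
predicted  = np.array([1995, 1995, 1995, 1999])  # one off by 1, one by 2

mse = np.mean((predicted - true_years) ** 2)
print("Mean squared error:", mse)  # (0 + 1 + 0 + 4) / 4 = 1.25
```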
In essence, these components — avoiding overfitting and the wise use of a loss function — are critical for successfully training neural networks. They ensure that our digital 'employee' doesn't just memorize the data but learns to apply its rules to sort through any new coins — or data — it encounters.
Through iterations of this process, bolstered by feedback mechanisms like backpropagation and optimization algorithms, the neural network fine-tunes its parameters. This is how it evolves from a naïve state, with random guesses, to a trained state that makes informed predictions and decisions, much like our employee grows from a novice sorter to an expert coin classifier.
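The sketch below compresses that whole loop into a single 'neuron' learning a line by gradient descent; the gradient computation stands in for what backpropagation does across millions of weights (the target function and learning rate are illustrative choices):

```python
# A minimal training-loop sketch: one linear 'neuron' learns y = 2x + 1
# by gradient descent. Each iteration computes the loss, derives the
# gradients (what backpropagation does at scale), and nudges the weights.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1                      # target the network should discover
w, b = np.random.randn(), 0.0      # naive initial state: a random guess
lr = 0.05                          # learning rate

for step in range(500):
    pred = w * x + b
    error = pred - y
    loss = np.mean(error ** 2)     # the 'performance review'
    grad_w = 2 * np.mean(error * x)  # dLoss/dw
    grad_b = 2 * np.mean(error)      # dLoss/db
    w -= lr * grad_w               # adjust weights downhill
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}, loss={loss:.4f}")  # ~2.00, ~1.00
```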
The Need for High Compute in Training: The Role of GPUs
Training an AI model is a resource-intensive task that involves processing vast amounts of data and performing complex mathematical operations millions or even billions of times. This is where GPUs (Graphics Processing Units) come into play. Originally designed for rendering graphics, GPUs are incredibly efficient at matrix and vector computations, which are fundamental to the operations in neural network training. Their architecture allows them to execute many parallel operations simultaneously, significantly accelerating the training process.
Parallel Processing Power
Unlike CPUs, which are optimized for sequential task processing and handling a broad range of computations, GPUs are composed of thousands of smaller, more efficient cores designed for parallel processing. When training a neural network, a GPU can update thousands of weights at once, making it vastly faster than a CPU for this kind of task.
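In code, the hand-off to a GPU can be a one-liner. This sketch assumes PyTorch is installed; the same matrix multiplication runs on whichever device is present:

```python
# A sketch of the GPU hand-off, assuming PyTorch. On a GPU, thousands
# of cores work on the matrix product in parallel; on a CPU, the same
# line still runs, just without that parallelism.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

c = a @ b  # one call, massively parallel on a GPU
print(f"Computed a {c.shape[0]}x{c.shape[1]} product on {device}")
```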
The Computational Heaviness of Training
During training, neural networks go through a process of backpropagation, where errors are calculated and propagated back through the network to adjust the weights. This process requires a considerable amount of computation and memory bandwidth. GPUs excel in this area due to their high number of cores and specialized design, which allows them to handle multiple calculations at lightning speed. Additionally, tasks like gradient descent optimization and the tuning of hyperparameters are computationally expensive operations that benefit from the raw power of GPUs.
Why GPUs Aren't as Necessary for Inference
Once a model is trained, the heavy lifting has been done. The model no longer needs to learn; it simply applies what it has learned to make predictions. This task, while still computationally demanding, is less intense and can be handled efficiently by CPUs, which are more common in everyday devices. During inference, the neural network performs a straightforward series of matrix multiplications as data passes through the trained network, a task well within the capabilities of modern CPUs, especially when optimized for these operations.
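A minimal illustration: a two-layer forward pass is nothing more than two matrix multiplications and an activation, shown here with random stand-in weights where a trained model's values would go:

```python
# An inference sketch: once trained, a forward pass is just a fixed
# chain of matrix multiplications and activations, comfortably within
# a CPU's reach. Weights here are random stand-ins for trained values.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)   # layer 1 (stand-ins)
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)   # layer 2

def predict(x):
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU activation
    return h @ W2 + b2                # output scores

x = rng.standard_normal(4)            # one input sample
print("Output scores:", predict(x).round(2))
```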
Energy Efficiency in Inference and the Role of Edge Devices
Energy efficiency becomes paramount when we talk about edge devices. But what exactly is an edge device? An edge device is a piece of hardware that processes data closer to the source of data generation (like a camera or a smartphone) rather than relying on a centralized data-processing warehouse. These devices typically have constraints on power, size, and compute capability. CPUs in these devices are engineered to provide the necessary compute power for real-time AI applications such as voice recognition or image processing, while also conserving energy to maintain battery life and device longevity.
Division of Labor
In the realm of AI, GPUs play a crucial role in training models by leveraging their parallel processing capabilities to handle the computationally intense tasks of learning and optimization. Once the model is trained, however, the baton is passed to CPUs, especially in edge devices, for efficient and energy-conservative inference. This division of labor between GPUs for training and CPUs for inference is what enables AI to be both powerful in development and practical in deployment.