Unit 1
Machine Learning (ML)
Tutorial
Machine Learning, often abbreviated as ML, is a branch of Artificial Intelligence (AI) concerned with developing algorithms and statistical models that allow computers to learn from data and make predictions or decisions without being explicitly programmed. In other words, machine learning algorithms learn patterns and relationships from data, allowing them to generalize from known instances and make predictions or draw conclusions about new, unseen data.
How does Machine Learning Work?
Broadly, the Machine Learning process includes Project Setup, Data Preparation, Modeling, and Deployment. The following figure demonstrates the common working process of Machine Learning; the sequential workflow is as follows −
Stages of Machine Learning
A detailed sequential process of Machine Learning includes the following phases −

1. Data collection: Data collection is the initial step in the machine learning process. Data is fundamental to machine learning; its quality and quantity directly affect model performance. Data may be collected from different sources such as databases, text files, images, sound files, or web scraping. Once collected, the data must be prepared for machine learning: organized in an appropriate format, such as a CSV file or database, and checked for usefulness in solving your problem.
2. Data pre-processing: Pre-processing is a key step in the machine learning process. It involves removing duplicate records, fixing errors, handling missing data (either by eliminating or filling it in), and normalizing and formatting the data. Pre-processing improves the quality of your data and ensures that your machine learning model can read it correctly; this step can significantly improve model accuracy.
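As a sketch of what pre-processing can look like in practice, here is a pure-Python routine that removes duplicate records and fills missing numeric values with the column mean. The tiny dataset is invented for illustration; real projects typically use a library such as pandas.

```python
def preprocess(rows):
    """rows: list of dicts; None marks a missing value."""
    # 1. Delete duplicate records while preserving order.
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 2. Fill missing numeric fields with the column mean.
    cols = {c for r in unique for c in r}
    for c in cols:
        vals = [r[c] for r in unique if r.get(c) is not None]
        mean = sum(vals) / len(vals)
        for r in unique:
            if r.get(c) is None:
                r[c] = mean
    return unique

data = [
    {"age": 25, "income": 50000},
    {"age": 25, "income": 50000},    # exact duplicate
    {"age": None, "income": 60000},  # missing age
]
clean = preprocess(data)
```

After this runs, the duplicate row is gone and the missing age is filled with the mean of the observed ages.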
3. Choosing the right model: The next step is to select a machine learning model. Once the data is prepared, it can be applied to models such as linear regression, decision trees, or neural networks. The choice generally depends on the nature of your data and your problem: the size and type of the data, the complexity of the task, and the available computational resources should all be taken into account.
4. Training the model: Once you have chosen a model, the next step is to train it with the prepared data. Training means feeding the data to the model and letting it adjust its internal parameters to predict the output more accurately. Overfitting and underfitting must be avoided during training.
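To make "adjusting its parameters" concrete, here is a toy training loop: gradient descent fitting a line y = w·x + b to four invented points by repeatedly nudging w and b to reduce the mean squared error. The data, learning rate, and iteration count are illustrative choices; ML libraries perform this kind of loop internally at much larger scale.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # true relationship: y = 2x + 1

w, b = 0.0, 0.0             # parameters start with no knowledge
lr = 0.01                   # learning rate (illustrative choice)

for _ in range(5000):
    # Mean-squared-error gradients with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Nudge each parameter a small step against its gradient.
    w -= lr * grad_w
    b -= lr * grad_b
```

After training, w and b have converged close to the true slope 2 and intercept 1.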
5. Evaluating the model: Once a model has been trained, it is important to assess its performance before deployment. This means testing the model on new data that it has not seen during training. Common evaluation metrics include accuracy for classification problems, precision and recall for binary classification problems, and mean squared error for regression problems.
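The metrics named above can be computed by hand. The sketch below uses invented true labels and predictions; a library such as scikit-learn provides tested implementations of the same metrics.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred):
    # Of everything predicted positive, how much really was positive?
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp)

def recall(y_true, y_pred):
    # Of everything truly positive, how much did we catch?
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

def mean_squared_error(y_true, y_pred):
    # Average squared gap between prediction and truth (regression).
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1, 1, 0]   # actual labels (invented)
y_pred = [1, 0, 0, 1, 1]   # model predictions (invented)
```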
6. Hyperparameter tuning and optimization: After evaluating the model, you may need to adjust its hyperparameters to make it perform better. Common techniques include grid search, where you try different combinations of hyperparameter values, and cross-validation, where you divide your data into subsets and train and validate the model on each subset to ensure it performs well across different splits of the data.
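The two techniques can be combined in a few lines. The sketch below grid-searches a single threshold hyperparameter of a deliberately trivial rule-based classifier, scoring each candidate with 3-fold cross-validation; the dataset, candidate grid, and classifier are all invented for illustration.

```python
def cross_val_score(evaluate, data, k=3):
    # Split data into k folds; average the score over held-out folds.
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        valid = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(evaluate(train, valid))
    return sum(scores) / k

# Invented 1-D dataset: (feature value, class label).
data = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]

def make_evaluator(threshold):
    def evaluate(train, valid):
        # "Training" is trivial for this rule-based classifier; we simply
        # score the fixed threshold on the held-out fold.
        correct = sum((x > threshold) == bool(y) for x, y in valid)
        return correct / len(valid)
    return evaluate

grid = [0.2, 0.5, 0.7]   # candidate hyperparameter values
best_threshold = max(
    grid, key=lambda t: cross_val_score(make_evaluator(t), data)
)
```

The grid value 0.5 separates the two classes on every fold, so it wins the search.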
7. Prediction and deployment: Once the model has been trained and optimized, it is ready to make predictions on new data. This is done by feeding new data to the model and using its output for decision-making or further analysis. Deployment involves integrating the model into a production environment where it can process real-world data and provide timely information.
Types of Machine Learning
Machine learning models fall into the following categories:
1. Supervised Machine Learning: Supervised machine learning uses labeled datasets to train algorithms to classify data or predict outcomes. As input data is fed into the model, the model adjusts its weights until it fits the data well; cross validation is typically used during this process to check that the model is neither overfitted nor underfitted.

Supervised learning helps organizations solve real-world problems at scale, such as classifying spam into a separate folder from your inbox. Common supervised learning methods include neural networks, naïve Bayes, linear regression, logistic regression, random forests, and support vector machines (SVM).
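As a concrete taste of supervised learning, here is a minimal naïve Bayes text classifier trained on a handful of invented labeled phrases. A real system would use a library such as scikit-learn and far more data; this sketch only illustrates the learn-from-labels idea.

```python
import math
from collections import Counter

# Labeled training data (phrases and labels invented for illustration).
train = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

word_counts = {"spam": Counter(), "ham": Counter()}
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def predict(text):
    scores = {}
    for label in word_counts:
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(label_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```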
2. Unsupervised Machine Learning: Unsupervised machine learning analyses and clusters unlabelled datasets. The algorithms find hidden patterns or data groupings without human intervention. This method is useful for exploratory data analysis, cross-selling, customer segmentation, and image and pattern recognition.

It can also reduce the number of model features through dimensionality reduction, using prominent methods such as principal component analysis (PCA) and singular value decomposition (SVD). Neural networks, k-means clustering, and probabilistic clustering are some popular unsupervised learning methods.
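The clustering idea can be sketched with a tiny one-dimensional k-means loop; the points and starting centers below are invented for illustration, and real uses would rely on a library implementation over many dimensions.

```python
def kmeans_1d(points, centers, iters=20):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]       # two obvious groups
centers = kmeans_1d(points, centers=[0.0, 10.0])
```

No labels were given, yet the centers settle on the two natural groups around 1 and 9.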
3. Semi-supervised learning: As its name implies, semi-supervised learning is an integration of supervised and unsupervised learning. This method uses both labeled and unlabelled data to train ML models for classification and regression tasks. Semi-supervised learning is a good choice when you do not have enough labeled data for a supervised learning algorithm.

Hence, it is an appropriate method when data is only partially labeled. Self-training, co-training, and graph-based labeling are some popular semi-supervised learning methods.
4. Reinforcement Machine Learning: Reinforcement learning is similar to supervised learning in that the model receives feedback, but it does not use labeled sample data to train the algorithm. Instead, the model learns by trial and error.

A series of good results will be reinforced to create the optimal proposal or
policy for a specific problem.
Common Machine Learning Algorithms
Several machine learning algorithms are commonly used. These
include:
·
Neural networks: Neural networks
function similarly to the human brain, comprising multiple linked processing
nodes. Neural networks excel at pattern identification and are used in
different applications such as natural language processing, image recognition,
speech recognition, and creating images.
·
Linear regression: This algorithm
predicts numerical values using a linear relationship between variables. For
example, linear regression is used to forecast housing prices based on past
data in a particular area.
·
Logistic regression: This supervised
learning method predicts categorical variables, such as "yes/no"
replies to questions. It is suitable for applications such as spam
classification and quality control on a production line.
·
Clustering: Clustering algorithms
use unsupervised learning to find patterns in data and organise it accordingly.
Computers can assist data scientists by identifying differences between data
items that humans have overlooked.
·
Decision trees: Decision trees are
useful for categorising data and for regression analysis, which predicts
numerical values. A tree structure can be used to illustrate the branching sequence
of linked decisions used in decision trees. Unlike neural networks, decision
trees can be easily validated and audited.
·
Random forests: A random forest predicts a value or category by combining the results of many decision trees.
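The decision-tree idea above can be shown in miniature as a one-level tree, a "decision stump", that picks the single threshold best separating two invented classes; a full decision tree simply applies this splitting step recursively.

```python
# (feature value, class label) pairs, invented for illustration.
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]

def best_stump(data):
    # Candidate thresholds: midpoints between consecutive feature values.
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def accuracy(t):
        # Rule: predict class 1 whenever the feature exceeds the threshold.
        return sum((x > t) == bool(y) for x, y in data) / len(data)

    return max(candidates, key=accuracy)

threshold = best_stump(data)
```

The midpoint 2.5 classifies every example correctly, so it is chosen.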
Importance of Machine Learning
Machine Learning is important in automation, extracting insights
from data, and decision-making processes. It has its significance due to the
following reasons:
·
Data processing: The main reason machine learning has become so important is its ability to process and make sense of large amounts of data. Traditional methods of data analysis are becoming insufficient given the explosion of digital information from social media, sensors, and other sources. This data hides patterns and invaluable insight for decision-making, which machine learning algorithms can extract.
·
Data-driven insights: Machine learning
algorithms can find patterns, trends, and correlations in big data sets that
humans cannot. Better decisions and forecasts can be made with this
information.
·
Automation: Machine learning automates manual activities, saving time and decreasing errors. By learning from data and improving over time, ML algorithms can perform previously manual tasks, freeing humans to focus on more complex and creative work. This not only increases efficiency but also opens up new possibilities for innovation. Data entry, classification, and anomaly detection can all be automated with machine learning.
·
Personalization: User preferences and behavior can be analyzed with machine learning algorithms to generate personalized recommendations and experiences. This is widely used in e-commerce, social media, and streaming services as a way to increase user engagement and satisfaction.
·
Predictive analytics: Machine learning models can be trained to predict future outcomes based on past data. This is useful in applications like sales forecasting, risk management, and demand planning.
·
Optimization: Machine learning algorithms optimize systems and processes for efficiency and performance. Applications include supply chain logistics, resource allocation, and smart-grid energy consumption.
·
Pattern recognition: Machine learning is useful in image, audio, and natural language processing because it can recognize complicated data patterns quickly and accurately.
·
Healthcare: Machine learning is used in disease diagnosis, outbreak prediction, personalized treatment planning, medical imaging, and drug discovery. It supports accurate diagnosis by processing medical images, genomic data, and electronic health records.
·
Finance: Machine learning is used for
credit scoring, algorithmic trading, and fraud detection.
·
Retail: Machine learning can also be used
for recommendation systems, supply chains, or customer service.
·
Fraud detection and cybersecurity: Machine learning algorithms can detect patterns of fraudulent behavior in financial transactions. By detecting and mitigating security threats in real time, they are also used to enhance cybersecurity.
·
Continuous improvement: It is possible
to train and update machine learning models with new data at regular intervals,
enabling them to adapt to changes in the environment as well as improve over
time.
Machine Learning enables organizations to take advantage of the
power of data to gain insight, streamline processes and drive innovation
throughout a variety of sectors.
Applications of Machine Learning
Nowadays, Machine Learning is used almost everywhere. Some of the most common application areas are:
·
Speech recognition: Also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text, this capability uses natural language processing (NLP) to translate human speech into a written format. Many mobile devices incorporate speech recognition to perform voice search (for example, Siri) or to improve text accessibility.
·
Customer service: Chatbots are replacing
human operators on websites and social media, affecting client engagement.
Chatbots answer shipping FAQs, offer personalized advice, cross-sell products,
and recommend sizes. Some common examples are virtual agents on e-commerce
sites, Slack and Facebook Messenger bots, and virtual and voice assistants.
·
Computer vision: This artificial
intelligence technology allows computers to derive meaningful information from
digital images, videos, and other visual inputs that can then be used for
appropriate action. Computer vision, powered by convolutional neural networks,
is used for photo tagging on social media, radiology imaging in healthcare, and
self-driving cars in the automotive industry.
·
Recommendation engines: Using past data patterns, AI algorithms can detect trends that help develop more effective marketing strategies. Online retailers use recommendation engines to give customers relevant product recommendations during the purchasing process.
·
Robotic process automation (RPA): Also
known as software robotics, RPA uses intelligent automation technologies to
perform repetitive manual tasks.
·
Automated stock trading: AI-driven
high-frequency trading platforms are designed to optimize stock portfolios and
make thousands or even millions of trades each day without human intervention.
·
Fraud detection: Machine learning can flag suspicious transactions for banks and other financial institutions. A model can be trained with supervised learning on known recent fraudulent transactions, while anomaly detection can identify transactions that look unusual and need to be followed up.
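A crude sketch of the anomaly-detection idea: flag any transaction amount more than two standard deviations from the mean. The amounts are invented, and real fraud systems use far more sophisticated models than a simple z-score rule.

```python
import statistics

# Invented transaction amounts; the last one is clearly unusual.
amounts = [20.0, 35.0, 18.0, 25.0, 30.0, 22.0, 5000.0]

mean = statistics.mean(amounts)
sd = statistics.stdev(amounts)

# Flag transactions more than two standard deviations from the mean.
flagged = [a for a in amounts if abs(a - mean) > 2 * sd]
```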
Target Audience
This machine learning tutorial has been prepared for those who want to learn the basics and advanced topics of Machine Learning. In a broader sense, ML is a subset of Artificial Intelligence (AI) that focuses on developing algorithms and models that allow computers to learn from data and make predictions or decisions without being explicitly programmed to do so. Machine learning requires data, which can be text, images, audio, numbers, or video; the quality and quantity of data considerably affect model performance. Features are the data attributes used to predict or decide, and feature selection and engineering entail selecting and formatting the most relevant features for the model.
Prerequisites to Learn Machine Learning
You should have a basic understanding of the technical aspects of Machine Learning. Learners should be familiar with data and information fundamentals: structured, unstructured, and semi-structured data, data processing, and the basics of Artificial Intelligence. Familiarity with labeled and unlabelled data, feature extraction, and their application in ML to solve common problems is a must.
Algorithms and mathematical models are the most essential things
to learn before exploring Machine Learning concepts. These prerequisites give a
solid basis for Machine Learning, but it's also important to understand that
the specific requirements may vary as per Machine Learning models, complexity,
cutting-edge technologies, and nature of the work.
Need for Machine Learning
Human beings are, at this moment, the most intelligent and advanced species on earth because they can think, evaluate, and solve complex problems. AI, on the other hand, is still in its initial stage and has not surpassed human intelligence in many aspects.
Then
the question is, what is the need to make machines learn? The most suitable
reason for doing this is “to make decisions, based on data, with efficiency and
scale”.
Lately,
organizations are investing heavily in newer technologies like Artificial
Intelligence, Machine Learning and Deep Learning to get the key information
from data to perform several real-world tasks and solve problems. We can call
it data-driven decisions taken by machines, particularly to automate the
process.
These data-driven decisions can be used, instead of hand-written program logic, for problems that cannot be programmed explicitly. The fact is that we cannot do without human intelligence, but we also need to solve real-world problems efficiently and at a huge scale. That is why the need for machine learning arises.
History of Machine Learning
The history of Machine Learning dates back to 1959, when Arthur Samuel wrote a program that calculated the winning probability in checkers for each side.
Well, the evolution of machine learning through the decades started with the question, "Can machines think?". Then came the rise of neural networks in the 1960s and 1970s. Machine learning continued to advance through statistical methods such as Bayesian networks and decision tree learning.
The Deep Learning revolution started in the 2010s with advances in tasks such as natural language processing, convolutional neural networks, and speech recognition. Today, machine learning has become a revolutionary technology that is part of all fields, from healthcare to finance and transportation.
Machine Learning Algorithms Vs. Traditional Programming
The difference between machine learning algorithms and traditional programming lies in how they are designed to handle tasks. Some comparisons based on different criteria are tabulated below:
| Criteria | Machine learning algorithms | Traditional programming |
| Problem solving approach | The computer learns by training a model on large datasets. | Explicit rules are given to the computer as manually written code. |
| Data | They rely heavily on data; it defines the performance of the model. | They rely less on data, as the output depends on the encoded logic. |
| Complexity of problem | Best suited for complex problems, such as image segmentation or natural language processing, which require identifying patterns and relationships in the data. | Best suited for problems with a defined outcome and logic. |
| Flexibility | Highly flexible and adaptable to different scenarios, especially as the model can be retrained with new data. | Limited flexibility, as changes must be made manually. |
| Outcome | The outcome is hard to predict exactly, as it depends on the training data, the model, and other factors. | The outcome can be accurately predicted if the problem and logic are known. |
Machine Learning Vs. Deep Learning
Deep learning is a sub-field of machine learning. The real difference between the two is the way the algorithm learns.
In machine learning, computers learn from large datasets using algorithms to perform tasks like prediction and recommendation, whereas deep learning uses a complex, layered structure of algorithms inspired by the human brain.
Deep learning models are generally more effective than classical machine learning models on complex problems. For example, autonomous vehicles are usually developed using deep learning, where a U-turn sign can be identified directly through image segmentation; with a classical machine learning model, the features of the signboard would first be hand-selected and then identified using a classifier algorithm.
Machine Learning Vs. Generative AI
Machine learning and Generative AI are different branches with different applications. While machine learning is used for predictive analysis and decision-making, Generative AI focuses on creating content, such as realistic images and videos, based on existing patterns.
Future of Machine Learning
Machine Learning is definitely going to be the next game changer in technology. Automated machine learning and synthetic data generation are new developments that make machine learning more accessible and efficient.
One big technology converging with machine learning is quantum computing, which uses quantum mechanical phenomena to create systems that exist in multiple states at the same time; quantum algorithms promise to process data at high speed. AutoML is another technology that combines automation and machine learning, potentially covering every stage from raw data to a model ready for deployment.
Multi-modal AI is an AI system that can interpret and analyze multi-sensory inputs, including text, speech, images, and sensor data. Generative AI is another emerging application of machine learning, focused on creating new content that mimics existing patterns. A few other emerging technologies with an impact on machine learning are edge computing, robotics, and many more.
Training
Training is the
process of teaching the machine learning algorithm to make accurate predictions
or decisions. This is done by providing the algorithm with a large dataset and
allowing it to learn from the patterns and relationships in the data. During
training, the algorithm adjusts its internal parameters to minimize the
difference between its predicted output and the actual output.
Testing
Testing is the
process of evaluating the performance of the machine learning algorithm on a
separate dataset that it has not seen before. The goal is to determine how well
the algorithm generalizes to new, unseen data. If the algorithm performs well
on the testing dataset, it is considered to be a successful model.
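A minimal sketch of holding out a test set (the 80/20 ratio and fixed seed are illustrative choices): shuffle the data once, train on the first portion, and evaluate on the rest so the test examples stay unseen during training.

```python
import random

data = list(range(10))       # stand-in for 10 labeled examples

rng = random.Random(0)       # fixed seed so the split is reproducible
rng.shuffle(data)

split = int(0.8 * len(data)) # 80% for training, 20% held out for testing
train, test = data[:split], data[split:]
```

Every example lands in exactly one of the two sets, which is what makes the test score an honest estimate of generalization.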
Overfitting
Overfitting
occurs when a machine learning model is too complex and fits the training data
too closely. This can lead to poor performance on new, unseen data because the
model is too specialized to the training dataset. To prevent overfitting, it is
important to use a validation dataset to evaluate the model's performance and
to use regularization techniques to simplify the model.
Underfitting
Underfitting
occurs when a machine learning model is too simple and cannot capture the
patterns and relationships in the data. This can lead to poor performance on
both the training and testing datasets. To prevent underfitting, we can use several techniques, such as increasing model complexity, collecting more data, reducing regularization, and feature engineering.
It is important
to note that preventing underfitting is a balancing act between model
complexity and the amount of data available. Increasing model complexity can
help prevent underfitting, but if there is not enough data to support the
increased complexity, overfitting may occur instead. Therefore, it is important
to monitor the model's performance and adjust the complexity as necessary.
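The two failure modes can be caricatured with two extreme "models" on an invented dataset: a lookup table that memorizes every training point (extreme overfitting) and a constant predictor of the training mean (extreme underfitting). The memorizer is perfect on the data it has seen but generalizes badly to the held-out point.

```python
train = [(1, 2.1), (2, 3.9), (3, 6.2)]  # (x, y) training points (invented)
test = [(4, 8.1)]                        # held-out point

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

table = dict(train)

def memorizer(x):
    # Perfect on training points, clueless everywhere else: overfitting.
    return table.get(x, 0.0)

mean_y = sum(y for _, y in train) / len(train)

def constant(x):
    # Ignores x entirely and predicts the training mean: underfitting.
    return mean_y
```

Training error alone is misleading here: the memorizer scores a perfect 0 on the training set yet loses badly to even the crude constant model on the test point.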
Machine Learning Model
Before discussing the machine learning model, we need to understand the following formal definition of ML given by Professor Tom Mitchell −
“A computer
program is said to learn from experience E with respect to some class of tasks
T and performance measure P, if its performance at tasks in T, as measured by
P, improves with experience E.”
The above definition focuses on three parameters, which are also the main components of any learning algorithm: task (T), performance (P), and experience (E). In this context, we can simplify the definition as −
ML is a field
of AI consisting of learning algorithms that −
·
Improve their performance (P)
·
At executing some task (T)
·
Over time with experience (E)
Based on the above, the following diagram represents a Machine Learning Model −
[Figure: Machine Learning Model]
Let us now discuss each of these in more detail −
Task(T)
From the perspective of a problem, we may define the task T as the real-world problem to be solved, such as finding the best house price in a specific location or the best marketing strategy. In machine learning, however, the definition of a task is different, because ML-based tasks are difficult to solve with a conventional programming approach.
A task T is said to be an ML-based task when it is defined by the process the system must follow to operate on data points. Examples of ML-based tasks are classification, regression, structured annotation, clustering, transcription, etc.
Experience (E)
As the name suggests, experience is the knowledge gained from the data points provided to the algorithm or model. Once provided with the dataset, the model runs iteratively and learns the inherent patterns; the learning thus acquired is called experience (E). Making an analogy with human learning, this resembles a human gaining experience from various attributes such as situations and relationships. Supervised, unsupervised, and reinforcement learning are some ways to learn or gain experience. The experience gained by our ML model or algorithm will be used to solve the task T.
Performance (P)
An ML algorithm is supposed to perform a task and gain experience with the passage of time. The measure that tells whether the algorithm is performing as expected is its performance (P). P is basically a quantitative metric that tells how well a model performs the task T, using its experience E. There are many metrics that help to understand ML performance, such as accuracy score, F1 score, confusion matrix, precision, recall, sensitivity, etc.
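Two of the metrics named here, the confusion matrix and the F1 score, can be computed by hand on invented predictions; libraries such as scikit-learn provide tested versions of both.

```python
def confusion_matrix(y_true, y_pred):
    # Rows: actual class; columns: predicted class (binary labels 0/1).
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def f1(y_true, y_pred):
    # Harmonic mean of precision and recall.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

y_true = [1, 0, 1, 1, 0, 0]   # actual labels (invented)
y_pred = [1, 0, 0, 1, 0, 1]   # model predictions (invented)
```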
