Whether it’s a self-driving car that knows when a person is crossing the road in front of it or an automatic speech recognition system that generates the captions on a YouTube video (YouTube CC), it wouldn’t be an exaggeration to say that artificial intelligence models are getting more intelligent and less artificial. But what makes all this possible? The answer is Deep Learning.

What is Deep Learning and How is it Different from Machine Learning?

Deep learning is loosely inspired by the way the human brain processes information: it detects patterns in data and uses them to make decisions or perform certain actions. It relies on artificial neural networks with many layers, which can learn directly from unstructured data such as images, audio, and text, and in some cases even from unlabeled data.

The hierarchy of this ground-breaking technology looks like this:

Artificial Intelligence → Machine Learning → Deep Learning

So, Deep Learning is a subset of Machine Learning, and Machine Learning is a subset of Artificial Intelligence. Even so, deep learning is more often than not confused with machine learning in general.

To understand the difference, consider an example: a machine that can differentiate between cherries and tomatoes.

Using Machine Learning 

To use machine learning, you have to feed the machine the features on which the two can be distinguished, such as the kind of stem or the size of the fruit. Choosing and extracting these features is done by people; the model then learns from them how to tell cherries from tomatoes. After training, it can differentiate between the two, as long as the chosen features actually capture the difference.
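To make the machine-learning approach concrete, here is a minimal sketch. The two features (diameter and stem length), the made-up measurements, and the nearest-centroid classifier are all illustrative assumptions rather than any particular production system; the point is that a person decides which features matter before any learning happens.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    diameter_cm: float  # hand-picked feature 1 (chosen by a person)
    stem_cm: float      # hand-picked feature 2 (chosen by a person)
    label: str


# Made-up training measurements, for illustration only.
TRAINING = [
    Fruit(2.0, 3.5, "cherry"),
    Fruit(2.3, 4.0, "cherry"),
    Fruit(1.8, 3.0, "cherry"),
    Fruit(6.5, 1.0, "tomato"),
    Fruit(7.2, 0.5, "tomato"),
    Fruit(5.8, 1.2, "tomato"),
]


def centroids(examples):
    """'Training' for a nearest-centroid classifier: average feature vector per class."""
    sums = {}
    for f in examples:
        s = sums.setdefault(f.label, [0.0, 0.0, 0])
        s[0] += f.diameter_cm
        s[1] += f.stem_cm
        s[2] += 1
    return {label: (d / n, st / n) for label, (d, st, n) in sums.items()}


def classify(diameter_cm, stem_cm, model):
    """Return the label whose centroid is closest in (diameter, stem) feature space."""
    def dist2(c):
        return (diameter_cm - c[0]) ** 2 + (stem_cm - c[1]) ** 2
    return min(model, key=lambda label: dist2(model[label]))


model = centroids(TRAINING)
print(classify(2.1, 3.8, model))  # cherry
print(classify(6.9, 0.8, model))  # tomato
```

If the hand-picked features stopped separating the classes (say, a very small tomato variety), the model would have no way to discover a better feature on its own.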

Using Deep Learning 

With deep learning, on the other hand, the neural network picks up the distinguishing features of cherries and tomatoes automatically from raw inputs such as images, without a human having to specify them. That is the kind of model today’s problems increasingly need.
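For contrast, the sketch below trains a very small neural network (one hidden layer, so not actually deep) on the raw pixel values of hypothetical 3x3 "images". Nothing tells it that size matters; it has to work that out from the pixels. The images, layer sizes, learning rate, and number of epochs are arbitrary assumptions chosen for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical 3x3 grayscale "images", flattened to 9 raw pixel values (0 or 1).
# Cherries are drawn as a one-pixel blob, tomatoes as a larger blob.
# The network is never told about "size"; it only sees pixels.
CHERRY_IMAGES = [
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
]
TOMATO_IMAGES = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 1, 1],
]
DATA = [(img, 0.0) for img in CHERRY_IMAGES] + [(img, 1.0) for img in TOMATO_IMAGES]

N_IN, N_HIDDEN, LR, EPOCHS = 9, 4, 0.5, 3000


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Randomly initialised weights: one hidden layer and one output unit.
w_hidden = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]
b_out = 0.0


def forward(pixels):
    """Return the hidden activations and the output (estimated probability of 'tomato')."""
    hidden = [sigmoid(sum(w * p for w, p in zip(row, pixels)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, out


# Plain stochastic gradient descent with backpropagation on cross-entropy loss.
for _ in range(EPOCHS):
    for pixels, target in DATA:
        hidden, out = forward(pixels)
        d_out = out - target  # dLoss / d(output pre-activation)
        for j in range(N_HIDDEN):
            # Error signal for hidden unit j, computed before w_out[j] is updated.
            d_hid = d_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
            w_out[j] -= LR * d_out * hidden[j]
            for i in range(N_IN):
                w_hidden[j][i] -= LR * d_hid * pixels[i]
            b_hidden[j] -= LR * d_hid
        b_out -= LR * d_out


def predict(pixels):
    return "tomato" if forward(pixels)[1] > 0.5 else "cherry"


print([predict(img) for img, _ in DATA])
```

A real image model would use many more layers, far more examples, and usually a library such as PyTorch or TensorFlow, but the training loop follows the same idea: make a prediction, measure the error, and nudge every weight to reduce it.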

But Why Deep Learning? 

The short answer is: the sheer amount of data.

A humongous amount of data is being generated, most of it unstructured, unlabeled, and raw, and it is impractical for humans to structure all of it for machines. What is needed, therefore, is a technology that lets machines learn without hand-structured data, the way humans do: through experience and patterns.
Hence the need for deep learning. It can work directly with unstructured, raw data (and needs a lot of it), and it finds the useful structure by itself.

Achievements of Deep Learning

The achievements of deep learning span robot navigation, self-driving cars, image recognition, and more. Today we’ll discuss one area that deep learning has revolutionized.

Automatic Speech Recognition

Automatic speech recognition (ASR) converts spoken words into text. The auto-generated YouTube subtitles (YouTube CC) are one example of speech recognition.

How does it work?

The algorithms that decode audio and transcribe it into text are trained on vast amounts of data.

Two models work together here:

  1. Acoustic model: maps the audio signal to phonemes (the sound units that words are composed of).
  2. Language model: estimates how likely a sequence of words is, that is, how probable each word is given the words around it in a sentence.
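To illustrate what "the probability of words in a sentence" means, here is a minimal bigram language model sketch: it estimates the probability of each word given only the previous word, with add-one smoothing so that unseen word pairs do not get zero probability. The three-sentence corpus is a made-up stand-in for the billions of words a real model sees, and production ASR systems typically use much larger n-gram or neural language models.

```python
import math
from collections import Counter

# Tiny made-up training corpus, for illustration only.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat lay on the rug",
]


def train_bigram(sentences):
    """Count word histories (unigrams) and adjacent word pairs (bigrams)."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(words[:-1])  # each word's count as a history
        bigrams.update(zip(words[:-1], words[1:]))
    return unigrams, bigrams


def sentence_logprob(sentence, unigrams, bigrams, vocab_size, alpha=1.0):
    """Log P(sentence) under an add-alpha smoothed bigram model."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for prev, word in zip(words[:-1], words[1:]):
        p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)
        logp += math.log(p)
    return logp


unigrams, bigrams = train_bigram(CORPUS)
vocab = {w for s in CORPUS for w in s.split()} | {"</s>"}  # possible next tokens
V = len(vocab)

good = sentence_logprob("the cat sat on the rug", unigrams, bigrams, V)
bad = sentence_logprob("rug the on sat cat the", unigrams, bigrams, V)
print(good > bad)  # True: the fluent word order is more probable
```

Even though "the cat sat on the rug" never appears in the corpus, every adjacent pair in it does, so the model rates it far above the scrambled version.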

In addition, a phonetic (pronunciation) dictionary links the two models by mapping each word to its sequence of phonemes. The acoustic model is learned through deep learning, that is, by a deep neural network trained on hours of recorded speech, while the language model is trained on billions of words of text.
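The following toy sketch shows how the three pieces can fit together. It assumes the acoustic model has already produced a single phoneme sequence; a hypothetical pronunciation dictionary turns that into candidate word sequences, and hand-picked, hypothetical language-model scores choose between homophones such as "write" and "right". A real decoder searches over many phoneme hypotheses and combines acoustic and language-model scores, rather than relying on the language model alone.

```python
import math

# Hypothetical pronunciation dictionary (ARPAbet-style phoneme symbols).
LEXICON = {
    "write": ["R", "AY", "T"],
    "right": ["R", "AY", "T"],
    "the": ["DH", "AH"],
    "letter": ["L", "EH", "T", "ER"],
}

# Hypothetical bigram log-probabilities that a language model might assign.
BIGRAM_LOGPROB = {
    ("<s>", "write"): math.log(0.02),
    ("<s>", "right"): math.log(0.02),
    ("write", "the"): math.log(0.3),
    ("right", "the"): math.log(0.05),
    ("the", "letter"): math.log(0.01),
}
UNSEEN = math.log(1e-6)  # fallback score for word pairs not in the table


def segmentations(phones, start=0):
    """Yield every word sequence whose pronunciations spell out phones[start:]."""
    if start == len(phones):
        yield []
        return
    for word, pron in LEXICON.items():
        if phones[start:start + len(pron)] == pron:
            for rest in segmentations(phones, start + len(pron)):
                yield [word] + rest


def lm_score(words):
    """Sum of bigram log-probabilities, starting from the sentence-start token."""
    history = ["<s>"] + words
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN) for pair in zip(history[:-1], history[1:]))


# Assumed acoustic-model output for someone saying "write the letter".
phones = ["R", "AY", "T", "DH", "AH", "L", "EH", "T", "ER"]
candidates = list(segmentations(phones))  # both spellings sound identical
best = max(candidates, key=lm_score)
print(" ".join(best))  # write the letter
```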

Automatic Speech Recognition in YouTube Subtitles

The subtitles (YouTube CC) that appear on YouTube videos are generated by automatic speech recognition.

Imagine this: if YouTube captions did not use automatic speech recognition, you would have to write the captions for an entire video on your own. But relax, you won’t have to! In earlier times, programmers had to write code for every object they wanted a system to recognize; now the system learns to recognize from examples and keeps getting better as it sees more data, with far less human intervention.

YouTube automatically recognizes the speech and assigns captions (YouTube CC) to videos using automatic speech recognition. It not only converts the speech in a video into text but also marks the pauses in between (for example, when we see ‘music playing’ written in the caption). The technology is still evolving, however, and needs further improvement and a little human help.

Google explains:

“We’ve combined Google’s automatic speech recognition (ASR) technology with the YouTube caption system to offer automatic captions or auto-caps for short. Auto-caps use the same voice recognition algorithms in Google Voice to automatically generate captions for video.
We’re also launching automatic caption timing, or auto-timing, to make it significantly easier to create captions manually. With auto-timing, you no longer need to have special expertise to create your own captions on YouTube. All you need to do is create a simple text file with all the words in the video and we’ll use Google’s ASR technology to figure out when the words are spoken and create captions for your video.”
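The final step of auto-timing, turning timed words into captions, can be sketched as below. It assumes the speech recognizer has already aligned each transcript word with start and end times (the timings here are invented), groups the words into cues of at most four words, and writes them in the SubRip (.srt) caption format. This is an illustration only, not how YouTube implements auto-timing.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def build_cues(timed_words, max_words=4):
    """Group (word, start, end) tuples into caption cues of up to max_words words."""
    cues = []
    for i in range(0, len(timed_words), max_words):
        chunk = timed_words[i:i + max_words]
        text = " ".join(word for word, _, _ in chunk)
        cues.append((chunk[0][1], chunk[-1][2], text))  # cue spans first start to last end
    return cues


def to_srt(cues):
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = [f"{n}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"
              for n, (start, end, text) in enumerate(cues, start=1)]
    return "\n".join(blocks)


# Hypothetical aligner output: each transcript word with start/end times in seconds.
timed_words = [
    ("welcome", 0.40, 0.90), ("to", 0.90, 1.05), ("the", 1.05, 1.20), ("channel", 1.20, 1.80),
    ("today", 2.30, 2.75), ("we", 2.75, 2.90), ("talk", 2.90, 3.20), ("about", 3.20, 3.60),
    ("deep", 3.60, 3.90), ("learning", 3.90, 4.50),
]
print(to_srt(build_cues(timed_words)))
```

Grouping by a fixed word count is the simplest possible rule; a more careful version might also split cues at long pauses or sentence boundaries.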

Deep learning for automatic speech recognition in YouTube subtitles is no longer a luxury but a necessity. With more than 500 hours of content uploaded to YouTube every minute, in many different languages, it would take a superhuman to write captions for all of it. Even if every creator wrote their own captions, it would take a great deal of time and effort.
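A quick back-of-the-envelope calculation shows the scale. It takes the commonly cited figure of 500 hours uploaded per minute and a hypothetical manual captioning effort of four working hours per hour of video; both are assumptions, so treat the result as an order of magnitude only.

```python
# Back-of-the-envelope estimate; every input below is an assumption.
HOURS_UPLOADED_PER_MINUTE = 500      # commonly cited YouTube upload rate
TRANSCRIBE_HOURS_PER_VIDEO_HOUR = 4  # hypothetical manual captioning effort
SHIFT_HOURS = 8                      # one transcriber's working day

video_hours_per_day = HOURS_UPLOADED_PER_MINUTE * 60 * 24
work_hours_per_day = video_hours_per_day * TRANSCRIBE_HOURS_PER_VIDEO_HOUR
transcribers_needed = work_hours_per_day // SHIFT_HOURS

print(f"{video_hours_per_day:,} hours of new video per day")       # 720,000 hours of new video per day
print(f"{transcribers_needed:,} full-time transcribers every day")  # 360,000 full-time transcribers every day
```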

Auto-generation of captions comes in handy here. Through deep learning, automatic speech recognition models can generate subtitles efficiently, with accuracy of up to 95% (a figure that is expected to keep improving).
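Speech recognition accuracy is commonly reported through the word error rate (WER): the number of substituted, deleted, and inserted words needed to turn the recognizer’s output into a reference transcript, divided by the number of words in the reference. Assuming "accuracy" here means 1 minus WER, 95% accuracy corresponds to a WER of about 5%. Below is a minimal sketch using word-level edit distance; the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)


# Invented 20-word reference with one misrecognized word ("read" heard as "reed").
reference = ("auto captions use speech recognition to turn the words spoken "
             "in a video into text that viewers can read along")
hypothesis = ("auto captions use speech recognition to turn the words spoken "
              "in a video into text that viewers can reed along")

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")  # WER: 5%, accuracy: 95%
```

Note that one wrong word in every twenty still counts as 95% accuracy, which is why captions at that level can remain noticeably imperfect.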

Improvements in deep learning can make automatic speech recognition in YouTube subtitles more accurate. Because deep learning models learn from experience, this is very much possible: in general, the more training data, the higher the accuracy.

In conclusion

YouTube subtitles are a tiny fraction of deep learning’s potential.

Deep Learning models have already achieved what seemed impossible a few decades ago, and the field is only going to leap forward from here. In a world where our physical environment will be governed by artificial intelligence to a considerable extent, it makes sense to deepen our understanding of it. With a thorough understanding of the subject, you will be able not only to leverage the technology but also to lead its improvement.

Springboard’s mentor-led, project-based data science, data analytics, and AI/ML career tracks are industry-focused, job-oriented online learning programs designed to prepare you for a meaningful and successful career in future technologies.

Begin your Vision 2020 online learning here.