Transcribe your audio files to text automatically with AWS! Learn how in this article.

Cristian Restrepo
4 min read · Feb 26, 2023


Powerful AI models are changing how people work. From image generation to text generation, people well outside the tech industry are interested in using them to automate tasks and increase their productivity. One especially useful application is transcription, commonly known as speech-to-text, which lets you transcribe audio in minutes.

Recent months have brought remarkable advances. Models like Whisper from OpenAI can transcribe audio in seconds, with high accuracy and only a few lines of code. These innovations bring artificial intelligence within reach of more people. But did you know that AWS can do the same with its own transcription model and serverless tools?

Serverless is a cloud-native development model that allows developers to build and run applications without having to manage servers. (Red Hat)

In this article I present an approach for building a pipeline that connects Amazon S3, AWS Lambda and Amazon Transcribe to create your own speech-to-text process in the cloud. This is a simple architecture design for the example:

Serverless Transcription with AWS: Architecture Design

Steps:

  1. Create an S3 bucket for uploading and storing audio files
  2. Create a Lambda function
  3. Add the S3 bucket as a trigger
  4. Add permissions to the execution role
  5. Build the function

Prerequisites

  • AWS Account
  • Python Knowledge

1. Create an S3 bucket for uploading and storing audio files

2. Create a Lambda function:
To keep things simple, I used Python 3.9 and created a new role for permissions.

3. Add the S3 bucket you created as a trigger
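
If you would rather script the trigger than click through the console, the notification settings can be written as a plain dictionary. The sketch below only builds that dictionary. The function ARN and the `.mp4` suffix filter are placeholder assumptions, and actually applying it (for example with boto3's `put_bucket_notification_configuration`, after allowing S3 to invoke the function) is left out.

```python
import json


def build_notification_config(lambda_arn, suffix=".mp4"):
    """Return an S3 notification configuration that invokes a Lambda
    function whenever an object ending in `suffix` is created."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": suffix}]
                    }
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical ARN, used only to show the resulting structure
    demo_arn = "arn:aws:lambda:us-east-1:123456789012:function:transcribe-audio"
    print(json.dumps(build_notification_config(demo_arn), indent=2))
```

Filtering on a suffix keeps the function from firing for files that are not audio, which matters if the same bucket later holds other objects.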

4. Add permissions to the execution role

This example needs permission to read from S3 and to access the Transcribe service. Grant these on the function's execution role in the IAM console. To get there, select “Configuration”, then “Permissions”, and finally click the execution role that was generated automatically when the Lambda function was created.
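
As an illustration of what those permissions could look like, here is a minimal sketch that builds a narrow IAM policy document in Python. It is not the policy used in the article: the bucket name is a placeholder, and the choice to allow only `s3:GetObject` plus the two Transcribe actions this pipeline calls is an assumption about the least access it needs.

```python
import json


def build_execution_policy(bucket_name):
    """Return an IAM policy document that lets the function read objects
    from one bucket and start and query transcription jobs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "transcribe:StartTranscriptionJob",
                    "transcribe:GetTranscriptionJob",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-audio-bucket" is a hypothetical bucket name
    print(json.dumps(build_execution_policy("my-audio-bucket"), indent=2))
```

The printed JSON can be pasted into the role as an inline policy. Attaching broader AWS managed policies also works, at the cost of granting more than the function needs.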

5. Build the code in the Lambda function

  • Import the required libraries: Boto3, the AWS SDK for Python, is used here to interact with S3 and the Transcribe service. urllib and json are used to request the transcript over HTTP and to parse the JSON response.
import json
import boto3
import urllib.request  # the request submodule must be imported explicitly
  • Create client objects for the S3 and Transcribe services
s3_client = boto3.client('s3')
transcribe_client = boto3.client('transcribe')
  • Create bucket and file name variables from the metadata of the new file that triggered the function. Object keys arrive URL-encoded in S3 events, so decode the key before using it. Then build the URI that will be used to read the file.
from urllib.parse import unquote_plus

bucket_name = event['Records'][0]['s3']['bucket']['name']
s3_file_name = unquote_plus(event['Records'][0]['s3']['object']['key'])
object_url = f"s3://{bucket_name}/{s3_file_name}"
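
To try this step without uploading anything, the same lookup can be wrapped in a small function and fed a hand-written event. This is a sketch under assumptions: the helper name `parse_s3_event` and the sample bucket and key are made up for illustration, and the event contains only the fields the function reads.

```python
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Return (bucket, key, s3_uri) for every record in an S3 event.

    Keys are URL-encoded in the event (a space arrives as '+'),
    so they are decoded before the URI is built."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append((bucket, key, f"s3://{bucket}/{key}"))
    return results


if __name__ == "__main__":
    # Trimmed-down, hypothetical event for a file named "my interview.mp4"
    sample_event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "my-audio-bucket"},
                    "object": {"key": "uploads/my+interview.mp4"},
                }
            }
        ]
    }
    for bucket, key, uri in parse_s3_event(sample_event):
        print(bucket, key, uri)
```

Looping over all records, rather than reading only the first, is a small design choice that keeps the function correct if a single event ever carries more than one record.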
  • Create the transcription job. Give it a job name, which must be unique within your AWS account; for automation you could derive it from the file metadata, but in this example it is set manually. Then call the start_transcription_job method to queue the job in Amazon Transcribe. You can define the language of the audio with the LanguageCode parameter, or let the service detect it by passing IdentifyLanguage=True instead. Amazon Transcribe processes the job and keeps it for up to 90 days, and you can retrieve it using the job name you chose.
transcribe_job_name = 'test_transcription1'

transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(
    TranscriptionJobName=transcribe_job_name,
    # IdentifyLanguage=True,  # Let the service identify the language itself
    LanguageCode='fr-FR',  # Define the language of the audio to transcribe
    MediaFormat='mp4',  # Define the media format
    Media={
        'MediaFileUri': object_url  # URI of the S3 location
    }
)
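
One way to bring the job name "from metadata" is to derive it from the uploaded file's key. The helpers below are a hypothetical sketch, not part of the original code: they replace characters outside the set Transcribe accepts in job names (letters, digits, `.`, `_` and `-`), append a short random suffix so repeated uploads do not collide, and guess `MediaFormat` from the file extension.

```python
import os
import re
import uuid

# Media formats accepted by the MediaFormat parameter
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}


def build_job_name(key, suffix=None):
    """Derive a job name from an object key, keeping only allowed characters."""
    stem = os.path.splitext(os.path.basename(key))[0]
    safe = re.sub(r"[^0-9a-zA-Z._-]", "_", stem) or "audio"
    # A random suffix avoids clashes when the same file name is uploaded twice
    suffix = suffix or uuid.uuid4().hex[:8]
    # Truncate the stem so the full name stays under the 200-character limit
    return f"{safe[:180]}-{suffix}"


def guess_media_format(key):
    """Return the MediaFormat value implied by the key's file extension."""
    ext = os.path.splitext(key)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported media format: {ext!r}")
    return ext


if __name__ == "__main__":
    key = "uploads/my interview.mp4"  # hypothetical object key
    print(build_job_name(key), guess_media_format(key))
```

With these, `TranscriptionJobName=build_job_name(s3_file_name)` and `MediaFormat=guess_media_format(s3_file_name)` could stand in for the hard-coded values.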
  • Retrieve the transcription. Although Amazon Transcribe stores the transcription for some time, you can also retrieve it and use it within the same process. Create a while loop that waits until the job status changes to “COMPLETED” (or “FAILED”), pausing between checks. The function's timeout must be long enough to cover this wait; the Lambda default is only 3 seconds. Then get the transcript URL from the Transcribe service, make a request and voilà! Your transcription is done.
import time

while True:
    final_response = transcribe.get_transcription_job(TranscriptionJobName=transcribe_job_name)
    if final_response['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    time.sleep(5)  # pause between checks instead of polling in a tight loop

# The transcript URI is only present when the job completed successfully
transcription_url = final_response["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]
transcription_file = urllib.request.urlopen(transcription_url).read().decode("utf-8")
transcription = json.loads(transcription_file)["results"]["transcripts"][0]["transcript"]
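
The waiting logic can also be written as a reusable function with an upper bound on how long it polls. The following is a sketch under assumptions: `wait_for_job` and `extract_transcript` are hypothetical helper names, the 5-second interval and 10-minute limit are arbitrary, and the client is passed in so the loop can be exercised with a stand-in object instead of a real Transcribe client.

```python
import json
import time


def wait_for_job(client, job_name, poll_seconds=5.0, timeout_seconds=600.0,
                 sleep=time.sleep):
    """Poll get_transcription_job until the job is COMPLETED or FAILED.

    Returns the final 'TranscriptionJob' dict, or raises TimeoutError
    if the job is still running after timeout_seconds."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.get_transcription_job(
            TranscriptionJobName=job_name)["TranscriptionJob"]
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_name} did not finish in time")
        sleep(poll_seconds)


def extract_transcript(transcript_json):
    """Pull the plain text out of the transcript JSON document."""
    return json.loads(transcript_json)["results"]["transcripts"][0]["transcript"]


if __name__ == "__main__":
    class FakeClient:
        """Stand-in client that reports COMPLETED on the third call."""
        def __init__(self):
            self.calls = 0

        def get_transcription_job(self, TranscriptionJobName):
            self.calls += 1
            status = "COMPLETED" if self.calls >= 3 else "IN_PROGRESS"
            return {"TranscriptionJob": {"TranscriptionJobStatus": status}}

    job = wait_for_job(FakeClient(), "demo-job", poll_seconds=0)
    print(job["TranscriptionJobStatus"])
```

Returning the job for both final states leaves it to the caller to decide what a failed job should mean, for example logging its `FailureReason` instead of reading the transcript.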

Full code:

import json
import time
import urllib.request
from urllib.parse import unquote_plus

import boto3

# Create the client once so warm invocations can reuse it
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):
    # Metadata of the uploaded file; keys arrive URL-encoded in S3 events
    record = event['Records'][0]['s3']
    bucket_name = record['bucket']['name']
    s3_file_name = unquote_plus(record['object']['key'])
    object_url = f"s3://{bucket_name}/{s3_file_name}"

    # Job names must be unique per account; set manually for this example
    transcribe_job_name = 'Test_transcription1'

    transcribe.start_transcription_job(
        TranscriptionJobName=transcribe_job_name,
        # IdentifyLanguage=True,  # use instead of LanguageCode to auto-detect
        LanguageCode='fr-FR',
        MediaFormat='mp4',
        Media={
            'MediaFileUri': object_url
        }
    )

    # Poll until the job finishes, pausing between checks
    while True:
        final_response = transcribe.get_transcription_job(TranscriptionJobName=transcribe_job_name)
        job = final_response['TranscriptionJob']
        if job['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)

    # A failed job has no transcript, so report the failure instead
    if job['TranscriptionJobStatus'] == 'FAILED':
        return {
            'statusCode': 500,
            'body': json.dumps(job.get('FailureReason', 'Transcription failed'))
        }

    transcription_url = job["Transcript"]["TranscriptFileUri"]
    transcription_file = urllib.request.urlopen(transcription_url).read().decode("utf-8")
    transcription = json.loads(transcription_file)["results"]["transcripts"][0]["transcript"]

    return {
        'statusCode': 200,
        'body': json.dumps(transcription)
    }

Now you can test it by uploading an audio file to the S3 bucket you created and see the magic. You can also take this example and adapt it to your own interests: transcribe your content, build a dataset from media content for testing language models, and thousands of other possible uses.

Look at the repository for this approach

Let me know in the comments what other uses you would give this tool.

Written by Cristian Restrepo

Professional blog by Cristian Restrepo | Business & Data Science passionate | Data Engineer at Mercado Libre