A few months ago I happened to work on a conceptual application that involved processing videos on Azure. One of the problem statements was finding a scalable yet economical way to let users upload videos that could run into tens of GBs. That introduced me to a nifty trick: using a blob storage account as a broker for large file uploads. Here’s a 4 minute video by Channel 9 Studio that summarizes it well. Don’t forget to come back and continue reading, because there is more that needs attention to really build this right.
Why go through a serverless approach at all? File uploads are tricky business, and in the era of serverless solutions I see little reason to route heavy HTTP transactions through the primary application infrastructure, especially for large files. Anything above 5 MB should not be routed to your primary application host, for reasons documented at the end of this post.
This is what a video processing application might look like with serverless file uploads.
Channel 9 Studio could only do justice to the core concept in 4 minutes. In this post, I’ll document a few important considerations.
Before you take this to production
You may have noticed already that the crux of this approach lies in off-loading the complete logistics of a file upload to the storage account. You need to consider the following.
SAS keys handed to users are irrevocable
SAS keys, once handed over to your clients (web, mobile, etc.), cannot be revoked; they remain valid until they expire or the account key they were signed with is rotated. If a SAS key isn’t generated correctly, your storage account is at risk of being exposed to abuse.
File validation
The storage account does not validate the uploaded file, so the container the file is uploaded to should be treated as untrusted by every other part of your application. I would in fact recommend maintaining a dedicated storage account and container just for uploads. Files can be moved within the perimeter of your application once validated by a logic app or a blob-triggered function.
Note: Azure Advanced Threat Protection supports scanning storage account blobs for possible malware using hash reputation analysis. Read more
File size limits
The storage account does not limit the size of files being uploaded; the maximum size is dictated by the blob type. Keeping a short expiry on the SAS key doesn’t help either.
I tried creating a key that expires in 50 seconds and was still able to upload a 1 GB file in ~9 minutes and a 2 GB file in ~17 minutes in two separate tests. This is likely because the HTTP transaction started within the valid timeframe.
Way forward
Does this mean that you cannot off-load file uploads to a storage account? Certainly not. You can very well identify the risks and put mitigation measures in place.
Do not generate SAS keys for unauthenticated users
Allowing file uploads by anonymous users should be a strict no. It leaves you little opportunity to control abuse of a resource you are billed for in terms of storage and bandwidth.
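As a minimal sketch, the endpoint that issues the SAS can simply require an authenticated caller. The controller, route and names below are hypothetical and assume ASP.NET Core.

using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/uploads")]
public class UploadsController : ControllerBase
{
    // Reject anonymous callers before any SAS is generated.
    [Authorize]
    [HttpPost("sas")]
    public IActionResult CreateUploadSas()
    {
        // Build the narrowly scoped SAS described in the next section
        // and return only the resulting URI to the caller.
        return Ok(/* sasUri */);
    }
}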
Generate a well scoped SAS token
Ensure that your SAS key is narrowly scoped. For a file upload use case you can ensure the following:
- Start the validity window a few seconds before the current time to prevent clock skew issues.
- Keep the expiry reasonably short to prevent abuse, but just long enough to support slow connections. There is no harm in starting short and increasing the duration based on user feedback.
- Provide “create” permission only. Users should NOT be able to read the file they uploaded; allowing reads opens more opportunity for abuse.
- Scope the SAS key to the container and file name. Ensure the file name stays unique across all users/sessions.
- Include the requestor’s IP address in the SAS key scope to prevent abuse from other networks.
using Azure.Storage.Sas;

// Narrowly scoped SAS: a single blob, create-only, short-lived, locked to the caller's IP.
BlobSasBuilder sasBuilder = new BlobSasBuilder()
{
    BlobContainerName = ContainerName,
    BlobName = BlobFileName,
    Resource = "b",                                     // "b" scopes the SAS to a single blob
    StartsOn = DateTimeOffset.UtcNow.AddSeconds(-10),   // start slightly in the past to absorb clock skew
    ExpiresOn = DateTimeOffset.UtcNow.AddSeconds(120),  // keep the validity window short
    IPRange = new SasIPRange(ClientIPAddress)           // restrict to the requestor's IP address
};
sasBuilder.SetPermissions(BlobSasPermissions.Create);   // create-only: the client cannot read or delete what it uploads
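To complete the picture, here is a sketch of how this builder might be signed and turned into an upload URI. The account name, key and blob naming scheme are assumptions for illustration; only the resulting URI is handed to the client.

using Azure.Storage;

// BlobFileName above is assumed to be unique per user/session,
// e.g. something like $"{UserId}/{Guid.NewGuid():N}.mp4".
// Sign the SAS with the account key (read from configuration) and compose the upload URI.
StorageSharedKeyCredential credential = new StorageSharedKeyCredential(AccountName, AccountKey);
Uri sasUri = new UriBuilder($"https://{AccountName}.blob.core.windows.net/{ContainerName}/{BlobFileName}")
{
    Query = sasBuilder.ToSasQueryParameters(credential).ToString()
}.Uri;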
Read more about SAS best practices here.
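On the client side, the SAS URI alone is enough to perform the upload; no account credentials ever leave your API. A minimal sketch using the Azure.Storage.Blobs SDK, with a hypothetical local file path:

using Azure.Storage.Blobs;

// The SAS URI already encodes the container, blob name, permissions, expiry and IP restriction.
BlobClient blobClient = new BlobClient(sasUri);
await blobClient.UploadAsync("recording.mp4");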
Maintain logs of file upload sessions
Maintain logs that can help trace a file back to the user and, if possible, the IP address that uploaded it. In the event of abuse, these identifiers help you block future transactions from the same source.
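What exactly you persist will depend on your data store; a minimal sketch of the fields worth capturing when the SAS is issued (all names hypothetical):

// Persisted at SAS issuance so an uploaded blob can be traced back to its source.
public record UploadSession(
    string BlobName,             // blob the SAS was scoped to
    string UserId,               // authenticated user that requested the SAS
    string ClientIpAddress,      // IP address the SAS was restricted to
    DateTimeOffset IssuedAt,
    DateTimeOffset ExpiresAt);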
Use a blob-triggered Function/Logic App to validate and move files to your application perimeter
Your implementation could be supporting a simple photo upload for a user’s display picture or large file uploads for a video processing/hosting service. Either way, you can implement a blob-triggered function that validates the file’s size and content before moving it within the primary application perimeter for consumption, as sketched below.
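Here is a minimal sketch of such a function, assuming an in-process Azure Functions project with the blob storage bindings; the container names, connection settings and size ceiling are placeholders.

using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ValidateUpload
{
    [FunctionName("ValidateUpload")]
    public static void Run(
        // Fires when a blob lands in the untrusted "uploads" container.
        [BlobTrigger("uploads/{name}", Connection = "UploadStorage")] Stream uploaded,
        string name,
        // Output binding into the application's trusted container.
        [Blob("validated/{name}", FileAccess.Write, Connection = "AppStorage")] Stream validated,
        ILogger log)
    {
        const long maxBytes = 10L * 1024 * 1024 * 1024; // example ceiling: 10 GB

        if (uploaded.Length > maxBytes)
        {
            log.LogWarning("Rejected {name}: {length} bytes exceeds the limit", name, uploaded.Length);
            return; // leave (or delete) the blob in the quarantine container
        }

        // ...content/type/malware checks would go here...

        uploaded.CopyTo(validated); // "move" the file within the application perimeter
        log.LogInformation("Accepted {name}", name);
    }
}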
This is what the video processing application may look like with a few more measures in place.
But why not use my existing infrastructure?
It may only take a few lines of code to extend your existing application to allow file uploads, but this story isn’t about code.
Memory
Depending on your file upload implementation, the file being uploaded will reside completely or partially in your application’s memory. This directly affects the scalability of your application, and is especially concerning if file upload behaviours are inconsistent and difficult to plan for.
Sockets
HTTP requests consume socket connections. Sockets, just like memory or CPU, are exhaustible. Large file uploads hold up sockets for long durations, impacting your server’s ability to serve other clients. Some networks are slow enough to upload at less than a megabyte per minute, and such clients will hog your connection pool for a long time. This can also act as a vector for DoS attacks.
Security
File upload endpoints also open the door to buffer overflow attacks. Malicious files can only be validated once they reach your servers, which is especially risky if the same servers process sensitive information for other customers.
Do you think there is more that needs attention when exposing a blob storage account as a broker for file uploads? Drop your thoughts and suggestions in the comments. Cheers.