Sitecore serves media items even with incorrect extensions

06 August 2012
Marek Musielak
This post is based on Sitecore 6.5.0 (rev. 111230).


While trying to figure out how Sitecore delivers a media item stream, I noticed that Sitecore returns a media item regardless of the extension used in the URL. So if you have a logo image in JPEG format and request it with any other extension (e.g. pdf instead of jpg: http://localhost/~/media/sc_logo.pdf ), Sitecore will, by default, still serve the media item even though the extension is incorrect.

What is more interesting is that you can use any text as the extension; it doesn't even have to be a known file extension. You could request sc_logo.any_text_here and it would work as well. Behavior like this is not good for a web application: the same content becomes reachable under an unlimited number of URLs, which could hurt your website's search engine rankings.

So what can you do to protect your site from losing its search engine optimisation investment?

Sitecore allows you to extend the default MediaRequestHandler, so you can easily check whether the requested file extension matches the media item's extension. Remember to accept the default media request extension (e.g. ashx) as well. Here is the simple code of the new request handler:

using System;
using System.Linq;
using Sitecore.Configuration;
using Sitecore.Data.Items;
using Sitecore.Resources.Media;

namespace My.Assembly.MyNamespace
{
    public class ExtendedMediaRequestHandler : MediaRequestHandler
    {
        public override void ProcessRequest(System.Web.HttpContext context)
        {
            MediaRequest request = MediaManager.ParseMediaRequest(context.Request);

            if (request != null)
            {
                Media media = MediaManager.GetMedia(request.MediaUri);

                if (media != null)
                {
                    MediaItem mediaItem = media.MediaData.MediaItem;

                    if (mediaItem != null)
                    {
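                        // Reject the request when the URL ends with neither the
                        // item's own extension nor the default media extension.
                        if (!IsValidExtension(context.Request.RawUrl,
                            mediaItem.Extension, Settings.Media.RequestExtension))
                        {
                            context.Response.StatusCode = 404;
                            // End() aborts the request, so the base handler below
                            // is not reached for invalid extensions.
                            context.Response.End();
                        }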
                    }
                }
            }
            base.ProcessRequest(context);
        }

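        // Returns true when the request path ends with the media item's own
        // extension or with the default media request extension (e.g. ashx).
        private static bool IsValidExtension(string rawUrl, string mediaExtension,
            string defaultMediaExtension)
        {
            // Ignore the query string when comparing extensions.
            int queryIndex = rawUrl.IndexOf('?');
            if (queryIndex >= 0)
            {
                rawUrl = rawUrl.Substring(0, queryIndex);
            }

            return EndsWithExtension(rawUrl, mediaExtension)
                || EndsWithExtension(rawUrl, defaultMediaExtension);
        }

        // Requires the leading dot, so that "sc_logo.xjpg" does not pass as "jpg".
        private static bool EndsWithExtension(string path, string extension)
        {
            return !String.IsNullOrEmpty(extension)
                && path.EndsWith("." + extension.TrimStart('.'),
                    StringComparison.OrdinalIgnoreCase);
        }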
    }
}
And here is how to register the request handler in web.config:
  <system.web>
    <httpHandlers>
      <add verb="*" path="sitecore_media.ashx"
           type="My.Assembly.MyNamespace.ExtendedMediaRequestHandler, My.Assembly" />
      ...
    </httpHandlers>
  </system.web>

  <system.webServer>
    <handlers>
      <add verb="*" path="sitecore_media.ashx"
           type="My.Assembly.MyNamespace.ExtendedMediaRequestHandler, My.Assembly"
           name="Sitecore.MediaRequestHandler" />
      ...
    </handlers>
  </system.webServer>
Now, if someone tries to access a media item using an invalid file extension, Sitecore will return a 404 (page not found) error.

I'll conclude by adding that you may want to redirect the incorrect request to the proper URL instead of returning a 404 response, but I will leave that up to you.
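
As a starting point for the redirect variant, here is a minimal sketch of how the corrected URL could be built. MediaUrlCorrector and WithExtension are hypothetical names made up for this illustration and are not part of the Sitecore API. The sketch assumes the expected extension comes from mediaItem.Extension and that the query string should be kept unchanged.

```csharp
using System;

public static class MediaUrlCorrector
{
    // Hypothetical helper (not a Sitecore API): swaps the extension in the
    // requested URL for the expected one, keeping any query string intact.
    public static string WithExtension(string rawUrl, string expectedExtension)
    {
        if (rawUrl == null) throw new ArgumentNullException("rawUrl");
        if (String.IsNullOrEmpty(expectedExtension))
            throw new ArgumentException("Extension is required.", "expectedExtension");

        // Split the URL into path and query string (the query keeps its '?').
        int queryIndex = rawUrl.IndexOf('?');
        string path = queryIndex >= 0 ? rawUrl.Substring(0, queryIndex) : rawUrl;
        string query = queryIndex >= 0 ? rawUrl.Substring(queryIndex) : String.Empty;

        // Only a dot in the last path segment counts as an extension separator.
        int lastSlash = path.LastIndexOf('/');
        int lastDot = path.LastIndexOf('.');
        string stem = lastDot > lastSlash ? path.Substring(0, lastDot) : path;

        // Accept the extension with or without a leading dot.
        return stem + "." + expectedExtension.TrimStart('.') + query;
    }
}
```

With this in place, WithExtension("/~/media/sc_logo.pdf?w=100", "jpg") yields "/~/media/sc_logo.jpg?w=100". In the handler, you could then send that URL back with a 301 status code and a Location header in place of the 404, although I have not tested this variant against a running Sitecore instance.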

You may also want to see my other Sitecore posts if you find this post interesting.