Externally-Hosted Libraries: Good or Bad Idea?

Many developers tout the benefits of using externally-hosted libraries for various website content – most typically JavaScript libraries. Google, for example, hosts most of the popular JavaScript libraries, including jQuery, MooTools, and Prototype. Its informational page states that there are a number of benefits, including “stable, reliable, high-speed, globally available access”.

This article is old and outdated

Note: This post was originally written on a WordPress-powered blog, and has been exported and imported to this blog platform. As a result, unfortunately, not all markdown, code, and output looks as pretty as it once did.

The benefits come into play for a few reasons, notably:

  1. CDN-hosted, which likely means faster delivery than your own hosting provider.
  2. Less content for your web server to serve to your clients.
  3. Parallel connections in web browsers are often limited to several open connections per remote host. If you source content from several hosts, speeds may increase.
  4. Instantly-updated libraries, if you link to the “latest” version (such as “latest.js” as opposed to “version-1.10.5.js”).
  5. If you already visited a site that uses the same library hosted by the same distributor, your browser may have that library cached, meaning you don’t even need to grab the content remotely.
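
For reference, sourcing a library from a shared host typically looks something like this (the version number here is illustrative):

```html
<!-- Library loaded from Google's Hosted Libraries rather than your own server -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
```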

These sound like great improvements, especially for larger sites. If you can reduce the number of requests and the content to distribute, plus potentially reduce loading time for your clients (even if only by a few milliseconds), isn’t it a win-win situation?

So, what’s the issue?

Of the five items above, items 1–4 sound pretty good. Sure, what if Google or another CDN were hacked – wouldn’t we potentially be serving up malicious content? I hear that argument fairly regularly, and it has merit: for sensitive applications, that is a real concern. These library hosting providers are certainly a large target, given the widespread impact somebody could have by compromising the scripts being served out. I’m not here to address that today, though. I’m here to talk about trusting user input.

We could eliminate a vast number of web application vulnerabilities if developers never trusted user input. The industry is getting a better (but not perfect) grasp on the “common-sense” sources of user input: SQL injection from form fields, cross-site scripting through URL parameters, and similar issues. But what about less common sources of user input? The browser’s reported user-agent? The browser’s requested language? Cookies? While harder and likely rarer to exploit properly, these are all areas of user input that need validation.
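
As a minimal sketch of that principle (not from the original post), here is what treating a request header such as User-Agent as untrusted input might look like – escaping it before it is ever reflected into a page:

```javascript
// Minimal sketch: HTML-escape untrusted input (e.g. a User-Agent header)
// before reflecting it into a page.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')   // must run first, so later entities aren't double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// A malicious client can send any User-Agent string it likes:
const hostileUserAgent = '<script>alert(1)</script>';
console.log(escapeHtml(hostileUserAgent));
// → &lt;script&gt;alert(1)&lt;/script&gt;
```

The same discipline applies to cookies, language headers, and – as argued below – cached script content.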

If you’re reading between the lines, you might figure out that this post targets item #5 in the list above. Yes, I do mean that we need to treat JavaScript libraries cached on the client’s computer as untrusted. While the server itself is not processing that input, and the browser is not uploading the cached content to the server for processing, the application likely relies on that JavaScript to function properly. And guess what? The client is providing the JavaScript, right from its hard disk. Whenever I bring this issue up, people are quick to point out that somebody could just as easily modify the JavaScript at runtime as modify it in their cache. Frankly, I don’t care what a user does to themselves. I care about what an attacker can do to the user – so throw out any ideas of a user modifying anything here.

Attack scenario

The goal of this scenario is to give me, the attacker, the ability to execute arbitrary code in your web browser against any site using an external JavaScript library provider. Needless to say, in this instance, you are the victim. Prerequisites:

  • I must be able to intercept your network traffic. A few options:
    — I own the access point or gateway you are using
    — The network and the environment allow for me to ARP spoof network traffic
  • You must be browsing to a site that uses the same 3rd party library host that I am targeting
  • Your browser must not have the library content cached, or the content must be old enough for the browser to re-request it, while I am intercepting your network traffic
  • You do not press “F5”, which in most browsers re-validates cached content rather than simply relying on the cache-control times set in the HTTP headers. In normal usage, most people do not press “F5”; they browse directly to a URL or click links, both of which allow the browser to rely on cached content. (Note: this may not be entirely accurate; further testing showed that even when the victim pressed “F5”, the attack still worked just fine, depending on the browser.)

An attacker would likely inject JavaScript that hooks the browser into some sort of command-and-control tool, such as BeEF. BeEF allows the attacker to keep the user hooked even as the user navigates away from a page – but if the browser is closed, the BeEF session is killed. One big downfall of BeEF is that the user can re-open the browser and revisit any site, and unless they visit the infected page where they were originally hooked, they will not be hooked again. With the method outlined below, the attacker has a much better chance of re-hooking the browser after it is closed and re-opened, because the attacker has expanded the scope from “the originally hooked page” to “any page using the externally-hosted JavaScript library”. I’ll let you, the reader, do some research to see how many sites use jQuery’s or Google’s hosted JavaScript libraries; any that you find are essentially valid targets. Many top sites, such as Facebook, CNN, Mint.com, and Netflix, host the libraries themselves – but many medium-sized sites rely on third-party library providers.

Example attack

The payload for this attack is:

```javascript
$(document).ready(function () {
  $("p").click(function () {
    $(this).toggleClass("red");
  });
});
```

Obviously an attacker would use a BeEF injection script or similar. This payload simply toggles a CSS class of “red” on any paragraph that is clicked – just a proof of concept.

I set up two pages on two different domains, both hosting the same code. Here’s a screenshot of a server response from Burp showing the code. The only embedded JavaScript is for New Relic app stat tracking and has nothing to do with this demo; as you can see, there is no embedded JavaScript that does anything else. Adding in the payload above, though, would change text to red whenever it is clicked.
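
Since the original screenshots did not survive the export, here is a reconstruction of what such a demo page might look like (the exact markup and the `.red` class definition are assumptions for illustration):

```html
<!DOCTYPE html>
<html>
<head>
  <!-- jQuery pulled from the third-party host; no other behavior on the page -->
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
  <style>.red { color: red; }</style>
</head>
<body>
  <p>hello</p>
</body>
</html>
```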

Alright, now that we have the demo setup, let’s walk through the steps that would have to happen for this to be exploited.

Step one: Attacker has to have control over the victim’s web traffic

Step two: Victim sends a request to a page with script source pointing to externally hosted libraries:


Step three: Attacker intercepts the server response from the hosted library provider. In this case, it is Google:


Step four: Attacker edits the server response, modifies the Cache-Control header so the object does not expire for a year, and sets the max-age to much longer still (add several zeros to the end). Finally, the attacker embeds the jQuery payload noted above at the end of the script:
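
The header rewrite amounts to something like this (values are illustrative, not the exact headers from the demo):

```
--- Original CDN response (illustrative) ---
Cache-Control: public, max-age=3600

--- Attacker-modified response ---
Cache-Control: public, max-age=31536000000
Expires: <one year out>
```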


…and that is all the attacker has to do. Let’s see what happens on the victim’s end:

First, the page whose response we intercepted functions just fine – the injected JavaScript didn’t break anything:


We can see that when the victim clicks the “hello” text, it turns red, meaning the injected code is functioning properly and was actually injected:


Now the victim visits another site with the same embedded library (if the previous one was Facebook, this different site could be LinkedIn):


We can see through Burp that the new page pulled the library from the browser’s cache and did not ask Google for a new copy. The browser also did not ask Google to verify the expiration date, checksum, or file size of the cached content. Line 294 is the original request for the first page. Line 295 is Google’s response, into which we injected code. Lines 296, 297, and 299 are inconsequential to this demo. Line 298, however, is the request for the second page – and we can see no response came from Google:

And when the victim clicks on the text on the second page, we see the injected code executes:

And there we have it: cross-domain script injection using externally-hosted JavaScript libraries. Sites using externally-hosted libraries are essentially susceptible to cross-site scripting – and they remain vulnerable until the victim’s cache is cleared, regardless of whether I am still intercepting their traffic. This means that when the victim leaves the network where I intercepted their traffic, there is virtually no impact to me as an attacker. The next time the victim browses the internet and visits any site using that script library provider, I can hook the victim’s browser with BeEF, or execute any other arbitrary code I wish.

How do I fix it?

In short: host all of your content on servers you control, and always be in control of the original content delivered to the user. Cached content is untrusted content. It is fair to point out that the victim would be susceptible to the root of this problem (man-in-the-middle attacks) even if the scripts were not cached and hosted by a third party. But using these third-party providers compounds the problem and the impact: it dramatically extends the effectiveness of a man-in-the-middle attack by giving it a much longer lifespan, well past the point when the original interception ceased.
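
The fix, sketched as markup (the path and version are illustrative), is simply a version-pinned copy served from your own origin:

```html
<!-- Self-hosted, version-pinned library served from a server you control -->
<script src="/js/jquery-1.10.5.min.js"></script>
```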

I have not yet tested whether this works across HTTP and HTTPS. If it does, it would be a potential way to modify HTTPS script content: first inject code into the library over a non-HTTPS connection, then have that poisoned copy re-used later over an HTTPS connection. Given that browsers typically do not like mixing secure and non-secure content, though, I suspect the library providers serve separate instances of the hosted script for HTTP and for HTTPS.