‘Embedding & X-Frame-Options Response Header’

During a recent project, I came face to face with both the power and peril of embeddable content. The premise of my recent project Placemat was to remove the friction that people encounter when sending content-rich links (videos, music, articles, etc.) to one another. Sharing links is usually done through email, text messaging or a social network. However, none of these solutions allows a user to engage with all the content being sent to them in a single place. Additionally, users cannot persist these links in a personal library or share their new knowledge short of sending another email, text, etc. and continuing the cycle. Embedding, when done right, can condense the vastness of the web into concise bits of knowledge. However, when allowing users to embed anything and everything, there are some serious hurdles to overcome.

After we began development on our project, it wasn’t long before my collaborators and I encountered a ‘Same Origin’ error when trying to generate an iframe with a generic YouTube url. After the same thing happened for a handful of other sites, I turned to StackOverflow only to realize that this is a common problem.

The “Same-Origin” security policy dictates that a web browser may only permit scripts contained in a first web page to access data in a second web page if both web pages have the ‘same origin’. This means that a second web page must match the first in protocol, host and port. This table from Wikipedia gives a good breakdown of how this works in practice.

[Table: same-origin comparison of example URLs, from Wikipedia]
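As a rough sketch of the rule the table illustrates, two URLs share an origin only when protocol, host and port all match. This hypothetical helper (not part of any browser API) uses the WHATWG URL parser to make the comparison:

```javascript
// sketch: same-origin check — protocol, host and port must all match
function sameOrigin(a, b) {
  var ua = new URL(a);
  var ub = new URL(b);
  return ua.protocol === ub.protocol &&
         ua.hostname === ub.hostname &&
         ua.port === ub.port;
}
```

For example, `sameOrigin('http://example.com/a', 'http://example.com/b')` is true, while changing the scheme, host or port makes it false.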

SAMEORIGIN is one of three possible X-Frame-Options values. This header is included in an HTTP response so a site can decide whether or not it will permit itself to be embedded in a “frame”, “iframe” or “object”. A page can respond with ‘DENY’ to refuse display in a frame entirely, ‘SAMEORIGIN’ to allow framing only by pages from the same origin, or ‘ALLOW-FROM uri’, meaning the page may only be framed by the specified origin.
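A browser’s handling of these three values can be sketched as a small decision function. This is a simplified, hypothetical model of the header’s semantics, not actual browser code:

```javascript
// sketch: a simplified model of X-Frame-Options handling
// headerValue: the response header sent by the page being framed (or undefined)
// framedOrigin: origin of the page inside the frame
// topOrigin: origin of the page doing the framing
function framingAllowed(headerValue, framedOrigin, topOrigin) {
  var directive = (headerValue || '').toUpperCase().split(' ')[0];
  switch (directive) {
    case 'DENY':
      return false;                                    // never framed, anywhere
    case 'SAMEORIGIN':
      return framedOrigin === topOrigin;               // framed only by its own origin
    case 'ALLOW-FROM':
      return headerValue.split(' ')[1] === topOrigin;  // framed only by the listed origin
    default:
      return true;                                     // no header: framing permitted
  }
}
```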

Many popular sites offer embeddable versions of their content for developers to use on their own sites. The following two links reference the same video.

Non-Embeddable Embeddable

The embeddable version does not have many of the familiar features users are accustomed to. There is no like button or comment section, just a bare-bones player. One of the motivations for sites like YouTube to format their embeddable content in this way is to prevent ‘Clickjacking’. Wikipedia describes this as, “a malicious technique of tricking a Web user into clicking on something different from what the user perceives they are clicking on.” For example, if a regular YouTube page could be embedded on another site, visitors of the second site could be “clickjacked” into clicking a like button or leaving a comment. The SAMEORIGIN frame policy is one of the defenses that YouTube employs to combat this and maintain the integrity of its community’s viewing metrics.

While many popular sites offer embeddable iframes for their content, there is no standardization in their format. Given that Placemat had to take a URL entered by a user and generate embeddable content, we were faced with a challenge. It was not feasible to create custom logic for every major content site. Even if we had the time to do this, certain sites (Vevo) prevent their content from being embedded on other sites, with the exception of a few whitelisted partners. Thankfully, we found a solution in Embedly.

Embedly turns any link into an embeddable ‘card’. They have over 250 content partners, which means the majority of our users’ content would be formatted in a way that is optimized for its source. Additionally, they are a whitelisted partner of many sites that would otherwise be unembeddable. Embedly does employ a rate limit for its API; however, there is no rate limit for converting links to cards with their JavaScript library. It works by simply adding a class of “embedly-card” to any anchor tag and including their JavaScript file on the page (see an example included in this page below).
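In markup, that looks roughly like the sketch below. The `embedly-card` class and the `platform.js` script are the pieces Embedly’s instructions name; the video URL here is just a hypothetical placeholder:

```html
<!-- any anchor tagged with the embedly-card class becomes a card -->
<a href="https://www.youtube.com/watch?v=VIDEO_ID" class="embedly-card">A shared video</a>

<!-- Embedly's script scans the page and replaces tagged links with cards -->
<script async src="//cdn.embedly.com/widgets/platform.js" charset="UTF-8"></script>
```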

Looking forward….

The rise of user generated content has flooded the web with more things than ever to watch, listen and read. While there are only a few popular platforms hosting this content, there are an endless amount of taste makers embedding this content on their own sites across the web. These taste makers give a sense of direction and context to an otherwise chaotic content eco-system. Embedding will be a critical part of the web going forward and the tools that will make this possible are still very much in their infancy.

This Side of Paradise by J. MEY

Scrubbing Video on Scroll


Scroll the video below (Note: only compatible with Chrome, close all tabs for best performance)

During a recent visit to Apple’s website, I noticed an amazing animation on their MacBook product page. As I scrolled down to learn more about the new MacBook, the laptop on the screen opened and changed its orientation based on my scroll. I thought it was so cool, I decided to figure out how it worked. As I dug through the site with Chrome developer tools, I realized that the ‘animation’ was actually a very well produced video. My scroll was controlling the position of the ‘playhead’ in this video. Simple enough. Now to build my own.

Creating a scrollable video can be broken down into three parts:

  1. Capturing a value to set the playhead position to
  2. Setting the playhead to that value
  3. Repeating this code at a very fast time interval to create smooth video playback

Below you can see the HTML that creates the flower video above.

<div id="vid-container" style="overflow-y: scroll; height: 400px;">
  <div id="vid-container-2" style="height: 800px; position: relative;">
    <video id="v0" style="position: absolute; left: 0; width: 100%;" tabindex="0">
      <source type="video/mp4; codecs=&quot;avc1.42E01E, mp4a.40.2&quot;" src="">
    </video>
  </div>
</div>

The code creates a small div, which should be just large enough to fit the dimensions of the video. By using ‘overflow-y: scroll;’, any contents included in this div that are larger than the height of the div can now be scrolled. To take advantage of this, I create a taller div inside of my original div. Finally, I include the video and source inside of this div with absolute positioning so it will not move as the content inside the div is scrolled. In order to make this video work in the context of my blog page, I used a lot of inline styling to preserve the integrity of my blog CSS file. I would recommend moving all this styling to a separate CSS file if you plan on creating a scrollable video on your own site.

Below is the javascript that makes the video come alive!

var vid = document.getElementById('v0');

// pause video on load
vid.pause();

// pause video on container scroll (stops autoplay once scroll started)
$('#vid-container').on('scroll', function(){
    vid.pause();
});

// refresh video frames on an interval for smoother playback
// dividing the (totalTime - scrubTime) calculation by 25 determines
// how many frames will be covered in the span of a 'normal' scroll
// the lower the number, the more frames will be covered in a scroll
setInterval(function(){
    var totalTime = 290;
    var scrubTime = $('#vid-container-2').position().top;
    vid.currentTime = (totalTime - scrubTime) / 25;
}, 40);

I have included some comments to explain what each part of the code is doing. The setInterval function is one of the more powerful parts of this code. It finds how far the div containing the flower video has scrolled, does a calculation to invert that number (so the flower goes from closed to open and not open to closed), and finally sets the current time of the video. It then repeats this code every 40ms, which allows for a smooth playback experience. For clarity, the video is not actually ‘playing’ at any point in this code. The code simply sets a new frame based on how far a user has scrolled and repeats this process very frequently.
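The scroll-to-time mapping can also be expressed as a pure function, which makes the arithmetic easy to reason about in isolation. This is a hypothetical helper, not the code used above: it normalizes the scroll offset against the full scroll range rather than using the fixed 290 and 25 constants:

```javascript
// sketch: map a scroll offset onto a video timeline
// scrollTop: current scroll offset in px
// maxScroll: total scrollable range in px
// duration: video length in seconds
function scrollToTime(scrollTop, maxScroll, duration) {
  // clamp the offset so the time never runs off either end of the video
  var clamped = Math.min(Math.max(scrollTop, 0), maxScroll);
  return (clamped / maxScroll) * duration;
}
```

Halfway through the scroll range lands halfway through the video: `scrollToTime(200, 400, 16)` returns 8.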

To get this to work in a production environment, you will need to load the video file into a blob URL so it can be stored in memory instead of sending GET requests to a server every 40ms.

$( document ).ready(function() {

  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/images/flower-slow-bloom-4k.mp4', true);
  xhr.responseType = 'blob';

  xhr.onload = function(e) {
    if (this.status == 200) {
      // Note: .response instead of .responseText
      var blob = this.response;
      var vidUrl = URL.createObjectURL(blob);
      var video = $('video');
      video.attr('src', vidUrl);
      // release the object URL once the video data has loaded
      video[0].onloadeddata = function(){
        URL.revokeObjectURL(vidUrl);
      };
      video[0].play();
    }
  };

  xhr.send();
  console.log( "ready!" );
});

One thing to note is the video scroll can become laggy when a user has a lot of tabs open.

What really excites me…

…is other implementations of value input to control video playback. While a scroll is an interesting way of controlling the values passed to vid.currentTime, there are many other possibilities, including physical devices, audio input, etc. Additionally, I’m interested in seeing how different video sources could be used instead of being constrained to one video stored locally. Find out more about this project on my Github page to stay up to date and, as always, feel free to contribute and submit a pull request!

Need for Speed: YARV v. Node

On my first day learning JavaScript, I kept hearing and reading how much ‘faster’ my JavaScript programs would run compared to the Ruby programs I was accustomed to. One of the main reasons for the increased speed is V8, Google’s open source high-performance JavaScript engine. I decided the best way to put my newfound knowledge to the test was to have a drag race. I used the first Project Euler problem as my drag strip, and Node.js and YARV as my race cars.

The Problem

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.


Code

I wanted to make my Ruby code and Javascript code as similar as possible. Both solutions increment a count, test if it is divisible by 3 or 5, add the current count to a total if it is divisible and finally increment the count by 1. This process is repeated until the limit (in our case 1000) is reached.

Ruby
require 'benchmark'

total = 0
execution_time = Benchmark.realtime {
  x = 3
  until x >= 1000
    if x % 3 == 0 || x % 5 == 0
      total += x
    end
    x += 1
  end
}
puts "The sum is #{total}"
puts "Execution Time: #{"%1.12f" % execution_time} seconds"


Javascript
var aTimer = process.hrtime();
var sum = 0;
for (var x = 3; x < 1000; x++) {
  if (x % 3 === 0 || x % 5 === 0) {
    sum += x;
  }
}
console.log('The sum of them is: ' + sum);
// hrtime returns [seconds, nanoseconds] elapsed since aTimer
var aTimerDiff = process.hrtime(aTimer);
console.info("Execution time: " + (aTimerDiff[0] + aTimerDiff[1] / 1e9) + " seconds");


First Race: Limit => 1000

[Screenshot: Ruby (YARV) benchmark output]

V.

[Screenshot: Node benchmark output]

To my surprise, Ruby was faster… Much. Faster. Were all of the speed optimizations of JavaScript an exaggeration? Had Ruby made earth-shattering speed improvements when it switched from MRI to YARV in Ruby 1.9? I needed another test, this time with a much larger calculation…


Second Race: Limit => 100000000

[Screenshot: Ruby (YARV) benchmark output]

V.

[Screenshot: Node benchmark output]

JAVASCRIPT WINS BY A LANDSLIDE!!!!


I was fascinated by the results of these two tests. Even though they confirmed that JavaScript is very fast, the tests uncovered an interesting interpreter behavior. My first guess as to why Ruby performs better on small calculations is start-up time. Node may have a slower initialization time, which would lead to slower performance on small calculations, but improved performance on large calculations where the full wrath of V8 can be unleashed. I look forward to testing more Project Euler problems and taking a look under the hood of both Node and YARV in the near future.

Be sure to check out the github repo for this project and submit a pull request for your race.

Builder Beware: The Limitations of Popular APIs


This post came about during the research process of a project I’m building that involves heavy use of some of the web’s most popular media services. YouTube, Instagram, Spotify and Soundcloud all have APIs, but have very different approaches to how they let developers access them. One of my biggest concerns of working with these APIs was not the data I could request, but how much I could request and how fast. This post breaks down the current API usage for each of the aforementioned services.


YouTube:

Application Limit: 50,000,000 units per day or 30,000 units per user per second. This limit describes a global quota pool, meaning that any user or IP address associated with a specific project will decrement the global quota pool.

In v3, there is a global quota pool (of 50 million units/day), and all API calls that are associated with a specific project in the Developers Console decrement quota from that pool. Therefore, it is theoretically possible for a single IP address or channel to consume all of the quota associated with an API registration, which could lead to an outage that affects other users. Source

The following request, which obtains one page of a user’s likes, would incur a quota cost of 3 units.

GET https://www.googleapis.com/youtube/v3/activities?part=snippet&maxResults=50&mine=true&key={YOUR_API_KEY}

That means that every day a single project could retrieve a maximum of 833,333,300 likes from the YouTube API:
(50,000,000 / 3) = 16,666,666 page results; 16,666,666 * 50 liked videos per page = 833,333,300 liked videos

The real number of likes a project could get from the API each day would be less as various users would not make requests at the optimal 50 results per page.
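The back-of-the-envelope math above can be sketched in a few lines, with the constants taken from the quota figures quoted in this post:

```javascript
// sketch: daily like-retrieval capacity under YouTube's quoted quota figures
var dailyQuota = 50000000;  // units per day for the whole project
var costPerPage = 3;        // units per activities request
var likesPerPage = 50;      // maxResults per page

var pages = Math.floor(dailyQuota / costPerPage);  // full pages affordable per day
var likesPerDay = pages * likesPerPage;            // best-case likes retrieved per day

console.log(pages, likesPerDay);
```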

The takeaway from this section is that the YouTube API has given developers enough runway to onboard a meaningful number of users before any quota limits are reached. Google has channels for developers to request additional quota as well; however, that functionality seems to be temporarily suspended.

[Screenshot: Google API notice]

Instagram

Instagram employs a different quota approach compared to YouTube, allowing a local quota pool for each authenticated user of 5,000 units per hour per token.

We recommend that you use an Oauth token for the authenticated user for each endpoint, even in cases where it’s not required, since the rate limit for authenticated calls scales as you grow the amount of people using your app. Source

Instagram usually returns between 21 and 23 liked photos per API call. The request below decrements the hourly quota of 5,000 by 1.

GET /users/self/media/liked

Theoretically, a single user could retrieve between 105,000 and 115,000 liked photos per hour. However, Instagram will return an error if the API is called too frequently, stating:

Be nice. If you’re sending too many requests too quickly, we’ll send back a 503 error code (server unavailable).
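The same arithmetic applied to Instagram’s per-token quota, using the 21-23 likes-per-call range mentioned above:

```javascript
// sketch: hourly like-retrieval capacity per authenticated token
var callsPerHour = 5000;   // units per hour per token
var minLikesPerCall = 21;  // low end of observed results per call
var maxLikesPerCall = 23;  // high end of observed results per call

var lowerBound = callsPerHour * minLikesPerCall;  // 105,000 likes per hour
var upperBound = callsPerHour * maxLikesPerCall;  // 115,000 likes per hour

console.log(lowerBound, upperBound);
```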

Spotify

Spotify does not have explicit rate limits defined in its API documentation; however, a few posts on their API forum yielded the following conversation with two Spotify developers.

[Screenshot: Spotify developer forum conversation]

In short, Spotify has vague rate limits regarding their various endpoints. Authenticated requests have higher rate limits. However, all rate limits are on a per-application basis.

SoundCloud

I could not find any information regarding the rate limits of the SoundCloud API. Their documentation did have this to say…

We reserve the right, at our discretion, to impose restrictions and limitations on the number and frequency of calls made by your app to the SoundCloud® API. You must not attempt to circumvent any restrictions or limitations that we impose.

Are APIs Limitless…?

[Image: Limitless]

…not quite.

I think a full understanding of rate limits is a prerequisite for any developer interested in using an API. An API like Spotify’s cannot be used to power an app that depends on individual user data, as the rate limit will be hit before even 100 users are signed up. Conversely, the YouTube API, while still employing a per-application rate limit, provides developers with enough runway to get significant traction and users. The most developer-friendly API of the four discussed here is the Instagram API, as its rate limit scales linearly with every user onboarded. While there are many other factors to consider before integrating an API into one’s project, rate limits are a good place to start.

Databases: FROM the Basics to Facebook

What is a database? If you had to guess, would you say it’s like a bunch of these…

[Image: Excel spreadsheet]

…jammed into something that looks like this…

[Image: database server]

…that makes a magical data vortex?


Well…that’s not quite right.

In this blog post we will take a look at the basic components of a database. As a beginner it is important to understand the fundamentals of databases, but it can also be helpful and even inspirational to see how some of the best minds and companies put these technologies to work.

High Level Database Architecture

Databases: This is where you model your data. By and large, SQL-based databases contain tables, indexes, views, stored procedures and triggers. This data is contained in a larger Database Management System (DBMS).

Database Management System: A DBMS is a collection of programs that enables you to store, modify, and extract information from a database. In addition to your data, these typically include programs responsible for security, query processing, a storage engine and a data dictionary. Examples of DBMS Models are Flat file, Hierarchical, Network, Relational and Object-Oriented. The relational model is the most commonly used today.

SQL

Structured Query Language (SQL) is a special-purpose programming language designed for handling data in a relational database management system. SQL was created in the 1970s and first commercialized by Oracle. Today, open source projects like MySQL, PostgreSQL, SQLite and Firebird have helped SQL become one of the most popular database resources. Go check out the 34 other query languages on Wikipedia.

NoSQL

NoSQL (or Not Only SQL) refers to a variety of database modeling techniques, including key-value, graph and document stores, that do not rely on tabular relations (data organized into rows and columns with a unique key for each row). NoSQL databases do not require fixed table schemas, avoid join operations by storing denormalized data, and are designed to scale horizontally.
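As a rough illustration of the “avoids join operations by storing denormalized data” point (hypothetical data, not tied to any particular NoSQL product), a document store might embed a user’s posts directly in the user record instead of joining two tables:

```javascript
// sketch: a denormalized, document-style record (hypothetical data)
var userDoc = {
  id: 42,
  name: 'Ada',
  posts: [
    { title: 'Hello', likes: 3 },  // embedded in the document,
    { title: 'World', likes: 5 }   // not joined in from a posts table
  ]
};

// reading the user's posts needs no join: everything travels with the document
var titles = userDoc.posts.map(function (p) { return p.title; });
```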

Now that we have a little context, lets take a look at how one of the best technology companies in the world handles its data.

Facebook

Facebook delivers highly customized, real-time content to over 1 billion users. Various information types such as comments, likes, photos and videos must not only be served to their intended user, but must be able to respond to spikes in requests should a specific piece of content go viral. To do this they use a highly customized version of MySQL. A conversation with two of Facebook’s MySQL gurus details their thought process.

A conversation with FB engineers Yoshinori Matsunobu & Mark Callaghan.

Tom: Why MySQL? Wouldn’t NoSQL databases, for example, be better suited for the massive workloads seen at Facebook?

Mark: MySQL is great for many of our important workloads. We make it even better with our expertise in MySQL operations and engineering, and by working with the community and learning from their experience.

Yoshinori: I have not been able to find a transactional NoSQL database better than InnoDB. And it’s easy to understand how MySQL Replication works, which makes much easier to fix problems in production.

Using the Graph API

Facebook has a graph API (TAO) that enables a user to access all of the social graph as if it were a large graph database. The power of a graph database is illustrated in the example from the FB API site below.

Here’s an example query that will retrieve up to five albums created by someone, and the last five posts in their feed.

GET graph.facebook.com/me?fields=albums.limit(5),posts.limit(5)

We can then extend this a bit more and for each album object, also retrieve the first two photos, and people tagged in each photo:

GET graph.facebook.com/me?fields=albums.limit(5){name, photos.limit(2)},posts.limit(5)

Now we can extend it again by retrieving the name of each photo, the picture URL, and the people tagged:

GET graph.facebook.com/me?fields=albums.limit(5){name, photos.limit(2){name, picture, tags.limit(2)}},posts.limit(5)

You can see how field expansion can work across nodes, edges, and fields to return really complex data in a single request. Source

I found the example above fascinating and illuminating. Up until this point I have only interacted with SQL and relied on joins to match data from one table to corresponding data in another table. The graph database provides an almost intuitive query flow to access information. Check out some of the resources below for more on how Facebook’s engineers keep the world’s largest social network up and running.

Additional Reading
Chip Turner, FB Engineering Manager DB Presentation
TAO: The Power of the Graph

BONUS GLOSSARY

ACID (Atomicity, Consistency, Isolation, Durability): This acronym describes a set of properties that can guarantee that database transactions are processed reliably. Atomicity refers to a database transaction where either all operations occur or none occur. This is critical to prevent partial database updates. Consistency refers to a set of defined rules that any given database transaction must follow. While it does not mean that a transaction will be error-free, it does mean that any error cannot be the result of a violation of the defined rules. Isolation is a general measure of users’ ability to access data concurrently. Durability is the requirement that committed transactions will survive even in the event of a system crash.

NOTE: This was a very high level overview. If you have interest in diving into any of these topics some interesting subjects are Object-relational impedance mismatch, The Relational Model or any of the resources linked in the body of this post.