When I began working with APIs a few years ago I was incredibly excited. The amount of data I had at my fingertips was overwhelming. I had the ability to create a full-scale application with someone else’s data, and it was mind-boggling. Today I find myself using APIs in almost every project I am a part of. They have become a vital aspect of any modern website, and a vital skill for any modern web developer.

API Requests are Slow

When you work with an API you are accessing someone else’s information over the network. This process is generally slow, and the more information you request, the slower it gets. You are also often limited in the number of times you can request data in a given period. This makes server-side caching an ideal solution for dramatically improving the speed of displaying API results.

How Much Faster

I was recently working with a fairly large request to the Indeed API. The response contained about 320 jobs and took 4-5 seconds each time. The following caching method cut the request time down to 0.08 seconds.
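
If you want to measure the difference on your own requests, here is a minimal sketch of how it could be timed. The time_call() helper is a hypothetical name of my own, and the usleep() call is only a stand-in for a slow API request; swap in your real request to get meaningful numbers.

```php
<?php
/**
 * Hypothetical helper: run a callable and report how long it took.
 */
function time_call( callable $fn ) {
    $start  = microtime(true);   // current time in seconds, as a float
    $result = $fn();
    return array( 'result' => $result, 'seconds' => microtime(true) - $start );
}

$timed = time_call( function () {
    usleep(20000);               // stand-in for a slow API request (20 ms)
    return 'done';
} );

printf( "Took %.3f seconds\n", $timed['seconds'] );
```

Run it once around the uncached request and once around the cached one to compare the two.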

The Caching Function

I’ve created this function to make it easy. It can be used with any API: just replace the indeed_api_request() function call with your own.

/**
 * API Request Caching
 *
 *  Use server-side caching to store API requests as JSON at a set
 *  interval, rather than on each page load.
 *
 * @param string|null $cache_file Path to the cache file. Defaults to
 *                                api-cache.json next to this script.
 * @param int|null    $expires    Cutoff timestamp: a cache file last changed
 *                                before it is refreshed. Defaults to 2 hours ago.
 * @return mixed Decoded API results (false if the API request failed)
 */
function json_cached_api_results( $cache_file = NULL, $expires = NULL ) {
    global $request_type, $purge_cache, $limit_reached, $request_limit;

    if( !$cache_file ) $cache_file = dirname(__FILE__) . '/api-cache.json';
    if( !$expires ) $expires = time() - 2*60*60;

    if( !file_exists($cache_file) ) die("Cache file is missing: $cache_file");

    // Number of purge requests made in this session (0 if no session value is set)
    $views = isset($_SESSION['views']) ? intval($_SESSION['views']) : 0;

    // Refresh when the file is older than the cutoff, is empty, or a purge is
    // requested and the purge limit has not been exceeded
    if ( filectime($cache_file) < $expires || file_get_contents($cache_file) == '' || ( $purge_cache && $views <= $request_limit ) ) {

        // Cache is stale, empty or purged: make a new API request
        $api_results = indeed_api_request();
        $json_results = json_encode($api_results);

        // On error, empty the cache file rather than writing bad JSON. An empty
        // file triggers a fresh request next time, whereas deleting it would
        // make the file_exists() check above halt every later call.
        if ( $api_results && $json_results )
            file_put_contents($cache_file, $json_results);
        else
            file_put_contents($cache_file, '');
    } else {
        // Check for the number of purge cache requests to avoid abuse
        if( isset($request_limit) && $views >= $request_limit )
            $limit_reached = " <span class='error'>Request limit reached ($request_limit). Please try purging the cache later.</span>";
        // Fetch cache
        $json_results = file_get_contents($cache_file);
        $request_type = 'JSON';
    }

    return json_decode($json_results);
}

How It Works

The function takes two arguments:

  1. The $cache_file argument stores the server location of the cache file you wish to use. By default it will look for api-cache.json in the same directory as the script using this function.
  2. The $expires argument is a cutoff timestamp: a cache file last changed before it is considered out of date. By default the cutoff is two hours ago, so a new API request is made at most every 2 hours.

You’ll need to create the cache file and make it writable by the web server before this function will work (CHMOD 777 does the job, though tighter permissions are safer where your server setup allows them). The function first checks that the cache file exists using file_exists(), halting with an error message if it does not. If the cache file exists, the script checks whether it’s empty using file_get_contents(). If it is empty, an initial API request is made and the results are stored as JSON in the cache file using file_put_contents(). If the cache file exists and is not empty, the script checks when the file was last changed using filectime(). If that was before the $expires cutoff, a new API request is made and the file is updated. Otherwise the cached JSON is read back and returned.
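
The checks described above can be tried in isolation. This is only a minimal sketch, not the caching function itself: cache_needs_refresh() is a hypothetical helper name, and it works on a temporary file so that it can run anywhere without touching a real cache.

```php
<?php
/**
 * Hypothetical helper mirroring the checks described above: a cache
 * file needs refreshing when it is empty or was last changed before
 * the cutoff timestamp.
 */
function cache_needs_refresh( $cache_file, $cutoff ) {
    clearstatcache( true, $cache_file );   // PHP caches stat results; drop them first
    return file_get_contents($cache_file) === '' || filectime($cache_file) < $cutoff;
}

$two_hours_ago = time() - 2*60*60;
$cache_file    = tempnam( sys_get_temp_dir(), 'api' );   // creates an empty file

$empty = cache_needs_refresh( $cache_file, $two_hours_ago );   // true: nothing cached yet

file_put_contents( $cache_file, json_encode( array( 'jobs' => 3 ) ) );
$fresh = cache_needs_refresh( $cache_file, $two_hours_ago );   // false: just written
$stale = cache_needs_refresh( $cache_file, time() + 60 );      // true: cutoff is after the write

unlink( $cache_file );   // clean up the temporary file
var_dump( $empty, $fresh, $stale );
```

The clearstatcache() call matters if you check the same file more than once in a single script, because PHP otherwise reuses the earlier result of filectime().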

Sample Indeed API Request

/**
 * Request jobs from Indeed API
 *
 * Split the request into smaller request chunks ($split results each;
 * keep this within the API's per-request maximum, or pages will be
 * skipped) and then consolidate them into a single array to meet the
 * API requirements.
 */
function indeed_api_request( $split = 50, $search = 'company:("Google")', $apikey = 'XXXXXXXXXXXXXXXX' ) {

    // Get the goods for making the API request to Indeed
    $search = urlencode($search);
    $split = max( 1, intval($split) ); // at least 1, to avoid dividing by zero below
    $user_agent = urlencode( isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '' );
    $server_ip = filter_var( $_SERVER['SERVER_ADDR'], FILTER_VALIDATE_IP );

    // Make a first request to find out how many results there are in total
    $xmlrpc = "http://api.indeed.com/ads/apisearch?publisher=$apikey&q=$search&userip=$server_ip&useragent=$user_agent&v=2&limit=$split";
    $fullxml = simplexml_load_file( $xmlrpc );
    if( !$fullxml ) return false;
    $totalresults = intval( $fullxml->totalresults );

    // Split API request into multiple queries, requesting "$split" results per request.
    // Rounding up avoids an extra, empty request when the total divides evenly.
    $loop_size = ceil( $totalresults / $split );
    $feeds = array();
    for ( $i = 0; $i < $loop_size; $i++ ) {
        $offset = $split * $i;
        $feeds[] = ( $i === 0 ) ? $xmlrpc : "$xmlrpc&start=$offset";
    }

    // For each feed, store the results as an array
    $grouped_results = array();
    foreach ( $feeds as $feed ) {
        $xml = simplexml_load_file($feed);
        if( !$xml ) return false;
        // A JSON round trip converts the SimpleXML object into nested arrays
        $json = json_encode($xml);
        $grouped_results[] = json_decode($json, TRUE);
    }

    // Consolidate all grouped requests into a single, final results array
    $jobs = array();
    foreach ( $grouped_results as $group ) {
        if ( !isset($group['results']['result']) ) continue; // this page had no results
        $jobs = array_merge( $jobs, (array) $group['results']['result'] );
    }

    return $jobs;
}

This is a function I commonly use to retrieve the results of an XML request from the Indeed API. I can elaborate further if anyone is interested; let me know in the comments below.
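
The part of that function most worth checking by hand is the paging arithmetic, so here it is on its own as a minimal sketch. The page_offsets() helper is a hypothetical name I am using for illustration; it only computes the "start" offsets and makes no requests.

```php
<?php
/**
 * Hypothetical helper: list the "start" offsets needed to page
 * through $total results, $per_page at a time.
 */
function page_offsets( $total, $per_page ) {
    $per_page = max( 1, intval($per_page) );               // guard against dividing by zero
    $pages    = (int) ceil( intval($total) / $per_page );  // round up to cover a partial last page
    $offsets  = array();
    for ( $i = 0; $i < $pages; $i++ ) {
        $offsets[] = $i * $per_page;
    }
    return $offsets;
}

print_r( page_offsets( 320, 50 ) ); // offsets 0, 50, 100, 150, 200, 250, 300
print_r( page_offsets( 100, 50 ) ); // offsets 0, 50
```

Rounding up with ceil() and looping while $i is below the page count means a total that divides evenly, such as 100 results at 50 per page, does not trigger an extra request for an empty page.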

Output the Results as HTML

I’ve used the function to display a nice HTML5 list of Indeed Jobs in this PHP, API & JSON caching demonstration.

/**
 * Format a basic job listing in HTML5
 */
function indeed_job_results( $api_results = NULL, $search = NULL ) {
    if( !$api_results ) return false;

    $total = count($api_results);
    $search = ( $search ) ? " found for &ldquo;<strong>" . htmlspecialchars($search, ENT_QUOTES) . "</strong>&rdquo;" : "";

    $html = "<section class='jobs'>";
    $html .= "<header>";
    $html .= "<h2>$total jobs$search</h2>";
    $html .= "</header>";
    foreach ( $api_results as $job ) {
        // Split the date on spaces and reorder the parts as "Mon DD, YYYY"
        $date = explode(' ', $job->date);
        $formattedDate = $date[2] . ' ' . $date[1] . ', ' . $date[3];

        // Escape API-supplied values before placing them in the markup
        $url = htmlspecialchars($job->url, ENT_QUOTES);
        $title = htmlspecialchars($job->jobtitle, ENT_QUOTES);
        $location = htmlspecialchars($job->formattedLocation, ENT_QUOTES);

        $html .= "<article>";
        $html .= "<h3><a href='$url' target='_blank'>$title</a></h3>";
        $html .= "<p class='details'>$location <em>&ndash;</em> $formattedDate</p>";
        $html .= "</article>";
    }
    $html .= "</section><!--// end .jobs -->";

    return $html;
}

Using the function in practice is simple.

$api_results = json_cached_api_results();
$jobs_output = indeed_job_results( $api_results, 'Google' );
echo $jobs_output;

The result is a nice HTML5 display.

<section class='jobs'>
    <header>
        <h2>650 jobs found for &ldquo;<strong>Google</strong>&rdquo;</h2>
    </header>

    <article>
        <h3><a href='http://www.indeed.com/viewjob?jk=caa97a24d40b837e&qd=TGbcPDy5GuwvL2ECGUPcRQVLz8qYsMKyox4VZPgZ7gXWjz26eFGVblZsapiU4aPT2U8gcWTUmEjcpOm0ZWfmQW51jR-X4A2GOFW7z-vRUlFe9tF3N6NENrAfMFkmRemu&indpubnum=5687187771681591&atk=16cgptih60k2g7vc' target='_blank'>Operations Technician (Temporary to Hire)</a></h3>
        <p class='details'>Lenoir, NC <em>&ndash;</em> Oct 20, 2011</p>
    </article>

    <article>
        <h3><a href='http://www.indeed.com/viewjob?jk=e4032e384764608b&qd=TGbcPDy5GuwvL2ECGUPcRQVLz8qYsMKyox4VZPgZ7gXWjz26eFGVblZsapiU4aPT2U8gcWTUmEjcpOm0ZWfmQW51jR-X4A2GOFW7z-vRUlFe9tF3N6NENrAfMFkmRemu&indpubnum=5687187771681591&atk=16cgptih60k2g7vc' target='_blank'>Administrative Assistant</a></h3>
        <p class='details'>New York, NY <em>&ndash;</em> Oct 20, 2011</p>
    </article>
</section><!--// end .jobs -->
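
The dates in that listing come from splitting the API's date string on spaces and reordering the pieces. Here is a minimal sketch of that step on its own. The format_job_date() helper is a hypothetical name, and the input string is an assumed example of the date format, inferred from the way the listing function indexes the pieces rather than taken from the API documentation.

```php
<?php
/**
 * Hypothetical helper: reorder a space-separated date string
 * into "Mon DD, YYYY", as the listing function does.
 */
function format_job_date( $raw ) {
    $parts = explode( ' ', $raw );
    if ( count($parts) < 4 ) return $raw;   // unexpected format: return it unchanged
    return $parts[2] . ' ' . $parts[1] . ', ' . $parts[3];
}

// Assumed example input, not a value taken from a real response
echo format_job_date( 'Thu, 20 Oct 2011 07:00:00 GMT' ); // prints "Oct 20, 2011"
```

The length check is a small safeguard: if the API ever sends a date with fewer pieces, the raw value is shown instead of triggering undefined index notices.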

In Conclusion

APIs provide access to incredible amounts of data, but it comes at a price. With speed and request limitations, it’s important to have a solid method for caching your requests locally when working with APIs in production. Hopefully this method helps you speed up your next API project!

As always, if you have any questions don’t hesitate to ask.