Control the data returned by a Google Spreadsheet Documents using proxy script

Google documents is a great way store online data organized as a regular Excel sheet. One the provided features is the option to publish one or more sheets as a CSV file that will be consumed later on by Tabulizer.

The idea is that you publish a specific sheet (not all document) in CSV (comma separated values) format with "Automatically republish when changes are made" option enabled.

Then you create a data source of type "CSV/Text File" type using a published URL as your source. This have been already discussed in another tutorial. In this article we will discuss the use of a proxy script that will act as a middleman between the Google document and your website. At this point you may be wondering why should I do that, instead of reading the published URL directly. Normally you don't have a reason, but sometimes you want to process the published CSV file before feeding it to Tabulizer. For instance, you may want to remove certain columns or rows, change data, or simply hide the published URL.

In such cases, you can use a proxy similar to the one below. This is a working example and the only thing you need to change is the $fileName parameter to point to your published URL. Optionally you can set the parameters $min_row, $max_row if you want to filter out certain rows, or the parameters $min_column, $max_column if you want to filter out certain columns. For instance, in the proxy below only columns 1-8 will be read and the rest of the columns will be ignored (counting starts from 0, not 1).

<?php
header('Content-Type: application/csv; charset=utf-8');


$fileName = 'https://docs.google.com/spreadsheets/d/bloGWR6zV8whxoUuUbloGWR6zV8whxoUuU/pub?gid=561928619&single=true&output=csv';
$min_row = 0;
$max_row = PHP_INT_MAX;
$min_column = 0;
$max_column = 7;
$column_separator = '^';


$csvData = file_get_contents($fileName);
$lines = explode(PHP_EOL, $csvData);

function getLimit($limit, $default_value) {
    if (isset($_REQUEST[$limit])) {
        $value = intval($_REQUEST[$limit])-1;   
    } else {
        $value = $default_value;
    }   
    if (!isset($value)) $value = 0;
    else if ($value<0) $value = 0;
    return $value;
}

foreach ($lines as $r_id => $line) {       
    if (($r_id>=$min_row)&&($r_id<=$max_row)) {       
        $row = str_getcsv($line);   
        $output = array();
        foreach ($row as $c_id => $cell) {
            if (($c_id>=$min_column)&&($c_id<=$max_column)) {
                $output[] = $cell;               
            }           
        }
        if (!empty($output)) {
            $str = implode($column_separator, $output);
            echo $str . "\r\n";
        }       
    }       
}

exit();

?>