OpusClip Engineering

GCS Transfer Speedup Guide: Optimizing Cloud Storage Performance

By Mark Zhu

📑 Table of Contents

  • 🎯 Overview
  • ⚠️ Why GCS Upload and Download Can Be Slow
  • ✅ Optimization Solutions
  • 🔧 Code Integration Examples
  • 📊 Real-World Benchmark Results
  • ⚠️ Tradeoffs to Consider

🎯 Overview

This guide walks you through how to optimize file transfer to and from Google Cloud Storage (GCS), especially when working with large files in production environments.

Sections Included:

  • Why GCS transfers can be slow
  • Optimization solutions
  • Built-in helper functions in code
  • Real-world benchmark comparison
  • Tradeoffs to consider

⚠️ Why GCS Upload and Download Can Be Slow

🌐 VM Network Throughput Limits

GCP VMs have limits on egress speeds depending on machine type. For example:

  • Smaller VMs like e2-micro may only offer a few hundred Mbps
  • Standard VMs like n2-standard-16 and above support higher throughput

📁 File Contention: Multiple Processes Reading the Same File

If multiple processes access the same large file concurrently (e.g., for different chunk uploads), it can cause disk I/O bottlenecks.

🔄 Single-Threaded Upload or Download

Using plain single-process gsutil cp or the GCS SDK (e.g., Node.js @google-cloud/storage) serializes operations and is slow for large files.
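For reference, the slow baseline looks roughly like the sketch below in Node.js — a single SDK call per file, streaming over one connection (bucket and file names are placeholders):

```ts
import { Storage } from '@google-cloud/storage';

const storage = new Storage();

// Baseline single-stream transfer: one HTTP stream per object,
// so throughput stays well below what the VM's NIC can sustain.
async function plainTransfer(): Promise<void> {
  const bucket = storage.bucket('my-bucket'); // placeholder bucket name

  // Upload: streams file.tar in a single (resumable, but not parallel) request
  await bucket.upload('./file.tar', { destination: 'file.tar' });

  // Download: same story in reverse, one stream per object
  await bucket.file('file.tar').download({ destination: './file-copy.tar' });
}

plainTransfer().catch(console.error);
```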

✅ Optimization Solutions

Upgrade VM Instance

Use higher-spec VMs (e.g., n2-standard-8) to benefit from increased bandwidth and better parallelism.

🚀 Faster Upload: gsutil + Composite Uploads

Use the following command:

gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" -m cp file.tar/bucket/

Why this works:

  • -m enables parallel (multi-threaded/multi-process) operations
  • parallel_composite_upload_threshold=150M tells gsutil to split files larger than 150 MB into components (composite uploads)
  • Each component is uploaded in parallel and composed back into a single object on GCS
  • Ideal for large files (150 MB+); speeds up I/O significantly
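If you need to trigger the same upload from Node rather than a shell, one option is to shell out with the same flags. A minimal sketch (placeholder paths; assumes gsutil is installed on the VM):

```ts
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Run `gsutil -m cp` with composite uploads enabled for files over 150 MB.
// localPath and gcsUri are supplied by the caller, e.g. './file.tar' and 'gs://my-bucket/'.
async function compositeUpload(localPath: string, gcsUri: string): Promise<void> {
  await execFileAsync('gsutil', [
    '-o', 'GSUtil:parallel_composite_upload_threshold=150M',
    '-m', // parallel operations
    'cp', localPath, gcsUri,
  ]);
}
```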

📥 Faster Download: rsync from FUSE

Instead of downloading directly via the API or gsutil cp, rsync from a FUSE-mounted GCS bucket (e.g., mounted with gcsfuse) to a local directory:

rsync -aH --progress /mnt/gcs/folder/file ./local_dir/

Why this works:

  • rsync is optimized for file diffing and streaming
  • The FUSE mount already exposes remote GCS objects as local files, and rsync streams them efficiently while skipping anything already present locally
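Assuming the bucket is already mounted (e.g., gcsfuse my-bucket /mnt/gcs), the same rsync call can be wrapped from Node like this sketch (paths are placeholders):

```ts
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Copy a file out of a FUSE-mounted bucket with rsync.
// Assumes the mount (e.g. via gcsfuse) already exists at /mnt/gcs.
async function fuseDownload(mountedPath: string, localDir: string): Promise<void> {
  await execFileAsync('rsync', [
    '-aH',        // archive mode, preserve hard links
    '--progress', // log transfer progress
    mountedPath,  // e.g. /mnt/gcs/folder/file
    localDir,     // e.g. ./local_dir/
  ]);
}
```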

🔧 Code Integration Examples

These optimizations are already integrated in:

📁 libs/importer/src/util/gcs.ts

| Function | Description | Optimization Used |
| --- | --- | --- |
| getFile / readGSFunc() | Optimized FUSE-based download | FUSE Download |
| uploadImage / uploadUtil() | Optimized chunked upload with gsutil | Composite Upload |
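The real signatures live in gcs.ts; as a purely illustrative sketch of how such wrappers are typically shaped (names and parameters here are hypothetical, not the actual exports):

```ts
// Hypothetical interface only — the actual helpers in
// libs/importer/src/util/gcs.ts may have different signatures.
export interface GcsTransferHelpers {
  /** FUSE-based download: rsync from the mounted bucket into a local dir. */
  getFile(mountedPath: string, localDir: string): Promise<void>;

  /** Chunked upload: gsutil -m cp with parallel_composite_upload_threshold. */
  uploadImage(localPath: string, gcsUri: string): Promise<void>;
}
```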

📊 Real-World Benchmark Results

We ran a simulated benchmark comparison on the same VM instance type (n2-standard-4):

| # | Type | Method | File Size | Avg Speed | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | Upload | TypeScript SDK | 1.5 GB | ~30 Mb/s | Single-threaded, slower |
| 2 | Upload | gsutil -m w/ composite | 1.5 GB | ~280 Mb/s | Much faster with chunked uploads |
| 3 | Download | Plain gsutil cp | 1.5 GB | ~60 Mb/s | |
| 4 | Download | FUSE + rsync | 1.5 GB | ~160 Mb/s | Efficient streaming via FUSE |

🏆 Result: 3-4x improvement with simple CLI-level optimization!

⚠️ Tradeoffs to Consider

  • 🧩 Composite uploads create temporary chunks in your GCS bucket during transfer
  • They increase short-term storage usage slightly
  • However, gsutil composes the parts into the final object and deletes the temporary components once the upload completes

In most cases, this tradeoff is minimal and well worth the extra speed.

🎯 Quick Reference

🚀 For Faster Uploads:

gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" -m cp [file] gs://[bucket]/

📥 For Faster Downloads:

rsync -aH --progress /mnt/gcs/folder/file ./local_dir/

💻 In Code:

  • Use getFile/readGSFunc() for optimized downloads
  • Use uploadImage/uploadUtil() for optimized uploads

Pro Tip: These optimizations work best with files 150MB+ and can provide 3-4x performance improvements in real-world scenarios!
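Putting the pro tip into practice, here is a hedged sketch of size-based routing — small files go through the SDK, large ones through gsutil composite uploads (the 150 MB cutoff mirrors the threshold above; bucket and paths are placeholders):

```ts
import { stat } from 'node:fs/promises';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { Storage } from '@google-cloud/storage';

const execFileAsync = promisify(execFile);
const storage = new Storage();

// Files below this size gain little from composite uploads.
const COMPOSITE_THRESHOLD_BYTES = 150 * 1024 * 1024;

async function smartUpload(localPath: string, bucketName: string, destination: string): Promise<void> {
  const { size } = await stat(localPath);

  if (size < COMPOSITE_THRESHOLD_BYTES) {
    // Small file: a single SDK request is simple and fast enough.
    await storage.bucket(bucketName).upload(localPath, { destination });
  } else {
    // Large file: parallel composite upload via gsutil.
    await execFileAsync('gsutil', [
      '-o', 'GSUtil:parallel_composite_upload_threshold=150M',
      '-m', 'cp', localPath, `gs://${bucketName}/${destination}`,
    ]);
  }
}
```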