
Figure 1: Analysis-Ready Cloud-Optimized (ARCO) concept diagram
Analysis-Ready Cloud-Optimized Datasets
Overview
In this notebook, we will explore Analysis-Ready Cloud-Optimized (ARCO) radar datasets using Canadian weather radar data. You’ll learn:
Analysis-Ready datasets - Pre-processed data ready for immediate analysis
Cloud-Optimized formats - Efficient storage and access in cloud environments
FAIR principles - Making data Findable, Accessible, Interoperable, and Reusable
Zarr format - Modern chunked storage for large scientific datasets
We’ll use Canadian radar data from the May 2022 Ontario Derecho severe weather event.
Prerequisites
Table 1: Prerequisites for this tutorial
| Concepts | Importance | Notes |
|---|---|---|
| | Necessary | Basic features |
| | Necessary | Radar basics |
| | Necessary | Zarr basics |
Time to learn: 30 minutes
Imports
Analysis-Ready Data
Analysis-Ready data means datasets are prepared and structured to be immediately usable for scientific analysis. A commonly cited estimate is that data scientists spend roughly 80% of their time preparing and cleaning data rather than analyzing it.
Analysis-Ready datasets solve this by providing:
Clean, pre-processed data that’s ready to use
Rich metadata that explains what the data contains
Standardized formats that work well with analysis tools
Quality control that ensures data reliability
This means more time for science and discovery! 🚀

Figure 2: Analysis-Ready data workflow visualization
Key Benefits of Analysis-Ready Data:
✅ Datasets instead of scattered files - Organized collections of related data
✅ Pre-processed and clean - No need to spend hours fixing data issues
✅ Rich metadata included - Clear documentation of what the data represents
✅ Cataloged and discoverable - Easy to find relevant datasets
✅ Immediate analysis capability - Start analyzing right away
✅ More time for science! - Focus on research questions, not data wrangling
Cloud-Optimized Data
Traditional radar data formats (like individual NetCDF files) work well on local disks but are slow and inefficient in cloud environments, where reading even a small subset often means downloading whole files from object storage. Cloud-Optimized formats like Zarr are designed so that tools can fetch only the pieces of a dataset they need.

Figure 3: Traditional vs Cloud-Optimized data access patterns
Why Cloud-Optimized matters:
Parallel access - Multiple users can read different parts simultaneously
Chunked storage - Only download the data you need
Fast streaming - No need to download entire files
Scalable processing - Handle datasets too large for local computers
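To make "only download the data you need" concrete, here is a minimal sketch of the bookkeeping behind chunked access: given an array shape, a chunk shape, and a slice request, it works out which chunks would have to be fetched. The function name `chunks_for_selection` and the radar cube dimensions are hypothetical, chosen only for illustration; real libraries such as Zarr handle this internally.

```python
from itertools import product
from math import prod


def chunks_for_selection(shape, chunks, selection):
    """Return the grid indices of every chunk that a slice selection touches.

    shape     -- full array shape, e.g. (time, azimuth, range)
    chunks    -- chunk shape along each dimension
    selection -- one (start, stop) pair per dimension, stop exclusive
    """
    per_dim = []
    for size, chunk, (start, stop) in zip(shape, chunks, selection):
        if not 0 <= start < stop <= size:
            raise ValueError("selection must lie inside the array")
        first = start // chunk        # chunk holding the first selected element
        last = (stop - 1) // chunk    # chunk holding the last selected element
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


# Hypothetical radar cube: 288 time steps x 360 azimuths x 1000 range gates,
# stored as one chunk per time step and per block of 250 range gates.
shape = (288, 360, 1000)
chunks = (1, 360, 250)

# Request: one time step, all azimuths, the nearest 250 range gates.
needed = chunks_for_selection(shape, chunks, [(0, 1), (0, 360), (0, 250)])

# Total number of chunks in the array (ceiling division per dimension).
total = prod(-(-s // c) for s, c in zip(shape, chunks))

print(f"{len(needed)} of {total} chunks needed")
```

With this layout the request touches a single chunk out of more than a thousand, which is why chunk shape should be chosen to match the most common access pattern.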
FAIR Data Principles
FAIR data follows principles that make scientific data more valuable and reusable:
Findable - Easy to discover through catalogs and search
Accessible - Available through standard protocols
Interoperable - Works with different tools and systems
Reusable - Well-documented for future use by others

Figure 4: FAIR (Findable, Accessible, Interoperable, Reusable) data principles diagram
FAIR data benefits everyone:
Data producers get citations when others use their datasets
Data consumers access interesting datasets that would otherwise be unavailable
Science advances through improved data sharing and collaboration

Figure 5: FAIR data reuse and collaboration cycle
Image courtesy: Zarr illustrations
Zarr format
Zarr is a modern storage format designed for large scientific datasets. Instead of storing data in single large files, Zarr breaks data into small “chunks” that can be:
Compressed to save storage space
Accessed in parallel by multiple users
Streamed efficiently from cloud storage
Processed on-demand without downloading everything
Think of it like having a library where you can grab just the books you need, rather than having to check out the entire library!

Figure 6: Monolithic vs chunked data storage comparison showing Zarr’s advantage
Image courtesy: Zarr illustrations
We’ll create Analysis-Ready Cloud-Optimized radar datasets using the CfRadial2.1/FM301 standard - a hierarchical structure endorsed by the World Meteorological Organization (WMO). This standard organizes radar data efficiently for both storage and analysis.
CfRadial2.1/FM301 standard
The DataTree structure organizes radar data hierarchically:
Root level: Contains general radar metadata (location, time, etc.)
Sweep levels: Each elevation angle gets its own dataset with radar variables
This structure mirrors how meteorologists think about radar scans
Figure 7: CfRadial2.1/FM301 hierarchical DataTree structure for radar data organization
Summary
We learned about Analysis-Ready Cloud-Optimized (ARCO) datasets and how they apply to weather radar data.
🎯 Key Learning Outcomes:
📊 Analysis-Ready: Pre-processed, clean datasets ready for immediate scientific analysis
☁️ Cloud-Optimized: Efficient Zarr format enabling fast access from cloud storage
🌐 FAIR Principles: Making data Findable, Accessible, Interoperable, and Reusable
📈 Time Series: Combined multiple radar volumes to track storm evolution
🏗️ Standardized Structure: Used WMO-endorsed FM301 hierarchical organization
🚀 What This Enables:
Faster Research: No more data preprocessing - start analyzing immediately
Cloud Analytics: Process large datasets without downloading everything
Reproducible Science: Standardized formats work across different tools
Collaboration: Easy data sharing following FAIR principles
Storm Tracking: Time series analysis of severe weather events
The Ontario Derecho case study demonstrates how ARCO datasets streamline radar meteorology research and education! 🌪️