
Figure 1: Analysis-Ready Cloud-Optimized (ARCO) concept diagram
Analysis-Ready Cloud-Optimized Datasets
Overview
In this notebook, we will explore Analysis-Ready Cloud-Optimized (ARCO) radar datasets using Canadian weather radar data. You’ll learn:
- Analysis-Ready datasets - Pre-processed data ready for immediate analysis
- Cloud-Optimized formats - Efficient storage and access in cloud environments
- FAIR principles - Making data Findable, Accessible, Interoperable, and Reusable
- Zarr format - Modern chunked storage for large scientific datasets
We’ll use Canadian radar data from the May 2022 Ontario Derecho severe weather event.
Prerequisites
Table 1: Prerequisites for this tutorial
| Concepts | Importance | Notes |
|---|---|---|
| Intro to Xarray | Necessary | Basic features |
| Radar Cookbook | Necessary | Radar basics |
| Intro to Zarr | Necessary | Zarr basics |
- Time to learn: 30 minutes
Imports
Analysis-Ready Data
Analysis-Ready data means datasets that are prepared and structured so they can be used for scientific analysis right away. A frequently cited estimate holds that data scientists spend as much as ~80% of their time finding, preparing, and cleaning data rather than analyzing it.
Analysis-Ready datasets solve this by providing:
- Clean, pre-processed data that’s ready to use
- Rich metadata that explains what the data contains
- Standardized formats that work well with analysis tools
- Quality control that ensures data reliability
This means more time for science and discovery! 🚀
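The sketch below illustrates what "pre-processed" and "rich metadata" can look like in practice. It applies a simple range check to a few reflectivity values and bundles the result with CF-style attributes. It uses plain Python with no radar libraries. The plausibility bounds and the helper functions `quality_control` and `make_variable` are assumptions made for illustration and are not part of any toolkit's API. `DBZH` is the conventional name for horizontal reflectivity. Real radar quality control is far more involved.

```python
import math

# Hypothetical plausibility bounds for reflectivity in dBZ. Operational QC
# (clutter removal, attenuation correction, etc.) is far more complex.
DBZ_MIN, DBZ_MAX = -32.0, 95.0


def quality_control(values, vmin=DBZ_MIN, vmax=DBZ_MAX):
    """Return a copy of `values` with missing or implausible gates set to NaN."""
    cleaned = []
    for v in values:
        if v is None or math.isnan(v) or not vmin <= v <= vmax:
            cleaned.append(math.nan)
        else:
            cleaned.append(float(v))
    return cleaned


def make_variable(name, raw_values):
    """Bundle cleaned data with self-describing, CF-style attributes."""
    return {
        "name": name,
        "data": quality_control(raw_values),
        "attrs": {
            "standard_name": "equivalent_reflectivity_factor",
            "long_name": "Equivalent reflectivity factor H",
            "units": "dBZ",
            "valid_min": DBZ_MIN,
            "valid_max": DBZ_MAX,
        },
    }


# -999.0 mimics a fill value and 150.0 is physically implausible (both hypothetical).
dbzh = make_variable("DBZH", [12.5, None, -999.0, 47.0, 150.0])
print(dbzh["data"])  # [12.5, nan, nan, 47.0, nan]
```

A data producer runs steps like these once. Every downstream user then starts from the same cleaned, documented values instead of repeating the work.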

Figure 2: Analysis-Ready data workflow visualization
Key Benefits of Analysis-Ready Data:
✅ Datasets instead of scattered files - Organized collections of related data
✅ Pre-processed and clean - No need to spend hours fixing data issues
✅ Rich metadata included - Clear documentation of what the data represents
✅ Cataloged and discoverable - Easy to find relevant datasets
✅ Immediate analysis capability - Start analyzing right away
✅ More time for science! - Focus on research questions, not data wrangling
Cloud-Optimized Data
Traditional radar data formats (such as collections of individual per-volume NetCDF files) work well on a local disk. They are slow and inefficient to read from cloud object storage, however, because every read becomes a network request. Cloud-Optimized formats like Zarr are designed specifically for fast, efficient access from cloud storage.

Figure 3: Traditional vs Cloud-Optimized data access patterns
Why Cloud-Optimized matters:
- Parallel access - Many users or processes can read different parts of a dataset at the same time
- Chunked storage - Only the chunks that overlap your request are downloaded
- Fast streaming - Data is read on demand, with no need to download entire files first
- Scalable processing - Handle datasets too large for local computers
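This small sketch makes chunked access concrete. It shows the arithmetic a chunked reader performs: given an array shape, a chunk shape, and a requested slice, it works out which chunks overlap the request. The radar dimensions (360 azimuths by 1000 range gates) and the chunk sizes are hypothetical values chosen for the example. `chunks_for_selection` and `total_chunks` are illustrative helpers, not Zarr functions.

```python
from itertools import product


def chunks_for_selection(shape, chunks, selection):
    """Return grid indices of every chunk overlapping a rectangular selection.

    shape     -- full array shape, e.g. (n_azimuth, n_range)
    chunks    -- chunk length along each dimension
    selection -- one half-open (start, stop) interval per dimension
    """
    per_dim = []
    for size, chunk, (start, stop) in zip(shape, chunks, selection):
        start, stop = max(start, 0), min(stop, size)  # clip to the array
        if start >= stop:
            return []  # an empty selection touches no chunks
        first, last = start // chunk, (stop - 1) // chunk
        per_dim.append(range(first, last + 1))
    # Every combination of per-dimension chunk indices is a chunk to fetch.
    return list(product(*per_dim))


def total_chunks(shape, chunks):
    """Number of chunks in the full chunk grid (ceiling division per dimension)."""
    n = 1
    for size, chunk in zip(shape, chunks):
        n *= -(-size // chunk)
    return n


# Hypothetical sweep: 360 azimuths x 1000 range gates, stored in 90 x 250 chunks.
shape, chunks = (360, 1000), (90, 250)
needed = chunks_for_selection(shape, chunks, [(100, 150), (0, 300)])
print(needed, f"{len(needed)} of {total_chunks(shape, chunks)} chunks")
# [(1, 0), (1, 1)] 2 of 16 chunks
```

Reading 50 azimuths of the nearest 300 gates touches 2 of the 16 chunks, so roughly an eighth of the stored data would be fetched. This saving depends on choosing chunk shapes that match common access patterns, such as whole sweeps or a fixed range window.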
FAIR Data Principles
FAIR data follows principles that make scientific data more valuable and reusable:
- Findable - Easy to discover through catalogs and search
- Accessible - Available through standard protocols
- Interoperable - Works with different tools and systems
- Reusable - Well-documented for future use by others
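FAIR is a set of principles rather than a file format, but parts of it can be checked mechanically through metadata. The sketch below maps three of the principles to global attribute names borrowed from the CF and ACDD metadata conventions and reports which ones are missing. The mapping itself is a hypothetical choice made for illustration. Accessibility is left out because it depends mostly on how the data is served (for example over HTTPS or an object-store protocol), not on what the dataset contains.

```python
# Hypothetical mapping from FAIR principles to CF/ACDD-style global attributes.
# "Accessible" is omitted: it depends on how data is served, not on its metadata.
FAIR_ATTRIBUTES = {
    "Findable": ["title", "summary", "keywords", "id"],
    "Interoperable": ["Conventions"],
    "Reusable": ["license", "creator_name", "history"],
}


def fair_metadata_report(attrs):
    """For each principle, list expected attributes that are absent or blank."""
    report = {}
    for principle, keys in FAIR_ATTRIBUTES.items():
        report[principle] = [
            key for key in keys
            if attrs.get(key) is None or not str(attrs[key]).strip()
        ]
    return report


# Placeholder attributes for an imaginary radar dataset.
attrs = {
    "title": "Example radar volume",
    "Conventions": "placeholder",
    "license": "CC-BY-4.0",
    "history": "",
}
print(fair_metadata_report(attrs))
# {'Findable': ['summary', 'keywords', 'id'], 'Interoperable': [], 'Reusable': ['creator_name', 'history']}
```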

Figure 4: FAIR (Findable, Accessible, Interoperable, Reusable) data principles diagram
FAIR data benefits everyone:
- Data producers can receive citations when others use their datasets
- Data consumers gain access to datasets that would otherwise be difficult to find or use
- Science advances through improved data sharing and collaboration

Figure 5: FAIR data reuse and collaboration cycle
Image courtesy: Zarr illustrations
Zarr format
Zarr is a modern storage format designed for large scientific datasets. Instead of storing data in a single large file, Zarr splits each array into small “chunks” stored as separate objects. These chunks can be:
- Compressed individually to save storage space
- Read in parallel by many users or processes
- Streamed efficiently from cloud storage
- Processed on demand without downloading the whole dataset
Think of it like having a library where you can grab just the books you need, rather than having to check out the entire library!
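Here is a toy, pure-Python version of chunked storage. A 1-D array is split into fixed-size chunks, and each chunk is compressed independently with `zlib`. The pieces are kept in a dictionary that stands in for a key-value store such as an object-store bucket, and reading one chunk decompresses only that chunk. The layout (a `meta.json` entry plus one key per chunk index) is loosely modeled on Zarr but is not the actual Zarr specification. The helper names `write_chunked` and `read_chunk` are hypothetical.

```python
import json
import zlib
from array import array


def write_chunked(values, chunk_size):
    """Split a 1-D float sequence into chunks, compressing each one independently.

    The returned dict stands in for a key-value store: one metadata entry plus
    one compressed blob per chunk, keyed by the chunk's index.
    """
    meta = {"length": len(values), "chunk_size": chunk_size, "typecode": "d"}
    store = {"meta.json": json.dumps(meta).encode()}
    for index, start in enumerate(range(0, len(values), chunk_size)):
        block = array("d", values[start:start + chunk_size])
        # Native byte order is fine for this toy; real formats record it explicitly.
        store[str(index)] = zlib.compress(block.tobytes())
    return store


def read_chunk(store, index):
    """Fetch and decompress a single chunk; no other chunk is touched."""
    meta = json.loads(store["meta.json"])
    block = array(meta["typecode"])
    block.frombytes(zlib.decompress(store[str(index)]))
    return block.tolist()


store = write_chunked([float(i) for i in range(10)], chunk_size=4)
print(sorted(key for key in store if key != "meta.json"))  # ['0', '1', '2']
print(read_chunk(store, 2))  # [8.0, 9.0]
```

Real Zarr stores add multi-dimensional chunk keys, explicit byte order, and configurable codecs, but the access pattern is the same.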

Figure 6: Monolithic vs chunked data storage comparison showing Zarr’s advantage
Courtesy: Zarr illustrations
We’ll create Analysis-Ready Cloud-Optimized radar datasets using the CfRadial2.1/FM301 standard - a hierarchical structure endorsed by the World Meteorological Organization (WMO). This standard organizes radar data efficiently for both storage and analysis.
CfRadial2.1/FM301 standard
The DataTree structure organizes radar data hierarchically:
- Root level: Contains general radar metadata (location, time, etc.)
- Sweep levels: Each elevation angle gets its own dataset with radar variables
- This structure mirrors how meteorologists think about radar scans
Figure 7: CfRadial2.1/FM301 hierarchical DataTree structure for radar data organization
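The hierarchy can be pictured without any radar libraries. The sketch below stores each group under a path (`/` for the root, then `/sweep_0`, `/sweep_1`, and so on for sweeps) and resolves `/group/variable` lookups. The group naming follows the `sweep_N` convention used by tools such as xradar, which is an assumption here. `DBZH` and `VRADH` are the usual FM301 names for horizontal reflectivity and radial velocity. All numbers are placeholders, and `sweep_paths` and `get` are hypothetical helpers.

```python
# Plain-Python stand-in for a hierarchical (DataTree-like) radar volume.
# Group/variable names follow FM301/CfRadial2.1 conventions as used by tools
# such as xradar (an assumption); every number is a placeholder.
volume = {
    "/": {"latitude": 45.0, "longitude": -75.0, "altitude": 100.0},
    "/sweep_0": {"sweep_fixed_angle": 0.5, "DBZH": [12.0, 35.5, 48.0], "VRADH": [-3.2, 1.0, 7.5]},
    "/sweep_1": {"sweep_fixed_angle": 1.5, "DBZH": [10.0, 30.0, 41.5], "VRADH": [-2.0, 0.5, 6.0]},
}


def sweep_paths(tree):
    """Sweep groups ordered by elevation (their fixed angle)."""
    paths = [path for path in tree if path.startswith("/sweep_")]
    return sorted(paths, key=lambda path: tree[path]["sweep_fixed_angle"])


def get(tree, path):
    """Resolve '/group/variable' (or '/variable' at the root) to a value."""
    group, _, name = path.rpartition("/")
    return tree[group or "/"][name]


for path in sweep_paths(volume):
    print(path, get(volume, path + "/sweep_fixed_angle"))
print(get(volume, "/latitude"))  # 45.0
```

In recent xarray releases, the same layout is represented by `xarray.DataTree`. There, a node such as `dt["sweep_0"]` holds the variables for one sweep.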
Summary
We learned about Analysis-Ready Cloud-Optimized (ARCO) radar datasets.
🎯 Key Learning Outcomes:
📊 Analysis-Ready: Pre-processed, clean datasets ready for immediate scientific analysis
☁️ Cloud-Optimized: Efficient Zarr format enabling fast access from cloud storage
🌐 FAIR Principles: Making data Findable, Accessible, Interoperable, and Reusable
📈 Time Series: Combined multiple radar volumes to track storm evolution
🏗️ Standardized Structure: Used WMO-endorsed FM301 hierarchical organization
🚀 What This Enables:
- Faster Research: Less time spent on data preprocessing, so analysis can start immediately
- Cloud Analytics: Process large datasets without downloading everything
- Reproducible Science: Standardized formats work across different tools
- Collaboration: Easy data sharing following FAIR principles
- Storm Tracking: Time series analysis of severe weather events
The Ontario Derecho case study demonstrates how ARCO datasets streamline radar meteorology research and education! 🌪️