read-file
Read and analyze any data file (CSV, JSON, Parquet, Avro, Excel, etc.) or remote URL (S3, HTTPS) using DuckDB. Automatically detect file formats and preview/profile datasets.
Introduction
The read-file skill lets you explore data files directly within your development environment using the DuckDB engine. It is designed for data analysts, engineers, and researchers who need to quickly inspect, preview, or profile datasets without switching to a dedicated database management tool. Because it relies on DuckDB’s vectorized query execution, the skill can handle a wide variety of structured and semi-structured formats, including CSV, TSV, JSON, JSONL, Parquet, Avro, Excel (XLSX/XLS), spatial data (Shapefiles, GeoPackage), SQLite databases, and Jupyter Notebooks.
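As a rough illustration of how a file extension can be mapped to a DuckDB reader, the sketch below picks a table function from a path or URL. The `READERS` table and the `guess_reader` helper are hypothetical names introduced here; the skill's actual mapping is not shown on this page, and only a subset of the formats listed above is covered. The functions `read_avro`, `read_xlsx`, and `st_read` are assumed to come from DuckDB's avro, excel, and spatial extensions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension-to-reader table (an assumption, not the skill's own).
READERS = {
    ".csv": "read_csv",
    ".tsv": "read_csv",
    ".json": "read_json_auto",
    ".jsonl": "read_json_auto",
    ".parquet": "read_parquet",
    ".avro": "read_avro",   # assumes the avro extension is loaded
    ".xlsx": "read_xlsx",   # assumes the excel extension is loaded
    ".shp": "st_read",      # assumes the spatial extension is loaded
    ".gpkg": "st_read",
}


def guess_reader(location: str) -> str:
    """Return the DuckDB table function to try for a local path or URL."""
    # For URLs, keep only the path component so query strings are ignored.
    path = urlparse(location).path if "://" in location else location
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader known for extension {suffix!r}") from None


if __name__ == "__main__":
    print(guess_reader("s3://my-bucket/logs/events.parquet"))  # read_parquet
    print(guess_reader("data/sales.CSV"))                      # read_csv
```

Matching on the lowercased suffix keeps the lookup case-insensitive, and raising on unknown extensions leaves room for a fallback such as content sniffing.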
- Automatically infers the file format from the extension and provides an immediate summary of the schema, row count, and sample rows.
- Supports direct access to local files and remote cloud storage objects (S3, GCS, Azure Blob, HTTP/HTTPS), with built-in secret management for secure credential handling.
- Uses a 'read_any' macro that exposes multi-format ingestion through a single SQL interface.
- Integrates with other DuckDB skills, so you can move from file inspection to advanced SQL querying, database attachment, or data conversion.
- To use it, provide the filename or remote URL followed by an optional question (e.g., 'describe the data').
- If you encounter format-specific errors, the skill automatically suggests installing the missing extension (such as spatial or excel).
- Ideal for rapid ad-hoc data cleaning, sanity checking, and exploratory data analysis (EDA).
- Note: This tool is strictly for data files; it is not intended for parsing or analyzing source code files. For large or persistent data needs, consider using the attach-db skill for better session state management.
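The summary mentioned in the list above (schema, row count, and sample rows) can be approximated with three plain DuckDB statements. The following is a minimal sketch under stated assumptions: `profile_statements` and `sql_literal` are hypothetical helpers, not part of the skill, and the sketch calls a reader function directly rather than the skill's 'read_any' macro, whose signature is not documented here. It only builds SQL text; running it requires a DuckDB session.

```python
from typing import List, Optional


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def profile_statements(
    location: str,
    reader: str = "read_csv",
    sample_rows: int = 5,
    extension: Optional[str] = None,
) -> List[str]:
    """Build the statements for a quick schema / count / sample profile.

    `extension` is an optional DuckDB extension to install and load first,
    for example "spatial" when the reader is st_read.
    """
    source = f"{reader}({sql_literal(location)})"
    statements = []
    if extension:
        statements += [f"INSTALL {extension}", f"LOAD {extension}"]
    statements += [
        f"DESCRIBE SELECT * FROM {source}",                # column names and types
        f"SELECT count(*) AS row_count FROM {source}",     # total rows
        f"SELECT * FROM {source} LIMIT {int(sample_rows)}",  # sample rows
    ]
    return statements


if __name__ == "__main__":
    for stmt in profile_statements("regions.gpkg", reader="st_read", extension="spatial"):
        print(stmt + ";")
```

Passing the extension explicitly mirrors the skill's behavior of suggesting a missing extension when a format-specific error occurs, without guessing how that detection is implemented.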
Repository Stats
- Stars: 436
- Forks: 22
- Open Issues: 2
- Language: Shell
- Default Branch: main
- Sync Status: Idle
- Last Synced: May 1, 2026, 09:52 AM