loading-insurance-data

Load, validate, and preprocess weekly insurance policy CSV data with automatic period detection and field standardization.

Introduction

The loading-insurance-data skill provides a pipeline for managing weekly insurance policy datasets. Designed for data analysts and automated agents working in the insurance sector, it turns raw CSV files into cleaned, analysis-ready pandas DataFrames. It suits workflows that require historical trend analysis, weekly performance tracking, and multi-year data integration.

  • Intelligent Period Detection: Automatically scans directories for policy files, identifying available week numbers based on standardized filename patterns.

  • Data Cleaning & Standardization: Normalizes critical fields including premium amounts, claim counts, and categorical data. It also handles missing values, coerces columns to numeric types, and reads files with 'utf-8-sig' encoding so that a leading byte-order mark is stripped.

  • Quality Assurance: Includes built-in validation checks for required fields, detection of negative premium values, and handling of empty datasets.

  • Batch Processing: Supports flexible range selection for multi-week loading, enabling time-series analysis and efficient cross-period comparisons.

  • Memory Management: Uses selective column loading and explicit garbage collection to keep memory use manageable when processing large volumes of weekly records.

  • Input: CSV files named with the pattern {YEAR}保单第{WEEK}周变动成本明细表.csv (roughly "{YEAR} policy, week {WEEK}, variable cost detail table"); expects standardized fields such as policy_start_year, signed_premium_yuan, and third_level_organization.

  • Output: Cleaned and structured pandas DataFrames categorized by year and week for immediate ingestion into analytical dashboards or KPI calculation modules.

  • Prerequisites: Requires pandas for data manipulation; pathlib, part of the Python standard library, handles local file system traversal.

  • Constraints: Primarily designed for weekly CSV reports; performance may degrade with exceptionally large files, in which case the provided 'usecols' option can restrict loading to the columns actually needed.

  • Best Practice: Use this skill at the beginning of any insurance analysis task to ensure data integrity and consistent column schema across multiple temporal slices.
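
The workflow described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation. The function names (find_weekly_files, load_week, validate, load_weeks) and the added year and week columns are hypothetical, the week number is assumed to be one or two digits, and only the three fields named in the listing are treated as required.

```python
import re
from pathlib import Path

import pandas as pd

# Filename pattern quoted in the listing: {YEAR}保单第{WEEK}周变动成本明细表.csv
# Assumption: four-digit year, one- or two-digit week (padded or not).
FILENAME_RE = re.compile(r"(\d{4})保单第(\d{1,2})周变动成本明细表\.csv")

# Only the fields named in the listing; the full schema is not documented.
REQUIRED_COLUMNS = ["policy_start_year", "signed_premium_yuan", "third_level_organization"]
PREMIUM_COL = "signed_premium_yuan"


def find_weekly_files(data_dir):
    """Map (year, week) to the path of each file matching the pattern."""
    files = {}
    for path in Path(data_dir).iterdir():
        match = FILENAME_RE.fullmatch(path.name)
        if path.is_file() and match:
            files[(int(match.group(1)), int(match.group(2)))] = path
    return files


def load_week(path, usecols=None):
    """Read one weekly CSV; 'utf-8-sig' strips a leading byte-order mark."""
    df = pd.read_csv(path, encoding="utf-8-sig", usecols=usecols)
    if PREMIUM_COL in df.columns:
        # Values that cannot be parsed as numbers become NaN.
        df[PREMIUM_COL] = pd.to_numeric(df[PREMIUM_COL], errors="coerce")
    return df


def validate(df):
    """Return a list of human-readable issues; an empty list means no issue found."""
    issues = []
    if df.empty:
        issues.append("dataset has no rows")
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        issues.append(f"missing required columns: {missing}")
    if PREMIUM_COL in df.columns:
        premiums = pd.to_numeric(df[PREMIUM_COL], errors="coerce")
        negative = int((premiums < 0).sum())
        if negative:
            issues.append(f"rows with a negative premium: {negative}")
    return issues


def load_weeks(files, year, first_week, last_week, usecols=None):
    """Concatenate the weeks in [first_week, last_week] for one year."""
    frames = []
    for (file_year, week), path in sorted(files.items()):
        if file_year == year and first_week <= week <= last_week:
            df = load_week(path, usecols=usecols)
            # Tag each row with its period (hypothetical column names).
            frames.append(df.assign(year=file_year, week=week))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

A typical call sequence would be find_weekly_files on the data directory, then load_weeks for the range of interest, then validate on the result before any KPI calculation. When usecols is passed, it should include the required columns, otherwise validate reports them as missing.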

Repository Stats

Stars: 0
Forks: 0
Open Issues: 0
Language: Python
Default Branch: main
Sync Status: Idle
Last Synced: May 3, 2026, 09:31 PM