datanalysis-credit-risk

Solid

Credit risk data cleaning and variable screening pipeline for pre-loan modeling. Use when working with raw credit data that needs quality assessment, missing value analysis, or variable selection before modeling. It covers data loading and formatting, abnormal period filtering, missing rate calculation, high-missing variable removal, low-IV variable filtering, high-PSI variable removal, Null Importance denoising, high-correlation variable removal, and cleaning report generation. Applicable scenarios are credit risk data cleaning, variable screening, and pre-loan modeling preprocessing.

Data & Documents | 34,233 stars | 4,188 forks | Updated today | MIT

Quality Score: 93/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Data Cleaning and Variable Screening

## Quick Start

```bash
# Run the complete data cleaning pipeline
python ".github/skills/datanalysis-credit-risk/scripts/example.py"
```

## Complete Process Description

The data cleaning pipeline consists of the following 11 steps, each executed independently without deleting the original data:

1. **Get Data** - Load and format raw data
2. **Organization Sample Analysis** - Statistics of sample count and bad sample rate for each organization
3. **Separate OOS Data** - Separate out-of-sample (OOS) samples from modeling samples
4. **Filter Abnormal Months** - Remove months with insufficient bad sample count or total sample count
5. **Calculate Missing Rate** - Calculate overall and organization-level missing rates for each feature
6. **Drop High Missing Rate Features** - Remove features with overall missing rate exceeding threshold
7. **Drop Low IV Features** - Remove features with overall IV too low or IV too low in too many organizations
8. **Drop High PSI Features** - Remove features with unstable PSI
9. **Null Importance Denoising** - Remove noise features using label permutation method
10. **Drop High Correlation Features** - Remove high correlation features based on original gain
11. **Export Report** - Generate Excel report containing details and statistics of all steps

## Core Functions

| Function | Purpose | Module |
|------|------|----------|
| `get_dataset()` | Load and format data | references.func |
| `org_analysis()` | ...
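
The three screening metrics named in steps 5 to 8 (missing rate, IV, PSI) can be illustrated with a minimal standard-library sketch. This is not the skill's own code: the function names `missing_rate`, `information_value`, and `psi`, the helper `bin_shares`, and the smoothing constant `EPS` are all hypothetical, and the features are assumed to be already binned into discrete categories with labels where 1 marks a bad sample.

```python
import math
from collections import Counter

EPS = 1e-6  # hypothetical smoothing floor so empty bins do not cause log(0)


def missing_rate(values):
    """Share of entries that are None or NaN."""
    if not values:
        return 0.0
    n_missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return n_missing / len(values)


def bin_shares(bins, categories):
    """Fraction of samples in each category, floored at EPS."""
    counts = Counter(bins)
    total = len(bins) or 1  # guard against an empty group
    return {c: max(counts.get(c, 0) / total, EPS) for c in categories}


def information_value(bins, labels):
    """IV = sum over bins of (good% - bad%) * ln(good% / bad%)."""
    categories = set(bins)
    good = bin_shares([b for b, y in zip(bins, labels) if y == 0], categories)
    bad = bin_shares([b for b, y in zip(bins, labels) if y == 1], categories)
    return sum(
        (good[c] - bad[c]) * math.log(good[c] / bad[c]) for c in categories
    )


def psi(expected_bins, actual_bins):
    """PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)."""
    categories = set(expected_bins) | set(actual_bins)
    expected = bin_shares(expected_bins, categories)
    actual = bin_shares(actual_bins, categories)
    return sum(
        (actual[c] - expected[c]) * math.log(actual[c] / expected[c])
        for c in categories
    )


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    print(missing_rate([1, None, float("nan"), 4]))
    print(information_value(["a", "b", "a", "b"], [0, 0, 1, 1]))
    print(psi(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

In a pipeline like the one described, a feature would be dropped when its missing rate or PSI exceeds a chosen threshold, or when its IV falls below one. The thresholds themselves are configuration choices that this excerpt does not state.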

Details

Author: github
Repository: github/awesome-copilot
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

credit-risk

Audit credit risk modeling software for scoring algorithm accuracy, regulatory compliance (ECOA, FCRA, SR 11-7), bias and disparate impact testing, model governance lifecycle, and explainability. Covers logistic regression, GBM, neural net evaluation, protected class proxy detection, adverse action notice generation, SHAP/LIME explainability, champion-challenger frameworks, and PSI drift monitoring. Use when reviewing lending platforms, underwriting engines, credit scoring APIs, fintech decisioning systems, or any codebase that scores creditworthiness or generates approval/denial decisions.

4 Updated yesterday
tinh2
Data & Documents Listed

clean-data

Interactive data profiling and cleaning assistant for medical research. Three-stage workflow (profile, flag, code-generate) with user approval gates at each step. Handles missing values, outliers, duplicates, and type mismatches in CSV/Excel clinical data. Does NOT auto-clean — all decisions require researcher confirmation.

126 Updated today
Aperivue
AI & Automation Solid

data-quality-auditor

Audit datasets for completeness, consistency, accuracy, and validity. Profile data distributions, detect anomalies and outliers, surface structural issues, and produce an actionable remediation plan.

16,782 Updated 3 days ago
alirezarezvani
Data & Documents Listed

data-wrangling

Data cleaning, transformation, reshaping, joins, missing data handling, and tidy data principles. Covers the full pipeline from raw ingestion to analysis-ready datasets -- type coercion, deduplication, outlier detection, normalization, melting/pivoting, regex extraction, and reproducible transformation chains. Use when preparing, cleaning, or transforming data for analysis.

62 Updated today
Tibsfox
AI & Automation Listed

data-validation

QA an analysis before sharing with stakeholders — methodology checks, accuracy verification, and bias detection. Use when reviewing an analysis for errors, checking for survivorship bias, validating aggregation logic, or preparing documentation for reproducibility.

1 Updated today
Safen99