linear-regressionlisted
Install: claude install-skill vermapragya/analytics-skill
# Linear Regression
## When to use this skill
Use for **continuous outcome prediction or explanation** where interpretability matters. Triggers:
- "Predict revenue / session time / NPS"
- "What drives <continuous metric>?"
- "Linear regression for…"
- "OLS"
- "Explain the variation in…"
For binary outcomes use `logistic-regression`. For time-to-event use `survival-analysis`. For pure prediction with non-linear effects, fit linear first as baseline, then suggest gradient boosting.
## Required inputs
| Input | Why it matters |
|---|---|
| Continuous target | What you're predicting (numeric, ideally not heavily skewed) |
| Feature set | Predictors |
| Observation grain | Per user / per session / per geography |
| Train/test strategy | Temporal split for production, random for exploratory |
| Goal | Pure prediction, coefficient interpretation, or both? |
## Workflow
1. **Audit the data** (`data-quality-audit` skill).
2. **Inspect the target distribution.**
```python
import matplotlib.pyplot as plt
df[target].hist(bins=50)
df[target].describe()
```
- Symmetric, finite variance → OK for OLS
- Heavily right-skewed (revenue, time-on-page) → consider `log(1 + target)` transformation
- Heavy-tailed with extreme outliers → robust regression or winsorize
3. **Check for outliers** at p99 and p99.9. Decide:
- Cap at p99 (winsorize)
- Drop with documented justification
- Keep and use robust regression
4. **Verify no leakage** (same checks as logis