gemini-flash-budget
Install: claude install-skill baronguyen001/ai-automation-skills
# Gemini Flash Budget
Use this skill when you need to run a model over many rows for a simple, well-scoped task - extract a ticker, classify a sentence, pull one field. On those prompts the model's "thinking" adds latency and thinking-token cost for no quality gain. Setting `thinking_budget=0` on a Flash model gives the cheapest, fastest path; the savings compound across thousands of calls.
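To see how the savings compound, here is a rough back-of-the-envelope sketch. Every number in it is a hypothetical placeholder (the per-call thinking-token count and the price are not taken from any price list), and it assumes thinking tokens are billed at the output-token rate; substitute figures from your own usage logs and the current pricing page.

```python
def daily_thinking_cost(calls_per_day, thinking_tokens_per_call, usd_per_million_output_tokens):
    """Estimate the daily spend on thinking tokens alone.

    Assumes thinking tokens are billed like output tokens (an assumption;
    check your provider's pricing).
    """
    total_tokens = calls_per_day * thinking_tokens_per_call
    return total_tokens * usd_per_million_output_tokens / 1_000_000


# Hypothetical inputs: 5,000 calls/day, 300 thinking tokens per call,
# $2.50 per million output tokens (placeholder price, not a quoted rate).
saved_per_day = daily_thinking_cost(5_000, 300, 2.50)
print(f"Thinking-token spend avoided per day: ${saved_per_day:.2f}")
```

The per-call amount is tiny, which is why the effect only becomes visible once it is multiplied by the daily call volume.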
## When to invoke
- User says: "make these gemini calls cheaper" / "disable thinking" / "bulk classify with flash" / "high-volume extraction"
- Code in the conversation uses: a loop of many small Gemini calls for extraction or classification.
## When NOT to invoke
- The task needs multi-step reasoning, where thinking actually improves accuracy (in that case, keep a nonzero thinking budget).
- The cost is on the input side from a repeated prefix (use [[gemini-prompt-cache]] instead).
## Concrete example
User input:
```text
I run 5,000 Gemini calls a day just to map company names to tickers. Cut the cost.
```
Output:
```python
# Copy assets/flash_call.py into your project, then:
from flash_call import flash_extract_many

company_names = ["Apple Inc.", "Microsoft Corporation"]  # placeholder; use your own rows
prompts = [f"Ticker for: {name}?" for name in company_names]
tickers = flash_extract_many(prompts)  # Flash, temperature 0, no thinking tokens
```
The helper reads `GEMINI_API_KEY` from the environment and requires `google-genai`.
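For orientation, a helper of this kind might look roughly like the sketch below. This is not the contents of `assets/flash_call.py`; it is a minimal illustration built on the `google-genai` SDK. The model name `gemini-2.5-flash`, the `make_gemini_call` factory, and the `call` parameter (which lets you swap in a stub for offline testing) are assumptions made for this example, and retries, rate limiting, and concurrency are left out.

```python
import os


def make_gemini_call(model="gemini-2.5-flash"):
    """Build a prompt -> text function backed by one shared client.

    The model name is an assumption; use whichever Flash model you have access to.
    """
    # Imported lazily so the batching logic below can be exercised offline.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    config = types.GenerateContentConfig(
        temperature=0,
        # A budget of 0 asks the model to skip thinking entirely.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    )

    def call(prompt):
        response = client.models.generate_content(
            model=model, contents=prompt, config=config
        )
        return (response.text or "").strip()

    return call


def flash_extract_many(prompts, call=None):
    """Run every prompt through `call` and return the answers in input order.

    `call` is a hypothetical hook for this sketch: any function mapping a
    prompt string to an answer string. By default it is a real Gemini call.
    """
    if call is None:
        call = make_gemini_call()
    return [call(prompt) for prompt in prompts]
```

Passing a stub, for example `flash_extract_many(["a", "b"], call=str.upper)`, exercises the batching path without an API key or network access.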
## Pattern to apply
1. Confirm the task is simple and well-scoped - the kind where reasoning adds nothing.
2. Pick a Flash model and set `temperature