glean-data-handling

PII filtering: strip emails, phone numbers, SSNs from document body before indexing. Trigger: "glean data handling", "data-handling".

AI & Automation 2,266 stars 315 forks Updated today MIT

Skill Content

# Glean Data Handling

## Overview

Glean enterprise search ingests documents from dozens of connectors (Google Drive, Confluence, Slack, Jira, Salesforce, etc.) and builds a unified search index with permission-aware access control. Data types include indexed document content, connector metadata, user permission maps, query logs, and search analytics. All document content must be PII-filtered before indexing, permission boundaries must be preserved to prevent data leakage across teams, and retention policies must be enforced to comply with corporate governance and GDPR/CCPA obligations.

## Data Classification

| Data Type | Sensitivity | Retention | Encryption |
|-----------|-------------|-----------|------------|
| Indexed document content | High (may contain PII) | Per source retention policy | AES-256 at rest |
| User permission maps | High (access control) | Sync lifecycle | TLS + at rest |
| Connector metadata | Medium | Until connector removed | AES-256 at rest |
| Search query logs | Medium (reveals intent) | 90 days default | AES-256 at rest |
| Search analytics/aggregates | Low | 1 year | TLS in transit |

## Data Import

```typescript
interface GleanDocument {
  id: string;
  datasource: string;
  title: string;
  body: string;
  permissions: { allowedUsers?: string[]; allowAnonymousAccess?: boolean };
  updatedAt: string;
  url: string;
}

async function indexDocuments(docs: GleanDocument[], datasource: string) {
  // PII strip before indexing
  const sanitized = docs.ma...
```
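The PII-stripping step named above (emails, phone numbers, SSNs removed from the document body before indexing) could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `stripPii`, `sanitizeDocs`, the placeholder labels, and the regular expressions are all hypothetical, and the patterns only cover common US-style formats.

```typescript
// Hypothetical helpers; names and patterns are assumptions for illustration.
// Each entry pairs a pattern with the placeholder that replaces its matches.
const PII_PATTERNS: Array<[RegExp, string]> = [
  // Email addresses such as alice@example.com
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  // US Social Security numbers in the 123-45-6789 form
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  // US-style phone numbers, with optional +1 prefix and parenthesized area code
  [/(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\b\d{3})[\s.-]?\d{3}[\s.-]?\d{4}\b/g, "[PHONE]"],
];

// Apply every pattern in order and return the redacted text.
function stripPii(text: string): string {
  return PII_PATTERNS.reduce(
    (acc, [pattern, label]) => acc.replace(pattern, label),
    text,
  );
}

// Return copies of the documents with PII removed from `body`;
// all other fields (id, permissions, url, ...) are passed through unchanged.
function sanitizeDocs<T extends { body: string }>(docs: T[]): T[] {
  return docs.map((doc) => ({ ...doc, body: stripPii(doc.body) }));
}
```

Regex-based redaction like this is easy to audit but will miss unusual formats and may over-match digit runs, so a production pipeline would likely pair it with a dedicated PII detection service. Note that only `body` is rewritten here; permission fields are left intact so access boundaries are preserved.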

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT
