homoglyph-detector

Solid

Byte-level Unicode homoglyph detection for identifying invisible character substitutions in code

AI & Automation 814 stars 53 forks Updated today MIT

Install

View on GitHub

Quality Score: 95/100

Stars 20%
97
Recency 20%
100
Frontmatter 20%
70
Documentation 15%
100
Issue Health 10%
50
License 10%
100
Description 5%
100

Skill Content

# Homoglyph Detector Byte-level forensic analysis of code changes to detect Unicode homoglyph substitutions — characters that look identical to ASCII in every editor and diff tool but have different codepoints, silently breaking string comparisons, dictionary lookups, and identifier resolution. ## Purpose Homoglyph attacks (related to CVE-2021-42574 "Trojan Source") are the highest-stealth trojan technique. A Cyrillic `р` (U+0440) looks identical to a Latin `p` (U+0070) in every font, editor, and diff viewer. The only way to detect it is byte-level analysis via `hexdump`. This skill pipes git diffs through `hexdump -C` and scans for multi-byte UTF-8 sequences where single-byte ASCII is expected, particularly in string literals used as dictionary keys, variable names, and identifiers. ## Capabilities ### Confusable Character Detection Scans for these high-risk Unicode confusables: | Latin | Cyrillic | Greek | UTF-8 Bytes | |-------|----------|-------|-------------| | a (61) | а (D0 B0) | α (CE B1) | 1 vs 2 bytes | | c (63) | с (D1 81) | — | 1 vs 2 bytes | | e (65) | е (D0 B5) | ε (CE B5) | 1 vs 2 bytes | | o (6F) | о (D0 BE) | ο (CE BF) | 1 vs 2 bytes | | p (70) | р (D1 80) | ρ (CF 81) | 1 vs 2 bytes | | x (78) | х (D1 85) | χ (CF 87) | 1 vs 2 bytes | | y (79) | у (D1 83) | — | 1 vs 2 bytes | ### Zero-Width Character Detection - U+200B — Zero-width space - U+200C — Zero-width non-joiner - U+200D — Zero-width joiner - U+FEFF — Byte order mark (in non-BOM position) ### ...

Details

Author
a5c-ai
Repository
a5c-ai/babysitter
Created
4 months ago
Last Updated
today
Language
JavaScript
License
MIT

Related Skills