
@mtcute/html-parser

HTML entities parser for mtcute

This package implements a formatting syntax based on HTML, similar to the one available in the Bot API (see the "Formatting options" section of the Bot API documentation).

NOTE: The syntax implemented here is incompatible with Bot API HTML.

Please read the Syntax section below for a detailed explanation.

Usage

import { TelegramClient } from '@mtcute/client'
import { HtmlMessageEntityParser, html } from '@mtcute/html-parser'

const tg = new TelegramClient({ ... })
tg.registerParseMode(new HtmlMessageEntityParser())

await tg.sendText(
    'me',
    html`Hello, <b>me</b>! Updates from the feed:<br>${await getUpdatesFromFeed()}`
)

Syntax

@mtcute/html-parser uses htmlparser2 under the hood, so the parser supports nearly any HTML. However, since the text is still processed in a custom way for Telegram, the supported subset of features is documented below:

Line breaks and spaces

Line breaks in the source text are not preserved. Use <br> to insert one, which makes the syntax very close to the one used when building web pages.

Multiple spaces and indents are collapsed (except inside <pre>). When you do need multiple spaces, use &nbsp; instead.
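The whitespace rules above can be approximated in a few lines. This is only a rough sketch for building intuition: `previewWhitespace` is a hypothetical helper, not part of @mtcute/html-parser, and it ignores <pre> blocks and every other tag.

```typescript
// Hypothetical helper (not a library API): approximates how whitespace in the
// markup maps to whitespace in the resulting message text.
// Assumption: <pre> handling and all other tags are deliberately left out.
function previewWhitespace(markup: string): string {
    return markup
        // newlines, tabs and repeated spaces collapse into a single space
        .replace(/\s+/g, ' ')
        // <br> is the only way to get an actual line break
        .replace(/<br\s*\/?>/gi, '\n')
        // &nbsp; survives as a non-breaking space (converted last on purpose,
        // because \s in JavaScript also matches U+00A0)
        .replace(/&nbsp;/g, '\u00A0')
}

const markup = `Hello,
    world!<br>a&nbsp;&nbsp;b`

console.log(JSON.stringify(previewWhitespace(markup)))
```

The order of the replacements matters here: collapsing runs of whitespace before decoding &nbsp; keeps the non-breaking spaces from being collapsed along with everything else.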

Inline entities

Inline entities are entities that are in-line with other text. We support these entities:

| Name | Code | Result (visual) |
| --- | --- | --- |
| Bold | `<b>text</b>` | **text** |
| Italic | `<i>text</i>` | *text* |
| Underline | `<u>text</u>` | <u>text</u> |
| Strikethrough | `<s>text</s>` | ~~text~~ |
| Spoiler | `<spoiler>text</spoiler>` (or `tg-spoiler`) | N/A |
| Monospace (code) | `<code>text</code>` | `text` |
| Text link | `<a href="https://google.com">Google</a>` | [Google](https://google.com) |
| Text mention | `<a href="tg://user?id=1234567">Name</a>` | N/A |
| Custom emoji | `<emoji id="12345">😄</emoji>` (or `<tg-emoji emoji-id="...">`) | N/A |

Note: <strong>, <em>, <ins>, <strike>, <del> are not supported because they are redundant.

Note: In most cases, you can only use IDs of users that were seen by the client while using the given storage.

Alternatively, you can explicitly provide the access hash like this: <a href="tg://user?id=1234567&hash=abc">Name</a>, where abc is the user's access hash written as a hexadecimal integer. The order of the parameters matters, i.e. tg://user?hash=abc&id=1234567 will not be processed as expected.
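To see why the parameter order matters, consider a matcher that expects id first and an optional hash second. The sketch below is an illustration under that assumption: `MENTION_RE`, `parseMentionHref` and `mentionHref` are hypothetical names, and the pattern used inside the library may differ.

```typescript
// Assumed pattern, for illustration only: `id` must come first,
// followed by an optional hexadecimal `hash`.
const MENTION_RE = /^tg:\/\/user\?id=(\d+)(?:&hash=([0-9a-fA-F]+))?$/

interface ParsedMention {
    userId: number
    // access hash kept as the hexadecimal string found in the link
    accessHashHex?: string
}

// Hypothetical helper: extracts the user ID and access hash from a mention link.
function parseMentionHref(href: string): ParsedMention | null {
    const m = MENTION_RE.exec(href)
    if (!m) return null

    const parsed: ParsedMention = { userId: Number(m[1]) }
    if (m[2] !== undefined) parsed.accessHashHex = m[2]

    return parsed
}

// Hypothetical helper: builds a link with the parameters in the expected order.
function mentionHref(userId: number, accessHashHex?: string): string {
    const base = `tg://user?id=${userId}`

    return accessHashHex === undefined ? base : `${base}&hash=${accessHashHex}`
}

console.log(parseMentionHref(mentionHref(1234567, 'abc')))
console.log(parseMentionHref('tg://user?hash=abc&id=1234567')) // null: wrong order
```

Generating links through a single helper like `mentionHref` is one way to avoid accidentally swapping the parameters.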

Block entities

The only block entity that Telegram supports is <pre>, therefore it is the only tag we support too.

Optionally, the language for a <pre> block can be specified like this:

<pre language="typescript">export type Foo = 42</pre>

However, since syntax highlighting hasn't been implemented in official Telegram clients except WebA, this doesn't really matter 🤷‍♀️

| Code | Result (visual) |
| --- | --- |
| `<pre>multiline\ntext</pre>` | `multiline`<br>`text` |
| `<pre language="javascript">`<br>`export default 42`<br>`</pre>` | `export default 42` |

Nested and overlapped entities

HTML is a nested language, and so is this parser. It does support nested entities, but overlapped entities will not work as expected!

Overlapping entities are supported in unparse(), though.

| Code | Result (visual) |
| --- | --- |
| `<b>Welcome back, <i>User</i>!</b>` | **Welcome back, *User*!** |
| `<b>bold <i>and</b> italic</i>` | **bold *and*** italic<br>⚠️ word "italic" is not actually italic! |
| `<b>bold <i>and</i></b><i> italic</i>`<br>⚠️ this is how unparse() handles overlapping entities | **bold *and*** *italic* |
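When generating markup programmatically, it can help to check whether two formatting ranges overlap without one containing the other, since that is exactly the case the parser does not handle. A minimal sketch, assuming entities are described by an offset and a length; `EntityRange` and `overlapsWithoutNesting` are hypothetical names, not library APIs:

```typescript
// Hypothetical shape: a formatting range described by where it starts
// and how many characters it covers.
interface EntityRange {
    offset: number
    length: number
}

// Returns true when the two ranges intersect but neither fully contains the other.
function overlapsWithoutNesting(a: EntityRange, b: EntityRange): boolean {
    const aEnd = a.offset + a.length
    const bEnd = b.offset + b.length

    const intersect = a.offset < bEnd && b.offset < aEnd
    const aInsideB = a.offset >= b.offset && aEnd <= bEnd
    const bInsideA = b.offset >= a.offset && bEnd <= aEnd

    return intersect && !aInsideB && !bInsideA
}

// "Welcome back, User!": bold covers everything, italic covers "User"
console.log(overlapsWithoutNesting({ offset: 0, length: 19 }, { offset: 14, length: 4 })) // false

// "bold and italic": bold covers "bold and", italic covers "and italic"
console.log(overlapsWithoutNesting({ offset: 0, length: 8 }, { offset: 5, length: 10 })) // true
```

Ranges flagged by this check need to be split at the overlap boundary before being written as nested tags, which is the shape shown in the last row of the table.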

Escaping

Escaping in this parser works exactly the same as in htmlparser2.

This means that you can keep the <, > and & symbols as-is in some cases. However, when dealing with user input, it is always better to use HtmlMessageEntityParser.escape or, even better, the html helper:

import { html } from '@mtcute/html-parser'

const username = 'Boris <&>'
const text = html`Hi, ${username}!`
console.log(text) // Hi, Boris &lt;&amp;&gt;!
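The idea behind escaping interpolated values can be sketched without the library. The functions below are hypothetical stand-ins written for illustration: `escapeHtml` and `safeHtml` are not the actual implementations of HtmlMessageEntityParser.escape or html, which may escape more characters or return a different type.

```typescript
// Hypothetical stand-in for an escape function: replaces the three
// characters that are significant in HTML text content.
function escapeHtml(text: string): string {
    return text
        // & must be replaced first, otherwise the entities produced
        // by the next two steps would be escaped a second time
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
}

// Hypothetical tagged template: literal parts are kept as-is,
// interpolated values are escaped before being inserted.
function safeHtml(strings: TemplateStringsArray, ...values: unknown[]): string {
    return strings.reduce(
        (out, chunk, i) =>
            out + chunk + (i < values.length ? escapeHtml(String(values[i])) : ''),
        '',
    )
}

const name = 'Boris <&>'
console.log(safeHtml`Hi, <b>${name}</b>!`) // Hi, <b>Boris &lt;&amp;&gt;</b>!
```

Only the interpolated value is escaped, so markup written directly in the template keeps working while user input cannot inject tags.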