# @mtqt/html-parser

> HTML entities parser for mtqt

This package implements formatting syntax based on HTML, similar to the one available in the Bot
API ([documented here](https://core.telegram.org/bots/api#html-style)).

> **NOTE**: The syntax implemented here is not entirely compatible with Bot API _HTML_.
>
> Please read [Syntax](#syntax) below for a detailed explanation.

## Usage

```typescript
import { TelegramClient } from '@mtqt/client'
import { HtmlMessageEntityParser, html } from '@mtqt/html-parser'

const tg = new TelegramClient({ ... })
// Register the HTML parse mode on the client
tg.registerParseMode(new HtmlMessageEntityParser())

tg.sendText(
    'me',
    // Values interpolated through the `html` helper are escaped (see "Escaping")
    html`Hello, <b>me</b>! Updates from the feed:\n${await getUpdatesFromFeed()}`
)
```

## Syntax

`@mtqt/html-parser` uses [`htmlparser2`](https://www.npmjs.com/package/htmlparser2) under the hood, so the parser
supports nearly any HTML. However, since the text is still processed in a custom way for Telegram, the supported subset
of features is documented below.

## Line breaks

Line breaks in the text are preserved; `<br>` tags are ignored.

> ⚠️ Warning for **Prettier** users: be aware that Prettier
> formats tagged template literals with `html` as normal HTML and may add
> unwanted line breaks.
>
> Use `htm` instead (which is just an alias):
> ```typescript
> import { htm } from '@mtqt/html-parser'
>
> await msg.answerText(htm`Hello, <b>${msg.sender.username}</b>`)
> ```

## Inline entities

Inline entities are entities that are in-line with other text. The following entities are supported:

| Name | Code | Result (visual) |
|---|---|---|
| Bold | `<b>text</b>` | **text** |
| Italic | `<i>text</i>` | _text_ |
| Underline | `<u>text</u>` | <u>text</u> |
| Strikethrough | `<s>text</s>` | ~~text~~ |
| Monospace (code) | `<code>text</code>` | `text` |
| Text link | `<a href="https://google.com">Google</a>` | [Google](https://google.com) |
| Text mention | `<a href="tg://user?id=1234567">Name</a>` | N/A |

> **Note**: `<strong>`, `<em>`, `<ins>`, `<strike>`, `<del>` are not supported because they are redundant.

> **Note**: It is up to the client to look up the user's input entity by ID for text mentions.
> In most cases, you can only use IDs of users that were seen by the client while using the given storage.
>
> Alternatively, you can explicitly provide the access hash like this:
> `<a href="tg://user?id=1234567&hash=abc">Name</a>`, where `abc` is the user's access hash
> written as a base-16 *unsigned* integer. The order of the parameters matters, i.e.
> `tg://user?hash=abc&id=1234567` will not be processed as expected.
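
The link format for text mentions can be illustrated with a small sketch. `buildMentionUrl` and `parseMentionUrl` are hypothetical helpers, not part of `@mtqt/html-parser`. The regular expression is only an assumption that mirrors the documented behaviour (`id` first, then an optional `hash`), and the access hash is assumed to be held as a signed 64-bit `bigint`.

```typescript
// Hypothetical helpers for illustration only; not exported by @mtqt/html-parser.

// Builds a tg://user link. The access hash is assumed to be a signed 64-bit
// integer, so it is reinterpreted as unsigned before being written in base 16.
function buildMentionUrl(userId: number, accessHash?: bigint): string {
    if (accessHash === undefined) return `tg://user?id=${userId}`
    const unsigned = BigInt.asUintN(64, accessHash)
    return `tg://user?id=${userId}&hash=${unsigned.toString(16)}`
}

// Assumed pattern: `id` must come first, `hash` is optional and follows it.
const MENTION_RE = /^tg:\/\/user\?id=(\d+)(?:&hash=([0-9a-fA-F]+))?$/

function parseMentionUrl(url: string): { userId: number; accessHash?: bigint } | null {
    const m = MENTION_RE.exec(url)
    if (!m) return null
    const userId = parseInt(m[1], 10)
    if (m[2] === undefined) return { userId }
    // Read the unsigned base-16 value back as a signed 64-bit integer.
    return { userId, accessHash: BigInt.asIntN(64, BigInt('0x' + m[2])) }
}
```

Under these assumptions, a link with the parameters in the wrong order simply fails to match, which is one plausible reason why `tg://user?hash=abc&id=1234567` is not treated as a mention.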

## Block entities

The only block entity that Telegram supports is `<pre>`, so it is the only block tag this parser supports.

Optionally, the language for a `<pre>` block can be specified like this:

```html
<pre language="typescript">export type Foo = 42</pre>
```

> However, since syntax highlighting hasn't been implemented in
> official Telegram clients, this doesn't really matter 🤷‍♀️

| Code | Result (visual) |
|---|---|
| <pre>&lt;pre&gt;multiline\ntext&lt;/pre&gt;</pre> | <pre>multiline<br>text</pre> |
| <pre>&lt;pre language="javascript"&gt;<br>  export default 42<br>&lt;/pre&gt;</pre> | <pre>export default 42</pre> |

## Nested and overlapped entities

HTML is a nested language, and so is this parser. It supports nested entities, but overlapping entities will not work
as expected!

Overlapping entities are supported in `unparse()`, though.

| Code | Result (visual) |
|---|---|
| `<b>Welcome back, <i>User</i>!</b>` | **Welcome back, _User_!** |
| `<b>bold <i>and</b> italic</i>` | **bold _and_** italic<br>⚠️ <i>word "italic" is not actually italic!</i> |
| `<b>bold <i>and</i></b><i> italic</i>`<br>⚠️ <i>this is how <code>unparse()</code> handles overlapping entities</i> | **bold _and_** _italic_ |
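
To show how overlapping ranges can be turned into properly nested markup, here is a minimal sketch. It is not how `unparse()` is implemented; the `Entity` shape and the `toNestedHtml` function are assumptions made for this example, and the text is not escaped.

```typescript
// Hypothetical entity shape: a tag name plus an offset/length range in the text.
interface Entity {
    tag: string
    offset: number
    length: number
}

// Splits overlapping entities so that the emitted tags are always properly nested.
function toNestedHtml(text: string, entities: Entity[]): string {
    // Every position where some entity starts or ends is a boundary.
    const points = new Set<number>([0, text.length])
    for (const e of entities) {
        points.add(e.offset)
        points.add(e.offset + e.length)
    }
    const sorted = Array.from(points).sort((a, b) => a - b)

    const open: Entity[] = [] // stack of currently open entities
    let out = ''
    for (let i = 0; i < sorted.length; i++) {
        const pos = sorted[i]
        const endsHere = (e: Entity) => e.offset + e.length === pos

        // Close tags until every entity ending here is closed,
        // remembering the ones that were closed early and must be reopened.
        const reopen: Entity[] = []
        while (open.some(endsHere)) {
            const top = open.pop()!
            out += `</${top.tag}>`
            if (!endsHere(top)) reopen.push(top)
        }
        for (const e of reopen.reverse()) {
            out += `<${e.tag}>`
            open.push(e)
        }

        // Open entities starting here, longest first so they nest correctly.
        const starting = entities
            .filter((e) => e.offset === pos && e.length > 0)
            .sort((a, b) => b.length - a.length)
        for (const e of starting) {
            out += `<${e.tag}>`
            open.push(e)
        }

        if (i + 1 < sorted.length) out += text.slice(pos, sorted[i + 1])
    }
    return out
}

const overlapped = toNestedHtml('bold and italic', [
    { tag: 'b', offset: 0, length: 8 },
    { tag: 'i', offset: 5, length: 10 },
])
console.log(overlapped) // <b>bold <i>and</i></b><i> italic</i>
```

The italic entity is closed together with the bold one and reopened right after it, which matches the shape of the last row in the table above.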

## Escaping

Escaping in this parser works exactly the same as in `htmlparser2`.

This means that you can keep the `<>&` symbols as-is in some cases. However, when dealing with user input, it is always
better to use [`HtmlMessageEntityParser.escape`](./classes/htmlmessageentityparser.html#escape) or, even better, the
`html` helper:

```typescript
import { html } from '@mtqt/html-parser'

const username = 'Boris <&>'
// Interpolated values are escaped; the literal parts of the template are not
const text = html`Hi, ${username}!`
console.log(text) // Hi, Boris &lt;&amp;&gt;!
```
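
For intuition, a tagged-template helper of this kind can be sketched as follows. `escapeHtml` and `safeHtml` are hypothetical stand-ins written for this example; the library's own `escape` and `html` may differ in details.

```typescript
// Hypothetical stand-ins for illustration; not the library's implementation.
const ESCAPES: Record<string, string> = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
}

// Replaces the three characters that are significant in HTML text.
function escapeHtml(text: string): string {
    return text.replace(/[<>&]/g, (ch) => ESCAPES[ch])
}

// Tagged template: literal parts are kept as-is, interpolated values are escaped.
function safeHtml(strings: TemplateStringsArray, ...values: unknown[]): string {
    return strings.reduce(
        (out, chunk, i) => out + chunk + (i < values.length ? escapeHtml(String(values[i])) : ''),
        ''
    )
}

const greeting = safeHtml`Hi, <b>${'Boris <&>'}</b>!`
console.log(greeting) // Hi, <b>Boris &lt;&amp;&gt;</b>!
```

Escaping only the interpolated values keeps the markup written in the template intact while making user-supplied strings safe to embed.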