Files
blog/content/posts/gitea-mirror.md
Balakrishnan Balasubramanian 88ab4756e4 Update home page and asciinema poster
Also remove trailing whitespace in all files
2025-05-28 17:31:56 -04:00

210 lines
5.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: "Mirroring git repositories with Gitea"
date: 2024-06-27T10:05:46-04:00
draft: true
tags:
- gitea
- curl
- yq
categories:
- development
---
[Gitea][1] is an awesome self hosted git forge. I use the [pull-mirror][2]
feature to mirror many git repos (mostly from github). In this post, I want to
share few maintenance scripts I ran that connects to [gitea-api][14] using
[yq][] and [curl][].
<!--more-->
# TODO
* Change mirror interval
* Disable for mirror repos by default actions
* use jo https://github.com/jpmens/jo
* get just latest mirror and apply the settings
## Setup
### API token
Go to `https://gitea.balki.me/user/settings/applications` and Generate new
Token with write permission for repositories.
# TODO
set base url
## Why Mirror?
1. Remote sites may disappear one day.
2. Better local code search. Github does not allow code search without signing
in.
3. Tools like vim-plugins can use the local urls which is better for privacy.
4. Setup notification when a repository creates a new tag.
## Dumb crawlers problem
My gitea [instance][3] is public and had all mirror repos public as well. This
caused a huge network traffic from bots.
I created an [organization][4] without public visibility and made it own all
the mirror repos.
```
yq --version
yq (https://github.com/mikefarah/yq/) version v4.44.1
curl -V | head -c 11
curl 8.8.0
```
## API token
## Hooks for notification
[Hooks][5]
Lets first download the list of all mirror repos
```bash
TOKEN=d88446542e844f4da4ba75bbb85bd694a71907b5
curl "https://gitea.balki.me/api/v1/repos/search?limit=100&mode=mirror" \
-H "accept: application/json" \
-H "Authorization: token $TOKEN" \
-o mirror-repos.json
```
References
1. API doc: https://gitea.balki.me/api/swagger#/repository/repoSearch
Create a [hook][5] manually in one repo and get the hook using the API
```bash
curl -s "https://gitea.balki.me/api/v1/repos/MirrorWatch/snac2/hooks?page=1&limit=10" \
-H "Authorization: token $TOKEN" | yq -P -oj
```
Sample Output:
```json
[
{
"id": 32,
"type": "telegram",
"branch_filter": "tag",
"config": {
"content_type": "json",
"url": "https://api.telegram.org/bot1169894068:J1JVbV3f2vEQpdnPqFANfhjWZrFuUCJs1EW/sendMessage?chat_id=-1008910751069"
},
"events": [
"create"
],
"authorization_header": "",
"active": true,
"updated_at": "2024-06-20T20:54:33-04:00",
"created_at": "2024-06-20T20:54:33-04:00"
}
]
```
Now loop through all mirror repos and add the same webook. Remove unwanted fields like `id`, `created_at`, etc.,
```bash
yq -r '.data[] | .full_name' mirror-repos.json | while read -r repo; do
echo "$repo"
curl "https://gitea.balki.me/api/v1/repos/$repo/hooks" \
-H "Authorization: token $TOKEN" \
--json @- <<-EOM
{
"active": true,
"branch_filter": "tag",
"config": {
"content_type": "json",
"url": "https://api.telegram.org/bot1169894068:J1JVbV3f2vEQpdnPqFANfhjWZrFuUCJs1EW/sendMessage?chat_id=-1008910751069"
},
"events": [
"create"
],
"type": "telegram"
}
EOM
echo "============"
done
```
## Fixing issues and pr links
Fix the issue url setting in first repo as shown [here][7].
Get the json representataion.
```bash
yq '.data[] | .external_tracker ' mirror-repos.json | head
```
Sample output
```json
{
"external_tracker_url": "https://github.com/caddyserver/caddy/issues",
"external_tracker_format": "https://github.com/caddyserver/caddy/issues/{index}",
"external_tracker_style": "numeric",
"external_tracker_regexp_pattern": ""
}
```
Now loop throug all repos and update. Making sure only add to github repos and
they are not already updated
```bash
yq -r '.data[]
| select(.original_url == "*github*" and has("internal_tracker") )
| "\(.full_name) \(.original_url)"' mirror-repos.json | while read -r repo og; do
echo "Repo is $repo and github origin url is $og"
curl "https://gitea.balki.me/api/v1/repos/$repo" \
-H "Authorization: token $TOKEN" \
-X PATCH \
--json @- <<-EOM
{
"external_tracker": {
"external_tracker_url": "${og%.git}/issues",
"external_tracker_format": "${og%.git}/issues/{index}",
"external_tracker_style": "numeric",
"external_tracker_regexp_pattern": ""
}
}
EOM
done
yq -r '.data[] | .full_name' mirror-repos.json | while read -r repo; do
echo "$repo"
jo has_actions=false | curl "https://gitea.balki.me/api/v1/repos/$repo" \
-H "Authorization: token $TOKEN" \
-X PATCH \
--json @-
done
```
### Doc links
* yq: [select][8], [has][9], [string interpolation][10]
* bash: [parameter expansion][11], [here-doc][12]
* curl: [`--json`][13]
[1]: https://github.com/go-gitea/gitea
[2]: https://docs.gitea.com/usage/repo-mirror#pulling-from-a-remote-repository
[3]: https://gitea.balki.me
[4]: https://docs.gitea.com/usage/permissions#organization-repository
[5]: https://docs.gitea.com/usage/webhooks
[6]: https://docs.gitea.com/development/api-usage
[7]: https://github.com/go-gitea/gitea/issues/18986
[8]: https://mikefarah.gitbook.io/yq/operators/select
[9]: https://mikefarah.gitbook.io/yq/operators/has#select-checking-for-existence-of-deep-paths
[10]: https://mikefarah.gitbook.io/yq/operators/string-operators#interpolation
[11]: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
[12]: https://www.gnu.org/software/bash/manual/html_node/Redirections.html#Here-Documents
[13]: https://everything.curl.dev/http/post/json.html
[14]: https://gitea.balki.me/api/swagger#/repository/repoSearch