---
title: "Mirroring git repositories with Gitea"
date: 2024-06-27T10:05:46-04:00
draft: true
tags:
  - gitea
  - curl
  - yq
categories:
  - development
---

[Gitea][1] is an awesome self-hosted git forge. I use its [pull mirror][2] feature to mirror many git repos (mostly from GitHub). In this post, I want to share a few maintenance scripts I ran against the [Gitea API][14] using [yq][] and [curl][].

# TODO

* Change mirror interval
* Disable actions by default for mirror repos
* use jo https://github.com/jpmens/jo
* get just latest mirror and apply the settings

## Setup

The scripts below were run with these tool versions:

```
❯ yq --version
yq (https://github.com/mikefarah/yq/) version v4.44.1
❯ curl -V | head -c 11
curl 8.8.0
```

### API token

Go to `https://gitea.balki.me/user/settings/applications` and generate a new token with write permission for repositories. See [API usage][6] for how the token is passed to the API.

# TODO set base url

## Why Mirror?

1. Remote sites may disappear one day.
2. Better local code search. GitHub does not allow code search without signing in.
3. Tools like vim plugin managers can use the local URLs, which is better for privacy.
4. Notifications can be set up for when a repository creates a new tag.

## Dumb crawlers problem

My Gitea [instance][3] is public, and all the mirror repos were public too. Bots crawling them caused huge network traffic. I created an [organization][4] without public visibility and made it own all the mirror repos.

## Hooks for notification

Gitea [webhooks][5] can send a message (to Telegram, in my case) whenever a mirror gets a new tag. Let's first download the list of all mirror repos:

```bash
TOKEN=d88446542e844f4da4ba75bbb85bd694a71907b5
curl "https://gitea.balki.me/api/v1/repos/search?limit=100&mode=mirror" \
  -H "accept: application/json" \
  -H "Authorization: token $TOKEN" \
  -o mirror-repos.json
```
API doc: [repoSearch][14]

Create a [hook][5] manually in one repo, then fetch it with the API:

```bash
curl -s "https://gitea.balki.me/api/v1/repos/MirrorWatch/snac2/hooks?page=1&limit=10" \
  -H "Authorization: token $TOKEN" | yq -P -oj
```

Sample output:

```json
[
  {
    "id": 32,
    "type": "telegram",
    "branch_filter": "tag",
    "config": {
      "content_type": "json",
      "url": "https://api.telegram.org/bot1169894068:J1JVbV3f2vEQpdnPqFANfhjWZrFuUCJs1EW/sendMessage?chat_id=-1008910751069"
    },
    "events": [
      "create"
    ],
    "authorization_header": "",
    "active": true,
    "updated_at": "2024-06-20T20:54:33-04:00",
    "created_at": "2024-06-20T20:54:33-04:00"
  }
]
```

Now loop through all mirror repos and add the same webhook, leaving out the fields the server fills in, such as `id`, `created_at` and `updated_at`:

```bash
yq -r '.data[] | .full_name' mirror-repos.json | while read -r repo; do
  echo "$repo"
  curl "https://gitea.balki.me/api/v1/repos/$repo/hooks" \
    -H "Authorization: token $TOKEN" \
    --json @- <<EOM
{
  "active": true,
  "branch_filter": "tag",
  "config": {
    "content_type": "json",
    "url": "https://api.telegram.org/bot1169894068:J1JVbV3f2vEQpdnPqFANfhjWZrFuUCJs1EW/sendMessage?chat_id=-1008910751069"
  },
  "events": [
    "create"
  ],
  "type": "telegram"
}
EOM
  echo "============"
done
```

## Fixing issue and PR links

Fix the issue URL setting manually in the first repo, as shown [here][7], then look at its JSON representation (re-run the download above first so `mirror-repos.json` reflects the change):

```bash
yq '.data[] | .external_tracker' mirror-repos.json | head
```

Sample output:

```json
{
  "external_tracker_url": "https://github.com/caddyserver/caddy/issues",
  "external_tracker_format": "https://github.com/caddyserver/caddy/issues/{index}",
  "external_tracker_style": "numeric",
  "external_tracker_regexp_pattern": ""
}
```

Now loop through all repos and update them.
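Both tracker URLs are derived from each mirror's `original_url` with the `${og%.git}` [parameter expansion][11], which drops a trailing `.git`. To eyeball the payload for one repo before PATCHing many, here is a minimal pure-bash sketch. `tracker_payload` is a hypothetical helper, not part of Gitea, and it assumes the URL contains no characters that would need JSON escaping:

```bash
# Hypothetical helper (not part of Gitea): print the PATCH body that points a
# mirror's issue links at its upstream GitHub tracker.
# Usage: tracker_payload ORIGINAL_URL
tracker_payload() {
  # Drop a trailing ".git" so ".../caddy.git" and ".../caddy" give the same base.
  local base="${1%.git}"
  # Plain printf: fine for clone URLs, but it does NOT escape JSON in general.
  printf '{"external_tracker": {'
  printf '"external_tracker_url": "%s/issues", ' "$base"
  printf '"external_tracker_format": "%s/issues/{index}", ' "$base"
  printf '"external_tracker_style": "numeric", "external_tracker_regexp_pattern": ""}}\n'
}
```

Running `tracker_payload "$og"` on its own is a cheap dry run. Piping it into the same `curl "https://gitea.balki.me/api/v1/repos/$repo" -H "Authorization: token $TOKEN" -X PATCH --json @-` call would send the same body as the here-doc in the update loop.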
Only touch GitHub mirrors that have not been updated yet. yq's `==` accepts `*` wildcards, so `"*github*"` matches any GitHub URL, and a repo that still reports `internal_tracker` has not been switched to an external tracker:

```bash
yq -r '.data[] | select(.original_url == "*github*" and has("internal_tracker")) | "\(.full_name) \(.original_url)"' mirror-repos.json |
  while read -r repo og; do
    echo "Repo is $repo and GitHub origin URL is $og"
    curl "https://gitea.balki.me/api/v1/repos/$repo" \
      -H "Authorization: token $TOKEN" \
      -X PATCH \
      --json @- <<EOM
{
  "external_tracker": {
    "external_tracker_url": "${og%.git}/issues",
    "external_tracker_format": "${og%.git}/issues/{index}",
    "external_tracker_style": "numeric",
    "external_tracker_regexp_pattern": ""
  }
}
EOM
  done
```

Finally, the same loop pattern disables Actions on every mirror repo, with [jo][] building the `{"has_actions":false}` body:

```bash
yq -r '.data[] | .full_name' mirror-repos.json | while read -r repo; do
  echo "$repo"
  jo has_actions=false |
    curl "https://gitea.balki.me/api/v1/repos/$repo" \
      -H "Authorization: token $TOKEN" \
      -X PATCH \
      --json @-
done
```

### Doc links

* yq: [select][8], [has][9], [string interpolation][10]
* bash: [parameter expansion][11], [here-doc][12]
* curl: [`--json`][13]

[1]: https://github.com/go-gitea/gitea
[2]: https://docs.gitea.com/usage/repo-mirror#pulling-from-a-remote-repository
[3]: https://gitea.balki.me
[4]: https://docs.gitea.com/usage/permissions#organization-repository
[5]: https://docs.gitea.com/usage/webhooks
[6]: https://docs.gitea.com/development/api-usage
[7]: https://github.com/go-gitea/gitea/issues/18986
[8]: https://mikefarah.gitbook.io/yq/operators/select
[9]: https://mikefarah.gitbook.io/yq/operators/has#select-checking-for-existence-of-deep-paths
[10]: https://mikefarah.gitbook.io/yq/operators/string-operators#interpolation
[11]: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
[12]: https://www.gnu.org/software/bash/manual/html_node/Redirections.html#Here-Documents
[13]: https://everything.curl.dev/http/post/json.html
[14]: https://gitea.balki.me/api/swagger#/repository/repoSearch
[yq]: https://github.com/mikefarah/yq
[curl]: https://curl.se
[jo]: https://github.com/jpmens/jo
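## Keeping the base URL in one place

Every snippet above repeats `https://gitea.balki.me/api/v1` and the token header. A minimal sketch of a wrapper that keeps both in one place; `GITEA_URL` and `gitea_api` are names made up for this example, not part of Gitea or curl:

```bash
# Hypothetical helper, not part of Gitea or curl.
# Base URL of the instance; export GITEA_URL to point at another server.
GITEA_URL="${GITEA_URL:-https://gitea.balki.me}"

# gitea_api METHOD PATH [extra curl args...]
# Calls $GITEA_URL/api/v1/PATH with the token from $TOKEN and passes any
# extra arguments (-o FILE, --json @-, ...) straight to curl.
gitea_api() {
  local method="$1" path="$2"
  shift 2
  # ${path#/} strips one leading slash so "/repos/..." and "repos/..." both work.
  curl -sS -X "$method" "$GITEA_URL/api/v1/${path#/}" \
    -H "accept: application/json" \
    -H "Authorization: token $TOKEN" \
    "$@"
}
```

For example, `gitea_api GET "repos/search?limit=50&mode=mirror&page=2" -o page2.json` fetches the second page of mirrors. Gitea caps the page size with the `[api] MAX_RESPONSE_ITEMS` setting (50 by default), so `limit=100` alone may not return every mirror. In the Actions loop the call would become `jo has_actions=false | gitea_api PATCH "repos/$repo" --json @-`.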