title: Mirroring git repositories with Gitea
date: 2024-06-27T10:05:46-04:00
draft: true
tags: gitea, curl, yq
categories: development

Gitea is an awesome self-hosted git forge. I use its pull-mirror feature to mirror many git repos (mostly from GitHub). In this post, I want to share a few maintenance scripts I ran against the Gitea API using [yq][] and [curl][].

TODO

  • Change mirror interval
  • Disable Actions by default for mirror repos
  • Use jo (https://github.com/jpmens/jo)
  • Get just the latest mirror and apply the settings

Setup

API token

Go to https://gitea.balki.me/user/settings/applications and generate a new token with write permission for repositories.

TODO

set base url
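
For the base URL TODO, one possible arrangement is to keep the instance URL in an environment variable and build endpoint URLs with a small helper. This is only a sketch: `GITEA_BASE_URL` and `api_url` are hypothetical names I made up for illustration, not something Gitea defines.

```shell
# Hypothetical variable; defaults to the instance used in this post.
GITEA_BASE_URL="${GITEA_BASE_URL:-https://gitea.balki.me}"

# api_url PATH: print the full API v1 URL for PATH.
# Tolerates a trailing slash on the base URL and a leading slash on PATH.
api_url() {
	printf '%s/api/v1/%s\n' "${GITEA_BASE_URL%/}" "${1#/}"
}
```

The curl calls in this post could then use `"$(api_url 'repos/search?limit=100&mode=mirror')"` instead of spelling out the host each time.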

Why Mirror?

  1. Remote sites may disappear one day.
  2. Better local code search. GitHub does not allow code search without signing in.
  3. Tools like vim plugins can use the local URLs, which is better for privacy.
  4. Set up notifications for when a repository creates a new tag.

Dumb crawlers problem

My Gitea instance is public, and all the mirror repos used to be public as well. This caused a lot of network traffic from bots.

To fix this, I created an organization without public visibility and made it own all the mirror repos.

Tool versions used for the scripts in this post:

 yq --version
yq (https://github.com/mikefarah/yq/) version v4.44.1

 curl -V | head -c 11
curl 8.8.0 

Hooks for notification

Hooks

Let's first download the list of all mirror repos.

TOKEN=d88446542e844f4da4ba75bbb85bd694a71907b5

curl "https://gitea.balki.me/api/v1/repos/search?limit=100&mode=mirror" \
	-H "accept: application/json" \
	-H "Authorization: token $TOKEN" \
	-o mirror-repos.json
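
A single request returns only one page of results, and as far as I know the server may cap `limit` below the requested value, so an instance with many mirrors could need several requests. Here is a rough sketch of paging through the search endpoint. The `page` parameter is part of the Gitea search API; `search_url`, `fetch_all_mirrors`, the page size of 50 and the `mirror-repos-N.json` file names are my own assumptions for illustration. It expects `TOKEN` to be set as above.

```shell
# search_url PAGE: URL for one page of the mirror search (pages start at 1).
search_url() {
	printf 'https://gitea.balki.me/api/v1/repos/search?limit=50&mode=mirror&page=%s\n' "$1"
}

# fetch_all_mirrors: save each page to mirror-repos-N.json,
# stopping at the first page whose data array is empty.
fetch_all_mirrors() {
	page=1
	while :; do
		out="mirror-repos-$page.json"
		curl -sf "$(search_url "$page")" \
			-H "accept: application/json" \
			-H "Authorization: token $TOKEN" \
			-o "$out" || return 1
		# An empty data array means there are no more results.
		count=$(yq '.data | length' "$out") || return 1
		if [ "$count" -eq 0 ]; then
			rm -f "$out"
			break
		fi
		page=$((page + 1))
	done
}
```

With this layout, the loops further down would read `mirror-repos-*.json` rather than a single `mirror-repos.json`.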

References

  1. API doc: https://gitea.balki.me/api/swagger#/repository/repoSearch

Create a hook manually in one repo, then fetch it using the API.

curl -s "https://gitea.balki.me/api/v1/repos/MirrorWatch/snac2/hooks?page=1&limit=10" \
        -H "Authorization: token $TOKEN" | yq -P -oj

Sample Output:

[
  {
    "id": 32,
    "type": "telegram",
    "branch_filter": "tag",
    "config": {
      "content_type": "json",
      "url": "https://api.telegram.org/bot1169894068:J1JVbV3f2vEQpdnPqFANfhjWZrFuUCJs1EW/sendMessage?chat_id=-1008910751069"
    },
    "events": [
      "create"
    ],
    "authorization_header": "",
    "active": true,
    "updated_at": "2024-06-20T20:54:33-04:00",
    "created_at": "2024-06-20T20:54:33-04:00"
  }
]

Now loop through all mirror repos and add the same webhook. Remove unwanted fields like id, created_at, etc.

yq -r '.data[] | .full_name' mirror-repos.json | while read -r repo; do
	echo "$repo"

	curl "https://gitea.balki.me/api/v1/repos/$repo/hooks" \
		-H "Authorization: token $TOKEN" \
		--json @- <<-EOM
			{
				"active": true,
				"branch_filter": "tag",
				"config": {
					"content_type": "json",
					"url": "https://api.telegram.org/bot1169894068:J1JVbV3f2vEQpdnPqFANfhjWZrFuUCJs1EW/sendMessage?chat_id=-1008910751069"
				},
				"events": [
					"create"
				],
				"type": "telegram"
			}
		EOM

	echo "============"
done
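
The loop above embeds the Telegram URL (which contains the bot token) directly in the script. A possible variation, sketched below, builds the request body in a helper that takes the URL as an argument, so it can come from an environment variable instead. `hook_payload`, `add_hook` and `TELEGRAM_HOOK_URL` are hypothetical names, and the sketch assumes the URL contains no double quotes or backslashes, since it is pasted into the JSON unescaped.

```shell
# hook_payload URL: print the JSON body for a Telegram webhook that
# fires on tag creation. Read-only fields such as id, created_at and
# updated_at are left out on purpose.
hook_payload() {
	cat <<EOM
{
  "active": true,
  "branch_filter": "tag",
  "config": {"content_type": "json", "url": "$1"},
  "events": ["create"],
  "type": "telegram"
}
EOM
}

# add_hook REPO URL: create the hook on one repo (expects TOKEN to be set).
add_hook() {
	hook_payload "$2" | curl -sf "https://gitea.balki.me/api/v1/repos/$1/hooks" \
		-H "Authorization: token $TOKEN" \
		--json @-
}
```

The loop body would then shrink to `add_hook "$repo" "$TELEGRAM_HOOK_URL"`.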

Fix the issue URL setting in the first repo manually, as shown here.

Get the JSON representation.

yq '.data[] | .external_tracker ' mirror-repos.json | head 

Sample output

{
  "external_tracker_url": "https://github.com/caddyserver/caddy/issues",
  "external_tracker_format": "https://github.com/caddyserver/caddy/issues/{index}",
  "external_tracker_style": "numeric",
  "external_tracker_regexp_pattern": ""
}

Now loop through all repos and update them, making sure to touch only GitHub repos that have not already been updated (those that still have an internal_tracker).

yq -r '.data[] 
| select(.original_url == "*github*" and has("internal_tracker") )
| "\(.full_name) \(.original_url)"' mirror-repos.json | while read -r repo og; do
	echo "Repo is $repo and github origin url is $og"
	curl "https://gitea.balki.me/api/v1/repos/$repo" \
		-H "Authorization: token $TOKEN" \
		-X PATCH \
		--json @- <<-EOM
			{
				"external_tracker": {
					"external_tracker_url": "${og%.git}/issues",
					"external_tracker_format": "${og%.git}/issues/{index}",
					"external_tracker_style": "numeric",
					"external_tracker_regexp_pattern": ""
				}
			}
		EOM
done
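
The `${og%.git}` expansion in the loop above strips a trailing `.git` from the clone URL, if there is one, before `/issues` is appended. A tiny sketch for checking that behaviour in isolation; `issues_url` is a hypothetical helper name, not part of the scripts above.

```shell
# issues_url ORIGIN: print the GitHub issues URL for a clone URL,
# with or without a trailing .git suffix.
issues_url() {
	printf '%s/issues\n' "${1%.git}"
}

issues_url "https://github.com/caddyserver/caddy.git"
# prints https://github.com/caddyserver/caddy/issues
```

URLs without the suffix pass through unchanged, so the loop is safe for both forms of `original_url`.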

Finally, disable Actions on every mirror repo, using jo to build the JSON body.

yq -r '.data[] | .full_name' mirror-repos.json | while read -r repo; do
	echo "$repo"
	jo has_actions=false | curl "https://gitea.balki.me/api/v1/repos/$repo" \
		-H "Authorization: token $TOKEN" \
		-X PATCH \
		--json @-
done