Cloning all repositories from a GitHub organization

Recently I was tasked with finding out how many teams used a feature that we (the developer-productivity team) wanted to deprecate. Naturally, rather than asking around, we'd just grep for FOO=60 throughout the entire org.

Disclaimer: I know I could have just used GitHub's code search - but I like poking at their API to see what it does, and it resulted in this post, which might be useful for other mass-download needs.

The setup

The main thing behind this is the gh api client - it lets you quickly make (authenticated!!) API calls, so you don't get rate limited as quickly. Having that installed is a prerequisite.

With that done, it's easy enough to run gh api /users/#{username} and inspect the JSON it returns for your own user.

From there, we can extend the same call to fetch repository listings:

gh api /users/lindgrenj6/repos

The output is a large JSON array with one object per repository, each carrying far more metadata than we need here.

Great, we have a clone_url attribute, how can we loop over potentially hundreds of repos?

Well, you can either use jq like so:

gh api /users/lindgrenj6/repos | jq .[].clone_url

or use a real programming language and parse the JSON into a hash or dictionary as usual. But neither fixes the fact that GitHub seems to return at most 100 results - without any metadata on how many are available. What gives?
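For the "real programming language" route, here's a minimal Ruby sketch of the parsing step. The JSON here is a hand-written stand-in for the API response (the repo names are just examples); a real response has dozens more fields per object:

```ruby
require 'json'

# Trimmed-down stand-in for what `gh api /users/<user>/repos` returns
sample = <<~JSON
  [
    {"name": "repo-one", "clone_url": "https://github.com/example/repo-one.git"},
    {"name": "repo-two", "clone_url": "https://github.com/example/repo-two.git"}
  ]
JSON

# JSON.parse yields an array of hashes, so extracting one field is a map
urls = JSON.parse(sample).map { |repo| repo["clone_url"] }
puts urls
```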

Enter: not-quite-obvious URL query parameters

After some searching around I found that the API has two important query parameters:

  • page - the page number to fetch, which is kind of like an offset
  • per_page - results per page (max 100), which is kind of like a limit
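The "keep asking for pages until one comes back empty" pattern is the heart of what follows, and it can be sketched in isolation. Here the API is stubbed out with an in-memory array of fake URLs (the stub is mine, not part of the real API), so only the loop logic is on display:

```ruby
# Stand-in for the API: 250 fake repo URLs, served in pages
ALL_REPOS = (1..250).map { |i| "https://example.com/repo#{i}.git" }

def fetch_page(page, per_page)
  # Pages are 1-indexed, like GitHub's `page` parameter;
  # slice returns nil past the end, which we normalize to []
  ALL_REPOS.slice((page - 1) * per_page, per_page) || []
end

page = 1
fetched = []
loop do
  batch = fetch_page(page, 100)
  break if batch.empty? # an empty page means we've run out of results
  fetched.concat(batch)
  page += 1
end

puts "fetched #{fetched.count} repos across #{page - 1} pages"
```

With 250 items and a page size of 100 this takes three non-empty pages plus one empty page to terminate.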

Perfect, now we can paginate. So I threw together a quick little Ruby script to do the dirty work for us:

#!/usr/bin/env ruby
require 'json'

USER = "RedHatInsights"
PAGE_SIZE = 100

page = 1

loop do
  # Fetch one page of repos and pull out just the clone URLs
  output = `gh api "/users/#{USER}/repos?per_page=#{PAGE_SIZE}&page=#{page}"`
  repos = JSON.parse(output).map { |e| e["clone_url"] }
  break if repos.empty?

  # Clone this page's repos in parallel, one thread per repo
  threads = repos.map do |repo|
    Thread.new do
      puts "cloning #{repo}"
      system("git clone #{repo}")
    end
  end
  threads.each(&:join)

  page += 1
end

This script paginates through 100 repositories at a time, downloading each batch in parallel via Ruby's Thread class. Since it only runs 100 clones at once, it hopefully won't get throttled too badly.
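The fan-out/join pattern the script uses - spawn one Thread per repo, then wait for all of them - looks like this on its own (a short sleep stands in for the I/O-bound git clone):

```ruby
items = %w[alpha beta gamma]
results = Queue.new # Queue is thread-safe, unlike a plain Array

threads = items.map do |item|
  Thread.new do
    sleep(rand / 100.0) # simulate I/O-bound work like a clone
    results << "done: #{item}"
  end
end

threads.each(&:join) # block until every thread has finished
puts results.size
```

Because each thread spends its time waiting on I/O rather than computing, this parallelizes well even under MRI's global interpreter lock.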

Caveat: if any of the repos are private and require basic auth or similar, cloning will prompt for a password - so you may have to mash Enter to get past those. Other than that, this worked great.

I then ran ag DEPLOY_TIMEOUT across the clones for the env var we were after, and found the people I need to talk to. Huzzah!

Jacob Lindgren


Nebraska, USA