Working with GitHub: Collaboration, Workflow, and the Wider Ecosystem

Git and GitHub get spoken about as if they were the same thing, but they are not. Git is the version control software that runs on your own machine, quietly tracking every change you make to your files. GitHub is the cloud platform built on top of Git, hosting your repositories and adding a whole layer of collaboration tools that Git alone does not provide. Git is the engine; GitHub is the place where teams gather around that engine to build software together. This guide walks through how to actually work with GitHub day to day, and then steps back to look at its plans, tools, and billing so you can make sense of the wider ecosystem.

What GitHub Actually Adds

At its heart, a repository, or repo, is a project folder that carries its full version history along with it, plus issues and pull requests for coordinating work. That repo can live in two places at once. The local repo sits on your computer, where you do your actual work. The remote repo lives on GitHub, where your collaborators can reach it. The skill of using GitHub well is really the skill of keeping these two in sync and using GitHub’s collaboration features to work alongside other people without stepping on each other.

It is worth remembering that GitHub is also a social network for developers. Public repositories are discoverable and forkable by anyone, which is what makes open-source collaboration possible at the scale it happens.

Setting Up a Repository

Creating a repository on GitHub takes a minute. You click the plus icon in the top right and choose New repository, give it a name and a short description, and decide whether it should be public, visible to everyone, or private, visible only to people you invite. From there a couple of small choices save you trouble later. Ticking “Initialize with a README” creates your first commit automatically, so the repo is not empty. Adding a .gitignore template tells Git which files to leave untracked, things like __pycache__, .env, or .DS_Store that have no business being in version control. Once you create the repo, you land on a page with tabs for Code, Issues, Pull Requests, and Settings, which together cover almost everything you will do.
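
To see what a .gitignore actually does, here is a minimal sketch meant to be saved and run as a script. It builds a throwaway repository in a temporary directory, writes the three patterns mentioned above, and asks Git which files it would skip. The file names .env and app.py are made up for the illustration.

```shell
set -eu

# Throwaway repository in a temporary directory (illustrative only)
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Patterns for files that should stay out of version control
printf '%s\n' '__pycache__/' '.env' '.DS_Store' > .gitignore

# Hypothetical files: one holding secrets, one real source file
touch .env app.py

# check-ignore prints only the paths that match an ignore rule
ignored="$(git check-ignore .env app.py || true)"
echo "Ignored by Git: $ignored"
```

Only .env should be reported, since app.py matches none of the patterns and would be tracked as normal.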

The README and Markdown

The README is the front page of your repository, the first thing anyone sees when they arrive. It is written in Markdown, a lightweight formatting syntax that GitHub renders into a clean, styled page. You can edit it directly on GitHub by clicking the pencil icon on the file.

Markdown is quick to learn. Hash symbols create headings, with one hash for the largest and more hashes for smaller ones. Asterisks around text make it italic, and double asterisks make it bold. Links follow the pattern [link text](url), and images are the same with an exclamation mark in front. You make bullet points with dashes and numbered lists with numbers, wrap inline code in backticks, and create a full code block with triple backticks:

# Heading 1
## Heading 2
*italics* and **bold**
[Link text](https://url.com)
![Alt text](https://image-url.com)
- bullet point
1. numbered list
`inline code`
```python
code block
```

A good README does more than name the project. It explains what the project does and why it exists, describes how it works, and tells someone how to install and run it, including any dependencies they will need. Rounding it off with the licence makes the usage rights clear. The aim is that a stranger landing on your repo can understand it and get it running without having to ask you.

Working with Files and Branches

You can create, edit, upload, and delete files directly through the GitHub interface using the Add file button and the pencil and trash icons, and every change is saved as a commit with a message, a short note describing what you changed and why. Those messages add up into a readable history of the project, so it is worth writing them with a little care rather than typing “update” every time. One quirk to know: GitHub will not let you create an empty folder, so to make a directory you name a file inside it, like folder-name/README.md.

Branches are where GitHub’s collaborative power really begins. A branch lets you develop a feature or a fix in parallel without touching the stable codebase that everyone depends on. The main branch holds the production-ready code that should always be deployable. When you start new work, you create a feature branch off main, make your changes there, and merge it back once it is ready and reviewed. Emergency fixes follow the same idea on a hotfix branch. This separation means several people can work on different things at the same time, each in their own branch, without their half-finished work colliding.
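
The branch flow described above can be sketched locally. This is a simplified illustration to run as a script in a throwaway repository: the branch name feature/greeting, the file names, and the demo identity are all assumptions, and the local merge stands in for what would normally be a reviewed pull request on GitHub.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the initial branch main
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

# A first commit on main stands in for the stable codebase
echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial stable version"

# Start new work on a feature branch created from main
git checkout -q -b feature/greeting
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting file"

# main is untouched until the feature is merged back
git checkout -q main
before_merge="$(git ls-files | paste -s -d ' ' -)"
git merge -q --no-ff -m "Merge feature/greeting" feature/greeting
after_merge="$(git ls-files | paste -s -d ' ' -)"

echo "main before merge: $before_merge"
echo "main after merge:  $after_merge"
```

The point to notice is that greeting.txt does not exist on main until the merge, which is exactly the isolation that lets several people work in parallel.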

On repositories that matter, you protect main with branch protection rules, found under Settings then Branches. You can require that all changes come through a pull request rather than being pushed directly, require a certain number of reviewers to approve before a merge, require that automated tests pass first, and prevent the branch from being deleted by accident. These rules are what turn a casual workflow into a disciplined one, and they are the main thing standing between a team and someone accidentally breaking production.

Adding Collaborators and Controlling Access

A private repository is visible only to its owner and the people invited to it, so collaboration starts with adding those people. Under Settings and Manage access, you click Add people, search for someone by username, name, or email, choose their permission level, and send the invite, which they must accept before they can do anything.

On repositories owned by an organization, the permission levels form a ladder. Read access lets someone view and clone the repo. Triage adds the ability to manage issues and pull requests without touching code. Write adds the ability to push branches and create pull requests, which is what most active contributors need. Maintain adds management of repository settings short of destructive actions, and Admin grants full access including deletion. A repository owned by a personal account is simpler, with only the owner and invited collaborators. The principle is to grant the least access that lets someone do their job, rather than handing out Admin to everyone.

Authenticating from the Terminal

A point of confusion for many newcomers is that, since August 2021, GitHub no longer accepts your account password for Git operations over HTTPS in the terminal. If you try to clone or push using your password, it simply fails. Instead, you authenticate with a Personal Access Token, or PAT.

You create one under Settings, Developer settings, Personal access tokens, choosing the classic tokens option. You give it a descriptive name and an expiry date, then select its scopes, which are the permissions it grants. The repo scope gives full access to your repositories, and workflow lets it update GitHub Actions. When you generate the token, GitHub shows it to you exactly once, so you must copy it immediately. If you lose it, you cannot recover it, only generate a new one.

Using it is simple: when Git asks for your username, you enter your GitHub username, and when it asks for your password, you paste the token instead.

git clone https://github.com/username/repo.git
# Username: your-github-username
# Password: <paste your PAT here>

Retyping a long token every time would be tedious, so you tell Git to remember your credentials. The right command depends on your operating system:

# macOS — store in Keychain
git config --global credential.helper osxkeychain
# Linux — cache for one hour
git config --global credential.helper 'cache --timeout=3600'
# Windows — Git Credential Manager
git config --global credential.helper manager
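
If you would rather not touch the global setting, a helper can also be scoped to a single repository, and you can read the value back to confirm what is active. The following is a small sketch to run as a script in a throwaway repository; the one-hour timeout simply mirrors the Linux example above.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q

# --local writes to this repository's .git/config instead of ~/.gitconfig
git config --local credential.helper 'cache --timeout=3600'

# Read the value back to confirm what this repository will use
helper="$(git config --local --get credential.helper)"
echo "credential.helper for this repo: $helper"
```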

Cloning Versus Forking

There are two ways to get your own copy of a repository, and which one you use depends on whether you have permission to push changes back.

Cloning makes a local copy on your machine that stays linked to the original remote. You clone when you already have push access and intend to work directly on the project. Forking, by contrast, creates your own copy of the repository on your GitHub account. You fork when you do not have push access but want to propose changes anyway, which is the standard route for contributing to open-source projects you do not own.

# Clone a repo directly to your machine
git clone https://github.com/username/repo.git
# Clone only a specific branch (without --single-branch, -b just picks which branch is checked out)
git clone -b branch-name --single-branch https://github.com/username/repo.git

When you fork, the contribution path has an extra step. You fork on GitHub, clone your fork to your machine, push your changes to your fork, and then open a pull request from your fork back to the original. Because the original project keeps moving while you work, you add it as a remote called upstream so you can pull in its latest changes and stay in sync:

# After forking, clone your own copy
git clone https://github.com/YOUR-username/repo.git
# Track the original to stay up to date
git remote add upstream https://github.com/ORIGINAL-username/repo.git
git fetch upstream
git merge upstream/main
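
To watch the upstream sync happen without touching GitHub, the whole arrangement can be imitated on one machine. In this rough sketch, meant to be run as a script, bare repositories in a temporary directory stand in for the original project and for your fork; every directory name, the notes.txt file, and the demo identity are assumptions made for the illustration.

```shell
set -eu

work="$(mktemp -d)"
cd "$work"

# Create a repository whose first branch is main, with a demo identity
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  git -C "$1" config user.name "Example Dev"
  git -C "$1" config user.email "dev@example.com"
}

# The original project, plus a bare copy standing in for its GitHub remote
new_repo maintainer
echo "v1" > maintainer/notes.txt
git -C maintainer add notes.txt
git -C maintainer commit -q -m "First version"
git clone -q --bare maintainer original.git

# "Forking" copies the original; cloning the fork gives you a working copy
git clone -q --bare original.git fork.git
git clone -q fork.git local
git -C local remote add upstream "$work/original.git"

# Meanwhile the original project moves on
echo "v2" > maintainer/notes.txt
git -C maintainer commit -q -am "Second version"
git -C maintainer push -q "$work/original.git" main

# Bring the new upstream commit into the local clone of the fork
git -C local fetch -q upstream
git -C local merge -q upstream/main
synced="$(cat local/notes.txt)"
echo "local notes.txt now contains: $synced"
```

After the fetch and merge, the local clone holds the second version even though the fork itself never received it, which is why a push to your fork is usually the next step.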

Issues: The Conversation Layer

Issues are how communication happens on GitHub. They serve as bug reports, feature requests, task lists, and general discussion, all attached to the repository they concern. You create one from the Issues tab, write a title and a description in Markdown, and then make it actionable by assigning it to whoever is responsible, adding labels like bug or enhancement to categorise it, and linking it to a milestone if it belongs to a release.

A few features make issues genuinely useful for coordination. Typing @username mentions someone and sends them a notification. Typing #123 creates a clickable link to another issue, weaving related work together. And, neatly, writing Closes #123 in a commit message or pull request description automatically closes that issue when the change merges into the default branch, so your code and your task tracking stay in step without manual updating.
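
As a small illustration of the closing keyword, a commit message can carry it in its body. This sketch, meant to be run as a script, uses a throwaway repository, a placeholder identity, and a made-up issue number; nothing is closed locally, since GitHub only acts on the keyword once the commit reaches the default branch of the hosted repository.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

echo "price handling" > pipeline.txt
git add pipeline.txt

# The first -m is the subject line, the second -m becomes the body.
# "#123" is a made-up issue number.
git commit -q -m "Handle null values in price column" -m "Closes #123"

# Print the full commit message
message="$(git log -1 --format=%B)"
echo "$message"
```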

A well-formed issue is specific enough that someone else can act on it:

## Bug description
The data pipeline crashes when the input CSV contains null
values in the `price` column.
## Steps to reproduce
1. Load `data/sales.csv`
2. Run `pipeline.py`
3. Observe `KeyError: price`
## Expected behaviour
Pipeline should handle nulls gracefully.
Related to #45

The discipline worth keeping is to never open an issue without reproduction steps and a clear statement of expected versus actual behaviour. A vague issue costs everyone time.

Pull Requests: Where Code Gets Reviewed and Merged

The pull request, or PR, is the centrepiece of collaborative GitHub. It is a request asking the maintainer to review your branch and merge it into another branch, usually main. After you push your feature branch, GitHub often shows a “Compare & pull request” banner to start one, or you can open it manually from the Pull requests tab. You set the base branch, where your changes will go, and the compare branch, which holds your work, then write a title and description explaining what changed and why.

Two roles attach to a PR and are easy to confuse. The assignee is responsible for the pull request making it to a merge, typically the author. The reviewer is the person who examines the code and gives feedback. The two are distinct jobs, and on a healthy team they are often different people.

Reviewing happens in the Files changed tab, where the reviewer can click next to any line to leave an inline comment, then submit the review with one of three outcomes. A plain comment is general feedback that does not block anything. Request changes means the work needs fixing before it can merge. Approve means it is ready to go. This three-way decision is what keeps a quality bar on what enters the main codebase.

Once a PR is approved and merged through GitHub, a little cleanup keeps things tidy. You delete the now-merged branch on GitHub, then bring your local repository up to date and remove the local copy of the branch:

git checkout main
git pull # bring in the merged changes
git branch -d feature_branch # delete the local branch

One reassuring detail: if a problem surfaces after merging, GitHub lets you restore a deleted branch for up to 30 days, so deleting branches is safe rather than final.
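
On the local side there is a safety net too: git branch -d only deletes a branch whose work has already been merged, and refuses otherwise. Here is a minimal sketch of that behaviour to run as a script in a throwaway repository, with invented branch names feature/done and feature/wip.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# One branch that gets merged into main
git checkout -q -b feature/done
echo "done" >> file.txt
git commit -q -am "Finish feature"
git checkout -q main
git merge -q feature/done

# One branch that is still in progress
git checkout -q -b feature/wip
echo "wip" >> file.txt
git commit -q -am "Work in progress"
git checkout -q main

# -d deletes only branches that are fully merged
git branch -q -d feature/done
if git branch -d feature/wip 2>/dev/null; then
  wip_deleted=yes
else
  wip_deleted=no
fi

remaining="$(git branch --format='%(refname:short)' | paste -s -d ' ' -)"
echo "feature/wip deleted: $wip_deleted"
echo "remaining branches: $remaining"
```

The merged branch goes quietly, while the unfinished one survives until you either merge it or force the deletion with -D.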

Avoiding the Common Mistakes

A handful of mistakes come up again and again. Pushing directly to main is the big one, and the fix is to always work on a feature branch and protect main with branch rules. Trying to use your GitHub password in the terminal will simply fail; use a Personal Access Token instead, and copy it the moment it appears since it is shown only once. Opening a pull request against the wrong base branch is an easy slip, so check the base dropdown before submitting. If you are working from a fork, sync it with the upstream regularly so it does not drift out of date. And delete branches after they merge, because stale branches pile up and clutter the repository.

A Quick Command Reference

Most day-to-day work boils down to a small set of commands:

# Clone and track the original
git clone https://github.com/user/repo.git
git remote add upstream https://github.com/ORIGINAL/repo.git
git fetch upstream && git merge upstream/main
# Daily workflow
git checkout -b feature/my-feature # create and switch to a new branch
git add .
git commit -m "Add feature X"
git push origin feature/my-feature # push, then open a PR on GitHub
# Stay in sync
git pull # fetch and merge the tracked branch
git fetch origin # fetch without merging
# Clean up after a merge
git checkout main
git pull
git branch -d feature/my-feature

See you soon.
