85

I have a Dockerfile that packages and deploys a web app to a container. The app's code is fetched from a git repository while the Docker image is being built. Here's a snippet of the Dockerfile:

........
RUN git clone --depth=1 git-repository-url $GIT_HOME/
RUN mvn package -Dmaven.test.skip
........

I don't want Docker to cache the RUN git clone --depth=1 git-repository-url $GIT_HOME/ step, so that ongoing updates to the repository are reflected in each Docker image build. Is it possible to achieve that?

8 Answers

94

Another workaround:

If you use GitHub (or, most likely, GitLab or Bitbucket too), you can ADD the GitHub API's representation of your repo to a dummy location.

ADD https://api.github.com/repos/$USER/$REPO/git/refs/heads/$BRANCH version.json
RUN git clone -b $BRANCH https://github.com/$USER/$REPO.git $GIT_HOME/

The API call will return different results when the head changes, invalidating the docker cache.
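
To check what the ADD line is keyed on, you can pull the commit SHA out of the API response yourself. This is only a sketch: ref_sha is a hypothetical helper name, and it assumes the response is a single ref object whose object.sha field holds the branch head.

```shell
# ref_sha: read a GitHub "git ref" JSON document on stdin and print the
# commit SHA it points to. Hypothetical helper, not part of any tool.
# Assumes the document has one "sha" key whose value is a hex string.
ref_sha() {
    sed -n 's/.*"sha"[[:space:]]*:[[:space:]]*"\([0-9a-f]*\)".*/\1/p' | head -n 1
}

# Example (needs network access; $USER, $REPO and $BRANCH as in the answer):
# curl -s "https://api.github.com/repos/$USER/$REPO/git/refs/heads/$BRANCH" | ref_sha
```

If two runs print the same SHA, the downloaded file should be identical and Docker can reuse the cached git clone layer; a different SHA means the layers from the ADD line onward are rebuilt.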

If you're dealing with private repos, you can use GitHub's x-oauth-basic authentication scheme with a personal access token, like so:

ADD https://$ACCESS_TOKEN:x-oauth-basic@api.github.com/repos/$USER/$REPO/git/refs/heads/$BRANCH version.json

(thx @captnolimar for a suggested edit to clarify authentication)


9 Comments

Worked perfectly for GitHub. Any example for private GitLab?
Nice solution. Here's how to do it for Bitbucket:
ADD https://[USER]:[PASS]@api.bitbucket.org/2.0/repositories/[ORG-NAME]/[REPO-NAME]/commit/[BRANCH] /info
RUN git clone --depth 1 --branch [BRANCH] https://[USER]:[PASS]@bitbucket.org/[ORG-NAME]/[REPO-NAME].git /repo
I recommend creating a dedicated "Integration" user and using "App passwords" for Bitbucket (see confluence.atlassian.com/bitbucket/app-passwords-828781300.html). Combined with a multi-stage build, this seems like the perfect solution so far.
To do it in a completely generic, provider-independent way, here's an option: ADD http://worldtimeapi.org/api/ip /time.tmp
As of 2020, the GitHub API appears to have changed, and the URL https://api.github.com/repos/$USER/$REPO/git/refs/heads/$BRANCH no longer works. Please suggest the updated URL.
For GitLab: ADD https://gitlab.example.com/api/v4/projects/$PROJECT_ID/repository/branches/$BRANCH_NAME/ version.json
23

I ran into this same issue myself, and I just decided to use the --no-cache option when I build the image, rather than trying to single out the git repo.

docker build --no-cache -t my_image .

1 Comment

For my simple use case, this was the simplest/easiest of the proposed solutions. Thank you, @tomgsmith99.
17

The feature requested in issue 1996 is not yet available, but you can use the following workaround:

FROM foo
ARG CACHE_DATE=2016-01-01
RUN git clone ...

docker build --build-arg CACHE_DATE="$(date)" ....

That invalidates the cache after the ARG CACHE_DATE line on every build.

Or:

ADD http://www.convert-unix-time.com/api?timestamp=now /tmp/bustcache
RUN git pull

That also invalidates the cache after this ADD line.

Similar idea:

Add an ARG instruction to your Dockerfile:

# Dockerfile
# add this, and the instructions below it will run without the cache
ARG CACHEBUST=1

When you need to rebuild from that point on, pass a new value with the --build-arg option:

$ docker build -t your-image --build-arg CACHEBUST=$(date +%s) .

Then only the layers below the ARG instruction in the Dockerfile are rebuilt.
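
Whether the cache is reused or busted comes down to whether the value passed for CACHEBUST changes between builds. A small sketch of a wrapper, where cachebust_value is a made-up helper name and the Dockerfile is assumed to declare ARG CACHEBUST=1 as above:

```shell
# cachebust_value [fresh]: print the value to pass as CACHEBUST.
# Hypothetical helper for illustration only.
cachebust_value() {
    if [ "${1:-}" = "fresh" ]; then
        date +%s    # changes every second, so layers after the ARG line rebuild
    else
        echo 1      # same as the assumed Dockerfile default, so the cache should be reused
    fi
}

# Normal build, reusing cached layers:
# docker build -t your-image --build-arg CACHEBUST="$(cachebust_value)" .
# Force a fresh git clone:
# docker build -t your-image --build-arg CACHEBUST="$(cachebust_value fresh)" .
```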

3 Comments

Thanks for your advice, but it doesn't work for me: I use a cloud CaaS to build the Docker image, and there's no way to pass docker build arguments.
Doesn't work for me. Docker version 17.05.0-ce, build 89658be
9

For anyone who has this problem with GitLab repositories:

GitLab has this annoying project ID scheme for API calls; the ID appears under your repository name on the project page.

# this fetches the latest commit info for your branch and invalidates the cache if there was a new change
ADD "https://gitlab.com/api/v4/projects/${PROJECT_ID}/repository/branches/master?private_token=${GIT_TOKEN}" /tmp/devalidateCache

# the actual clone
RUN git clone --depth=1 https://${GIT_USER}:${GIT_TOKEN}@gitlab.com/${git_file_uri} ${BASE_BUILD_PATH}

7

If you use GitHub, you can use the GitHub API to avoid caching a specific RUN command. You need jq installed to parse the JSON: apt-get install -y jq

Example:

docker build --build-arg SHA=$(curl -s 'https://api.github.com/repos/Tencent/mars/commits' | jq -r '.[0].sha') -t imageName .

In the Dockerfile (the ARG instruction should come right before RUN):

ARG SHA=LATEST
RUN SHA=${SHA} \
    git clone https://github.com/Tencent/mars.git

or, if you don't want to install jq:

SHA=$(curl -s 'https://api.github.com/repos/Tencent/mars/commits' | grep sha | head -1)

If the repository has new commits, git clone will be executed.
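
The same approach should work without the GitHub API or jq by asking git itself for the branch head. This swaps in git ls-remote for the API call, so treat it as a provider-independent variant rather than what the answer describes; remote_head_sha is a hypothetical helper name, and $BRANCH stands for whichever branch you clone.

```shell
# remote_head_sha URL BRANCH: print the commit SHA at the tip of BRANCH
# in the remote repository, without cloning it. Hypothetical helper.
remote_head_sha() {
    # git ls-remote prints "<sha><TAB><refname>"; keep only the SHA.
    git ls-remote "$1" "refs/heads/$2" | cut -f1
}

# Example, matching the ARG SHA line in the Dockerfile above:
# docker build --build-arg SHA="$(remote_head_sha https://github.com/Tencent/mars.git "$BRANCH")" -t imageName .
```

Because the value only changes when the branch head moves, the git clone layer should stay cached between builds that see the same commit.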

2

you can also use:

ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache

RUN git reset --hard ~Whatever~

as mentioned here: https://stackoverflow.com/a/58801213/8953378

0

Thanks to @anq for the idea, but the API endpoint given there is outdated. The latest API documentation is under "Get a reference".

Use https://api.github.com/repos/{owner}/{repo}/git/ref/heads/{ref}

Example: https://api.github.com/repos/wind8866/hello-react/git/ref/heads/main
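
For use in a build script, the URL can be assembled from its parts. A sketch only: github_ref_url is a hypothetical helper, and it simply follows the URL pattern given above.

```shell
# github_ref_url OWNER REPO BRANCH: print the "Get a reference" URL for a
# branch, following the pattern quoted above. Hypothetical helper.
github_ref_url() {
    printf 'https://api.github.com/repos/%s/%s/git/ref/heads/%s\n' "$1" "$2" "$3"
}

# Example: check that the endpoint responds before relying on it in ADD
# (needs network access):
# curl -sf "$(github_ref_url wind8866 hello-react main)" > /dev/null && echo ok
```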

-5

For github private repos, you could also pass in your username and password:

RUN git clone -b $BRANCH https://$USER:$PASSWORD@github.com/$USER/$REPO.git $GIT_HOME/

1 Comment

I wouldn't like to pass my user password if I'm letting my admins clone just one repo.
