Wednesday, October 13, 2021

Upload large files to GitHub using Piggyback

Goal: Upload large files to GitHub without having to use (and pay for) Git LFS.

Solution: Use the R package `piggyback` (CRAN)

1. First, identify files larger than 100 MB in your GitHub repo (on Linux, du reports sizes in 1 KB blocks, so the awk filter keeps files above roughly 100 MB)
> find . -type f -exec du -a {} + | grep -v '/\.git/' | awk '$1 > 1e5'
295840  ./input/genes.gtf
410016  ./utils/msigdb_v7.4.xml
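The search above can also be sketched with find alone, which avoids depending on the block size that du happens to use. This is a minimal sketch, not the method used in this post; the function name find_large_files and its two arguments are made up for illustration.

```shell
# Print regular files under a directory that exceed a size threshold,
# skipping anything stored inside a .git directory.
# Usage: find_large_files [directory] [min_size_in_MiB]
# (find_large_files is a hypothetical helper, not part of any tool.)
find_large_files() {
  # -size +Nk keeps files whose size, rounded up to 1 KiB units, exceeds N.
  find "${1:-.}" -type f ! -path '*/.git/*' -size "+$(( ${2:-100} * 1024 ))k"
}

# Example: list files above 100 MiB in the current repo
# find_large_files . 100
```

Using an explicit threshold in MiB makes it easy to lower the limit if you want to catch files before they get close to GitHub's cutoff.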

2. Install and load the piggyback package
R> install.packages("piggyback")
R> library(package = "piggyback")

3. Generate a GitHub personal access token
a. GitHub > Settings

b. Settings > Developer settings

c. Developer settings > Personal access tokens

4. Set your GitHub personal access token for piggyback
R> Sys.setenv(GITHUB_TOKEN="...")
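If you would rather not type the token inside the R session, one option is to export GITHUB_TOKEN in the shell before starting R, since R inherits environment variables from the process that launches it. The sketch below only checks that the variable is set; require_github_token is a hypothetical helper name.

```shell
# Fail early if GITHUB_TOKEN is missing from the environment.
# (require_github_token is a hypothetical helper for illustration.)
require_github_token() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set" >&2
    return 1
  fi
}

# Example:
# export GITHUB_TOKEN="..."     # paste your own token here
# require_github_token && R
```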

5. Create a new release in your repository (this also works for private repositories)
R> pb_new_release(repo = "sekalylab/fluomics.hypertension", 
                  tag  = "v0.0.1")

6. Upload the large files to GitHub
R> pb_upload(file = "input/genes.gtf",
             repo = "sekalylab/fluomics.hypertension",
             tag  = "v0.0.1")
uploading genes.gtf ...
R> pb_upload(file = "utils/msigdb_v7.4.xml",
             repo = "sekalylab/fluomics.hypertension",
             tag  = "v0.0.1")
uploading msigdb_v7.4.xml ...

7. Add the large files to .gitignore (use >> so that an existing .gitignore is not overwritten)
> echo "input/genes.gtf" >> .gitignore
> echo "utils/msigdb_v7.4.xml" >> .gitignore
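Running the echo commands twice adds duplicate lines to .gitignore. A small sketch of an append that skips entries already present; the helper name ignore_path is an assumption made for this example.

```shell
# Append a path to a gitignore file only if that exact line is not there yet.
# Usage: ignore_path <path> [gitignore_file]
# (ignore_path is a hypothetical helper for illustration.)
ignore_path() {
  gitignore_file="${2:-.gitignore}"
  touch "$gitignore_file"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$1" "$gitignore_file" || printf '%s\n' "$1" >> "$gitignore_file"
}

# Example:
# ignore_path "input/genes.gtf"
# ignore_path "utils/msigdb_v7.4.xml"
```

This assumes the existing file ends with a newline; if it does not, the new entry would be glued to the last line.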

8. Check on GitHub that the upload succeeded and that the files are available in the release
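The check can also be done from the command line by asking the GitHub REST API for the release by tag. This is a rough sketch that assumes curl is installed; the helper names release_api_url and list_release_assets are made up here, and the grep is a crude filter that prints every "name" field in the response (the release name as well as the asset names).

```shell
# Build the GitHub REST API URL for a release identified by its tag.
# Usage: release_api_url <owner/repo> <tag>
release_api_url() {
  printf 'https://api.github.com/repos/%s/releases/tags/%s\n' "$1" "$2"
}

# Print the "name" fields of a release (needs curl and network access).
# GITHUB_TOKEN is sent when set, which is required for private repositories.
list_release_assets() {
  url=$(release_api_url "$1" "$2")
  if [ -n "${GITHUB_TOKEN:-}" ]; then
    curl -fsSL -H "Authorization: token ${GITHUB_TOKEN}" "$url"
  else
    curl -fsSL "$url"
  fi | grep -o '"name": *"[^"]*"'
}

# Example:
# list_release_assets sekalylab/fluomics.hypertension v0.0.1
```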




Transfer directory from EFS to S3 Glacier

1. Create an S3 bucket
> aws s3 mb s3://rv398-20220712

2. Copy EFS files to the S3 bucket
> aws s3 cp /mnt/efs/Joana3/Data s3://rv398-...