The real problem is keeping sensitive information in the .git directory. Like, WTH would you put your password, in plaintext, in some general ini file (or into a source file, for that matter)?
When I see things like those, they look so wrong to me. But sadly that view is apparently uncommon nowadays: not only random bloggers, but even my coworkers see nothing wrong with putting passwords or tokens into general config or source code files. "It's just for a quick test," they say, and then they forget about it, and the password gets checked in or shown during a screenshare meeting.
Maybe that's why there are so many security problems in industry? /rant
(For those curious: for git specifically, use SSH with key auth. If for some reason you don't want that, you can set up git's credential helper to use your OS key store, or use a plaintext ~/.git-credentials file, or even just good old .netrc. For source code, something like `PASSWORD = open("/home/user/.config/mypass.txt").read().strip()` is barely longer than hardcoding it, but 100% eliminates chance of accidental secret checkin or upload.)
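For the .netrc option, Python's standard-library `netrc` module can read the file, so the password never appears in the source tree at all. A minimal sketch; `credentials_for` is a hypothetical helper name and `git.example.com` is a placeholder host:

```python
import netrc
from typing import Optional, Tuple


def credentials_for(host: str, netrc_path: Optional[str] = None) -> Tuple[str, str]:
    """Return (login, password) for `host` from a .netrc file.

    Hypothetical helper. netrc_path=None makes the netrc module read ~/.netrc.
    """
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        # No entry for this host (and no "default" entry either).
        raise LookupError(f"no .netrc entry for {host}")
    login, _account, password = auth
    return login, password
```

Usage would be something like `login, password = credentials_for("git.example.com")`, which reads `~/.netrc` when no path is given.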
>The real problem is keeping sensitive information in the .git directory. Like, WTH would you put your password, in plaintext, in some general ini file (or into a source file, for that matter)?
People & organisations tend to follow the path of least resistance. If it's easier to put passwords into a plaintext config file than not, passwords will invariably end up in plaintext config files in some projects. `PASSWORD = open("/home/user/.config/mypass.txt").read().strip()` will work right up until a colleague without `"/home/user/.config/mypass.txt"` attempts to run the project - at which point it'll be replaced with `PASSWORD = "the_password123"`.
The only pragmatic solution is to make it easier to handle passwords securely than to handle them insecurely.
"Security at the expense of usability comes at the expense of security." strikes once again (if you squint and count following the path of least resistance as part of usability, which I think is fair).
Good security is expensive. Bad security is cheap (be it the example you mention or a multitude of other ways). Management will favor bad security done cheaply because the cost of bad security is rarely paid, and when it is, it rarely falls on the manager's head. Either no one gets blamed (generally the company as a whole takes the blame, if anyone does these days), or the developer who chose the cheap option gets blamed.
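One way to make the secure path the easy one (the pragmatic solution suggested earlier in this thread) is a lookup helper that never tempts anyone into a hardcoded fallback: it checks an environment variable first, then a per-user file outside the repo, and otherwise fails with instructions. A minimal sketch under assumed conventions; `get_secret`, the lookup order, and the `~/.config/myproject` directory are hypothetical, not an established library API:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def get_secret(name: str,
               env: Optional[Mapping[str, str]] = None,
               secrets_dir: Optional[Path] = None) -> str:
    """Look up a secret that never lives in the source tree.

    Assumed lookup order (a convention for this sketch, not a standard):
      1. a non-empty environment variable called `name`
      2. the file <secrets_dir>/<name lowercased>, default ~/.config/myproject
    Fails loudly with instructions instead of returning a placeholder.
    """
    env = os.environ if env is None else env
    if env.get(name):
        return env[name]
    base = secrets_dir if secrets_dir is not None else Path.home() / ".config" / "myproject"
    path = base / name.lower()
    if path.is_file():
        # strip() drops the trailing newline most editors add
        return path.read_text(encoding="utf-8").strip()
    raise RuntimeError(
        f"Secret {name!r} not found. Either export {name}=... "
        f"or put it in {path} (outside the repo)."
    )
```

The error message matters as much as the lookup: a colleague without the file is told exactly where to put the secret, which is less effort than editing the code.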
> 100% eliminates chance of accidental secret checkin or upload
You've never worked with humans, have you?
At my work, I often see these 2 things throughout the codebase:
- an identifier for an environment variable that gives us the Azure Key Vault scope (another identifier)
- an identifier for the token to pull from that scope
Then the scope name and token name are used to pull the token secret value using the secrets api.
I'm not experienced in how this is "supposed to be" done. Would it make sense to make both of these (the scope name and the token name) environment variables, so neither identifier appears directly in code?
Thank you for the insight :)
The question you want is, "will anything bad happen if this source code is widely shared / leaks on the web" - and the answer in your case seems to be "no": the identifiers/token names are pretty useless without accompanying machine auth. You are fine.
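If you do move both names into environment variables (useful mainly so the same code can target a different scope per environment, not because the names are secret), a minimal sketch could look like this. `SECRET_SCOPE`, `SECRET_KEY_NAME`, and `fetch_token` are assumed names, and `fetch` stands in for whatever call your secrets API actually exposes:

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical signature: takes (scope_name, key_name) and returns the secret.
# In practice this would be a thin wrapper around your secrets client.
SecretFetcher = Callable[[str, str], str]


def fetch_token(fetch: SecretFetcher,
                env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the scope and key names from the environment, then fetch the secret."""
    env = os.environ if env is None else env
    try:
        scope = env["SECRET_SCOPE"]
        key = env["SECRET_KEY_NAME"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None
    return fetch(scope, key)
```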
> The real problem is keeping sensitive information in the .git directory. Like, WTH would you put your password, in plaintext, in some general ini file (or into a source file, for that matter)?
Sometimes it's not "you":
You know, Sun fixed this almost 30 years ago in the J2EE standard.
Gitleaks is the easiest way to deal with this. I make a point of including it in my build pipelines and having dev teams set it up as a pre-commit hook to prevent the problem.
Maybe they use Google App Engine and don't want to deal with storing config secrets the right way for it.
Am I missing something, or does the step in
> Pushing Malicious Changes to the Pipeline
mean that they already have full access to the repository in the first place? Normally I wouldn't expect an attacker to be able to push to master (or any branch, for that matter). Without that, the exploit won't work. And with that access, there are so many other exploits one can do that it's really no longer about CI/CD vulns.
From TFA:
> A surprising number of websites still expose their .git directories to the public. When scanning for such exposures on a target, I noticed that the .git folder was publicly accessible.
[...]
> With access to .git/config, I found credentials, which opened the door to further exploitation. I could just clone the entire repository using the URL found inside the config file.
The URL with credentials was found in the `.git/config` file, in the [remote "origin"] section. This is how they gained full access to the repo.
I don't see how this is specific to "exploiting CI / CD Pipelines" when he's really just exploiting someone encoding their username AND password (unorthodox af) into the remote URL.
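A quick way to check a checkout (or an exposed `.git/config`) for exactly this problem is to parse each remote URL and see whether it carries a password. A minimal sketch using the standard library's `urllib.parse.urlsplit`; `urls_with_credentials` is a hypothetical helper, and a dedicated scanner such as Gitleaks covers far more kinds of secrets:

```python
import re
from typing import List, Tuple
from urllib.parse import urlsplit

# Matches "url = ..." lines in a git config file (keys are usually tab-indented).
_URL_LINE = re.compile(r"^[ \t]*url[ \t]*=[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)


def urls_with_credentials(git_config_text: str) -> List[Tuple[str, str]]:
    """Return (redacted_url, username) for every remote URL that embeds a password."""
    hits = []
    for url in _URL_LINE.findall(git_config_text):
        parts = urlsplit(url)
        if parts.password:
            host = parts.hostname or ""
            if parts.port:
                host += f":{parts.port}"
            # Never echo the password itself, even in a report.
            redacted = f"{parts.scheme}://{parts.username}:***@{host}{parts.path}"
            hits.append((redacted, parts.username or ""))
    return hits
```

SSH remotes such as `git@github.com:acme/app.git` carry no password and are not reported.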
Yes, that first part was not. But the article continues like this:
- they use those credentials to make a commit adding malicious code to the CI pipeline
- the rogue pipeline job adds their public SSH key to the `authorized_keys` file on the production server
As the pipeline runs automatically on push, they get SSH access to the remote server.
That is the "CI / CD Pipelines" bit. That being said, it's a bit underwhelming, because given the title I thought they were going to exploit a bug in the CI/CD software itself. I don't know if I'd call that "exploiting" CI/CD software.
Because 1) the .git directory was deployed with the app code (the exploit vector), and 2) the deployment pipeline automatically integrated and pushed the attacker’s commit to a production system (completing the exploit), I’d say that claim is accurate. These are both defects in the thing the attacker claims to have exploited.
It sure wasn’t a good decision to use git-config to store creds for CI though! I wonder if OP found a developer’s old cached creds in the history that weren’t used anymore but happened to still be valid?
You're right, there are other avenues of exploitation. This particular approach was interesting to me because it is easily automatable (scour the internet for exposed credentials, clone the repo and detect if Pipelines are being used, profit).
Other exploits might need more targeted steps to achieve. For example, embedding malware into the source code might require language/framework fingerprinting.
I am not sure, but it sounds like the pipeline runs for any pushed branch/PR, and it runs the pipeline configuration of that branch (so you can run a pipeline configuration without having to merge to master).
I'm not saying that this is fine, just that access to master is probably protected, but it's still vulnerable.
edit: Credentials for modifying the pipeline were found in the .git/config file
With Bitbucket, as well as GitLab and likely others that I haven't used, the CI pipelines are stored as plaintext configuration in the repo itself. So, repo commit access automatically gives you the ability to modify the pipeline.
This is why things like codeowners files are so important
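Alongside a CODEOWNERS entry for the pipeline file, a CI step can flag any change that touches pipeline configuration so it gets extra review. A minimal sketch; the path patterns are assumptions covering the Bitbucket, GitLab, and GitHub Actions defaults, and `pipeline_files_touched` is a hypothetical helper:

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Assumed set of files that define CI/CD behaviour; adjust for your platform.
PIPELINE_PATTERNS = [
    "bitbucket-pipelines.yml",
    ".gitlab-ci.yml",
    ".github/workflows/*",  # fnmatch's "*" also matches "/", so subdirs count too
]


def pipeline_files_touched(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that modify pipeline configuration."""
    return [p for p in changed_paths
            if any(fnmatchcase(p, pat) for pat in PIPELINE_PATTERNS)]
```

The changed paths could come from `git diff --name-only origin/main...HEAD`. Note the design caveat: this only helps if the check runs from configuration the branch under test cannot modify (for example a required check defined outside the repo), since a pipeline an attacker can edit can also drop the check.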
It's right at the start of the post - the git remote including credentials was exposed via the .git directory
100% of the script kiddies have moved to .env and .git. My logs are filled with requests for GET /.env (404). All the kiddies focus mainly on those two; I think they give the best return for their effort. The .env file is super trendy now and used across languages.
A super easy way to protect yourself is to just block any IP that hits `/.env` or `/wp-admin`. I've taken this as far as banning any IP that hits my default vhost (i.e. uses the IP instead of the actual hostname) more than ten times, and I get about 99% fewer scanners and less spam as a result.
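The blocking rule above can be sketched as a pass over the web server's access log. This assumes the common/combined log format (client IP first, request line in quotes); `ips_to_ban`, `PROBE_PATHS`, and the threshold are illustrative, and the resulting set would be fed to whatever firewall or ban tool you already use:

```python
import re
from collections import Counter
from typing import Iterable, Set

# Common/combined log format: IP, ident, user, [timestamp], "METHOD PATH ..."
_LOG = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)')
PROBE_PATHS = ("/.env", "/.git", "/wp-admin")  # assumed list of scanner favourites


def ips_to_ban(log_lines: Iterable[str], threshold: int = 1) -> Set[str]:
    """Return IPs that requested a probe path at least `threshold` times."""
    hits = Counter()
    for line in log_lines:
        m = _LOG.match(line)
        if m and m.group(2).startswith(PROBE_PATHS):
            hits[m.group(1)] += 1
    return {ip for ip, n in hits.items() if n >= threshold}
```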
I don’t understand why some authentication mechanisms, like GitHub tokens, don’t use a refresh token mechanism, so the token can be handed in once to create a refresh token, and with that, expiring access tokens can be requested. Instead, we (as users) have to bother with constantly expiring long-term tokens, not knowing in which of the hundreds of places we might have put them.
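The refresh-token flow being described (as in OAuth 2.0, where a refresh token is exchanged at the token endpoint for short-lived access tokens) can be sketched on the client side like this. `TokenSource` is hypothetical, `exchange` stands in for the provider's token-endpoint call, and the clock is injectable so the expiry logic can be tested:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# Hypothetical exchange call: refresh token in, (access_token, lifetime_seconds) out.
Exchange = Callable[[str], Tuple[str, float]]


@dataclass
class TokenSource:
    refresh_token: str
    exchange: Exchange
    clock: Callable[[], float] = time.monotonic
    skew: float = 30.0  # renew slightly before the access token actually expires
    _access: Optional[str] = field(default=None, init=False)
    _expires_at: float = field(default=0.0, init=False)

    def get(self) -> str:
        """Return a valid short-lived access token, renewing it only when needed."""
        now = self.clock()
        if self._access is None or now >= self._expires_at - self.skew:
            self._access, lifetime = self.exchange(self.refresh_token)
            self._expires_at = now + lifetime
        return self._access
```

Only the refresh token needs to be stored once; everything that calls `get()` receives short-lived tokens, so a leaked access token expires on its own.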
Does this actually occur with real or high-value targets? I'm genuinely curious, as I can only envision this happening with smaller side projects. However, I'd be interested to hear any stories of encountering this in the wild. It's a good reminder to stay mindful of what might accidentally be exposed.
I’ve never deployed a .git folder and wonder what systems/approaches lead to such a thing. How does that happen?
It's pretty common in systems where the final output to be deployed is the same as the root of the source tree. More often than not, lazy developers tend to just git clone the repo and point their web server's document root to the cloned source folder. In default configurations, .git is happily served to anyone asking for it.
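The real fix is not serving the clone itself as the document root, but as defence in depth a server or middleware can refuse any request for a dot-path. A minimal sketch of that rule in Python; `is_hidden_path` is hypothetical, and `/.well-known/` is allowed because standard endpoints such as ACME challenges live there:

```python
from urllib.parse import unquote


def is_hidden_path(url_path: str) -> bool:
    """True if any path segment is a dotfile/dotdir, e.g. /.git/config or /.env.

    Decodes percent-escapes first so /%2Egit/HEAD is caught too.
    "." and ".." segments also start with "." and are rejected as well.
    """
    for segment in unquote(url_path).split("/"):
        if segment.startswith(".") and segment != ".well-known":
            return True
    return False
```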
This seems to be automatically mitigated in systems which might have a "build" / "compilation" phase, because for the application to work in the first place, you only need the compiled output to be deployed. For instance, Apache Tomcat.
Or you do a git clone/docker build and that grabs it.
Hidden files only attract the attention of pessimists.
It's easy to miss that you need to duplicate your .gitignore into your .dockerignore
To avoid having to remember that, one can skip .dockerignore entirely and instead explicitly pick the files and directories from the build context that need to be in the image.
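If you do keep a .dockerignore, a small pre-build check can catch the case where `.git` or `.env` is present in the build context but not excluded. A simplified sketch: `unignored_sensitive` and the `SENSITIVE` list are hypothetical, and it only understands literal entries, not the glob patterns and `!` exceptions that Docker's own matching supports:

```python
from pathlib import Path
from typing import List

SENSITIVE = [".git", ".env"]  # assumed names worth guarding


def unignored_sensitive(context_dir: Path) -> List[str]:
    """Names from SENSITIVE that exist in the build context but have no
    literal entry in .dockerignore (simplified: no globs, no negation)."""
    ignore_file = context_dir / ".dockerignore"
    entries = set()
    if ignore_file.is_file():
        for line in ignore_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                # Treat ".git", "/.git" and ".git/" as the same entry.
                entries.add(line.strip("/"))
    return [name for name in SENSITIVE
            if (context_dir / name).exists() and name not in entries]
```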
This is annoying and error prone if you use an interpreted language that relies on loading source files at runtime.
For a Go service? Sure, that's easy and makes a ton of sense.
> This is annoying and error prone if you use an interpreted language that relies on loading source files at runtime.
That's a bit of a misconception (not to mention a terminology mish-mash). What you probably call "interpreted" are languages like Python, JavaScript or Ruby. In all these cases, projects are supposed to be compiled into an installable package, and then that package is supposed to be installed. So the compilation step is very similar to that of languages like C or Go.
Regrettably, developers rarely follow the due process and "deploy" what essentially amounts to the project's source code... This has a bunch of other problems besides the security issues with potentially copying passwords into the deployment environment. This whole process reminds me of the early days of PHP, when the Web was full of example "guest books" that taught a generation of PHP programmers to interpolate values straight from the URL request into SQL queries. And then PHP took the blame for "allowing it".
Yeah, this is a bit of a pet peeve of mine. I've seen a lot of Python app Docker images which are built by just copying the git repo straight into the image. Better to build a package from the source in one CI step, and then install that package into the deploy image in another.
Overall, a two-phase Docker build is a much better model that avoids a lot of miscellaneous issues, including this one. But it's also not super well known among devs who only touch Docker every now and then.
For a go service you have a multi stage build that copies just the built executable into a clean base image.
100% this!
For scripting languages, I sometimes clone a readonly repo to prod. Then I use git pull && systemctl --user restart srv to deploy. For compiled programs, I do it with rsync or docker pull.
index.html and the rest of the project are rooted in, well... the root. And you simply push-deploy your root git repo to /var/www or wherever.
E.g. you use GitHub Pages with homebrewed HTML, or anything else that isn't a static site generator outputting to a subfolder.
It's because the .dockerignore file doesn't take this into consideration.
But in what cases would your root/source directory be part of what you want to shove into a dockerimage? Is it like if you have a static website or php site with its root at the root of your source repo?
Naive question: wouldn't something like GitHub secret scanning catch this?
No, in my YAML example, you could see that there were no credentials directly hard-coded into the pipeline. The credentials are configured separately, and the Pipelines are free to use them to do whatever actions they want.
This is how all major players in the market recommend you set up your CI pipeline. The problem here lies in implicit trust of the pipeline configuration which is stored along with the code.
Even with secrets if the CICD machine can talk to the internet, you could just broadcast the secrets to wherever (assuming you can edit the yaml and trigger the CICD workflow).
I was thinking maybe a better approach, instead of having CI/CD SSH into the prod machine, is to have the prod machine itself watch git for changes.
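The pull-based idea can be sketched as the prod machine polling the remote branch head with `git ls-remote` and deploying when it changes, so CI never needs SSH credentials for prod. A minimal sketch; `remote_head`, `poll_once`, and the `deploy` callback are hypothetical, and whoever can push to the watched branch still controls what gets deployed, so branch protection still matters:

```python
import subprocess
from typing import Callable, Optional


def remote_head(repo_url: str, branch: str) -> str:
    """Commit hash the remote branch points to ("" if the branch doesn't exist)."""
    out = subprocess.run(
        ["git", "ls-remote", repo_url, f"refs/heads/{branch}"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Output looks like "<sha>\trefs/heads/<branch>"
    return out.split()[0] if out else ""


def poll_once(get_head: Callable[[], str],
              deployed: Optional[str],
              deploy: Callable[[str], None]) -> Optional[str]:
    """Deploy if the remote head moved; return the commit that is now deployed."""
    head = get_head()
    if head and head != deployed:
        deploy(head)
        return head
    return deployed
```

A loop on the server would call `poll_once(lambda: remote_head(url, "main"), deployed, deploy)` every minute or so, keeping the returned commit as the new `deployed` value. The prod machine then only needs read access to the repo.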
It was deployed using a Bitbucket pipeline which does have a secret scanner available. However the scanner would need to be manually configured to be fully effective.