Incident: Re-generate API keys due to open Elasticsearch server


TL;DR: On Monday, June 29, 2020 we were notified by a security researcher that one of our Elasticsearch clusters was exposed to the Internet without any authentication. The configuration issue is resolved, but API usage logs may have been exposed.

Whose data is affected?

Customers and free users who have used our API or services built on top of our API since November 11, 2019. We are requesting all users to re-generate their API keys. Instructions are here.

What is affected?

A log of API calls made from November 11, 2019 until June 29, 2020 was exposed, including the following information: timestamp, IP address, endpoint requested, parameters, status code, HTTP method, and a unique user identifier (a number, not an email address or similar). Current IP and current domain data that powers parts of our site was also potentially exposed. This data is ephemeral, a snapshot of how the Internet looked on a given day, and is updated at least daily.
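For illustration, a single exposed log entry would have looked roughly like the record below. The field names are hypothetical, not our actual schema; only the categories of data listed above were present.

```python
# Illustrative shape of one exposed API usage log entry.
# Field names are hypothetical; only the categories listed above were in the log.
example_log_entry = {
    "timestamp": "2020-05-14T09:21:07Z",
    "ip_address": "198.51.100.23",          # caller IP (documentation range)
    "endpoint": "/v1/domain/example.com",   # endpoint requested
    "parameters": {"children_only": "false"},
    "status_code": 200,
    "http_method": "GET",
    "user_id": 48213,                       # numeric identifier, not an email address
}
```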

Outline of Incident (EST):

June 11, 2020: An engineer fixed a VPN mesh issue between our app servers and Elasticsearch cluster and accidentally exposed the cluster outside of the VPN. More complete details are below.
June 28, 2020: Investigation started into the deletion of an Elasticsearch index, which led to the root cause of this incident.
June 29, 2020 - 7:33 AM EST: We were contacted by the first security researcher, who pointed out the exposed IP.
June 29, 2020 - 8:12 AM EST: The misconfiguration was fixed.
June 30, 2020: Customers and users notified.

How could that happen?

On June 11, 2020, we had a brief outage when the IPsec tunnels between our Elasticsearch gateway at AWS and the Elasticsearch cluster at Azure stopped working. They had been running without incident for more than a year. An engineer started working on fixing the issue.

While attempting to resolve the outage, the engineer set up an Azure Load Balancer. It was not immediately apparent how to reliably restrict access to it, so that approach was abandoned minutes later and a different fix was used to restore service.

The engineer attempted to revert the changes from the Azure Load Balancer approach. He clicked in the Azure control panel to delete the external IP of the load balancer. Azure seemingly accepted the request to delete it; however, it later asynchronously failed the request because the IP address was still attached to the load balancer.

The engineer had already moved on to other attempts and made the wrong assumption that the IP address had been removed from the load balancer and our account. This IP address was therefore not considered in subsequent security checks and hardening. Since requests to it went directly to the Elasticsearch nodes and not through our main gateway, not all requests were being logged.
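The lesson here is not to assume an asynchronous delete succeeded. The sketch below (a hypothetical check using the azure-mgmt-network SDK, not our actual tooling; the subscription, resource group, and IP names are placeholders) verifies that a public IP is detached before deleting it and then confirms the deletion actually took effect.

```python
# A minimal sketch of verifying that an Azure public IP was really deleted,
# rather than assuming the portal request succeeded. Resource names are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.core.exceptions import ResourceNotFoundError

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "es-cluster-rg"        # placeholder
PUBLIC_IP_NAME = "lb-external-ip"       # placeholder

client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

pip = client.public_ip_addresses.get(RESOURCE_GROUP, PUBLIC_IP_NAME)
if pip.ip_configuration is not None:
    # Still attached to a load balancer or NIC; Azure will fail the delete.
    print(f"{PUBLIC_IP_NAME} is still attached to {pip.ip_configuration.id}; detach it first.")
else:
    # begin_delete returns a poller; wait for it, then confirm the IP is really gone.
    client.public_ip_addresses.begin_delete(RESOURCE_GROUP, PUBLIC_IP_NAME).result()
    try:
        client.public_ip_addresses.get(RESOURCE_GROUP, PUBLIC_IP_NAME)
        print("Public IP still exists -- the delete did not take effect.")
    except ResourceNotFoundError:
        print("Public IP confirmed deleted.")
```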

[Figure: Simplified sketch of the situation]

On June 28, 2020, an engineer noticed that we had an increased failure rate with Elasticsearch. Upon realizing that an index had been deleted, the engineering team restored it and started an investigation into how that could have happened. The cause was initially unclear, since there were no entries in the gateway log that could have caused the deletion.

During this investigation we were contacted by the first security researcher, who notified us that the Elasticsearch cluster was open. We immediately evaluated the report and realized that there was an additional external IP address on the Azure Load Balancer that was not intended to be there anymore. The problem was then fixed by removing this IP address.

We currently cannot determine with certainty whether the data described above was accessed, which is why we are circulating this information.

Mitigations

While the immediate issue was resolved by removing the additional external IP address from the Elasticsearch cluster, we will continue to improve our security processes and logging mechanisms. In addition, we have already introduced the ability for customers to re-generate their API keys without needing to contact support.

Short term plan

Notifying all affected users through this blog post and an email. (today)
Distribute instructions and functionality to re-key API keys. (today)
Implement IP whitelisting on a per-API-key basis; a minimal sketch of such a check is shown after this list. (coming this week)
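The sketch below illustrates one way a gateway could enforce a per-API-key IP whitelist. It is a hypothetical example, not our actual implementation; the key store and networks are invented for illustration.

```python
# A minimal, hypothetical sketch of per-API-key IP whitelisting at a gateway.
import ipaddress

# Hypothetical mapping of API key -> allowed networks configured by the customer.
API_KEY_ALLOWLISTS = {
    "example-api-key": ["203.0.113.0/24", "198.51.100.7/32"],
}

def is_request_allowed(api_key: str, client_ip: str) -> bool:
    """Return True if the key has no allowlist, or the client IP falls inside it."""
    networks = API_KEY_ALLOWLISTS.get(api_key)
    if networks is None:
        return False          # unknown key
    if not networks:
        return True           # no allowlist configured: allow from anywhere
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in networks)

print(is_request_allowed("example-api-key", "203.0.113.42"))  # True
print(is_request_allowed("example-api-key", "192.0.2.10"))    # False
```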

Medium term plan

Removing API keys from all logs and creating logging controls; a sketch of one possible redaction approach is shown after this list.
Reviewing and modifying VPN patterns cross-cloud.
Ensuring log points exist at every applicable ingress/egress spot.
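As one example of how API keys could be kept out of logs, the sketch below scrubs keys from log messages before they are written. This is a hypothetical illustration, not our implementation; the key format in the regular expression is an assumption.

```python
# A hypothetical sketch of scrubbing API keys out of log records before they are written.
import logging
import re

# Assumed key format for illustration only: a 16+ character token after an apikey parameter.
API_KEY_PATTERN = re.compile(r"(apikey=)[A-Za-z0-9_\-]{16,}", re.IGNORECASE)

class RedactApiKeyFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Rewrite the message with the key replaced, then keep the record.
        record.msg = API_KEY_PATTERN.sub(r"\g<1>[REDACTED]", str(record.msg))
        return True

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("gateway")
logger.addFilter(RedactApiKeyFilter())

logger.info("GET /v1/domain/example.com?apikey=abcdefghijklmnopqrstuvwxyz123456 200")
# Emitted as: INFO:gateway:GET /v1/domain/example.com?apikey=[REDACTED] 200
```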

Vision/long term plan

The entire reason our company was founded was to help companies prevent and notice incidents like this. This doubles our resolve and validates our mission to solve these types of problems. It’s critical we get tools in the hands of as many people as possible so this doesn’t happen to them.

We apologize for any impact this issue caused to our users. In keeping with our company value of transparency, we also believe that communicating with our users about such incidents in a clear and prompt manner is absolutely necessary to build a trusting partnership. If you have any questions, comments or further concerns, please reach out to support@securitytrails.com.

FAQ

Q: Doesn’t SecurityTrails make tools to prevent and notify about these kinds of issues? Shouldn’t you know better?

Yes. Our company was founded to try to harness large-scale open data to help companies prevent and detect these kinds of issues. We're frustrated that one wrong click caused this incident, and we will do everything in our power to make sure nothing like it happens again. We will also work as hard as possible to build and distribute tools to others to detect and prevent these kinds of issues.

Q: Why did the open Elasticsearch not have more security and tooling?

The misconfiguration allowed visitors to bypass our main gateway that does logging and access restrictions. We will re-architect this to make sure the same thing can’t happen again.

Q: Why aren’t you using the auth features on Elasticsearch directly?

Last year, we set up the Open Distro version of Elasticsearch for its security features, but we were having stability and memory issues under load that we didn't have with regular Elasticsearch. We were unable to identify the cause after a lot of debugging, JVM tuning, and Elasticsearch configuration changes. It was also the first release of Open Distro, so that wasn't entirely surprising.

We shifted back to regular Elasticsearch and instead set up access so that the ES nodes have no Internet access at all and can only be reached through a gateway that we can restrict access to. Unfortunately, the Azure Load Balancer that we set up while trying to work around the IPsec issue also bypassed our gateway (the gateway our network uses lives in AWS).
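As an illustration of the kind of external check that catches this class of exposure, the sketch below probes a host from outside the VPN and flags an Elasticsearch HTTP API that answers without authentication. It is a hypothetical probe, not SecurityTrails tooling; the host below is a placeholder.

```python
# A minimal sketch: flag an Elasticsearch node that answers unauthenticated from the Internet.
import requests

def elasticsearch_exposed(host: str, port: int = 9200, timeout: float = 5.0) -> bool:
    try:
        resp = requests.get(f"http://{host}:{port}/", timeout=timeout)
    except requests.RequestException:
        return False  # unreachable from the public Internet: good
    # An unauthenticated Elasticsearch root endpoint returns 200 with cluster metadata.
    return resp.status_code == 200 and "cluster_name" in resp.text

if __name__ == "__main__":
    print(elasticsearch_exposed("198.51.100.20"))  # placeholder IP
```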

Elastic now includes the security features in more recent open source versions of Elasticsearch, to which we are transitioning.

Q: Is there any concern that data was manipulated?

No. Elasticsearch is not ground truth for us. We generate data and update Elasticsearch daily. Any changes would quickly be replaced.

Q: Is SurfaceBrowser user-data affected?

No. This data is stored in separate, unrelated data stores. Some older, generalized aggregate caching was in one of the indices.

Q: Were Login (email address, password) or Payment data exposed?

No. This data is stored in different data stores.

Q: Will I be responsible in the unlikely event someone uses my API key because of this incident?

No. We kindly ask all users to switch to a new API key and delete the old one. If you notice that someone has used your API key, please let us know at support@securitytrails.com.
