Developers
July 27, 2020

Google Kubernetes Engine With Clusters That Support 15,000 Nodes

GKE offers high-performance clusters for everyone from small businesses to big corporations. Scalability at its peak.

Today we will talk about Google Kubernetes Engine (GKE) clusters. Since many big corporations worldwide rely on them, it is worth understanding what they are made of and how far they can scale.

Google has a strong point of view on providing scalable services and products. On that premise, it has pushed the limits of GKE clusters beyond expectation: a single cluster can now hold up to 15,000 nodes.

If you run a large internet service, want to simplify your infrastructure by having fewer but more powerful clusters, or need to shorten the time it takes to process data, these clusters fit your needs.

The service is already being used for software launches and e-commerce campaigns. You can resize your existing cluster to the size you need without having to start from scratch, which saves both time and money.
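As a hedged illustration of resizing in place, here is a minimal Python sketch using the google-cloud-container client library; the project, zone, cluster, and node pool names are hypothetical stand-ins, not values from the article.

```python
# Minimal sketch: resize an existing GKE node pool in place, rather than
# recreating the cluster. All resource names below are hypothetical.
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()

# Fully qualified resource name of the node pool to resize.
name = (
    "projects/my-project/locations/us-central1-a/"
    "clusters/my-cluster/nodePools/default-pool"
)

# Grow (or shrink) the pool to 50 nodes; returns a long-running operation.
operation = client.set_node_pool_size(request={"name": name, "node_count": 50})
print(operation.status)
```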

15,000 Nodes: Scalability Without Limits

The fact that the service provides clusters of up to 15,000 nodes is relevant, but it is not the only thing that matters. Google and every other provider focus on scalability, and being able to scale a cluster without replacing it is what gives real comfort and security.

Scaling a Kubernetes cluster is like scaling an object into another dimension: there are many factors to weigh at once, including the number of running services, the number of endpoints, the number of pods and containers, and the frequency of events.
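To make those dimensions concrete, here is a small sketch using the official Kubernetes Python client that counts the objects in a cluster; it assumes a reachable kubeconfig and is purely illustrative.

```python
# Count the objects that dominate a cluster's scaling behaviour.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when run inside a pod
v1 = client.CoreV1Api()

pods = v1.list_pod_for_all_namespaces().items
services = v1.list_service_for_all_namespaces().items
endpoints = v1.list_endpoints_for_all_namespaces().items

print(f"pods:      {len(pods)}")
print(f"services:  {len(services)}")
print(f"endpoints: {len(endpoints)}")
```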

When scaling, the control plane and the workloads must remain available and active. The hard part of operating at such a large scale is the number of dependencies between these dimensions. It is recommended to scale only to the level you need; it is pointless to run a bigger infrastructure than your company or personal service currently requires.

Nor is this scalability designed only for big services; it applies to small ones too. The ability to push the limits to another scale lets small services become medium-sized and grow from wherever they currently stand.

To better understand the entire process, let's look at a specific use case: Bayer Crop Science.

Bayer Crop Science 

Google put together a group of design partners in a closed early-access program so that GKE users could run workloads of over 5,000 nodes per cluster.

Bayer Crop Science is currently one of the biggest users of GKE, holding the largest clusters in the entire Google Cloud ecosystem. What does BCS use GKE for? It uses it to decide which seeds to advance through its research and development pipeline, leading ultimately to the decision of which seeds to offer to farmers.

BCS can only do this job with accurate and plentiful genotype data; its corn catalog alone contains 60,000 germplasm. Since testing each seed individually would be hopelessly inefficient, it bases the work on datasets of pedigree and ancestral genotype observations instead.

In 2019, Bayer Crop Science migrated these calculations to GKE. With 5,000-node clusters at hand, data scientists could forecast and precompute the data needed for the month. Before this, scientists had to request the specific genotype data needed for their job and wait weeks for results. Their wait time decreased from weeks to a single day, with the heavy lifting done by multi-day batch jobs.

Each time new information arrives from the genotyping labs and an observation passes quality control, it is written to a service and an event is published to a topic in the cloud. An inference engine monitors this stream; when the incoming events meet the requirements for inference, a job request is placed on another topic.
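The article does not show Bayer's actual code, but the flow it describes maps naturally onto Google Cloud Pub/Sub. Below is a hedged Python sketch of that shape; the project, topic, subscription, and field names are all hypothetical.

```python
# Hypothetical sketch of the event flow: observation events arrive on one
# topic; when they meet the inference requirements, a job request is
# published to another topic.
import json
from google.cloud import pubsub_v1

PROJECT = "my-project"  # hypothetical project ID

publisher = pubsub_v1.PublisherClient()
jobs_topic = publisher.topic_path(PROJECT, "inference-jobs")


def meets_inference_requirements(event):
    # Placeholder predicate: the article only says events must "match the
    # requirements" before a job is requested.
    return event.get("passed_qc", False)


def on_observation(message):
    """Handle one genotype-observation event from the labs."""
    event = json.loads(message.data)
    if meets_inference_requirements(event):
        payload = json.dumps({"germplasm_id": event["germplasm_id"]})
        publisher.publish(jobs_topic, payload.encode("utf-8"))
    message.ack()


subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path(PROJECT, "genotype-observations")
# Blocks indefinitely, dispatching each incoming event to the callback.
subscriber.subscribe(subscription, callback=on_observation).result()
```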

The inference engine's workers are deployed on the Kubernetes cluster, the largest one of all, and scaled with a Horizontal Pod Autoscaler. When a worker picks a job from the topic, it pulls in all the required inputs, including the genotype observations that triggered the job. The genotype inference algorithm then runs and the results are written back to the service. Finally, the resulting event triggers the start of decision-making based on the inferred genotypes.
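The article names the Horizontal Pod Autoscaler but not its configuration. As a sketch, this is how one might attach an HPA to the worker Deployment with the official Kubernetes Python client; the deployment name and thresholds are assumptions, not Bayer's real settings.

```python
# Attach a CPU-based Horizontal Pod Autoscaler to a (hypothetical)
# "inference-worker" Deployment so worker pods scale with job load.
from kubernetes import client, config

config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="inference-worker-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="inference-worker"
        ),
        min_replicas=1,
        max_replicas=1000,                     # headroom for large batches
        target_cpu_utilization_percentage=70,  # scale out above 70% CPU
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```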

In conclusion, Google Kubernetes Engine now offers clusters of up to 15,000 nodes. That is an immense number, well suited to big corporations, but such a large ceiling does not mean the service is worthwhile only if you plan to use all 15,000 nodes. We have seen the case of Bayer Crop Science, which runs 5,000-node clusters today and can reach higher levels as it continues to grow. The scalability is offered to small and big organizations alike, and it is mostly being used for software launches and e-commerce campaigns.

Tags: Google Kubernetes Engine, Scalability
Lucas Bonder
Technical Writer
Lucas is an entrepreneur, web developer, and writer of articles about technology.
