In machine learning and AI development, separating your development and production environments is no longer just a best practice; it's a necessity.
There are many reasons why you'd want to split your environments, from ensuring the integrity of your production data to maintaining access control and resource segregation.
Why Separate Dev and Prod?
1. Maintain Separate Environments
One of the fundamental reasons for separating development and production environments is the need for cost tracking, access control, and resource isolation. By separating your environments, you ensure that
- you safeguard your production environment from unintended changes that might occur during development
- you can track the cost of development work vs. production pipelines
2. Access Control and Data Segregation
Access control is a paramount concern when it comes to machine learning projects. Separating dev and prod environments helps prevent unauthorized access and data breaches. With separate environments, you can ensure that
- no one accidentally launches a production pipeline with development data or vice versa
- only authorized members of the team can
  - promote data/models/code to production
  - launch and schedule production pipelines
- dev and test environments are completely isolated and have no visibility into each other's data or results
This segregation is crucial for maintaining data integrity and security.
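The checks above can be sketched as a small pre-launch guard. This is a minimal illustration in plain Python, not a Valohai API: the storage prefixes, user names, and the `check_launch` helper are all hypothetical, and in practice you would enforce the same rules through your platform's own access controls.

```python
# Hypothetical storage prefixes; substitute your own bucket or container names.
DATA_PREFIXES = {
    "dev": "s3://example-dev-data/",
    "prod": "s3://example-prod-data/",
}

# Hypothetical mapping of users to the environments they may launch pipelines in.
LAUNCH_PERMISSIONS = {
    "alice": {"dev", "prod"},
    "bob": {"dev"},
}


def check_launch(user: str, environment: str, input_uris: list[str]) -> list[str]:
    """Return the reasons a launch should be refused (an empty list means allowed)."""
    if environment not in DATA_PREFIXES:
        return [f"unknown environment: {environment}"]

    problems = []
    # Only users explicitly granted the environment may launch pipelines in it.
    if environment not in LAUNCH_PERMISSIONS.get(user, set()):
        problems.append(f"{user} may not launch pipelines in {environment}")

    # Every input must live in the data store that belongs to the environment.
    prefix = DATA_PREFIXES[environment]
    for uri in input_uris:
        if not uri.startswith(prefix):
            problems.append(f"{uri} is outside the {environment} data store")
    return problems
```

Returning a list of reasons rather than a single boolean makes it easier to show the user everything that needs fixing before the pipeline can run.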
3. Different Resources for Different Needs
Each environment, whether it's development or production, may have distinct resource requirements. This includes machine types, networking configurations, and access rights.
For instance, in the production environment you might
- require workloads to
  - use only approved Docker images, avoiding ad-hoc installations from untrusted sources
  - use only packages that have undergone thorough vulnerability scanning
- run a separate virtual network that has access to production data not available from machines in the development environment
These measures help keep your production pipeline robust and secure.
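As an illustration of the approved-image rule, the sketch below checks an image reference against an allowlist and rejects unpinned or `latest` tags. The registry and repository names are hypothetical, and the parsing is deliberately simple; a real setup would more likely enforce this in the registry or the orchestration platform itself.

```python
# Hypothetical allowlist of approved image repositories.
APPROVED_REPOSITORIES = {
    "registry.example.com/ml/train",
    "registry.example.com/ml/serve",
}


def split_image(image: str) -> tuple[str, str]:
    """Split an image reference into (repository, tag or digest)."""
    name, _, digest = image.partition("@")
    # Look for the tag only in the last path component so that a registry
    # port such as "registry.example.com:5000" is not mistaken for a tag.
    head, slash, last = name.rpartition("/")
    base, _, tag = last.partition(":")
    return head + slash + base, digest or tag


def is_approved_image(image: str) -> bool:
    """Accept only approved repositories pinned to a digest or a non-latest tag."""
    repository, reference = split_image(image)
    if repository not in APPROVED_REPOSITORIES:
        return False
    return reference not in ("", "latest")
```

Requiring a pinned tag or digest means that the image that was scanned is the image that runs.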
What to Consider?
Before implementing the separation of your development and production environments in Valohai, consider the following checklist.
Not all sections will apply to your use case. Regardless, we recommend reviewing each point.
Accounts and Resource Groups
- Create separate accounts, subscriptions or resource groups for development and production.
- Restrict access to only approved base Docker images.
- Ensure that cost tracking is accurate and segregated.
- Use different clusters for deployment, potentially within distinct namespaces or entirely separate clusters.
User Access
- Define a distinct set of users who can access each environment.
- Implement access controls to enforce these restrictions.
Data Storage
- Consider separate data stores for different stages.
- You can also make some resources read-only in one environment and read-write in another.
- Decide whether you want to promote data and models between stages, or promote the pipeline and generate new datasets and models in the new environment.
- Determine if you need to share Valohai datasets and aliases between environments.
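One way to reason about per-environment access to data stores is to write the intended policy down as a table and check requests against it. The following is a sketch only: the store names, the `"ro"`/`"rw"` modes, and the `can_access` helper are assumptions made for illustration, and actual enforcement belongs in your cloud provider's access policies.

```python
# Hypothetical policy: which stores each environment sees, and in which mode.
# "ro" means read-only and "rw" means read-write; a missing store means no access.
STORE_ACCESS = {
    "dev": {"dev-data": "rw", "reference-data": "ro"},
    "prod": {"prod-data": "rw", "reference-data": "rw"},
}


def can_access(environment: str, store: str, write: bool = False) -> bool:
    """Return True if the environment may read (or write) the given store."""
    mode = STORE_ACCESS.get(environment, {}).get(store)
    if mode is None:
        return False  # the store is not visible from this environment
    return mode == "rw" or not write
```

In this example policy, `reference-data` is readable from development but only writable from production, while `prod-data` is not visible from development at all.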
Version Control
- Decide which branches to pull from in different stages.
- You can restrict production projects to pull only from the main branch.
- Implement Git branch protection rules to ensure code changes are reviewed and approved.
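A branch policy like this can be expressed as a list of allowed patterns per environment. The sketch below uses Python's standard `fnmatch` module; the environment names and patterns are hypothetical, and it complements the branch protection rules of your Git host rather than replacing them.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: development may pull any branch, production only "main".
ALLOWED_BRANCHES = {
    "dev": ["*"],
    "prod": ["main"],
}


def branch_allowed(environment: str, branch: str) -> bool:
    """Return True if the environment is allowed to pull code from the branch."""
    patterns = ALLOWED_BRANCHES.get(environment, [])
    # fnmatchcase keeps the comparison case-sensitive on every platform.
    return any(fnmatchcase(branch, pattern) for pattern in patterns)
```

An environment that is missing from the policy gets no branches at all, so a new environment has to be granted access explicitly.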
API Key Management
- Manage API keys for production and development separately.
- Consider implementing key rotation policies in the different environments.
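A rotation policy can be as simple as a maximum key age per environment. The sketch below is illustrative only: the age limits, key names, and the `keys_due_for_rotation` helper are assumptions, and the inventory of keys would come from wherever you actually track them.

```python
from datetime import date, timedelta

# Hypothetical limits: production keys rotate more often than development keys.
MAX_KEY_AGE = {
    "dev": timedelta(days=180),
    "prod": timedelta(days=30),
}


def keys_due_for_rotation(keys, today: date) -> list[str]:
    """Return the names of keys older than their environment's limit.

    `keys` is an iterable of (name, environment, created_on) tuples.
    """
    return [
        name
        for name, environment, created_on in keys
        if today - created_on > MAX_KEY_AGE[environment]
    ]
```

Passing `today` in explicitly, instead of calling `date.today()` inside the function, keeps the check easy to test.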
Quota Management
- Evaluate quota management if your environments reside in the same account.