fix(train): correct Networking field names in ModelTrainer intelligent defaults #5866
Open
mandipat wants to merge 1 commit into
…t defaults

Two bugs in _populate_intelligent_defaults_from_training_job_space():

1. The Networking constructor was passed `default_enable_network_isolation`, which does not exist in sagemaker-core 2.x. The correct field name is `enable_network_isolation`. This caused a Pydantic ValidationError when sagemaker_config had a VpcConfig value set, making ModelTrainer unusable in SageMaker distribution 4.0 out of the box.
2. When updating an existing Networking object, `security_group_ids` was mistakenly assigned the subnets value from TRAINING_JOB_SUBNETS_PATH instead of the correct value from TRAINING_JOB_SECURITY_GROUP_IDS_PATH.

Fixes aws#5766
Summary
Fixes two bugs in `_populate_intelligent_defaults_from_training_job_space()` in `model_trainer.py` that prevent `ModelTrainer` from working when `sagemaker_config` has a `VpcConfig` set. This makes SageMaker distribution 4.0 unusable out of the box.

Bug 1: Wrong field name passed to `Networking` constructor

`default_enable_network_isolation` does not exist as a field in `sagemaker-core 2.x`. The correct field name is `enable_network_isolation`. This caused a Pydantic `ValidationError` on every `model_trainer.train()` call when VPC config was present in `sagemaker_config`.

Bug 2: `security_group_ids` silently assigned wrong value

When updating an existing `Networking` object, `security_group_ids` was mistakenly assigned the value from `TRAINING_JOB_SUBNETS_PATH` instead of `TRAINING_JOB_SECURITY_GROUP_IDS_PATH`.

Testing

Reproduced and verified the fix using a clean venv with `sagemaker-core==2.10.0` and `sagemaker-train==1.10.0` (exact versions shipped in SageMaker distribution 4.0).

Closes #5766
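
For reviewers who want to see the two failure modes in isolation, here is a minimal sketch that does not import sagemaker-core. Everything in it is an assumption made for illustration: `Networking` is a stand-in dataclass with only the three fields named in this PR, the two `*_PATH` constants are hypothetical string keys into a flat dict, and the helper functions are not the code in `model_trainer.py`. A dataclass raises `TypeError` on an unknown keyword, whereas the real model raises a Pydantic `ValidationError` as described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Networking:
    """Stand-in for the real Networking shape (assumption, not sagemaker-core)."""
    subnets: Optional[List[str]] = None
    security_group_ids: Optional[List[str]] = None
    enable_network_isolation: Optional[bool] = None


# Hypothetical keys; the real constants hold config paths, not these strings.
TRAINING_JOB_SUBNETS_PATH = "VpcConfig.Subnets"
TRAINING_JOB_SECURITY_GROUP_IDS_PATH = "VpcConfig.SecurityGroupIds"

Config = Dict[str, List[str]]


def build_with_wrong_field(config: Config) -> Networking:
    # Bug 1: the keyword below is not a field of Networking, so construction fails.
    return Networking(
        subnets=config[TRAINING_JOB_SUBNETS_PATH],
        security_group_ids=config[TRAINING_JOB_SECURITY_GROUP_IDS_PATH],
        default_enable_network_isolation=False,
    )


def build_with_correct_field(config: Config) -> Networking:
    # Fixed: uses the field name that actually exists on the model.
    return Networking(
        subnets=config[TRAINING_JOB_SUBNETS_PATH],
        security_group_ids=config[TRAINING_JOB_SECURITY_GROUP_IDS_PATH],
        enable_network_isolation=False,
    )


def update_buggy(networking: Networking, config: Config) -> Networking:
    # Bug 2: security groups are read from the subnets path, with no error raised.
    if networking.subnets is None:
        networking.subnets = config[TRAINING_JOB_SUBNETS_PATH]
    if networking.security_group_ids is None:
        networking.security_group_ids = config[TRAINING_JOB_SUBNETS_PATH]
    return networking


def update_fixed(networking: Networking, config: Config) -> Networking:
    # Fixed: each field is read from its own path.
    if networking.subnets is None:
        networking.subnets = config[TRAINING_JOB_SUBNETS_PATH]
    if networking.security_group_ids is None:
        networking.security_group_ids = config[TRAINING_JOB_SECURITY_GROUP_IDS_PATH]
    return networking


config: Config = {
    TRAINING_JOB_SUBNETS_PATH: ["subnet-aaa"],
    TRAINING_JOB_SECURITY_GROUP_IDS_PATH: ["sg-bbb"],
}
```

The second bug is the more dangerous of the two in this sketch: `update_buggy` returns normally, and the mistake only shows up as subnet IDs sitting in `security_group_ids`. A regression test that compares the two fields after population would catch it.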