Automate Jupyter Notebook in ECS -1

Faisal KK
3 min read · Jun 10, 2021


There are many ways to run Jupyter notebooks in an automated way on AWS, such as EMR Studio or Glue notebooks. Most of them are designed as integrated work environments, and their pricing and configuration reflect that. But what if you build your notebooks on your local machine and only need to automate them in AWS? Here we discuss a very cost-effective way to do that.

The Approach

In this approach we use the popular Python library papermill to automate the notebook run. We can run the same notebook developed in the UI without any change using papermill, or even better, pass parameters and many other configurations to it. We will package the library along with other dependencies such as pandas into a Docker image and push it to the private image registry (ECR) in the AWS account. Then we will use AWS Batch to run the notebook using containers created from that image. Optionally, we can schedule this using AWS Step Functions.
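
For context, running a notebook with papermill is a single call. The notebook names and the parameter shown here are just placeholders:

```python
import papermill as pm

# Execute the input notebook and write the executed copy alongside it.
# The parameters dict overrides variables in the cell tagged "parameters".
pm.execute_notebook(
    "notebooks/test1.ipynb",
    "notebooks/out_test1.ipynb",
    parameters={"run_date": "2021-06-10"},
)
```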

Environment

Now we will discuss the steps in detail. Before that, here are the details of the environment setup; the listing below shows roughly how the home directory looks. Not all the files in the directory are important for this discussion, and all the required files are discussed in their respective steps.
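
Only the files relevant to this discussion are shown; the test notebook name is just an example.

```
.
├── Dockerfile
├── entrypoint.sh
├── requirements.txt
├── local_run_notebook.py
├── s3_run_notebook.py
└── notebooks/
    └── test1.ipynb
```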

Here local_run_notebook.py runs local notebooks, and s3_run_notebook.py downloads notebooks from S3 and runs them. We will discuss these in detail soon. entrypoint.sh is used inside the Dockerfile.

To run a local notebook, place it inside the notebooks directory. It will be run, and the output notebook will be placed in the same directory with the prefix out_.

To install Python libraries, specify them in requirements.txt. They will be installed inside the container; entrypoint.sh takes care of that.
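
As an example, a minimal requirements.txt for this setup might look like the following; boto3 is only needed for the S3 variant, and you would add whatever your own notebooks import:

```
papermill
pandas
boto3
```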

Steps

1. Create the required files and scripts

Copy the following files to the directory.

Dockerfile - A fairly simple Dockerfile. All it does is install dependencies and copy the local files into the container.
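
A minimal sketch of what such a Dockerfile could look like; the base image and the exact package list are assumptions:

```dockerfile
# Assumed base image; pick whatever Python version your notebooks need.
FROM python:3.8-slim

WORKDIR /app

# Core dependencies needed to execute notebooks and talk to S3.
RUN pip install --no-cache-dir papermill ipykernel boto3

# Copy the local scripts and requirements into the image.
COPY requirements.txt entrypoint.sh local_run_notebook.py s3_run_notebook.py ./
RUN chmod +x entrypoint.sh

ENTRYPOINT ["/app/entrypoint.sh"]
```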

entrypoint.sh - Used as the entry point script in the Dockerfile.
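
A sketch of what entrypoint.sh might look like, assuming it installs requirements.txt at start-up and then dispatches to the local or S3 runner based on the first argument:

```bash
#!/bin/bash
set -e

# Install any extra libraries the notebooks need (see requirements.txt).
if [ -f /app/requirements.txt ]; then
    pip install --no-cache-dir -r /app/requirements.txt
fi

# The first argument selects the mode ("local" or "s3"); the rest is passed on.
mode="$1"
shift

if [ "$mode" = "local" ]; then
    exec python /app/local_run_notebook.py "$@"
else
    exec python /app/s3_run_notebook.py "$@"
fi
```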

local_run_notebook.py - This file runs local notebooks. Please note that this script runs inside Docker, so its dependencies are satisfied inside the container.
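
A sketch of the idea behind local_run_notebook.py; the actual script may handle more options, but the core is one papermill call against the mounted /notebooks directory:

```python
import sys

import papermill as pm

# The host's notebooks directory is bind-mounted here (see the docker run
# command below).
NOTEBOOK_DIR = "/notebooks"


def main():
    name = sys.argv[1]  # e.g. "test1.ipynb"
    pm.execute_notebook(
        f"{NOTEBOOK_DIR}/{name}",
        f"{NOTEBOOK_DIR}/out_{name}",
    )


if __name__ == "__main__":
    main()
```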

s3_run_notebook.py - Downloads the notebook from S3, runs it, and uploads the result back to S3.
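
Similarly, a sketch of s3_run_notebook.py, assuming boto3 for the transfer; the out_ naming of the uploaded result and the /tmp working directory are assumptions:

```python
import os
import sys

import boto3
import papermill as pm


def main():
    bucket, key = sys.argv[1], sys.argv[2]  # e.g. "my-bucket" "prefix/test1.ipynb"
    name = os.path.basename(key)
    local_in = f"/tmp/{name}"
    local_out = f"/tmp/out_{name}"

    # Download the notebook, execute it, and upload the result back to S3.
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, local_in)
    pm.execute_notebook(local_in, local_out)

    prefix, _, _ = key.rpartition("/")
    out_key = f"{prefix}/out_{name}" if prefix else f"out_{name}"
    s3.upload_file(local_out, bucket, out_key)


if __name__ == "__main__":
    main()
```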

2. Build the Docker image and test locally

Build the Docker image by running this command from the home directory:

docker build -t pmill_final:v1 .

This will create the Docker image as per the Dockerfile. Once it completes, you can put a test notebook inside the notebooks directory and test it with the following command.

docker run --mount type=bind,source=<home_path>/notebooks,destination=/notebooks -e ENV=prod pmill_final:v1 local test1.ipynb

3. Test S3 Notebooks

Before running S3 notebooks, make sure you have set up the config and credentials files in the ~/.aws folder and that the user has sufficient access rights. Once that is done, you can upload a notebook to S3 and provide its location in the command.

docker run --mount type=bind,source=<home_path>/notebooks,destination=/notebooks -e ENV=prod pmill_final:v1 s3 <s3_bucket_name> <s3_prefix>/test1.ipynb

Conclusion

Now we have a working Docker container. We will discuss how to upload the image to ECR and create the Batch jobs in the next article.
