I installed the Databricks CLI with the following command:
pip install databricks-cli
(Use the pip that belongs to your Python installation; with Python 3 this may be pip3.)
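The upload script below assumes a token is already configured. With the legacy databricks-cli this is normally done interactively with `databricks configure --token`, which asks for the workspace host and the PAT and stores them in `~/.databrickscfg`. Here is a minimal non-interactive sketch of the same step. It assumes the default `[DEFAULT]` profile, and `write_databricks_cfg` is a hypothetical helper, not part of the CLI:

```shell
#!/bin/bash
# Sketch: write a ~/.databrickscfg profile for the legacy databricks-cli
# without the interactive prompts of "databricks configure --token".
# write_databricks_cfg is a hypothetical helper, not a CLI command.
set -euo pipefail

write_databricks_cfg() {
  local host="$1" token="$2" cfg="${3:-$HOME/.databrickscfg}"
  # Create the file with owner-only permissions, since it holds a secret.
  ( umask 077; printf '[DEFAULT]\nhost = %s\ntoken = %s\n' "$host" "$token" > "$cfg" )
  # Also tighten permissions if the file already existed.
  chmod 600 "$cfg"
}
```

For example, `write_databricks_cfg https://<your-workspace-host> <your-PAT>`. After that, `dbfs ls dbfs:/` should list the DBFS root if the credentials work.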
Then, after creating a PAT (personal access token) in Databricks, I ran the following bash script:
#!/bin/bash
# You can run this on Windows as well; just convert it to a batch file.
# Note: you need the Databricks CLI installed and a token configured.
echo "Creating DBFS directory"
dbfs mkdirs dbfs:/databricks/packages
echo "Uploading cluster init script"
dbfs cp --overwrite python_dependencies.sh dbfs:/databricks/packages/python_dependencies.sh
echo "Listing DBFS directory"
dbfs ls dbfs:/databricks/packages
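Since the script may be edited on Windows, a quick local check before uploading can catch two common init-script problems: Windows (CRLF) line endings, which typically break bash with errors like `$'\r': command not found`, and a shebang that is not on the first line. This is a sketch that assumes GNU grep and bash on the machine doing the upload. `check_init_script` is a hypothetical helper, not part of the Databricks CLI:

```shell
#!/bin/bash
# Hypothetical pre-upload check for a cluster init script.
set -euo pipefail

check_init_script() {
  local f="$1"
  # The file must exist and be readable.
  [ -r "$f" ] || { echo "missing: $f" >&2; return 1; }
  # Windows (CRLF) line endings break bash on the Linux cluster nodes.
  if grep -q $'\r' "$f"; then
    echo "CRLF line endings in $f" >&2
    return 1
  fi
  # The shebang must be on the very first line.
  [ "$(head -n 1 "$f")" = '#!/bin/bash' ] || { echo "first line of $f is not #!/bin/bash" >&2; return 1; }
  # Parse the script without executing it (syntax check only).
  bash -n "$f"
}
```

Usage: `check_init_script python_dependencies.sh && dbfs cp --overwrite python_dependencies.sh dbfs:/databricks/packages/python_dependencies.sh`.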
The python_dependencies.sh script:
#!/bin/bash
# Restart cluster after running.
sudo apt-get install applicationinsights=0.11.9 -V -y
sudo apt-get install azure-servicebus=0.50.2 -V -y
sudo apt-get install azure-storage-file-datalake=12.0.0 -V -y
sudo apt-get install humanfriendly=8.2 -V -y
sudo apt-get install mlflow=1.8.0 -V -y
sudo apt-get install numpy=1.18.3 -V -y
sudo apt-get install opencensus-ext-azure=1.0.2 -V -y
sudo apt-get install packaging=20.4 -V -y
sudo apt-get install pandas=1.0.3 -V -y
sudo apt update
sudo apt-get install scikit-learn=0.22.2.post1 -V -y
status=$?
echo "The date command exit status : ${status}"
I use this script as a cluster init script to install the Python libraries.
My problem is that, even though everything seems fine and the cluster starts successfully, the libraries are not installed properly. When I open the cluster's Libraries tab, only 1 of the 10 Python libraries is shown as installed.
I'd appreciate any help or comments.
Answer
I found the solution based on @RedCricket's comment:
#!/bin/bash
pip install applicationinsights==0.11.9
pip install azure-servicebus==0.50.2
pip install azure-storage-file-datalake==12.0.0
pip install humanfriendly==8.2
pip install mlflow==1.8.0
pip install numpy==1.18.3
pip install opencensus-ext-azure==1.0.2
pip install packaging==20.4
pip install pandas==1.0.3
pip install --upgrade scikit-learn==0.22.2.post1
The .sh file above installs all the listed Python dependencies when the cluster starts, so the libraries don't have to be reinstalled every time the notebook is re-run.
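If the list of pins grows, one option is to keep it in a single array and generate the init script from it. pip then resolves all pins in one `pip install` run instead of ten separate ones, and the generated script stops on the first failing command. As far as I know, a cluster-scoped init script that exits non-zero makes the cluster launch fail, which is easier to notice than a silently skipped package. This is only a sketch. `generate_init_script` and `PINS` are hypothetical names, and like the answer's script it assumes the cluster's `pip` is on the PATH during init:

```shell
#!/bin/bash
# Sketch: generate python_dependencies.sh from one list of pinned packages.
# The generator is hypothetical; the pins are the ones from the answer.
set -euo pipefail

PINS=(
  applicationinsights==0.11.9
  azure-servicebus==0.50.2
  azure-storage-file-datalake==12.0.0
  humanfriendly==8.2
  mlflow==1.8.0
  numpy==1.18.3
  opencensus-ext-azure==1.0.2
  packaging==20.4
  pandas==1.0.3
  scikit-learn==0.22.2.post1
)

generate_init_script() {
  # Print a bash init script that installs every pin in a single pip call.
  # Assumes each pin is a plain name==version token (no spaces or shell metacharacters).
  echo '#!/bin/bash'
  echo '# Stop at the first failing command instead of silently continuing.'
  echo 'set -e'
  echo "pip install $*"
}

# Print the generated script to stdout.
generate_init_script "${PINS[@]}"
```

Redirect the output to python_dependencies.sh and upload it with the same `dbfs cp --overwrite` command as in the question.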