Steps to install Anaconda to run Pyspark projects.
Install Prerequisites
Install Homebrew or update it if already installed:
brew update
Install Python
brew install python@3.9
Install Java
brew cask install java
Install Scala
brew install scala
Install Spark
brew install apache-spark
Install Anaconda from website or:
brew install --cask anaconda
Setup environment variables and path
I'm using zsh
, but if using default bash, use change ./bashrc
:
vim ~/.zshrc
Add the following to .zshrc:
export HOMEBREW_OPT="/opt/homebrew/opt"
export JAVA_HOME="$HOMEBREW_OPT/openjdk/"
export SPARK_HOME="$HOMEBREW_OPT/apache-spark/libexec"
export PATH="$JAVA_HOME:$SPARK_HOME:$PATH"
Refresh Terminal with the new settings or just restart it.
source ~/.zshrc
Errors
For some tests, I was getting:
Error 23: Too many open files in system
You can see what the current values are for file limits by the kernel:
sysctl kern.maxfiles
sysctl kern.maxfilesperproc
To resolve this error I updated the values for kern.maxfiles
and kern.maxfilesperproc
:
sudo sysctl -w kern.maxfiles=20480
sudo sysctl -w kern.maxfilesperproc=18000
These lines could be added to the ~/.zshrc
or ~/.bashrc
once the appropriate values are determined for your project. Downside is that it requires sudo. Alternatively, configs can be altered.
Summary
I just wanted to see if it could be done on a M1 Macbook Air (7-cores with 8GB of RAM).
Running 306 Tests in 10m:07s
. Failures with two tests. One involving ARIMA model and the other datetime format assertion error.
WSL on a Dell 5550 with Intel Core i9-10885H @ 2.40GHz and 32GB of RAM:
time pytest -n 8 --dist=loadscope .
...
==================================== 306 passed, 526 warnings in 535.95s (0:08:55) =====================================
real 8m56.466s
user 13m29.008s
sys 37m3.325s