Installing Python 3 third-party libraries
The Python 3 processor replaces the Python processor. It supports everything the old processor could do and adds new functionality, such as the installation of third-party libraries.
Differences between Python 2 and Python 3
- Your code must be Python 3 code, not Python 2 code.
- There is no longer a distinction between Map and FlatMap when modifying records, so the corresponding drop-down list has been removed from the user interface.
This second difference is important because it lets you write straightforward code that filters, maps, or flatmaps records depending on their content:
if input['type'] == "house":
    # Single-family dwellings have a top-level occupant (MAP).
    output = input['occupant']
elif input['type'] == "apartment":
    # Apartment blocks have many occupants (FLATMAP).
    output = [apt['occupant'] for apt in input['subdwellings']]
else:
    # Delete the record (FILTER).
    output = None
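The processor snippet relies on the input and output variables supplied by the Python 3 processor, so it cannot run as-is outside Talend Cloud Pipeline Designer. The sketch below wraps the same logic in a function so it can be tried locally. The run_pipeline helper is hypothetical: it interprets the returned value the way the comments above describe (one record for MAP, a list for FLATMAP, None for FILTER), which is an assumption about the processor's behavior, not its actual implementation.

```python
# Hypothetical local harness, not part of Talend: it mimics how the Python 3
# processor is assumed to treat `output` (dict -> one record, list -> several
# records, None -> record dropped).
from typing import Any, Dict, List, Union

Record = Dict[str, Any]
Output = Union[Record, List[Record], None]


def process(record: Record) -> Output:
    """Same logic as the processor code, with `input` renamed to `record`."""
    if record['type'] == "house":
        return record['occupant']  # MAP: one record out
    elif record['type'] == "apartment":
        return [apt['occupant'] for apt in record['subdwellings']]  # FLATMAP
    else:
        return None  # FILTER: record dropped


def run_pipeline(records: List[Record]) -> List[Record]:
    """Collect outputs as the processor is assumed to emit them."""
    results: List[Record] = []
    for rec in records:
        out = process(rec)
        if out is None:
            continue  # filtered out
        if isinstance(out, list):
            results.extend(out)  # one input, many outputs
        else:
            results.append(out)  # one input, one output
    return results


sample = [
    {"type": "house", "occupant": {"name": "Ana"}},
    {"type": "apartment", "subdwellings": [
        {"occupant": {"name": "Ben"}}, {"occupant": {"name": "Cy"}}]},
    {"type": "shed"},
]
print(run_pipeline(sample))
# prints: [{'name': 'Ana'}, {'name': 'Ben'}, {'name': 'Cy'}]
```

Three input records produce three output records here, but for different reasons: the house is mapped, the apartment is flattened into two records, and the shed is filtered out.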
Installing libraries
Libraries must be installed in two containers of the Remote Engine Gen2:
- the previewrunner container
- the livy container
You can install libraries either with a requirements.txt file or from the command line.
Installing libraries in the Remote Engine Gen2 using the requirements.txt file
In the previewrunner container:
- Create a folder on your local machine, for example /tmp/rqmts.
- Open one of the following files for editing:
  - default/docker-compose.yml if you are using the engine in the AWS USA, AWS Europe, AWS Asia-Pacific, or Azure regions.
  - eap/docker-compose.yml if you are using the engine as part of the Early Adopter Program.
- In the previewrunner section of the file, add this line to the volumes section:
  - /tmp/rqmts:/opt/rqmts
- In the same section, add this parameter to the environment section and save your changes:
  PYTHON_RQMTS_PATH: /opt/rqmts
  The paths are entirely customizable as long as you have write access to them.
- Go to Talend Cloud Pipeline Designer and check that creating a pipeline with a Python 3 processor works as usual.
- Create a requirements.txt file inside your /tmp/rqmts folder. This file lists the libraries to install in the Python virtual environment, for example:
  jinja2==2.11.2
- Go back to your pipeline and, in your Python 3 processor, add some code that uses the libraries listed in requirements.txt. For example:
from jinja2 import Template

t = Template("Hello {{something}}!")
output["hello"] = t.render(something=input["Op"])
- Save your changes and check that the data preview is displayed successfully. You can then modify the requirements.txt file on your local machine and update your code accordingly; the preview should keep working with the updated libraries.
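Creating the local folder and the requirements.txt file can also be scripted on the host machine. The sketch below is a hypothetical helper (write_requirements is not a Talend tool) that uses only the Python standard library; the /tmp/rqmts folder and the jinja2 pin are the example values from this page, and the folder must be the host side of the volume you mounted.

```python
# Hypothetical helper: prepares the host folder that is mounted into the
# previewrunner container (/tmp/rqmts -> /opt/rqmts in the example above).
from pathlib import Path
from typing import Iterable


def write_requirements(folder: str, pins: Iterable[str]) -> Path:
    """Create `folder` if needed and write one requirement per line."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    req_file = target / "requirements.txt"
    req_file.write_text("\n".join(pins) + "\n", encoding="utf-8")
    return req_file


if __name__ == "__main__":
    path = write_requirements("/tmp/rqmts", ["jinja2==2.11.2"])
    print(path.read_text(encoding="utf-8"), end="")
    # prints: jinja2==2.11.2
```

Rewriting the whole file on each call keeps the host folder in sync with a single list of pins, which matches the workflow of editing requirements.txt and then updating the processor code.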
In the livy container:
The procedure is the same as for the previewrunner container; the only difference is that you edit the livy section of the docker-compose.yml file.
On your cluster:
- Add the PYTHON_RQMTS_PATH environment variable to your cluster. It must point to a mounted volume, not to a folder that is erased whenever the worker server dies.
  For example: /dbfs/tpd-python3-rqmts
- Repeat the steps described for the previewrunner container: create the requirements.txt file inside the /dbfs/tpd-python3-rqmts folder, update your pipeline, and so on. Then check that your pipeline runs as expected.
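To confirm which pinned libraries actually ended up in a Python environment, a small script can compare requirements.txt with the installed distributions. This is a minimal sketch, not a Talend tool: it assumes you can run it with the interpreter of the environment you want to inspect (how to do that inside the containers or on the cluster is not covered on this page), it requires Python 3.8 or later for importlib.metadata, it looks names up as written in the file, and it only understands plain name and name==version lines.

```python
# Hypothetical check script: compares the pins in a requirements.txt file
# with the distributions installed in the interpreter that runs it.
import sys
from importlib import metadata
from typing import Dict, Optional


def parse_pins(text: str) -> Dict[str, Optional[str]]:
    """Map each requirement name to its '==' version, or None if unpinned.

    Comments and blank lines are skipped. Other specifiers (>=, extras,
    environment markers) are out of scope for this sketch.
    """
    pins: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        pins[name.strip()] = version.strip() if sep else None
    return pins


def check(pins: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return 'ok', 'missing', or a version-mismatch message per requirement."""
    status: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "missing"
            continue
        if wanted is None or installed == wanted:
            status[name] = "ok"
        else:
            status[name] = f"installed {installed}, expected {wanted}"
    return status


def report(path: str) -> None:
    """Print one status line per requirement found in `path`."""
    with open(path, encoding="utf-8") as fh:
        for name, state in check(parse_pins(fh.read())).items():
            print(f"{name}: {state}")


if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

For example, passing /dbfs/tpd-python3-rqmts/requirements.txt as the argument would print one line per library, such as "jinja2: missing" if the installation did not happen.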
Installing libraries in the Remote Engine Gen2 from the command line
Libraries can also be installed directly from the command line or by running a shell script.
In this case, you must install your libraries in both the previewrunner and the livy containers.
In the previewrunner container:
- In Talend Cloud Pipeline Designer, create a pipeline with a Python 3 processor in it and preview it. This forces all the Python files to be unpacked in your previewrunner container.
- From the command line, run a command like the following, for example to install numpy, replacing [previewrunner_docker_img_name] with the name of your previewrunner container:
docker exec -it [previewrunner_docker_img_name] \
  bash -c "source /tmp/luci/local/env/default/bin/activate && pip install numpy"
- You can then edit your code in the Python 3 processor and save your changes.
In the livy container:
- In Talend Cloud Pipeline Designer, create a pipeline with a Python 3 processor in it to force all the Python files to be unpacked.
- From the command line, run a command like the following, for example to install jinja2, replacing [livy_docker_img_name] with the name of your livy container:
docker exec -it [livy_docker_img_name] \
  bash -c "source /tmp/luci/local/env/default/bin/activate && pip install jinja2"
- Write some code in your Python 3 processor that uses jinja2, save your changes and check that the preview is displayed successfully.
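Because the same libraries must be installed in both containers, the two docker exec commands can be wrapped in a small script. This is a minimal sketch, assuming Python 3 and the docker CLI are available on the engine host; pip_install_command and install_everywhere are hypothetical names, and the container names you pass in are placeholders for the names reported by docker ps. It drops the -it flags because no interactive terminal is needed, and -it can fail with "the input device is not a TTY" when the script is not attached to one.

```python
# Hypothetical wrapper around the `docker exec` commands shown above.
import shlex
import subprocess
from typing import List, Sequence

# Virtual environment path taken from the commands on this page.
VENV_ACTIVATE = "/tmp/luci/local/env/default/bin/activate"


def pip_install_command(container: str, packages: Sequence[str]) -> List[str]:
    """Build the argv that runs pip inside the container's virtual env."""
    pkgs = " ".join(shlex.quote(p) for p in packages)  # guard against shell metacharacters
    inner = f"source {VENV_ACTIVATE} && pip install {pkgs}"
    return ["docker", "exec", container, "bash", "-c", inner]


def install_everywhere(containers: Sequence[str], packages: Sequence[str]) -> None:
    """Run pip in each container in turn, stopping at the first failure."""
    for name in containers:
        subprocess.run(pip_install_command(name, packages), check=True)
```

For example, install_everywhere(["<previewrunner container>", "<livy container>"], ["numpy", "jinja2"]) runs the same pip install in both containers. As with the manual commands, create and preview a pipeline with a Python 3 processor first so that the Python files are unpacked.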