
Processing strings to get the revenue corresponding to small taxi rides

A pipeline with a Test source, a Field selector processor, a Type converter processor, a Filter processor, and an HDFS destination.

Before you begin

  • You have previously created a connection to the system storing your source data.

  • You have previously added the dataset holding your source data.

    Here, hierarchical taxi data including pickup time, dropoff time, fare, etc. (download the type_converter-taxi.json file from the Downloads tab in the left panel of this page).

  • You also have created the connection and the related dataset that will hold the processed data.

    Here, a file stored on HDFS.

Procedure

  1. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Convert small taxi rides
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here taxi-related data.
    Preview of a data sample with taxi hierarchical data.
    Warning: The Type converter processor cannot process sub-records. If you want to convert these records, use a Field selector processor first to reorganize the records and move them to the top level of the schema.
  4. Select your dataset and click Select in order to add it to the pipeline.
    Rename it if needed.
  5. Click Plus and add a Field selector processor to the pipeline. The configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    reorganize records
  7. Click the Edit icon in the Simple selection mode:
    1. Select the .pickup.pickup_datetime field and rename it pickup_time, as you want to move the pickup_datetime field of the pickup sub-record to the top level of the schema under a new name.
    2. Select the .dropoff.dropoff_datetime field and rename it dropoff_time, as you want to move the dropoff_datetime field of the dropoff sub-record to the top level of the schema under a new name.
    3. Select the .payment.fare_amount field and rename it fare, as you want to move the fare_amount field of the payment sub-record to the top level of the schema under a new name.
    4. Click Edit then Save to save your configuration.
      Preview of the Field selector processor after reorganizing taxi records.
  8. Click Plus and add a Type converter processor to the pipeline. The configuration panel opens.
  9. Give a meaningful name to the processor.

    Example

    convert rides and fares
  10. In the Converters area:
    1. Select .pickup_time in the Field path list, select the Primitive mode, select DateTime in the Output type list, and type in yyyy-MM-dd HH:mm:ss in the Format field, as you want to convert the String type field holding pickup time information to a DateTime type field. yyyy-MM-dd HH:mm:ss corresponds to the format of the input field.
      Tip: To learn more about date formats and patterns, see Additional information about date and time patterns.
    2. Click the + icon to add a new converter, select .dropoff_time in the Field path list, select the Primitive mode, select DateTime in the Output type list, and type in yyyy-MM-dd HH:mm:ss in the Format field, as you want to convert the String type field holding dropoff time information to a DateTime type field. yyyy-MM-dd HH:mm:ss corresponds to the format of the input field.
    3. Click the + icon to add a new converter and select .fare in the Field path list, select the Primitive mode and select Double in the Output type list, as you want to convert the String type field holding fare information to a Double type field.
      Tip: You can apply multiple conversions to the same field. For example, you can convert a String type field that contains a date into a Long type field, and then convert the generated Long type field into a DateTime type field.
    4. Click Save to save your configuration.
      Preview of the Type converter processor after converting records related to rides and fares.
  11. Click Plus after the Type Converter processor on the pipeline and add a Filter processor.
  12. Give a meaningful name to the processor.

    Example

    filter on short rides
  13. In the Filters area:
    1. Type in .{.dropoff_time - .pickup_time > 660000} in the Input list, as you want to filter rides on their duration, using 11 minutes (660000 milliseconds) as the threshold.
    2. Select Count in the Optionally select a function to apply list, > in the Operator list and type in 0 in the Value list as you want to count these short rides.
    3. Click Save to save your configuration.
      Preview of the Filter processor after filtering taxi records on short rides.
  14. (Optional) Look at the preview of the Filter processor to see your data after the filtering operation.
  15. Click ADD DESTINATION on the pipeline to open the panel allowing you to select the dataset that will hold your data (HDFS).
  16. Give it a meaningful name, short rides data for example.
  17. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  18. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
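
The reorganizing and converting steps above can be approximated outside the pipeline. The following is a minimal Python sketch, not Talend's implementation: the sample record values and the get_path, select_fields, and convert_types helpers are hypothetical, and it assumes that the yyyy-MM-dd HH:mm:ss pattern corresponds to Python's %Y-%m-%d %H:%M:%S format.

```python
from datetime import datetime

# Hypothetical record shaped like the hierarchical taxi data; the values are made up.
record = {
    "pickup": {"pickup_datetime": "2016-01-01 00:05:00"},
    "dropoff": {"dropoff_datetime": "2016-01-01 00:13:30"},
    "payment": {"fare_amount": "7.5"},
}

# Source path -> new top-level name, mirroring the Simple selection mode entries.
SELECTION = {
    ".pickup.pickup_datetime": "pickup_time",
    ".dropoff.dropoff_datetime": "dropoff_time",
    ".payment.fare_amount": "fare",
}


def get_path(rec, path):
    """Follow a dotted path such as '.pickup.pickup_datetime' into nested dicts."""
    value = rec
    for key in path.strip(".").split("."):
        value = value[key]
    return value


def select_fields(rec, selection):
    """Move the selected sub-record fields to the top level under their new names."""
    return {new_name: get_path(rec, path) for path, new_name in selection.items()}


def convert_types(flat):
    """Parse the two time strings as datetimes and the fare string as a float."""
    fmt = "%Y-%m-%d %H:%M:%S"  # assumed equivalent of yyyy-MM-dd HH:mm:ss
    return {
        "pickup_time": datetime.strptime(flat["pickup_time"], fmt),
        "dropoff_time": datetime.strptime(flat["dropoff_time"], fmt),
        "fare": float(flat["fare"]),
    }


flat_record = select_fields(record, SELECTION)
typed_record = convert_types(flat_record)
print(flat_record)
print(typed_record)
```

Flattening comes first in this sketch for the same reason as in the pipeline: the conversion step only looks at top-level fields.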

Results

Your pipeline is being executed: the field types are converted, the records are filtered, and the output flow is sent to the target system you have indicated.
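
To make the threshold used in the filter expression concrete: 660000 is 11 minutes expressed in milliseconds (11 x 60 x 1000). The Python sketch below evaluates the same comparison on two hypothetical, already converted records. It assumes the difference between the two DateTime fields is measured in milliseconds, the duration_ms and matches_expression helpers are illustrative names, and it only reproduces the comparison as typed, not how the Filter processor applies the Count function, operator, and value.

```python
from datetime import datetime, timedelta

# 11 minutes expressed in milliseconds, the constant used in the filter expression.
THRESHOLD_MS = 11 * 60 * 1000


def duration_ms(pickup, dropoff):
    """Ride duration in whole milliseconds."""
    return (dropoff - pickup) // timedelta(milliseconds=1)


def matches_expression(rec):
    """Evaluate dropoff_time - pickup_time > 660000 for one record."""
    return duration_ms(rec["pickup_time"], rec["dropoff_time"]) > THRESHOLD_MS


# Hypothetical converted records; the values are made up for illustration.
rides = [
    {
        "pickup_time": datetime(2016, 1, 1, 0, 5, 0),
        "dropoff_time": datetime(2016, 1, 1, 0, 13, 30),
        "fare": 7.5,
    },
    {
        "pickup_time": datetime(2016, 1, 1, 9, 0, 0),
        "dropoff_time": datetime(2016, 1, 1, 9, 25, 0),
        "fare": 21.0,
    },
]

for ride in rides:
    ms = duration_ms(ride["pickup_time"], ride["dropoff_time"])
    print(ms, matches_expression(ride))
```

The first ride lasts 8.5 minutes and does not satisfy the comparison, while the second lasts 25 minutes and does.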
