<< Back

Full Outer and Inner Joins with Multiple Inputs: The ‘Join Multiple’ vs ‘Manual’ Method

Most Alteryx Designer users are familiar with merging two datasets together using the Join tool. But when it comes to creating full-outer and inner joins, and working with multiple datasets, things are slightly more complex.

In this post, I’ll present two main methods for producing full-outer and inner joins with multiple inputs: first, using Join and Union tools in combination; and second, with the Join Multiple tool. We’ll look at how data structure can differ between the two, before summarising the pros and cons of each.

The tools looked at today can all be found in the ‘Join’ tool palette:

In these examples, all Joins and Join Multiples are configured to ‘Join by Specific Fields’, which is the default (and by far the most common) setting. It allows for maximum control, and avoids the headache of mismatched data that can occur with positional joins.

If you want to take a closer look at any of the following content, you can download the workflow here.

A Full Outer Join

A full outer join will produce a single output with all records, whether they match fully, partially, or not at all, across your inputs.

A full outer join of 3 inputs; ‘A’, ‘B’, and ‘C’ can be visualised like this:

1. THE ‘MANUAL’ METHOD

This can be achieved using a series of standard Join tools, with their L, J and R outputs Union-ed back together:

The structure of the output depends on a) the names of your joining fields, and b) the configuration of the Union tools.

OUTPUT: ONE, CONSOLIDATED JOIN FIELD

This structure has one primary, ‘complete’ column that merged all the join condition values, regardless of whether the records matched or not. (Incomplete, duplicate columns resulting from the joins can be removed). This is perhaps the most useful structure for most data.

Matching join field names
Union: Auto-configure by Name

Ensure the field/s you’re using as the join condition have exactly the same names across all your inputs. In this example, the join field is called ‘ID’ across all 3 inputs. The IDs which fell out of the L and R joins are therefore stacked, one on top of the other, in a single ID column.

Different join field names
Union: Manually Configure Fields

If you wanted a consolidated join field in the manual-join process, but the join field columns across inputs have different names, you could recreate the structure using the ‘Manually Configure Fields’ option at the top of the configuration pane. The left and right arrows allow you to choose the column each field from your inputs should fall into.

OUTPUT: SEPARATE JOIN FIELD COLUMNS

This structure stays true to the original data: only the IDs present in Input 1 are in Input 1’s ID column, and so on for Input 2 and 3. (Naturally, if a join field value is present in two inputs, it’ll appear in multiple columns).

Different join field names
Union: Auto-Configure by Name

Non-matching names with name-based matching means that each join field column is preserved in the output.

2. THE JOIN MULTIPLE TOOL

A full outer multi-join is achieved by leaving this option in the configuration pane of the Multi-Join tool unticked:

OUTPUT: SEPARATE JOIN FIELD COLUMNS

The workflow does look tidier, but the Join Multiple tool does not have an option within the tool to consolidate join fields into a single column. It automatically renames the join fields to reflect the input they’ve come from, and there’s no manual Union-ing going on to allow us flexibility.

The resulting output is therefore always ‘separate IDs in separate columns’.

An Inner Join

An inner join will only output records that match across all your inputs.

The Venn diagram of an inner join of 3 inputs would be:

1. THE ‘MANUAL’ METHOD

This can be achieved using a series of Join tools, with the J inputs (i.e. matches) being carried forward for the next Join:

OUTPUT: ONE, CONSOLIDATED JOIN FIELD

As you’re only returning IDs that matched across all inputs, the result is a neat, single column, comparable to the consolidated output above. (Additional columns resulting from the Joins will be exact duplicates, which you can remove.)

2. THE JOIN MULTIPLE TOOL

Again, it’s a sleek alternative:

An inner multi-join is achieved by selecting this option in the configuration pane:

OUTPUT: ONE, CONSOLIDATED JOIN FIELD

The resulting structure is the same as above. (Only the headers are different for the Multi-Join by default, with those automatic suffixes based on input number.)

Pros, Cons and Structure

When making a decision on which method to use, you should consider the context of your workflow as well as desired data structure:

METHODPROSCONS
‎‏‏‎ ‎ ‎‏‏‎ ‎ ‎‏‏‎ ‎
MANUAL
L/R outputs to view
unmatched records
easier to explain, troubleshoot, and audit
Can be repetitive and
look untidy when
working with many inputs
‎‏‏‎ ‎ ‏‏‎ ‎ ‎‏‏‎ ‎
JOIN
MULTIPLE
Single tool can save time
and look tidy when
dealing with many
inputs
Single output more difficult
to troubleshoot and audit
METHODFULL OUTERINNER
‎‏‏‎ ‎ ‎‏‏‎ ‎ ‎‏‏‎ ‎
MANUAL
Join Fields separate or
consolidated from
Union configuration
Join Field/s complete
‎‏‏‎ ‎ ‎‏‏‎ ‎ ‎‏‏‎ ‎
JOIN
MULTIPLE
Join Fields separate
per input
Join Field/s complete

And that’s it for this introductory dive into multiple joins! If you’d like a closer look at the examples covered today, you can download the workflow at this link.

Have questions about merging data? Be sure to leave a comment below.

Jess Hancock

London, UK

Leave a Reply

Your email address will not be published. Required fields are marked *