
How to recover a broken Tableau Server with no backup (2018.2+ Edition)
This is an update to a previous post I wrote, which described how to recover your data from a broken Tableau Server if you don’t have a backup to hand. Please refer to my previous post if you are running Tableau Server version 2018.1 or earlier. If you are running Tableau Server version 2018.2 or later, then keep reading.
As discussed in my previous post, this is a last resort option. Please exhaust all other methods of getting yourself a backup before trying this. When I say backup, I mean a proper Tableau Server backup file (.tsbak) generated using TSM, either at the command line or in the Web UI. If you don't know what I'm talking about, then read this.
In this scenario, we're going to assume that your server is in an unstable state and that, for whatever reason, you are unable to use TSM to create a backup file. Tableau Server itself could still be running fine: TSM is decoupled from the Tableau Server processes and runs independently, so it can be in a bad way even when the application is not. How could this possibly happen, you ask? It could occur after a botched install or upgrade; permissions might not have been set up correctly (did you know that the Tableau Server Run As user now needs to be a local administrator on the server machine?); or something could be preventing one or more of the six TSM services from starting correctly.
So let's say that, for whatever reason, this happens for real, and after trying every possible troubleshooting step you are left with no choice but to attempt a manual recovery. Here's what you do. One last caveat: I offer this advice with no support. What you do to your server is on you. If you do find yourself in the unenviable position of having to undertake this recovery and you don't feel comfortable following these steps, then contact Tableau Support, and they will help walk you through this process. Better still, set up a separate throwaway server and practice the steps until you feel comfortable with them, then apply them in your actual environment.
1. Find and run the stop-administrative-services script to kill all the server processes. Note that this script only seems to exist in the Windows release as of 2018.3, but it looks like it will be available in the 2019.1 Linux release, as it's already in the beta. You can find this script in: <tableau server directory>/packages/scripts.<version number>/
2. Monitor your server processes using Task Manager (Windows) or top (Linux) and wait for all the Tableau-related processes to die off. By the end, your Tableau Server application should be completely stopped, allowing us to copy files safely.
3. Rename the Tableau application and data folders. Specifically, you want to rename the "Tableau Server" folders in C:\Program Files\Tableau\ and C:\ProgramData\Tableau, or, if you installed to a separate drive, the original install folder location, e.g. D:\Program Files\Tableau\. On Linux, the default locations are /opt/tableau/ (application files) and /var/opt/tableau (data). Renaming these folders will preserve them when we run the obliterate script in the next step.
4. Run the tableau-server-obliterate script. You'll find this script in the same folder as the script in Step 1. You'll have renamed its parent Tableau Server folder by now, but go ahead and run it from there anyway. Make sure you run it manually from the command line (double-clicking on it won't do anything), and be sure to follow the instructions to get it to execute properly. This will completely uninstall Tableau Server and remove any environment variables, but since we renamed our folders, our data is preserved.
5. Uninstall Tableau Server from Add/Remove Programs. If you're on Windows, you might need to take this extra step to deregister Tableau Server from Windows.
6. Run the Tableau Server installer. Make sure you install the exact same version that you were running previously; attempting an upgrade at the same time will fail. Don't do it!
7. You should be prompted to select an install location. (If you are not, or the installer tells you there is already another installation present, then go back and run the obliterate script again, properly.) Select the location where you originally installed Tableau Server. Do not select the renamed folders.
8. Choose to create a new Tableau Server installation.
9. Install the server with the same authentication method and service account as used previously. Be sure to specify the fully qualified domain name for your Run As user if it is a domain user. Also make sure that user is added explicitly to the local Administrators group on your server.
10. Wait for the install to complete: coffee time.
11. Once it has finished, you will be prompted to create a new admin account in the Tableau Server interface and log in. Wait until all the samples have finished publishing before continuing to the next step.
12. At this point, you should be able to access the TSM Web UI on port 8850 and use TSM at the command prompt. Stop the server using either method.
13. Now here comes the recovery: we are going to replace some files and folders in the new installation with the files and folders from the old, renamed installation. Go ahead and rename the "pgsql" and "dataengine" folders in the new installation directory. You should find these in <Tableau Server data directory>/data/tabsvc/
14. Find those same folders in your original, renamed installation, and copy them over to the new location. We have now copied all our data over from the old, broken install. From this point onwards, we will work in the new install location only.
15. Modify your pg_hba.conf file. You'll find it here: <Tableau Server data directory>/data/tabsvc/config/pgsql_<version>/pg_hba.conf
Change "md5" to "trust" for the user "tblwgadmin":
from: host all tblwgadmin <address>/32 md5
to: host all tblwgadmin <address>/32 trust
16. Repeat this for all the lines that contain the "tblwgadmin" user.
17. Regenerate the internal security tokens:
tsm security regenerate-internal-tokens
tsm pending-changes apply
18. At this point, your server should have started, and you should be able to log in to your Tableau Server after a few minutes. If you find that some of the content on the server is missing from the web interface, then you'll want to reindex the search engine:
tsm maintenance reindex-search
19. Confirm that the tsm commands (i.e. stop/start/status) all function as expected.
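If you'd rather not edit pg_hba.conf by hand in steps 15 and 16 (or when switching it back to md5 later, as one of the comments below does), here is a minimal sketch of doing it with a script. It assumes a Linux install with GNU sed; the function name `set_tblwgadmin_auth` is hypothetical, and you pass in the pg_hba.conf path from step 15 yourself. It only touches non-comment `local`/`host*` lines that contain tblwgadmin as a separate field and end in md5 or trust, and it keeps a copy of the first untouched file as `pg_hba.conf.orig`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch the auth method for tblwgadmin in pg_hba.conf.
# Assumes Linux and GNU sed (-i with no suffix argument).
# Usage: set_tblwgadmin_auth <path to pg_hba.conf> trust|md5
set_tblwgadmin_auth() {
  local conf="${1:-}" method="${2:-}"
  case "$method" in
    trust|md5) ;;
    *) echo "method must be trust or md5" >&2; return 2 ;;
  esac
  [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }

  # Keep the very first version of the file so it can be compared later.
  [ -e "$conf.orig" ] || cp -p "$conf" "$conf.orig"

  # On non-comment local/host* lines that contain tblwgadmin as its own field,
  # replace a trailing md5 or trust with the requested method.
  sed -E -i \
    "/^[[:space:]]*(local|host[a-z]*)[[:space:]].*[[:space:]]tblwgadmin[[:space:]]/ s/([[:space:]])(md5|trust)([[:space:]]*)$/\1${method}\3/" \
    "$conf"

  # Print the tblwgadmin lines (with line numbers) for a visual check.
  grep -nE "[[:space:]]tblwgadmin[[:space:]]" "$conf" || true
}
```

For example, `set_tblwgadmin_auth "<Tableau Server data directory>/data/tabsvc/config/pgsql_<version>/pg_hba.conf" trust`, and the same call with `md5` to put it back. Read the printed lines before moving on to step 17.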
All finished? Server back up and running? Great, you can wipe that sweat off your brow! Now is a good time to do a ‘tsm maintenance backup’ and store that backup somewhere safe!
Can you do this on a new, clean server, I mean a different machine?
Yes that will work.
Can you do this on a new, clean Windows server, not the one Tableau Server was installed on before?
Yes, it works.
Hello, does this work on a clean Windows machine?
Really nice post! It helped us recover from a broken Dockerized CentOS setup after the server it was running on (Ubuntu) ran out of disk space and freeing up storage did not help. I only want to point out that when running these commands
`tsm security regenerate-internal-tokens`
`tsm pending-changes apply`
on a Linux setup, you have to make sure that the `dataengine` and `pgsql` folders copied from the old server belong to the correct user and group, or else these commands won't work. Running `ls -al` inside `/var/opt/tableau/tableau_server/data/tabsvc/` (on CentOS; I cannot tell whether it is the same on other distros) will show you all the folders with the users and groups they belong to. `dataengine` and `pgsql` have to belong to the same user and group as the rest of the folders. A simple command to fix this is:
`sudo chown -R tableau:tableau pgsql/`
`sudo chown -R tableau:tableau dataengine/`
(We had this problem because, when we copied the folders into our clean installation, they belonged to the user and group `tsm`.)
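To check this before running the tsm commands, here is a minimal sketch that compares the copied folders against a folder that came with the clean install. It assumes Linux with GNU coreutils and findutils, and it uses `config` (the folder under tabsvc that holds pg_hba.conf) as the reference by default; `suggest_chown` is a hypothetical helper name. It only prints the chown commands it thinks are needed, so you can review them before running them with sudo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the chown commands needed so that the copied
# pgsql/ and dataengine/ folders match the owner of a folder from the clean
# install. Assumes GNU coreutils (stat -c) and GNU findutils (find -quit).
# Usage: suggest_chown <path to data/tabsvc> [reference folder, default: config]
suggest_chown() {
  local tabsvc="${1:-}" ref="${2:-config}"
  local uid gid names dir
  uid=$(stat -c '%u' "$tabsvc/$ref") || return 1
  gid=$(stat -c '%g' "$tabsvc/$ref") || return 1
  names=$(stat -c '%U:%G' "$tabsvc/$ref") || return 1

  for dir in pgsql dataengine; do
    if [ ! -d "$tabsvc/$dir" ]; then
      echo "missing: $tabsvc/$dir" >&2
      continue
    fi
    # Is anything inside owned by a different numeric user or group?
    if [ -n "$(find "$tabsvc/$dir" \( ! -user "$uid" -o ! -group "$gid" \) -print -quit)" ]; then
      echo "sudo chown -R $names '$tabsvc/$dir'"
    fi
  done
}
```

For example, `suggest_chown /var/opt/tableau/tableau_server/data/tabsvc` prints nothing when ownership already matches. Checking every file with `find`, rather than just the top-level folder, catches files that kept the wrong owner even after the folder itself was fixed.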
That’s a useful tip, thanks!
This is a fantastic reference.
I followed this process to recover from a failed upgrade of 2019.4.1 to 2020.1.1.
I’m not sure what caused the original failure, but I was in a state where my server wouldn’t start up and I couldn’t do anything in tsm (short of seeing that services weren’t running).
I followed this process to get a clean slate, reinstall 2019.4.1, move my data (dataengine and pgsql), and then perform a successful upgrade. I did have to manually reconfigure some of my server settings like mail server, SSL, etc.
I also found that I had to be careful with checking the Windows security on the dataengine and pgsql directories. (At one point, I’d copied them off to a network share and then lost the appropriate ownership, so I had to restore that by comparing to the fresh install versions before running the regenerate-internal-tokens.)
For me, everything stopped fine (nothing to do in step 2), and I didn’t have to uninstall using Windows Add / Remove (step 5).
Thank you a million times for this post.
(That’ll teach me to do an in-place upgrade without running a backup first…)
Thanks Aaron, 3-4 months later and your update for checking the folder security saved me a headache.
Not until I accessed the copied folders on a Windows Server installation did I get this to work. Thanks for the pointer, Nedim, even if it was for another operating system.
I was extremely excited about these steps and have tried them about 10 times, but unfortunately I keep hitting a brick wall on this step:
tsm security regenerate-internal-tokens
It fails on step 45/46, and I get a ton of errors like this one in the backuprestore log. Any ideas?
2020-10-01 05:48:40.038 -0400 pool-17-thread-2 : WARN com.tableausoftware.core.encryption.EncryptedAttributesService – Failed to decrypt keychain for entity with id 35. Asset key 1 not found. Skipping.
2020-10-01 05:48:40.039 -0400 pool-17-thread-2 : ERROR com.tableausoftware.core.encryption.EncryptedAttributesService – failure to decrypt keychain – returning null keychain: java.lang.RuntimeException: Error attempting to perform AEG decryption operationsjavax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
java.lang.RuntimeException: Error attempting to perform AEG decryption operationsjavax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
at com.tableausoftware.core.encryption.AEGEncryptionOperations.decrypt(AEGEncryptionOperations.java:140) ~[tab-encrypted-attribute-latest.jar:?]
at com.tableausoftware.core.encryption.EncryptionAlgorithmAEG.decrypt(EncryptionAlgorithmAEG.java:101) ~[tab-encrypted-attribute-latest.jar:?]
Were you ever able to fix this? That's where I am right now.
Unfortunately, in my situation (2020.3.0 after a disk-full event), the "re-encrypting assets" step fails after several hours and TSM crashes.
Has anyone managed to follow this while only having a backup of the “pgsql” folder, but not the “dataengine”? Don’t ask me how, but I managed to “lose” my “dataengine” folder.
tsm security regenerate-internal-tokens is failing. My guess is that it tries to re-encrypt "dataengine", but since that doesn't match my "pgsql", it fails.
What I did manage to do was extract all my workbook .twb (XML) files from my original "pgsql", but I would like to keep trying to recover the whole "pgsql" since it has my users, etc.
If for some reason the "regenerate-internal-tokens" step fails for you, you can try the following. For me, it fails because I lost my "dataengine" folder and was attempting to recover only the "pgsql" folder. This causes "regenerate-internal-tokens" to fail, probably because it tries to re-encrypt extracts that it can't find. The command hangs, and the "backuprestore" logs show:
2020-12-24 17:08:14.083 +0000 pool-13-thread-2 : WARN com.tableausoftware.core.encryption.EncryptedAttributesService – Failed to decrypt keychain for entity with id 29. Asset key 1 not found. Skipping.
Anyway, what I did to avoid running "regenerate-internal-tokens" was sort of the opposite strategy. Instead of telling Tableau to regenerate all credentials, so that it would create a new master password and new passwords for Postgres, I retrieved the Postgres passwords it had already generated during the clean installation and set them on the users in my restored "pgsql".
Here are the steps. Note that you have to do this on yet another clean installation, before running "regenerate-internal-tokens", because when that command fails it leaves the server in a corrupted state (it seems some services were re-encrypted, but not all):
1) After step 16 of the blog post, retrieve the generated passwords from the new clean installation:
tsm configuration get -k pgsql.adminpassword
tsm configuration get -k pgsql.readonly_password
tsm configuration get -k pgsql.remote_password
tsm configuration get -k jdbc.password
Save them somewhere.
2) Start the Postgres server directly (note that this is starting our recovered pgsql which has the old passwords assigned to the users):
cd /opt/tableau/tableau_server/packages/pgsql.20204.20.1116.1810/bin/
./postgres --config-file=/var/opt/tableau/tableau_server/data/tabsvc/config/pgsql_0.20204.20.1116.1810/postgresql.conf
And in another tab, log in to Postgres:
cd /opt/tableau/tableau_server/packages/pgsql.20204.20.1116.1810/bin/
./psql -h localhost -p 8060 -U tblwgadmin postgres
Inside Postgres, run:
ALTER USER tblwgadmin WITH PASSWORD '';
ALTER USER readonly WITH PASSWORD '';
ALTER USER tableau WITH PASSWORD '';
ALTER USER rails WITH PASSWORD '';
Use the 4 passwords retrieved above, putting each one between the single quotes.
3) Revert the changes to pg_hba.conf (put md5 back instead of trust).
4) Start the server. For me, everything worked fine, meaning that now our restored pgsql is using the new passwords generated on the Tableau clean installation.
Note that this works for me because I have no extracts (I lost my “dataengine” folder).
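Step 1 of this workaround can be scripted so that the four values end up in one file that only you can read. This is a minimal sketch that makes exactly the `tsm configuration get -k` calls listed in step 1 and stores whatever each one prints; the function name and the output file are hypothetical, and it assumes `tsm` is on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the four passwords from step 1 into one file
# that only the current user can read. Assumes tsm is on the PATH.
# Usage: save_pgsql_passwords <output file>
save_pgsql_passwords() {
  local out="${1:-}" key value
  [ -n "$out" ] || { echo "usage: save_pgsql_passwords <output file>" >&2; return 2; }

  # Create (or truncate) the file with owner-only permissions.
  ( umask 077 && : > "$out" ) || return 1
  chmod 600 "$out" || return 1

  for key in pgsql.adminpassword pgsql.readonly_password pgsql.remote_password jdbc.password; do
    # Store whatever tsm prints for the key; stop on the first failure.
    value=$(tsm configuration get -k "$key") || { echo "could not read $key" >&2; return 1; }
    printf '%s=%s\n' "$key" "$value" >> "$out"
  done
}
```

Each line of the file has the form `key=value`. Delete the file once the ALTER USER statements have been run, since it holds the repository passwords in plain text.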
Thanks a lot, Jonathan MacDonald, for this blog post. It really really helped me.
Fernando I don’t know who you are, but I know that you must have had a mildly stressful start to your Christmas!
Thank you for building on the work of Jonathan, the method you detailed helped me to help a customer out of a tricky situation too.
Genius. Merry Christmas x
Jonathan & Fernando, phew! THANKS A TON! Recovered the server with all the data intact after a failed upgrade from 2020.3 to 2021.2. I had a ton of extracts in the dataengine folder, and the tip from Fernando still worked (on Windows)! Great find; even Tableau Support was surprised by the recovery.
One point to note: gracefully close the postgres session after the four ALTER commands by using the \q command, so that you won't get any errors when you take a backup after the server is up.
Quick note: SMTP and topology changes had to be redone after installing Tableau again.
We recently had an out-of-disk-space issue on a server, and now quite a few things are in an error state and will not start. Will this process work for that? Sadly, we don't have a recent backup.
Thank you for this post. We followed it on 2020.2.3 and everything seems to work, but we have an issue with embedded passwords in connections: the stored password does not seem to work, and if I change the password, the test is OK. But after saving and reopening, the same issue occurs. Does anyone have the same issue?
This is really helpful thanks.
I found that the `tsm security regenerate-internal-tokens` step was hanging on the enabling database services task, and then failing.
Looks like I had incorrect permissions on the folders I had carried over, and for pgsql at least, the logs were telling me `DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).`
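On Linux, a quick way to check and fix that mode is sketched below. It is a minimal example, assuming GNU stat; `check_pg_data_mode` is a hypothetical name, and you should pass it the directory named in the pgsql log message rather than guessing which folder under pgsql/ is the data directory. It treats a mode as acceptable when it grants no group write and nothing to others, which covers the 0700 and 0750 named in the message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check (and optionally fix) the mode of a PostgreSQL
# data directory. Assumes GNU coreutils stat. Pass the directory that the
# pgsql log names; which folder that is inside pgsql/ is not assumed here.
# Usage: check_pg_data_mode <data directory> [--fix]
check_pg_data_mode() {
  local dir="${1:-}" fix="${2:-}" mode
  mode=$(stat -c '%a' "$dir") || return 1

  # Acceptable: no group write and no access for others (0700, 0750, ...).
  if [ $(( 0$mode & 027 )) -eq 0 ]; then
    echo "ok: $dir has mode $mode"
    return 0
  fi

  if [ "$fix" = "--fix" ]; then
    chmod 700 "$dir" && echo "fixed: $dir was $mode, now 700"
  else
    echo "bad: $dir has mode $mode (expected 700 or 750)"
    return 1
  fi
}
```

Run it without `--fix` first to see what it reports, and run it as the user that owns the folder (or with sudo) so that chmod is allowed.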
This was exceptionally helpful to me when our server (2021.1.2) decided the local server SSL cert was no longer valid, and as a result TSM couldn’t start or be accessed or used for anything (Windows Server 2016 here)
TSM was not accessible from the command line after reinstalling, and I had to recreate the environment variable per this article:
https://kb.tableau.com/articles/issue/error-tsm-is-not-recognized-as-an-internal-or-external-command-operable-program-or-batch-file
I did have to replicate the permissions on the copied pgsql and dataengine folders, and I also came across this article from Tableau regarding the required permissions:
https://help.tableau.com/current/server/en-us/runas_confirm_read_ex.htm
Slowly putting the missing settings back together, but so far, so good!
Does this work on a 3 node cluster?
Wonder if adding the nodes back between steps 16 and 17 would do the trick.
I'm giving it a shot and will let you know. I know this post is quite dated, but it seems to be the only one of its kind.
Appreciate the guidance!
Hi,
I followed the steps in the post on a new Windows server and had an issue when regenerating the security tokens: it failed with error ID 5. I can't find anything useful in the logs to explain the error.
Did anyone have this issue? Any ideas to resolve?
Hey John,
My 2021.4 is somehow corrupted. I followed your steps, but when I'm regenerating the tokens, the Postgres DB does not start up, which is why that step never finishes. Can you help a little?
DO NOT contact Tableau Support for assistance with this entirely unsupported recovery method. If it works for you, that’s great, but it is in no way supported and we will not provide support to perform these steps.