The Portable Document Format, known best by its acronym and file type “PDF”, was developed by Adobe Systems in 1993 and has remained the standard for consumer printing, publishing and electronic sharing ever since. It is no wonder, then, that every now and then the need to convert information stored in a PDF into an Excel worksheet comes up. Powerful tools are destined to collaborate.
The nature of the PDF file format, however, makes this problematic. By design, PDF files are graphic, visual documents meant to be viewed as a set of pixels organized in relationship to each other. They are not, then, collections of letters and numbers easily converted to other programs that deal more elegantly with text and data.
In Excel 365, you will use the Get & Transform tools to convert PDF data into a worksheet. The pros for this method include lasting queries and connections from your source so that, if you wish, you can update your workbook data simply by updating the original file. Since PDFs are not dynamic, this means saving a new file with updated information in the same place with the same name.
The cons for this method are simple: Excel doesn’t do a great job of interpreting PDF content. Plan to spend time cleaning up data formatting or stitching together tables that get split.
Import a Single Table from a PDF File
- Open the workbook where you want the data to live, then click the Data
- Click the Get Data dropdown menu in the Get & Transform
- Hover over From File and select PDF from the flyout menu. [A]
- Browse to and select the PDF that you want to convert, then click Import.
- The Navigator window opens. Excel will parse the file and prepare a list of tables and pages that it finds in the Display Options pane. Select any item to see a preview of the table in the preview pane.
- When the table you want imported is selected, click Load [B]to create a new tab with the data imported as a table, or open the dropdown and select Load To [C].
- Load To opens the Import Data dialog window. Select how you want the data to appear in the workbook – as a plain table or a PivotTable for example. Then, select where you want the data imported. We want to import to a plain table [D] into an Existing worksheet [E] at a specific cell address [F].
- Click OK
The data will be added in the type of table you select to your workbook at the location you specified.
Import Multiple Tables from a PDF File
If there are multiple tables that you want imported within the same PDF document, begin by following the same steps as above from the Get Data command on the Data tab.
- In the Navigator window, put a check in the Select multiple items checkbox [G], and then a checkmark by each item you want to include.
- Select Load To to open the Import Data dialog window.
- Select the Table radio button and click OK. You cannot place multiple tables at a location using the Existing worksheet option, it will be greyed out. When you click OK, a new tab will be created for each table you imported.
Import and Place Multiple PDF Tables into Excel Worksheets from the Queries & Connections Pane
Each time you import data using the Get Data command on the Data tab, Excel creates a connection back to the original file. If your PDF is complicated and Excel splits the information up into multiple tables – a common problem when tables cross page borders – then you may need to stitch your tables back together into a single worksheet. One easy, though tedious, way to do this is to place each table one at a time like a puzzle.
Here is an example:
- Begin as you did above from the Get Data command on the Data Select your tables, then choose the Load To option.
- In the Import Data dialog, click on the Only Create Connection radio button [H] and click OK. Each table or page you selected appears in the Queries & Connections pane but no data is loaded into a worksheet, yet.
- Hover over an item in the pane to see a preview of the table. You will probably select the top-leftmost table to begin. In the preview pane, click the More button [I] and select Load To [J] from the menu.
- Click the Table radio button [D] in the Import Data dialog, then select Existing worksheet [E] and the A1 [F] Click OK.
- Excel imports that table, then you are ready to stitch in the next one. Select the first cell in the first empty row under your imported data, then find the next table in the Queries & Connections pane, most likely the table that begins on page 2 of your PDF but it will depend upon how Excel broke up the pages.
- Repeat 3-4 steps above. In the Import Data dialog, the selected cell should be pre-filled in the field under Existing worksheet. If not, type or select the cell and click OK.
Repeat for each table you need to stitch, either to the bottom or to the right of your worksheet. You will also likely need to do some cleanup to remove repeated row headings and so on. This is not an elegant solution to complex PDF conversions, but simple in terms of steps.
Convert PDF Data to Excel Using Power Query Editor
If you are an advanced Excel user, or are familiar with Power Query, the Power Query Editor can be a big help in importing data from a PDF that has been “chopped up” or formatted badly by the Get Data command.
- To open the Power Query Editor, begin as you did above from the Get Data command on the Data
- Select your tables, then choose the Load To
- In the Import Data dialog, click on the Only Create Connection radio button and click OK. Each table or page you selected appears in the Queries & Connections pane but no data is loaded into a worksheet, yet.
- Double-click on any table in the pane, or hover and select Edit [K] from the preview to open the Power Query Editor [L]. In the editor you can edit data, merge or split columns that Excel imported incorrectly, or even stitch tables back together before adding them to the worksheet.
The cumbersome nature of these steps reveals just how different PDF files are from Excel workbooks. But, with a little patience and good ol’ fashioned time, you can convert PDF data to Excel data. It might not be faster than copy/pasting but having Excel do it means there are fewer copy/paste errors introduced. Play around with a few PDF documents just to get a feeling for how Excel will interpret the tables. Not only will you gain experience using the Get & Transform process, you will also gain a feeling for which PDFs will convert easily, and which will require multiple loads or Power Query edits.