New Sources of Microsoft Office Metadata – Tool Release MetadataPlus
TL;DR – 31 usernames extracted vs 13 from the next leading brand!
Open Source Intelligence Gathering (OSINT) can be an activity in itself and can also form a solid foundation for Full Spectrum Attack Simulations. Getting an idea of username formats as well as a number of known usernames increases the chances of success with password spraying. In addition, any information that can be gathered such as hostname conventions, internal servers, or Operating System types could all inform decisions made once a foothold has been established. Finally, the more information the better when it comes to social engineering.
When conducting research for a macro-based client/server framework, I discovered a number of new places within different types of office documents that contained useful metadata (such as usernames and hostnames) that were not recovered using industry standard tools such as FOCA. This post introduces a new tool, MetadataPlus, which can be found on the NCC Group GitHub (https://github.com/nccgroup/MetadataPlus) and describes the new metadata sources covered by this tool.
In a test case involving roughly 120 publicly accessible documents, FOCA extracted 13 usernames, while MetadataPlus extracted 31. Because of the way the output is formatted, it also led to the discovery of the unusual username format in use.
It is probably fairly well known that Office files can be extracted like a zip file to gain access to internal files that make up the document. MetadataPlus works by extracting document files and looking for specific tags or patterns within the internal files. MetadataPlus is designed to cover a number of Microsoft Office filetypes. It began with xlsx/xlsm (Excel) and docx/docm (Word) but was expanded to trial against all possible save formats for a number of Office products. The ones found to work and included now by default are:
- xlsx/xlsm, docx/docm – Excel and Word files
- xltx/xltm, dotx/dotm, potx/potm – template files for Excel, Word, and PowerPoint
- ppt/pptx – PowerPoint files
The program includes a -a option that will attempt to process every file in the folder. In theory it should work on any Office file that can be extracted into XML files, but the formats listed above are the ones found to work during testing.
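As a rough illustration of the "extract it like a zip file" step, the following Python sketch lists the XML and relationship parts inside a document. It is not MetadataPlus's own code, and the function name `list_internal_xml` is hypothetical.

```python
import zipfile


def list_internal_xml(path_or_file):
    """Return the sorted names of the .xml and .rels parts inside an Office file."""
    # Modern Office formats are zip containers, so the standard zipfile
    # module can open them directly without renaming the file.
    with zipfile.ZipFile(path_or_file) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith((".xml", ".rels"))
        )
```

Calling `list_internal_xml("test.xlsx")` on an Excel workbook would typically show parts such as `xl/workbook.xml`, which the sections below look at in more detail.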
Last saved location
For the first new location of metadata, we need to look at the workbook.xml file of an Excel document. If this contains an absPath tag, it shows the document’s last saved location, which could include usernames or hostnames: for example, the hostname of a device if the file was saved on a network drive such as \\networkedcomputer01\docs\test.xlsx, or a username if it was saved locally somewhere like c:\users\bilbo1\Documents\test.xlsx. In the following screenshot from a demonstration document we can find the username gragra576.
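A minimal sketch of pulling the last saved location out of workbook.xml might look like the following. It assumes the tag appears in a form like `<x15ac:absPath url="..."/>`, with the path held in a `url` attribute; that layout, and the function name `last_saved_location`, are assumptions for illustration rather than a description of how MetadataPlus does it.

```python
import re

# Assumption: the path is stored in a url="..." attribute on an absPath tag,
# optionally carrying a namespace prefix such as x15ac.
ABS_PATH_RE = re.compile(r'<(?:\w+:)?absPath\b[^>]*\burl="([^"]*)"')


def last_saved_location(workbook_xml):
    """Return the absPath url value from workbook.xml text, or None if absent."""
    match = ABS_PATH_RE.search(workbook_xml)
    return match.group(1) if match else None
```

The returned path can then be passed through username and hostname matching like any other filepath.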
The second new location for metadata is any comments file extracted from the document. The comments file includes the author, which can be their username or full name, and it is also possible to read the comment text even if it does not show up when you open the document itself. As an example, comments1.xml in the demonstration document has an <author> tag showing another username.
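Reading the authors out of a comments file can be sketched as below. The code matches any element whose local name is `author`, ignoring namespaces, which is an assumption chosen to keep the example short; `comment_authors` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def comment_authors(comments_xml):
    """Return the unique, non-empty <author> values in order of appearance."""
    root = ET.fromstring(comments_xml)
    authors = []
    for element in root.iter():
        # Tags are reported as "{namespace}name", so strip the namespace part.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name != "author" or not element.text:
            continue
        name = element.text.strip()
        if name and name not in authors:
            authors.append(name)
    return authors
```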
External and image links
External link and image link files have been found to contain links to unsecured (HTTP) servers, which might indicate an internal location on a network. These links have also included usernames where the server was set up with a per-user folder structure, as well as hostnames and network filepaths. Additional information can sometimes be inferred: an external link to a OneDrive folder, for example, indicates that O365 or OneDrive is in use. There is also always the chance of discovering additional domains or subdomains that can be added to your list of targets for investigation. I have tried to clear most of the noise from these results so that what is left should be useful for further investigation. In a number of cases I found external links to user folders using “/” instead of “\”, and for this reason username pattern matching was extended to look for users in this unusual style. In the demonstration document, a link has been added to a network server that seems to include a Windows-style folder structure, exposing the username rodmig358.
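One way to collect these links is to read the relationship (.rels) files that sit alongside the external link and image parts and keep only the targets marked as external. This is a sketch under that assumption, not the tool's implementation, and `external_targets` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def external_targets(rels_xml):
    """Return Target values of relationships marked TargetMode="External"."""
    root = ET.fromstring(rels_xml)
    targets = []
    for element in root.iter():
        # Only <Relationship> elements are of interest, whatever their namespace.
        if element.tag.rsplit("}", 1)[-1] != "Relationship":
            continue
        target = element.get("Target")
        if element.get("TargetMode") == "External" and target:
            targets.append(target)
    return targets
```

Internal targets (other parts inside the same document) are skipped, which removes a lot of noise before any username or hostname matching is applied.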
Hidden sheets are designated by the attribute state="hidden" on a sheet element, and MetadataPlus lists these hidden sheets by file so that they can be unhidden and investigated manually, as they may contain something useful that was expected to remain private. Unhiding is easily performed in Excel, for example by right-clicking on a visible sheet and choosing Unhide…, but this would be tedious to check across a vast trove of documents, so the list points you towards the documents where it may be worthwhile.
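Finding hidden sheets in workbook.xml could be sketched as follows. The check for `veryHidden` (a state that can only be set programmatically and is not offered in the Unhide dialog) is an addition of this example rather than something the post says MetadataPlus does, and `hidden_sheets` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def hidden_sheets(workbook_xml):
    """Return the names of sheets whose state is hidden or veryHidden."""
    root = ET.fromstring(workbook_xml)
    hidden = []
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] != "sheet":
            continue
        # Visible sheets normally have no state attribute at all.
        if element.get("state") in ("hidden", "veryHidden"):
            hidden.append(element.get("name"))
    return hidden
```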
The <dc:creator> tag contains the creator of the document, which can be a name or a username. In some instances it has been seen to include both, in the form “Chris Nevin – cnevin”, and MetadataPlus will try to separate this into a name and a username when it is seen.
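Separating a combined creator value could be approached as in this sketch. It assumes the two halves are joined by a hyphen or en dash with whitespace on both sides and that the username half contains no spaces; both rules, and the name `split_creator`, are illustrative assumptions.

```python
import re


def split_creator(creator):
    """Split "Name - username" into (name, username); otherwise (value, None)."""
    value = creator.strip()
    # Accept a plain hyphen or an en dash (U+2013) surrounded by whitespace.
    parts = re.split(r"\s+[-\u2013]\s+", value, maxsplit=1)
    if len(parts) == 2 and parts[0] and parts[1] and " " not in parts[1]:
        return parts[0], parts[1]
    # A single value could be either a name or a username, so leave it whole.
    return value, None
```

A hyphenated surname such as "Smith-Jones" is left intact because the separator must have whitespace around it.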
As shown above, MetadataPlus looks for users and hostnames in specific locations. It also uses pattern matching to extract usernames from filepaths such as c:\users\bill, c:\documents and settings\bill or /users/bill. The same pattern matching is used to perform some basic grepping on each individual file, searching for usernames, names, and hostnames that may appear in places we did not expect. MetadataPlus also searches for the string “password” and flags the documents and strings that contain it. Finally, there is an option to search for a user-supplied string, which may be relevant to your specific target or could be used to look for something like API keys.
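The filepath matching described above can be approximated with a single regular expression that accepts either slash direction. The pattern below is a simplified sketch, not the one used by MetadataPlus: it stops at whitespace, so profile folders containing spaces would be cut short. The function names are hypothetical.

```python
import re

# Matches the folder name that follows "users" or "documents and settings",
# with either "\" or "/" as the separator. Simplification: names stop at
# whitespace or any character that is invalid in a Windows file name.
USER_PATH_RE = re.compile(
    r'(?:users|documents and settings)[\\/]+([^\\/:*?"<>|\s]+)',
    re.IGNORECASE,
)


def usernames_from_paths(text):
    """Return unique usernames found in user-folder style paths, in order."""
    found = []
    for match in USER_PATH_RE.finditer(text):
        name = match.group(1)
        if name not in found:
            found.append(name)
    return found


def lines_mentioning(text, keyword="password"):
    """Return stripped lines containing the keyword, ignoring case."""
    needle = keyword.lower()
    return [line.strip() for line in text.splitlines() if needle in line.lower()]
```

Running both functions over the raw text of every extracted file gives the kind of basic grepping described here, and passing a different `keyword` covers the user-supplied string case.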
Extracting embedded documents and media
Office documents can include media as well as other embedded documents, both of which can be additional sources of metadata. MetadataPlus has -m and -e options which will extract these files for further analysis. While some image files may come from external sources, or be stripped of metadata, some may be included locally without processing and could include additional data when examined with a tool such as exiftool. Embedded documents may contain additional information, and can be analysed with MetadataPlus by moving the exe into the Embed folder and running it again. In the test case elaborated on below, an additional 5 names were extracted from embedded documents within the original documents.
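For readers who want to reproduce this step by hand, the sketch below copies media and embedded files out of a document. It assumes these live in folders named `media` and `embeddings` inside the zip container, which is the usual layout but is stated here as an assumption; the output folder names and `extract_media_and_embeds` are hypothetical and do not mirror the tool's own folder structure.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_media_and_embeds(path_or_file, out_dir):
    """Copy files from any media/ or embeddings/ folder into out_dir."""
    out_dir = Path(out_dir)
    written = []
    with zipfile.ZipFile(path_or_file) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            if info.is_dir() or len(parts) < 2 or parts[-1] == "..":
                continue
            folder = parts[-2].lower()
            if folder not in ("media", "embeddings"):
                continue
            # Only the final path component is used, so a crafted entry
            # name cannot write outside out_dir.
            target = out_dir / folder / parts[-1]
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written.append(target)
    return written
```

The extracted images can then be examined with exiftool, and the embedded documents run through the same metadata checks as the originals.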
Results Output and Analysis
Where possible MetadataPlus is designed to display data in a way that is useful and may aid in further discoveries. The tool prints a number of outputs at the end, including output from documents that contain both names and usernames as well as the tags these were pulled from – although for ease of use in other tools, MetadataPlus also prints a raw list of usernames, names, and email addresses. In the following example, a name is often pulled from the same document as a username:
At first the usernames appear to be random letters and numbers. However, as the name is often taken from whoever last modified the file, it stands to reason that the username taken from the filepath might belong to that name. Linking names and usernames in the output makes the pattern apparent: first three letters of the surname + first three letters of the first name + three numbers. The three numbers would take further investigation (they could be a building or department tag, for example, or they may actually be random). This could be useful in social engineering as well, as something like calling for a password reset without knowing the name that goes with a username, or vice versa, would be likely to end in failure.
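Once a pattern like this is suspected, it can be checked against the remaining name and username pairs with a few lines of code. The sketch below tests the specific format described above; treating the first and last words of the name as first name and surname is an assumption, and `matches_observed_pattern` is a hypothetical helper.

```python
import re


def matches_observed_pattern(full_name, username):
    """Check for surname[:3] + first_name[:3] + three digits, ignoring case."""
    words = full_name.lower().split()
    if len(words) < 2:
        return False
    # Assumption: the first word is the first name and the last is the surname.
    first_name, surname = words[0], words[-1]
    prefix = surname[:3] + first_name[:3]
    pattern = re.escape(prefix) + r"\d{3}"
    return re.fullmatch(pattern, username.lower()) is not None
```

For a hypothetical user named "Miguel Rodriguez", the username rodmig358 would satisfy the check while migrod358 would not.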
As a final example, the following shows the list of unique usernames MetadataPlus was able to pull from 9 documents:
In contrast, FOCA only returned full names:
It should be noted that these documents were created specifically to highlight the difference, as FOCA does not make it easy to distinguish between names and usernames in its output, which made it difficult to show a comparison between the tools. In the roughly 120 documents analysed from a public document test case for one organisation, MetadataPlus found an additional 18 usernames on top of the 13 discovered by FOCA.