Importing Data Securely: A Balancing Act

August 8, 2024 | 6 min read

Importing data into your network is invariably necessary but, without proper defenses against malicious attacks, risky to both your network and your company. In this blog post, I will discuss how a company might manage this risk. There are many ways to do this, and which is best depends on the level of threat and the impact of potential damage. I will therefore look at a range of measures in turn, each giving stronger protection than the last but also adding complexity. It is up to each company to determine which is best for it.

Let’s take the example of a company that needs to allow its customers to upload PDF documents to its protected network through the Internet. This creates a risk to the company network infrastructure, as it could allow malicious files to infiltrate the network and exploit its vulnerabilities, so the company must take steps to protect itself.

Ensure the Correct File Extension.

The simplest thing the company can do is to check that the uploaded files have a .pdf file extension. This will prevent the accidental upload of other data by an innocent user but won’t stop an attacker from uploading an unsafe file that isn’t a PDF, as they only need to rename the file before uploading it. However, even if the file is executable malware, under normal circumstances it won’t execute. The system will use the file extension to determine how to open the file and will give it to a PDF application. If it’s not a valid PDF, the application will try to open the file and then reject it. If the threat is low and the impact of any damage is low, this defense might be all that is required.
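As a minimal sketch of this first check, assuming the upload handler receives the client-supplied file name as a string (the function name has_pdf_extension is hypothetical), the test could look something like this:

```python
from pathlib import PurePosixPath


def has_pdf_extension(filename: str) -> bool:
    """Return True if the final extension of the name is .pdf (case-insensitive)."""
    return PurePosixPath(filename).suffix.lower() == ".pdf"


print(has_pdf_extension("invoice.PDF"))          # True
print(has_pdf_extension("invoice.pdf.exe"))      # False: only the last suffix counts
print(has_pdf_extension("malware-renamed.pdf"))  # True: renaming defeats the check
```

The last call illustrates the limitation described above: the check only looks at the name, so it says nothing about what the file actually contains.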

However, a file extension isn’t the only way the system can decide how to handle a file. A file could be referenced by some other data which determines how the file is used. If an attacker is smart, they might create an attack that uses two files – the first a PDF that contains a link to the second, and the second a script with a PDF file extension. The link determines that the second file is treated as a script, and the PDF file extension doesn’t stop that. The system allows both files to be uploaded, and when the first is opened the link is followed and the second file executes. This is not a straightforward attack to pull off, because the attacker must figure out how to make the link refer to the second file once it is uploaded, but there’s a risk they might find a way.

Confirm that the Data is Legitimate.

If the simple file extension defense leaves the risk too high, the company needs to implement something that offers stronger protection. The next step up is to check that the data is a legitimate PDF, which can be done by opening the file with a PDF reader application. If the reader opens the file successfully, it must be a valid PDF. An attacker can now only upload PDFs, not scripts impersonating them.
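Fully validating a PDF means parsing it with a real reader, which is beyond a short example. As a rough first-pass filter only, and not a substitute for that validation, a sketch could at least confirm the file carries the PDF header and end-of-file marker. The function name looks_like_pdf and the 1024-byte search window are assumptions made for illustration:

```python
def looks_like_pdf(data: bytes) -> bool:
    """Cheap structural pre-check: PDF header at the start, %%EOF near the end.

    This does NOT prove the file is a valid PDF; it only rejects content that
    is obviously something else before a real parser is asked to open it.
    """
    has_header = data.startswith(b"%PDF-")
    has_eof_marker = b"%%EOF" in data[-1024:]
    return has_header and has_eof_marker


renamed_script = b"#!/bin/sh\necho pwned\n"
minimal_shape = b"%PDF-1.7\n1 0 obj\n<< >>\nendobj\ntrailer\n<< >>\n%%EOF\n"

print(looks_like_pdf(renamed_script))  # False
print(looks_like_pdf(minimal_shape))   # True
```

A script that has merely been renamed fails this test, but an attacker can trivially add the two markers, which is why opening the file with a genuine PDF parser is the stronger check.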

However, PDFs can contain scripts, which can run when the file is opened or when the user takes some action, such as moving their mouse. In theory, PDF applications restrict what these scripts can do and prevent them from performing unsafe actions, such as writing to a user’s file store. Yet in practice, the complexity of the scripting engines embedded in PDF readers makes it hard to ensure that a script won’t do something damaging. There is a risk the attacker will know of such a flaw and be able to exploit it.

Inspect the File for Scripts.

If the company is not willing to take that risk, the defense can be enhanced to also inspect the PDF to make sure it contains no scripts. This will stop an attacker from exploiting any flaws in the scripting engine. However, it will also block any scripts used to automate forms in the PDF, which may stop the company’s application working. In that case, the checks will have to be refined to allow scripts the company uses and block others.

Checking for scripts sounds trivial, but a PDF is a complex structure, and it’s not obvious where scripts hide. There’s a risk the checker will fail to find certain scripts, and an attacker might discover this flaw and exploit it to deliver unsafe content into the network. There have been examples of small scripts being part of a link – this is sometimes called “fileless malware”. To reduce the risk further, the checker needs to look for other parts of the PDF that should not be present or are unusual, especially links, to make sure the file can be safely opened.
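To illustrate why this is harder than it sounds, here is a deliberately naive sketch that scans the raw bytes for PDF names commonly associated with scripts and automatic actions. The list of names and the function names are assumptions chosen for illustration. PDF allows characters in a name to be written as #xx hex escapes, so a plain substring search misses "/J#53" even though a reader treats it as "/JS":

```python
import re

# Names often associated with scripts or automatic actions (illustrative list).
SUSPICIOUS_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch")


def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx hex escapes, which PDF permits inside names."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )


def find_suspicious_names(data: bytes) -> list[str]:
    """Return the suspicious names found after normalizing name escapes."""
    normalized = decode_name_escapes(data)
    return [name.decode() for name in SUSPICIOUS_NAMES if name in normalized]


sample = b"<< /Type /Catalog /OpenAction << /S /J#61vaScript /J#53 (app.alert(1)) >> >>"

print(b"/JavaScript" in sample)       # False: the naive search is fooled
print(find_suspicious_names(sample))  # ['/JavaScript', '/JS', '/OpenAction']
```

Even the normalized scan is far from complete. It is a substring match, so it can raise false alarms, and it cannot see objects stored inside compressed streams, which is one reason a real checker has to parse the whole structure.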

The defense has now become quite complex, making it harder to implement, and harder to prove that it is effective. However, if the risk is high, this is necessary.

Implement Content Disarm and Reconstruction.

With a complex defense, another kind of risk is introduced: the risk that the defense itself will be attacked. The checker has to examine the uploaded data in some detail before the application receives it, and there could be a flaw in the way it handles unusual or invalid PDF data. An attacker might find a way to exploit such a flaw to gain control of the checker so they can access parts of the network without constraint.

To address these concerns, the company can look at another approach – Everfox Content Disarm and Reconstruction (CDR). CDR works by extracting information that describes the incoming PDF file and then immediately discarding the original file that could potentially contain malware. The description is then used to build a completely new PDF file which can be safely brought into the protected network. This provides a better solution than the PDF checker, as no data is left unchecked, and even zero-day attacks are removed. This is the solution that the Everfox Gateway Extension (GX) appliance provides – it transforms all data passing through a web gateway to make sure it is safe to download.
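The extract, discard and rebuild idea can be sketched with a toy example. This is not how Everfox CDR is implemented and it does not handle real PDFs. It assumes a made-up line-based document format (TITLE:, PARA: and SCRIPT: lines), and every name in it is hypothetical. The point is only that the output is built from the description, never copied from the input:

```python
import string
from dataclasses import dataclass, field

# Only these characters may appear in the description (an illustrative whitelist).
ALLOWED_CHARS = set(string.ascii_letters + string.digits + string.punctuation + " ")


@dataclass
class Description:
    """The only information that survives from the original document."""
    title: str = ""
    paragraphs: list[str] = field(default_factory=list)


def clean(text: str) -> str:
    """Keep whitelisted characters only."""
    return "".join(ch for ch in text if ch in ALLOWED_CHARS)


def extract_description(untrusted: bytes) -> Description:
    """Describe the parts we understand; anything else is never recorded."""
    desc = Description()
    for line in untrusted.decode("utf-8", errors="replace").splitlines():
        if line.startswith("TITLE:"):
            desc.title = clean(line[len("TITLE:"):].strip())
        elif line.startswith("PARA:"):
            desc.paragraphs.append(clean(line[len("PARA:"):].strip()))
    return desc


def build_document(desc: Description) -> bytes:
    """Build a brand-new document from the description alone."""
    lines = [f"TITLE: {desc.title}"] + [f"PARA: {p}" for p in desc.paragraphs]
    return ("\n".join(lines) + "\n").encode("ascii")


def disarm_and_reconstruct(untrusted: bytes) -> bytes:
    # The original bytes go no further than the extraction step.
    return build_document(extract_description(untrusted))


incoming = b"TITLE: Invoice 42\nSCRIPT: run_payload()\nPARA: Total due: 100 GBP\n\x00\xff junk\n"
print(disarm_and_reconstruct(incoming))
# b'TITLE: Invoice 42\nPARA: Total due: 100 GBP\n'
```

Notice that the code never looks for the SCRIPT: line in order to block it. The script disappears because the description has no way to express it, which is what distinguishes this approach from a checker that hunts for known bad content.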

Add a Data Break.

If the company determines that its network requires even higher security, Everfox CDR can still be used but implemented differently. By splitting the work into two tasks, extracting the description and building the new PDF file, and running them on two independent (virtual) machines, the company can implement a data break.

Here, the first machine must handle data that is potentially malicious, so it is still at risk of attack. However, this machine is not connected to the protected destination network, so to succeed, the attacker must also attack the second machine. The process of building a new PDF is much simpler, so the second machine is much harder to attack. As a result, the attack surface has been significantly reduced. This is the solution the Everfox Information eXchange (iX) appliance provides – it allows email and web service traffic to pass between two networks, transforming all data to make sure it is safe.
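The split can be sketched as two functions that exchange nothing but a serialized description. This is an illustration under stated assumptions, not the iX design: the two "machines" are simulated as functions in one process, the toy input format (PARA: and SCRIPT: lines), the JSON wire format, the size limits and all names are hypothetical. The builder assumes the extractor may have been compromised, so it accepts only a description of exactly the expected shape:

```python
import json

MAX_PARAGRAPHS = 100   # illustrative limits, not taken from any product
MAX_TEXT_LEN = 1000


def extractor_machine(untrusted: bytes) -> bytes:
    """Exposed side: parse hostile input and emit only a description."""
    paragraphs = [
        line[len("PARA:"):].strip()
        for line in untrusted.decode("utf-8", errors="replace").splitlines()
        if line.startswith("PARA:")
    ]
    return json.dumps({"paragraphs": paragraphs}).encode("utf-8")


def builder_machine(wire: bytes) -> bytes:
    """Protected side: validate the description strictly, then build anew."""
    desc = json.loads(wire)
    if not isinstance(desc, dict) or set(desc) != {"paragraphs"}:
        raise ValueError("unexpected description shape")
    paragraphs = desc["paragraphs"]
    if not isinstance(paragraphs, list) or len(paragraphs) > MAX_PARAGRAPHS:
        raise ValueError("bad paragraph list")
    for p in paragraphs:
        if not isinstance(p, str) or len(p) > MAX_TEXT_LEN or not p.isprintable():
            raise ValueError("bad paragraph")
    return "".join(f"PARA: {p}\n" for p in paragraphs).encode("utf-8")


wire = extractor_machine(b"PARA: hello\nSCRIPT: run_payload()\n")
print(builder_machine(wire))  # b'PARA: hello\n'

# A compromised extractor trying to smuggle an extra field is rejected.
try:
    builder_machine(b'{"paragraphs": ["ok"], "script": "run_payload()"}')
except ValueError as err:
    print("rejected:", err)  # rejected: unexpected description shape
```

The builder is short and handles only a small, rigid description, which reflects the point made above: the simpler the second stage, the less there is for an attacker to exploit.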

In summary, I have outlined a range of approaches to the problem, with varying levels of protection and complexity. Which of these is right for a company depends on the threat it faces and the risk it is prepared to take.

There is no single solution that is perfect for everyone.

Get in touch to find out more.

Maisie Eddleston

Software Engineer

Maisie Eddleston is a Software Engineer at Everfox based in the UK. Maisie holds a degree in Computer Science from Newcastle University, focusing on cryptography and cyber security. She previously completed an internship with Everfox and, since graduating, has joined the Everfox engineering team to focus on innovation development in hardware security and applications.