Malware hunting has largely been tied to a specific vendor. Symantec, Lookout and the like each develop their own engines for detecting malware samples. They do this either by looking up the file’s hash in their database or by detonating the file in a secure VM and observing the results.

This approach forces enterprises to select the “best” AV vendor on the market, which in turn locks the company into that product for the duration of its subscription. A service like VirusTotal attempts to solve this problem by aggregating results from different vendors.

PolySwarm is a decentralized threat detection platform that brings together security experts, consumers and vendors, among others. It’s the first decentralized threat intelligence market, made possible by Ethereum smart contracts and blockchain technology. Experts develop “micro-engines” that compete in the market by investigating artifacts for malicious behavior.

The market consists of:

  1. End Users
  2. Ambassadors
  3. Security Experts / AV vendors
  4. Arbiters

End users participate by submitting artifact samples to the market. They attach bounties as an incentive for market participants.

Ambassadors introduce bounties and offers into the market on their clients’ behalf. It is the ambassador’s responsibility to distill the assertions of various experts into a simple malicious or benign verdict that they deliver to their clients.

Security experts take these samples and run them through connected micro-engines for analysis. Upon completion, they assert their result to the market. If the sample is correctly classified, they receive a portion of the bounty pot, paid in NCT (Nectar), based on how many engines were correct.

Arbiters offer ground truth regarding the malintent of the artifact.

PolySwarm market participants & flow

Micro Engine

Micro-engines are responsible for scanning samples and providing a verdict. There are multiple engines in the marketplace, specializing in different areas of infosec. Currently, PolySwarm accepts two types of samples: URLs and files. A micro-engine can either participate or pass on a given sample. For example, if an engine is not confident about its assertion, it can skip the bounty and wait for another one to arrive. This dynamic allows engines to bet on samples based on their confidence level and technology.

In this article we will build a very simple micro-engine that detects Metasploit-based malware. Metasploit is a framework that pen-testers use to probe applications for security weaknesses. It also allows attackers to remotely “control” an app and pull out sensitive data.

PolySwarm is constantly evolving, so the steps discussed below might change. Refer to https://docs.polyswarm.io for the latest.

Start by installing the following prerequisites.

  1. Docker
  2. Python, pip & Virtual Environment (virtualenv)
  3. Cookiecutter https://github.com/cookiecutter/cookiecutter

PolySwarm has created a template from which developers can start building their engines. Using cookiecutter, we can set up a basic working micro-engine capable of receiving samples from the network and making a verdict.

Run the command below and answer the questions as they come. They are mostly self-explanatory.

cookiecutter https://github.com/polyswarm/participant-template

participant_type - Can be microengine or arbiter. We are building a scanning engine, so choose 1 (microengine).

microengine_arbiter_supports_scanning_files - Choose true, since our engine makes verdicts on files.

microengine_arbiter_supports_scanning_urls - Choose false. We will not operate on URLs. URL scanning is a good use case for engines or AV vendors that analyze phishing URLs.

microengine_arbiter__scan_mode - There are two options; Sync & Async. Choose sync for now.

Create microengine

A new directory named after your engine will be created, containing the files generated by cookiecutter. The most important files reside inside that directory. Look for a file named <participant_name_slug>.py. It contains three classes: BidStrategy, Scanner and <participant_name_slug>. BidStrategy is responsible for staking NCT tokens; this is our bidding price on the market. Scanner sets up whatever our detection needs in order to work. It receives artifacts from the PolySwarm network and makes them available for the <participant_name_slug> class to handle. Strictly speaking, PolySwarm doesn’t send artifacts to engines directly. Instead, each bounty carries a “signed URL” from which the engine downloads the artifact. Finally, our detection logic returns a ScanResult object that represents our decision. Based on our analysis, we can either skip or stake our assertion with a certain confidence level.

In this example, our engine’s detection relies on YARA rules that match apps injected with the Metasploit framework. We have two rules. The first looks for Payload classes, while the second looks for method names used by Metasploit.
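
To make the skip-or-assert decision concrete, here is a minimal sketch. The ScanResult dataclass below is my own stand-in with assumed field names, not the class shipped with the template, and the confidence heuristic is purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    """Stand-in for the template's result object (field names are assumptions)."""
    bit: bool = False        # True when the engine chooses to assert
    verdict: bool = False    # True = malicious, False = benign
    confidence: float = 0.0  # between 0.0 and 1.0
    metadata: dict = field(default_factory=dict)


def decide(matched_rules: list[str], min_confidence: float = 0.5) -> ScanResult:
    """Turn rule matches into an assertion, or skip when unsure."""
    if not matched_rules:
        # Nothing matched: this engine only knows Metasploit, so it passes.
        return ScanResult(bit=False)
    # Illustrative heuristic: one rule gives 0.6, two or more give 1.0.
    confidence = min(1.0, 0.6 + 0.4 * (len(matched_rules) - 1))
    if confidence < min_confidence:
        return ScanResult(bit=False)
    return ScanResult(
        bit=True,
        verdict=True,
        confidence=confidence,
        metadata={"malware_family": "metasploit", "rules": matched_rules},
    )
```

Skipping on "no match" rather than asserting benign is a deliberate choice here: an engine that only knows one malware family has no basis for calling anything else clean.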

rule metasploit 
{
    meta:
        description = "This rule detects apps made with metasploit framework"
        sample = "cb9a217032620c63b85a58dde0f9493f69e4bda1e12b180047407c15ee491b41"

    strings:
        $a = "*Lcom/metasploit/stage/PayloadTrustManager;"
        $b = "(com.metasploit.stage.PayloadTrustManager"
        $c = "Lcom/metasploit/stage/Payload$1;"
        $d = "Lcom/metasploit/stage/Payload;"

    condition:
        all of them

}

rule metasploit_obsfuscated
{
    meta:
        description = "This rule tries to detect apps made with metasploit framework but with the paths changed"

    strings:
        $a = "currentDir"
        $b = "path"
        $c = "timeouts"
        $d = "sessionExpiry"
        $e = "commTimeout"
        $f = "retryTotal"
        $g = "retryWait"
        $h = "payloadStart"
        $i = "readAndRunStage"
        $j = "runStageFromHTTP"
        $k = "useFor"


    condition:
        all of them

}

The details of the implementation can be found here, https://github.com/atlantis0/polyswarm-metasploit-microengine/blob/master/addis_ababa/addis.py#L169

The code checks if the incoming artifact is indeed an APK and proceeds to run the yara command to check if any of the classes.dex files match the two rules we declared above.
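
The same flow can be sketched with nothing but the standard library. This is not the code from the repository linked above: it swaps YARA for plain substring search (which behaves the same for simple text strings combined with "all of them"), and the function names are hypothetical.

```python
import io
import zipfile

# Strings copied from the first YARA rule above
PAYLOAD_STRINGS = [
    b"*Lcom/metasploit/stage/PayloadTrustManager;",
    b"(com.metasploit.stage.PayloadTrustManager",
    b"Lcom/metasploit/stage/Payload$1;",
    b"Lcom/metasploit/stage/Payload;",
]


def dex_files(content: bytes) -> dict[str, bytes]:
    """Return every classes*.dex entry, or {} if content is not an APK."""
    if not zipfile.is_zipfile(io.BytesIO(content)):
        return {}
    with zipfile.ZipFile(io.BytesIO(content)) as apk:
        names = apk.namelist()
        # An APK is a zip archive that carries an AndroidManifest.xml
        if "AndroidManifest.xml" not in names:
            return {}
        return {
            name: apk.read(name)
            for name in names
            if name.startswith("classes") and name.endswith(".dex")
        }


def matches_all(data: bytes, needles: list[bytes]) -> bool:
    """Emulate the YARA condition `all of them` with substring search."""
    return all(needle in data for needle in needles)


def looks_like_metasploit(content: bytes) -> bool:
    """True if any dex file in the APK contains every payload string."""
    return any(
        matches_all(dex, PAYLOAD_STRINGS) for dex in dex_files(content).values()
    )
```

A real engine would hand the dex bytes to YARA instead, which also handles wildcards, regular expressions and other rule features that this stand-in ignores.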

How artifacts are routed

In the current architecture, a websocket is opened between the engine and the sidechain bounty manager, “polyswarmd”. Every bounty is sent to every engine over the websocket. The bounty information is a JSON document. One of its fields is a signed URL from which the engine can request the artifact.
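
As a rough illustration, pulling the download URL out of such a message could look like the sketch below. The message and every field name in it (guid, artifact_type, artifact_url, min_bid, max_bid) are made up for this example; check the PolySwarm docs for the real schema.

```python
import json
from urllib.parse import urlparse

# Hypothetical bounty message; the real field names may differ
SAMPLE_MESSAGE = """
{
  "guid": "00000000-0000-0000-0000-000000000000",
  "artifact_type": "FILE",
  "artifact_url": "https://example.com/artifact?signature=abc123",
  "min_bid": "1",
  "max_bid": "10"
}
"""


def artifact_url_from(message: str) -> str:
    """Extract and sanity-check the signed download URL from a bounty."""
    bounty = json.loads(message)
    url = bounty["artifact_url"]  # assumed field name
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"unexpected artifact URL: {url!r}")
    return url


print(artifact_url_from(SAMPLE_MESSAGE))
```

The engine would then issue an ordinary HTTP GET against that URL to fetch the artifact bytes.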

Bid Strategy

Bidding is an important concept in PolySwarm’s decentralized market. When artifacts are submitted, they are assigned a minimum and a maximum bid amount. Micro-engines are expected to select a value within that range based on different factors. For example, an engine might bid higher (closer to the max bid) if it has high confidence in its assertion, or bid lower if the analysis is CPU-intensive enough that infrastructure costs are hard to justify at the given price range.
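
One simple way to express this is to scale the bid linearly with confidence. The sketch below is my own illustration, not the template’s BidStrategy; the function name and the linear rule are assumptions.

```python
def choose_bid(min_bid: int, max_bid: int, confidence: float) -> int:
    """Scale the bid linearly between min_bid and max_bid by confidence."""
    if min_bid > max_bid:
        raise ValueError("min_bid must not exceed max_bid")
    confidence = max(0.0, min(1.0, confidence))  # clamp to [0, 1]
    # int() truncates, so fractional amounts round toward the lower bid
    return min_bid + int((max_bid - min_bid) * confidence)
```

With a range of 2 to 10 NCT, a confidence of 0.5 yields a bid of 6, while full confidence yields the maximum of 10. A cost-aware strategy could subtract an estimate of the analysis cost before deciding whether to bid at all.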

Ground Truth & Bounty Grants

It is assumed that arbiters have enough threat intel, or are competent enough, to have the final say on whether a sample is malicious. Naturally, they can also run their own analysis. PolySwarm is tight-lipped about how it onboards these players; you and I can’t be arbiters. Engines provide proof of their work by attaching metadata, such as the malware family, to an assertion. It’s a simple JSON file containing your “proof” of why you think the sample is malicious or clean. Furthermore, engine participants can challenge arbiters by submitting a blog post or technical paper. It remains to be seen how this can be automated when participants are processing a huge number of artifacts per day.

Bounty grant flow

In the above example, Engine A predicts correctly while B and C make the wrong assertion. Engine D abstains and makes no decision. Hence, B and C lose 3 and 2 NCT respectively. Their losses are disbursed to Engine A. If multiple engines predict correctly, the NCT awarded to each is proportional to that expert’s bid amount relative to the total pool of accurate bids.

PolySwarm is also introducing two additional states for asserting a sample. Hence, we will have four: unknown, suspicious, benign and malicious.
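
The arithmetic can be sketched in a few lines. This is a simplified model that redistributes only the losing bids; it ignores the bounty amount itself and any fees, and Engine A’s bid of 4 NCT is an assumption, since the scenario doesn’t state it.

```python
def settle(bids: dict[str, int], correct: set[str]) -> dict[str, float]:
    """Return each engine's net NCT change after arbitration.

    bids: engine name -> NCT staked (engines that abstained are absent).
    correct: engines whose assertion matched the ground truth.
    """
    lost = sum(amount for name, amount in bids.items() if name not in correct)
    winning_total = sum(amount for name, amount in bids.items() if name in correct)
    result = {}
    for name, amount in bids.items():
        if name in correct:
            # Share of the losers' pool, proportional to this engine's bid
            result[name] = lost * amount / winning_total
        else:
            result[name] = -float(amount)
    return result


# The scenario described above: A is right, B and C are wrong, D abstained
print(settle({"A": 4, "B": 3, "C": 2}, correct={"A"}))
# prints {'A': 5.0, 'B': -3.0, 'C': -2.0}
```

If a second engine had also been correct with a bid of 1 NCT, A would receive 4/5 of the 5 NCT pool and the other engine 1/5.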

Testing & Integration

Before publishing our engine, we have to test our implementation. Unit tests validate that our engine catches malware and skips benign samples. When we created our project from the participant template, it generated basic tests inside the test folder. Running them is as simple as building the Docker image and running the container with docker-compose:

$ docker build -t "$(basename "${PWD}")" -f docker/Dockerfile .
$ docker-compose -f docker/test-unit.yml up

Production

Finally, we need to publish our engine to a production environment so we can start accepting artifacts. It’s up to developers to maintain, test and deploy their work. I typically work with GCP and spin up a dedicated compute instance. Before publishing, we will need a few more things:

  1. ETH account
  2. NCT token
  3. Production Keyfile

To participate in the marketplace, we need PolySwarm’s NCT token (the currency we bid with). Start by purchasing ETH with your favorite wallet app. It’s critical that you have access to your key pair (public and private keys) so we can package it with our engine. Purchasing ETH and NCT is outside the scope of this article, but you can follow a detailed explanation here: https://blog.polyswarm.io/how-to-buy-polyswarm-nectar-using-uniswap

Once you have your account in place, it’s time to generate an encrypted keyfile. We will use the geth command to import an existing private key into our local machine. The file (private.bin) is an unencrypted private key as canonical EC raw bytes encoded into hex. I was able to obtain this from my favorite crypto wallet app, metamask.io. You will be prompted to enter a passphrase. Remember this passphrase and never disclose it to anyone; only you and your engine should know it. PolySwarm is working on a feature to offload the transaction signing process to another device, thereby providing a more secure way of handling keys.

geth account import private.bin 

For now, running the above command will generate a keyfile (a JSON file). You can get the file path by running the command geth account list. Copy this file into your engine repository and update your docker/marketplace.env file. This file needs three variables: keyfile, passphrase and API key. The first two you already obtained using the above command. You can obtain an API key by signing up for PolySwarm here: https://polyswarm.network/login
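
Before copying the keyfile around, it can be worth checking that you grabbed the encrypted JSON and not the raw private key by mistake. The sketch below assumes the usual geth keystore layout (top-level address, crypto and version fields); the function name is hypothetical.

```python
import json

# Top-level fields expected in a geth keystore file (assumed layout)
REQUIRED_KEYS = {"address", "crypto", "version"}


def check_keyfile(text: str) -> str:
    """Return the account address if text looks like an encrypted keyfile."""
    parsed = json.loads(text)  # a raw hex private key fails to parse here
    if not isinstance(parsed, dict):
        raise ValueError("keyfile must be a JSON object")
    # Some tools capitalize "Crypto"; normalize key names before checking
    data = {key.lower(): value for key, value in parsed.items()}
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"keyfile is missing fields: {sorted(missing)}")
    return data["address"]
```

Calling check_keyfile on the contents of the generated file should return the account address, which you can compare against the address shown in your wallet.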

Polyswarm white paper
Engine source code