Malware analysis has always been a game of who knows what. A typical vendor will analyze a given sample and try to predict whether its harmful or not. Vendors will try to accumulate threat data from various sources to strength their ability to make high accurate predictions. These include, collecting malware samples, deploying honeypot to lure attackers in, etc. This will help them build the following
- Signature/hash database - Collection of known samples. Some companies go even further by using tools like ssdeep (https://ssdeep-project.github.io/ssdeep/index.html). This will help detect malware derivatives of known samples.
- Machine learning models - sophisticated models to predict an unknown sample.