Data Discovery
Data Discovery (a term also used in Business Intelligence, or BI) is the process of identifying the data needed to build Machine Learning models.
Processes
Generalized Data Pipeline processes can be expressed as follows:
Key Factors
Key factors for successful data discovery include:
Data Sources
Data sources often considered include:
existing internal databases
client and internal data not yet collected
commercially available data, for example from these companies (listed with their data focus):
Acxiom: consumer marketing
CoreLogic: housing
Datalogix: consumer goods
DataSift: social media
Equifax: credit
Experian: credit
Facebook: analytics
Google: analytics
ID Analytics: risk and fraud
Intelius: people and identity
IRI: consumer behavior
Nielsen: media
PeekYou: internet activity
Recorded Future: security
TowerData: email
TransUnion: credit risk and fraud detection
Twitter: social media
Objectives
Key objectives can include:
models: business goals, model accuracy
prediction objectives: prediction accuracy goals over time, confidence-level goals for individual predictions
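
One way to make the prediction objectives concrete is to record them as explicit thresholds and check results against them. The following is a minimal sketch, not a prescribed method: the class PredictionObjectives, the helper functions, the threshold values, and the sample data are all hypothetical and chosen only for illustration. It assumes a classification setting where accuracy is the fraction of correct predictions and each prediction carries a confidence score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class PredictionObjectives:
    """Hypothetical container for the two prediction objectives."""
    min_accuracy: float    # accuracy goal applied to each evaluation period
    min_confidence: float  # confidence goal applied to each single prediction


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def periods_missing_accuracy_goal(results_by_period, objectives):
    """Return the periods whose accuracy falls below the accuracy goal."""
    return [
        period
        for period, (y_true, y_pred) in results_by_period.items()
        if accuracy(y_true, y_pred) < objectives.min_accuracy
    ]


def low_confidence_indices(confidences, objectives):
    """Return positions of predictions whose confidence is below the goal."""
    return [i for i, c in enumerate(confidences) if c < objectives.min_confidence]


# Illustrative values only; real goals come from the business objectives.
objectives = PredictionObjectives(min_accuracy=0.75, min_confidence=0.6)

# Made-up (true labels, predicted labels) pairs, keyed by evaluation period.
results_by_period = {
    "2024-01": ([1, 0, 1, 1], [1, 0, 1, 0]),  # 3 of 4 correct
    "2024-02": ([1, 1, 0, 0], [0, 1, 1, 0]),  # 2 of 4 correct
}

missed = periods_missing_accuracy_goal(results_by_period, objectives)
flagged = low_confidence_indices([0.9, 0.55, 0.7], objectives)
```

Here missed lists the periods that fall short of the accuracy goal over time, and flagged lists the individual predictions that fall short of the confidence goal. Keeping both thresholds in one object makes the objectives easy to review alongside the business goals.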