Structured data is typically stored in tables with defined field names. In big data collection, normalizing the data structure is key to producing data that can then be extracted for multiple applications.
Merely holding massive amounts of data in different formats does not make a platform easy to scale. Knowing which data fields are important provides the base table structures for normalization.
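As a rough illustration of what normalizing differently formatted records into one table structure might look like, the Python sketch below maps two kinds of records onto a shared set of fields. The field names, input layouts, and function names are all hypothetical and chosen only for illustration; a real platform would pick its fields based on which data matter to its applications.

```python
from datetime import date, datetime

# Hypothetical common schema: every normalized record carries these fields.
COMMON_FIELDS = ("source", "record_id", "event_date", "value")


def normalize_toll_reading(raw: dict) -> dict:
    """Map an (assumed) toll-booth reading layout onto the common schema."""
    return {
        "source": "toll_booth",
        "record_id": str(raw["plate"]),
        "event_date": datetime.strptime(raw["ts"], "%Y-%m-%d %H:%M").date(),
        "value": float(raw["toll_usd"]),
    }


def normalize_shopping_record(raw: dict) -> dict:
    """Map an (assumed) supermarket club record onto the common schema."""
    return {
        "source": "loyalty_club",
        "record_id": str(raw["member_no"]),
        "event_date": datetime.strptime(raw["purchased"], "%m/%d/%Y").date(),
        "value": float(raw["total"].replace("$", "")),
    }


# Two inputs in different formats end up with identical field names and types.
records = [
    normalize_toll_reading(
        {"plate": "ABC123", "ts": "2020-03-01 08:15", "toll_usd": "4.50"}
    ),
    normalize_shopping_record(
        {"member_no": 98765, "purchased": "03/02/2020", "total": "$61.20"}
    ),
]
```

Once every source is reduced to the same fields, the records can be loaded into a single table and queried by any downstream application without source-specific handling.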
Examples of structured data include:
- RFID logs
- Search indexes
- Shopping records from “best customer clubs” at supermarkets
- Readings from sensors
- GPS signals and call records from cell phones showing who is calling whom
- GPS data from cars, which lets insurance companies track the speed and rate of braking of their insured for use in pricing
- Records of rides Uber drivers have given passengers, which are shared with the US government
- Criminal records
- Toll-booth readings
Marital status, the number of children at home, the arrival of a first child, the number of children versus the number of bathrooms, and the last child leaving home (empty nest) are factors and events that could trigger a change in home ownership. The date the mortgage was originated, the mortgage balance versus the home value, and property tax loans are important factors as well.
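To make the idea concrete, here is a minimal Python sketch of how such fields could be turned into trigger flags and a balance-to-value ratio. The `Household` record, its field names, and the flag names are assumptions made for this example; the text does not prescribe any particular rules or thresholds.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical structured fields for one household snapshot.
    children_at_home: int
    bathrooms: int
    mortgage_balance: float
    home_value: float


def loan_to_value(h: Household) -> float:
    """Mortgage balance versus home value, expressed as a ratio."""
    if h.home_value <= 0:
        raise ValueError("home_value must be positive")
    return h.mortgage_balance / h.home_value


def trigger_flags(prev: Household, curr: Household) -> list[str]:
    """Compare two snapshots and list the events that occurred between them."""
    flags = []
    if prev.children_at_home == 0 and curr.children_at_home > 0:
        flags.append("first_child_at_home")
    if prev.children_at_home > 0 and curr.children_at_home == 0:
        flags.append("empty_nest")
    if curr.children_at_home > curr.bathrooms:
        flags.append("more_children_than_bathrooms")
    return flags


# Illustrative data only: two children leave home between snapshots.
before = Household(children_at_home=2, bathrooms=2,
                   mortgage_balance=180_000, home_value=300_000)
after = Household(children_at_home=0, bathrooms=2,
                  mortgage_balance=150_000, home_value=300_000)

flags = trigger_flags(before, after)
ltv = loan_to_value(after)
```

Comparing snapshots rather than looking at a single record is what lets the sketch detect events such as an empty nest, which only show up as a change over time.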
Unstructured data is not organized into tables or a spreadsheet format such as Excel.
It includes:
- Photos
- Videos
- Social media posts, such as those on Facebook and Twitter
- Text documents
- Data in reports, such as appraisal reports, that can't easily be retrieved
The internet is the ultimate source of unstructured big data.
Big data can be assembled from numerous sources:
- Computers in autos, airplanes, and other machines monitor not only the machines’ performance and when repairs are needed but also the people using them. How fast they are going, where they are going, and even what they are saying can be recorded.
- Computers in the home can now monitor electricity consumption, homeowner traits, and video feeds to record the status of the home. This data can be fed back through huge power grids so utility companies can monitor and manage peak demand and anticipate when shortages may occur.
- While historically public company data was available to the masses, thousands of companies now monitor and evaluate even private companies’ data.
- Government data like sales taxes can be used to project revenue for private companies.
- Search engines like Google provide thousands or even millions of potential data points when searching for companies, individuals, or topics.