Data Storage and Processing

Data storage and data processing are two of the more demanding tasks in computer science: both can be expensive, and both are complex enough to need careful thought. Each of these steps draws heavily on the resources of the machine.

Data Storage

Data storage is the process of storing data on a computer's hard drive or on any other storage medium. The CPU follows instructions, processes information and presents you with the results. All of this is data, and it needs to be handled properly. There are several ways to store data, and since instructions, information and data have already come up, they are a good place to start. The levels of data storage in a computer start at:

  1. CPU registers
    • They are fast, small memory locations that hold the data the CPU is working on right now.
    • Each register is small and of fixed size, and that size differs from one CPU to another.
    • They hold the operands and results of the instructions being executed.
    • As the name suggests, they are a part of the CPU itself.
  2. RAM — Random access memory
    • RAM is slower than the registers, and is not a part of the CPU.
    • RAM still provides fast access to any block of data.
    • RAM is organized into addressable blocks with a fixed word size.
    • Data that belongs to running programs but does not fit in the registers is kept in RAM for fast access.
  3. Hard drive, optical drives, etc.
    • Finally, the remaining storage media come last, and they provide a large amount of storage for the data.
    • They are slow, but they offer large capacity.
    • Hard disks, flash drives, solid-state drives and optical drives all belong to this level. Hard disks and solid-state drives are usually classed as secondary storage, while removable media such as optical discs are often classed as tertiary or offline storage.

The hierarchy follows the trade-off between capacity, price and speed: each level down offers more capacity at a lower price per byte, but slower access.
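From a program's point of view, the RAM level and the drive level look quite different. The following is a minimal sketch in Python, not a benchmark: it keeps a few values in a list (which lives in RAM) and then persists them to a file on the drive and reads them back. The names `readings`, `save_to_disk`, `load_from_disk` and `round_trip` are made up for this illustration.

```python
import os
import tempfile

# Values held in a Python list live in RAM for the lifetime of the process.
readings = [12, 7, 42, 19]


def save_to_disk(values, path):
    """Persist the values to a file on the drive, one integer per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for value in values:
            handle.write(f"{value}\n")


def load_from_disk(path):
    """Read the values back from the file into a new in-memory list."""
    with open(path, "r", encoding="utf-8") as handle:
        return [int(line) for line in handle if line.strip()]


def round_trip(values):
    """Write values to a temporary file, read them back, then clean up."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "readings.txt")
        save_to_disk(values, path)
        return load_from_disk(path)


if __name__ == "__main__":
    restored = round_trip(readings)
    print(restored == readings)
```

The list disappears when the program exits, while a file written to a permanent location would survive a restart. The sketch uses a temporary directory only so that it cleans up after itself.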

Typically, a storage medium is built into the computer and ships with the device when you purchase a new desktop or laptop. The software that runs the device is installed on this default storage, which is a hard drive or, in most recent computers, a solid-state drive. The concepts and techniques of data storage will be covered in the chapters of this category.

Other techniques for data storage include, but are not limited to, web storage, offline storage and cloud storage. Each of these storage media has its own benefits and downsides, but cloud storage is now very widely used, and it is replacing other storage media in many cases because of concerns about performance, speed, efficiency and the lifetime of the stored data. With cloud storage, a provider supplies the storage medium: you upload your content to their environment, and they manage and store your data. Cloud storage depends entirely on your network connection and your budget. Some cloud storage providers offer high-bandwidth data transfer and up to 1 terabyte of storage for personal use, which can cost much less than purchasing your own device and paying for its maintenance.

Some common problems in data storage are:

  1. Latency
  2. Availability — as in the cloud storage providers
  3. Redundancy
  4. Structure
  5. Replication
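Two items from this list, redundancy and replication, can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not how any particular storage system works: the same bytes are written to three local directories that stand in for separate storage nodes, and a SHA-256 checksum is used to pick a copy that is still intact. The function names and the `replica_N` directory layout are hypothetical.

```python
import hashlib
import os
import tempfile


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def replicate(data: bytes, replica_dirs, name="blob.bin"):
    """Write the same bytes into every replica directory; return the paths."""
    paths = []
    for directory in replica_dirs:
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, name)
        with open(path, "wb") as handle:
            handle.write(data)
        paths.append(path)
    return paths


def read_first_healthy(paths, expected_checksum):
    """Return the bytes of the first replica whose checksum still matches."""
    for path in paths:
        try:
            with open(path, "rb") as handle:
                data = handle.read()
        except OSError:
            continue  # replica is unavailable, try the next one
        if checksum(data) == expected_checksum:
            return data
    raise RuntimeError("no healthy replica found")


def demo():
    payload = b"important data"
    with tempfile.TemporaryDirectory() as root:
        dirs = [os.path.join(root, f"replica_{i}") for i in range(3)]
        paths = replicate(payload, dirs)
        os.remove(paths[0])  # simulate a lost copy
        with open(paths[1], "wb") as handle:
            handle.write(b"corrupted")  # simulate a damaged copy
        return read_first_healthy(paths, checksum(payload))


if __name__ == "__main__":
    print(demo())
```

Keeping several copies costs extra space, which is the redundancy side of the trade-off. In return, the data stays available when one copy is lost or damaged.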

There are several other concepts in this area, but I will leave them for the chapters, where you can continue reading about them.

Data Processing

Data storage is the technique for storing data, whereas data processing is the work done on that data to get results from it. Raw data, meaning data that comes straight from its sources and has not yet been cleaned up, is fed into the computer, and the CPU runs algorithms on it to generate reports. Although it is not as broad a topic as data storage, data processing still comprises a few specific steps that a program takes to give users their results. The steps are often as listed below, but they can change depending on the requirements of the solution, the algorithm or the programmer:

  1. Data collection
  2. Data validation
  3. Data analysis, which often contains steps such as:
    • Transposing the data
    • Filling the data with default values
    • Removing redundant data
    • Generating dynamic data
  4. Producing reports
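The steps above can be sketched as a small pipeline in plain Python. This is only an illustration under stated assumptions: the sample rows, the column names and the pass mark of 80 are invented for the example, and real projects often use a library instead of hand-written loops.

```python
# Hypothetical raw data: every value arrives as text, with gaps and repeats.
RAW_ROWS = [
    {"name": "Ada", "city": "London", "score": "91"},
    {"name": "Linus", "city": "", "score": "78"},
    {"name": "Ada", "city": "London", "score": "91"},       # duplicate row
    {"name": "Grace", "city": "New York", "score": "n/a"},  # invalid score
]


def collect():
    """Data collection: here, simply copy the in-memory sample rows."""
    return [dict(row) for row in RAW_ROWS]


def validate(rows):
    """Data validation: keep only rows whose score parses as an integer."""
    valid = []
    for row in rows:
        try:
            valid.append({**row, "score": int(row["score"])})
        except ValueError:
            continue
    return valid


def analyze(rows):
    """Fill default values, remove redundant rows, generate a derived field."""
    seen = set()
    cleaned = []
    for row in rows:
        row = {**row, "city": row["city"] or "Unknown"}  # default value
        key = (row["name"], row["city"], row["score"])
        if key in seen:  # redundant (duplicate) data
            continue
        seen.add(key)
        row["passed"] = row["score"] >= 80  # dynamically generated data
        cleaned.append(row)
    return cleaned


def transpose(rows):
    """Transpose a list of row dicts into a dict of columns."""
    return {key: [row[key] for row in rows] for key in rows[0]}


def report(rows):
    """Producing reports: format the cleaned rows as plain-text lines."""
    return "\n".join(
        f"{r['name']:<8}{r['city']:<10}{r['score']:>5}  "
        f"{'pass' if r['passed'] else 'fail'}"
        for r in rows
    )


if __name__ == "__main__":
    rows = analyze(validate(collect()))
    print(report(rows))
    print(transpose(rows)["score"])
```

Each function maps to one step in the list, so a step can be swapped or reordered when the requirements change.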

All of these steps are performed in order to generate reports from raw data. Data comes in all shapes and sizes, so software is written to run over the data sets and produce reports and visuals for users to analyze. The tools range from simple programs to complex applications such as Microsoft Excel, or scripts written in programming languages such as R or Python (which I prefer).

There are many software suites developed especially for data processing, such as Power BI.

This category is dedicated to data storage and processing techniques, concepts and explanations, demonstrated with tools and code samples in the detailed chapters and topics.
