28 Data Management, Governance, and Ethics
To be a successful people analytics professional, you will need a solid foundation in data management. In fact, people analytics isn’t even possible without it. Data management is the process of structuring, securing, and preserving data in order to maintain its usability, quality, and availability. From data production and collection through storage, retrieval, and eventual destruction, it encompasses any action that you might take to ensure the appropriate use of data. Data governance, data integration, data quality, data security, ethics, and data privacy are all aspects of data management, as are any organizational processes or policies concerning data.
Because data management includes so many components, it might be easier to discuss how it fits into people analytics using an example. In this example I’ll highlight important data management components in bold text and make sure any key terms or items for consideration are italicized to draw attention to them.
Example: Imagine you work for a company that wants to boost employee satisfaction by analyzing data from an employee engagement survey and acting on the resulting insights. They have asked you to help make this happen.
Here are the data management aspects you may need to consider in this scenario. First, you will need to define how you will measure engagement and then design a method for data creation. Since the company has already decided on a survey, you will need to create and format questions that provide valid and reliable measurements and insights relevant to engagement. You will also need to determine how and when to distribute the survey to employees and whether the information gathered will be kept anonymous or confidential.
Before you can send the survey, you need everything in place to handle the secure collection, loading, and storage of this information. How and where will the data first be entered? (Will it be an online survey or a paper survey? Will it be administered by a third-party vendor or directly by you?) Where will it be stored once it has been gathered? (Will it be in a centralized database or a data repository? Will it be hosted locally or in the cloud?) When making these decisions, you will need to plan for your expected data volume and data processing needs.
Wherever the data is stored, access should be restricted to authorized individuals, and the storage and usage policies must adhere to data privacy laws, regulations, and standards. You may need to apply techniques like anonymization or de-identification to ensure privacy, and you will need to ensure that data security measures are in place and working as intended to prevent unauthorized access, breaches, or misuse of the data. Your role in protecting the data extends beyond access: you are also responsible for protecting it against inappropriate use. The data should be used only for the approved purposes for which employee consent was obtained during collection, and it is your job to make sure it stays that way.
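To make the de-identification step a little more concrete, here is a minimal Python sketch. It assumes survey responses arrive as dictionaries, and the field names (`employee_id`, `name`, `email`), the `SECRET_KEY` value, and both helper functions are hypothetical, chosen only for illustration. Note that replacing an ID with a keyed hash is pseudonymization rather than full anonymization: fields such as department can still identify someone in a small group, so treat this as one layer of protection, not a guarantee of privacy.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would come from a secrets manager,
# never from source code.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize_id(employee_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 hash of the ID: stable across surveys, hard to link back without the key."""
    return hmac.new(key, employee_id.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Drop direct identifiers and replace the employee ID with a pseudonym."""
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "employee_id"
    }
    cleaned["respondent_key"] = pseudonymize_id(str(record["employee_id"]), key)
    return cleaned


# Made-up example response.
response = {
    "employee_id": "E1001",
    "name": "Jordan Example",
    "email": "jordan@example.com",
    "department": "Finance",
    "q1": "agree",
}
print(de_identify(response))
```

Because the same ID always maps to the same pseudonym, responses from repeated surveys can still be compared over time without storing names or email addresses alongside the answers.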
When it’s time to use the data for its approved purpose, you will need to extract the information from its data storage location and load it into some kind of tool to be analyzed. Depending on the storage system and analytical tools used, this can range from simple downloads to more advanced data integration processes. Data is rarely collected in a structure that is immediately ready for your use, so you will likely have to transform it first. Sometimes this means changing the actual structure of the data. For example, if you want to use Microsoft Excel, you will need the data arranged in rows and columns, or saved as comma-separated values, so that you can view it in a spreadsheet. You may also need to make changes to the data itself. For example, if you received responses on a Likert-type scale with options from “strongly agree” to “strongly disagree,” you may need to recode those words to numerical values so you can analyze them quantitatively.
Even when it is well structured and recoded, survey data almost always contains errors, missing values, or inconsistencies. So, you will need to ensure data quality through data validation and data cleaning processes that focus on assessing and addressing any issues regarding accuracy, completeness, validity, or reliability. Only once the survey data is prepared and trustworthy can you start to analyze and report your findings.
But your data management journey doesn’t end after you share your findings. You will need to set and adhere to data retention and data deletion policies, which dictate how long and in what ways the information can be stored, kept, or used. And, since engagement surveys are usually repeated, you will need to ensure that the company has regular and consistent data management procedures and standardized reporting for this and similar surveys in the future.
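The recoding and validation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-point scale, its labels and numeric codes, the question names (`q1`, `q2`), and the helper functions are all hypothetical, and a real survey would need its own mapping. The sketch records every missing or unrecognized answer as an issue rather than silently dropping it, so that someone can review the problem before analysis begins.

```python
# Assumed five-point scale; adjust the labels and values to match the actual survey.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def recode(answer):
    """Map a text answer to its numeric code, or None if it is missing or unrecognized."""
    if answer is None:
        return None
    return LIKERT_CODES.get(answer.strip().lower())


def clean_responses(rows, questions):
    """Recode every answer and collect data quality issues for later review."""
    cleaned, issues = [], []
    for row_number, row in enumerate(rows):
        coded = {}
        for question in questions:
            raw = row.get(question)
            value = recode(raw)
            if value is None:
                kind = "missing" if raw is None or not raw.strip() else "unrecognized"
                issues.append((row_number, question, kind, raw))
            coded[question] = value
        cleaned.append(coded)
    return cleaned, issues


# Made-up responses with inconsistent capitalization, stray spaces, a blank, and a typo.
rows = [
    {"q1": "Strongly Agree", "q2": "agree"},
    {"q1": "  disagree ", "q2": ""},
    {"q1": "agre", "q2": "Neutral"},
]
cleaned, issues = clean_responses(rows, ["q1", "q2"])
print(cleaned)
print(issues)
```

Here the blank answer is flagged as missing and the misspelled “agre” as unrecognized. Whether to correct, exclude, or follow up on such values is a data quality decision that belongs in your documented cleaning process.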
In fact, the company will need to have defined data governance policies and strong data ethics for all activities involving people data. These policies should cover topics and define aspects such as roles, responsibilities, ownership, employee communication, training, remediation for data breaches, and more.
This scenario is meant only as an example, but it illustrates the point that even something as seemingly simple as a survey requires a tremendous amount of thoughtful data governance (the setting of policies and procedures for how the data should be managed) and data management (the steps taken in order to follow those policies and procedures). The goal of data management is to provide the necessary frameworks and procedures for managing all stages of the data lifecycle and the many interconnected processes related to the effective and appropriate use of data. Data management is important for any type of work that involves data (which is probably everything these days!), but it is especially crucial in a field like people analytics, where we deal with private and sensitive information and where quality matters because the decisions made from the data affect people’s lives.
In my opinion, data management is the defining skill set that separates people analytics practitioners from people analytics leaders. Only when we are good stewards of people data can we be good stewards of the people analytics profession.
Explore & Engage:
Here are some strategies that can help you integrate more data management into your development journey.
- Read about data management policies, regulations, standards, and practices. In order to uphold good data management practices, you have to be aware of them. In Part 1 we talked about the importance of continually consuming information about people analytics. When you do, make sure to include articles and resources that cover data management practices. And be sure to stay up to date on data privacy and security laws, regulations, and standards so you can follow them appropriately before beginning any analytics project.
- Personal note: I used to find reading these things so boring, until I realized how important they were. Now you’ll find videos on my YouTube channel where you can see just how weirdly excited I am when there is some new standard, law, or regulation related to people analytics. I even became the global convener of the HR Data Privacy Standard for the International Organization for Standardization (ISO 30439). For me, reading laws, regulations, and standards was a lot like learning statistics: it feels super awkward and seems like gibberish at first, but once you get the hang of it you realize how powerful it is to possess the knowledge.
- Include data management professionals in your community. Build community with the individuals who own data management for the systems you use. They can tell you about the processes and measures in place to manage data. There are people analytics experts with depth in data management, but I encourage you to seek out specialized data management professionals from outside the people analytics field and learn from them as well.
- Include subject matter experts in your data management efforts. It’s not just data management professionals who can help you in your data management efforts. Those who are experts in the subject matter that your data covers should be included as well. They will often know how to define and measure it with more nuance. They will notice data quality issues no one else can. And they can provide additional context to explain and interpret the outcomes of your analyses. If, for example, you are doing an analysis that includes data about employee salaries but you are not a compensation professional, invite a compensation professional into the conversation. Even subject matter experts who have no data experience are able to identify things in the data that seem inaccurate or out of place.
- Start with one data set and work out from there. As you saw in our example, there are so many pieces to the data management puzzle. I recommend working on building your skills by finding one data example that you can work through from start to finish just as I did in our engagement survey example above. Pick a topic and type of data that you are most interested in or one that you might work with and walk through all of the topics bolded or italicized above. Some of the questions you can seek answers to are: How is it measured and defined? What is the collection method? How does the technology or approach of that collection method work? How is it stored? For how long? Who can access it? How can it be extracted? In what kind of format? What kind of transformation or cleaning needs to be done? How can it be assessed for accuracy, validity, and reliability? What processes and policies are in place for how it can be used and reported on? What are the privacy and security controls put in place from the beginning to the end of its data lifecycle? If you can work through and find answers to each of these questions you will be building your data management skills.
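If it helps to keep track of your answers, the questions in the last strategy can be captured in a simple record for each data set. The sketch below is one possible way to do that in Python. The `DatasetProfile` class, its field names, and the example values are all hypothetical, and a spreadsheet with the same columns would work just as well.

```python
from dataclasses import dataclass, fields
from datetime import date, timedelta


@dataclass
class DatasetProfile:
    """One record per data set; every field name here is a hypothetical label."""
    name: str
    definition: str = ""         # How is it measured and defined?
    collection_method: str = ""  # What is the collection method?
    storage_location: str = ""   # How and where is it stored?
    retention_days: int = 0      # For how long? (0 means not yet decided)
    authorized_roles: str = ""   # Who can access it?
    extract_format: str = ""     # How can it be extracted, and in what format?
    cleaning_steps: str = ""     # What transformation or cleaning is needed?
    approved_uses: str = ""      # What uses and reporting are allowed?

    def unanswered(self):
        """List the questions that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def delete_by(self, collected_on: date):
        """Return the date the data should be deleted, or None if retention is undecided."""
        if not self.retention_days:
            return None
        return collected_on + timedelta(days=self.retention_days)


# Made-up example: a partly completed profile for an engagement survey.
profile = DatasetProfile(
    name="engagement survey",
    collection_method="online survey",
    retention_days=365,
)
print(profile.unanswered())
print(profile.delete_by(date(2024, 1, 15)))
```

The list of unanswered fields doubles as a to-do list: each empty entry is a question to take to the data owner or a subject matter expert.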