{"id":971,"date":"2023-03-01T22:40:22","date_gmt":"2023-03-01T22:40:22","guid":{"rendered":"https:\/\/kpwalters.com\/?page_id=971"},"modified":"2023-08-14T14:32:04","modified_gmt":"2023-08-14T14:32:04","slug":"uk-gender-pay-gap-project","status":"publish","type":"page","link":"https:\/\/kpwalters.com\/index.php\/data-analyst\/uk-gender-pay-gap-project\/","title":{"rendered":"UK Gender Pay Gap project"},"content":{"rendered":"\n<div class=\"wp-block-uagb-container uagb-block-a5d035ed alignfull uagb-is-root-container\">\n<div class=\"wp-block-uagb-advanced-heading uagb-block-cdc5552f\"><h5 class=\"uagb-heading-text\">UK Gender Pay Gap project<\/h5><\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-0284624a alignfull uagb-is-root-container\"><div class=\"uagb-container-inner-blocks-wrap\">\n<div class=\"wp-block-uagb-container uagb-block-d22da3d2\">\n<p class=\"has-text-align-left has-small-font-size\" style=\"margin-top:0px;margin-bottom:0px;padding-top:0px;padding-bottom:0px\"><em>February 4, 2023<\/em><br>&nbsp;<\/p>\n\n\n\n<p>Summary<br><a href=\"#uk-gender-pay-gap-project_outcome\">Outcome<\/a><br><a href=\"#uk-gender-pay-gap-project_project-steps\">Project steps<\/a><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-4b2efce9\">\n<div class=\"wp-block-uagb-advanced-heading uagb-block-5660eea4\"><h5 class=\"uagb-heading-text\">Summary<\/h5><\/div>\n\n\n\n<p style=\"margin-top:0px;padding-top:0px\">This project involved creating a local PostgreSQL database of data from the <a rel=\"noreferrer noopener\" href=\"https:\/\/gender-pay-gap.service.gov.uk\/\" target=\"_blank\">UK Gender Pay Gap<\/a> service. Particularly, I wanted a local database so that I could demonstrate a recursive common table expression as a means of relating employer info from the service\u2019s data with industry designations based on <a rel=\"noreferrer noopener\" href=\"https:\/\/www.ons.gov.uk\/methodology\/classificationsandstandards\/ukstandardindustrialclassificationofeconomicactivities\/uksic2007\" target=\"_blank\">UK SIC<\/a> codes.<br><br>In this project, I used Python to create a truncated version of the data and then used Excel to identify data types and create the SQL needed to build the database. I also used Excel to create CSV files for the SIC code tables. I had already installed PostgreSQL Server for another project, so that wasn\u2019t part of this project. For the final part of the project, I used pgAdmin to connect to my PostgreSQL Server and import the data from the UK Gender Pay Gap service.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-b52b9201\">\n<p class=\"has-text-align-right has-small-font-size\" style=\"margin-top:0px;margin-bottom:0;padding-top:0px;padding-bottom:0px;line-height:1.2\"><em>Python<br>Excel<br>PostgreSQL Server<br>pgAdmin<\/em><\/p>\n<\/div>\n<\/div><\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-97538485 alignfull uagb-is-root-container\"><div class=\"uagb-container-inner-blocks-wrap\">\n<div class=\"wp-block-uagb-container uagb-block-023e5b13\"><\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-098729d0\">\n<div class=\"wp-block-uagb-advanced-heading uagb-block-bb793c8d\" id=\"uk-gender-pay-gap-project_outcome\"><h5 class=\"uagb-heading-text\">Outcome<\/h5><\/div>\n\n\n\n<p style=\"margin-top:0px;padding-top:0px\">The project provide me with in a PostgreSQL database of 9 tables and a view.<\/p>\n\n\n\n<ul>\n<li><strong>employersizes<\/strong>: A lookup table to permit sorting when reporting the employersize field.<\/li>\n\n\n\n<li><strong>gender_pay_gap<\/strong>: The data from the <a href=\"https:\/\/gender-pay-gap.service.gov.uk\/viewing\/download\" target=\"_blank\" rel=\"noreferrer noopener\">UK Gender Pay Gap<\/a> service.<\/li>\n\n\n\n<li><strong>sicactivities<\/strong>: A list of commercial activities and their associated UK SIC codes.<\/li>\n\n\n\n<li><strong>sicclasses<\/strong>: A list of the items at the fourth level of SIC classification hierarchy.<\/li>\n\n\n\n<li><strong>siccodes<\/strong>: A list of all SIC codes used in the SIC classification hierarchy.<\/li>\n\n\n\n<li><strong>sicdivisions<\/strong>: A list of the items at the second level of SIC classification hierarchy.<\/li>\n\n\n\n<li><strong>sicgroups<\/strong>: A list of the items at the third level of the SIC classification hierarchy.<\/li>\n\n\n\n<li><strong>sicsections<\/strong>: A list of the items at the top level of the SIC classification hierarchy.<\/li>\n\n\n\n<li><strong>sicsubsclasses<\/strong>: A list of the items at the fifth, or bottom, level of the SIC classification hierarchy.<\/li>\n\n\n\n<li><strong>vw_gpg_sic_codes<\/strong>: A view that facilitates joining the gender_pay_gap table with the siccodes table. Based on a recursive CTE.<\/li>\n<\/ul>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-085e4903\">\n<div class=\"wp-block-uagb-image uagb-block-20e48ae9 wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><a class=\"\" href=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/pgAdmin.UK-Gender-Pay-Gap-tables.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img decoding=\"async\" srcset=\"https:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/pgAdmin.UK-Gender-Pay-Gap-tables.png \" sizes=\"(max-width: 480px) 150px\" src=\"https:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/pgAdmin.UK-Gender-Pay-Gap-tables.png\" alt=\"\" class=\"uag-image-1317\" width=\"398\" height=\"659\" title=\"\" loading=\"lazy\"\/><\/a><\/figure><\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-3980fb84\"><\/div>\n<\/div>\n<\/div><\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-f9e12530 alignfull uagb-is-root-container\"><div class=\"uagb-container-inner-blocks-wrap\">\n<div class=\"wp-block-uagb-container uagb-block-20c66cd3\"><\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-765f0b19\">\n<div class=\"wp-block-uagb-advanced-heading uagb-block-ac5e56f2\" id=\"uk-gender-pay-gap-project_project-steps\"><h5 class=\"uagb-heading-text\">Project steps<\/h5><\/div>\n\n\n\n<ol>\n<li>Download the data.<br><br>The UK Gender Pay Gap data can be found here:<br><a rel=\"noreferrer noopener\" href=\"https:\/\/gender-pay-gap.service.gov.uk\/viewing\/download\" target=\"_blank\">https:\/\/gender-pay-gap.service.gov.uk\/viewing\/download<\/a><br><br>I downloaded all six (as of February 4, 2023) files.<br><br>The UK SIC data can be found here:<br><a href=\"https:\/\/www.sec.gov\/corpfin\/division-of-corporation-finance-standard-industrial-classification-sic-code-list\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.sec.gov\/corpfin\/division-of-corporation-finance-standard-industrial-classification-sic-code-list<\/a><br>&nbsp;<\/li>\n\n\n\n<li><a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/kpw1000\/Python-Jupyter-Notebooks\/blob\/main\/csv_prep_for_import_to_postgreSQL.ipynb\" target=\"_blank\">Using Python, create a truncated CSV file of the gender pay gap data.<\/a><br><br>Technically, this step was unnecessary. The purpose of the step was to use Excel to identify data types and create SQL to create the gender pay gap table in pgAdmin. However, none of the original files from the UK Gender Pay Gap service was so large that Excel would have been bogged down. (On my laptop, Excel gets bogged down with files in excess of about 50 MB or so.)<br><br>However, I had written Python code for another project which involved millions of rows of data, so I simply applied it here.<br>&nbsp;<\/li>\n\n\n\n<li><a href=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/Excel-Power-Query-import-CSV.mp4\" target=\"_blank\" rel=\"noreferrer noopener\">Import the truncated CSV file into Excel<\/a>.<br><br>I used Power Query to control how the data types appeared. I chose to import all the columns as text, so that I could see the data in its original form and make my own judgments about data type.<br>&nbsp;<\/li>\n\n\n\n<li>Write <a href=\".\/sql-gender_pay_gap-create\" target=\"_blank\" rel=\"noreferrer noopener\">SQL for the gender pay gap table<\/a>.<br><br>I used Excel to expedite the creation process, minimize typos and similar. You can <a download=\"\" href=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/UK-Gender-Pay-Gap-import-prep.xlsx\">see this here<\/a>.<br><br>Of note, the gender pay gap data has columns of numeric data in which many rows contain empty values. I thought that the simplest method of importing the data into pgAdmin was to create the columns as text first, import the data, update the empty strings to null values, then alter the table to change the columns to numeric data. I did so as step six.<br><br>Also of note, the gender pay gap data does not have a unique identifier that we can use across multiple years, so the CREATE TABLE sql creates a sequence object, declares a primary key (\u201cGPGKey\u201d) and uses the sequence object to fill in unique values.<br>&nbsp;<\/li>\n\n\n\n<li>Create the gender pay gap table and import the CSVs.<br><br>Run the CREATE TABLE sql in the query window and then use the <a rel=\"noreferrer noopener\" href=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/pgAdmin-import-gender_pay_gap.mp4\" target=\"_blank\">pgAdmin Import\/Export feature<\/a> to import the CSV files, one at a time. During this import, with each CSV file, we need to remove the GPGKey, as it is not in the source data. (See notes under Step Four above.)<br>&nbsp;<\/li>\n\n\n\n<li><a rel=\"noreferrer noopener\" href=\".\/sql-gender_pay_gap-alter\" target=\"_blank\">Modify the gender pay gap table<\/a>.<br><br>See the note for step four, above: after completing the import for all six CSV files, we can update the empty strings to null and alter the table to change the numeric columns from data type varchar to data type numeric. <br>&nbsp;&nbsp;<\/li>\n\n\n\n<li><a href=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/SIC-Codes.UK_.xlsx\" download=\"\">Use Excel, to create CSV files for the UK SIC data.<\/a><br> <br>The UK SIC classification hierarchy required manipulation in order to properly normalize the data and create CSV files. All of the work is visible in this Excel file.<br>&nbsp;<\/li>\n\n\n\n<li>Create the SIC tables and import the CSVs.<br><br>This step is akin to Step Five: we run the SQL commands that create the seven SIC tables, then use pgAdmin\u2019s Import\/Export feature to import the CSV files created in Step Seven.<br>&nbsp;<\/li>\n\n\n\n<li>Create the <strong>vw_gpg_sic_codes<\/strong> view.<br><br>This step is covered as a separate project. See \u201c<a href=\".\/recursive-cte-project\">Recursive CTE Project<\/a>.\u201d<br>&nbsp;<\/li>\n\n\n\n<li>Run SQL to <a href=\".\/sql-employersizes-create\" target=\"_blank\" rel=\"noreferrer noopener\">create and populate the employersizes<\/a> table.<br><br>This is a simple table, and we can use SQL that creates the table and fills in the values.<\/li>\n<\/ol>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-1de09cbc\"><\/div>\n<\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>UK Gender Pay Gap project February 4, 2023&nbsp; SummaryOutcomeProject steps Summary This project involved creating a local PostgreSQL database of data from the UK Gender Pay Gap service. Particularly, I wanted a local database so that I could demonstrate a recursive common table expression as a means of relating employer info from the service\u2019s data &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/kpwalters.com\/index.php\/data-analyst\/uk-gender-pay-gap-project\/\"> <span class=\"screen-reader-text\">UK Gender Pay Gap project<\/span> Read More &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":931,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"page-builder","ast-global-header-display":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"disabled","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"enabled","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","footnotes":""},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"kevin.p.walters@live.com","author_link":"https:\/\/kpwalters.com\/index.php\/author\/kevin-p-walterslive-com\/"},"uagb_comment_info":0,"uagb_excerpt":"UK Gender Pay Gap project February 4, 2023&nbsp; SummaryOutcomeProject steps Summary This project involved creating a local PostgreSQL database of data from the UK Gender Pay Gap service. Particularly, I wanted a local database so that I could demonstrate a recursive common table expression as a means of relating employer info from the service\u2019s data&hellip;","_links":{"self":[{"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/pages\/971"}],"collection":[{"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/comments?post=971"}],"version-history":[{"count":52,"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/pages\/971\/revisions"}],"predecessor-version":[{"id":1726,"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/pages\/971\/revisions\/1726"}],"up":[{"embeddable":true,"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/pages\/931"}],"wp:attachment":[{"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/media?parent=971"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}