{"id":979,"date":"2023-03-01T22:50:28","date_gmt":"2023-03-01T22:50:28","guid":{"rendered":"https:\/\/kpwalters.com\/?page_id=979"},"modified":"2023-08-14T14:35:02","modified_gmt":"2023-08-14T14:35:02","slug":"recursive-cte-project","status":"publish","type":"page","link":"https:\/\/kpwalters.com\/index.php\/data-analyst\/recursive-cte-project\/","title":{"rendered":"Recursive CTE project"},"content":{"rendered":"\n<div class=\"wp-block-uagb-container uagb-block-46e55046 alignfull uagb-is-root-container\">\n<div class=\"wp-block-uagb-advanced-heading uagb-block-721394ab\"><h5 class=\"uagb-heading-text\">Recursive CTE project<\/h5><\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-8eff6e19 alignfull uagb-is-root-container\"><div class=\"uagb-container-inner-blocks-wrap\">\n<div class=\"wp-block-uagb-container uagb-block-21ed32c4\">\n<p class=\"has-text-align-left has-small-font-size\" style=\"margin-top:0px;margin-bottom:0px;padding-top:0px;padding-bottom:0px\"><em>October, 2022<\/em><br>&nbsp;<\/p>\n\n\n\n<p>Summary<br><a href=\"#recursive-cte-project_outcome\">Outcome<\/a><br><a href=\"#recursive-cte-project_project-steps\">Project Steps<\/a><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-3ae729b2\">\n<div class=\"wp-block-uagb-advanced-heading uagb-block-a120cff9\"><h5 class=\"uagb-heading-text\">Summary<\/h5><\/div>\n\n\n\n<p style=\"margin-top:0px;padding-top:0px\">The data from the UK Gender Pay Gap service contains a column called <code>siccodes<\/code> which is based on the UK SIC code classification system. In <a rel=\"noreferrer noopener\" href=\".\/uk-gender-pay-gap-project\" target=\"_blank\">another project in this portfolio<\/a>, I created a local database of that data.<br><br>Unfortunately, the <code>siccodes<\/code> column contains a variable length, comma-delimited list of SIC codes, making it impossible to join directly to any SIC code information.<\/p>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-42935d0a wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><a class=\"\" href=\"https:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/gender_pay_gap-siccides-in-pgAdmin.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img decoding=\"async\" srcset=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/gender_pay_gap-siccides-in-pgAdmin.png \" sizes=\"(max-width: 480px) 150px\" src=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/gender_pay_gap-siccides-in-pgAdmin.png\" alt=\"\" class=\"uag-image-1395\" width=\"915\" height=\"640\" title=\"\" loading=\"lazy\"\/><\/a><\/figure><\/div>\n\n\n\n<p>As you can see, above, some companies have multiple SIC codes\u2014and many have an invalid code \u201c1.\u201d<br><br>The solution to the problem is to create a recursive common table expression (CTE) to return one row for every SIC code (and omitting the bogus \u201c1s\u201d).<br><br>This project does that.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-012fe5d4\">\n<p class=\"has-text-align-right has-small-font-size\" style=\"margin-top:0px;margin-bottom:0;padding-top:0px;padding-bottom:0px;line-height:1.2\"><em>PostgreSQL<br>pgAdmin<\/em><\/p>\n<\/div>\n<\/div><\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-1c726d62 alignfull uagb-is-root-container\"><div class=\"uagb-container-inner-blocks-wrap\">\n<div class=\"wp-block-uagb-container uagb-block-c999aa64\"><\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-7a71675a\">\n<div class=\"wp-block-uagb-advanced-heading uagb-block-11f3ce0c\" id=\"recursive-cte-project_outcome\"><h5 class=\"uagb-heading-text\">Outcome<\/h5><\/div>\n\n\n\n<p style=\"margin-top:0px;padding-top:0px\">The final outcome of the recursive CTE is a view, <code>vw_gpg_sic_codes<\/code>. The view joins <code>gender_pay_gap<\/code> to <code>siccodes<\/code> on the SIC code data.<\/p>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-555f05bd wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><a class=\"\" href=\"https:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/vw_gpg_sic_codes-1024x529.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img decoding=\"async\" srcset=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/vw_gpg_sic_codes-1024x529.png \" sizes=\"(max-width: 480px) 150px\" src=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/vw_gpg_sic_codes-1024x529.png\" alt=\"\" class=\"uag-image-1369\" width=\"1024\" height=\"529\" title=\"\" loading=\"lazy\"\/><\/a><\/figure><\/div>\n\n\n\n<p>With the view created, I use Python code to do some basic analysis and visualizations of the Gender Pay Gap data based on SIC code.<\/p>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-c45b2b5a wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><a class=\"\" href=\"https:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/gpg-sic-jupyter-notebook-1024x840.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img decoding=\"async\" srcset=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/gpg-sic-jupyter-notebook-1024x840.png \" sizes=\"(max-width: 480px) 150px\" src=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/gpg-sic-jupyter-notebook-1024x840.png\" alt=\"\" class=\"uag-image-1375\" width=\"1024\" height=\"840\" title=\"\" loading=\"lazy\"\/><\/a><\/figure><\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-c57e9f50\"><\/div>\n<\/div><\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-39815b3a alignfull uagb-is-root-container\"><div class=\"uagb-container-inner-blocks-wrap\">\n<div class=\"wp-block-uagb-container uagb-block-be831d1a\"><\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-843106bc\">\n<div class=\"wp-block-uagb-advanced-heading uagb-block-a250aaef\" id=\"recursive-cte-project_project-steps\"><h5 class=\"uagb-heading-text\">Project Steps<\/h5><\/div>\n\n\n\n<p style=\"margin-bottom:0px;padding-bottom:0px\"><strong>STEP ONE<\/strong>: Understand the basics of a recursive CTE.<\/p>\n\n\n\n<p style=\"margin-top:0px;padding-top:0px\">A recursive CTE has two parts, an initial query, often called the \u201canchor\u201d query, and the secondary query, often called the \u201crecursive\u201d query, which begins execution using the results of the anchor query and adds to it through recursion.<br><br>The execution can be graphically represented in a flowchart:<\/p>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-b5b565cd wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><img decoding=\"async\" srcset=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/Recursive-CTE-683x1024.png \" sizes=\"(max-width: 480px) 150px\" src=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/Recursive-CTE-683x1024.png\" alt=\"\" class=\"uag-image-1383\" width=\"683\" height=\"1024\" title=\"\" loading=\"lazy\"\/><\/figure><\/div>\n\n\n\n<p>The two parts of the CTE form a UNION query. The recursive query\u2019s FROM clause calls back to the CTE.<br><br>The following simple query demonstrates:<\/p>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-5a67d08e wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><a class=\"\" href=\".\/simple-recursive-cte\" target=\"_blank\" rel=\"noreferrer noopener\"><img decoding=\"async\" srcset=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/pgAdmin.Recursive.Simple.ex01.png \" sizes=\"(max-width: 480px) 150px\" src=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/pgAdmin.Recursive.Simple.ex01.png\" alt=\"\" class=\"uag-image-1385\" width=\"602\" height=\"653\" title=\"\" loading=\"lazy\"\/><\/a><figcaption class=\"uagb-image-caption\"><em>(Click image to open a new tab and access the SQL code.)<\/em> <\/figcaption><\/figure><\/div>\n\n\n\n<p style=\"margin-bottom:0px;padding-bottom:0px\"><strong>STEP TWO<\/strong>: Apply the concepts to the <code>gender_pay_gap<\/code> table.<\/p>\n\n\n\n<p style=\"margin-bottom:0px\">With a basic understanding of recursive CTEs, the following is the process for applying this to the <code>siccodes<\/code> field:<\/p>\n\n\n\n<ul style=\"margin-top:0px\">\n<li>Convert the comma-delimited <code>siccodes<\/code> field to a string array: <code>STRING_TO_ARRAY( siccodes, CHR(10) )<\/code>. The column we create with this SQL is <code>siccode_array<\/code>.<\/li>\n\n\n\n<li>Call back to the CTE (<code>gpg_siccode_recurs<\/code>) for each element in <code>siccode_array<\/code>. This is done by comparing the length of the array to the counter column, <code>counter<\/code>, and continuing as long as the number of elements in the <code>siccode_array<\/code> column are greater than the incrementing counter:<br><code style=\"font-size:small\">ARRAY_LENGTH( y.siccode_array, 1 ) &gt;= y.counter + 1<\/code> <\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-ea6c697a wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><a class=\"\" href=\".\/siccodes-recursive-cte\" target=\"_blank\" rel=\"noreferrer noopener\"><img decoding=\"async\" srcset=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/recursive-cte-in-pgamin.png \" sizes=\"(max-width: 480px) 150px\" src=\"http:\/\/kpwalters.com\/wp-content\/uploads\/2023\/03\/recursive-cte-in-pgamin.png\" alt=\"\" class=\"uag-image-1421\" width=\"833\" height=\"839\" title=\"\" loading=\"lazy\"\/><\/a><figcaption class=\"uagb-image-caption\"><em><em>(Click image to open a new tab and access the SQL code.)<\/em><\/em><\/figcaption><\/figure><\/div>\n\n\n\n<p style=\"margin-bottom:0px;padding-bottom:0px\"><strong>STEP THREE<\/strong>: <\/p>\n\n\n\n<p style=\"margin-top:0px;padding-top:0px\">With the view created, I can now use Python to create some visualizations based on the SIC information.<br><br><a href=\"https:\/\/github.com\/kpw1000\/Python-Jupyter-Notebooks\/blob\/main\/gpg-with-SICs-analysis.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">My Python code for this project is available via GitHub.<\/a><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-container uagb-block-4c29be1c\"><\/div>\n<\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Recursive CTE project October, 2022&nbsp; SummaryOutcomeProject Steps Summary The data from the UK Gender Pay Gap service contains a column called siccodes which is based on the UK SIC code classification system. In another project in this portfolio, I created a local database of that data. Unfortunately, the siccodes column contains a variable length, comma-delimited &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/kpwalters.com\/index.php\/data-analyst\/recursive-cte-project\/\"> <span class=\"screen-reader-text\">Recursive CTE project<\/span> Read More &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":931,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"page-builder","ast-global-header-display":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"disabled","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"enabled","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","footnotes":""},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"kevin.p.walters@live.com","author_link":"https:\/\/kpwalters.com\/index.php\/author\/kevin-p-walterslive-com\/"},"uagb_comment_info":0,"uagb_excerpt":"Recursive CTE project October, 2022&nbsp; SummaryOutcomeProject Steps Summary The data from the UK Gender Pay Gap service contains a column called siccodes which is based on the UK SIC code classification system. In another project in this portfolio, I created a local database of that data. Unfortunately, the siccodes column contains a variable length, comma-delimited&hellip;","_links":{"self":[{"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/pages\/979"}],"collection":[{"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/comments?post=979"}],"version-history":[{"count":56,"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/pages\/979\/revisions"}],"predecessor-version":[{"id":1728,"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/pages\/979\/revisions\/1728"}],"up":[{"embeddable":true,"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/pages\/931"}],"wp:attachment":[{"href":"https:\/\/kpwalters.com\/index.php\/wp-json\/wp\/v2\/media?parent=979"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}