

{"id":14626,"date":"2019-03-25T07:28:00","date_gmt":"2019-03-25T05:28:00","guid":{"rendered":"http:\/\/catalinx.ro\/wordpress\/?p=14626"},"modified":"2019-03-26T14:47:06","modified_gmt":"2019-03-26T14:47:06","slug":"excelar-hardcore","status":"publish","type":"post","link":"https:\/\/catalinx.ro\/wordpress\/?p=14626","title":{"rendered":"Excelar Hardcore"},"content":{"rendered":"<div id=\"fb-root\"><\/div>\n\n<p>       Un tip cu skillsetul si ocupatiunile mele probabil ca nu s-ar recomanda ca un excelar. Orice tip care poate programa in ceva limbaje ar zice ca excelul e de cacat ca pentru analiza\/prelucrarea de date ar trebui sa folosim alte tooluri mai bune, macar access, MySQL, daca nu matlab\u0103 si alte figuri .<\/p>\n\n\n\n<p>   Din experienta mea, proramatorii urasc excelul si powerpointu&#8217;. <\/p>\n\n\n\n<p> Eu sunt un excelar hardcore.  Pentru ca daca e vorba de analiza de date , corectie and <em>sheet <\/em> excelul e cel mai rapid, cel mai flexibil, cel mai vizual si sa imi bag pula in de ce nu exista alte tool-uri la fel de bune . Exista, fac parte din pachete de softuri pentru Business Intelligence sau Business Warehouse.   Sau chestii gen <a href=\"https:\/\/www.dataiku.com\/dss\/trynow\/windows\/\">Dataiku<\/a>. <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"674\" height=\"600\" data-attachment-id=\"14630\" data-permalink=\"https:\/\/catalinx.ro\/wordpress\/?attachment_id=14630#main\" data-orig-file=\"https:\/\/catalinx.ro\/wordpress\/wp-content\/uploads\/2019\/03\/ExcelarHartdcore.png\" data-orig-size=\"674,600\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"ExcelarHartdcore\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/catalinx.ro\/wordpress\/wp-content\/uploads\/2019\/03\/ExcelarHartdcore-300x267.png\" data-large-file=\"https:\/\/catalinx.ro\/wordpress\/wp-content\/uploads\/2019\/03\/ExcelarHartdcore.png\" src=\"http:\/\/catalinx.ro\/wordpress\/wp-content\/uploads\/2019\/03\/ExcelarHartdcore.png\" alt=\"\" class=\"wp-image-14630\" srcset=\"https:\/\/catalinx.ro\/wordpress\/wp-content\/uploads\/2019\/03\/ExcelarHartdcore.png 674w, https:\/\/catalinx.ro\/wordpress\/wp-content\/uploads\/2019\/03\/ExcelarHartdcore-300x267.png 300w\" sizes=\"auto, (max-width: 674px) 100vw, 674px\" \/><\/figure>\n\n\n\n<p>Weekendul asta am inregistrat recordul de rulare pentru un excel  26 de ore.  Da, mi-am dat seama ca nu rulez optim niste scripturi de  calcul de distante Levenshtein pe niste dictionare care duceau la calculul a vreo 350 milioane de distante pentru 70 000 de cuvinte, chestie nu chiar necesara. Pe care am optiizat-o cand s-a terminat scriptul si mi-am dat seama ca am uitat sa fac un cacat de conversie catre upper si toate concluziile mele erau oarecum gresite, cu circa 50% . <\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Gen &#8211; asta e distanta, copiata  de pe ceva sourceforge, nu mai stiu exact. <\/p>\n\n\n\n<p>Iar asta-i contextul .<\/p>\n\n\n<p>[code language=&#8221;vb&#8221;]<br \/>\nFunction Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long<\/p>\n<p>Dim i As Long, j As Long<br \/>\nDim string1_length As Long<br \/>\nDim string2_length As Long<br \/>\nDim distance() As Long<\/p>\n<p>string1_length = Len(string1)<br \/>\nstring2_length = Len(string2)<br \/>\nReDim distance(string1_length, string2_length)<\/p>\n<p>For i = 0 To string1_length<br \/>\ndistance(i, 0) = i<br \/>\nNext<\/p>\n<p>For j = 0 To string2_length<br \/>\ndistance(0, j) = j<br \/>\nNext<\/p>\n<p>For i = 1 To string1_length<br \/>\nFor j = 1 To string2_length<br \/>\nIf Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then<br \/>\ndistance(i, j) = distance(i &#8211; 1, j &#8211; 1)<br \/>\nElse<br \/>\ndistance(i, j) = Application.WorksheetFunction.Min _<br \/>\n(distance(i &#8211; 1, j) + 1, _<br \/>\ndistance(i, j &#8211; 1) + 1, _<br \/>\ndistance(i &#8211; 1, j &#8211; 1) + 1)<br \/>\nEnd If<br \/>\nNext<br \/>\nNext<\/p>\n<p>Levenshtein = distance(string1_length, string2_length)<\/p>\n<p>End Function<br \/>\n[\/code]<\/p>\n\n\n<p>[code language=&#8221;vb&#8221;]<br \/>\nFunction BestMatch(region As String) As String<\/p>\n<p>Dim wb As Workbook<br \/>\nDim sstart As String<br \/>\nDim send As String<\/p>\n<p>Set wb = ThisWorkbook<\/p>\n<p>&#8216;define your sheets<br \/>\nDim ws_data As Worksheet<br \/>\nDim ws_iso As Worksheet<br \/>\nDim ws_found As Worksheet<br \/>\nDim vfound As Range<br \/>\nDim vsearch As Range<br \/>\nDim vdist As Range<br \/>\nSet ws_data = wb.Sheets(&#8220;DATA&#8221;)<br \/>\nSet ws_iso = wb.Sheets(&#8220;ISO&#8221;)<br \/>\nSet ws_found = wb.Sheets(&#8220;ISOFOUND&#8221;)<br \/>\nsstart = &#8220;&#8221;<br \/>\nDim lastRow As Long<br \/>\nBestMatch = &#8220;Not Found&#8221;<br \/>\nlastRow = ws_iso.Range(&#8220;A&#8221; &#038; Rows.Count).End(xlUp).Row + 1<br \/>\nlastFound = ws_found.Range(&#8220;A&#8221; &#038; Rows.Count).End(xlUp).Row + 1<br \/>\nDim isodict(1 To 32000) As String<br \/>\nDim isosearch(1 To 70000) As String<br \/>\nDim isofound(1 To 70000) As String<br \/>\nDim isofounddist(1 To 70000) As String<br \/>\nDim xfound As Boolean<br \/>\nFor i = 1 To lastRow<\/p>\n<p>n isodict(i) = ws_iso.Cells(i, 4)<br \/>\nNext i<\/p>\n<p>For j = 1 To lastFound<br \/>\nisosearch(j) = ws_found.Cells(j, 1)<br \/>\nisofound(j) = ws_found.Cells(j, 2)<br \/>\nisofounddist(j) = ws_found.Cells(j, 3)<br \/>\nNext j<\/p>\n<p>mindist = 10000<br \/>\nFor i = 1 To lastRow<br \/>\n&#8216;meaninful test check if we already distanced that value<br \/>\nFor j = 1 To lastFound<br \/>\nIf UCase(region) = UCase(isosearch(j)) Then<br \/>\nBestMatch = isofound(j)<br \/>\nj = lastFound<br \/>\ni = lastRow<br \/>\n&#8216;exit this loop and the above one<br \/>\nxfound = True<br \/>\nEnd If<\/p>\n<p>Next j<br \/>\n&#8216; means we don&#8217;t have an already mapped best match<br \/>\n&#8216; looping through dictionary<br \/>\nIf xfound = False Then<br \/>\ntestword = isodict(i)<br \/>\ndistance = Levenshtein(UCase(region), UCase(testword))<br \/>\nIf distance = 0 Then<br \/>\nBestMatch = testword<br \/>\ni = lastRow<\/p>\n<p>Else<br \/>\nIf distance < mindist Then\n                BestMatch = testword\n                mindist = distance\n              End If\n            End If\n    End If\n\nNext i\n     ' write the value of best match  in the found dictionary\n\nEnd Function\n\n[\/code]\n\n<\/p>\n\n\n\n<p>Tre&#8217; sa ma modernizez.<\/p>\n\n\n\n<p>Si codul asta apare asa pentru ca sa imi bag pula in &#8220;Blocks&#8221;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Un tip cu skillsetul si ocupatiunile mele probabil ca nu s-ar recomanda ca un excelar. Orice tip care poate programa in ceva limbaje ar zice&hellip;<\/p>\n","protected":false},"author":2,"featured_media":14688,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[229],"tags":[3159,3160],"class_list":["post-14626","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech","tag-excel","tag-scripturi-vba"],"views":3385,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/catalinx.ro\/wordpress\/wp-content\/uploads\/2019\/03\/excel_icon.png","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p4Eq23-3NU","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/14626","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14626"}],"version-history":[{"count":6,"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/14626\/revisions"}],"predecessor-version":[{"id":14642,"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/14626\/revisions\/14642"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=\/wp\/v2\/media\/14688"}],"wp:attachment":[{"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14626"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14626"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/catalinx.ro\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14626"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}