http://hadoop.apache.org/docs/r1.2.1/api/index.html
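Using null appropriately in map is enough for simple file processing, such as sorting or writing the output as separate sets.
Things to keep in mind
A node works on a Map task, a Map task works on one split, and each call to the map method works on one key/value pair produced from that split.
The input to the map method of the Mapper class is the key/value pair read by the InputFormat's RecordReader.
Problems summarized after one week of study:
1. The files used in my experiments were too small, which creates the many-small-files problem; they need to be preprocessed and packed into a SequenceFile before processing.
2. When using SequenceFileOutputFormat.class, I had trouble choosing the data types passed to setOutputKeyClass() and setOutputValueClass().
3. I do not know how to use the RecordReader for SequenceFiles; it always fails.
4. When no reduce step is configured, the system by default writes both key and value as text.
5. I do not yet know how to set the split size.
6. I do not yet know how to override the file-splitting and record read/write functions.
7. I was unclear whether input and output paths refer to HDFS or to the local PC in different runtime environments. In Eclipse's virtual cloud a path without hdfs:// is a local path; on a real cluster every path refers to HDFS.
8. Building a large data set by having a program merge many small files into a sequence file on the cluster takes a very long time. If the finished sequence file is downloaded to the local disk, it can simply be uploaded again next time, which is much faster.
9. A large data set (64 MB blocks) was not actually distributed across the worker nodes, because Eclipse uses the virtual cloud by default. The code needs extra configuration before the job really runs on the cluster.
10. Handling class not found, jar not found, and jobs showing as none on the port 50030 page while running: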
http://os.51cto.com/art/201305/392965.htm
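If the program uses native shared libraries (.so files), make sure the runtime environment is set up on every node before running the job, especially when the cluster mixes 64-bit and 32-bit machines.
First compile your custom library code for each platform (particularly when JNI is involved); otherwise the tasks assigned to those nodes will still fail.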
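Compiling and packaging:
external jar paths + Java source paths + class output directory
(Compile the .java files into .class files, then package the .class files into a .jar file; the jar can have any name.)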
$ javac -classpath /home/sunny/usr/hadoop-1.2.1/hadoop-core-1.2.1.jar:/home/sunny/usr/hadoop-1.2.1/lib/commons-cli-1.2.jar -d ./classes ./src/test/example/Facelib.java ./src/newMatching/*.java
$ jar -cvf newMatching.jar -C ./classes/ .
Running on the cluster:
jar file + package name of the main class.main class name
sunny@MASTERPC:~/usr/hadoop-1.2.1/bin$ ./hadoop jar /home/sunny/workspace/eclipse/newMatching/newMatching.jar newMatching.MatchDoer
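In distributed computation a datanode has to download the job's jar file from HDFS to its local disk to compute locally, so the jar must be submitted along with the job from the terminal.
Run on Hadoop in Eclipse executes on Eclipse's virtual cloud and does not generate a jar file.
Problems you may hit during distributed execution:
1. class not found: add job.setJarByClass(xxx.class); to the job code (xxx is your own main class).
2. native lib .so not found: if the program uses native shared libraries, make sure the library files are installed on every node, especially when JNI is used.
3. Clicking Run on Hadoop in Eclipse uses the local file system (LocalFileSystem) by default; unless a file path is explicitly prefixed with hdfs://master:9000/,
it refers to the local disk. If you add the following to the code:
 conf.set("fs.default.name", "hdfs://master:9000/"); // automatically prefixes your file paths with hdfs://master:9000/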
conf.set("hadoop.job.user","yourname");
conf.set("mapred.job.tracker","master:9001");
then file paths refer to the HDFS distributed file system by default.
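The effect of fs.default.name on scheme-less paths can be pictured with a small plain-Java sketch. It does not use the Hadoop API: qualify is a hypothetical helper written only for illustration, /user/sunny/input is a made-up path, and relative paths (which Hadoop resolves against a working directory) are deliberately left out.

```java
import java.net.URI;

public class DefaultFsSketch {
    /**
     * Hypothetical helper (not a Hadoop API): qualifies an absolute path
     * against a default file system URI, roughly the way a path without a
     * scheme is interpreted once fs.default.name is set.
     */
    public static String qualify(String path, String defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return path; // already explicit, e.g. hdfs://... or file:///...
        }
        URI base = URI.create(defaultFs);
        // file:/// has no authority; hdfs://master:9000/ has "master:9000"
        String authority = base.getAuthority() == null ? "" : base.getAuthority();
        return base.getScheme() + "://" + authority + path;
    }

    public static void main(String[] args) {
        // Local default, as when running inside Eclipse without conf.set
        System.out.println(qualify("/user/sunny/input", "file:///"));
        // HDFS default, as after conf.set("fs.default.name", ...)
        System.out.println(qualify("/user/sunny/input", "hdfs://master:9000/"));
        // An explicit scheme always wins over the default
        System.out.println(qualify("hdfs://master:9000/tmp/out", "file:///"));
    }
}
```

The same string therefore points at different storage depending on the default file system, which is why a job that works in Eclipse can fail to find its input on the cluster.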
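In summary:
1. Hadoop inside Eclipse is just a Java program running locally (files are stored on the local disk, no JobTracker schedules the job, and the program runs sequentially);
2. After adding conf.set the file system is HDFS, but there is still no JobTracker; the local machine is the only node running map, and the program runs sequentially;
3. Only submitting the job from the terminal triggers the JobTracker and real distributed computation.
>>Development process:
1. Write the code in Eclipse;
2. Get it running in Eclipse; (Run on Hadoop)
3. Package the class files in the bin directory generated by Eclipse into a .jar file; (jar -cvf yourname.jar -C ./bin .)
4. Submit the job on the command line for distributed execution; (./hadoop jar yourname.jar packagename.classname args1 args2)
Steps 1 to 3 are the coding and debugging stage; step 4 is the real run. ^.^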
[Screenshots: left, the job run by master alone; right, the job run in distributed mode]
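Another approach:
Add to the code: conf.set("mapred.job.tracker", "master:9001");
After that, Run on Hadoop in Eclipse executes on the cluster (rather than mapping only on the current node), but the other nodes throw a class not found exception (I am not sure whether it is the jar or the main class that cannot be found);
Right-click the Eclipse project, go to Java Build Path -> Add External JARs, and select the jar you generated with jar -cvf; then Run on Hadoop executes in distributed mode.
With this method, every time the program changes you must rebuild the jar first and then Run on Hadoop in Eclipse.
(Not sure this is right: Java programs all run on a VM and do not need to know where Hadoop is installed, so running the job-related program is enough to submit the job; the jar is for the other nodes, while the Java program is what runs locally.
Running the Java program directly simply does not read Hadoop's configuration files by default, so HDFS and the JobTracker must be specified explicitly in the program.)
When developing in Eclipse it is confusing when Hadoop's configuration files take effect: when you click Run directly in Eclipse, some options in the files under Hadoop's conf directory have no effect.
They have to be set in code with conf.set("name", value);, but in standalone mode even conf.set("mapred.child.java.opts","-Xss1m"); has no effect, and the VM arguments must be set in Run Configurations.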
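One plausible way to picture why the conf files seem to be ignored is as layered settings: built-in defaults first, then the site files if they are found on the classpath, then whatever the code sets. The sketch below models that layering with java.util.Properties only; it is not Hadoop's Configuration class, the method name effective is made up, and the assumption that Eclipse runs without the conf directory on the classpath is mine.

```java
import java.util.Properties;

public class ConfigLayeringSketch {
    /**
     * Hypothetical model of layered configuration. Later layers override
     * earlier ones: built-in defaults, then site files, then values set
     * in code. Pass null for a layer that is absent.
     */
    public static Properties effective(Properties siteFiles, Properties setInCode) {
        Properties conf = new Properties();
        // Built-in defaults: local file system and local job runner
        conf.setProperty("fs.default.name", "file:///");
        conf.setProperty("mapred.job.tracker", "local");
        if (siteFiles != null) {
            conf.putAll(siteFiles);   // e.g. core-site.xml, mapred-site.xml
        }
        if (setInCode != null) {
            conf.putAll(setInCode);   // e.g. conf.set(...) in main
        }
        return conf;
    }

    public static void main(String[] args) {
        // Assumed Eclipse case: no site files visible, nothing set in code
        System.out.println(effective(null, null).getProperty("mapred.job.tracker"));

        // Same program after adding conf.set lines in main
        Properties inCode = new Properties();
        inCode.setProperty("fs.default.name", "hdfs://master:9000/");
        inCode.setProperty("mapred.job.tracker", "master:9001");
        System.out.println(effective(null, inCode).getProperty("mapred.job.tracker"));
    }
}
```

Under this model the first run falls back to the local job runner and the second targets master:9001, which matches the behavior described in these notes.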
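The source code shows that for hadoop jar xxx.jar xxx, the hadoop script handles the jar argument by invoking the RunJar class, whose source is under src/core/org/apache/hadoop/util;
its main function deals with (args, jar, manifest, unjar, runtime).
After all that trouble the program runs in a truly distributed environment, but new problems appear:
1. Using local disk paths in the program: use LocalFileSystem; Java APIs such as FileInputStream also use local paths by default.
2. Debug output: Eclipse's virtual environment runs on a single machine, so the program behaves like an ordinary Java program and all printed output appears in the Eclipse console.
When the program runs in the distributed environment, debug output from System.out, System.err and so on in main appears in the terminal that submitted the job, but debug output from the mapper and reducer does not.
To see it you have to go to the JobTracker web page on port 50030 and look in the log of the corresponding map task.
Reference: http://blog.csdn.net/azhao_dn/article/details/8003998
3. Changing the program: after a change the jar must be rebuilt before the change takes effect.
4. In the distributed environment the program does not run only on the local node, so make sure the data, shared libraries, and anything else it needs are available on every node.
Given these points, using Eclipse's virtual cloud during development greatly improves development and debugging efficiency.
After the Hadoop services start, each node in the cluster reads the configuration files under conf in its own Hadoop installation directory, including the settings in hdfs-site.xml.
Once the real distributed environment is configured, if some datanodes fail to start after the HDFS service is started, check whether the Hadoop configuration files are identical on all nodes.
(To think over: every node needs the same user name, the same Hadoop installation directory, and the same Hadoop configuration files; the Java path can be anything.)
You can also try deleting the Hadoop-related directories under /tmp on every node, deleting the contents of yourhadooppath/logs/, and running namenode -format again.
Options related to the namenode and jobtracker only matter on the master node; the other HDFS-related options take effect on every node.
Configuration files, log files, and tmp files all matter for the nodes running properly.
I still do not understand the options in Hadoop's configuration files very well.
Sometimes all 4 nodes start normally, but after stop-all.sh and a restart, with some nodes powered off and on again before their services are started, it is often only the last machine started that manages to bring up
its dfs node. For this kind of problem, where datanodes fail to start fully because the nodes were not restarted in sync, restart everything and run namenode -format.
When the cluster has many nodes, reduce does not wait for the mapper phase to finish completely; it may well start once most of the mappers are done.
After setNumReduceTasks(0) removes the reduce phase, the final output is unsorted!
Each mapper writes its result to its own final file, so there are as many part-m-00000 style files as mappers.
The node that runs a reduce task is assigned according to the actual runtime situation and is not fixed.
If the uploaded files are small (<64 MB), the system neither splits them further nor merges them; each uploaded file is handed to a mapper as one unit.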
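Things to watch when writing programs:
When looping repeatedly to generate files, the data volume is large, so where variables are allocated (inside or outside the loop) matters a lot; otherwise memory will overflow.
Always close file streams when you are done with them.
Always new a declared variable before using it; otherwise you get a null pointer exception.
If a mapper or reducer has member variables whose use depends on their initial value (for example applying |, &, ^, or ++ directly), remember to reinitialize them after each unit of work.
This matters especially in reduce, which is called with a key plus a list of values: once all values of one key are processed, reset the variables to get ready for the next key!
Hadoop's efficiency depends not only on the number of nodes but also on how tasks execute within the job, file reads and writes, how the data is partitioned, network transfer, cutting unnecessary work,
and the efficiency of each block of code.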
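The note above about member variables in a reducer is easy to reproduce without Hadoop. The sketch below is a plain-Java stand-in: the class and method names are hypothetical, and the two reduce methods only imitate how one reducer instance is called once per key with that key's list of values.

```java
import java.util.Arrays;
import java.util.List;

public class ReducerStateSketch {
    // Member variable shared by every reduce call on this instance
    private int flags;

    // Buggy variant: starts from whatever the previous key left behind
    public int reduceWithoutReset(String key, List<Integer> values) {
        for (int v : values) {
            flags |= v;
        }
        return flags;
    }

    // Fixed variant: reinitializes the state at the start of each key
    public int reduceWithReset(String key, List<Integer> values) {
        flags = 0;
        for (int v : values) {
            flags |= v;
        }
        return flags;
    }

    public static void main(String[] args) {
        ReducerStateSketch buggy = new ReducerStateSketch();
        System.out.println(buggy.reduceWithoutReset("a", Arrays.asList(1, 2))); // 3
        System.out.println(buggy.reduceWithoutReset("b", Arrays.asList(4)));    // 7, polluted by key "a"

        ReducerStateSketch fixed = new ReducerStateSketch();
        System.out.println(fixed.reduceWithReset("a", Arrays.asList(1, 2)));    // 3
        System.out.println(fixed.reduceWithReset("b", Arrays.asList(4)));       // 4
    }
}
```

Using a local variable inside the method instead of a field avoids the problem altogether, when the state does not need to outlive one key.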
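Configuring the maximum number of parallel map and reduce tasks:
In mapred-site.xml set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum; both default to 2.
Keep reduce < map: this leaves some spare nodes and shortens job recovery time after a failure. If reduce > map, the surplus reduce tasks have no input but still
start and produce empty files, wasting system resources and efficiency.
Merging many small intermediate key/value pairs into fewer, larger ones reduces network traffic (for reduce). The benefit is very visible with big data, and the web UI
shows how many key/value pairs each phase processed.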
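How much a map-side merge shrinks the intermediate data can be seen in a small word-count sketch. This is plain Java rather than a Hadoop Combiner class; mapOutput and combine are hypothetical names, and the input string is invented for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {
    // Imitates a word-count map phase: one (word, 1) pair per word
    public static List<Map.Entry<String, Integer>> mapOutput(String text) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : text.split(" ")) {
            pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Imitates a combiner: sums the values of equal keys on the map side
    public static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = mapOutput("a b a c a b");
        Map<String, Integer> combined = combine(pairs);
        System.out.println("pairs before combine: " + pairs.size());    // 6
        System.out.println("pairs after combine:  " + combined.size()); // 3
        System.out.println(combined);                                   // {a=3, b=2, c=1}
    }
}
```

Only the combined pairs would cross the network to the reducers, so the saving grows with the number of repeated keys per mapper. Summation works here because it is associative and commutative; not every reduce function can be reused as a combiner.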
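The MapReduce framework:
InputFormat (file splitting + key/value generation) -> map -> sort -> combine (values with the same key are accumulated) -> partitioner (hash-based grouping)
-> sort + build the value list per key -> reduce (produce key/value pairs) -> OutputFormat (write back to HDFS)
System defaults along the way:
TextInputFormat: treats the input as text files and produces one key/value pair per line of text
IdentityMapper: outputs the input key/value pairs unchanged
Combiner: null, no merging of intermediate results
Partitioner: HashPartitioner, groups by hash
IdentityReducer: outputs the intermediate key/value pairs directly as key/value pairs
TextOutputFormat: writes the final key/value pairs to a text file, one <key, value> pair per line, with key and value separated by a tab.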
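Two of the defaults listed above are simple enough to write out in plain Java. The partition formula below is, to my understanding, the one HashPartitioner uses (hash code with the sign bit masked off, modulo the number of reduce tasks); the class and method names are my own, and outputLine only mirrors the tab-separated layout described for TextOutputFormat.

```java
public class HashPartitionSketch {
    // Picks the reduce task for a key: non-negative hash modulo task count
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // One output line in the default text layout: key, tab, value
    public static String outputLine(Object key, Object value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> reduce task " + partitionFor(key, reducers));
        }
        // Masking the sign bit keeps negative hash codes inside 0..reducers-1
        System.out.println(partitionFor(Integer.valueOf(-7), reducers));
        System.out.println(outputLine("apple", 3));
    }
}
```

Because the partition depends only on the key, all values of one key end up at the same reducer, which is what makes the per-key value list possible.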
---
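setOutputKeyClass and setOutputValueClass set the types of the final key and value (?); the map output types set by setMapOutputKeyClass default to the same types as the final key/value pairs.
The key/value types affect how the data is parsed, interpreted, and processed.
When programming, a combiner is the same as a reducer apart from the class name (the combiner's input has probably also gone through the intermediate sort and grouping).
Without a combiner, reduce still receives <key, list<value>>; the list is simply longer because nothing was merged on the map side.
In summary: the default mapred framework parses the input text file line by line, sorts, and finally writes one <key, value> per line to a text file.
The default key type is LongWritable and the default value type is Text; a long can be converted for text output, and what is shown is still the long value.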
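What a job with all defaults produces can be imitated in a few lines of plain Java. The sketch assumes a single input split and a single reducer, treats the key as the byte offset of each line (as TextInputFormat is generally described), and uses a made-up class name and input; it is an imitation, not Hadoop code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DefaultJobSketch {
    /**
     * Imitates the all-defaults pipeline on one small file: each line
     * becomes (byte offset, line text), identity map and reduce pass it
     * through, and the output is "offset<TAB>line". The keys are already
     * ascending, so sorting by key leaves the order unchanged.
     */
    public static List<String> run(String fileContent) {
        List<String> output = new ArrayList<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            output.add(offset + "\t" + line);
            // Next line starts after this line's bytes plus the newline
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return output;
    }

    public static void main(String[] args) {
        for (String line : run("banana\napple\n")) {
            System.out.println(line);
        }
    }
}
```

The run prints 0 and banana, then 7 and apple, each pair separated by a tab: the lines are not sorted alphabetically, because the sort is on the offset key. To sort by content the mapper has to emit the content as the key.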
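When the data volume is large, sorting locally inside the reduce function is often infeasible; such an algorithm does not generalize across data set sizes.
Customizing the general framework: process the input key/value pairs, then use the context to write your own key/value pairs for the next stage.
When programming a combiner, just treat it as a reducer.
Make full use of the system's built-in sorting.
If you choose Run as Java Application at this point, your program only runs in a virtual cloud environment inside Eclipse and never reaches the cluster, so the job cannot be monitored on the master:50030/jobtracker.jsp page. A few lines need to be added to the main method, listed below:
// automatically prefixes your file paths with hdfs://master:9000/
conf.set("fs.default.name", "hdfs://master:9000/");
conf.set("hadoop.job.user","mango");
// specifies the jobtracker's IP and port; master can be configured in /etc/hosts
conf.set("mapred.job.tracker","master:9001");
The MapReduce framework
The task scheduler hands tasks out to the nodes for parallel computation.
Input files are divided into splits for the Mapper class. Since data should be processed locally as far as possible, the split size should be <= the storage block size (64 MB).
After combine and shuffle the map's intermediate results are written directly to the local disk and the JobTracker is told where the intermediate files are; the JobTracker then tells each Reducer
which datanode to fetch intermediate results from. Each Reducer fetches from multiple Mapper nodes the intermediate results that fall within its range, then runs the reduce function to produce one final result file.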
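The split-size rule mentioned above can be sketched as plain arithmetic. The formula max(minSize, min(maxSize, blockSize)) and the 1.1 slack factor follow my understanding of FileInputFormat in the newer mapreduce API; the class and method names are hypothetical, the file sizes are invented, and files are assumed to be non-empty and splittable.

```java
public class SplitSizeSketch {
    private static final long MB = 1024L * 1024L;
    // A last chunk up to 10% larger than the split size is not cut again
    private static final double SPLIT_SLOP = 1.1;

    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits, and therefore map tasks, for one file
    public static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // the tail, possibly slightly larger than splitSize
        }
        return splits;
    }

    public static void main(String[] args) {
        long splitSize = computeSplitSize(64 * MB, 1, Long.MAX_VALUE);
        System.out.println(countSplits(200 * MB, splitSize)); // 4
        System.out.println(countSplits(70 * MB, splitSize));  // 1, within the 10% slack
        System.out.println(countSplits(10 * MB, splitSize));  // 1, small files are never merged
    }
}
```

Lowering maxSize below the block size yields more, smaller splits; raising minSize above it yields fewer, larger ones at the cost of data locality.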
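Execution process:
Work before the job runs:
1. Write the MapReduce code, compile it, and package it as a jar
2. Upload the data to HDFS; the distributed file system automatically divides large files into blocks for distributed storage (FileBlockLocations)
Submitting the job from the terminal:
The OutputFormat checks whether the job's output directory already exists, as an error check.
The InputFormat divides the logical files under the job's input path on HDFS into logical splits (a List<split> records each split's offset in its source file and the hosts it resides on)
The MapReduce jar, the configuration files, and the split list are uploaded to a directory on HDFS (named after the Job ID)
Job submission is complete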
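The output-directory check in the submission steps above can be imitated on the local file system. The sketch uses java.nio.file instead of HDFS, and checkOutputDir is a hypothetical stand-in for the check the OutputFormat performs; the exception type and message are mine.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputCheckSketch {
    // Refuses to continue when the output directory is already there
    public static void checkOutputDir(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        // A directory left over from an earlier run
        Path leftover = Files.createTempDirectory("job-output");
        try {
            checkOutputDir(leftover);
            System.out.println("job would be submitted");
        } catch (IOException e) {
            System.out.println("submission rejected: " + e.getMessage());
        } finally {
            Files.delete(leftover);
        }
    }
}
```

Failing early like this is cheaper than discovering the conflict after the map phase has run, which is why a rerun needs either a fresh output path or the old directory removed first.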
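JobTracker:
Puts the job submitted from the terminal into the job queue
When the current job starts, first reads the InputSplit information from the job directory and creates one Map task per InputSplit according to the actual runtime situation
Creates the reduce and other tasks
TaskTracker
On receiving a new task, copies the task's program jar and data split from HDFS to the local disk and computes locally.
A good page on environment configuration: http://www.cnblogs.com/xia520pi/archive/2012/05/20/2510723.html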