Uni3D: Exploring Unified 3D Representation at Scale - Unite.AI

Scaling up representations of text and visuals has been a major focus of research in recent years. Recent developments and research have led to numerous breakthroughs in language learning and vision. However, despite the popularity of scaling text and visual representations, scaling representations for 3D scenes and objects has not been sufficiently discussed.

Today, we will discuss Uni3D, a 3D foundation model that aims to explore unified 3D representations. The Uni3D framework employs a 2D-initialized ViT, pretrained end-to-end, to align image-text features with their corresponding 3D point cloud features.

The Uni3D framework uses pretext tasks and a simple architecture to leverage the abundance of pretrained 2D models and image-text aligned models as initializations and targets, respectively. This approach unleashes the full potential of 2D models and of the strategies used to scale them for the 3D world.

In this article, we will delve deeper into 3D computer vision and the Uni3D framework, exploring the essential concepts and the architecture of the model. So, let's begin.

Uni3D and 3D Representation Learning: An Introduction

In the past few years, computer vision has emerged as one of the most heavily invested domains in the AI industry. Following significant advances in 2D computer vision frameworks, developers have shifted their focus to 3D computer vision. This field, particularly 3D representation learning, merges aspects of computer graphics, machine learning, computer vision, and mathematics to automate the processing and understanding of 3D geometry. The rapid development of 3D sensors such as LiDAR, along with their widespread applications in the AR/VR industry, has drawn increased attention to 3D representation learning. Its potential applications continue to grow daily.

Although existing frameworks have shown remarkable progress in 3D model architecture, task-oriented modeling, and learning objectives, most explore 3D architectures at a relatively small scale with limited data, parameters, and task scenarios. The challenge of learning scalable 3D representations that can then be applied to real-time applications in diverse environments remains largely unexplored.

Meanwhile, over the past few years, scaling up pretrained large language models has helped revolutionize the natural language processing domain, and recent work shows that this progress carries over from language to 2D vision through data and model scaling. This gives developers a reason to try to repeat that success in 3D, learning a representation that scales and transfers to real-world applications.

Uni3D is a scalable, unified 3D pretraining framework developed to learn large-scale 3D representations. It tests its limits at a scale of over a billion parameters, over 10 million images paired with over 70 million texts, and over a million 3D shapes. The figure below compares zero-shot accuracy against parameter count for the Uni3D framework. Uni3D successfully scales 3D representations from 6 million to over a billion parameters.

The Uni3D framework uses a 2D ViT (Vision Transformer) as the 3D encoder, which is then pretrained end-to-end to align image-text aligned features with 3D point cloud features. It relies on pretext tasks and a simple architecture to leverage the abundance of pretrained 2D models and image-text aligned models as initializations and targets respectively, unleashing the full potential of 2D models and of the strategies used to scale them for the 3D world. The flexibility and scalability of the Uni3D framework are measured in terms of:

  1. Scaling the model from 6M to over a billion parameters.
  2. 2D initialization ranging from visual self-supervised learning to text-supervised learning.
  3. Text-image target models scaling from 150 million to over a billion parameters.

Under the flexible and unified framework offered by Uni3D, developers observe a consistent boost in performance as each component is scaled. Large-scale 3D representation learning also benefits immensely from the shareable 2D models and scale-up strategies.

As can be seen in the figure below, the Uni3D framework shows a boost in performance compared to prior art in both few-shot and zero-shot settings. It is worth noting that Uni3D achieves a zero-shot classification accuracy of over 88% on ModelNet, which is on par with the performance of several state-of-the-art supervised methods.

Furthermore, the Uni3D framework also delivers top-notch accuracy and performance on other representative 3D tasks such as part segmentation and open-world understanding. Uni3D aims to bridge the gap between 2D vision and 3D vision by scaling 3D foundation models with a unified yet simple pretraining approach, learning more robust 3D representations across a wide array of tasks. This might ultimately help 2D and 3D vision converge across a wide array of modalities.

Uni3D: Related Work

The Uni3D framework draws inspiration from, and builds on, the developments made by earlier 3D representation learning work and by foundation models, especially across different modalities.

3D Representation Learning

3D representation learning methods use point clouds to understand 3D objects. Developers have explored this field extensively in the recent past, and it has been observed that point cloud models can be pretrained under self-supervision using specific 3D pretext tasks, including masked point modeling, self-reconstruction, and contrastive learning.

It is worth noting that these methods work with limited data, and they often do not investigate multimodal representations that bring 2D or NLP knowledge into 3D. However, the recent success of the CLIP framework, which learns visual concepts from raw text with high efficiency using contrastive learning, has inspired further work that seeks to learn 3D representations by aligning image, text, and point cloud features with the same contrastive learning method.

Foundation Models

Developers have been working exhaustively on designing foundation models to scale up and unify multimodal representations. For example, in the NLP domain, developers have been working on frameworks that scale up pretrained language models, and this is steadily revolutionizing the NLP industry. Advances can be observed in the 2D vision domain as well, where developers are working on frameworks that use data and model scaling techniques to carry the progress from language over to 2D models. Such frameworks are difficult to replicate for 3D models, however, because of the limited availability of 3D data and the challenges encountered when unifying and scaling up 3D frameworks.

Learning from these two lines of work, developers created the Uni3D framework, the first 3D foundation model with over a billion parameters. It uses a unified ViT (Vision Transformer) architecture, which allows developers to scale the Uni3D model with the unified 2D or NLP strategies for scaling up models. Developers hope that this approach will allow Uni3D to bridge the gap that currently separates 2D and 3D vision while also facilitating multimodal convergence.

Uni3D: Method and Architecture

The image above gives a general overview of the Uni3D framework, a scalable and unified 3D pretraining framework for large-scale 3D representation learning. Developers use over 70 million texts and 10 million images paired with over a million 3D shapes to scale Uni3D to over a billion parameters. The framework uses a 2D ViT (Vision Transformer) as a 3D encoder that is then trained end-to-end to align text-image data with 3D point cloud features, allowing Uni3D to deliver the desired efficiency and accuracy across a wide array of benchmarks. Let us now take a detailed look at how the Uni3D framework works.

Scaling the Uni3D Framework

Prior studies on point cloud representation learning have traditionally focused heavily on designing particular model architectures that deliver better performance across a wide range of applications, and they work with a limited amount of data because the available datasets are small. Recent studies have tried exploring scalable pretraining in 3D, but they produced no major outcomes because of the limited availability of 3D data. To solve the scalability problem of 3D frameworks, the Uni3D framework leverages a vanilla transformer structure that almost mirrors a Vision Transformer, so it can scale the model size with the same unified scaling-up strategies used in 2D or NLP.
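To make "scaling the model size" concrete, the sketch below estimates how many parameters the blocks of a vanilla transformer hold for a given width and depth. The width and depth pairs are the standard ViT Tiny, Small, Base, and Large configurations, assumed here purely for illustration; the article does not list Uni3D's exact configurations, and the estimate ignores embeddings, biases, and normalization layers.

```python
# Rough parameter estimate for the blocks of a vanilla transformer.
# Assumption: each block holds 4*d*d attention weights (Q, K, V, output)
# and 8*d*d MLP weights (hidden size 4*d). Biases, normalization layers,
# and embeddings are ignored.

def block_params(width: int, depth: int) -> int:
    attention = 4 * width * width
    mlp = 2 * width * (4 * width)
    return depth * (attention + mlp)

# Standard ViT configurations as (width, depth); assumed, not from the article.
VIT_CONFIGS = {
    "tiny": (192, 12),
    "small": (384, 12),
    "base": (768, 12),
    "large": (1024, 24),
}

def estimate_all() -> dict:
    return {name: block_params(w, d) for name, (w, d) in VIT_CONFIGS.items()}

if __name__ == "__main__":
    for name, count in estimate_all().items():
        print(f"{name:>5}: {count / 1e6:.1f}M parameters")
```

Under these assumptions the tiny configuration comes out slightly above 5 million block parameters, in the neighborhood of the 6 million figure cited for the smallest model, and each step up in width or depth grows the count roughly quadratically in width and linearly in depth.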

Initializing Uni3D

Another major challenge encountered by prior work on scaling 3D representations was the difficulty of convergence and the overfitting that resulted from the large size of the models. One effective approach to overcome this hurdle is to pretrain individual 3D backbones with specific 3D pretext tasks and then initialize the model with the pretrained parameters. However, this approach comes with high training costs, and it is also difficult to establish a robust initialization for cross-modal learning because of the limited amount of 3D data available for training.

The Uni3D framework leverages a vanilla transformer whose structure closely resembles ViT. With this approach, Uni3D can naturally adopt large pretrained models from other modalities to initialize its own weights.
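The idea of reusing a pretrained 2D model can be sketched as copying every parameter whose name and size match into the 3D encoder, while the 3D-specific input layer keeps its fresh initialization. This is a minimal illustration with plain dictionaries standing in for checkpoints; all parameter names below are hypothetical and are not taken from the Uni3D code.

```python
# Sketch: initialize a 3D encoder from a pretrained 2D ViT checkpoint.
# Checkpoints are plain dicts here; every parameter name is hypothetical.

def init_from_2d(encoder_3d: dict, vit_2d: dict) -> list:
    """Copy parameters whose name and size match; return the copied names."""
    copied = []
    for name, weight in vit_2d.items():
        if name in encoder_3d and len(encoder_3d[name]) == len(weight):
            encoder_3d[name] = list(weight)
            copied.append(name)
    return copied

# Toy checkpoints: the transformer blocks are shared, the input layers differ.
vit_2d = {
    "patch_embed.weight": [0.1, 0.2, 0.3],  # 2D-specific, not reused
    "blocks.0.attn.weight": [0.5, 0.6],
    "blocks.0.mlp.weight": [0.7, 0.8],
}
encoder_3d = {
    "point_embed.weight": [0.0, 0.0],       # 3D-specific, left untouched
    "blocks.0.attn.weight": [0.0, 0.0],
    "blocks.0.mlp.weight": [0.0, 0.0],
}

copied = init_from_2d(encoder_3d, vit_2d)
print(copied)
```

Because the transformer blocks share the same structure in both models, only the input layer that turns points into tokens has to be learned from scratch in this sketch.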

Multi-Modal Alignment

The Uni3D framework attempts to learn multi-modal alignment across images, language, and point clouds using paradigms similar to the OpenShape and ULIP frameworks. Furthermore, to ensure a fair comparison with other methods, Uni3D trains on the ensembled 3D dataset from OpenShape. This ensembled dataset consists of four 3D datasets:

  1. Objaverse. 
  2. ShapeNet. 
  3. 3D-FUTURE. 
  4. ABO. 
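Aligning point cloud features with image and text features is typically done with a contrastive objective, as mentioned in the related work section. The sketch below shows one common form, a symmetric CLIP-style loss applied once against image embeddings and once against text embeddings. It is an assumption about the general recipe rather than Uni3D's exact loss, and the temperature value and toy embeddings are made up for illustration.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_rows(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def contrastive_loss(points, targets, temperature=0.07):
    """Symmetric contrastive loss between paired embeddings.

    The temperature of 0.07 is an assumed value, not one from the article.
    """
    p = [normalize(v) for v in points]
    t = [normalize(v) for v in targets]
    logits = [[sum(a * b for a, b in zip(pi, tj)) / temperature for tj in t]
              for pi in p]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(transposed))

def alignment_loss(point_emb, image_emb, text_emb):
    """Pull each shape towards its paired image and its paired text."""
    return (contrastive_loss(point_emb, image_emb)
            + contrastive_loss(point_emb, text_emb))

# Toy batch of two shapes with their paired image and text embeddings.
point_emb = [[1.0, 0.0], [0.0, 1.0]]
image_emb = [[0.9, 0.1], [0.1, 0.9]]
text_emb = [[1.0, 0.2], [0.2, 1.0]]

aligned = alignment_loss(point_emb, image_emb, text_emb)
shuffled = alignment_loss(point_emb, image_emb[::-1], text_emb[::-1])
print(aligned, shuffled)
```

The loss is small when each shape embedding is closest to its own image and text, and large when the pairs are shuffled, which is the signal that drives the point cloud encoder towards the image-text embedding space.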

Experiments and Results

The Uni3D framework is tested across different settings and various classification tasks, including zero-shot and few-shot classification, open-world understanding, and more. Let's take a detailed look at these results.

Zero-Shot Shape Classification

To evaluate the performance of the Uni3D framework on zero-shot shape classification, the developers conduct experiments on three benchmark datasets: ModelNet, ScanObjectNN, and Objaverse-LVIS. ModelNet and ScanObjectNN are widely used for classification tasks and consist of 40 and 15 object categories respectively, whereas Objaverse-LVIS is a cleaned and annotated dataset of over 40,000 objects across 1,100+ categories. The comparison between the frameworks is shown in the image below, and as can be seen, the Uni3D framework significantly outperforms the previous state-of-the-art frameworks across different settings.
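In a shared embedding space, zero-shot classification usually reduces to comparing a shape embedding with the text embeddings of the category names and picking the closest one. The sketch below illustrates that step under this assumption; the three-dimensional embeddings and category names are toy values, whereas in practice they would come from the 3D encoder and the text encoder.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(shape_embedding, text_embeddings):
    """Return the category whose text embedding is closest to the shape."""
    return max(text_embeddings,
               key=lambda name: cosine(shape_embedding, text_embeddings[name]))

# Toy text embeddings for three category names (made-up values).
text_embeddings = {
    "chair": [0.9, 0.1, 0.0],
    "table": [0.1, 0.9, 0.1],
    "lamp": [0.0, 0.2, 0.9],
}

# Toy embedding of an unlabeled 3D shape.
shape = [0.8, 0.3, 0.1]
prediction = zero_shot_classify(shape, text_embeddings)
print(prediction)
```

No classifier is trained here: adding a new category only requires embedding its name, which is why the same model can be evaluated on benchmarks with very different label sets.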

Few-Shot Linear Probing

In AI, linear probing is a common method for evaluating the representations that a framework or model learns. To evaluate Uni3D's linear probing ability, the developers freeze the parameters of the Uni3D framework, using the same settings as OpenShape. They then train a linear classifier on top of Uni3D using few-shot class labels. The figure below shows the linear probing ability of different frameworks on the Objaverse-LVIS dataset, reporting the average performance of each model across 10 random seeds. As can be seen, the Uni3D framework significantly outperforms existing methods under different few-shot settings.
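Linear probing can be sketched as follows: the encoder stays frozen, so its outputs are treated as fixed feature vectors, and only a linear softmax classifier is trained on a handful of labeled examples. The features, learning rate, and epoch count below are toy assumptions chosen for illustration, not the settings used in the experiments.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear_scores(weights, biases, x):
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, biases)]

def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=200):
    """Train only a linear classifier; the features are never updated."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(linear_scores(weights, biases, x))
            for c in range(num_classes):
                # Gradient of the cross-entropy loss w.r.t. the class score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
                biases[c] -= lr * grad
    return weights, biases

def predict(weights, biases, x):
    scores = linear_scores(weights, biases, x)
    return scores.index(max(scores))

# Toy frozen features: two labeled shots for each of two classes.
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels = [0, 0, 1, 1]

weights, biases = train_linear_probe(features, labels, num_classes=2)
print(predict(weights, biases, [0.95, 0.05]), predict(weights, biases, [0.05, 0.9]))
```

Since the classifier is this simple, its accuracy mostly reflects how well the frozen features already separate the classes, which is what makes linear probing a useful test of representation quality.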

Open-World Understanding

To evaluate how well the Uni3D framework understands real-world shapes and objects, developers test it on the ScanNet dataset, following the CLIP2 evaluation setup. It is worth noting that the ground-truth instance segmentation is available, and the goal is to recognize the category of every individual instance in each scene in a zero-shot setting. The results are shown in the image below. As can be seen, the Uni3D framework delivers exceptional results in real-world understanding and recognition, outperforming existing frameworks by a significant margin despite never being trained on real-world datasets.

Cross-Modal Retrieval

The multi-modal representations learned by the Uni3D framework allow it to retrieve 3D shapes naturally from either texts or images. To retrieve 3D shapes, the model calculates the cosine similarity between the embeddings of the 3D shapes and the embedding of a query text prompt or query image. The framework then uses the KNN (K Nearest Neighbors) algorithm to return the 3D shapes that most resemble the query, and the results are shown in the figure below. As can be seen, the Uni3D framework successfully uses real-world images to retrieve 3D shapes. It is worth noting that the training images are only renderings, so the gap between real-world images and training images is substantial. Additionally, the model can take two input images and retrieve shapes similar to both by computing the cosine similarity between the average of the two image embeddings and the 3D shape embeddings. The results are interesting because they demonstrate Uni3D's ability to learn diverse 3D representations and perceive multiple 2D signals.

In the first column, the framework uses query images to return the 3D shapes most similar to them. In the second column, the framework uses two input images to retrieve 3D shapes that resemble both. In the final column, the model uses text queries and returns the 3D shapes that most closely match each text.
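The retrieval procedure described above can be sketched in a few lines: average the query embeddings (a single query is simply its own average), rank the shapes by cosine similarity to that average, and keep the top k. The shape names and embeddings are toy values made up for illustration, and averaging the raw embeddings without normalizing them first is a simplifying assumption.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embeddings, shape_embeddings, k=2):
    """Average the queries, then return the ids of the k nearest shapes."""
    dim = len(query_embeddings[0])
    mean = [sum(q[j] for q in query_embeddings) / len(query_embeddings)
            for j in range(dim)]
    ranked = sorted(shape_embeddings,
                    key=lambda sid: cosine(mean, shape_embeddings[sid]),
                    reverse=True)
    return ranked[:k]

# Toy gallery of 3D shape embeddings (made-up values).
shape_embeddings = {
    "mug": [1.0, 0.0, 0.0],
    "teapot": [0.7, 0.7, 0.0],
    "car": [0.0, 0.0, 1.0],
    "kettle": [0.1, 1.0, 0.0],
}

# Toy image (or text) query embeddings in the same space.
image_a = [1.0, 0.1, 0.0]
image_b = [0.1, 1.0, 0.0]

single = retrieve([image_a], shape_embeddings, k=1)
combined = retrieve([image_a, image_b], shape_embeddings, k=2)
print(single, combined)
```

With one query the nearest shape is the one matching that image, while averaging two queries moves the search point between them, so the top result is the shape that resembles both inputs.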

Final Thoughts

In this article, we have talked about Uni3D, a scalable and unified 3D pretraining framework developed to learn large-scale 3D representations. It tests its limits at a scale of over a billion parameters, over 10 million images paired with over 70 million texts, and over a million 3D shapes. The developers built the framework on a vanilla transformer whose structure is equivalent to a ViT, which allows them to scale up Uni3D using unified 2D or NLP scaling strategies. Furthermore, the Uni3D framework can carry a wide array of pretrained 2D models and 2D strategies over to the 3D world. The experimental results already demonstrate the huge potential of the Uni3D framework, which returns accurate and efficient results across a wide array of settings and outperforms existing state-of-the-art frameworks.

"An engineer by profession, a writer by heart". Kunal is a technical writer with a deep love and understanding of AI and ML, dedicated to simplifying complex concepts in these fields through his engaging and informative documentation.