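Collaborating institutions include Tsinghua University, the University of Chinese Academy of Sciences, Shanghai Jiao Tong University, and Alibaba. The first author is Dongshuo Yin, a "Shuimu Scholar" postdoctoral researcher in the Department of Computer Science at Tsinghua University, who holds a PhD from the Chinese Academy of Sciences. He has published first-author papers in international journals and conferences including Nature Communications, IEEE CVPR, IEEE ICCV, ACM MM, and IEEE TITS, and serves as a reviewer for NeurIPS, CVPR, ICCV, ICLR, IEEE TIP, IEEE TMM, and others. He received the President's Award of the Chinese Academy of Sciences and has collaborated on research with Microsoft Research Asia (MSRA) and Alibaba Group. His research interests include computer vision, parameter-efficient fine-tuning, video generation, multimodal learning, and remote sensing image interpretation.

Mona (Multi-cognitive Visual Adapter) is a new adapter-based fine-tuning method for vision models. It aims to break the performance ceiling that conventional full fine-tuning imposes on visual recognition tasks.

By introducing multi-cognitive visual filters and optimizing the input distribution, Mona tunes only 5% of the backbone parameters yet outperforms full fine-tuning on several classic vision tasks, including instance segmentation, object detection, and oriented object detection. This markedly reduces adaptation and storage costs and offers a new approach to efficient fine-tuning of vision models.

Paper highlights

As modern deep learning has developed, larger training sets and larger models have become the main drivers of model performance. The price is that applying and fine-tuning these models in vertical domains has become more costly and more difficult.

Conventional full fine-tuning updates every parameter of the model (for example, the 175 billion parameters of GPT-3), so its compute cost is extremely high. Even with an early model such as BERT, training on 1 million samples on a single GPU takes 5-7 hours. These demands on hardware and time limit both the reproduction of research and practical deployment.

Moreover, as parameter counts move from hundreds of millions toward trillions, direct fine-tuning is not only expensive but can also degrade performance through overfitting. In multi-task settings, a complete copy of the model has to be stored for every task, so storage costs grow sharply.

Parameter-efficient fine-tuning (PEFT) keeps the pretrained parameters frozen and tunes only a small number of parameters, which allows large models to be adapted efficiently to vertical applications. However, most current PEFT methods, especially those for vision, still lag behind full fine-tuning in performance.

With a design better suited to visual signal processing and a dynamic optimization of the pretrained feature distribution, Mona is the first to surpass full fine-tuning while tuning fewer than 5% of the parameters, providing a new solution for visual fine-tuning.

The paper emphasizes three points: (1) PEFT can raise the performance ceiling of vision models, especially models with many parameters; (2) vision models overfit severely under full fine-tuning, especially in few-shot settings; (3) the "1×LVM + n×Adapter" pattern has potential performance and efficiency advantages in real-world applications.

In practice, some applications that use an LVM or a multimodal large model (OCR, for example) freeze the visual encoder or fine-tune only the linear layers to adapt to downstream data. In theory, Mona can further improve how LVMs and multimodal large models understand and reconstruct visual features, especially in few-shot post-training scenarios.

Method

Mona consists of a down-projection, multi-cognitive visual filters, an activation function, and an up-projection, with skip connections added inside the adapter to strengthen the model's adaptability. This structure lets Mona stay efficient while significantly improving performance on vision tasks.

Multi-cognitive visual filters

The core of Mona is the multi-cognitive visual filters. They use depth-wise convolutions with multi-scale kernels (3×3, 5×5, 7×7) to improve the adapter's ability to process visual signals. Unlike conventional linear adapters, Mona is designed specifically for vision tasks: it handles two-dimensional visual features better and improves the model's understanding of visual information by fusing features at multiple scales.

Input optimization

Mona adds a distribution-adaptation layer (Scaled LayerNorm) at the front of the adapter to adjust the distribution of the input features. This optimizes the feature distribution passed on from the frozen layers so that it is better suited to the adapter, which improves fine-tuning efficiency.

Experimental results

Experimental setup

The paper reports experiments on several representative vision tasks. The experiments use the SwinTransformer family as the backbone, pretrained on the ImageNet-22k dataset.

Performance comparison

Convergence analysis

Among all methods compared, Mona converges faster and clearly outperforms full fine-tuning.

Plug-and-play module

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# ------------------------------ Mona module ------------------------------

INNER_DIM = 64  # bottleneck width of the adapter


class MonaOp(nn.Module):
    """Multi-cognitive visual filter: depth-wise 3x3, 5x5 and 7x7 convolutions."""

    def __init__(self, in_features):
        super().__init__()
        # groups=in_features makes each convolution depth-wise.
        self.conv1 = nn.Conv2d(in_features, in_features, kernel_size=3, padding=3 // 2, groups=in_features)
        self.conv2 = nn.Conv2d(in_features, in_features, kernel_size=5, padding=5 // 2, groups=in_features)
        self.conv3 = nn.Conv2d(in_features, in_features, kernel_size=7, padding=7 // 2, groups=in_features)
        # 1x1 convolution that mixes channels after the depth-wise filters.
        self.projector = nn.Conv2d(in_features, in_features, kernel_size=1)

    def forward(self, x):
        identity = x
        conv1_x = self.conv1(x)
        conv2_x = self.conv2(x)
        conv3_x = self.conv3(x)

        # Average the three scales and add the first skip connection.
        x = (conv1_x + conv2_x + conv3_x) / 3.0 + identity

        identity = x
        x = self.projector(x)

        # Second skip connection around the 1x1 projection.
        return identity + x


# The source subclasses `BaseModule`, which it never imports (presumably from
# an OpenMMLab toolkit); nn.Module is used here so the snippet runs on PyTorch alone.
class Mona(nn.Module):
    def __init__(self, in_dim, factor=4):  # `factor` is unused in the source
        super().__init__()
        self.project1 = nn.Linear(in_dim, INNER_DIM)  # down-projection
        self.nonlinear = F.gelu
        self.project2 = nn.Linear(INNER_DIM, in_dim)  # up-projection
        self.dropout = nn.Dropout(p=0.1)
        self.adapter_conv = MonaOp(INNER_DIM)

        # Scaled LayerNorm: learnable weights on the normalized and raw inputs.
        self.norm = nn.LayerNorm(in_dim)
        self.gamma = nn.Parameter(torch.ones(in_dim) * 1e-6)
        self.gammax = nn.Parameter(torch.ones(in_dim))

    def forward(self, x, hw_shapes=None):
        # x: (batch, tokens, channels); hw_shapes: (height, width), height * width == tokens
        identity = x

        x = self.norm(x) * self.gamma + x * self.gammax

        project1 = self.project1(x)

        # Reshape the token sequence into a 2D feature map for the convolutions.
        b, n, c = project1.shape
        h, w = hw_shapes
        project1 = project1.reshape(b, h, w, c).permute(0, 3, 1, 2)
        project1 = self.adapter_conv(project1)
        project1 = project1.permute(0, 2, 3, 1).reshape(b, n, c)

        nonlinear = self.nonlinear(project1)
        nonlinear = self.dropout(nonlinear)
        project2 = self.project2(nonlinear)

        # The return expression is cut off in the source; adding the saved
        # `identity` (the adapter's outer skip connection) is the assumed completion.
        return identity + project2
```

Conclusion

Through multi-cognitive visual filters and input optimization, Mona significantly improves fine-tuning performance on vision tasks while greatly reducing the number of parameters that have to be tuned. The method not only outperforms conventional full fine-tuning on multiple vision tasks but also points to a new direction for efficient fine-tuning of vision models.

While still a preprint, Mona was already adopted as a SOTA method in work from Fudan University, the University of Science and Technology of China, Nanjing University, Wuhan University, and other institutions, in fields such as medicine and remote sensing. Mona's open-source code will further advance research and applications in this area.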
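
The "1×LVM + n×Adapter" pattern highlighted above can be illustrated with a small sketch. It is not the authors' training code and rests on several assumptions: `TinyAdapter`, `AdaptedBlock`, `freeze_all_but_adapters`, and `trainable_fraction` are hypothetical names, the backbone is a toy stack of MLP blocks rather than a Swin Transformer, and the adapter is a plain bottleneck standing in for Mona. It shows only the general mechanics: freeze the pretrained weights, train the adapters, and store just the adapter weights for each task.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyAdapter(nn.Module):
    """Hypothetical stand-in adapter: down-project, GELU, up-project, skip."""

    def __init__(self, dim, inner_dim=8):
        super().__init__()
        self.down = nn.Linear(dim, inner_dim)
        self.up = nn.Linear(inner_dim, dim)

    def forward(self, x):
        return x + self.up(F.gelu(self.down(x)))


class AdaptedBlock(nn.Module):
    """Wraps a pretrained block and applies a trainable adapter to its output."""

    def __init__(self, block, dim):
        super().__init__()
        self.block = block
        self.adapter = TinyAdapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))


def freeze_all_but_adapters(model):
    # Parameters registered under an attribute named "adapter" stay trainable;
    # everything else (the toy "pretrained" weights) is frozen.
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name


def trainable_fraction(model):
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable / total


dim = 256
# Toy backbone (assumption): four MLP blocks, each wrapped with an adapter.
backbone = nn.Sequential(*[
    AdaptedBlock(
        nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)),
        dim,
    )
    for _ in range(4)
])

freeze_all_but_adapters(backbone)
frac = trainable_fraction(backbone)
print(f"trainable fraction: {frac:.4f}")

# Per-task storage: only the adapter weights need to be saved for each task.
adapter_state = {k: v for k, v in backbone.state_dict().items() if "adapter" in k}
print(f"adapter tensors to store per task: {len(adapter_state)}")
```

With this layout, each additional task adds only one `adapter_state` to storage, while the frozen backbone weights are shared across tasks.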