| Name | File | # |
| --- | --- | --- |
| Artpedia: A New Visual-Semantic Dataset with Visual and Contextual Sentences in the Artistic Domain | e31e124d-bbd7-987f-e053-3705fe0a095a | 926 |
| Spaghetti Labeling: Directed Acyclic Graphs for Block-Based Connected Components Labeling | e31e124d-c7be-987f-e053-3705fe0a095a | 872 |
| Towards Reliable Experiments on the Performance of Connected Components Labeling Algorithms | e31e124d-3934-987f-e053-3705fe0a095a | 852 |
| Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era | e31e124c-e62a-987f-e053-3705fe0a095a | 834 |
| Shot and Scene Detection via Hierarchical Clustering for Re-using Broadcast Video | e31e124b-0f5e-987f-e053-3705fe0a095a | 777 |
| Automatic Image Cropping and Selection using Saliency: an Application to Historical Manuscripts | e31e124d-038f-987f-e053-3705fe0a095a | 761 |
| Meshed-Memory Transformer for Image Captioning | e31e124e-51a5-987f-e053-3705fe0a095a | 711 |
| Modeling Multimodal Cues in a Deep Learning-based Framework for Emotion Recognition in the Wild | e31e124d-2fe2-987f-e053-3705fe0a095a | 654 |
| Optimized Connected Components Labeling with Pixel Prediction | e31e124c-2440-987f-e053-3705fe0a095a | 530 |
| Layout analysis and content classification in digitized books | e31e124c-98c2-987f-e053-3705fe0a095a | 471 |
| Historical Document Digitization through Layout Analysis and Deep Content Classification | e31e124c-21e5-987f-e053-3705fe0a095a | 459 |
| Connected Components Labeling on DRAGs | e31e124d-5133-987f-e053-3705fe0a095a | 435 |
| Hand Segmentation for Gesture Recognition in EGO-Vision | e31e124c-22cf-987f-e053-3705fe0a095a | 429 |
| Image-to-Image Translation to Unfold the Reality of Artworks: an Empirical Analysis | e31e124d-b759-987f-e053-3705fe0a095a | 417 |
| Learning to Read L'Infinito: Handwritten Text Recognition with Synthetic Training Data | e31e124f-d7d1-987f-e053-3705fe0a095a | 416 |
| YACCLAB - Yet Another Connected Components Labeling Benchmark | e31e124c-2289-987f-e053-3705fe0a095a | 384 |
| Visual Saliency for Image Captioning in New Multimedia Services | e31e124d-305f-987f-e053-3705fe0a095a | 368 |
| Multi-Level Net: a Visual Saliency Prediction Model | e31e124c-2297-987f-e053-3705fe0a095a | 362 |
| What was Monet seeing while painting? Translating artworks to photo-realistic images | e31e124d-bbb5-987f-e053-3705fe0a095a | 358 |
| LAMV: Learning to align and match videos with kernelized temporal layers | e31e124d-446a-987f-e053-3705fe0a095a | 356 |
| A Video Library System Using Scene Detection and Automatic Tagging | e31e124c-f49b-987f-e053-3705fe0a095a | 350 |
| Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions | e31e124f-ab5d-987f-e053-3705fe0a095a | 336 |
| A Hierarchical Quasi-Recurrent approach to Video Captioning | e31e124d-7858-987f-e053-3705fe0a095a | 333 |
| M-VAD Names: a Dataset for Video Captioning with Naming | e31e124d-bbb3-987f-e053-3705fe0a095a | 328 |
| Connected Components Labeling on DRAGs: Implementation and Reproducibility Notes | e31e124d-9e95-987f-e053-3705fe0a095a | 324 |
| Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions | e31e124d-b3fa-987f-e053-3705fe0a095a | 321 |
| Measuring scene detection performance | e31e124b-1ae1-987f-e053-3705fe0a095a | 308 |
| A Browsing and Retrieval System for Broadcast Videos using Scene Detection and Automatic Annotation | e31e124c-294a-987f-e053-3705fe0a095a | 291 |
| Analysis and Re-use of Videos in Educational Digital Libraries with Automatic Scene Detection | e31e124c-298a-987f-e053-3705fe0a095a | 280 |
| The Unreasonable Effectiveness of CLIP features for Image Captioning: an Experimental Analysis | a943c66a-788a-49ab-90aa-183459851f07 | 272 |
| Gesture Recognition in Ego-Centric Videos using Dense Trajectories and Hand Segmentation | e31e124c-2876-987f-e053-3705fe0a095a | 270 |
| Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model | e31e124d-66dd-987f-e053-3705fe0a095a | 267 |
| Context Change Detection for an Ultra-Low Power Low-Resolution Ego-Vision Imager | e31e124c-2282-987f-e053-3705fe0a095a | 264 |
| Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation | e31e124d-b3f8-987f-e053-3705fe0a095a | 258 |
| Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks | e31e124c-2497-987f-e053-3705fe0a095a | 216 |
| Hierarchical Boundary-Aware Neural Encoder for Video Captioning | e31e124d-3733-987f-e053-3705fe0a095a | 203 |
| A Deep Multi-Level Network for Saliency Prediction | e31e124c-2946-987f-e053-3705fe0a095a | 195 |
| Multimodal Attention Networks for Low-Level Vision-and-Language Navigation | e31e124f-9e2f-987f-e053-3705fe0a095a | 187 |
| From Show to Tell: A Survey on Deep Learning-based Image Captioning | d26a40e8-6a72-410f-be39-2f2313456c3c | 184 |
| SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability | e31e124e-7540-987f-e053-3705fe0a095a | 178 |
| Learning to Select: A Fully Attentive Approach for Novel Object Captioning | e31e124f-a3d5-987f-e053-3705fe0a095a | 171 |
| Shot, scene and keyframe ordering for interactive video re-use | e31e124f-3935-987f-e053-3705fe0a095a | 170 |
| Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features | e31e124b-922a-987f-e053-3705fe0a095a | 169 |
| Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis | e31e124f-980b-987f-e053-3705fe0a095a | 168 |
| Improving Indoor Semantic Segmentation with Boundary-level Objectives | e31e124f-c5b7-987f-e053-3705fe0a095a | 164 |
| Explaining Digital Humanities by Aligning Images and Textual Descriptions | e31e1250-62b0-987f-e053-3705fe0a095a | 159 |
| Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters | e31e124d-baa3-987f-e053-3705fe0a095a | 145 |
| Assessing the Role of Boundary-level Objectives in Indoor Semantic Segmentation | e31e124f-df41-987f-e053-3705fe0a095a | 145 |
| Explore and Explain: Self-supervised Navigation and Recounting | e31e124f-c015-987f-e053-3705fe0a095a | 138 |
| SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning | 620f2945-aa20-4a3a-a40e-af017100253a | 125 |
| Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates | a99b6a8b-c907-49e7-ac67-4aed1196b9d3 | 124 |
| Out of the Box: Embodied Navigation in the Real World | e31e124f-d7d3-987f-e053-3705fe0a095a | 116 |
| Working Memory Connections for LSTM | e31e124f-f685-987f-e053-3705fe0a095a | 115 |
| A Novel Attention-based Aggregation Function to Combine Vision and Language | e31e124f-b7c8-987f-e053-3705fe0a095a | 105 |
| Dual-Branch Collaborative Transformer for Virtual Try-On | 7bba21eb-0494-44f6-ac66-092e4b086be9 | 100 |
| Focus on Impact: Indoor Exploration with Intrinsic Motivation | e31e124f-d645-987f-e053-3705fe0a095a | 96 |
| Unveiling the Impact of Image Transformations on Deepfake Detection: An Experimental Analysis | 5e39d149-d12a-480e-8014-c9d734d065a8 | 73 |
| Towards Explainable Navigation and Recounting | 0d9c276e-68d4-4826-90bd-c73a3549c594 | 60 |
| Embodied Navigation at the Art Gallery | e31e1250-97b3-987f-e053-3705fe0a095a | 59 |
| ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval | c05bfbcb-7a53-482d-9d81-d682bdaead40 | 42 |
| CaMEL: Mean Teacher Learning for Image Captioning | c2e71732-a7e1-44a8-a605-cbf655a9f90d | 41 |
| Retrieval-Augmented Transformer for Image Captioning | 183d1cd2-2511-416e-91eb-df4d9ef1fad1 | 40 |
| Video action detection by learning graph-based spatio-temporal interactions | ecc4305b-eaf2-42f7-82b9-a28d4c374bfe | 30 |
| A Computational Approach for Progressive Architecture Shrinkage in Action Recognition | c90e761e-64aa-45eb-9bf6-4b85a62c1d23 | 26 |
| Gesture Recognition using Wearable Vision Sensors to Enhance Visitors' Museum Experiences | e31e1250-86dc-987f-e053-3705fe0a095a | 25 |
| Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions | 6c606afc-bf53-4a18-9cd8-8a6f61e198ad | 23 |
| Spot the Difference: A Novel Task for Embodied Agents in Changing Environments | f82b842f-9944-4898-a5a0-c3b4528041b4 | 19 |
| Explaining Transformer-based Image Captioning Models: An Empirical Analysis | 7b16781d-cc40-4c2e-aebb-e4dc80d6310b | 18 |
| A Deep Siamese Network for Scene Detection in Broadcast Videos | e31e124c-28d8-987f-e053-3705fe0a095a | 18 |
| Scene segmentation using temporal clustering for accessing and re-using broadcast video | e31e124b-153f-987f-e053-3705fe0a095a | 13 |
| Towards Video Captioning with Naming: a Novel Dataset and a Multi-Modal Approach | e31e124c-b2f5-987f-e053-3705fe0a095a | 13 |
| Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation | 89b61423-1732-474b-bd88-0f73ba3372ab | 12 |
| Gesture Recognition using Wearable Vision Sensors to Enhance Visitors' Museum Experiences | e31e124c-294d-987f-e053-3705fe0a095a | 12 |
| Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model | e31e124f-cd60-987f-e053-3705fe0a095a | 12 |
| Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets | 919b12fe-eb45-43a7-aaad-5e5a1ad1d201 | 5 |
| Superpixel Positional Encoding to Improve ViT-based Semantic Segmentation Models | 0ec0790d-c001-4f14-9650-92e2c2070c72 | 4 |
| Sharing Cultural Heritage—The Case of the Lodovico Media Library | 69dc0d9d-16c5-4193-b5af-204077b6e76c | 4 |
| Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention | e31e124f-c2a7-987f-e053-3705fe0a095a | 4 |
| NeuralStory: an Interactive Multimedia System for Video Indexing and Re-use | e31e124c-9e64-987f-e053-3705fe0a095a | 3 |
| Towards Retrieval-Augmented Architectures for Image Captioning | 87ac878f-a524-4e61-8b38-2ce438cf2905 | 2 |
| Matching Faces and Attributes Between the Artistic and the Real Domain: the PersonArt Approach | 8bdbf72e-ac09-4e04-ab1c-6dd51aa8782e | 2 |
| Explaining Digital Humanities by Aligning Images and Textual Descriptions | e31e124f-3e22-987f-e053-3705fe0a095a | 2 |
| Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation | 6df891d3-7cd7-4dae-b087-8570a4115837 | 1 |
| Video Surveillance and Privacy: A Solvable Paradox? | 986e799a-0b07-4152-94df-5ac4bbdb0c27 | 1 |
| A Deep-learning-based approach to VM behavior Identification in Cloud Systems | e31e124d-da34-987f-e053-3705fe0a095a | 1 |
| **Total** | | **19,512** |