
Hi, I have the following dataframes: A, B, C and X. A, B and C are independent, each containing about 10 million rows on average. X is shared and is joined with each of A, B and C.

Steps:
Join A and X, do processing
Join B and X, do processing
Join C and X, do processing
As part of the processing, I am doing calculations on columns.

Would multithreading be beneficial here?

6 answers


No, you probably want multiprocessing.

That probably depends on the processing that you need to do, since computers are fast. I'd recommend you implement the whole thing without any parallelization first, and just see how long it takes to process maybe 10k rows. There's a good chance everything is quick enough that you don't really need anything else. Even if the process takes a few hours, if you don't have to run it regularly, you're probably going to save time by just letting it run and doing something else instead of optimizing your code. And should this end up slow, you'll probably need the parts you've written anyway.

If you do end up needing parallelization, as someone else pointed out, I think you'd need to use multiprocessing, not multithreading. Python has the Global Interpreter Lock (GIL), which in short means that only one thread of Python code can run in one process (with one interpreter) at once. Multiprocessing runs multiple interpreters, which can actually spread the work across multiple CPU cores in parallel.

There's a lot of nuance here. For example, the Python wiki says that some kinds of work that are not run in Python directly (like disk access or calculations in NumPy) are not affected by the GIL, and so can benefit from multithreading. Also, your storage or even the network might end up being the bottleneck that takes up the most time, in which case parallelization won't do much.

TLDR: Do a simple, non-parallelized implementation first. If it's too slow, you'll probably benefit more from multiprocessing than multithreading, but both can be worth a try.
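The "measure a small sample first" advice could look roughly like the sketch below. It is only an illustration: plain lists and dicts stand in for the dataframes, and the names `join_and_process` and `time_sample`, the `key`/`value` columns and the multiplication are all made up here, not taken from the question. The extrapolation to 10 million rows assumes the cost grows linearly, which may not hold for a real join.

```python
import time


def join_and_process(rows, lookup):
    """Inner-join rows to lookup on 'key', then compute a derived column."""
    out = []
    for row in rows:
        match = lookup.get(row["key"])
        if match is not None:  # inner join: drop rows with no match in X
            out.append({"key": row["key"], "total": row["value"] * match})
    return out


def time_sample(rows, lookup, sample_size=10_000):
    """Time the pipeline on the first sample_size rows.

    Returns (elapsed_seconds, number_of_rows_timed).
    """
    sample = rows[:sample_size]
    start = time.perf_counter()
    join_and_process(sample, lookup)
    return time.perf_counter() - start, len(sample)


if __name__ == "__main__":
    # Hypothetical stand-ins: X maps key -> weight, A has 100k rows.
    x = {k: k % 7 + 1 for k in range(1_000)}
    a = [{"key": i % 1_200, "value": i} for i in range(100_000)]

    elapsed, n = time_sample(a, x)
    # Rough linear extrapolation to 10 million rows; treat it as an estimate.
    estimate = elapsed / n * 10_000_000
    print(f"{n} rows took {elapsed:.4f}s, roughly {estimate:.1f}s for 10M rows")
```

If the estimate is already acceptable, there is no need for threads or processes at all.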

Akshay-Dalvi (question author)
Fayaz Khan
No, you probably want multiprocessing.

Can you please share a tutorial for this?

Akshay-Dalvi (question author)
Akshay Dalvi
Can you please share a tutorial for this?

https://docs.python.org/3/library/multiprocessing.html
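As a starting point alongside the docs, here is a minimal sketch of running the three independent joins in separate processes. It uses `concurrent.futures.ProcessPoolExecutor` from the standard library, which is built on `multiprocessing`. The data layout (lists of `(key, value)` tuples, X as a dict) and the names `join_and_sum` and `run_parallel` are assumptions for illustration only; real dataframe processing would replace the loop body.

```python
from concurrent.futures import ProcessPoolExecutor


def join_and_sum(task):
    """Join one dataset to X on its key and return (name, sum of value * weight).

    Must be a top-level function so it can be pickled and sent to a worker.
    """
    name, rows, lookup = task
    total = 0
    for key, value in rows:
        weight = lookup.get(key)
        if weight is not None:  # inner join: skip keys missing from X
            total += value * weight
    return name, total


def run_parallel(datasets, lookup, max_workers=3):
    """Process each named dataset in its own worker process."""
    tasks = [(name, rows, lookup) for name, rows in datasets.items()]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(join_and_sum, tasks))


if __name__ == "__main__":
    # The guard matters: with the spawn/forkserver start methods the main
    # module is re-imported in each worker, and an unguarded pool would fail.
    x = {k: k % 5 + 1 for k in range(100)}  # hypothetical stand-in for X
    datasets = {
        name: [(i % 120, i) for i in range(50_000)]  # stand-ins for A, B, C
        for name in ("A", "B", "C")
    }
    print(run_parallel(datasets, x))
```

Note that each task's data is pickled and copied to the worker process, so with inputs of about 10 million rows the transfer cost itself is worth measuring before committing to this approach.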

