Search Results

  • How machine learning is a ‘requisite’ for ad fraud detection

    Marketers and advertisers are inundated with data today. They collect data at every touchpoint a consumer makes: click data, install data, engagement data, and so on. Today, marketers are involved in two major activities: (1) using click and install data, they keep experimenting with different forms of campaigns to drive bigger volumes down the digital funnel; and (2) using engagement data, they study engagement channels and messaging throughout the lifecycle of a digital consumer to achieve a higher LTV (lifetime value). However, this is not enough to establish whether the incoming data is attributed correctly. Vanity metrics today are merely numbers on a digital dashboard, and their correctness is widely suspect. Finding attribution manipulation can be difficult, and reducing the behaviour of traffic to a fixed baseline is nearly impossible. Because of this, and because of the sheer size of the data, the problem requires a machine-led understanding of the data over time. Filters, boundary conditions, thresholds and similar rules give a good descriptive statistical understanding of the data at hand and support rule-based anomaly detection. However, they miss out on predictive and prescriptive data science.
AI in Advertising Fraud
To build a true machine learning model, one must look at the data very closely and build a homogeneous learning model that uses only consumer-journey behaviour as its learning variables.
Examples of common ad fraud schemes in which ML helps:
Some sub-publisher mobile applications track consumers' keyword searches in the Google Play Store or iOS App Store, and if a consumer searches for a particular advertiser that is actively running performance-led campaigns, a click is generated. These clicks hijack traffic from other networks and steal organic traffic as well. Learning on CTIT (click-to-install time) alone might not be enough to highlight such an anomaly, as these hijacks generally have a CTIT of more than 20 seconds.
Some sub-publisher mobile applications track customers' APK changes: when a customer installs a particular Android or Apple app package, a click is generated to hijack this traffic. Such clicks generally fall within the CTIT anomaly limit of 20 seconds, but a back-timed click is sometimes injected to claim the attribution. Where performance campaigns are measured on install and engagement KPIs, APK drops are a common way of acquiring new customers from tier 2, tier 3 and rural India. The sub-publisher mobile application works as adware and takes the rights to install new APKs on the mobile device. These installs are generally sold in a grey market at around half a cent each, as many marketers are aware. Marketers opt for them to reach the 1 million install mark quickly or to rank high on Google Play or the Apple App Store. However, these installs are generally not brand-safe and might allow data theft.
Finding attribution manipulation is not easy. The objective behind click manipulation is to hijack the last-click attribution model for monetary gain. Anyone with contacts among these adware/malware-enabled apps can grow a business in no time, which in turn fuels corruption, fake news, tax evasion and cross-border cyber warfare. Detecting these kinds of traffic sources is a must: such traffic incentivises the above, and it also harms digital consumers and the country in the long run.
As published in: Financial Express
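The rule-based CTIT check that the article contrasts with machine learning can be sketched in a few lines of Python. The record layout, publisher IDs and sample timestamps are hypothetical; only the 20-second limit comes from the text. Note how an install with a CTIT above 20 seconds, like the keyword-search hijack described earlier, passes the rule untouched, which is the article's argument for learning from the whole consumer journey instead.

```python
from dataclasses import dataclass
from typing import Dict, List

# The 20-second CTIT limit mentioned in the article; would need tuning per app/market.
CTIT_LIMIT_SECONDS = 20.0


@dataclass
class AttributedInstall:
    publisher: str      # hypothetical sub-publisher ID
    click_ts: float     # attributed click time, seconds since epoch
    install_ts: float   # install / first-open time, seconds since epoch

    @property
    def ctit(self) -> float:
        """Click-to-install time (CTIT) in seconds."""
        return self.install_ts - self.click_ts


def short_ctit_installs(installs: List[AttributedInstall],
                        limit: float = CTIT_LIMIT_SECONDS) -> List[AttributedInstall]:
    """Rule-based filter: installs whose attributed click came suspiciously
    close to the install (the pattern described for APK-change hijacks)."""
    return [i for i in installs if i.ctit < limit]


def short_ctit_share(installs: List[AttributedInstall],
                     limit: float = CTIT_LIMIT_SECONDS) -> Dict[str, float]:
    """Per-publisher share of installs under the CTIT limit: a descriptive
    statistic that could feed a model rather than replace one."""
    totals: Dict[str, int] = {}
    short: Dict[str, int] = {}
    for i in installs:
        totals[i.publisher] = totals.get(i.publisher, 0) + 1
        if i.ctit < limit:
            short[i.publisher] = short.get(i.publisher, 0) + 1
    return {p: short.get(p, 0) / n for p, n in totals.items()}


# Hypothetical sample data.
installs = [
    AttributedInstall("pub_a", click_ts=0, install_ts=6),      # click lands right before install
    AttributedInstall("pub_a", click_ts=100, install_ts=112),
    AttributedInstall("pub_b", click_ts=0, install_ts=45),     # over 20 s: passes the rule
    AttributedInstall("pub_b", click_ts=0, install_ts=3_600),
]
print(short_ctit_share(installs))  # {'pub_a': 1.0, 'pub_b': 0.0}
```

A fixed threshold like this is exactly the "descriptive" layer the article describes; it labels pub_a but says nothing about pub_b.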

  • Why is bot measurement important for brands

    According to Elon Musk, Parag Agrawal, the CEO of Twitter, has publicly refused to present proof that fewer than 5% of Twitter's accounts are fake. As a result, Tesla CEO Elon Musk's deal to buy Twitter for approximately $44 billion might not go through, owing to the massive mismatch between Twitter's total and real user base. In fact, earlier this year Twitter also admitted that it had been overestimating its user base by 1.9 million users every quarter for the previous three years. In March 2019, the platform launched a feature that allowed people to link multiple accounts together so they could conveniently switch between them. However, an error made at the time meant that an action taken in the primary account resulted in all linked accounts being counted as mDAU (monetizable daily active users). This has once again drawn attention to the prevalence of bots and fake accounts on social media platforms, and it raises a bigger question: are Twitter and other social media platforms as transparent as they are supposed to be? The answer is crucial, since many enterprises and brands spend 30-40 percent of their digital ad budgets on social media and influencer marketing, unaware that a significant share of their social media audience could be fake.
What exactly is a bot?
To measure the prevalence of fake accounts and bots on Twitter, a clear definition of each is necessary. Fake accounts are those that impersonate people. Bots, on the other hand, are accounts that are partially controlled by software and can automatically post content or carry out simple interactions, like retweeting. Bots are meant either to skew enterprises' advertising metrics or to blend in with existing human traffic to increase financial gain for ad fraudsters. Bots can be used to inflate impression, click and engagement metrics, and they are often mixed with traffic that is hijacked, either from organic sources or via last-click attribution.
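To see how a linking error like the one described can inflate a daily-active count, here is a deliberately simplified Python model. It is not Twitter's actual metric pipeline: the account names are invented, and it assumes that mDAU should count only accounts that took an action themselves.

```python
from typing import Dict, Iterable, Set


def mdau_with_linked_account_bug(actions: Iterable[str],
                                 linked: Dict[str, Set[str]]) -> int:
    """Simplified model of the reported error: an action in a primary
    account also marks every account linked to it as active."""
    active: Set[str] = set()
    for account in actions:
        active.add(account)
        active |= linked.get(account, set())
    return len(active)


def mdau_actions_only(actions: Iterable[str]) -> int:
    """Counts only accounts that performed an action themselves."""
    return len(set(actions))


# Hypothetical day: one person with three linked accounts, plus one other user.
linked_accounts = {"asha_main": {"asha_brand", "asha_alt"}}
todays_actions = ["asha_main", "asha_main", "ravi"]

print(mdau_with_linked_account_bug(todays_actions, linked_accounts))  # 4
print(mdau_actions_only(todays_actions))                              # 2
```

Two real people generate four "active users" under the buggy rule, which is how a small counting mistake compounds into millions of overstated users across a platform.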
If you are a website owner, it is your duty to provide content that your visitors and advertisers can trust. After all, advertisers and readers are like the two wheels of a bicycle for a website's bottom line, because together they generate revenue from its content. Invalid traffic (IVT), including bots, can seriously damage your relationship with your audience.
Importance of bot measurement
Bots and fake accounts are a known and growing evil in the digital ecosystem. They are becoming more sophisticated in today's world of emerging AI and ML technologies, adopting realistic, human-like personas. Automated bots can generate massive volumes of conversation, chats and trolling on social media platforms. While fake bots and accounts may appear to be an easy way to gain followers and promote your brand, they can actually dilute your brand's image and credibility. Bots can reduce engagement and lead to higher customer acquisition costs. Consider a campaign in which marketers spend money targeting and retargeting such bots: that portion of the budget reaches no real customers. Although digital is becoming the most important medium for advertisers, accuracy remains a challenge across social media platforms, and marketers should not judge performance and effectiveness solely on the numbers these platforms report.
What is the industry's strategy for dealing with bots?
Most of Twitter's revenue comes from selling ad space on its platform to global advertisers, and to attract advertisers it needs a huge and growing user base. In 2021, Twitter generated more than $4.5 billion in advertising revenue, up from $77 million a decade earlier. Leading car makers and other major brands partner with Twitter for marketing campaigns and product launches. Bots and fake accounts exist on all social media platforms, and this is an open secret among advertisers. Marketers now wonder whether the controversy will affect their enterprises and ad revenue.
Many advertisers employ bots for promotional activities to boost seemingly organic communication. However, many are consciously deciding not to use such bots, realising that doing so will hurt the brand in the long run. Customer lifetime value is becoming a key focus for enterprises, and they will have to look beyond campaign numbers. Enterprises do realise how damaging bot traffic can be to their digital advertising strategy: it leads businesses to waste money on fraudulent ad clicks that will not generate any revenue. Because distorted numbers can be terrible for enterprises, it is critical to understand how to identify bot traffic and protect your digital campaigns from it as effectively as possible. Advertisers who wish to drive performance via social media won't be affected right away. However, enterprises and brands may want to decrease their budgets or pull out until all of this is cleared up. There may not be an immediate impact, but in the long run revenue will suffer when the return on ad spend falls short of expectations.
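The point that bot clicks raise acquisition costs is simple arithmetic, sketched below in Python. The campaign figures and the 25% bot share are hypothetical; in practice the bot share would come from an IVT measurement, not a constant.

```python
def cost_per_click(spend: float, clicks: int) -> float:
    """Nominal cost per click, as a dashboard would report it."""
    return spend / clicks


def cost_per_human_click(spend: float, clicks: int, bot_share: float) -> float:
    """Cost per click after excluding the estimated share of bot clicks.

    bot_share is the fraction of clicks judged invalid (0 <= bot_share < 1).
    """
    if not 0.0 <= bot_share < 1.0:
        raise ValueError("bot_share must be in [0, 1)")
    human_clicks = clicks * (1.0 - bot_share)
    return spend / human_clicks


# Hypothetical campaign: 10,000 spent on 20,000 clicks, a quarter of them bots.
spend, clicks, bot_share = 10_000.0, 20_000, 0.25
print(round(cost_per_click(spend, clicks), 4))                   # 0.5
print(round(cost_per_human_click(spend, clicks, bot_share), 4))  # 0.6667
```

The same adjustment applies to customer acquisition cost: a bot share of b inflates the true cost of each real click or conversion by a factor of 1 / (1 - b), so a quarter of bot traffic makes every genuine click a third more expensive than the dashboard suggests.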

  • Clickjacking: Methods & Ways

    To begin, let's define what clickjacking means and how it can lead to ad fraud. Clickjacking, also known as a UI redress attack, is an attack in which an invisible malicious link or element is placed over the user interface of a website. It makes the target user click on parts of a page that are invisible or disguised as different items, and it can also facilitate other cyber-attacks, such as XSS. Classic clickjacking refers to a situation where a scammer deploys a hidden layer on a web page to manipulate the targeted user's clicks. Such attacks are generally carried out by placing invisible HTML elements or iframes on top of the page the user sees. The attacker loads the original page under a transparent overlay and prompts the user to take an action, even though the result is not what the user expects: the user believes they have clicked on the visible page, when they have actually clicked on an invisible item or been moved to another page. When the user clicks on the hijacked link, the attacker can, for example, start downloading malware onto the device. In another variant, the attacker targets an area of the screen where they know the user will click, hides the real cursor, displays a fake cursor in its place, and manipulates the screen so that the user does not realise they are clicking on a malicious link instead of something else. The successful Tweet bomb attack of 2009 ran as a continuous loop: users clicked a tweeted link to open a web page, clicked a button on that page, and thereby tweeted the link from their own account, encouraging their followers to click it in turn. A typical example, then, is a clickjacking page that causes users to take unwanted actions by clicking on a hidden link. Clickjacking is one of the leading causes of ad fraud in the tech industry.
In a similar attack, often called likejacking, a user who clicks on an apparently ordinary link is tricked into clicking a hidden Facebook button.
How does it work?
As we have seen, clickjacking is basically an interface-based scam, or attack, that targets users and deceives them into clicking on actionable content on a concealed website, or on additional content on a decoy website. For example, users may be told they can win prizes by clicking a link in an email or a button that takes them to the decoy page. Clickjacking, commonly referred to as a UI redress attack, involves scammers using multiple transparent or opaque layers to get users to click on the page the attacker wants, rather than on the button or link they see on the top-level page. An attacker can, for instance, trick a user into pressing a hidden button that makes a payment from their account on a website. Clickjacking is one of the most common ad fraud and click spam methods. In ad fraud it becomes a complex form of click spam, and it is even more insidious because the user's device may be hijacked by criminals to claim CPI (cost-per-install) payouts. In addition, click injection, a closely related technique, has long been one of the most popular types of CPI ad fraud. Click-injection malware can be hidden in apps, including seemingly legitimate apps downloaded from third-party app stores; it detects when a new app is being installed on the device and sends a fraudulent click report so that the fraudster is credited with the install. In other words, click injection is a click-to-install mobile ad fraud in which a fraudulent click report is sent immediately after the real user action, so that it wins last-click attribution. Click flooding (also known as click spam) is another type of scam, which occurs when bad actors report a large number of fraudulent clicks in the hope of obtaining credit for organic app installs.
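Click flooding leaves a recognisable footprint: a publisher reports huge click volumes that almost never convert, hoping that a few organic installs happen to fall inside the attribution window. A minimal Python sketch of that conversion-rate check follows; the publisher names, counts and both thresholds are hypothetical, and a real system would combine this with other signals such as the CTIT distribution.

```python
from typing import Dict


def flag_click_flooding(clicks: Dict[str, int],
                        installs: Dict[str, int],
                        min_clicks: int = 10_000,
                        max_conversion_rate: float = 0.001) -> Dict[str, float]:
    """Flag publishers that send many clicks but almost no installs.

    Returns {publisher: click-to-install conversion rate} for publishers with
    at least `min_clicks` clicks and a conversion rate below the cut-off.
    Both thresholds are hypothetical and would need tuning per campaign.
    """
    flagged: Dict[str, float] = {}
    for publisher, n_clicks in clicks.items():
        if n_clicks < min_clicks:
            continue  # too little volume to judge
        rate = installs.get(publisher, 0) / n_clicks
        if rate < max_conversion_rate:
            flagged[publisher] = rate
    return flagged


clicks = {"pub_a": 500_000, "pub_b": 20_000, "pub_c": 800}
installs = {"pub_a": 150, "pub_b": 400, "pub_c": 0}
print(flag_click_flooding(clicks, installs))  # {'pub_a': 0.0003}
```

pub_c is skipped rather than flagged because 800 clicks is too few to distinguish flooding from an ordinary low-performing placement.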
Clickjacking is classified as a user interface (UI) redress attack: a malicious technique that tricks users into clicking on something other than what they perceive, which can reveal sensitive information or let others take control of their computer or website accounts while they click on seemingly harmless objects. The most common clickjacking method is to present users with two or more layered websites or browser windows and give them some motivation to click at a specific location. The target web page is loaded in an iframe, so the browser window can be divided into parts in which different elements are shown or hidden, and the attack can be launched when needed. The attacker first loads the vulnerable web page into an iframe, makes it completely transparent, and places it in front of a malicious web page crafted so that clicks land in the right location. The attacker then lines up the invisible iframe with a harmless-looking link on the decoy page (such as a New York Times headline or a Digg button). When the victim clicks on the link, the click actually lands on the iframe. For example, an attacker may want to entice users into buying an item from a retail website, where the item must be added to the shopping cart before an order can be placed. This attack differs from a CSRF attack: in clickjacking the user must take an action, such as clicking a button, whereas in CSRF the entire request is forged without the user's knowledge or input. We have developed a new detection method for this type of attack, based on the behaviour and reaction of the active content on the website when the user clicks on a request. In our experiments, we found that our detection method can detect advanced attacks based on scalable vector graphics (SVG-based attacks) that most modern tools cannot. Having understood clickjacking, it is not hard to see why it is one of the most popular means of conducting ad fraud.
How can it be prevented?
A clickjacking scam or attack loads a page the targeted user trusts inside an iframe and overlays it, invisibly, on top of a decoy page. To ensure that your site cannot be used for clickjacking attacks, you must ensure that malicious sites cannot wrap it in an iframe. This can be done by instructing the browser directly via HTTP headers or, in older browsers, by using client-side JavaScript (frame busting). Some suggested approaches include:
1. Frame busting (frame breaking): Before support for the newer HTTP headers became widespread, website developers had to implement special frame-buster (or frame-killer) scripts to prevent their pages from being framed. A basic frame-busting script checks top.location to verify that the current page is the top-level window; if it is not, it sets top.location to self.location. However, these scripts are easily blocked or ignored by the framing page, so more complex solutions have been developed. Even so, there are still plenty of ways to bypass the more complex frame-busting scripts, and such scripts should only be used to provide basic protection for older browsers. The approach currently suggested by OWASP is to hide the entire body of the HTML document and show it only after verifying that the page is not framed.
2. X-Frame-Options: The best solution at this point may be to send the X-Frame-Options (XFO) HTTP response header in the server response. Microsoft originally introduced it in Internet Explorer 8, and it was later formalised in RFC 7034. The XFO header specifies whether the page can be embedded in a <frame>, <iframe>, <embed> or <object> element. The header supports three possible directives: DENY to block all framing attempts, SAMEORIGIN to allow framing only by pages from the same origin, or ALLOW-FROM uri to allow framing by pages at a specific URI.
However, several browsers (including Chrome and Safari) don't support ALLOW-FROM, so if you need to specify an allowed source, it's better to use CSP (see below). For overall anti-framing protection, you only need to send X-Frame-Options: DENY or X-Frame-Options: SAMEORIGIN in the server response headers.
3. Content Security Policy with frame-ancestors: The Content-Security-Policy (CSP) HTTP header was originally developed to prevent XSS and other data injection attacks. However, it also provides a frame-ancestors directive that specifies which sources are allowed to embed the page in a frame.
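To make the header-based defences concrete, here is a minimal sketch using only Python's standard-library http.server. The handler name NoFramingHandler, the helper forbids_all_framing, and the choice of DENY plus frame-ancestors 'none' are illustrative assumptions rather than a prescribed configuration; a site that legitimately frames its own pages would use SAMEORIGIN and frame-ancestors 'self' instead. The helper is simplified (it reads a single CSP header) but reflects the fact that browsers supporting CSP frame-ancestors ignore X-Frame-Options when both are present.

```python
from http.server import BaseHTTPRequestHandler

# Illustrative policy: refuse framing by any origin, including our own.
# X-Frame-Options covers older browsers; CSP frame-ancestors covers current ones.
ANTI_FRAMING_HEADERS = {
    "X-Frame-Options": "DENY",
    "Content-Security-Policy": "frame-ancestors 'none'",
}


def forbids_all_framing(headers: dict) -> bool:
    """Return True if these response headers forbid framing by any origin.

    Simplified: reads a single CSP header value. Browsers that support CSP
    frame-ancestors ignore X-Frame-Options when frame-ancestors is present,
    so the CSP directive is checked first.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        tokens = directive.strip().split()
        if tokens and tokens[0].lower() == "frame-ancestors":
            return [t.lower() for t in tokens[1:]] == ["'none'"]
    xfo = normalized.get("x-frame-options", "").strip().upper()
    return xfo == "DENY"


class NoFramingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that attaches the anti-framing headers to every page."""

    def do_GET(self):
        body = b"<h1>This page cannot be framed</h1>"
        self.send_response(200)
        for name, value in ANTI_FRAMING_HEADERS.items():
            self.send_header(name, value)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, run HTTPServer(("127.0.0.1", 8000), NoFramingHandler).serve_forever() (with HTTPServer imported from http.server), then load the page inside an <iframe> on another origin; a supporting browser should refuse to render it.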

  • The Importance of Ad Fraud Detection in the Digital Landscape

    Cybercrime and cyber threats do not have a very long or illustrious history; they only came into the picture when the internet began booming. Despite their novelty, their severity cannot be overstated. Cybersecurity experts predict that global cybercrime will keep rising and could result in losses of $10.5 trillion annually by 2025. As a result, more comprehensive and organised cybersecurity measures are needed to prevent rising crime on the internet. Furthermore, various industries in India, particularly in the aftermath of the COVID outbreak, are experiencing massive engagement with audiences and customers through digital means, which makes them increasingly vulnerable to digital fraud and attacks.
An Overview: Digital Ad Fraud
Cybercrime isn't confined to banking or financial fraud that cheats customers out of their hard-earned money. Fraudsters are becoming more inventive, developing new ways to steal money from marketers and major brands as well. Digital marketing was envisioned as the future of marketing and predicted to overtake traditional methods because of its immense potential. However, it has also become a hub for scammers looking to deceive marketers and brands. In 2021, losses of $59 billion due to digital ad fraud were reported. Networks of malicious bots cost brands money every day, at an alarming rate. As enterprises spend more and more on digital advertising, criminals looking to make quick money have taken notice: the more an enterprise spends on advertising, the more it stands to lose to digital ad fraud. Fraudsters use practices such as click injection, ad stacking and domain spoofing to defraud marketers. As a result, ad fraud detection has become an essential step in ensuring the safety of brands and the interests of their customers.
Importance of Ad Fraud Detection
Like other forms of cybercrime, digital ad fraud is relatively new, and its modern roots make it challenging to combat. In most cases, marketers are unaware that they have been scammed, so it is more important than ever to fight such malicious activity. Most ad fraud drains an organisation's advertising budget without producing results. Fraudsters claim credit for coincidental site visits or generate fake clicks and impressions, wasting marketers' ad spend. Over time, this undermines enterprises' marketing strategies and wastes their efforts. Ad fraud can also damage a brand's reputation when its ads are associated with inappropriate or objectionable content. When a brand is a victim of ad fraud for an extended period, it can lose a significant number of potential customers. On the positive side, the Indian startup ecosystem is fortunate to have some of the most talented and committed individuals, who are determined to tackle these problems. Many of these young entrepreneurs have recognised the importance of cybersecurity, especially in this ever-growing digital space. A recent article published by YourStory details the path and goals of six Indian cybersecurity startups that are redefining the digital security landscape. We at Com Olho are proud to share this platform and convey our goals and aspirations: to remove digital ad fraud from our digital landscape. All enterprises, large and small, should understand the importance of digital security. With the rise in cyberattacks, Com Olho has recognised the significance of cybersecurity solutions and is committed to helping brands and marketers safeguard their interests.
Com Olho takes pride in actively contributing to the advancement of India's cybersecurity business and is determined to add value to this ever-growing industry. We are a Gurugram-based cybersecurity startup that uses patented technology for non-rule-based digital ad fraud detection.
In Conclusion
Every organisation must protect itself from the costs of ad fraud by detecting it early and stopping it as soon as possible. In this respect, Com Olho has always helped brands and is on a mission to assist Enterprises and the Government to create a Digital Safe India.

  • Elon Musk's takeover of Twitter and its impact on Brand Safety

    Last year, when advertisers and marketers expressed concerns about the safety of their brands, Twitter claimed to have made their interests its top priority. Caitlin Rush, Head of Global Brand Safety Strategy at Twitter, stated, “Brand safety is not only about brands, but it is about people.” She further added, “When we focus on the safety of people, we also protect brands from the reputational damage of supporting things like hate, abuse, and misinformation with their ad dollars.” This has been viewed as Twitter's effort to safeguard its ad revenue from brands, which amounts to 4.5 billion USD. However, Elon Musk's decision to take over Twitter, and his stated plans for the platform, may worry many brands. Ad income accounts for roughly 90 percent of Twitter's total revenue, yet it is still nowhere close to that of its competitors. Moreover, Twitter's user reach is substantially smaller than its competitors', with around 200 million users seeing advertising, compared to 800 million on LinkedIn and almost 2 billion on Facebook. This is understandable, given that Twitter has a niche audience and marketers prefer platforms with a large user base and high engagement. Nonetheless, Twitter has been determined to make its platform as accessible to brands as possible. This is evident in its various policies, including content moderation, manual human review assisted by machine learning, and brand safety policies. In addition, Birdwatch and Conversation settings are two recent Twitter features for safer conversations. Birdwatch allows users to identify potentially misleading information in Tweets and then report it or add notes that provide context. More than 11 million people have used Conversation settings, which let users decide who can reply to their Tweets. The company has made further efforts to make the platform accessible for brands. Until now.
The discussion around an edit button and its impact on Brand Safety
Earlier this year, Elon Musk conducted a poll, on Twitter itself, expressing his desire for an edit button on Twitter. This feature has long been missing from the platform and would allow users to edit and make changes to their posts. Over three-fourths of more than 4 million voters agreed that an edit button is needed. Meanwhile, Twitter also announced that it is working on an edit feature for the platform. This is likely to be good news for users and brands, who would always have the option to modify their earlier tweets. Brands are always mindful of their reputation, and any objectionable post on a platform with 300 million monthly active users can quickly result in backlash. On the other hand, Twitter has always been viewed as a platform where real conversations take place. Your viewpoint or stance stays in the public domain, since you can't alter your tweets or statements, and that is what makes Twitter unique. The addition of an edit feature may detract from the distinctiveness of Twitter posts and, as a result, lower their relevance. For marketers, it is important to identify and select the right medium to market their product while ensuring brand safety. An edit option could have a significant impact on how marketers approach campaign plans, and it will undoubtedly affect Twitter's future business. Brands have always welcomed initiatives that ensure they are associated only with posts or tweets that will not damage their reputation. Twitter has consistently worked to make its platform a safe haven for brands of all sizes. All of that could change if Elon Musk takes over the social media platform later this year.
Elon Musk and his plans with Twitter
Just days after his post about the edit button, news surfaced that Elon Musk would acquire Twitter.
The CEO of Tesla and SpaceX has agreed to buy Twitter for $44 billion, and if the deal goes through, it will be one of the largest leveraged buyouts ever. Although it now seems likely that Elon Musk will own the platform, it is not yet certain how the platform will function in the future. One of the concerns for marketers is the billionaire's repeated championing of free speech, which will have a direct impact on the principles the platform has been building over the years. With limited content moderation and the freedom to say anything, the platform may introduce new complications. A free speech platform with little moderation, for example, has the potential to spread hate speech and misinformation. No brand wants to be associated with such content, and brands may opt for a different medium. Some advertisers are worried that Elon Musk's potential takeover of Twitter will push the app away from the brand safety path that Twitter has established over the years through standards and relationships with the advertising industry. Furthermore, several advertising executives have stated that if Elon Musk removes the features that allowed Twitter to take down objectionable content, they are willing to allocate their ad spending elsewhere.
In Conclusion
Elon Musk has made it clear that advertising is not a priority. He has said that he wants to loosen the service's content moderation policies, which marketers say have helped keep ads from appearing alongside hate speech and misinformation. Additionally, he has mentioned making money from Twitter in other ways, such as charging some users to use the service. It will be interesting to see how the world's richest person strikes a balance between his vision for Twitter and the business partnerships the platform has built through its security measures.

  • The mandate on VPN and its implications on Data Privacy and User Safety

    Last year, the Government of India reported that it was working on a measure to prohibit the use of all types of VPNs (Virtual Private Networks) in the country. Following on from this, the Indian Government has now mandated that all VPN providers collect and store their users' data for five years. Although it stated that this is being done amid security concerns, the impact it can have on all parties involved is alarming, to say the least. Under a new directive issued by CERT-In (Indian Computer Emergency Response Team), companies will now be required to store user data, including users' IP addresses, emails, names, contact numbers and addresses, for up to five years, even after a user has terminated their service. Furthermore, the ministry can request this information at any moment, and VPN providers will be required to cooperate under the new regulation. As a result, tensions are growing among service providers as well as users. The ministry said the move was an effort to “coordinate response activities as well as emergency measures with respect to cyber security incidents” and to help it fill “certain gaps” that hinder the handling of cyber threats.
What are VPN services?
Simply put, a Virtual Private Network (VPN) allows users to establish a secure network connection. The service protects a user's identity by hiding their device's IP address, encrypting their data, and routing it through secure networks. More than 270 million Indians use VPNs. People use VPNs to access websites that might have been restricted by the government and to browse the internet more privately, without being easily monitored. VPNs can also be used to access internet content available only in other states or countries, or for privacy on an internet rife with marketing trackers. Another common use is protection when connecting to a public network.
When connected to public Wi-Fi, users often expose themselves to the risk of security breaches and data theft. A VPN encrypts internet traffic and conceals the user's identity, making it difficult for third parties to track and steal user data. However, the new regulations contradict the established purpose of using VPNs. If there is no data privacy and user data is not protected, users will be hesitant to use VPN services, which will hurt these service providers' businesses.
The use of VPNs in corporations
VPNs are also used by organisations for data protection. Many companies instruct their employees to use an internal VPN to access the office network. However, their use of VPNs differs significantly from that of the general public. A business VPN, as it is called, is not used to access restricted content; rather, it is used to track employees' digital footprints. In some ways, this is what the government intends to achieve with the country's new VPN mandate. The new regulation will most likely not affect enterprise or private VPNs, since they already collect user data and information for so-called “data and user safety”. However, it will be interesting to see the impact of this regulation on major public VPN service providers.
Overall Impact
According to several reports, as soon as the new regulation surfaced, major VPN service providers in India, such as NordVPN and Surfshark, stated that they would remove their servers from India rather than comply with the new rules. This was expected, since most of these services prioritise data privacy and user safety. More importantly, these service providers offer a no-log policy, which means they don't keep records of what users do on their VPN.
As a result, they would not be able to provide the ministry with any data it might request, so it seems unlikely that they will be able to comply with these regulations. These VPN services could comply with Indian regulations only by changing their practices in ways that make them less secure, which would go against their promise to secure user data and provide privacy. Other VPN providers are therefore likely to discontinue their operations in the country. VPNs that do not comply with Indian regulations will be temporarily blocked.
In Conclusion
VPNs do allow users to cloak themselves, which can let them engage in malicious activities, and that is a legitimate concern. However, many experts consider these measures excessive. The rules are likely intended for state-sponsored surveillance and defeat the purpose of user privacy; they appear designed to drive all VPN services that provide privacy and anti-censorship features out of the country. By the looks of it, the government has taken the first step towards its initial goal of banning VPN services outright. Whether VPNs comply with the new rules or not, it is users' privacy that will be put at risk. The new VPN rules in India will take effect in June and, for now, are set to be strictly enforced.
Interesting fact: countries that either ban or regulate VPNs include China, Russia, Iraq, North Korea, Belarus, the United Arab Emirates, and Oman.

  • Beginning my second year at Com Olho with exponential growth.

    “Never doubt that a small group of thoughtful, committed people can change the world. Indeed, it is the only thing that ever has.” – Margaret Mead
    I was the first employee at Com Olho, and I believe the best analogy for being the first employee is that it closely resembles the experience and feelings of being a cofounder. I was part of every conversation and every decision. The open culture and transparency between senior management and other employees is one of the best aspects of Com Olho. I joined Com Olho with no expertise in marketing, but with a strong belief in the company’s mission: “To assist Enterprises and the Government to create Digital Safe India.” As I reflect on my journey, I can see that the opportunities I have been given have allowed me to grow both personally and professionally. I get goosebumps today thinking about how far we've come. Over the past year, by wearing many hats, I have gained significant insight into where my true strengths lie. Nothing makes me happier than taking on new tasks and overcoming unexpected problems, or, to put it another way, stepping outside my comfort zone. As the company grew and pivoted, I adapted and moved from one set of responsibilities to another. Com Olho has given me a diverse range of experience that has helped me advance in my career. Building a successful business takes commitment. Working hard means more than just putting in long hours; it means commitment to one's beliefs and aiming for greatness. The beauty of working at Com Olho is that everyone works hard towards the same goal. Working here has been an exciting journey and an amazing learning experience. I'm grateful to our Co-founder, Abhinav Bangia, for believing in me from the start and giving me responsibility and the freedom to do whatever I wanted. This has equipped me with invaluable hands-on experience, allowing me to grow my skills and knowledge.
I believe I still have a lot to learn and grow in, and I'm looking forward to doing so with Com Olho. Connect with me on LinkedIn: Link

  • Reading large CSV files in Python - A perpetual problem

    Data is growing at an exponential pace in today's society, where every business and institution is transforming itself into a data-savvy entity. As a result, handling large amounts of data has become a necessity. CSV (Comma-Separated Values) is one of the most common formats for storing tabular data. Importing a large CSV file directly into a Python script, however, can cause an 'Out of memory' error or a system crash owing to a lack of RAM. The internet has plenty of tips for reading large CSV files, such as setting the chunksize parameter of pd.read_csv() or using Dask dataframes or datatable. After extensive testing and many hours spent developing the best code to read massive quantities of data, I personally believe that each of these solutions runs into some barrier at some point. For example, setting a chunksize and breaking the data into chunks requires an extra step of concatenating the chunks into one dataset, which takes almost as long as simply reading the data. With Dask dataframes, the first obstacle is specifying the dtype for every column (even when there are 200+ columns); the second is that Dask dataframes are not as straightforward to work with as Pandas dataframes.
    Following extensive research, one feasible and efficient way to handle large dataframes is to not read them all at once. This leads us to the concept of 'Structurization.' Most datasets can be divided into subsets based on year, quarter, month, day, or any other criterion. Creating these subsets according to the datetime column when saving the data makes it very simple to read and concatenate only the data you need. Among the various ways to create subsets of a dataset, one very efficient way is described below:
    1. From the timestamp column of the dataframe, create a new column holding just the year (or month, or date). The column must have a datetime dtype for the .dt accessor to work:
        df['timestamp'] = pd.to_datetime(df['timestamp'])
        df['year'] = df['timestamp'].dt.year
        df['month'] = df['timestamp'].dt.month
        df['date'] = df['timestamp'].dt.date
    2. Use the groupby function on one of the columns created above:
        grouped_df = df.groupby('year')
    3. Using a for loop, print the name and shape of every group created:
        for name, group in grouped_df:
            print(str(name))
            print(group.shape)
    4. To save each group as a CSV file in a folder, use the same for loop and specify the folder path:
        from pathlib import Path
        output_dir = Path("C:\\Users\\ABC\\year_wise_files\\")
        output_dir.mkdir(exist_ok=True)  # create the folder once, before the loop
        for name, group in grouped_df:
            output_file = str(name) + '.csv'
            group.to_csv(output_dir / output_file, index=False)
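The steps above still load the whole file into memory before splitting it. As a minimal sketch, assuming the CSV has a column named timestamp whose values all parse as dates, the same structurization can be done while streaming the file with pd.read_csv(chunksize=...), appending each chunk's rows to one file per year. The helper names split_csv_by_year and load_years, and the default chunk size, are hypothetical choices for illustration.

```python
from pathlib import Path

import pandas as pd


def split_csv_by_year(csv_path, output_dir, timestamp_col="timestamp", chunksize=100_000):
    """Stream a large CSV in chunks and write its rows to one CSV file per year.

    Assumes every value in timestamp_col parses as a date.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    years_written = set()
    # Only `chunksize` rows are held in memory at any one time
    for chunk in pd.read_csv(csv_path, chunksize=chunksize, parse_dates=[timestamp_col]):
        for year, group in chunk.groupby(chunk[timestamp_col].dt.year):
            year = int(year)
            first_write = year not in years_written
            # Overwrite on a year's first write in this run, then append without a header
            group.to_csv(output_dir / f"{year}.csv",
                         mode="w" if first_write else "a",
                         header=first_write, index=False)
            years_written.add(year)
    return sorted(years_written)


def load_years(output_dir, years, timestamp_col="timestamp"):
    """Read back only the requested yearly files and concatenate them."""
    frames = [pd.read_csv(Path(output_dir) / f"{year}.csv", parse_dates=[timestamp_col])
              for year in years]
    return pd.concat(frames, ignore_index=True)
```

Because each chunk is written out as soon as it is read, the chunks never have to be concatenated; the only concatenation happens in load_years, and only for the years you actually ask for.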

  • Accelerate your path to GDPR and India's PDPB compliance with Com Olho

    What is GDPR?
    In 1995, the European Union (EU) introduced its data protection standard through the Data Protection Directive 95/46/EC. Because the EU requires each member state to transpose a directive into national law, Europe ended up with a patchwork of different privacy laws across different countries. In addition, increasing security breaches, rapid technological developments, and globalisation over the following 20 years brought new challenges for the protection of personal information to the forefront. To address this situation, the EU developed the GDPR, which applies directly as law across all member states.
    What is India’s PDPB?
    India's Personal Data Protection Bill (PDPB) is one of the most comprehensive proposed data privacy laws in the world. The PDPB will impose obligations on practically all businesses operating in India, requiring them to reassess all of their data processing practices, policies, and safeguards.
    Why do GDPR and India’s PDPB matter?
    With the growth of user-generated data and the rapidly increasing commercial value of data, it is becoming vital that the necessary steps are taken to protect citizens' data rights. Data protection regulations ensure the security of individuals’ personal information and regulate the collection, use, transfer, and disclosure of that data. They also give individuals access to their data, place accountability measures on organisations that process personal data, and provide remedies for unauthorised and harmful processing.
    Privacy laws like the EU’s General Data Protection Regulation (GDPR) and India’s PDPB have changed two things: they acknowledge that devices like smartphones are an intrinsic part of a person’s identity, so any data that can be used to profile an individual comes under the ambit of these laws; and they articulate what consent is, requiring that it be free, informed, specific, clear, and capable of being withdrawn.
    How is Com Olho GDPR and India's PDPB compliant?
    Privacy, security, and protection of customer data are shared responsibilities between our clients and Com Olho. Under the General Data Protection Regulation (GDPR), this shared responsibility is defined by two key actors:
    Data Controller: determines how personal data is processed and the purposes for which it is processed.
    Data Processor: an entity that maintains and processes personal data records only on the controller’s instructions.
    The scope of India's Personal Data Protection Bill (PDPB) is broader than that of the GDPR. The PDPB regulates the processing of personal data by the state, by any citizen of India, and by any individual or body incorporated or created under Indian law. Com Olho supports the fulfilment of data-rights access requests and automates the processing of clients’ individual requests. Under India's PDPB, data principals receive certain rights similar to those covered by the GDPR. These data rights include:
    – the right to access data
    – the right to correction
    – the right to data portability
    – the right to erasure
    – the right to be forgotten
    Accelerate your path to GDPR and India's PDPB compliance with Com Olho
    Com Olho is committed to helping businesses develop a strategy to achieve GDPR and India’s PDPB compliance. We give our clients a SaaS advantage by offering a service that is designed to be secure at every layer, for their entire business.
    Managing your business’s data is easier when there is one centralised location you can trust for storing it, instead of having it spread across a range of different storage media, and what source can you trust more than your own server? Com Olho stores and maintains clients’ data by deploying AI agents on the client’s own server. This reduces the risk of data theft or manipulation and keeps things simple, with a single set of policies and standards for your business processes. Our intelligent and secure service lightens the load for administrators and users alike, allowing you to focus more on your business. In a constantly changing regulatory landscape, Com Olho can help your organisation address regulatory compliance more efficiently and easily. Businesses all over the world are focusing on ensuring that their systems, processes, and policies support GDPR and India’s PDPB guidelines. Their teams continue to be tasked with changing the way they manage processes, people, and technical controls in order to comply with the legislation. Com Olho welcomes the positive changes that the GDPR and India’s PDPB have brought to our services, and we remain committed to helping our clients address the GDPR and India’s PDPB requirements that are relevant to our services.

  • Nginx Security: How To Harden Your Server Configuration

    As of March 2021, one in three websites on the internet runs on Nginx, according to a web survey by Netcraft. The Nginx web server powers high-performance applications in a responsive, efficient manner and is useful for load balancing, HTTP caching, mail proxying, and reverse proxying. With the ability to handle 40,000 inactive HTTP connections with just 10 MB of memory, it is the go-to choice for high-traffic sites. This blog covers hardening tips to improve your server's security posture.
    Step 1. Disable Any Unwanted nginx Modules
    When you install nginx, it automatically includes many modules. Modules compiled into nginx cannot be removed at runtime; to disable them, you need to recompile nginx. We recommend disabling any modules that are not required, as this minimizes the risk of potential attacks by limiting the operations nginx allows. To do this, pass the corresponding --without-... option to the configure script when building. In the example below, we disable the autoindex module, which generates automatic directory listings, and then recompile nginx:
        # ./configure --without-http_autoindex_module
        # make
        # make install
    Step 2. Disable nginx server_tokens
    By default, the server_tokens directive makes nginx display its version number. The version is visible on all automatically generated error pages and is also present in the Server header of every HTTP response. This can lead to information disclosure: an unauthorized user could learn which version of nginx you use. You should disable it in the nginx configuration file (for example, in the http block) by setting:
        server_tokens off;
    Step 3. Control Resources and Limits
    To help prevent potential DoS attacks on nginx, you can set buffer size limits for all clients. You can do this in the nginx configuration file using the following directives:
    • client_body_buffer_size – specifies the buffer size for reading the client request body. The default value is 8k or 16k, but it is recommended to set this as low as 1k: client_body_buffer_size 1k;
    • client_header_buffer_size – specifies the buffer size for reading the client request header. A buffer size of 1k is adequate for most requests.
    • client_max_body_size – specifies the maximum accepted body size of a client request. 1k should be sufficient, but you need to increase it if you receive file uploads via the POST method.
    • large_client_header_buffers – specifies the maximum number and size of buffers used to read large client request headers. The directive large_client_header_buffers 2 1k; sets the maximum number of buffers to 2, each with a maximum size of 1k. Because a request line or a single header field cannot exceed the size of one buffer, this setting rejects request URIs longer than about 1 kB.
    Step 4. Disable Any Unwanted HTTP Methods
    We suggest disabling any HTTP methods that are not used and do not need to be implemented on the web server. If you add the following condition to the location block of the nginx virtual host configuration file, the server will allow only the GET, HEAD, and POST methods and will reject methods such as DELETE and TRACE:
        location / {
            limit_except GET HEAD POST { deny all; }
        }
    Another approach is to add the following condition to the server section (server block). It can be regarded as more universal, but you should be careful with if statements in the location context:
        if ($request_method !~ ^(GET|HEAD|POST)$ ) {
            return 444;
        }
    Step 5. Install ModSecurity for Your nginx Web Server
    ModSecurity is an open-source module that works as a web application firewall. Its functionality includes filtering, server identity masking, and null-byte attack prevention. The module also lets you perform real-time traffic monitoring. We recommend following the ModSecurity manual to install the module in order to strengthen your security options.
    Step 6. Set Up and Configure nginx Access and Error Logs
    The nginx access and error logs are enabled by default and are located in logs/access.log and logs/error.log respectively. To change the location of the error log, use the error_log directive in the nginx configuration file (the access log location is set with access_log). You can also use error_log to specify which messages are recorded according to their severity level. For example, the crit severity level causes nginx to log critical issues and all issues with a higher severity level than crit. To set the severity level to crit, set the error_log directive as follows:
        error_log logs/error.log crit;
    Step 7. Monitor nginx Access and Error Logs
    If you continuously monitor and manage nginx log files, you can better understand the requests made to your web server and notice any errors it encounters. This will help you discover attack attempts as well as identify what you can do to optimize server performance. You can use log management tools, such as logrotate, to rotate and compress old logs and free up disk space. The ngx_http_stub_status_module module also provides access to basic status information. You can also invest in NGINX Plus, the commercial version of nginx, which provides real-time activity monitoring of traffic, load, and other performance metrics.
    Step 8. Configure Nginx to Include Security Headers
    To further harden your nginx web server, you can add several different HTTP headers. Here are some of the options that we recommend.
    X-Frame-Options
    The X-Frame-Options HTTP response header indicates whether a browser should be allowed to render a page in a <frame> or an <iframe>.
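To verify Steps 2 and 8 from the outside, you can inspect the headers a server actually returns. The following is a minimal Python sketch, not a complete scanner: the function names audit_response_headers and fetch_headers are hypothetical, and it only checks two things. It flags a Server header that includes a version number (what nginx sends while server_tokens is on) and a missing X-Frame-Options header.

```python
import re
import urllib.request


def audit_response_headers(headers):
    """Return a list of hardening findings for one HTTP response's headers."""
    # Header names are case-insensitive, so compare them in lower case
    normalized = {name.lower(): value for name, value in headers.items()}
    findings = []
    # With server_tokens on, nginx sends e.g. "nginx/1.18.0"; with it off, just "nginx"
    if re.search(r"nginx/\d", normalized.get("server", "")):
        findings.append("Server header discloses the nginx version; set server_tokens off;")
    if "x-frame-options" not in normalized:
        findings.append("X-Frame-Options header is missing")
    return findings


def fetch_headers(url):
    """Send a HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

For example, audit_response_headers(fetch_headers("https://your-server.example")) would list both findings for a default installation; the hostname is a placeholder for your own server.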

  • Common Nginx Misconfigurations

    As of March 2021, one in three websites on the internet runs on Nginx, according to a web survey by Netcraft. The Nginx web server powers high-performance applications in a responsive, efficient manner and is useful for load balancing, HTTP caching, mail proxying, and reverse proxying. With the ability to handle 40,000 inactive HTTP connections with just 10 MB of memory, it is the go-to choice for high-traffic sites. This blog covers the common Nginx misconfigurations that leave your web server open to attack.
    Common Nginx Misconfigurations
    1. Passing Uncontrolled Requests to PHP
    Many Nginx example configurations for PHP pass every URI ending in .php to the PHP interpreter, which on most PHP setups can let third parties execute arbitrary code. In such a configuration, every request with the .php extension is passed to the FastCGI backend. A default PHP configuration tries to guess the file you want to execute when the full path specified does not lead to a file that exists on the system. Say you request /cyber/security/nginx.gif/nginx.php, which does not exist, while /cyber/security/nginx.gif does exist: the PHP interpreter will process /cyber/security/nginx.gif instead. If nginx.gif contains embedded PHP code, that code will execute.
    2. Alias LFI Misconfiguration
    Inside the Nginx configuration, look at the "location" statements. If one of them looks like: There is an LFI vulnerability because: Transforms to: The correct configuration will be: So, if you find an Nginx server, you should check it for this vulnerability. You may also discover it if a brute force of files and directories behaves strangely.
    3. Missing Root Location
    The position of the root directive in your configuration matters. One of the Nginx configuration pitfalls that administrators are strongly warned against is putting the root directive inside location blocks. If you add root to every location block individually, then an unmatched location block will lack a root, which causes errors. Conversely, a request that matches no location block is served from the root folder of the server block. In the above example, the root folder is /etc/nginx/app, meaning that files in this folder are available to us. However, there is no location for / (i.e. location / { }), only one for /cybersecurity.jpeg. As such, requests to / will take you to the path specified in the globally set root directive, and a request like GET ../nginx.conf would show the contents of the config file /etc/nginx/nginx.conf. The most common root paths were the following:
    4. Using Non-Standard Document Root Locations
    Deviating from the standard document root locations laid out in the Filesystem Hierarchy Standard might sometimes seem like a fun idea, until someone requests a file they should not be able to access and you end up compromised. In the above example, a request for /etc/passwd would reveal your /etc/passwd file, giving attackers your user list; and if your Nginx workers run as root, they could also read /etc/shadow, which contains your password hashes.
    5. Unsafe Variable Use
    Some frameworks, scripts, and Nginx configurations use the variables stored by Nginx unsafely. This can lead to issues such as XSS, bypassing HttpOnly protection, information disclosure, and in some cases even RCE.
    SCRIPT_NAME
    With a configuration such as the following: The main issue is that Nginx will send any URL ending in .php to the PHP interpreter, even if the file doesn’t exist on disk. This is a common mistake in many Nginx configurations, as outlined in the “Pitfalls and Common Mistakes” document created by Nginx. An XSS will occur if the PHP script tries to define a base URL based on SCRIPT_NAME.
    Usage of $uri can lead to CRLF injection
    Another misconfiguration related to Nginx variables is using $uri or $document_uri instead of $request_uri. $uri and $document_uri contain the normalized URI, and normalization in Nginx includes URL-decoding the URI. Volema found that $uri is commonly used when creating redirects in the Nginx configuration, which results in CRLF injection. An example of a vulnerable Nginx configuration is: The newline characters for HTTP requests are \r (Carriage Return) and \n (Line Feed); URL-encoded, they are represented as %0d%0a. When these characters are included in a request like http://localhost/%0d%0aDetectify:%20clrf to a server with this misconfiguration, the server responds with a new header named Detectify, because the $uri variable contains the URL-decoded newline characters.
    6. Raw Backend Response Reading
    With Nginx’s proxy_pass, you can intercept errors and HTTP headers created by the backend. This is very useful if you want to hide internal error messages and headers so that Nginx handles them instead. Nginx will automatically serve a custom error page if the backend answers with an error. But what if Nginx does not understand that it’s an HTTP response? If a client sends an invalid HTTP request to Nginx, that request is forwarded as-is to the backend, and the backend answers with its raw content. Nginx then does not understand the invalid HTTP response and simply forwards it to the client. Imagine a uWSGI application like this: And with the following directives in Nginx: proxy_intercept_errors serves a custom response (defined with error_page) if the backend returns a status code greater than or equal to 300. In our uWSGI application above, we send a 500 error, which Nginx intercepts. proxy_hide_header is pretty much self-explanatory: it hides the specified HTTP header from the client. If we send a normal GET request, Nginx will return: But if we send an invalid HTTP request, such as: We will get the following response:
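Several of these misconfigurations come down to how a path string is transformed before it is used. The Python sketch below models three of them in a deliberately simplified way: PHP guessing the script by walking back a nonexistent path (as with cgi.fix_pathinfo=1), an alias whose location prefix lacks the trailing slash, and a redirect built from the URL-decoded $uri instead of the raw $request_uri. It is an illustration under stated assumptions, not Nginx or PHP source code; the function names and the paths /imgs, /var/www/images/, and example.com are hypothetical.

```python
import posixpath
from urllib.parse import unquote


def php_guess_script(request_path, existing_files):
    """Simplified model of PHP's path guessing with cgi.fix_pathinfo=1.

    Walks back one path component at a time until a prefix names an existing
    file; that file is executed and the remainder becomes PATH_INFO.
    """
    parts = request_path.split("/")
    for i in range(len(parts), 1, -1):
        candidate = "/".join(parts[:i])
        if candidate in existing_files:
            return candidate, request_path[len(candidate):]
    return None, request_path


def alias_resolve(uri, location_prefix, alias_dir):
    """Model of a prefix location using alias: the matched prefix is replaced by alias_dir."""
    if not uri.startswith(location_prefix):
        return None  # this location block does not match the request
    joined = alias_dir + uri[len(location_prefix):]
    # The "../" survives into the file path and is resolved by the filesystem
    return joined, posixpath.normpath(joined)


def redirect_header(raw_request_uri, use_decoded_uri):
    """Build the Location header a hypothetical `return 302 https://example.com$uri;` would send."""
    # $uri is URL-decoded during normalization; $request_uri keeps the raw encoding
    path = unquote(raw_request_uri) if use_decoded_uri else raw_request_uri
    return "Location: https://example.com" + path
```

With the location written as /imgs/ (matching the trailing slash of the alias), the request /imgs../secret.txt no longer matches, and building the redirect from $request_uri keeps %0d%0a encoded so no extra header line is produced.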

  • Lambda function reduces python scripting lines by 80%

    In Python, functions are traditionally declared with the def keyword, while anonymous functions are defined without a name using the lambda keyword. The syntax of a lambda function is:
        lambda arguments: expression
    A lambda function can take any number of arguments but can only evaluate one expression. We use lambda functions when we need a short, nameless function. At first, lambda functions can seem difficult to grasp: they are brief, yet they can be a challenge for a beginner. In this blog, you'll discover the potential of lambda functions in Python and how to apply them to fundamental list and dataframe operations.
    Let us first load the pandas library and a sample dataset to work on:
        >>> import pandas as pd
        >>> from vega_datasets import data
        >>> df = data.barley()
        >>> df
    List Operations
        >>> site_names = df['site'].unique().tolist()
    Traditionally, we use for loops to iterate through a list of elements and apply simple functions. But for loops can be inconvenient, making Python code long and untidy. Let us look at an example of a for loop and see how we can obtain the same result more concisely with a lambda:
        >>> for i in site_names:
        ...     i = ''.join(i.split())
        ...     i = i.lower()
        ...     print(i)
    1. Example using map()
    The map() function takes a function (here, a lambda) and an iterable such as a list, applies the function to every element, and returns an iterator; wrapping it in list() gives a new list.
        >>> a = site_names
        >>> b = list(map(lambda x: ''.join(x.split()).lower(), a))
        >>> print(b)
    2. Example using filter()
    The filter() function takes a function and an iterable, applies the function to every element, and keeps only the elements for which it returns True.
        >>> yield_list = df['yield'].tolist()
        >>> sub_list = list(filter(lambda x: x > 50, yield_list))
        >>> sub_list
    3. Example using reduce()
    With reduce(), the lambda function is first applied to the first two elements and the result is stored. The function is then applied to that result and the third element, and so on, until the list is reduced to a single value. Here, it returns the largest value in sub_list:
        >>> from functools import reduce
        >>> reduce(lambda a, b: a if (a > b) else b, sub_list)
    Dataframe Operations
    1. Add a new column by applying a function to an existing column using DataFrame.assign()
        >>> df = df.assign(yield_Percentage = lambda x: (x['yield'] / x['yield'].sum()) * 100)
        >>> df
    Here, we created a new column ‘yield_Percentage’ and populated it by converting the yield values to percentages of the total yield.
    2. Add a new column using if-else on an existing column using Series.apply()
        >>> df['yield_category'] = df['yield'].apply(lambda x: 'Low' if x < 40 else 'High')
        >>> df
    Here, we created a new column ‘yield_category’ using an if-else condition on the column ‘yield’: it is ‘Low’ if the yield is less than 40 units and ‘High’ otherwise.
    3. Iterating over a dataframe using DataFrame.apply()
    Similar to map(), the apply() method takes a function as input and applies it across the dataframe. First, we define a function:
        >>> def filtering(site, yield_Percentage):
        ...     if site in ['University Farm', 'Waseca', 'Morris'] and yield_Percentage > 1:
        ...         return 1
        ...     else:
        ...         return 0
    Next, a lambda function is used to iterate across the rows of the dataframe. For every row, we feed the ‘site’ and ‘yield_Percentage’ columns to the filtering function. Finally, axis=0 or axis=1 specifies whether the function is applied to each column or to each row, respectively.
        >>> df["invest"] = df.apply(lambda row: filtering(row["site"], row["yield_Percentage"]), axis=1)
        >>> df
    Here, we created a new column ‘invest’ based on the function ‘filtering’: the value 1 is assigned to rows from the sites 'University Farm', 'Waseca', and 'Morris' whose yield percentage is more than 1, and 0 otherwise.
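To try these operations without installing vega_datasets, here is a self-contained sketch that runs the same list and dataframe steps on a small made-up DataFrame. The five rows and their yields are illustrative assumptions, not the barley data; because there are so few rows, each one is a much larger share of the total, so the row-wise 'invest' rule uses a 20% threshold instead of 1%.

```python
from functools import reduce

import pandas as pd

# Made-up rows shaped like the barley data (illustrative values only)
df = pd.DataFrame({
    "site": ["University Farm", "Waseca", "Morris", "Grand Rapids", "Waseca"],
    "yield": [27.0, 48.0, 62.0, 30.0, 33.0],
})

# map(): strip spaces and lowercase every unique site name
site_names = df["site"].unique().tolist()
clean_names = list(map(lambda s: "".join(s.split()).lower(), site_names))

# filter(): keep only yields above 40
high_yields = list(filter(lambda y: y > 40, df["yield"].tolist()))

# reduce(): fold the list to its maximum, two values at a time
max_high = reduce(lambda a, b: a if a > b else b, high_yields)

# assign(): each yield as a percentage of the total, computed from the frame passed in
df = df.assign(yield_percentage=lambda x: x["yield"] / x["yield"].sum() * 100)

# Series.apply(): label each yield Low or High
df["yield_category"] = df["yield"].apply(lambda y: "Low" if y < 40 else "High")

# DataFrame.apply(axis=1): the lambda receives one row at a time
df["invest"] = df.apply(
    lambda row: int(row["site"] in ["University Farm", "Waseca", "Morris"]
                    and row["yield_percentage"] > 20),
    axis=1,
)
```

Here clean_names is ['universityfarm', 'waseca', 'morris', 'grandrapids'], high_yields is [48.0, 62.0], max_high is 62.0, and the invest column is [0, 1, 1, 0, 0].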

bottom of page