Introduction to Conflict Event Data

Published

June 16, 2026

General Introduction

Monitoring violent conflicts is difficult yet important for various actors. If you’re working in an international NGO, you may want to trace the progress of violent clashes to direct humanitarian aid or forecast the expansion of battlefields. If you are working in research, you are probably interested in identifying conflict-prone areas and episodes, and in correlating them with potential causes or consequences of violence. And if you are working in the news and information branch, you probably want to provide real-time updates of unfolding battles and embed them in the historical context of the conflict.

For all these purposes, different research institutions have taken up the challenge of providing accurate and reliable information on conflicts and battles. However, the supply of data is vast, and sources differ in their method, coverage, resolution, and focus. For data scientists, it is therefore important to maintain an overview of the available datasets and to carefully evaluate which one is best suited for the task at hand.

This document provides an overview and discussion of the three most common conflict event datasets currently in use:

  1. The Geocoded Event Database (GED) from the Uppsala Conflict Data Program (UCDP)
  2. The Armed Conflict Location & Event Data Project (ACLED)
  3. The Global Terrorism Database (GTD)

Bear in mind that these are currently the most commonly used conflict event datasets. Other, newer conflict data projects are being developed and improved over time and may at some point join this list. One of them is the Global Database of Events, Language, and Tone (GDELT), which draws on a large volume of news and other real-time sources to provide constantly updated information. However, extracting GDELT data is (at the moment) significantly more cumbersome than drawing on the ready-to-use conflict event datasets named above, which is one of the reasons it is not yet used as widely.

In addition, there are numerous high-accuracy datasets available for specific conflicts. The Significant Activities (SIGACTs) dataset introduced in Shaver & Wright (2016) provides highly detailed information on the Afghanistan war from 2002 until 2014. Unlike the main databases named above, the SIGACTs dataset does not rely on news reports, but on actual military ground reports from the U.S. army and other international actors. Similar datasets are available for other specific conflicts. Such conflict-specific datasets provide a much more accurate account of battles and other conflict-related incidents and suffer much less from media bias (see below). Yet, datasets like SIGACTs come with significant disadvantages, too. First, they only cover single conflicts and do not allow empirical comparisons across geographic contexts or different actors. Second, data based on military records are often not openly available; access requires negotiations with the responsible governments, which may impose restrictions on their usage. And finally, while military-based conflict data are less prone to media bias, they inherently provide one-sided information. This is the main difference between the two types of datasets. News-based data sources can compare information from different news outlets and generate consistent information. Military reports cannot be cross-checked against independent sources and can hence provide a biased view of the conflict.

Introduction and Comparison of the Main Conflict Event Databases

All three main datasets listed above offer a comprehensive account of the main civil violence episodes of the past decades. In addition, all three receive regular updates to include more recent episodes of violence. While all three are conflict event databases, they differ in their coverage and information focus, as well as in the methods they use to collect the data. This section introduces the three datasets and points out each dataset’s advantages and disadvantages.

We will start with a general introduction to the nature and characteristics of conflict event data. Afterwards, we will look at different ways to analyze event data, before we dig deeper into the specifics of each database.

What are Conflict Event Data?

We use the term Conflict Event Data to describe datasets that are compiled at the level of single events. This means that each conflict-related event receives its own entry in the dataset, provided it satisfies the dataset’s definition of conflict. We will discuss the differences across the datasets below, i.e., how they differ in their definitions of “conflict-related events” as a criterion for inclusion in the dataset, and what type of information they provide for each of their events. In this section, we first discuss the general characteristics of event data across the sources.

What is a conflict event?

Generally speaking, a conflict event describes any activity by a state or non-state actor, confined in space and time, that is meant to promote a political or economic goal by violent means. This definition is quite broad, and each dataset focuses on different aspects of it to set the theme of the events it covers (see below for each dataset’s actual definition). The GTD restricts itself to non-state actors, i.e. terrorist groups, and only includes violent events where people were harmed. ACLED, on the other hand, employs the broadest definition of conflict events. It codes any type of activity that can somehow be linked to insurgent activities. This lets it include “strategic developments,” like a change in the location of an insurgent group’s headquarters. Such an activity does not involve any fighting or harm, but can be useful for analyzing the development of insurgencies.

It is therefore important to check each source’s definition of what it considers a conflict event. While all conflict events across sources share a couple of characteristics, the distinct nature of the events covered in each dataset can make a blind comparison problematic. For example, even though we can easily compare the locations of events across datasets, e.g., ACLED and GTD, using GIS, we should consider whether it makes sense to compare terrorist attacks and headquarters relocations for the theory we want to test.

Characteristics of conflict events

We can describe all events of any kind in space and time. Depending on the type of analysis we have in mind, we must select the spatial or temporal resolution that works best for us. Similarly, we need to be careful with the accuracy of the data when we plan to work at high resolutions. All good databases provide an accuracy coding that tells you for each event what the highest resolution is at which you can work with it. For example, many events cannot be exactly pinned down in space. Take the movement of an insurgent group’s headquarters; this is something that the group would never make public. Usually, information on such events comes from rumors or intelligence reports and lacks the spatial detail to code it exactly. It is therefore common that databases only have information of the sort “rebels moved their headquarters somewhere in Northern Nigeria.” This lack of detail will be mirrored in the events’ coding.

All conflict event databases provide information on space and time. Depending on the source, you will get additional information for each event (see below for details). Among other things, all the major databases attach an estimate of the total fatalities caused by an event. Datasets like GTD and UCDP GED specialize on tracing the identity of perpetrators and victims. They will therefore provide you with information on the (terrorist/insurgent/government) groups involved on the sending and receiving sides of a violent event. ACLED also provides additional details on the type of event, e.g., whether it was an open battle, a strategic development, or a bombing. Below, we will review the common characteristics of conflict event data: space, time, and fatalities.

Space: To locate conflict events in space, all databases will provide you with information in two dimensions: longitude and latitude. These form coordinates in the so-called WGS84 projection (strictly speaking a geographic coordinate reference system rather than a projection), which means that each number provides the degrees north/south of the equator (latitude) and the degrees east/west of Greenwich (longitude). These coordinates help you to define each event as a point in space and relate it to different geometries. Without going too much into detail, note that there are a lot of different projections for spatial data. WGS84 is the most common one and expresses positions in degrees. Most other projections instead place points in meters north and east of a certain reference point. Most Geographic Information Systems (GIS) make it quite easy to transform datasets from one projection into another.

You will have to make sure, though, that all geometries come in the same projection. That is, you can only compare your WGS84 conflict event points with degrees longitude and latitude to other objects that are also measured in longitude and latitude in the WGS84 projection. Many spatial datasets use different projections, e.g. region-specific projections that are more accurate than WGS84 when zooming in on that region, or the so-called Mollweide projection, which keeps areas accurate at the expense of accurate shapes and directions. If you encounter sources with different projections, make sure to transform your conflict event dataset to this other projection (or vice versa) before starting your analysis.
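To make the point about units concrete, here is a minimal sketch that computes how many kilometers one degree of longitude spans at two different latitudes. It assumes a spherical Earth with a mean radius of 6371 km (an approximation), and the helper name `haversine_km` is purely illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude covers far less ground at 60 degrees north
# than at the equator, so degrees cannot be read as distances.
one_deg_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
one_deg_lat60 = haversine_km(60.0, 0.0, 60.0, 1.0)
print(round(one_deg_equator, 1), round(one_deg_lat60, 1))
```

This is why distance or area calculations are usually done either with great-circle formulas like the one above or after transforming all layers into a suitable projection in meters.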

In addition, always keep an eye on the accuracy attached to the coordinates given in the dataset. All the big datasets provide accuracy levels that help you select which conflict events are suited to your analysis. Usually, the datasets use numerical codes to distinguish different confidence levels. For events with the highest resolution, we usually know the exact location, e.g., the village that was attacked or the spot where a car bomb exploded. Note already (and see below for more) that ACLED deviates here a bit, as its highest resolution assigns coordinates based on the town closest to where an event occurred. If the information on which an event was coded is less rich, the coordinates will be less precise. The accuracy level will then tell you that the coordinates provided for an event only identify, e.g., the administrative capital or geographic center of the district, province, or even country where an event occurred.

Make sure that you only use events that match the spatial resolution of your analysis. Let’s say you want to compare the locations of conflict events to locations of mining activity, as Berman et al. (2017) do in their paper. Then, you likely downloaded a spatial point dataset with coordinates that identify the exact locations of mines. In the next step, you’ll want to geometrically relate the location of these mines to the locations of conflict events of the ACLED or UCDP GED dataset. For example, you can calculate a distance matrix that tells you for each mine-conflict combination how far apart they are. The values in this matrix will however only be informative for conflict events with accurately identified locations. If you include events at a lower resolution, this might introduce severe noise in your dataset. Assume that there are some conflict events at very low resolution, such that the dataset only gives you the coordinates of the country capital as the coders did not know where to better put this event inside the country. By computing the distance between conflict event and mines, you will as a result measure the distance between mines and a country’s capital. This is likely not what you want. So, restricting your conflict dataset to events with the highest resolution is highly recommended. However, if your goal is to aggregate conflict events at the country level, e.g. to compare the number of events or fatalities across countries, even the lowest resolution in the data will work out for you. Selecting conflict events based on their spatial accuracy therefore strongly depends on the goal you have in mind, and on the spatial resolution of the other data you want to work with.
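The filtering and distance-matrix steps described above can be sketched as follows. All records are invented toy data, the field name `precision` and the convention that the value 1 marks an exactly located event are assumptions for this illustration (check the codebook of the dataset you use), and distances are great-circle distances on a spherical Earth.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km (spherical-Earth approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical toy records; precision == 1 is assumed to mean "exact location".
events = [
    {"id": "e1", "lat": 9.10, "lon": 7.40, "precision": 1},
    {"id": "e2", "lat": 9.08, "lon": 7.49, "precision": 3},  # e.g. coded to the capital
    {"id": "e3", "lat": 11.95, "lon": 8.50, "precision": 1},
]
mines = [
    {"id": "m1", "lat": 9.20, "lon": 7.30},
    {"id": "m2", "lat": 12.00, "lon": 8.60},
]

# Step 1: keep only events whose location is precisely known.
precise_events = [e for e in events if e["precision"] == 1]

# Step 2: distance matrix for every event-mine combination.
distance_km = {
    (e["id"], m["id"]): haversine_km(e["lat"], e["lon"], m["lat"], m["lon"])
    for e in precise_events
    for m in mines
}

# Step 3: nearest mine per event, derived from the matrix.
nearest_mine = {
    e["id"]: min(mines, key=lambda m: distance_km[(e["id"], m["id"])])["id"]
    for e in precise_events
}
print(nearest_mine)
```

Dropping the imprecise event `e2` before computing distances avoids measuring the distance between mines and a capital city by accident.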

Time: The second dimension associated with a conflict event is time. Again, the accuracy and coding of the temporal component of conflict events differ across the data sources. Most commonly, databases will assign a date to an event to identify the exact day on which it took place. For some events, e.g. riots or strategic developments in the ACLED database, the time stamp will identify a period rather than a date. For example, if the government moved its troops from one front to another in the course of two weeks, this two-week period will be associated with the event.

As with geographic locations, the information on which the coding of an event was based is often not detailed enough to narrow the event down to an exact day. An event may be coded based on a local news report that reads “last week an attack of insurgents killed 10.” Such a report does not allow pinning the event down to an exact date, which will be mirrored in the event coding by an estimated date and a lower confidence level for the temporal coding precision.

Yet, temporal inaccuracies usually pose less of a problem than spatial inaccuracies. More often than not, you will find it helpful to aggregate your data to a lower (coarser) temporal resolution anyway. It is often cumbersome to summarize the development of a conflict in days, so turning to weeks, months, or even years may prove more tractable. And when you plan to relate conflict events to other data that vary over time, e.g. GDP or aid disbursements, you will find that these other sources already come at lower temporal resolutions, e.g. months or years. So the temporal aggregation of conflict events will be part of the data preparation process anyway, and small inaccuracies in the temporal coding will not harm your analysis.
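As a minimal sketch of such a temporal aggregation, the snippet below counts invented event dates per month and per year using only the Python standard library. The dates are hypothetical; in practice they would come from the date column of the dataset you use.

```python
from collections import Counter
from datetime import date

# Hypothetical event dates (one entry per coded event).
event_dates = [
    date(2020, 1, 3),
    date(2020, 1, 28),
    date(2020, 2, 14),
    date(2021, 2, 2),
]

# Aggregate daily events to coarser units: (year, month) and year.
events_per_month = Counter((d.year, d.month) for d in event_dates)
events_per_year = Counter(d.year for d in event_dates)

print(events_per_month)
print(events_per_year)
```

An event whose true date is off by a few days usually still falls into the same month or year, which is why small dating errors matter little after aggregation.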

Fatalities: All databases provide estimates of the total fatalities involved in a conflict event. The UCDP GED dataset even distinguishes between fatalities on either of the two sides of a battle (i.e. the initiator and the target), civilian fatalities, and “unknown” fatalities that cannot accurately be ascribed to any of these three groups.

It is important to emphasize, though, that fatality counts are only estimates. Unlike for space and time, the data sources do not attach accuracy codes to their fatality data, because fatality information is inherently imprecise for several reasons. First, (news) reports used for coding an event often provide only rough estimates for events with many casualties, e.g. “hundreds were killed.” Second, sources often disagree, i.e., reporters gather their information from different sources (e.g. initiator, target, civilians, ambulance,…), who will come up with different numbers. It is therefore common that conflict databases make a judgement call to, e.g., use the most trustworthy report, provide fatality averages across reports, or only give a rough fatality count.

Note further that fatality numbers are often mis-reported on purpose. Depending on the strategic goal, the initiators as well as the targets of a violent event can have incentives to purposefully over- or under-report the numbers. For example, groups targeted by violence may either over-report fatalities to receive more attention of international bystanders, or they may decide to under-report their fatalities to portray their enemy’s attack as less successful.

You should therefore take fatality numbers always with a grain of salt when you use them for your analysis. Be aware that numbers may be systematically biased, and do not over-interpret small differences in fatality numbers across regions, time, or type of event/perpetrator/victim. It is also advisable to check the conclusions you draw from your analysis for robustness to different ways of arranging your data. For example, do you receive similar results when looking at the plain numbers of fatalities and when aggregating them into categories and/or looking at the number of events (with fatalities above a certain threshold) instead of the number of fatalities?
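A minimal sketch of such a robustness check, using invented numbers: it compares regions once by total fatalities and once by the number of events above a threshold. The threshold of 5 and all records are purely illustrative.

```python
from collections import defaultdict

# Hypothetical records: (region, reported fatalities per event).
events = [
    ("North", 0), ("North", 3), ("North", 250),
    ("South", 12), ("South", 15), ("South", 9),
]

THRESHOLD = 5  # illustrative cut-off for a "severe" event

total_fatalities = defaultdict(int)
events_above_threshold = defaultdict(int)
for region, fatalities in events:
    total_fatalities[region] += fatalities
    if fatalities >= THRESHOLD:
        events_above_threshold[region] += 1

print(dict(total_fatalities))
print(dict(events_above_threshold))
```

In this toy example, a single event with a very large (and possibly exaggerated) fatality figure drives the total for "North", while the threshold-based count points to "South" as the more violent region. If your conclusions flip like this, they rest on a few uncertain numbers.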

How to Analyze Conflict Event Data

This section outlines three strategies to leverage conflict event data in empirical analyses. First, we discuss analyzing geometric relationships such as distances between battles and mines, towns or roads. Depending on the research question, conflict will enter either as an outcome or as a treatment for other variables. Next, we consider spatial aggregation, where events and covariates are mapped onto polygons (e.g., regular grid cells, administrative districts, or countries). This generates common cross-section or panel datasets with consistent spatial units fit for regressions. Finally, when modeling conflict intensity as a dependent variable, we highlight the pitfalls of treating event counts as continuous (e.g., heteroskedasticity, zeros, overdispersion) and recommend Poisson pseudo–maximum likelihood (PPML) and related count-data estimators that are robust to these features while accommodating fixed effects.

Geometric relationships

In geometric relationship designs, the unit is typically the conflict event (a point), which we relate to other geographic features. These other features can be points (mines, towns, bases), lines (roads, rivers, borders), or polygons (administrative areas, protected zones). We are interested in studying whether conflict events and certain other features are more likely to occur close to each other than a purely random spatial distribution would suggest. Such analysis relies on spatial concepts such as bilateral distances, buffers, or nearest-neighbor links, which one can compute with GIS software such as ArcGIS or with R packages.

In a sense, geometric relationship tests ask whether the locations of conflict events are systematically correlated with observable features. In practice, we check whether events fall unusually close to points (e.g., mines), align with lines (e.g., roads/borders), or concentrate inside certain polygons (e.g., border districts). This can be implemented with simple mean-difference tests or more sophisticated distributional tests. For example, one can compare the distribution of distances between conflict events and mine locations to the distribution of distances between conflict events and randomly placed “placebo” locations in Monte Carlo simulations. If the distances of conflict events to mines are systematically smaller than the distances to these randomly placed placebo points, this suggests a spatial relationship between conflict and mines. Similarly, spatial diagnostic statistics such as Ripley’s K or Moran’s I can help quantify clustering across scales. These statistics compare the actual spatial distribution to spatial noise and quantify how different from this noise (i.e. how clustered) it is.
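The placebo comparison described above can be sketched as a small Monte Carlo test. This is a simplified illustration under several assumptions: coordinates are planar (e.g. already projected, in km), placebo points are drawn uniformly from a rectangular study area, and the function names are hypothetical. A real application would restrict placebo points to plausible locations (e.g. land area within the country).

```python
import math
import random


def mean_nearest_distance(events, features):
    """Average distance from each event to its closest feature (planar)."""
    return sum(min(math.dist(e, f) for f in features) for e in events) / len(events)


def placebo_p_value(events, features, bounds, n_sim=999, seed=0):
    """Share of placebo draws in which events are at least as close as observed."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    observed = mean_nearest_distance(events, features)
    as_close = 0
    for _ in range(n_sim):
        # Replace the real features by the same number of random points.
        placebo = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
                   for _ in features]
        if mean_nearest_distance(events, placebo) <= observed:
            as_close += 1
    return observed, (1 + as_close) / (1 + n_sim)


# Toy data: events deliberately placed next to the three mines.
mines = [(10.0, 10.0), (80.0, 30.0), (40.0, 90.0)]
events = [(11.0, 10.5), (9.5, 9.0), (79.0, 31.0),
          (81.0, 29.5), (40.5, 89.0), (39.0, 91.0)]

observed, p_value = placebo_p_value(
    events, mines, bounds=((0.0, 100.0), (0.0, 100.0))
)
print(round(observed, 2), p_value)
```

A small p-value indicates that events lie closer to the mines than to randomly placed points, which is the pattern the toy data were constructed to show.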

Spatial aggregation

Another analysis option is a regression analysis based on a spatial aggregation of geodata. Spatial aggregation means that we overlay our spatial features (e.g. points and lines) with a fixed spatial template, i.e. a predefined set of polygons. The template can be regular grid cells (e.g., 25-km squares), administrative districts (ADM2/ADM3), or substantive zones (e.g., border belts or protected areas). The goal is a tidy cross-section or panel where each row is a polygon(-period) and columns contain aggregated outcomes and covariates. For conflict data, two simple implementations dominate: (i) indicators (did at least one event occur in the polygon during the period?) and (ii) counts (how many events or fatalities occurred inside a given polygon at a given time?). Other variables are aggregated with the function that fits the construct of the variable. One can, for example, take sums (for dollars spent on development aid projects), means/medians (rainfall, night lights), maximum or minimum (road quality), or rates (night lights per 100k people or per km²).
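A minimal sketch of aggregating point events to a regular grid is shown below. It assumes that coordinates are already projected and measured in kilometers, uses 25-km cells as in the example above, and relies on invented toy events; the helper name `cell_of` is hypothetical.

```python
import math
from collections import Counter

CELL_SIZE_KM = 25.0  # assumption: coordinates are projected and in km


def cell_of(x_km, y_km, size=CELL_SIZE_KM):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(x_km / size), math.floor(y_km / size))


# Hypothetical events: (x_km, y_km, fatalities).
events = [(3.0, 4.0, 2), (20.0, 24.9, 0), (26.0, 4.0, 5), (-1.0, 10.0, 1)]

event_count = Counter()
fatality_sum = Counter()
for x, y, fatalities in events:
    cell = cell_of(x, y)
    event_count[cell] += 1
    fatality_sum[cell] += fatalities

# Indicator version: 1 if at least one event occurred in the cell.
any_event = {cell: 1 for cell in event_count}

print(dict(event_count))
print(dict(fatality_sum))
```

Note that cells without any event do not appear in these dictionaries. For a regression dataset, you would add them explicitly with a value of zero so that every cell of the template has a row.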

If you plan a panel analysis, it is important to pick polygons that are stable over time (or use harmonized historical boundaries) and to work in an equal-area or otherwise appropriate projection so that areas and distances are meaningful. Choose the time window (monthly/quarterly/yearly) alongside the spatial unit. Once the data are aggregated, you can start with the empirical analysis. If you are planning to explain the occurrence of conflict by other variables (i.e. if conflict is your dependent variable), here is a short note of caution. Conflict event data aggregated to a spatial template always come in one of two forms: a binary indicator or a count variable. Both have their quirks when used as dependent variables in regressions. Binary outcomes invite linear probability or logit/probit estimations. Count variables include many zeros and are often overdispersed, i.e. have some rare observations with very high values. In the next subsection, we explain a typical regression set-up and discuss how to best handle conflict as the dependent variable.

Regression analysis (cross-sections and panels)

A regression links an outcome variable (\(Y\)) to explanatory variables in order to quantify how they move together, holding other factors constant. Assume you are interested in explaining the occurrence of conflict events by other indicators. Then, your outcome variable \(Y\) is a measure of conflict intensity aggregated to polygons and time.

\[ Y_i \;=\; \delta\,D_i \;+\; X_i^\top\beta \;+\; u_i \]

where

  • \(Y_i\) measures the occurrence of conflict in polygon \(i\) as an indicator (conflict yes/no) or count (how many) variable.

  • \(D_i\) represents the explanatory variable, i.e. the variable for which you want to test whether it determines the occurrence of conflict.

  • \(X_i\) are control variables. Depending on your theory of change, these should be variables that you think determine both conflict and your variable of interest \(D_i\) (e.g., population, night lights, rainfall).

  • \(\delta\) is your parameter of interest: the association between \(D_i\) and \(Y_i\) after adjusting for \(X_i\).

  • \(u_i\) is all the variation left in \(Y_i\) after you have included \(D_i\) and your controls.

Cross-sections are simple. Often, they are more feasible than panel estimations because you do not require repeated observations of your other variables of interest over time. However, cross-sections have the severe shortcoming that they cannot difference out time-invariant confounders. Especially for conflict, these time-invariant confounders are important: some locations just witness more conflict than others, maybe for historical reasons, because they house (secret) headquarters of rebel groups, or because they are just bigger/more populous. Not all of these confounders are observable, which means we cannot control for them.

Panel data allow you to conduct fixed effects regressions. These can account for a large part of these unobservable confounders. A typical panel regression looks somewhat like this:

\[ Y_{it} \;=\; \delta\,D_{it} \;+\; X_{it}^\top\beta \;+\; \mu_i \;+\; \tau_t \;+\; u_{it} \]

See what is new: the variables of interest now vary over two dimensions, polygons \(i\) and time \(t\). In addition, we include two further parameters. These are unit (polygon) fixed effects \(\mu_i\), which account for time-invariant characteristics of place \(i\), and time fixed effects \(\tau_t\) to control for common shocks and trends at time \(t\).
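To illustrate what the fixed effects do, here is a minimal sketch of the two-way "within" estimator for a single regressor \(D\) without further controls. It assumes a balanced panel (every polygon observed in every period) and uses invented, noise-free toy data in which the true \(\delta\) is 1.5; the function name `two_way_within` is hypothetical.

```python
def two_way_within(panel):
    """Estimate delta from a balanced panel {(unit, period): (y, d)}.

    Both y and d are demeaned by unit and by period, which removes the
    unit and time fixed effects before the slope is computed.
    """
    units = sorted({i for i, _ in panel})
    periods = sorted({t for _, t in panel})

    def demean(k):
        # k = 0 selects y, k = 1 selects d
        unit_mean = {i: sum(panel[(i, t)][k] for t in periods) / len(periods)
                     for i in units}
        time_mean = {t: sum(panel[(i, t)][k] for i in units) / len(units)
                     for t in periods}
        grand = sum(unit_mean.values()) / len(units)
        return {(i, t): panel[(i, t)][k] - unit_mean[i] - time_mean[t] + grand
                for i in units for t in periods}

    y_tilde, d_tilde = demean(0), demean(1)
    num = sum(d_tilde[key] * y_tilde[key] for key in panel)
    den = sum(d_tilde[key] ** 2 for key in panel)
    return num / den


# Toy data: Y = 1.5 * D + unit effect + time effect (no noise).
mu = {"A": 5.0, "B": -3.0, "C": 10.0}   # polygon fixed effects
tau = {1: 0.0, 2: 2.0, 3: -1.0}         # time fixed effects
d_values = {("A", 1): 0, ("A", 2): 1, ("A", 3): 3,
            ("B", 1): 2, ("B", 2): 0, ("B", 3): 1,
            ("C", 1): 1, ("C", 2): 4, ("C", 3): 0}
panel = {(i, t): (1.5 * d + mu[i] + tau[t], d) for (i, t), d in d_values.items()}

delta_hat = two_way_within(panel)
print(delta_hat)
```

Because the polygon and time effects are removed by the demeaning, the estimator recovers the slope even though the polygons differ strongly in their baseline conflict levels. With controls, unbalanced panels, or standard errors, you would use a dedicated regression routine instead.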

The quantity we care about in both types of regressions is \(\delta\). It measures the relationship between conflict and your variable of interest \(D\) while holding all other components included in the regression constant. Usually, we are interested in a causal effect of \(D\) on \(Y\), i.e. whether a change in \(D\) leads to a change in \(Y\). Interpreting \(\delta\) as such requires the assumption that there is no unobserved determinant of \(Y\) (i.e. no part of \(u\)) that is also associated with \(D\). Without going too much into detail, let us just note that this is usually a very difficult assumption to defend; it requires a lot of different tests, discussion, and probably more sophisticated estimation methods than panel regressions with fixed effects. Hence, one should be careful in interpreting \(\delta\) as the effect of \(D\) on \(Y\).
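One way to make this assumption explicit, as a sketch in the notation above: let \(\tilde D_{it}\) denote the variation in \(D_{it}\) that remains after removing the controls and fixed effects (a symbol introduced here for illustration only). In large samples, the OLS estimate \(\hat\delta\) then behaves as

\[ \hat\delta \;\xrightarrow{p}\; \delta \;+\; \frac{\operatorname{Cov}(\tilde D_{it},\,u_{it})}{\operatorname{Var}(\tilde D_{it})} \]

so \(\hat\delta\) recovers \(\delta\) only if the second term is zero. Any unobserved determinant of conflict that moves together with \(\tilde D_{it}\) shifts the estimate away from the true value.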

Another difficulty in our setup concerns the looks of the dependent variable \(Y\). If conflict is your dependent variable \(Y\), it is either binary (any event this period) or a count (events/fatalities). Standard statistical estimators like Ordinary Least Squares (OLS) can get the relationship between \(D\) and \(Y\) in such a setting quite wrong.

If you construct \(Y\) as an indicator variable, an OLS regression estimates a so-called linear probability model (LPM). You estimate it in any software just as you would estimate a simple OLS regression (i.e. the command and estimation procedure are the same), but it carries this specific name to highlight its difference from “normal” OLS regressions. In an LPM, you can read \(\delta\) as the change in the probability of \(Y\) (in percentage points, after multiplying by 100) per unit change in \(D\). While an LPM can be tricky if you want to look at conflict predictions, it performs quite well in terms of inference for \(\delta\). If you care about predicting conflict likelihood, Probit or Logit regressions perform better. But they have the important drawback that they become heavily biased if you use them together with fixed effects - which are so important in conflict regressions!

For count data on the left-hand side of your regression, a practical workhorse is the Poisson pseudo–maximum likelihood (PPML) estimator (see Santos Silva & Tenreyro 2006). It copes well with many zeros, heteroskedasticity, and overdispersion, also in the presence of fixed effects. You then interpret \(\delta\) as the approximate proportional change in \(Y\) for a one-unit increase in \(D\), i.e. a change of roughly \(100 \cdot \delta\) percent (exactly \(100 \cdot (e^{\delta} - 1)\) percent). It is important to note, however, that OLS estimations can yield severely biased estimates with count data. Count variables like conflict incidence are heavily skewed and contain many zeros. Taking their logarithm does not solve the problem: the logarithm of zero is undefined, and under heteroskedasticity log-linear OLS estimates are biased (Santos Silva & Tenreyro 2006). OLS is simply not made for such variables. Hence, always use estimators fit for count data (Poisson, Negative Binomial, or, best, PPML) for this sort of variable on the left-hand side.
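To show the mechanics, here is a minimal sketch of a Poisson regression with an intercept and one regressor, fitted by Newton-Raphson on invented toy counts. It deliberately omits fixed effects, controls, and robust standard errors, so it illustrates the idea rather than a full PPML implementation; the function name `poisson_fit` is hypothetical.

```python
import math


def poisson_fit(y, d, n_iter=25):
    """Newton-Raphson for E[y | d] = exp(b0 + b1 * d); returns (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * di) for di in d]
        # Score vector (gradient of the Poisson pseudo-log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * di for yi, mi, di in zip(y, mu, d))
        # Negative Hessian (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(mi * di for mi, di in zip(mu, d))
        h11 = sum(mi * di * di for mi, di in zip(mu, d))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy event counts with many zeros; d = 1 marks "treated" polygons.
y = [0, 0, 1, 0, 3, 2, 0, 5, 1, 12]
d = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

b0, delta_hat = poisson_fit(y, d)
pct_change = 100 * (math.exp(delta_hat) - 1)
print(round(delta_hat, 3), round(pct_change, 1))
```

With a binary regressor, the estimate equals the log of the ratio of mean counts between the two groups, which makes the proportional interpretation of \(\delta\) easy to verify. The zeros in `y` pose no problem because the model is fitted in levels rather than in logs.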

Differences across the Conflict Event Data Sources

At a high level, UCDP GED, ACLED, and GTD are all event-level datasets that are updated regularly and cover most episodes of civil violence in recent decades. But they differ in what they record and how they build the data. GTD focuses on terrorist attacks by non-state actors that cause harm. ACLED uses the broadest event concept, including some non-violent “strategic developments” (e.g., the reported relocation of rebels’ headquarters). The geographic GED dataset from UCDP is battle-oriented and tracks fatalities while distinguishing which sides and civilians were involved. Depending on what one wants to investigate, one or the other dataset is the better choice. For more information on the respective datasets, see Sundberg & Melander (2013) (UCDP), Raleigh et al. (2010) (ACLED), and LaFree & Dugan (2007) (GTD).

In addition to these “big three,” GDELT offers massive, near–real-time coverage but is harder to extract for ready-to-use conflict analyses (see Leetaru & Schrodt 2013). There are also very detailed and reliable conflict-specific datasets like SIGACTs that are based on military intelligence, even though these can be far narrower in scope and are inherently one-sided. These trade-offs matter: cross-country research typically favors standardized public datasets (GED/ACLED/GTD), whereas theater-specific work may gain from military logs if access is feasible.

Critique and Caveats

All conflict event datasets come with their own advantages and disadvantages. Across all sources, there are two common caveats that you should keep in mind when using them: measurement bias and precision. All sources rely on media reporting on conflict events. Datasets vary in how much they rely on it - GDELT relies purely on (social) media reports without much human coding, while e.g. GTD and UCDP GED invest a lot of work in cross-checking publicly available information on events. Still, all these media-based pipelines can miss or wrongly characterize events. The reporting on events can depend on the availability of journalists on the ground and the freedom of the press in the conflict country. In addition, reports on the number of fatalities often vary across partisan sources. Here, UCDP GED does the best job in making researchers aware of such reporting bias. It provides the lowest, the highest, and a best estimate of the number of reported fatalities across sources for each event. Other databases like GDELT or ACLED are more likely to take numbers from much-cited media sources at face value.
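When a dataset reports a range of fatality estimates per event, a simple check is to repeat your aggregation with the lower bound, the central estimate, and the upper bound. The sketch below uses invented records; the field names `low`, `best`, and `high` are assumptions for this illustration and should be checked against the codebook of the dataset you use.

```python
from collections import defaultdict

# Hypothetical records with a lower, central, and upper fatality estimate.
events = [
    {"country": "A", "low": 3, "best": 5, "high": 40},
    {"country": "A", "low": 0, "best": 2, "high": 2},
    {"country": "B", "low": 10, "best": 12, "high": 13},
]

BOUNDS = ("low", "best", "high")

# Sum each estimate separately by country.
totals = defaultdict(lambda: {bound: 0 for bound in BOUNDS})
for ev in events:
    for bound in BOUNDS:
        totals[ev["country"]][bound] += ev[bound]

# Rank countries from most to least lethal under each estimate.
ranking = {
    bound: sorted(totals, key=lambda c: totals[c][bound], reverse=True)
    for bound in BOUNDS
}
print(ranking)
```

In this toy example, country B ranks first under the lower and central estimates but second under the upper estimate, which signals that the comparison is sensitive to disputed fatality reports.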

Similarly, not all events can be coded to the exact location based on the available media information. Datasets like UCDP and ACLED provide precision scores that reflect how certain the coders are that they can pin down the exact location of an event. In many instances, newspapers and media sites only report loose geographic locations, e.g. “rebels attack government troops in Eastern Congo.” Conflict event datasets want to provide you with point coordinates for each event. In such vague cases, they might just provide you with the centroid of the potentially large region mentioned in the source. This can mean that the coded location is hundreds of miles away from where the attack actually occurred. If you ask very local/detailed research questions, make sure to focus only on events where the precision level is high. And again, be aware that the number of precisely coded events depends on the availability of detailed (and free) news reporting from the conflict zones.

The most reliable reporting on occurrences and severity of events typically originates from specialized military logs. Because they rely on military intelligence, the data are usually very detailed and precise. However, military logs only reflect the information (and views) from one conflict actor. This, as well, can induce severe reporting biases in the data. Yet, information on the precision (and bias) in reporting is usually not available in data based on military logs.

Event definitions and the (non-)inclusion of events

Different inclusion rules mean that “the universe of events” is not the same across sources. GTD restricts itself to terrorist attacks by non-state actors that cause harm. ACLED deliberately casts a wider net and records activities linked to insurgencies, including non-violent moves. This, however, can lead to the inclusion of very specific types of events that might not reflect the type of conflict events you are interested in. For example, ACLED includes tactical developments such as the establishment of new rebel headquarters as event locations. If you are interested in violent events, you might want to filter out such data points. Other examples highlighted in the literature are reports on violent acts of (undefined) rebel groups that do not directly involve humans, such as an unknown group of perpetrators stealing cows from a local farmer. If you want to make sure to capture lethal events only, you might want to restrict the ACLED data to events with at least one (human) fatality.
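Such filters are easy to apply once the data are loaded. The sketch below works on invented ACLED-style records; the field names `event_type` and `fatalities` and the category labels are assumptions for this illustration and should be checked against the current ACLED codebook.

```python
# Hypothetical ACLED-style records; field names and labels are assumptions.
events = [
    {"event_type": "Battles", "fatalities": 4},
    {"event_type": "Strategic developments", "fatalities": 0},
    {"event_type": "Violence against civilians", "fatalities": 0},
    {"event_type": "Explosions/Remote violence", "fatalities": 2},
]

NON_VIOLENT_TYPES = {"Strategic developments"}

# Keep violent events only (drop e.g. headquarters relocations).
violent = [e for e in events if e["event_type"] not in NON_VIOLENT_TYPES]

# Stricter filter: violent events with at least one reported fatality.
lethal = [e for e in violent if e["fatalities"] >= 1]

print(len(events), len(violent), len(lethal))
```

Which of the two filters is appropriate depends on whether your theory concerns violence in general or lethal violence specifically.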

UCDP GED centers on organized violence and reports rich fatality breakdowns. However, UCDP GED focuses on ongoing conflicts of known and identified violent groups. For a conflict event to show up in UCDP GED, it must be linked to an ongoing conflict between at least two entities (of which one can be the national government) that caused at least 25 fatalities in a given year. This means that smaller events like one-off attacks of rebel groups, or even bigger riots that cannot be associated with an organized rebel group, do not show up in UCDP GED. For this reason, you will find that ACLED provides many more events for a given country and year than UCDP GED. However, you will have to decide what kinds of events you want to include in your study.

One more thing to keep in mind is that the different sources vary in their time coverage. GTD, while focusing on terrorist events only, provides the longest time frame, going back to 1970. UCDP GED starts in 1989. For ACLED, the coverage depends on the world region: ACLED started as a coding project for the African continent and only later expanded to other world regions. Here is a short overview of data availability by time and source:

  • GTD: Global coverage, includes events since 1970. Data are published with a temporal lag (e.g., in 2025, data are available until 2020). The year 1993 is missing: the data for this year literally fell off a truck while the database’s hard drives were being moved to another office location.

  • UCDP GED: Global coverage, includes events since 1989 (non-geocoded reports are available for earlier years too). Data are available for full years, e.g., the version 25.1 available in the year 2025 contains events until December 31st, 2024.

  • ACLED: Here, availability depends on the region of study, but the database provides almost real-time tracking of events with only some days of delay in coding:

      • Africa: 1997 - present.

      • South and Southeast Asia: 2010 - present.

      • Middle East: 2015 - present.

      • Latin America and Caribbean: 2018 - present.

      • East Asia: 2010 - present.

      • Europe: 2020 - present (coverage for some countries starting 2018).

      • North America: 2020 - present.

      • Oceania: 2021 - present.
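When combining sources, the usable time window is the overlap of their coverage periods. The following sketch encodes the start years listed above in a small lookup and computes the years that all chosen sources share. The function and variable names are illustrative, and the end year is passed in by the user because it depends on the release you work with.

```python
# Start years as listed in the overview above.
ACLED_START = {
    "Africa": 1997,
    "South and Southeast Asia": 2010,
    "Middle East": 2015,
    "Latin America and Caribbean": 2018,
    "East Asia": 2010,
    "Europe": 2020,
    "North America": 2020,
    "Oceania": 2021,
}
GLOBAL_START = {"GTD": 1970, "UCDP GED": 1989}
GTD_MISSING_YEARS = {1993}  # GTD has no data for 1993


def start_year(source, region):
    """First covered year of `source` in `region`."""
    if source == "ACLED":
        return ACLED_START[region]
    return GLOBAL_START[source]


def common_years(sources, region, last_year):
    """Years covered by all `sources` in `region`, up to and including `last_year`."""
    first = max(start_year(s, region) for s in sources)
    years = range(first, last_year + 1)
    if "GTD" in sources:
        return [y for y in years if y not in GTD_MISSING_YEARS]
    return list(years)


print(common_years(["UCDP GED", "ACLED"], "Africa", 2000))
print(common_years(["GTD", "UCDP GED"], "Africa", 1994))
```

The second call shows why the GTD gap matters: a panel that combines GTD and UCDP GED has a hole in 1993 that should be handled explicitly rather than coded as zero events.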

Overall, a comparison that treats “an event” as interchangeable across sources can be misleading: GTD terrorist bombings, open battles between rebels and governments in UCDP, and headquarter relocations as captured in ACLED all involve quite different phenomena. Always align the dataset’s event concept with your theory and question (LaFree & Dugan 2007; Raleigh et al. 2010; Sundberg & Melander 2013).

Choosing the “right” dataset: a short guide

  • Violence-only vs. wider political activity.
    If you need violent incidents only (battles, attacks, killings), UCDP GED is the best fit. If you want to focus exclusively on terrorist events, GTD is the better fit. If you are interested in all sorts of social upheaval, including protests or strategic moves, consider ACLED.

  • Resolution and precision.
    If you plan fine-grained spatial tests (e.g., distances to mines/roads), make sure to filter your dataset of choice to high-precision events according to the dataset’s accuracy codes. If you want to aggregate conflict events to bigger spatial entities like provinces, you can include events with lower precision, too. Always check and use the datasets’ precision codes before you use/aggregate the data for your analysis!

  • Coverage vs. depth.
    If you want to use global, comparable panels with a longer time frame, UCDP GED or GTD (for terrorism) are the best choice. If you are interested in more detailed data on restricted regions such as Africa or Asia, ACLED might be the right choice. If your analysis focuses on a single war theater, consider finding conflict-specific military logs such as SIGACTs.
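The advice on precision codes can be sketched as a small helper that keeps only events at or below a chosen precision code. The column names used here (where_prec for UCDP GED, geo_precision for ACLED, specificity for GTD) and the convention that lower codes mean more precise locations are taken from the respective codebooks as far as we are aware; treat them, and the default cut-offs, as assumptions to verify for your data version.

```python
# Per source: (precision column, coarsest code still treated as "high precision").
# Column names and cut-offs are assumptions; check each codebook.
PRECISION_RULES = {
    "UCDP GED": ("where_prec", 2),
    "ACLED": ("geo_precision", 1),
    "GTD": ("specificity", 1),
}


def high_precision(rows, source, max_code=None):
    """Keep rows whose precision code is at most `max_code`.

    Lower codes are assumed to mean more precise locations. If `max_code`
    is None, the default cut-off for `source` is used.
    """
    column, default_max = PRECISION_RULES[source]
    limit = default_max if max_code is None else max_code
    return [r for r in rows if int(r[column]) <= limit]


# Made-up UCDP GED-style records.
ged_events = [
    {"id": 1, "where_prec": 1},
    {"id": 2, "where_prec": 2},
    {"id": 3, "where_prec": 4},
    {"id": 4, "where_prec": 6},
]

fine_grained = high_precision(ged_events, "UCDP GED")          # for distance tests
province_level = high_precision(ged_events, "UCDP GED", 4)     # for province aggregates
print([r["id"] for r in fine_grained], [r["id"] for r in province_level])
```

A stricter cut-off suits point-level analyses such as distances to mines or roads, while a looser one retains more events for aggregation to provinces.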

Bottom line. Start from your question and theory, then pick the dataset whose event definition, precision, and coverage match it. Use accuracy codes and sensible aggregation to align the resolution of all inputs. Start by predicting the occurrence and number of events, and treat the much noisier fatality counts as robustness checks. If you model count data, consider Poisson pseudo-maximum likelihood (PPML) with fixed effects instead of simple OLS regressions (Santos Silva & Tenreyro 2006).
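For reference, a PPML specification with fixed effects can be written as follows. The notation is our own illustrative choice, not taken from a specific paper: y_{it} is the event count in unit i (e.g., a province or grid cell) and period t, x_{it} are covariates, and \alpha_i and \gamma_t are unit and time fixed effects. The second line shows a common log-linear OLS alternative for comparison.

```latex
% PPML: model the conditional mean of the count directly
\mathbb{E}\left[ y_{it} \mid x_{it}, \alpha_i, \gamma_t \right]
  = \exp\left( x_{it}'\beta + \alpha_i + \gamma_t \right)

% Log-linear OLS alternative (requires an ad hoc fix such as +1 for zero counts)
\log\left( y_{it} + 1 \right) = x_{it}'\beta + \alpha_i + \gamma_t + \varepsilon_{it}
```

Because the first specification models the mean of the count itself, unit-periods with zero events enter the estimation without any transformation, which matters in conflict data where most unit-periods record no event.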

References

  • Berman, Nicolas, Mathieu Couttenier, Dominic Rohner, & Mathias Thoenig (2017): “This Mine is Mine! How Minerals Fuel Conflict in Africa.” American Economic Review 107(6): 1564–1610.
  • LaFree, Gary, & Laura Dugan (2007): “Introducing the Global Terrorism Database.” Terrorism and Political Violence 19(2): 181–204.
  • Leetaru, Kalev, & Philip A. Schrodt (2013): “GDELT: Global Data on Events, Location, and Tone, 1979–present.” Paper presented at the International Studies Association Annual Meeting.
  • Raleigh, Clionadh, Andrew Linke, Håvard Hegre, & Joakim Karlsen (2010): “Introducing ACLED: An Armed Conflict Location and Event Dataset.” Journal of Peace Research 47(5): 651–660.
  • Santos Silva, J. M. C., & Silvana Tenreyro (2006): “The Log of Gravity.” The Review of Economics and Statistics 88(4): 641–658.
  • Shaver, Andrew C., & Austin L. Wright (2016): “Are Modern Insurgencies Predictable? New Evidence from the Afghanistan and Iraq Wars.” Working Paper.
  • Sundberg, Ralph, & Erik Melander (2013): “Introducing the UCDP Georeferenced Event Dataset.” Journal of Peace Research 50(4): 523–532.